Download English ISI Article No. 22038
Article title

Efficacy of end-user neural network and data mining software for predicting complex system performance
Article code: 22038
Publication year: 2003
Length of English article: 23 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: International Journal of Production Economics, Volume 84, Issue 3, 11 June 2003, Pages 231–253

Keywords
Neural networks, Data mining, Regression analysis
Article preview

Abstract

The performance of a university's dial-up modem pool under various time limit policies and customer behavior patterns was studied. Because the system is very complex, simulation offered the only method to obtain a limited set of steady-state performance measure estimates. A more generalized predictive model must be built from the simulated output. Traditional methods available to practitioners for predicting system performance across a range of environmental and decision variables have typically been limited to linear regression models. However, when the system being studied is highly complex and its performance is nonlinear in nature, the effectiveness of linear models can be limited. While more advanced nonlinear methods, such as neural networks, have been shown to perform better than traditional regression analysis in these situations, the knowledge needed to implement them “from scratch” is beyond most practitioners. Fortunately, these advanced methods are now available in ready-to-use desktop software programs, making them more attainable for practitioner use. The efficacy of these end-user programs compared to more traditional methods in practice is of interest. Multiple variable linear regression models were developed for predicting six output measures in a simulation study and were compared to nonlinear regression models developed using a data mining software package (PolyAnalyst 4.3 Evaluation Software from Megaputer Intelligence) and two commercial neural network software packages (Statistica Neural Networks from Statsoft, and Predict from NeuralWorks). Comparisons of the models’ predictive ability were made on both the data used to design the models and on a test set of data. Statistical analysis shows that predictive performance on the test data was usually best with one of the neural network models, but relative performance of the different models varied widely.

Introduction

The current study stems from ongoing research into the performance of a major university's dial-up modem pool (DMP) system under various configurations and resource allocation policies. Simulation was used to examine system performance as time limits were imposed on customers in two alternative types of DMP systems. Of interest in the ongoing research is how the systems will perform across a wide range of configuration options and customer behavior patterns. To answer that question, different predictive models were built using the simulation output data. The performance measures in most cases proved to be nonlinear functions of the decision variables. While neural networks have been shown to generally perform better than regression on nonlinear data, the specialized skills needed to create and work with neural networks are generally beyond information systems managers. Recently, commercial desktop software programs that implement neural networks and other nonlinear modeling techniques, and which are easy to use, have become more widely available. These packages are purposefully generic in nature, and it is unknown whether these “one size fits all” end-user programs will perform as well as specialized neural networks have proven to. The predictive models were therefore built using traditional multiple linear regression, and commercially available data mining and neural network software packages. In this current study, we compare the predictive capability of the more traditional regression models to those generated by the end-user data mining and neural network software packages.

1.1. Physical model and problem description

Universities, corporations, and on-line network service providers (e.g., CompuServe) have long provided external access to large central computer networks via modems.
Modems allow data to be transmitted over standard analog telephone connections, providing long-distance access to the central computer(s) from any place that has a telephone connection and a modem-equipped computer. Although the data transfer rates through a modem-to-modem connection are much slower than a direct network connection, modems are essential for providing flexible external access to a central network for most users.

Typically, the operator of a central computer network will establish a DMP, a collection of modems that can all be accessed via a single telephone number. Each modem can provide a connection to a single user at any one time. Therefore, the maximum number of users that can be connected via a DMP at any given time is equal to the number of modems in the pool. When all of the modems are busy, the system is full, and a user attempting to dial into the system will be unable to immediately connect to the system.

Our ongoing research examines various configurations for an existing DMP system at a large university. This DMP system is characterized by full system usage for most of the 15-hour period between 9:00 A.M. and 12:00 A.M. This usage is driven by 12% of the customers, who consume approximately half of all available capacity (Schikora, 1999). In many queuing systems where service times are assumed to be exponentially distributed, some skewing of the service time is expected, but in this DMP system this characteristic is grossly exaggerated. It is reasonable to assume that these high-consumption customers are wasting a great deal of capacity on nonproductive connect time.

DMP systems fit a general queuing model known as a retrial queue. Generally, a retrial queue (also called a queue with repeated calls, returning customers, repeated orders, etc.)
describes a system that operates in the following manner: a customer arriving at the system when all servers are busy (i.e., the customer is blocked) leaves the service area but returns after some random time to repeat his demand (Falin, 1990). Previously blocked customers waiting to retry for service are said to be in orbit (or an orbit queue or a retrial queue). Obviously, there is no queue per se; the orbit is an artificial construct to account for the blocked customers who will be returning for service. As such, there is no queuing discipline in orbit, and determination of the next customer to be served follows a random process. Customers in orbit are in a sense competing with other orbiting customers and new arrivals for the next available server (modem).

A DMP system can further be defined by its availability, i.e., by the number of separate modem pools available. When there is a single pool (by definition accessed through a single telephone number), any free server (modem) in the system can be accessed by any customer. This type of system is called a full-available system. On the other hand, when the modems serving a local area are broken into m different pools, with each pool i (i = 1, 2, …, m) accessed by a different phone number, then any caller into pool i can access only the modems in that pool. This latter type of system is termed nonfull-available (Yang and Templeton, 1987).

A model of the full-available DMP system with retrial queuing is shown in Fig. 1. The dashed line represents the DMP system borders. The system can be viewed as a two-station queuing network (Greenberg and Wolff, 1987). The modem pool is the first station, and customers in orbit waiting to redial comprise the second station, an infinite capacity queue.
There is no waiting space in the system other than the orbit queue.

What sets DMP retrial queues apart from most other commonly studied queuing systems is that in the DMP system, the customer, to a large extent, determines the length of the service encounter. Regardless of the speed of an individual server (in this case a modem), the customer basically determines when the service encounter is complete by hanging up. We assume that a customer arrives at the system with a predetermined amount of work to be done, requiring a certain amount of time and system resources to process. However, the customer will frequently stay connected, and thus occupy a server, for some time longer than absolutely necessary to process that work. The excess time the customer stays connected can be thought of as nonproductive time. In many instances the nonproductive time for a customer will be quite significant, often exceeding that portion of the service time that is productive.

The tendency for a customer to remain connected far beyond the time needed to complete the work at hand obviously reduces the available capacity of a DMP over any period of time. Depending on the current demand for service, this reduction in capacity may be insignificant (often the case in the middle of the night), or it may cause a DMP system to become completely overloaded (in the middle of the day). Obviously, it is the latter case that is most noticed, and where attention is needed.

When the expansion of physical capacity is not possible, DMP operators have several options to effectively increase available capacity by reducing the amount of existing capacity that is lost to nonproductive time. One straightforward method is to impose per-call connect time limits. Compared to other options, these limits are easily implemented, perceived as fair, and can be set at virtually any level.
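The retrial mechanism and the per-call connect time limit described above can be illustrated with a small discrete-event simulation. This is only a minimal sketch, not the simulation model used in the study: the function name `simulate_dmp`, all parameter values, and the exponential distributions for arrivals, work, idle time, and redial delay are assumptions made for illustration (the text notes that real connect times are far more skewed). It also assumes that a caller who is cut off does not redial to finish the remaining work.

```python
import heapq
import random


def simulate_dmp(num_modems, arrival_rate, mean_work, mean_idle,
                 mean_retry, time_limit=None, horizon=2000.0, seed=0):
    """Simulate a single-pool (full-available) retrial queue.

    Assumptions for illustration only: exponential interarrival, work,
    idle, and redial times. Times are in minutes; arrival_rate is the
    number of new callers per minute.
    """
    rng = random.Random(seed)
    events = []  # heap of (time, sequence number, kind)
    seq = 0

    def schedule(time, kind):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind))
        seq += 1

    schedule(rng.expovariate(arrival_rate), "arrival")
    busy = attempts = blocked = connected = cut_off = 0

    while events:
        now, _, kind = heapq.heappop(events)
        if now > horizon:
            break
        if kind == "hangup":
            busy -= 1  # a modem becomes free
            continue
        if kind == "arrival":
            # Keep the stream of new callers going.
            schedule(now + rng.expovariate(arrival_rate), "arrival")
        attempts += 1  # new arrivals and redials both count as attempts
        if busy == num_modems:
            # Blocked: the caller joins the orbit and redials later.
            blocked += 1
            schedule(now + rng.expovariate(1.0 / mean_retry), "retry")
            continue
        busy += 1
        connected += 1
        work = rng.expovariate(1.0 / mean_work)          # productive time
        hold = work + rng.expovariate(1.0 / mean_idle)   # plus idle time
        if time_limit is not None and hold > time_limit:
            hold = time_limit
            if work > time_limit:
                cut_off += 1  # disconnected before the work was finished
        schedule(now + hold, "hangup")

    return {
        "blocking_prob": blocked / max(attempts, 1),
        "cut_off_frac": cut_off / max(connected, 1),
        "connections": connected,
    }


no_limit = simulate_dmp(20, 0.5, 20.0, 20.0, 5.0)
with_limit = simulate_dmp(20, 0.5, 20.0, 20.0, 5.0, time_limit=30.0)
print(no_limit)
print(with_limit)
```

Running the function over a grid of `time_limit` values would produce the kind of simulated output from which a predictive model could then be built.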
While these types of limits will reduce the amount of nonproductive resource use, they have the obvious disadvantage of disconnecting some callers before they are finished with their productive work. However, the simplicity of these limits makes them an attractive option for operators of an overloaded DMP system. Therefore, it is important for DMP operators to know how changes in these time limits will affect system performance.

1.2. Predictive models

For many complex stochastic systems like DMPs, it is difficult, if not impossible, to develop formulae for computing steady-state measures of system performance. A common method of learning about such systems is through simulation. Under varying sets of system and environmental parameters, the system can be simulated and estimates of the steady-state performance measures can be gathered from the simulation output. Often, the parameter sets under which the system is studied are not the only ones of interest, for example when we are trying to develop a general method for predicting system performance under any set of system parameters. Because simulation can be time consuming and there can be infinite combinations of possible system parameters, a researcher commonly simulates a finite set of system parameters, and from the output builds a model for predicting system performance for any such set. Of primary concern, of course, is how accurate a model's predictions are.

These predictive models can be built with various tools, three of which were used in this paper: (1) linear regression, (2) data mining software (in particular, the nonlinear regression module of the software), and (3) neural network software. For the latter two we approached the problem from a practitioner standpoint and chose commercially available desktop software packages rather than building specialized models for this particular system.
Of interest is whether these general-purpose software packages would outperform the more traditional regression models. A description of each of these types of models follows.

1.2.1. Linear regression models

Linear regression is a well-known method of mathematically modeling the relationship between a dependent variable and one or more independent variables. Even though the DMP system's performance may be highly nonlinear in nature, linear regression will often be useful through appropriate transformations of variables and/or output measures. The popularity of commercial spreadsheet packages, with built-in regression modules, allows many practitioners to apply relatively advanced regression models to relevant problems. However, the development of these models still requires a model builder to have an understanding of the underlying concepts of regression to make appropriate transformations for a system with nonlinear performance measures. Further, linear regression models may not perform well when predicting system performance under parameter sets that the model was not built with; the same will hold true for almost any prediction model under the same circumstances. Therefore, it would benefit a researcher to develop other predictive models, which could be compared against each other. Increasingly popular methods for developing such models include the use of data mining and neural network software.

1.2.2. Nonlinear data mining models

Data mining software discovers patterns and relationships hidden in data and forms an integral element of customer relationship management. Data mining business applications can be found in a diverse group of businesses including, for example, banks (Tillett, 2000), healthcare (Milley, 2000), insurance (Iannotte, 2000), and sporting goods (Hicks, 2000). PolyAnalyst 4.3 Evaluation Software from Megaputer Intelligence was used in the current study to design nonlinear predictive models.
PolyAnalyst 4.3 contains a suite of advanced knowledge discovery algorithms that extract knowledge from an investigated database and present this knowledge in symbolic rules that can be interpreted by an analyst. An analyst should be able to use these rules to reliably predict outcomes of future situations. One of these algorithms is Find Laws, which uses PolyAnalyst's Symbolic Knowledge Acquisition Technology (SKAT). The stated purpose of this algorithm is “… the automated discovery of multi-dimensional nonlinear relations in data and the presentation of these relations in the form of explicit mathematical formulae.” These formulae include rational polynomials in some cases, while in other cases these formulae may also include conditional statements coded in the form of Excel Visual Basic for Applications macros (Bugher et al., 2000). The creation of these formulae is a key difference between this software application and neural network packages.

Development of a model requires the user to input a set of dependent and independent variables, either directly or by importing from an existing file (e.g., an Excel spreadsheet). The user must simply specify a dependent variable, the independent variable(s), a time limit for running the algorithm, and a desired standard error. Use of the package seems quite appropriate for many practitioners. It should be further mentioned that PolyAnalyst 4.3 contains an entire suite of additional algorithms that can be used to perform linear regression (including stepwise linear regression), classification of records, market basket analysis, and design of neural network models. At the time of this writing, the current version of PolyAnalyst is 4.5.

1.2.3. Neural network models

Neural networks model the structure of neurons in the human brain, with the network consisting of processing units arranged in layers.
Typically, there will exist an input layer of data, which feeds into a middle layer of hidden units through variable weight connections. The middle layer of hidden units then feeds into an output layer through variable weight connections. The neural network learns by adjusting the values of these weights through a back-propagation algorithm that permits error corrections to be fed through the layers (Shtub and Zimerman, 1993). The reader is directed to Schalkoff (1997) for more background on the structure of neural networks.

Neural networks have been applied to a wide range of applications including forecasting (Poli and Jones, 1994), credit approval problems (Lee and Jung, 1999), target marketing (O’Brien, 1994), cost comparison of assembly systems (Schalkoff, 1997; Shtub and Versano, 1999), development of maintenance policies (Bellandi et al., 1998), and for coordinating and filtering information (Eberts and Habibi, 1995). Sharda (1994) has compiled an excellent bibliography of neural network applications to management science/operations research problems.

Development of good neural networks can be very time consuming and requires the building, training, and testing of many different network structures to arrive at a “good” model. Fortunately, end-user programs are available to automate much of this process and to develop good-fitting models without requiring extensive theoretical knowledge of neural networks. Two such programs were used in this study, and both provide an automated network development feature that builds, trains, and tests multiple networks, ultimately selecting the best-fitting network for use. Once developed, each network can be saved and run on any appropriate data set. Brief descriptions of the use of both programs are discussed below.

One of the two neural network software packages used in this study was NeuralWorks Predict from NeuralWare. Predict is an add-in for Microsoft Excel, and as such it is particularly easy to use.
Predict has a Network Wizard that steps a user through dialog boxes that specify the input data and require the user to enter several high-level criteria that prescribe how Predict should build models (e.g., how noisy the data is and how extensive a variable selection should be tested). Recommendations are provided at each of these steps if the user is unsure which to select. Low-level details on how Predict should proceed are also available, and the Wizard can be customized in great detail through these, but they require more knowledge on the part of the user.

The second software package used to develop neural networks was Statistica Neural Networks (SNN) from Statsoft. SNN requires proprietary data sets to be developed, but data can easily be imported from various sources. SNN also has an automated feature called the Automatic Network Designer. Similar to the Wizard in Predict, the Automatic Network Designer builds and tests multiple networks, selecting the best one. A user can select various types of networks to be built and tested, including linear regression, generalized regression, radial basis function, and multi-layer (with three or four layers) perceptron networks. The designer does not include any variable selection, although SNN includes a genetic algorithm to separately recommend which variable(s) should be used in developing a network.

Section 2 of this paper discusses the simulation experiment used to study the DMP system. Section 3 describes the development of the predictive models. Section 4 presents the results and analysis, and Section 5 provides conclusions and implications for future research.
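The comparison outlined in this introduction, a linear regression model against a back-propagation network, each judged on held-out test data, can be sketched in a few lines. This is an illustrative sketch only: the data are synthetic (a response that saturates in one input, standing in for simulation output), and the hand-written one-hidden-layer network is an assumption for demonstration, not the algorithm used by Predict, SNN, or PolyAnalyst.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for simulation output (an assumption): the response
# saturates in the first input and is linear in the second.
X = rng.uniform(-1.0, 1.0, size=(400, 2))
y = np.tanh(4.0 * X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=400)
X_train, y_train = X[:300], y[:300]   # data used to design the models
X_test, y_test = X[300:], y[300:]     # held-out test set


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def add_intercept(A):
    return np.column_stack([np.ones(len(A)), A])


# 1) Multiple linear regression fitted by ordinary least squares.
beta, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)
linear_rmse = rmse(add_intercept(X_test) @ beta, y_test)

# 2) Input layer -> hidden layer (tanh units) -> output, trained by
#    back-propagation with full-batch gradient descent.
hidden = 8
W1 = rng.normal(scale=0.5, size=(2, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.5, size=hidden)
b2 = 0.0
learning_rate = 0.1
n = len(y_train)

for _ in range(10000):
    H = np.tanh(X_train @ W1 + b1)        # hidden-unit activations
    err = H @ W2 + b2 - y_train           # output error
    # Propagate the error back through the layers
    # (gradients of half the mean squared error).
    grad_W2 = H.T @ err / n
    grad_b2 = err.mean()
    dH = np.outer(err, W2) * (1.0 - H ** 2)
    grad_W1 = X_train.T @ dH / n
    grad_b1 = dH.mean(axis=0)
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

network_rmse = rmse(np.tanh(X_test @ W1 + b1) @ W2 + b2, y_test)
print("linear regression test RMSE:", linear_rmse)
print("neural network test RMSE:   ", network_rmse)
```

On data like these the straight-line fit cannot follow the saturation, so the network should show the lower test error. A transformed regressor could narrow the gap, which is the kind of modeling knowledge the text says linear regression requires.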

Conclusions

5. Conclusions and implications for future research

The results of this study indicate the potential benefit to practitioners from considering commercially available neural network and other nonlinear modeling desktop software packages when trying to develop a predictive model for a complex system. With very little work and mathematical knowledge on the part of the modeler, these programs can be used to develop some very good predictive models. However, the software packages tested here appear to provide inconsistent results, at least when left with the default or recommended program settings, as in this study.

One of the most attractive features of the software packages used here is that they require virtually no understanding of the underlying theory to implement, contrary to the use of multiple linear regression models, which requires at least the understanding of how and when to transform data to better fit with a linear model. The downside of this is that the user gains no insight into the behavior of the system itself. The software packages basically do all of the work in building, testing, and selecting a good model. Even with the model created and ready for implementation, the average user will not gain any insight into the modeled system's behavior. (We use the term “average user” to differentiate most practitioners from those well versed in neural network theory, who would be able to use the more advanced features of both packages.) As a result, there is a risk of trading potentially better predictive capabilities for lesser understanding of basic system operation. Unlike the neural network models, PolyAnalyst does provide its model in explicit symbolic form, but as discussed earlier, the usefulness of these rules in understanding underlying system behavior is questionable.

Despite the temptation to let the software “do it all,” users of these software packages are encouraged to learn how the specific package works and how the software can be customized.
The default settings of the neural network software packages used in this study are designed to produce “good” models, but obviously not the absolute best model. By changing different settings on the network generation routines of each package, many different good models could, and should, be generated and compared. Our experience leads us to advise paying special attention to the variable selection process in the software package, which, if available, should be used cautiously. From our experience, this feature of Predict is very sensitive to how the initial training data set is partitioned by Predict into training and validation subsets. Predict has a feature to randomly “shuffle” the cases between the two subsets. In several tests on different data sets, the variable selection tool selected different variables for inclusion in the network based solely on reshuffling the data cases.

This sensitivity to user settings creates a bit of a paradox for practitioners. The programs are initially attractive due to their ease of use and the lack of any need for understanding of the underlying mathematical concepts. However, to gain full benefit from the packages, the user will need to adjust multiple settings, many of which do require better understanding of the underlying modeling technique.

We feel it only fair to point out that the results of our study are in effect a snapshot in time. As the reader is certainly aware, commercial software packages routinely undergo revisions and upgrades. It is our experience that software packages like those we studied here do not change as frequently as common office application software, but they do change. Future revisions of the software may introduce new features and perhaps better performance when used with default settings.
Future research may extend the comparison in this study to examine how well these packages perform when used by modelers with extensive experience and knowledge in the use of neural networks and other advanced modeling techniques. Also, the varied results from the different approaches used here lead one to ask how an average practitioner might improve predictive results overall. One way might be to combine the output from several different models, perhaps in some sort of weighted average prediction, and then compare the performance of the combined model to that of the individual models.
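One possible form of the weighted average prediction suggested above is sketched here. The weighting scheme (weights inversely proportional to each model's mean squared error on a validation set) is an assumption chosen for illustration, since the text does not specify one. The three "models" are hypothetical stand-ins: the true value plus noise of different sizes, not output from the packages studied.

```python
import numpy as np


def inverse_mse_weights(val_preds, y_val):
    """Weight each model by 1 / (validation MSE), normalized to sum to 1.

    val_preds has shape (n_models, n_cases). A model with zero validation
    error would need special handling, which this sketch omits.
    """
    mse = np.mean((val_preds - y_val) ** 2, axis=1)
    inverse = 1.0 / mse
    return inverse / inverse.sum()


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


rng = np.random.default_rng(1)
truth = rng.uniform(0.0, 1.0, size=200)

# Hypothetical predictions from three models of differing accuracy.
noise_sd = np.array([0.05, 0.10, 0.20])
preds = truth + noise_sd[:, None] * rng.normal(size=(3, 200))

val = slice(0, 100)     # cases used to set the weights
test = slice(100, 200)  # cases used to compare the models

weights = inverse_mse_weights(preds[:, val], truth[val])
combined = weights @ preds[:, test]

individual_rmse = [rmse(p[test], truth[test]) for p in preds]
combined_rmse = rmse(combined, truth[test])
print("weights:        ", weights)
print("individual RMSE:", individual_rmse)
print("combined RMSE:  ", combined_rmse)
```

Because the weights are nonnegative and sum to one, the combined prediction can never be worse than the worst individual model on squared error. Whether it beats the best individual model depends on how correlated the models' errors are, which is the empirical question the text raises.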