Neural networks are increasingly used as statistical models. The performance of the multilayer perceptron (MLP) and that of linear regression (LR) were compared with regard to the quality of prediction and estimation, and to robustness to deviations from the underlying assumptions of normality, homoscedasticity and independence of errors. Taking those deviations into account, five designs were constructed and, for each of them, 3000 data points were simulated. The connectionist and linear models were compared by graphical means, including prediction intervals, as well as by classical criteria such as goodness of fit and relative errors. The empirical distribution of the estimates and the stability of MLP and LR were studied by resampling methods. MLP and linear regression showed comparable performance and robustness. Despite the flexibility of connectionist models, their predictions were stable. The empirical variances of the weight estimates result from the distributed representation of the information among the processing elements. This emphasizes the major role of the variances of weight estimates in the interpretation of neural networks, although this needs to be confirmed by further studies. MLP can therefore be useful statistical models, provided that convergence conditions are respected.
Neural networks are modeling tools for neurophysiology and artificial intelligence (Reggia, 1993). They are also used as statistical models in place of classical approaches. In the medical field, they are applied to an increasing range of epidemiological problems, mostly using the multi-layer perceptron. Examples of applications of neural networks to medical data include Baxt (1995); Bottaci et al. (1997); Cross et al. (1995); Fogel et al. (1998); Guh et al. (1998); Lapeer et al. (1995); Ottenbacher et al. (2001); Sonke et al. (2000).
Many epidemiological studies provide insufficient information regarding the statistical properties of the covariates studied, such as the normality of their distributions or the existence of collinearity. In addition, there is often no information about the link function between variables, and linearity has to be assumed. In recent epidemiological studies using neural networks, the multi-layer perceptron (MLP) appears to be a solution to those problems, as it has been proven that three-layer perceptron networks are theoretically universal approximators (Hornik et al., 1989). Moreover, some works suggest that they can match or exceed the performance of classical statistical methods with regard to goodness of fit and estimation and prediction capability. Connectionist models are seen as flexible methods and are used as a generalization of regression methods (Mariani et al., 1997). However, their frequent use as "black boxes" is controversial (Schwarzer et al., 2000). Thus the status of connectionist models as a particular class of statistical models remains an open question (Flexer, 1996).
Neural networks are directed and weighted graphs whose nodes are processing elements (PEs) with an inner state called activation. These PEs are usually arranged in layers and are connected to many PEs in other layers via directed arcs. Associated with each connection is a real-valued weight w_ij. Each PE processes the input vector X it receives via these connections. Usually, this processing consists of transforming the input vector by an activation function h(W;X), and then by a transfer function g(h(W;X)). The simplest possible PE has a linear activation function and an identity transfer function. Its output is

y_L = g(h(W;X)) = h(W;X) = Σ_i w_i x_i.
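As a minimal illustration (not taken from the original study), the following Python sketch implements such a processing element, with the linear activation h(W;X) = Σ_i w_i x_i followed by the identity transfer g(u) = u; the names linear_pe, weights and x are ours.

```python
import numpy as np

def linear_pe(weights, x):
    """Simplest processing element: linear activation h(W;X) = sum_i w_i * x_i,
    followed by the identity transfer g(u) = u."""
    h = np.dot(weights, x)   # linear activation: weighted sum of inputs
    return h                 # identity transfer: output equals activation

# Example: a PE with three weighted incoming connections
w = np.array([0.5, -1.2, 0.3])
x = np.array([1.0, 2.0, -1.0])
y_local = linear_pe(w, x)    # local output value passed to downstream PEs
```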
After this processing, the PE provides a continuous value y_L, called the local output value, to other PEs via its outgoing weighted connections. In feed-forward models, connections run forward from input PEs to hidden PEs and from hidden PEs to output PEs. Deciding how the PEs are connected, how they process their information and how the connection weights are estimated all contribute to creating a neural network. These models depend on the number, design and connections of the PEs (the architecture), and on the weights and their estimation methods (the learning rules). In other words, the architecture specifies the model, the weights between processing elements are its parameters, and the learning rule is the estimation method.
Each PE analyzes one part of the problem, so the information is distributed among the processing elements. The output of the network, Y = f_W(X), is a combination of local functions. This combination depends on the number of hidden PEs and on the classes of activation and transfer functions used. Thus, complex overall behavior can result from simple local behavior.
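To make the composition Y = f_W(X) concrete, here is a hedged Python sketch of a three-layer perceptron forward pass, with sigmoid hidden PEs and a linear output PE; the layer sizes, random weights and function names are illustrative assumptions, not the architecture used in the study.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    """Forward pass of a three-layer perceptron: each hidden PE computes a
    local output g(h(W;X)); the output PE combines these local functions."""
    hidden_local = sigmoid(W_hidden @ x + b_hidden)   # local outputs of hidden PEs
    return float(w_out @ hidden_local + b_out)        # overall output Y = f_W(X)

# Illustrative network: 3 inputs, 4 hidden PEs, 1 output
rng = np.random.default_rng(0)
W_h = rng.normal(size=(4, 3))
b_h = rng.normal(size=4)
w_o = rng.normal(size=4)
y = mlp_forward(np.array([1.0, 2.0, -1.0]), W_h, b_h, w_o, 0.0)
```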
To study the statistical behavior of neural networks, it is necessary to compare them with classical tools, by formal comparisons and by simulations. In recent years several comparisons have been published, involving logistic models (Schumacher et al., 1996; Vach et al., 1996), principal component extraction (Nicole, 2000), time series analysis (Lisi and Schiavo, 1999), autoregressive models (Tian et al., 1997) and Cox regression (Xiang et al., 2000). The conditions of use and the robustness of connectionist models are two frequently asked questions (Cheng and Titterington, 1994; Capobianco, 2000). Our study is in keeping with those works and focuses on the comparison between MLP and linear regression.
The aim of this study is to provide a comparative evaluation of these two methods on simulated data sets and variables. The quality of the models and the effect of deviations from the underlying assumptions of normality, independence and homoscedasticity are assessed in terms of prediction and estimation performance.
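The simulation designs themselves are described in Section 4; purely as an illustrative sketch (not one of the paper's five designs), the following Python code simulates a linear signal with heteroscedastic, skewed errors and fits both models with scikit-learn, where the sample size, error structure and model settings are our own assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)

# Illustrative design (assumption, not a design from the paper): linear signal
# with skewed errors whose variance grows with the covariate.
n = 3000
x = rng.uniform(0.0, 10.0, size=(n, 1))
errors = (rng.gamma(shape=2.0, scale=1.0, size=n) - 2.0) * (0.2 + 0.1 * x[:, 0])
y = 1.0 + 2.0 * x[:, 0] + errors

lr = LinearRegression().fit(x, y)
mlp = MLPRegressor(hidden_layer_sizes=(4,), max_iter=5000, random_state=0).fit(x, y)

# Compare goodness of fit of the two models on the simulated data
print("LR  R^2:", lr.score(x, y))
print("MLP R^2:", mlp.score(x, y))
```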
The paper is organized as follows. The models are described in the following section. The learning rules, and some procedures to improve them, are presented in Section 3. The simulations are described in Section 4. In Section 5 we report the experimental results. Finally, the discussion section analyzes the comparisons and estimations, and discusses the conditions of use and the interpretation of the MLP as a statistical model.