Permanent disability classification by combining evolutionary Generalized Radial Basis Function and logistic regression methods
|Article code||Publication year||English article||Persian translation||Word count|
|1442||2012||6-page PDF||Available to order||Not calculated|
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 39, Issue 9, July 2012, Pages 8350–8355
Recently, a novel multinomial logistic regression method was proposed in which the initial covariate space is enlarged by adding nonlinear transformations of the input variables, given by Gaussian Radial Basis Functions (RBFs) obtained by an evolutionary algorithm. However, the standard Gaussian RBF still has some problems, for example, in approximating constant-valued functions or in coping with the high dimensionality of some real problems. To address these problems, we propose the use of the generalized Gaussian RBF (GRBF) instead of the standard Gaussian RBF. Our approach has been validated on a real disability classification problem to evaluate its effectiveness. Experimental results show that this approach is able to achieve good generalization performance.
In artificial neural networks (ANNs), the hidden neurons are the functional units and can be considered as generators of function spaces. Most existing neuron models are based on the summing operation of the inputs and, more particularly, on sigmoidal unit functions, resulting in what is known as the Multilayer Perceptron (MLP). However, alternatives to the MLP have emerged in the last few years. Product Unit Neural Network (PUNN) models are one such alternative and are based on multiplicative neurons instead of additive ones. They correspond to a special class of feed-forward neural network introduced by Durbin and Rumelhart (1989). While MLP network models have been very successful, networks that make use of Product Units (PUs) have the added advantage of increased information capacity (Durbin & Rumelhart, 1989). That is, smaller PUNN architectures can be used than those required with MLPs (Ismail & Engelbrecht, 2002). They aim to overcome the non-linear effects of variables by means of non-linear basis functions, constructed with the product of the inputs raised to arbitrary powers. These basis functions express possible strong interactions between the variables, where the exponents may even take on real values and are suitable for automatic adjustment. Another interesting alternative to MLPs is the Radial Basis Function Neural Network (RBFNN). RBFNNs can be considered a local approximation procedure, and the improvement in both their approximation ability and the construction of their architecture has been noteworthy (Bishop, 1991). RBFNNs have been used in the most varied domains, from function approximation to pattern classification, time series prediction, data mining, signal processing, health monitoring, and non-linear system modelling and control (Howlett and Jain, 2001; Zheng et al., 2011). RBFNNs use, in general, hyper-ellipsoids to split the pattern space.
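To make the three kinds of hidden unit concrete, here is a minimal sketch in plain Python. The function names are illustrative and do not come from the paper. The Gaussian RBF is written in the convention exp(-||x - c||^2 / r^2), which is an assumption, since other texts place a factor of 2 in the denominator. The product unit assumes strictly positive inputs so that real-valued exponents are well defined.

```python
import math


def sigmoid_unit(x, weights, bias):
    """Additive (MLP-style) unit: a sigmoid of the weighted sum of the inputs."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-s))


def product_unit(x, exponents):
    """Multiplicative unit: product of the inputs raised to real-valued powers.

    Assumes every input is strictly positive.
    """
    out = 1.0
    for xi, wi in zip(x, exponents):
        out *= xi ** wi
    return out


def gaussian_rbf(x, center, radius):
    """Local unit: the response decays with the distance from x to the centre."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / radius ** 2)
```

The sigmoid and product units respond to the inputs globally (through a sum or a product), whereas the RBF responds only near its centre, which is why RBFNNs are described as a local approximation procedure.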
In many cases, MLP, PU and RBF networks are trained by using evolutionary algorithms (EAs), thus obtaining advantages with respect to traditional training approaches (Chakravarty and Dash, 2011; Fernández-Navarro et al., 2011a; Fernández-Navarro et al., 2011d; Tallón-Ballesteros and Hervás-Martínez, 2011; Yao, 1999). On the other hand, logistic regression (LR) has become a widely used and accepted method for analysing binary or multi-class outcome variables, as it is flexible and can predict the probability of each state of a multi-class variable from the predictor variables. Gutiérrez, Hervás-Martínez, and Martínez-Estudillo (2011) proposed a multinomial logistic regression method combining evolutionary Radial Basis Function (ERBF) and LR methods. The LR methods apply a logit function to the linear combination of the input variables. The coefficient values of each input variable are estimated by means of the Iteratively Reweighted Least Squares (IRLS) algorithm. Roughly, the methodology is divided into three steps. Firstly, an evolutionary algorithm (EA) is applied to estimate the parameters of the RBFs. Secondly, the input space is enlarged by adding the nonlinear transformations of the input variables given by the RBFs of the best individual in the last generation of the EA. Finally, the LR algorithms are applied in this new covariate space. The standard Gaussian RBF has some drawbacks; for example, its performance decreases drastically when it is used to approximate constant-valued functions or when the dimensionality grows. For this reason, we propose the use of a Generalized RBF (GRBF) (Castaño et al., 2010 and Fernández-Navarro et al., 2011) instead of the standard Gaussian RBF. This novel basis function incorporates a new parameter, τ, that allows the contraction–relaxation of the standard RBF, solving the problems previously stated.
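The second and third steps can be sketched as follows. This is a rough illustration, not the authors' implementation. The GRBF form exp(-(||x - c|| / r)^τ) is an assumption chosen so that τ = 2 recovers a Gaussian RBF, and the section itself does not give the formula. The centres, radii, τ values and LR coefficients below are hand-picked placeholders; in the methodology described above they would come from the evolutionary algorithm and from IRLS, neither of which is implemented here.

```python
import math


def grbf(x, center, radius, tau):
    """Generalized RBF (assumed form): exp(-(||x - c|| / r) ** tau)."""
    dist = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))
    return math.exp(-((dist / radius) ** tau))


def augment(x, hidden_units):
    """Step 2: append one GRBF output per hidden unit to the original inputs.

    hidden_units is a list of (center, radius, tau) tuples (hypothetical layout).
    """
    return list(x) + [grbf(x, c, r, tau) for (c, r, tau) in hidden_units]


def class_probabilities(z, betas):
    """Step 3 (prediction only): multinomial logit with the last class as reference.

    betas holds one [intercept, coef_1, ..., coef_n] list per non-reference class.
    """
    scores = [b[0] + sum(bk * zk for bk, zk in zip(b[1:], z)) for b in betas]
    scores.append(0.0)  # the reference class has a fixed score of zero
    top = max(scores)   # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder parameters: two hidden units on a two-dimensional input.
units = [([0.0, 0.0], 1.0, 2.0), ([1.0, 1.0], 0.5, 4.0)]
z = augment([0.2, 0.4], units)          # 2 original inputs + 2 GRBF outputs
betas = [[0.1, 0.5, -0.3, 1.2, 0.0],    # class 1 against the reference class
         [-0.2, 0.0, 0.4, -0.7, 0.9]]   # class 2 against the reference class
probs = class_probabilities(z, betas)   # three probabilities that sum to 1
```

Under this assumed form, a larger τ flattens the response inside the radius and sharpens its decay outside it, which is one way to read the contraction–relaxation role of τ.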
The performance of the proposed multinomial logistic regression methodology was evaluated in a real problem of permanent disability classification. Permanent disability is a term used in the insurance industry and law. Generally speaking, it means that due to a sickness or injury a person is unable to work in their own, or any occupation for which they are suited by training, education, or experience. In Spain, the evaluation and classification of permanent disability follows a procedure which is clearly defined and divided into three development phases: introduction, instruction and resolution. The main principles of the measures adopted with the aim of obtaining a consolidated and rationalized system for the determination of permanent disability are the contributory element, equity and solidarity. Furthermore, in order to establish greater legal security in the process of determining permanent disability, it is necessary to elaborate a list of diseases and the evaluation of their influence on the reduction of work capacity. This list must be created according to objective criteria based on the actual evaluations and proceedings of the disability assessment teams. To understand the nature of permanent disability, it is necessary to define the terminology first. Permanent disability takes into account continuous alteration of health and its impact on the worker’s occupational situation. The disability assessment team is supported by a medical unit. 
The medical unit’s competencies are: to examine the disability situation of the worker, to determine the reduction or alteration of the physical integrity of the worker, to determine the level of incapacity for work, to determine whether the character of the disease is common or professional, to extend the period of medical observation in case of professional diseases, to monitor programs for the control of temporary disability compensations, and to provide technical assistance and advice on any contentious issues concerning occupational disabilities. In our work we consider three main categories that can be assigned to a worker depending on the degree of permanent disability: no disability (when the worker is not assigned the status of permanent disability), permanent disability (when the worker is assigned some degree of permanent disability) and fee (when the worker is not assigned any degree of permanent disability, but is financially compensated). The objective of this study is to offer an initial model, based on artificial neural networks and logistic regression, that facilitates the preparation of reports in the process of determining the existence of permanent disability. This model provides an approximation of the expected result for each case of permanent disability. The training dataset used to obtain the model is composed of information from reports of the medical unit. Each report is tagged with one of the three categories (no disability, permanent disability or fee). An important characteristic of the dataset is that it is highly unbalanced.
Conclusion
We have studied the combination of Evolutionary Generalized Radial Basis Function, instead of Evolutionary Radial Basis Function, and logistic regression methods. This basis function solves some problems that degrade the performance of the standard Gaussian model, such as the approximation of constant-valued functions or the approximation of high-dimensional datasets. The good synergy between these two techniques has been demonstrated experimentally on a permanent disability classification problem. The hybrid neuro-logistic models have proved to serve as an accurate tool in the classification of permanent disability. A comparative study between an extensive collection of standard classifiers and the hybrid neuro-logistic models, together with the results of the statistical tests applied, shows that the latter are more precise in determining the degree of permanent disability. Our hybrid models include a non-linear component (from different kinds of neural networks) and a standard linear component, combining both in a logistic regression predictor. The complexity of the model and the large number of parameters involved in these classifiers encouraged us to use a combined methodology, including an evolutionary algorithm and a standard maximum-likelihood optimization process. Useful information could be extracted from the most accurate model, given its simple structure (number of connections and number of hidden neurons). Simple structure is one of the main advantages of the models presented. The obtained model is not intended to be a widely used tool in the classification of permanent disability. First, it would be necessary to examine more data, as the scope of the permanent disability (PD) problem is very broad due to the high number and complexity of cases. However, our findings can be used to develop new, improved systems.
For instance, an extended model could be used to create an information system, both for patients and professionals, which would provide assistance in the evaluation of permanent disability.