Fuzzy clusterwise linear regression analysis with symmetrical fuzzy output variable
Article code | Year of publication | Length of English article |
---|---|---|
24207 | 2006 | 27 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal : www.sciencedirect.com/science/article/pii/S0167947306001915, Volume 51, Issue 1, 1 November 2006, Pages 287–313
English abstract
The traditional regression analysis is usually applied to homogeneous observations. However, there are several real situations where the observations are not homogeneous. In these cases, by utilizing the traditional regression, we have a loss of performance in fitting terms. Then, for improving the goodness of fit, it is more suitable to apply the so-called clusterwise regression analysis. The aim of clusterwise linear regression analysis is to embed the techniques of clustering into regression analysis. In this way, the clustering methods are utilized for overcoming the heterogeneity problem in regression analysis. Furthermore, by integrating cluster analysis into the regression framework, the regression parameters (regression analysis) and membership degrees (cluster analysis) can be estimated simultaneously by optimizing one single objective function. In this paper the clusterwise linear regression has been analyzed in a fuzzy framework. In particular, a fuzzy clusterwise linear regression model (FCWLR model) with symmetrical fuzzy output and crisp input variables for performing fuzzy cluster analysis within a fuzzy linear regression framework is suggested. For measuring the goodness of fit of the suggested FCWLR model with fuzzy output, a fitting index is proposed. In order to illustrate the usefulness of FCWLR model in practice, several applications to artificial and real datasets are shown.
English introduction
From a statistical perspective, regression analysis is used to study the dependence of a real phenomenon (the dependent or output variable) on other real phenomena (the explanatory, independent or input variables). Traditional regression analysis is suitable when the observations are homogeneous. However, in many real situations the observations are not homogeneous, and in these cases traditional regression fits the data less well. In order to improve the goodness of fit, it is more suitable to use the so-called clusterwise regression analysis, in which the techniques of clustering are embedded into regression analysis. In this way, the clustering methods are used to overcome the heterogeneity problem in regression analysis. To explain more clearly the aim and the practical usefulness of clusterwise regression analysis, we consider the following illustrative example on a market segmentation problem in business, drawn from Lau et al. (1999): “The manager collects a sample of the sales and income data from a set of customers. If the customers have homogeneous income elasticity (i.e., the regression coefficient $\beta$), $\beta$ can simply be estimated by regression of sales on income. In real business, customers are heterogeneous and income elasticity will vary with customers of different clusters in the sample. The major tasks for the manager are: (i) use the income elasticity as the basis to divide customers into mutually exclusive segments, (ii) estimate the average income elasticity for each segment, (iii) identify the members of each segment. If we ignore the income elasticity differences among segments, the income elasticity estimated from the regression of sales on income will certainly be biased and inaccurate.
In other words, if we want to model the parameter heterogeneity in the traditional regression, the appropriate statistical analysis will involve the simultaneous applications of the cluster analysis and regression model. One straightforward approach is the two stage method. In stage 1, we apply cluster analysis to the dataset to divide customers into segments. In stage 2, we perform regression for each segment to estimate the income elasticity. The problem is that the functions optimized in stages 1 and 2 are two different objective functions which are not necessarily related. A better formulation is to integrate the cluster analysis into regression framework, so that the income elasticities and segment membership parameters can be estimated simultaneously by optimizing one single objective function”. In the body of literature, there are many theoretical works on clusterwise regression analysis (see, for example, De Sarbo and Cron, 1988, De Sarbo et al., 1989, De Veaux, 1989, Hathaway and Bezdek, 1993, Hathaway et al., 1996, Hennig, 2000, Hennig, 2003, Hong and Chao, 2002, Lau et al., 1999, Leşki, 2004, Preda and Saporta, 2005, Quandt and Ramsey, 1978, Shao and Wu, 2005, Spath, 1979, Yang and Ko, 1997, Van Aelest et al., 2006 and Wedel and De Sarbo, 1995). Furthermore, the clusterwise regression analysis finds application in several fields, such as market segmentation and business, socio-economics, biology, engineering, and so on (see, for instance, Aurifeille and Quester, 2003 and De Sarbo and Cron, 1988; Hosmer, 1974; Lau et al., 1999 and Wedel and Steenkamp, 1991). In this paper the clusterwise linear regression is analyzed in a fuzzy framework. In particular, we propose a fuzzy clusterwise linear regression model (FCWLR model) with symmetrical fuzzy output and crisp input variables for performing fuzzy cluster analysis within a fuzzy linear regression framework. 
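The two stage method described in the quotation above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of Lau et al. (1999) or of this paper: the function names `kmeans` and `two_stage_regression`, the choice of a plain k-means for stage 1, the starting centers and the synthetic income/sales data are all hypothetical. It makes the quoted criticism visible: stage 1 minimizes distances to cluster centers, stage 2 minimizes regression residuals, and nothing links the two criteria.

```python
import numpy as np

def kmeans(points, init_centers, n_iter=50):
    """Stage 1 (assumed clustering method): plain k-means on the raw points."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def two_stage_regression(income, sales, init_centers):
    """Stage 1: cluster the customers. Stage 2: one least-squares fit per cluster."""
    points = np.column_stack([income, sales])
    labels = kmeans(points, init_centers)
    coefs = {}
    for c in np.unique(labels):
        mask = labels == c
        # Design matrix with an intercept column and the income column.
        X = np.column_stack([np.ones(mask.sum()), income[mask]])
        beta, *_ = np.linalg.lstsq(X, sales[mask], rcond=None)
        coefs[int(c)] = beta  # [intercept, slope]
    return labels, coefs

# Synthetic, noise-free data: two segments with slopes 2.0 and 0.5.
income = np.concatenate([np.arange(1.0, 11.0), np.arange(101.0, 111.0)])
sales = np.concatenate([2.0 * income[:10], 0.5 * income[10:]])
labels, coefs = two_stage_regression(
    income, sales, init_centers=[[1.0, 2.0], [110.0, 55.0]]
)
```

In this toy dataset the two segments are far apart in the income/sales plane, so k-means happens to recover them and the per-cluster slopes come out close to 2.0 and 0.5. When segments differ only in their regression coefficients and overlap in the data space, stage 1 has no reason to find them, which is the motivation for optimizing a single objective function.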
We build our FCWLR model by considering, simultaneously, the Bezdek's approach to fuzzy cluster analysis (Bezdek, 1981) and the linear regression model with fuzzy output variable $\tilde{Y}$ and crisp explanatory variables $(X_1,\ldots,X_k)$ suggested by Coppi and D’Urso (2003):

$$
\begin{aligned}
m_i &= m_i^{*} + e_i, & m_i^{*} &= \mathbf{x}_i'\mathbf{a},\\
s_i^{(-)} &= s_i^{*(-)} + \varepsilon_i^{(-)}, & s_i^{(-)} &= m_i - l_i,\quad s_i^{*(-)} = m_i^{*} - l_i^{*},\quad l_i^{*} = m_i^{*}b + d,\\
s_i^{(+)} &= s_i^{*(+)} + \varepsilon_i^{(+)}, & s_i^{(+)} &= m_i + l_i,\quad s_i^{*(+)} = m_i^{*} + l_i^{*},
\end{aligned}
$$

where $\mathbf{x}_i'$ is a $(1\times(k+1))$-vector containing the scalar 1 and the values of the $k$ crisp input variables observed on the $i$th unit, $m_i$, $m_i^{*}$ are, respectively, the $i$th observed center and the $i$th interpolated center, $l_i$, $l_i^{*}$ are, respectively, the $i$th observed spread and the $i$th interpolated spread, $\mathbf{a}$ is a $((k+1)\times 1)$-vector of regression parameters for $m_i$, $b$, $d$ are the regression parameters for the other models, and $e_i$, $\varepsilon_i^{(-)}$, $\varepsilon_i^{(+)}$ are the residuals. In matrix form, we can write the previous model as follows:

$$
\begin{aligned}
\mathbf{m} &= \mathbf{m}^{*} + \mathbf{e}, & \mathbf{m}^{*} &= \mathbf{X}\mathbf{a},\\
\mathbf{s}^{(-)} &= \mathbf{s}^{*(-)} + \boldsymbol{\varepsilon}^{(-)}, & \mathbf{s}^{(-)} &= \mathbf{m} - \mathbf{l},\quad \mathbf{s}^{*(-)} = \mathbf{m}^{*} - \mathbf{l}^{*},\quad \mathbf{l}^{*} = \mathbf{m}^{*}b + \mathbf{1}d,\\
\mathbf{s}^{(+)} &= \mathbf{s}^{*(+)} + \boldsymbol{\varepsilon}^{(+)}, & \mathbf{s}^{(+)} &= \mathbf{m} + \mathbf{l},\quad \mathbf{s}^{*(+)} = \mathbf{m}^{*} + \mathbf{l}^{*},
\end{aligned}
\tag{1.1}
$$

where $\mathbf{1}$ is an $(n\times 1)$-vector of all 1's, $\mathbf{X}$ is an $(n\times(k+1))$-matrix containing the vector $\mathbf{1}$ concatenated to the $k$ crisp input variables, $\mathbf{m}$, $\mathbf{m}^{*}$ are, respectively, $(n\times 1)$-vectors of observed centers and interpolated centers, $\mathbf{l}$, $\mathbf{l}^{*}$ are, respectively, $(n\times 1)$-vectors of observed spreads and interpolated spreads, $\mathbf{a}$ is a $((k+1)\times 1)$-vector of regression parameters for $\mathbf{m}$, $b$, $d$ are, respectively, the regression parameters for the other models, and $\mathbf{e}$, $\boldsymbol{\varepsilon}^{(-)}$, $\boldsymbol{\varepsilon}^{(+)}$ are, respectively, $(n\times 1)$-vectors of residuals. Notice that the above fuzzy regression model is based on three linear models.
The first one interpolates the centers of the fuzzy observations; the second and third ones yield the lower and upper bounds (centers ± spreads) by building further linear models on top of the first one. The model is hence capable of taking into account possible linear relations between the size of the spreads and the magnitude of the estimated centers. This is often the case in realistic applications, where dependence between centers and spreads is likely to occur (for instance, the uncertainty or fuzziness concerning a measurement may depend on its magnitude) (Coppi and D’Urso, 2003 and D’Urso, 2003). Furthermore, in order to test the performance of the proposed FCWLR model, we suggest a suitable fitting measure, i.e., the $R^2$ coefficient. The paper is structured as follows. In Section 2, we define the fuzzy data, i.e., the symmetrical fuzzy data, and in Section 3 we consider a particular distance measure between symmetrical fuzzy data. Next, in Section 4, we propose an FCWLR model with symmetrical fuzzy output variable and crisp input variables. In particular, we formalize the model and solve the associated optimization problem; furthermore, to measure the fit of our model, we propose the $R^2$ coefficient and then prove the decomposition of the total deviation. In Section 5, our model is applied to several datasets to show its performance in applications. Some concluding remarks are given in Section 6.
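As a small illustration of the fuzzy regression model of Coppi and D’Urso (2003) recalled in the introduction, the sketch below evaluates the three linear models for given parameters: the interpolated centers, the interpolated spreads built on those centers, and the resulting lower and upper bounds. It only evaluates the model equations; it does not estimate anything. The function names `interpolate` and `residuals` and all numerical values of the design matrix, `a`, `b` and `d` are hypothetical and chosen only for the example.

```python
import numpy as np

def interpolate(X, a, b, d):
    """Evaluate the three linear models for given parameters a, b, d."""
    m_star = X @ a                # interpolated centers: m* = X a
    l_star = m_star * b + d       # interpolated spreads: l* = m* b + 1 d
    s_minus_star = m_star - l_star  # interpolated lower bounds
    s_plus_star = m_star + l_star   # interpolated upper bounds
    return m_star, l_star, s_minus_star, s_plus_star

def residuals(m, l, X, a, b, d):
    """Residuals of the centers, lower bounds and upper bounds."""
    m_star, _, s_minus_star, s_plus_star = interpolate(X, a, b, d)
    e = m - m_star
    eps_minus = (m - l) - s_minus_star
    eps_plus = (m + l) - s_plus_star
    return e, eps_minus, eps_plus

# Hypothetical example: n = 3 units, k = 1 crisp input variable.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])        # first column is the vector of ones
a = np.array([1.0, 2.0])          # assumed center parameters
b, d = 0.1, 0.2                   # assumed spread parameters

m_star, l_star, s_minus_star, s_plus_star = interpolate(X, a, b, d)

# Observations that lie exactly on the model give zero residuals.
e, eps_minus, eps_plus = residuals(m_star, l_star, X, a, b, d)
```

Because the spreads are a linear function of the interpolated centers, a larger center automatically implies a wider (for positive `b`) fuzzy output, which is the dependence between centers and spreads that the model is designed to capture. It also follows directly from the definitions that the bound residuals equal the center residual minus or plus the spread discrepancy $l_i - l_i^{*}$.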
English conclusion
In this paper, we have suggested a fuzzy clusterwise linear regression model (FCWLR model) for symmetrical fuzzy output. Furthermore, in order to measure the fitting of our model we have proposed the $R^2$ coefficient and then proved the decomposition of the total deviation. For showing the applicative performances, our model has been applied to different datasets. Interesting questions for future research include:
1. Simulation study in order to analyze in depth the computational performances of our suggested fuzzy clusterwise linear regression analysis.
2. Fuzzy clusterwise linear regression models based on different types of entropy regularization (for instance, by considering the Shannon entropy measure, the Rényi entropy, etc.).
3. Cluster-validity criteria for fuzzy clusterwise regression.
4. The extension of fuzzy clusterwise regression for fuzzy variables with mixed membership functions.
5. The construction of fuzzy clusterwise regression techniques for dealing with fuzzy random response variables. In this case, the uncertainty to be processed in the model will include both fuzziness and randomness.
6. The extension of the suggested fuzzy clusterwise regression analysis for datasets completely fuzzy (i.e., for fuzzy response variable and fuzzy explanatory variables).
7. The utilization in the estimation procedures of other types of distance measures (e.g., Bertoluzza et al., 1995; Tran and Duckstein, 2002a and Tran and Duckstein, 2002b) between fuzzy variables.
8. Fuzzy clusterwise nonlinear regression analysis (i.e., by considering fuzzy polynomial regression models; see, for instance, D’Urso and Gastaldi, 2002).