Two-stage logistic regression model
Publisher: Elsevier - ScienceDirect
Journal: Expert Systems with Applications, Volume 36, Issue 3, Part 2, April 2009, Pages 6727–6734
This article proposes a logistic regression model combined with a decision tree to handle significant interaction effects among the explanatory variables. The decision tree is used to investigate interactions among the explanatory variables and to group subjects according to the χ² value of the optimal split. Each resulting group of subjects, called a cluster, is determined by the optimal split on the interacting explanatory variables. The proposed model includes this cluster as an explanatory variable so that significant interactions enter the logistic regression model. In the assessment of predictive models, it performs better than either the logistic regression model or the decision tree alone: better-ranked classes, a higher correct classification rate and R², an improved Kolmogorov–Smirnov (K–S) statistic, and a better lift. The model is applied to National pension data and, as an application, strategies for reducing financial risk in managing and planning pension financing are illustrated.
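The χ²-based split selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single numeric explanatory variable, a 0/1 outcome, and a plain Pearson χ² on the resulting 2×2 table; the paper's actual tree algorithm and its settings are not given in this excerpt, and the function names and toy data below are hypothetical.

```python
from typing import Sequence, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def best_split(values: Sequence[float], outcomes: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, chi_square) for the binary split `value <= threshold`
    that maximizes the chi-square between the split and the 0/1 outcome."""
    best = (float("nan"), 0.0)
    for threshold in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [y for x, y in zip(values, outcomes) if x <= threshold]
        right = [y for x, y in zip(values, outcomes) if x > threshold]
        stat = chi_square_2x2(sum(left), len(left) - sum(left),
                              sum(right), len(right) - sum(right))
        if stat > best[1]:
            best = (threshold, stat)
    return best


# Hypothetical toy data: a numeric explanatory variable and a 0/1 event indicator.
ages = [25, 30, 35, 40, 45, 50, 55, 60]
event = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, stat = best_split(ages, event)
print(threshold, stat)
```

Applying such a split recursively within each branch yields the terminal groups, which play the role of the clusters described above.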
Efficient financial planning is essential for managing the National pension. Society is currently trending toward low birth rates and increased longevity, especially in Korea. To reflect this trend in the management of the National pension, it is important to predict and classify potential pensioners by their probability scores for pension payment. Doing so makes it possible to determine the characteristics of the classified members and, from those characteristics, to suggest strategies for reducing financial risk in planning the payment of benefits and the collection of contributions. Of the several types of National pension, this study concerns the survivor’s pension (SP), which arises from diverse and unexpected causes. The data are analyzed to predict the occurrence of the SP payment and to determine which of five basic factors (explanatory variables) influence it. A further goal is to classify the potential pensioners and identify their characteristics, in order to provide strategies for managing and planning pension financing in relation to these basic but essential factors.
The National pension data consist of three groups: disability pensioners (DP), early old-age pensioners (EOP), and potential pensioners (PP), who are insured persons or qualify as pensioners but do not belong to the former two groups. The logistic regression analysis revealed that the PP group had the largest odds of SP occurrence among the three groups. The present study of SP is therefore conducted on the PP group, which is the most influential for the occurrence of SP. SP is paid to the survivor of an insured person who participated in the pension for more than 10 years and died without receiving benefits; when the eligibility conditions for SP are not met, a lump-sum death payment is provided instead.
The statistical model is fitted to the fatalities among the PP in order to examine which of the five basic explanatory variables are significant. The constructed model predicts and classifies the pensioners according to the predicted score for the occurrence of SP. Several studies on pensions have used a logistic regression model (Bergh et al., 2007 and Chen et al., 2007) or a probit model (Huberman, Iyengar, & Jiang, 2007). Besides logistic regression, decision trees are also used to estimate class membership of a categorical dependent variable without any assumption about the explanatory variables (Breiman et al., 1984, Buntine, 1992 and Lewis, 2004). Few works have been published on the comparison of classification techniques in different areas (Camdeviren et al., 2007 and Kurt et al., 2008). Logistic regression has been used to predict the occurrence of an event of interest or to estimate the probability score for its occurrence (Agresti, 2002 and Hosmer and Lemeshow, 1989). The model provides information on the effects of the explanatory variables on the dependent variable. However, when the logistic regression model includes significant interaction effects, the main effects become complicated to explain. In addition, if many interaction effects exist, it is difficult to determine which of them should be included in the model. By contrast, a decision tree allows the interaction effects to be examined explicitly, shows which of them are most influential, and thus identifies the interactions to include in the model.
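To make the difficulty with main effects concrete, consider a generic logistic model with two explanatory variables and their interaction. The symbols $x_1$, $x_2$, $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_{12}$ are illustrative assumptions, not notation taken from the paper.

```latex
\operatorname{logit}(p) = \log\frac{p}{1-p}
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2,
\qquad
\frac{\partial \operatorname{logit}(p)}{\partial x_1} = \beta_1 + \beta_{12}\, x_2 .
```

The odds ratio for a one-unit increase in $x_1$ is therefore $\exp(\beta_1 + \beta_{12} x_2)$, which changes with $x_2$. The coefficient $\beta_1$ alone describes the effect of $x_1$ only when $x_2 = 0$, so it can no longer be read as a single main effect.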
This paper proposes a two-stage logistic regression model for handling interaction effects, so that both the main and the interaction effects in the logistic model can be explained. Influential interactions are selected via decision tree analysis, and a cluster variable whose categories represent the terminal groups of the optimal tree is included in the logistic regression model as an explanatory variable. This two-stage logistic regression model incorporates interactions among the explanatory variables and still allows the main effects to be explained when interactions are present. The proposed model improves the correct classification rate (CR), the Kolmogorov–Smirnov (K–S) statistic, which measures how well the classified classes are ordered, and the Max rescaled R², which measures the correlation between the observed and the predicted values. The proposed model is compared with traditional logistic regression models in the following order. Section 2.1 describes the data and the logistic regression model for the occurrence of the SP payment. Section 2.2 discusses the benefits of sampling the data so that the binary responses have equal frequency when applying the logistic regression model. Three logistic regression models are introduced. Section 2.3 introduces the proposed model, its motivation, and how the interaction effect is incorporated by applying the decision tree to the logistic regression model. Comparisons of the logistic regression models with the proposed model are also presented. Four possible logistic regressions are compared with respect to performance.
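Two of the assessment measures named above can be sketched as follows. The sketch assumes the usual scoring definitions: CR as the share of cases classified correctly at a cutoff, and K–S as the largest gap between the empirical score distributions of events and non-events. The excerpt does not state the paper's exact computation, and the 0.5 cutoff, function names, and toy scores are hypothetical.

```python
from typing import Sequence


def correct_classification_rate(scores: Sequence[float],
                                outcomes: Sequence[int],
                                cutoff: float = 0.5) -> float:
    """Share of cases whose predicted class (score >= cutoff) matches the 0/1 outcome."""
    hits = sum((s >= cutoff) == bool(y) for s, y in zip(scores, outcomes))
    return hits / len(outcomes)


def ks_statistic(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Largest absolute gap between the empirical score CDFs of events and non-events."""
    n_event = sum(outcomes)
    n_non = len(outcomes) - n_event
    if n_event == 0 or n_non == 0:
        raise ValueError("both outcome classes must be present")
    best = 0.0
    for t in sorted(set(scores)):
        cdf_event = sum(1 for s, y in zip(scores, outcomes) if y == 1 and s <= t) / n_event
        cdf_non = sum(1 for s, y in zip(scores, outcomes) if y == 0 and s <= t) / n_non
        best = max(best, abs(cdf_non - cdf_event))
    return best


# Hypothetical predicted scores and observed 0/1 outcomes.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
cr = correct_classification_rate(scores, outcomes)
ks = ks_statistic(scores, outcomes)
print(cr, ks)
```

A K–S value near 1 means the scores separate events from non-events almost completely, while a value near 0 means the two score distributions are nearly indistinguishable.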