Regression models for grouped survival data: Evaluation and sensitivity analysis
|Article code||Publication year||English article||Persian translation||Word count|
|26401||2011||15-page PDF||Order||Not calculated|
Publisher : Elsevier - Science Direct
Journal : Computational Statistics & Data Analysis, Volume 55, Issue 2, 1 February 2011, Pages 993–1007
In this study, regression models are evaluated for grouped survival data when the effect of censoring time is considered in the model and the regression structure is modeled through four link functions. The methodology for grouped survival data is based on life tables, and the times are grouped into k intervals so that ties are eliminated. Thus, the data are modeled with discrete lifetime regression models. The model parameters are estimated by the maximum likelihood and jackknife methods. To detect influential observations in the proposed models, two kinds of diagnostic measures are used: measures based on case deletion, termed global influence, and measures based on small perturbations in the data or in the model, termed local influence. In addition to those measures, the local influence and the total influential estimate are also employed. Various simulation studies are performed to compare the performance of the four link functions of the regression models for grouped survival data under different parameter settings, sample sizes and numbers of intervals. Finally, a data set is analyzed by using the proposed regression models.
Grouped survival data are obtained from studies in which all sample units are evaluated at the same moment, which implies the occurrence of an excessive number of tied lifetimes. Hence, to eliminate the ties, the lifetimes are grouped into intervals so that only the information concerning the intervals in which the individuals failed or were censored is available. According to Aranda-Ordaz (1983), the presence of many tied times is a particularly problematic point in model fitting, since the verification of the model's assumptions is based on continuous data. The first ideas concerning the treatment of grouped survival data arose in the 1970s from articles by Cox (1972) and Kalbfleisch and Prentice (1973), which presented semi-parametric models for analyzing this type of data. Some more recent applications to grouped survival data can be found in the literature. For instance, Hertz-Picciotto and Rockhill (1997) studied the efficiency of partial likelihood approximations in the presence of ties; Lam and Ip (2003) reported group-based modeling; and Yu et al. (2004) proposed models with a cure fraction for grouped survival data under the parametric approach in the absence of covariates. According to Heitjan (1991), inferential analyses have rarely been performed, particularly because of the computational difficulty that arises when the response variable is converted into intervals.
Therefore, this paper presents a modified regression model for grouped data in which the regression structure is modeled by using four link functions (logit, complementary log–log, log–log and probit). Also, the likelihood function is modified so as to include failing individuals, individuals at risk and censored individuals. We consider a frequentist analysis of a regression model for grouped survival data.
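As a rough illustration of how the four link functions named above can drive a discrete-time model, the following minimal sketch maps a linear predictor to the conditional probability of failing in an interval and evaluates a standard discrete-time log-likelihood. It is not the modified likelihood of the paper. The names `INVERSE_LINKS`, `interval_hazard` and `log_likelihood`, the per-interval intercepts `gammas`, the sign convention chosen for the log–log link, and the rule that a censored subject counts as surviving its last observed interval are all assumptions made for this example.

```python
import math
from statistics import NormalDist

# Inverse link functions: linear predictor eta -> conditional failure probability.
# The log-log sign convention below is one common choice (an assumption).
INVERSE_LINKS = {
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "cloglog": lambda eta: 1.0 - math.exp(-math.exp(eta)),
    "loglog": lambda eta: math.exp(-math.exp(-eta)),
    "probit": lambda eta: NormalDist().cdf(eta),
}


def interval_hazard(link, gamma_j, beta, x):
    """P(fail in interval j | at risk at its start), for covariates x."""
    eta = gamma_j + sum(b * v for b, v in zip(beta, x))
    return INVERSE_LINKS[link](eta)


def log_likelihood(link, gammas, beta, data):
    """Discrete-time log-likelihood (hypothetical helper).

    data holds tuples (last_interval, failed, covariates) with 0-indexed
    intervals. A subject contributes log(1 - p) for every interval survived
    and log(p) for the interval of failure. Assumption: a censored subject
    is treated as having survived its last observed interval.
    """
    total = 0.0
    for last, failed, x in data:
        for j in range(last):
            total += math.log(1.0 - interval_hazard(link, gammas[j], beta, x))
        p_last = interval_hazard(link, gammas[last], beta, x)
        total += math.log(p_last) if failed else math.log(1.0 - p_last)
    return total


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [(0, True, [1.0]), (2, False, [0.0]), (1, True, [1.0])]
    for name in INVERSE_LINKS:
        print(name, log_likelihood(name, [-1.0, -0.5, 0.0], [0.3], toy))
```

Passing such a function to a numerical optimizer would yield maximum likelihood estimates; how censoring within an interval is handled changes the likelihood and should follow the model actually being fitted.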
The inferential part was carried out using the asymptotic distribution of the maximum likelihood estimators, which can present some difficulties with small samples. As an alternative to classic analysis, we explore the use of the jackknife estimator for regression models. In this case, it is not necessary to use the asymptotic distribution of the maximum likelihood estimators.
After modeling, it is important to check the assumptions in the model as well as to conduct a sensitivity study to detect influential or outlying observations that can distort the results. Numerous approaches have been proposed in the literature to detect influential or outlying observations. An efficient way to detect influential observations was proposed by Cook (1986). He suggested that more confidence can be placed in a model which is relatively stable under small modifications. The best-known perturbation schemes are based on case deletion, introduced by Cook (1977), in which the effect of completely removing cases from the analysis is studied. This reasoning forms the basis for our global influence introduced in Section 3.1, and in performing such an analysis, it will be possible to determine which subjects might be influential for the analysis (see, for example, Cook and Weisberg, 1982 and Xie and Wei, 2007). On the other hand, when using case deletion, all information from a single subject is deleted at once. Therefore, it is hard to tell whether that subject has any influence on a specific aspect of the model. A solution to this problem can be found in a quite different paradigm, the local influence approach, in which one again investigates how the results of an analysis change under small perturbations in the model and in which these perturbations can be given specific interpretations. Several authors have investigated the assessment of local influence in survival analysis models.
For instance, Pettitt and Bin Daud (1989) investigated local influence in proportional hazard regression models; Escobar and Meeker (1992) adapted local influence methods to regression analysis with censoring; Ortega et al. (2003) considered the problem of assessing local influence in generalized log-gamma regression models with censored observations; Ortega et al. (2006) derived curvature calculations under various perturbation schemes in exponentiated Weibull regression models with censored data; Carrasco et al. (2008) investigated local influence in log-Weibull modified regression models with censored data; Silva et al. (2008) adapted global and local influence methods in log-Burr XII regression models with censored data; Ortega et al. (2009) investigated local influence in generalized log-gamma regression models with a cure fraction; and Hashimoto et al. (2010) derived the appropriate matrices for assessing local influence in the log-exponentiated Weibull regression model for interval-censored data. We have developed a similar methodology to detect influential subjects in regression models for grouped survival data.
The article is organized as follows. In Section 2, we describe the regression model for grouped survival data using one of the four link functions as well as the methods of estimation by maximum likelihood and jackknife employed in this study. In Section 3, we discuss the measures used for analyzing sensitivity. In Section 4, we perform various simulation studies to compare the performance of the four link functions for different parameter settings, sample sizes and numbers of intervals. In Section 5, we present an application of the proposed method to a real data set. Finally, we present our main conclusions in Section 6.
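The jackknife and case-deletion ideas described in this introduction both rest on refitting the model with one observation left out. A minimal sketch of that shared mechanism follows, assuming a generic `estimator` callable in place of the maximum likelihood fit of the paper; the function names `jackknife` and `case_deletion_shifts` and the toy data are hypothetical.

```python
from statistics import mean


def leave_one_out(estimator, sample):
    """Estimates obtained by deleting each observation in turn."""
    return [estimator(sample[:i] + sample[i + 1:]) for i in range(len(sample))]


def jackknife(estimator, sample):
    """Jackknife estimate and standard error built from pseudo-values.

    Returns (estimate, standard_error, leave_one_out_estimates).
    No asymptotic distribution of the estimator is used.
    """
    n = len(sample)
    theta_full = estimator(sample)
    loo = leave_one_out(estimator, sample)
    # Pseudo-values: n * theta_hat - (n - 1) * theta_hat_(i)
    pseudo = [n * theta_full - (n - 1) * t for t in loo]
    theta_jack = mean(pseudo)
    variance = sum((p - theta_jack) ** 2 for p in pseudo) / (n * (n - 1))
    return theta_jack, variance ** 0.5, loo


def case_deletion_shifts(estimator, sample):
    """Change in the estimate when each case is removed (a global influence check)."""
    theta_full = estimator(sample)
    return [theta_full - t for t in leave_one_out(estimator, sample)]


if __name__ == "__main__":
    # Toy failure indicators for one interval (invented for illustration);
    # the estimator is the observed failure proportion, a stand-in for an MLE.
    failures = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
    estimate, std_err, _ = jackknife(mean, failures)
    print(estimate, std_err)
    print(case_deletion_shifts(mean, failures))
```

Large shifts flag cases whose removal changes the estimate noticeably. In a regression setting the estimator would return a parameter vector and the shifts would usually be scaled, for example by an estimated covariance matrix.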
Conclusion
In this paper, a regression model for grouped survival data is proposed as an alternative for modeling lifetimes in the presence of many ties. We used the quasi-Newton algorithm to obtain the maximum likelihood estimates and performed asymptotic tests for the parameters based on the asymptotic distribution of the maximum likelihood estimators. As an alternative analysis, the paper discusses the use of the jackknife estimator for the regression model for grouped survival data. In the application to real data, we observed that all estimation methods gave similar results. We also applied global and local influence methodologies to a regression model for grouped survival data. The matrices needed to apply these techniques were obtained by taking into account some usual perturbations in the model and in the data. By applying the procedures to a data set from the medical area, we could assess the sensitivity of the maximum likelihood estimates under some perturbation schemes as well as check the goodness-of-fit of the postulated model. Although the diagnostic plots detected some possibly influential observations, their deletion did not cause inferential changes in the results. Furthermore, this article compared, through a simulation study, the performance of the proposed model under four link functions (logit, complementary log–log, log–log and probit) and different numbers of intervals, based on mean squared error. These simulations suggest that regression models for grouped survival data can be used for modeling data with the logit and complementary log–log link functions. The application to real data indicates the usefulness of the approach.