Performance of parametric survival models under non-random interval censoring: A simulation study
Publisher: Elsevier - Science Direct
Journal: Computational Statistics & Data Analysis, Volume 63, July 2013, Pages 16-30
In many medical studies, individuals are seen periodically, at a set of pre-scheduled clinical visits. In such cases, when the outcome of interest is the occurrence of an event, the corresponding times are only known to fall within an interval formed by the times of two consecutive visits. Such data are called interval censored. Most methods for the analysis of interval-censored event times are based on a simplified likelihood function which relies on the assumption that the only information provided by the censoring intervals is that they contain the actual event time (i.e. non-informative censoring). In this simulation study, the performance of parametric models for interval-censored data was assessed when individuals miss some of the pre-scheduled visits completely at random (MCAR), at random (MAR) or not at random (MNAR), and was compared with a simpler approach that is often used in practice. A sample of HIV-RNA measurements and baseline covariates of HIV-1 infected individuals from the CASCADE study is used for illustration in an analysis of the time between the initiation of antiretroviral treatment and viral load suppression to undetectable levels. Results suggest that parametric models based on flexible distributions (e.g. generalised Gamma) can fit such data reasonably well and are robust to irregular visit times caused by an MCAR or MAR mechanism. Violating the non-informative censoring assumption, though, leads to biased estimators, with the direction and the magnitude of the bias depending on the direction and the strength of the association between the probability of missing visits and the actual time-to-event. Finally, simplifying the data in order to use standard survival analysis techniques can yield misleading results even when the censoring intervals depend only on a baseline covariate.
In conventional time-to-event analyses it is assumed that the time between the study’s origin and the onset of the event of interest is either exactly known or right censored (i.e. greater than the available follow-up time). An array of methods is available for the analysis of such data, such as the Kaplan–Meier estimator of the survival function, log-rank tests for comparisons between groups, and various types of semi-parametric (e.g. the proportional hazards Cox model) or parametric regression methods for modelling the hazard of the event or the survival time in terms of a set of covariates. However, in many cases the onset of the event cannot be immediately observed and the analyst knows only the interval of time within which the event occurred. For example, in many medical studies patients are seen at a set of pre-scheduled visits at the clinic, where the physician can determine whether a condition is present or not. The last visit at which the condition was absent and the first at which the condition was present can be used to form a time interval within which the event of interest must have occurred, giving rise to the so-called interval-censored data. More formally, if Ti is the random variable representing the time to the event of interest of the i-th (i=1,2,…,n) individual and data are interval censored, then instead of observing Ti, the observables are intervals (Li,Ri] such that Ti∈(Li,Ri]. This definition also allows for exactly observed, right-censored and left-censored data, for which Li=Ri, Ri=∞ and Li=0, respectively. The motivation for this investigation stems from epidemiological studies on HIV-infected individuals and more specifically from those focusing on the so-called virologic response to treatment. 
In such studies, the time origin is usually the initiation of “combination antiretroviral treatment” (cART; simultaneous administration of at least three anti-HIV drugs) and the event of interest is the suppression of the HIV viral load to levels which are below the threshold of detection of modern assays. Inference usually focuses on the estimation of the survival or cumulative incidence functions and on between-group comparisons, often summarised by the estimated median time-to-virologic response and hazard ratios, respectively. However, HIV viral load is only periodically measured, and the exact time Ti of achieving undetectability for the first time after cART initiation is only known to lie between the time of the last measurement at which viral load was detectable (Li) and the first one at which it was undetectable (Ri). A common approach, often used in this type of study, is to “simplify” the data by assuming that the time of virologic response coincides with the time of the first undetectable measurement (i.e. Ti=Ri; referred to hereafter as the “rightpoint” method) and then analyse the data using standard survival analysis techniques (Althoff et al., 2010; Pence et al., 2007). There are also other variations of these data simplification techniques (e.g. imputing Ti using the middle of the (Li,Ri] interval, referred to hereafter as the “midpoint” method) but they are rarely used in HIV research (Geretti et al., 2009).
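To make the notation concrete, the following is a minimal Python sketch of how a censoring interval (Li,Ri] could be derived from one individual's measurement history, and of the “rightpoint” and “midpoint” simplifications described above. The function names and the (time, event indicator) output convention are illustrative assumptions, not taken from the paper; in particular, treating individuals who never reach undetectability as right censored at Li is an assumption of this sketch.

```python
import math


def censoring_interval(visit_times, detectable):
    """Return (L, R] for one individual.

    visit_times: increasing measurement times since treatment initiation.
    detectable:  parallel booleans, True while viral load is still detectable.
    L is the last detectable measurement before the first undetectable one
    (0 if there is none); R is the first undetectable measurement
    (math.inf if undetectability was never observed, i.e. right censoring).
    """
    left = 0.0
    for time, is_detectable in zip(visit_times, detectable):
        if not is_detectable:
            return left, time
        left = time
    return left, math.inf


def rightpoint(left, right):
    """'Rightpoint' simplification: pretend the event happened exactly at R.

    Returns (time, event), with event=0 for right-censored individuals
    (assumed here to be censored at L).
    """
    if math.isinf(right):
        return left, 0
    return right, 1


def midpoint(left, right):
    """'Midpoint' simplification: pretend the event happened at (L + R) / 2."""
    if math.isinf(right):
        return left, 0
    return (left + right) / 2.0, 1


if __name__ == "__main__":
    # Hypothetical individual measured at months 1, 3, 6 and 9,
    # first found undetectable at month 6.
    L, R = censoring_interval([1, 3, 6, 9], [True, True, False, False])
    print((L, R))            # (3, 6)
    print(rightpoint(L, R))  # (6, 1)
    print(midpoint(L, R))    # (4.5, 1)
```

Both simplifications replace an interval by a single number, so the width of (Li,Ri], and therefore the visit frequency, directly drives the imputed event time.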
Conclusion
Interval-censored data are frequently observed in many research areas, especially in medical studies focusing on the time to the occurrence of an event. In many such studies, individuals are periodically monitored, so event times are only known to fall within a time interval formed by the times of two consecutive examinations. Although several methods for the analysis of interval-censored time-to-event data have been developed and are nowadays implemented in easy-to-use software, many researchers prefer to “simplify” the data and apply the well-established methods for right-censored survival times. However, this approach has been shown to be prone to bias and likely to yield misleading results. On the other hand, the aforementioned methods for the analysis of interval-censored data rely on the assumption of non-informative censoring and ignore possible dependence between examination times and the actual event time. The aim of this work was to investigate the performance of fully parametric survival models when the interval-censoring mechanism is not completely random. The motivation for this study was the analysis of the time between the initiation of antiretroviral therapy in HIV-infected individuals and the suppression of HIV viral load to undetectable levels (i.e. virologic response). The frequency of viral load measurements can vary depending on various baseline factors, and even associations of this frequency with the unobserved time to virologic response cannot be ruled out, especially in observational studies. In such cases the non-informative censoring conditions may not hold and the appropriateness of the “simplified” likelihood function is not guaranteed. A sample of data from the CASCADE collaboration was analysed by various methods ranging from non-parametric estimators of the survival function to fully parametric accelerated failure time (AFT) models. 
These methods were applied to both the original interval-censored data and to two “simplified” versions of them where virologic response was assumed to coincide either with the first undetectable viral load measurement (“rightpoint” method) or the middle of the interval between the last detectable and the first undetectable measurement (“midpoint” method). Results from these analyses highlighted the severe overestimation of the time required to achieve viral load undetectability when using the “rightpoint” data simplification approach, whereas the “midpoint” method resulted in an estimated survival curve which was closer to the one estimated from methods which account for the interval-censored nature of the data. Differences in virologic response rates between individuals with different routes of transmission were not clear, and their magnitude and statistical significance depended heavily on the method of analysis. There were strong indications, though, that the lower frequency of viral load measurements among injecting drug users (IDUs) may have influenced the analysis of the “simplified” data, giving inflated estimates of the differences in virologic response rates between IDUs and men who have sex with men (MSM). A simulation study designed to partly mimic the real CASCADE data but also incorporate various mechanisms of missing visits was then conducted. Results from the one-group scenarios showed that even with no missing visits at all or with visits missing completely at random, specific parameter values of the data generating distribution can lead to slow convergence to normality, with parameter estimates having skewed sampling distributions and strong mutual correlations. Shape and scale parameters derived from a generalised Gamma AFT model could be severely biased unless sample sizes were very large, but survival curves and median times-to-event were accurately estimated. 
However, when visits were missing not at random the effects on the estimation of the survival curve were clear: when the probability of missing visits was higher (thus censoring intervals were wider) among individuals with longer event times, the estimated survival curves were shifted to the left, leading to underestimated median times-to-event, and vice versa. The degree of under- or over-estimation was higher when the association between the probability of missing visits and the unobserved time-to-event was stronger. Two-group scenarios focusing on the estimation of the hazard ratio between two groups showed that fully parametric models for interval-censored data performed very well when the censoring mechanism was completely random or depended on a baseline covariate. In the latter case both “simplified data” approaches combined with the application of a standard Cox model yielded biased estimates of the hazard ratio. As in the one-group simulations, though, when the probability of missing visits depended on the event time, both the parametric model for interval-censored data and the Cox model fitted on the “simplified” data gave biased estimates of the between-group hazard ratio. In summary, results from both real and simulated data suggest that simplification approaches based on single imputations of the interval-censored event time should be avoided. Parametric survival models for interval-censored data can be used instead as they are able to fit complex data reasonably well, especially when based on flexible distributions (e.g. generalised Gamma). Caution is required, though, when sample sizes are small and model parameters rather than survival curves or hazard ratios are of interest. Comparisons with results obtained from non-parametric methods and careful examination of the implied survival curves are recommended. Simulation results showed that AFT models for interval-censored data are robust to irregular visit times caused by an MCAR or MAR mechanism. 
MNAR mechanisms, though, can heavily affect inference. Despite the complexity of the dependent interval censoring issue, some relevant methods have been proposed in the literature. For example, in the special case of current status data (case I interval censoring, i.e. only one examination time) van der Laan and Robins (1998) and Zhang et al. (2005) proposed methods based on inverse probability of censoring weights and unobservable random effects, respectively. For the general case of interval-censored data, Finkelstein et al. (2002) investigated the one-sample problem and Zhang et al. (2007) considered a proportional hazards frailty model. Finally, Wang et al. (2010) proposed an additive hazards model with a generalisation for two monitoring times which may be dependent on the failure time of interest. In a similar way to the simpler case of right censoring (Williams and Lagakos, 1977), the dependency between the censoring mechanism and the survival process cannot be tested without relying on extra assumptions (Betensky and Finkelstein, 2002) or modelling (van der Laan and Robins, 1998). Unfortunately, most of these methods cannot be applied when there are many irregular visit times, and they have not yet been implemented in the mainstream statistical software packages. In the absence of user-friendly tools for the analysis of interval-censored survival data with potentially informative censoring, the analyst should be cautious when the frequency and timing of the examinations are highly variable. Factors that are probably related to both the probabilities of missed or delayed visits and the event rate (e.g. patient’s compliance with administered treatment in a time-to-response study) should be identified and included in multivariable models in an effort to achieve conditional independence between the interval censoring mechanism and the event times. 
It is evident, though, that violations of the non-informative censoring assumption cannot be ruled out; thus, further work focusing on the development of methods for sensitivity analyses is required.
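As an illustration of the missing-visit mechanisms discussed in this conclusion, the following Python sketch generates interval-censored observations from a pre-scheduled visit grid under an MCAR mechanism and under an MNAR mechanism in which individuals with longer event times are more likely to miss visits. It is a sketch under stated assumptions and not the simulation design of the study: the Weibull event-time distribution (used here in place of the generalised Gamma), the monthly schedule, the logistic form of the miss probability and all parameter values are hypothetical choices made only for illustration.

```python
import math
import random


def miss_probability(event_time, mechanism, base=0.3, intercept=-2.0, slope=0.2):
    """Probability of missing any single scheduled visit (hypothetical forms)."""
    if mechanism == "MCAR":
        return base  # constant, unrelated to the event time
    if mechanism == "MNAR":
        # Logistic in the unobserved event time: longer times, more missed visits.
        return 1.0 / (1.0 + math.exp(-(intercept + slope * event_time)))
    raise ValueError(f"unknown mechanism: {mechanism}")


def observe(event_time, schedule, p_miss, rng):
    """Return (L, R] given the true event time and randomly missed visits."""
    left = 0.0
    for visit in schedule:
        if rng.random() < p_miss:
            continue  # visit missed, nothing is recorded
        if visit >= event_time:
            return left, visit  # first attended visit at or after the event
        left = visit  # attended visit before the event
    return left, math.inf  # event never observed: right censored


def simulate(n, mechanism, seed=1):
    """Simulate n individuals; returns a list of (true_time, L, R)."""
    rng = random.Random(seed)
    schedule = [float(month) for month in range(1, 25)]  # assumed monthly visits
    rows = []
    for _ in range(n):
        # Weibull event times (scale 6, shape 1.5): an assumed stand-in distribution.
        true_time = rng.weibullvariate(6.0, 1.5)
        p_miss = miss_probability(true_time, mechanism)
        left, right = observe(true_time, schedule, p_miss, rng)
        rows.append((true_time, left, right))
    return rows


def mean_finite_width(rows):
    """Average width of the intervals with a finite right endpoint."""
    widths = [right - left for _, left, right in rows if not math.isinf(right)]
    return sum(widths) / len(widths) if widths else float("nan")


if __name__ == "__main__":
    for mechanism in ("MCAR", "MNAR"):
        rows = simulate(2000, mechanism)
        print(mechanism, "mean interval width:", round(mean_finite_width(rows), 2))
```

The sketch only generates the intervals. Assessing bias would additionally require fitting an interval-censored model to the simulated (L, R] pairs and comparing the estimated survival curve with the known data-generating distribution.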