Our work examines the performance of proposed local influence diagnostics applied to multivariate normal longitudinal data with drop-outs: these diagnostics prove to be ambiguous as they are sensitive not only to the presence of anomalous records, as intended, but also, unfortunately, to the misspecification of the longitudinal covariance structure of the response. We suggest an unambiguous index for detecting covariance misspecification, and recommend that an analyst use this index first to confirm that the covariance structure is well specified before attempting to interpret the influence diagnostics.
There has been considerable interest in modeling longitudinal data with non-ignorable (NI) or, equivalently, missing-not-at-random (MNAR) drop-outs (see, for example, Diggle and Kenward, 1994, Molenberghs et al., 1997, Fitzmaurice and Laird, 2000 and Wilkins and Fitzmaurice, 2006). NI drop-outs are records (or, specifically in the biostatistical area, subjects) which disappear from the study prematurely and whose disappearance is related to the subsequently missing measurements. Since the assumptions necessary for fitting, say, a Diggle and Kenward selection model (Diggle and Kenward, 1994) to this type of data are often unverifiable, more recent work has focused on the use of sensitivity analyses (Molenberghs et al., 2001, Verbeke et al., 2001 and Jansen et al., 2006). These authors recommend the use of missing-at-random (MAR) modeling supplemented with local influence diagnostics to detect the impact of possible deviations from the MAR assumption. MAR describes drop-outs whose missingness depends only on the observed and not the missing measurements. (Further explanations of the MAR and NI/MNAR drop-out models can be found in Little and Rubin, 1987, Chapter 6.) Jansen et al. (2006) have indicated that these local influence diagnostics should be used ‘not to detect individuals that drop-out non-randomly, but rather to detect anomalous subjects that lead to a seemingly MNAR mechanism.’ These authors state that ‘a careful study of such subjects, combined with appropriate treatment (e.g., correction of errors, removal, etc.), can lead to a final MAR model, in which more confidence can be put by the researchers, which ultimately is the goal of every sensitivity analysis.’
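The distinction between MAR and NI drop-out can be made concrete with a small simulation. The sketch below is illustrative only. It assumes a logistic drop-out model in the spirit of Diggle and Kenward (1994), in which the probability of dropping out at an occasion depends on the previous and on the current response. The function name `simulate_dropout`, the AR(1) response model, and all parameter names and values are hypothetical choices made here, not the models used in the simulations of this paper.

```python
import math
import random


def simulate_dropout(n_subjects, n_times, psi0, psi_prev, psi_curr, rho=0.7, seed=0):
    """Simulate AR(1) normal responses with monotone logistic drop-out.

    A subject drops out at occasion j with probability
    expit(psi0 + psi_prev * y[j-1] + psi_curr * y[j]).
    psi_curr == 0 gives MAR drop-out (depends on observed values only);
    psi_curr != 0 gives NI/MNAR drop-out (depends on the value that goes missing).
    Returns a list of (complete_series, observed_prefix) pairs.
    """
    rng = random.Random(seed)
    innov_sd = math.sqrt(1.0 - rho ** 2)  # keeps each y[j] at unit variance
    data = []
    for _ in range(n_subjects):
        y = [rng.gauss(0.0, 1.0)]
        for _ in range(1, n_times):
            y.append(rho * y[-1] + innov_sd * rng.gauss(0.0, 1.0))
        observed = [y[0]]  # the first occasion is always observed
        for j in range(1, n_times):
            eta = psi0 + psi_prev * y[j - 1] + psi_curr * y[j]
            if rng.random() < 1.0 / (1.0 + math.exp(-eta)):
                break  # y[j] and all later values are missing
            observed.append(y[j])
        data.append((y, observed))
    return data


# Hypothetical parameter values: same dependence on the past, with and
# without dependence on the current (unobserved) response.
mar = simulate_dropout(500, 5, psi0=-1.5, psi_prev=0.8, psi_curr=0.0)
mnar = simulate_dropout(500, 5, psi0=-1.5, psi_prev=0.8, psi_curr=0.8)
print(sum(len(obs) == 5 for _, obs in mar), "completers under MAR")
print(sum(len(obs) == 5 for _, obs in mnar), "completers under NI/MNAR")
```

In this sketch the analyst would see only the observed prefixes, which is why the value of `psi_curr` cannot be checked from the data without further assumptions.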
There has also been interest in the consequences of incorrectly specifying the covariance structure associated with the longitudinal response (Crowder, 2001, Koreisha and Fang, 2001, Wang and Carey, 2003 and Wang and Lin, 2005). These papers consider ways of improving the estimation of the covariance structure, given the loss in regression estimation efficiency when that structure has been misspecified. Our results indicate that one consequence of incorrectly specifying a covariance structure is that the above local influence diagnostics can then be misleading, apparently signaling that particular records are influential when that is not actually the case. To assist analysts, we propose an index for covariance misspecification that we have found to be unambiguous in the presence of influential drop-outs. We recommend that analysts first examine this index to ensure that the covariance structure is well specified before attempting to interpret the influence diagnostics.
To obtain this index, we considered a number of established candidates in a study that was exploratory rather than definitive. The study found that the distributions of all but one of the indices were strongly affected by whether the missingness was MAR or NI, making those indices difficult to use since the determination of the true nature of missingness requires strong and unverifiable assumptions. Our study was pragmatic rather than theoretical, since we are interested in determining methods that perform well in practice as opposed to working well with pre-chosen forms of misspecification.
Clearly, interesting follow-up questions remain about measuring covariance misspecification. Among these is finding a suitable measure of sensitivity to the parameters of the covariance structure. Another is the level of that sensitivity. Banerjee and Magnus (1999, 2000) provide sensitivity statistics which measure the effects on the regression coefficient estimates and on the t- and F-tests when the errors in OLS are, say, from an ARMA(p,q) time series model but are incorrectly assumed to be independent. Since our own interest is with misspecification in any of a potentially large number of ways from a range of possibilities, the challenge of performing comprehensive sensitivity analyses is substantial. Instead, we took the pragmatic approach of selecting a variety of possible test cases that deviated moderately from the structure assumed in the analysis and examining the consequences. Doing so focuses on the existence of weaknesses in the previously proposed influence diagnostics under conditions that might occur in practice, rather than on a deeper analysis of the nature and causes of those weaknesses.
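To give a feel for why wrongly assuming independent errors matters in OLS, the following sketch compares the variance that the independence assumption reports for a regression slope with the variance the slope estimator actually has when the errors are serially correlated. It is not the sensitivity statistic of Banerjee and Magnus. It assumes the simplest special case, a no-intercept regression with AR(1) errors (an ARMA(1,0) process), and the function name `ols_slope_variances` and the regressor values are hypothetical.

```python
def ols_slope_variances(x, rho, sigma2=1.0):
    """Return (naive, actual) variance of the no-intercept OLS slope.

    naive  : sigma2 / sum(x^2), valid only if the errors are independent.
    actual : sandwich form x' S x / (x'x)^2 with AR(1) error covariance
             S[s][t] = sigma2 * rho ** |s - t|.
    """
    sxx = sum(v * v for v in x)
    naive = sigma2 / sxx
    quad = sum(
        x[s] * x[t] * sigma2 * rho ** abs(s - t)
        for s in range(len(x))
        for t in range(len(x))
    )
    return naive, quad / sxx ** 2


# Hypothetical measurement times used as the single regressor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
naive, actual = ols_slope_variances(x, rho=0.6)
print("ratio of actual to naive variance:", actual / naive)
```

With a positive regressor and positive serial correlation every cross term in the sandwich form is positive, so the naive variance understates the actual one and tests based on it are too optimistic.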
In Section 2, we describe standard models for longitudinal multivariate normal data and for drop-outs that will be used in simulations. The local influence diagnostics and their simulation findings are discussed in Section 3, while the index for detecting covariance misspecification and simulation results are in Section 4. Section 5 provides concluding remarks.
We recommend that analysts use a proposed index, CM2, for spotting covariance misspecification first in order to ensure that the covariance structure is well specified before attempting to interpret the influence diagnostics. Otherwise, those diagnostics can be misleading. The choice of CM2 is based on the fact that its distribution is relatively unaffected by changes in the regression and drop-out model parameters, the number of records, the record lengths, and the nature of the drop-out.