A hierarchical Bayesian approach for the analysis of longitudinal count data with overdispersion: a simulation study
| Article code | Year of publication | English article | Word count |
|---|---|---|---|
| 10106 | 2013 | 13-page PDF | 7236 words |
Publisher: Elsevier - ScienceDirect
Journal: Computational Statistics & Data Analysis, Volume 57, Issue 1, January 2013, Pages 233-245
In sets of count data, the sample variance is often considerably larger or smaller than the sample mean, a phenomenon known as over- or underdispersion. The focus here is on hierarchical Bayesian modeling of longitudinal count data subject to such extra-dispersion. Two models are considered. The first assumes a Poisson distribution for the counts and includes a subject-specific intercept, assumed to follow a normal distribution, to account for subject heterogeneity. However, such a model does not fully address the potential problem of extra-Poisson dispersion. The second model therefore also includes random subject- and time-dependent parameters, assumed to be gamma distributed for reasons of conjugacy. To compare the performance of the two models, a simulation study is conducted in which the mean squared error, relative bias, and variance of the posterior means are compared.
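The three criteria named above can be computed from the posterior means collected across simulation replicates. The sketch below is a minimal illustration that assumes the common definitions:

- relative bias is the bias divided by the true value;
- MSE is the average squared deviation from the truth;
- variance is the spread of the posterior means across replicates.

The paper's exact formulas are not reproduced here. The function name `simulation_metrics` and the replicate values are hypothetical.

```python
import statistics


def simulation_metrics(posterior_means, true_value):
    """Summarize posterior means of one parameter across simulation replicates.

    Assumed (common) definitions, not taken from the paper:
      relative bias = (average posterior mean - true value) / true value
      variance      = population variance of the posterior means
      MSE           = average squared deviation from the true value
    """
    avg = statistics.fmean(posterior_means)
    variance = statistics.pvariance(posterior_means, mu=avg)
    mse = statistics.fmean((m - true_value) ** 2 for m in posterior_means)
    return {
        "relative_bias": (avg - true_value) / true_value,
        "variance": variance,
        "mse": mse,
    }


# Hypothetical posterior means of a slope from four replicates; true value 1.0.
summary = simulation_metrics([1.1, 1.3, 1.2, 1.0], true_value=1.0)
print(summary)
```

Using the population variance (divisor n) makes the decomposition MSE = bias² + variance hold exactly, which is a convenient sanity check. Relative bias is undefined when the true value is zero.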
In medical research, data are often collected in the form of counts, for example the number of times a particular event of interest occurs. A common model for count data is the Poisson model, which is rather restrictive because it forces the variance to equal the mean. In observed count data, the sample variance is often considerably larger (smaller) than the sample mean, a phenomenon called overdispersion (underdispersion). Generically, this is referred to as extra-(Poisson-)dispersion (Iddi and Molenberghs, 2012). If not appropriately accounted for, extra-dispersion may cause serious flaws in precision estimation and in the inferences based thereupon (Breslow, 1990). However, such excess variation has little effect on the estimation of the regression coefficients of primary interest (Cox, 1983). One approach to this problem is to assume a specific, flexible parametric distribution for the Poisson means associated with each observed count. Margolin et al. (1981) assumed a gamma mixing distribution for the Poisson means, which leads to the negative binomial model. The advantage of this parametric approach is that parameters can be estimated by maximum likelihood, yielding estimates that are consistent, asymptotically normal, and efficient if the parametric assumptions are correct (Cramér, 1946; Wald, 1949). Under conditions discussed by Cox (1983), maximum likelihood methods retain high efficiency for modest amounts of extra-dispersion, even when it is not explicitly accounted for in the parametric model. Pocock et al. (1981) proposed an intermediate solution, via maximum likelihood, to the problem of fitting regression models to tables of frequencies when the residual variation is substantially larger than expected under the model assumptions. Williams (1982) proposed a moment method for logistic linear models, and Breslow (1984) applied the approach of Pocock et al. (1981) and Williams (1982) to log-linear models.
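The link between gamma mixing and the negative binomial model can be checked numerically. Assume the parameterization $\lambda \sim \text{Gamma}(\text{shape } \alpha, \text{scale } \mu/\alpha)$, so that $E(\lambda) = \mu$. The marginal count then has mean $\mu$ and variance $\mu + \mu^2/\alpha$, which exceeds the Poisson variance $\mu$ for every finite $\alpha$. The sketch below sums the resulting probability mass function to recover these moments. The function names are hypothetical, and the parameterization is an assumption.

```python
import math


def nb_pmf(y, mu, alpha):
    """P(Y = y) when Y | lam ~ Poisson(lam) and lam ~ Gamma(shape=alpha, scale=mu/alpha)."""
    log_p = (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
             + alpha * math.log(alpha / (alpha + mu))
             + y * math.log(mu / (alpha + mu)))
    return math.exp(log_p)


def nb_moments(mu, alpha, y_max=5000):
    """Mean and variance by direct summation over a truncated support 0..y_max."""
    probs = [nb_pmf(y, mu, alpha) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return mean, second - mean ** 2


mu, alpha = 4.0, 0.5
mean, var = nb_moments(mu, alpha)
# Summed moments next to the closed-form variance mu + mu^2 / alpha.
print(mean, var, mu + mu ** 2 / alpha)
```

As $\alpha$ grows, the term $\mu^2/\alpha$ vanishes and the Poisson model is recovered. As $\alpha$ shrinks, the overdispersion grows.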
Furthermore, the quasi-likelihood method, which can be regarded as a moment method, was applied to overdispersion by Wedderburn (1974) and McCullagh and Nelder (1989). The asymptotic properties of these moment methods for extra-binomial and extra-Poisson variation were studied by Moore (1986). For modeling longitudinal count data with overdispersion, Thall and Vail (1990), similarly to Zeger (1988), developed a mixed-effects approach. In it, the regression coefficients are estimated by generalized estimating equations and the variance component is estimated by the method of moments. This may be viewed as an extension of Liang and Zeger's (1986) model for longitudinal count data. Variance components are generally of broad interest (Pryseley et al., 2011). In addition, Booth et al. (2003) and Molenberghs et al. (2007) brought the two modeling strands together, allowing simultaneously for correlation between repeated measures and for overdispersion in the counts; Molenberghs et al. (2007) termed their model the combined model. This work was extended by Molenberghs et al. (2010) to data types other than counts. All of these authors conducted parameter estimation and inference within a likelihood paradigm. In contrast, this paper takes a Bayesian perspective. In particular, two versions of a hierarchical Poisson model for longitudinal count data are studied. The first includes subject-specific random effects to account for subject heterogeneity (a conventional generalized linear mixed model). The second includes an additional parameter accounting for overdispersion, generated through an additional gamma-distributed random effect (a combined model). The two models are applied to real longitudinal count data and compared in a simulation study. The paper proceeds as follows. Section 2 describes the motivating study, a set of data on epileptic patients. The statistical methodology is laid out in Section 3. In Section 4, the data set is analyzed, followed by a simulation study in Section 5.
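The moment (quasi-likelihood) approach mentioned above keeps the Poisson mean structure but lets the variance be $\text{Var}(Y_i) = \phi\,\mu_i$, estimating $\phi$ from the Pearson statistic. The sketch below is minimal and assumes that fitted means are already available from a Poisson regression with $p$ parameters. The function name `pearson_dispersion` is hypothetical.

```python
def pearson_dispersion(y, mu_hat, n_params):
    """Moment (quasi-likelihood) estimate of phi in Var(Y_i) = phi * mu_i.

    phi_hat = [sum_i (y_i - mu_hat_i)^2 / mu_hat_i] / (n - p); values well above 1
    point to overdispersion relative to the Poisson model.
    """
    n = len(y)
    if n <= n_params:
        raise ValueError("need more observations than fitted parameters")
    pearson_x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))
    return pearson_x2 / (n - n_params)


counts = [0, 2, 4, 6, 8]
mean = sum(counts) / len(counts)  # MLE of mu in an intercept-only Poisson model
phi_hat = pearson_dispersion(counts, [mean] * len(counts), n_params=1)
print(phi_hat)  # prints 2.5
```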
Conclusion
A Bayesian inferential route was proposed for the HPNOD (and the HPN) model, and the performance of the HPN and HPNOD models was compared on data generated with and without overdispersion. When the data are generated with high levels of overdispersion, the HPN model yields more biased and less precise estimates of the random-effect variance ($\sigma^2$) than the HPNOD model. HPN and HPNOD produce similar results for the slopes, providing similar bias and precision for the slopes and for the random-effects variance $\sigma^2$. To investigate the problem with the intercept estimates under the HPNOD model, the correlations between the parameters were calculated. The intercepts of the two models can be compared only indirectly, because the intercept takes the form $\log E(\theta_{ij}) + \beta_0 + 0.5\sigma^2$ in the HPNOD and $\beta_0 + 0.5\sigma^2$ in the HPN. The Deviance Information Criterion (DIC) was used to assess the overall performance of both models. The DIC results suggest that the HPNOD model is much better than the HPN model for data with high, moderate, and low overdispersion. For data without overdispersion, the HPNOD model has only slightly smaller DIC values than the HPN. The simulation results also show an effect of cluster size and sample size. The bias and the MSE decrease as the cluster size increases, and they decrease slightly as the sample size increases. To investigate the robustness of the simulation study, three different true values of a model parameter were chosen. The results were similar under all three values, indicating that the simulation findings are robust. Most of our findings for the epilepsy data set agree with those reported by Molenberghs et al. (2007). In both studies, the two models yielded different intercept estimates and different inferences about the slopes.
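The DIC comparisons discussed in this section rest on the usual definition $\text{DIC} = \bar{D} + p_D$, where:

- $D(\theta) = -2 \log p(y \mid \theta)$;
- $\bar{D}$ is its posterior mean;
- $p_D = \bar{D} - D(\bar{\theta})$ is the effective number of parameters.

Smaller values indicate a better trade-off between fit and complexity. The sketch below assumes the posterior is summarized by MCMC draws of the Poisson means themselves and plugs in their posterior mean. Other plug-in choices give different $p_D$ values. The helper names and the draws are hypothetical.

```python
import math


def poisson_loglik(y, rates):
    """Sum over observations of log p(y_i | rate_i) under a Poisson model."""
    return sum(yi * math.log(r) - r - math.lgamma(yi + 1) for yi, r in zip(y, rates))


def dic(y, rate_draws):
    """Return (DIC, pD) from MCMC draws of the Poisson means.

    Each element of rate_draws is one draw of the vector of Poisson means.
    The plug-in D(theta_bar) uses the posterior mean of those rates (an assumption).
    """
    deviances = [-2.0 * poisson_loglik(y, draw) for draw in rate_draws]
    d_bar = sum(deviances) / len(deviances)
    mean_rates = [sum(col) / len(col) for col in zip(*rate_draws)]
    d_hat = -2.0 * poisson_loglik(y, mean_rates)
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d


# Hypothetical posterior draws of the Poisson means for three observed counts.
y_obs = [2, 0, 3]
draws = [[1.5, 0.5, 2.5], [2.5, 0.3, 3.5], [2.0, 0.4, 3.0]]
dic_value, p_d = dic(y_obs, draws)
print(dic_value, p_d)
```

Because the Poisson deviance is convex in the rates, this plug-in choice guarantees $p_D \ge 0$.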
The HPNOD model also shows no significant change over time in the number of epileptic seizures for patients who received the treatment, whereas the HPN model does. This underscores the importance of careful extra-dispersion modeling. Both models produce non-significant estimates for the difference and the ratio of the slopes. By contrast, the study by Molenberghs et al. (2007) found a significant difference in slopes under the HPN. In both studies, the HPNOD model fits better than the HPN model. Note that our findings differ from those reported by Thall and Vail (1990) and Lindsey (1993). This should not come as a surprise, because these authors considered a different data set involving different compounds. To conclude, the HPNOD model performs better than the HPN model for data with high, moderate, and low levels of overdispersion, whereas both models perform similarly for data without overdispersion. Under the HPN model, the bias and MSE of all parameters increase as the level of overdispersion increases. The HPN model yields biased and inefficient estimates for all parameters, especially for σ and for data with high overdispersion ($0 < \alpha \le 0.25$). This may be because the HPN model does not account for the excess variability caused by overdispersion, which underscores the need to accommodate such extra-model variability. Further investigation is needed to explain why the HPNOD model provides unbiased estimates of the intercepts when the data are generated with moderate overdispersion, but not with high, low, or no overdispersion.
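The role of the overdispersion level can be illustrated by generating counts from the two models. The sketch below is hypothetical and rests on these assumptions:

- The HPN mean is $\exp(\beta_0 + b_i + \beta_1 t_j)$ with $b_i \sim N(0, \sigma^2)$.
- For the HPNOD, that mean is multiplied by $\theta_{ij} \sim \text{Gamma}(\text{shape } \alpha, \text{scale } 1/\alpha)$.

With this mean-one parameterization, $\log E(\theta_{ij}) = 0$, so both intercept forms reduce to $\beta_0 + 0.5\sigma^2$. Smaller $\alpha$ means stronger overdispersion, in line with $0 < \alpha \le 0.25$ being labeled high. The parameter values, time points, and function names are illustrative only. The sample variance-to-mean ratio serves as a crude overdispersion summary.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method; means above 30 are split
    into chunks, since a sum of independent Poisson variables is Poisson."""
    total = 0
    while lam > 30.0:
        total += rpois(30.0, rng)
        lam -= 30.0
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return total + k
        k += 1


def simulate(n_subjects, times, beta0, beta1, sigma, alpha=None, seed=0):
    """Hypothetical generator of longitudinal counts.

    HPN   (alpha=None): y_ij ~ Poisson(exp(beta0 + b_i + beta1 * t_j)), b_i ~ N(0, sigma^2)
    HPNOD (alpha > 0):  the Poisson mean is also multiplied by
                        theta_ij ~ Gamma(shape=alpha, scale=1/alpha), which has mean 1.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_subjects):
        b_i = rng.gauss(0.0, sigma)  # subject-specific random intercept
        for t in times:
            theta = 1.0 if alpha is None else rng.gammavariate(alpha, 1.0 / alpha)
            counts.append(rpois(theta * math.exp(beta0 + b_i + beta1 * t), rng))
    return counts


def dispersion_ratio(counts):
    """Sample variance divided by sample mean (close to 1 for pure Poisson counts)."""
    m = sum(counts) / len(counts)
    v = sum((c - m) ** 2 for c in counts) / (len(counts) - 1)
    return v / m


times = [0, 1, 2, 3]
hpn = simulate(300, times, beta0=1.0, beta1=0.1, sigma=0.5, seed=1)
hpnod = simulate(300, times, beta0=1.0, beta1=0.1, sigma=0.5, alpha=0.25, seed=1)
print(dispersion_ratio(hpn), dispersion_ratio(hpnod))
```

Because the subject-level intercept alone already inflates the marginal variance, the HPN ratio also exceeds 1. The extra gamma layer inflates it much further.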