Optimality of unequal versus equal cluster sizes for mixed effects linear regression analysis of randomized trials with clustering in one treatment arm
Article code | Publication year | Length of English article |
---|---|---|
24306 | 2010 | 15 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Computational Statistics & Data Analysis, Volume 54, Issue 8, 1 August 2010, Pages 1906–1920
English abstract
The efficiency loss due to varying cluster sizes in trials where treatments induce clustering of observations in one of the two treatment arms is examined. Such designs may arise when comparing group therapy to a condition with only medication or a condition not involving any kind of treatment. For maximum likelihood estimation in a mixed effects linear regression, asymptotic relative efficiencies ($RE$) of unequal versus equal cluster sizes in terms of the $D$-criterion and $D_s$-criteria are derived. A Monte Carlo simulation for small sample sizes shows these asymptotic $RE$s to be very accurate for the $D_s$-criterion of the fixed effects and rather accurate for the $D$-criterion. Taylor approximations of the asymptotic $RE$s turn out to be accurate and can be used to predict the efficiency loss when planning a trial. The $RE$ usually will be more than 0.94 and, when planning sample sizes, multiplying both the number of clusters in one arm and the number of persons in the other arm by $1/RE$ is the most cost-efficient way of regaining the efficiency loss.
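The adjustment described in the abstract (multiplying the number of clusters in the clustered arm and the number of persons in the other arm by 1/RE) can be illustrated with a minimal sketch. The function name `adjust_sample_sizes` and the choice to round up to whole clusters and persons are assumptions made for illustration, not part of the paper.

```python
import math


def adjust_sample_sizes(n_clusters, n_control, re):
    """Inflate both arms by 1/RE to compensate for unequal cluster sizes.

    n_clusters: planned number of clusters in the clustered (treatment) arm
    n_control:  planned number of persons in the unclustered (control) arm
    re:         relative efficiency of unequal versus equal cluster sizes
    Rounding up is an assumption of this sketch, chosen so that the
    adjusted design is never smaller than the 1/RE target.
    """
    if not 0.0 < re <= 1.0:
        raise ValueError("re must lie in the interval (0, 1]")
    factor = 1.0 / re
    return math.ceil(n_clusters * factor), math.ceil(n_control * factor)


# Hypothetical planning numbers: 10 therapy groups, 60 control persons.
adjusted_clusters, adjusted_control = adjust_sample_sizes(10, 60, 0.94)
print(adjusted_clusters, adjusted_control)
```

With an RE of 0.94, the lower bound reported in the abstract, the inflation factor is about 1.064, so the adjustment adds only a few units per arm in small trials.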
English introduction
Trials evaluating the effect of an intervention are often characterized by observations being correlated within clusters. A well-known case is group or cluster randomized trials (Donner and Klar, 1994 and Raudenbush, 1997), where groups (e.g. schools or general practices) are assigned to one of the several treatment conditions. In these designs groups are the units of assignment. Observations may also be clustered when individuals are the units of assignment. This may occur when the treatment itself induces clustering, such as in individually randomized group treatment trials (Pals et al., 2008), where treatments are given to groups of individuals. In such trials interactions between persons within a group may lead to observations being correlated (Bauer et al., 2008 and Roberts and Roberts, 2005). It is quite common that the clustering occurs in only one of the treatment arms, such as when group therapy is compared to a condition involving no kind of intervention (e.g. Bauer et al., 2008, Heller-Boersma et al., 2007 and Pisinger et al., 2005) or to a condition involving only medication (e.g. Dannon et al., 2004 and Haugli et al., 2001). Even if the treatment is given on an individual basis, instead of groupwise, the treatment may induce clustering. This may occur if several patients are treated by the same therapist. Since it is likely that patients of the same therapist will be treated in a more similar way than patients treated by different therapists (Pals et al., 2008 and Roberts, 1999), observations within each therapist will be clustered. Also in this case clustering may occur in only one of the two treatment arms, such as when treatment is contrasted with a waiting-list condition (e.g. Ladouceur et al., 2000, Thompson et al., 1987 and Van Minnen et al., 2003) or with a pharmacological or placebo condition (e.g. Jarrett et al., 1999). 
Starting from a particular cost function, Moerbeek and Wong (2008) examined optimal designs and derived sample size formulas for the case of clustering in one of two treatment arms. The present study extends their study by examining clusters that are of unequal size. Unequal cluster sizes may be due to variation in actual cluster size, but also to nonresponse or dropout of subjects, and are therefore a common situation. The efficiency loss due to variation in cluster sizes when focussing on the estimation of the treatment effect has already been examined (Candel and Van Breukelen, 2009). The present study will examine the efficiency loss when considering the ensemble of all model parameters involved. Also the efficiency loss for the subset of all fixed parameters, including the treatment effect, and the efficiency loss for the subset of all variance components, will be considered. In randomized trials the fixed parameters are usually of primary interest. The standard errors of the fixed effect estimators, however, are a function of the variance components. Furthermore, variance component estimation in itself may be relevant, such as in quality control studies where the variance in health outcomes between clusters (e.g. general practices, therapists or therapy groups) is examined (e.g. Van Berkestein et al., 1999). This motivates studying the efficiency loss also for the variance components. The efficiency criteria examined in this paper are known as the $D$-criterion and, in the case of a subset of parameters, as the $D_s$-criterion (Atkinson et al., 2007). For each criterion the issue is how much efficiency is lost due to varying cluster sizes and how to compensate for this loss. In deriving the efficiency loss we assume that the data within each treatment arm are (approximately) normally distributed and are analyzed with mixed effects linear regression.
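To make the design concrete, the following sketch generates data from one plausible form of such a model: a random intercept per cluster in the treatment arm and independent observations in the control arm, with normally distributed effects and errors. The exact model specification is given in Section 2 of the paper and is not reproduced here, so the functional form, the function name and all parameter names below are assumptions for illustration only.

```python
import random


def simulate_partially_nested(cluster_sizes, n_control, beta0, beta1,
                              sd_cluster, sd_error_treat, sd_error_control,
                              seed=0):
    """Simulate outcomes for a trial with clustering in the treatment arm only.

    Assumed model (illustrative, not taken from the paper):
      treatment arm: y = beta0 + beta1 + u_j + e,  u_j ~ N(0, sd_cluster^2)
      control arm:   y = beta0 + e
    The error standard deviation is allowed to differ between the arms.
    Returns a list of (arm, cluster_id, outcome) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for j, size in enumerate(cluster_sizes):
        u_j = rng.gauss(0.0, sd_cluster)  # effect shared by cluster j
        for _ in range(size):
            y = beta0 + beta1 + u_j + rng.gauss(0.0, sd_error_treat)
            rows.append(("treatment", j, y))
    for _ in range(n_control):
        y = beta0 + rng.gauss(0.0, sd_error_control)
        rows.append(("control", None, y))  # control persons are unclustered
    return rows


# Hypothetical design: three unequal therapy groups and 18 control persons.
data = simulate_partially_nested([4, 6, 8], 18, 0.0, 0.5, 0.3, 1.0, 1.0)
print(len(data))
```

Unequal cluster sizes enter only through the `cluster_sizes` argument, so the same total sample can be generated with equal sizes (for example `[6, 6, 6]`) for comparison.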
The relative efficiency of unequal versus equal cluster sizes will be derived for the asymptotic case when the model parameters are estimated through maximum likelihood. Furthermore, Taylor approximations of the asymptotic relative efficiencies, that can be of practical use when planning a trial, will be derived. Since in relevant studies (e.g. Calzone et al., 2005, Haugli et al., 2001, Pals et al., 2008, Roberts and Roberts, 2005 and Wampold and Serlin, 2000), the number of clusters as well as the cluster sizes themselves are rather small, the asymptotic relative efficiencies and their Taylor approximations will be checked for small samples by an extensive Monte Carlo simulation study, both for maximum likelihood and restricted maximum likelihood estimation. Finally, we will address how to optimally regain the efficiency loss. If we want to minimize the costs involved with a study, should we additionally sample relatively more clusters for one arm or more persons for the other? The paper is structured as follows. Section 2 presents the mixed effects linear regression model for trials comparing a treatment arm with clustering to a control arm without clustering. In Section 3 the criteria for evaluating the efficiency loss due to varying cluster sizes will be presented. Section 4 will provide explicit expressions for the asymptotic relative efficiencies when comparing equal to unequal cluster sizes, and will also present Taylor approximations for these asymptotic expressions. Section 5 will discuss the design and results of a Monte Carlo simulation that examines the relative efficiency for various cluster size distributions with realistic sample sizes. The accuracy of both the asymptotic relative efficiencies and the Taylor approximations will be discussed. Section 6 explains how to regain the efficiency loss such that the costs of a design are minimized. 
Section 7 illustrates for an empirical example how to determine sample sizes in case of the $D_s$-criterion for fixed effects and how to adjust these to repair the efficiency loss that is expected due to varying cluster sizes. The paper closes with some implications for the planning phase of trials.
English conclusion
In analyzing the data from trials in which treatments have clustering effects, the dependency between observations within clusters has to be taken into account. For outcomes that are (approximately) normally distributed, mixed effects linear regression is a way of capturing this clustering. When comparing group therapy to pharmacological treatment or to no treatment at all, or when comparing an individual treatment condition where therapists each treat several patients to a waiting-list condition, clustering occurs in only one of two treatment arms. When planning such a trial one should consider the efficiency loss due to varying cluster sizes. Efficiency in terms of the $D$-criterion and in terms of the $D_s$-criteria for fixed effects and variance components was considered. The efficiency loss was studied by deriving expressions for the asymptotic relative efficiency of unequal versus equal cluster sizes. To incorporate the loss of efficiency in planning a trial, second-order Taylor approximations of the asymptotic relative efficiencies were derived, of which the minima turn out to depend on the coefficient of cluster size variation only. The second-order Taylor approximations rather adequately described the asymptotic $RE$s. To the extent that the asymptotic $RE$s give an adequate description of the $RE$s for realistic sample sizes, these Taylor approximations may therefore be useful in planning trials. In an extensive Monte Carlo simulation study, the asymptotic $RE$ for fixed effects rather adequately approximated the simulated $RE$. For fixed effects, when calculating the minimum $RE$ according to the Taylor approximation, it would however be safer to lower the minimum $RE$ by 1%. For the $D$-criterion, the minimum $RE$ according to the Taylor approximation should best be lowered by 2%.
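The coefficient of variation of the cluster sizes, on which the minimum relative efficiency depends, is the standard deviation of the cluster sizes divided by their mean. A minimal sketch follows; the function name is hypothetical, and the use of the population standard deviation (rather than the sample version) is an assumption, since the convention used in the paper is not stated in this excerpt.

```python
import statistics


def cluster_size_cv(cluster_sizes):
    """Coefficient of variation of cluster sizes: SD divided by mean.

    Uses the population standard deviation (divisor n); this convention
    is an assumption of the sketch, not taken from the paper.
    """
    mean_size = statistics.mean(cluster_sizes)
    return statistics.pstdev(cluster_sizes) / mean_size


# Hypothetical cluster sizes with mean 6.
print(round(cluster_size_cv([4, 6, 8]), 3))
print(cluster_size_cv([6, 6, 6]))
```

Equal cluster sizes give a coefficient of variation of zero, which corresponds to the reference design against which the relative efficiency is defined.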
The simulated $RE$ clearly depends on the coefficient of variation of the cluster sizes. The bimodal distribution and largest cluster size range yield the lowest $RE$ for any of the criteria considered. The difference between ML and REML estimation was negligible for $\bar{n}=6$ and also for $\bar{n}=10$ in case the $RE$ is defined in terms of the $D_s$-criterion for the fixed parameters. In other cases there was a consistent advantage of ML over REML. In these cases ML estimation thus appears to be more robust to varying cluster sizes, but note that ML estimators of the variance components are more biased (Brown and Prescott, 2006). The simulated $RE$s show that the loss of efficiency was modest. This is to be expected, since the asymptotic relative efficiencies for the partially nested designs turned out to be larger than those for cluster randomized trials, and the minimum asymptotic relative efficiencies for cluster randomized trials have been shown to be rather high (Van Breukelen et al., 2007 and Van Breukelen et al., 2008). For all three $D$-criteria the relative efficiency of unequal versus equal cluster sizes exceeds 0.94. If efficiency is defined in terms of $\mathrm{var}(\hat{\beta}_1)$ only, the relative efficiency has been shown to become 0.90 at worst (Candel and Van Breukelen, 2009). This implies a larger efficiency loss for this criterion and thus a larger replication of the original design to regain the efficiency. However, for all criteria considered, including the efficiency in terms of $\mathrm{var}(\hat{\beta}_1)$, when planning sample sizes the (al)most cost-efficient way of regaining the efficiency loss is multiplying the number of clusters in the treatment arm as well as the number of persons in the control arm by a factor $1/RE$.
Simulations for other cluster size distributions, involving more than 3 cluster sizes and larger numbers of clusters ($K=15$ and $K=16$), as well as simulations involving other values for the model parameters were done. Furthermore, since ratios of the error variance in the control versus the treatment arm appear to vary between 1 and 2 (Haugli et al., 2001, Heller-Boersma et al., 2007 and Roberts and Roberts, 2005), also the ratios 0.5 and 2 were examined. Since the allocation ratios for the treatment versus the control arm appear to vary between 0.4 and 1.5 in various studies (Calzone et al., 2005, Dannon et al., 2004, Haugli et al., 2001, Heller-Boersma et al., 2007, Ladouceur et al., 2000, Thompson et al., 1987 and Van Minnen et al., 2003), in addition the allocation ratios 1/4 and 4 were examined. The results for these cases were in line with the results of the present Monte Carlo simulation, thereby supporting the generalizability of our conclusions. In some intervention studies a categorical (e.g. nominal or ordinal) outcome measure is used. A useful extension of the present study would therefore involve mixed effects nominal or ordinal logistic regression. It has to be examined whether (approximate) formulas for the asymptotic relative efficiencies can be derived. Similarly to the present study, these asymptotic relative efficiencies could then be tested for their practical utility through a Monte Carlo simulation study.