Generalized linear mixed models (GLMM’s) are an extension of generalized linear models that introduce random effects to the linear predictor. The presence of packages that fit GLMM’s in widely available software such as SAS (NLMIXED), Stata and R (lme4) is indicative of the extensive use of these models in modern research and data analyses. GLMM’s are useful models for analysing repeated measurements and clustered observations, however, the majority of estimation methods that have been developed are based on the normality assumption of random effects (Breslow and Clayton, 1993). This assumption provides a robust and convenient way to estimate the fixed effects but may compromise estimation efficiency (Tao et al., 1999). In addition, Neuhaus et al., 1992 and Neuhaus et al., 2012 demonstrate that misspecification of the random effects distribution can lead to biased estimation of the intercept and covariates associated with the random effects.
A GLMM is typically specified as a conditional distribution of the data vector View the MathML sourcey given the random (possibly vector) effects View the MathML sourceγ, the random effect covariate vector View the MathML sourcex and the fixed effect covariate vector View the MathML sourcez. In a clustered data setting, the nini observations on the iith cluster, yi1,…,yiniyi1,…,yini, are conditionally independent and modelled as
equation(1)
View the MathML sourceYi1,…,Yini|γi,zi1,…,zini,xi1,…,xini∼∏jnif(yij|γi,zij,xij),
Turn MathJax on
where View the MathML sourcezij and View the MathML sourcexij are the fixed and random covariate vectors respectively characterizing observation View the MathML sourceyij,i=1,…,n. GLMM’s connect the mean View the MathML sourceμij=E[yij|γi,zij,xij] to the covariate vectors View the MathML sourcex and View the MathML sourcez through
equation(2)
View the MathML sourceg(μij)=zijTβ+xijTγi,
Turn MathJax on
where gg is the link function. We refer to the distribution of the random effects View the MathML sourceGγ as the mixing distribution. Parametric mixture models assign View the MathML sourceGγ a parametric distribution, often a normal distribution as in Stiratelli et al. (1984) and McCulloch et al. (2008). Semiparametric mixture models traditionally estimate the mixing distribution View the MathML sourceGγ using nonparametric maximum likelihood estimation methods (NPMLE) (Laird, 1978), but these approaches have typically considered random intercepts only.
Clustered binary data arise frequently in clinical studies where repeated measurements are gathered on experimental units. An example is the study by Bent et al. (2006) in which 225 men with moderate to severe symptoms of benign prostatic hyperplasia were randomized to receive one year of treatment with either saw palmetto extract (n=112n=112) or placebo (n=113n=113). The study measured outcomes and predictors at 8 visits over a 14 month period. The outcome of interest is a binary indicator of a severe symptom, that is, a level of the American Urological Association Symptom Index score > 20. The predictors of interest are: the binary indicator of treatment group, month of the visit, and the month by treatment group interaction. Here it is natural to model the outcomes using patient-specific random effects.
We consider the binomial model for ff in (1) above and use the canonical logistic link, gg. The random intercept and random slope vary from cluster to cluster so that the jjth observed response of cluster i,yiji,yij has a View the MathML sourceBernoulli(pij) distribution. The linear model in (2) becomes
equation(3)
View the MathML sourcelogit(pij)=ai+bixij+∑lzijlβl,
Turn MathJax on
where aiai and bibi denote the random intercept and slope for cluster ii, respectively, and ll indexes the number of fixed covariates in the model. Here we assume that the random effects vector View the MathML sourceγi=(ai,bi) has joint distribution View the MathML sourceGγ.
Our goal is to compute the NPMLE for View the MathML sourceGγ and the MLE for View the MathML sourceβ. Earlier methods available in the literature fit univariate random effects and include the expectation–maximization (EM) algorithm (Laird, 1978), the vertex direction method (VDM) (Fedorov, 1972, Wu, 1978a and Wu, 1978b), the vertex exchange method (VEM) (Böhning, 1985) and the intra-simplex direction method (ISDM) (Lesperance and Kalbfleisch, 1992). Wang (2007) proposed an algorithm which is a modification of Atwood’s (1976) quadratic method. He used a linear regression formulation to solve the quadratic programming sub-problem for estimating the mixing weights for which Atwood did not offer a detailed solution. Like ISDM, Wang added multiple support points at each iteration rather than one as in Atwood’s algorithm, and he discarded unwanted support points that have zero mass at each iteration whereas Atwood combined nearby support points. Wang (2007) showed that the algorithm converges at a faster rate than previously published algorithms.
This paper presents a new algorithm, the Direct Search Directional Derivative (DSDD) method, that computes nonparametric maximum likelihood estimates of a mixing distribution for a logistic regression model containing multiple random effects. The algorithm uses a direct search method (Torczon, 1991) to identify maxima of the gradient function to include as mixing distribution support points. Then the algorithm incorporates the quadratically convergent method of Wang (2007) to estimate the mixing proportions.
The structure of the paper is as follows. Section 2 introduces and reviews some of the literature on NPML estimation of mixture models. In Section 3, we describe the DSDD algorithm for computing the NPMLE of View the MathML sourceGγ with View the MathML sourceβ fixed and compare it with an alternative. Section 4 compares two methods for computing the joint MLE’s of View the MathML sourceGγ and View the MathML sourceβ. A dataset from the National Basketball Association (NBA) is analysed using a mixed model in Section 5 and Section 6 concludes with a discussion.