A method for the simulation of samples from the exact posterior distributions of the parameters in logistic regression is proposed. It is based on the principle of data augmentation and a latent variable is introduced, similar to the approach of Albert and Chib (J. Am. Stat. Assoc. 88 (1993) 669), who applied it to the probit model. In general, the full conditional distributions are intractable, but with the introductions of the latent variable all conditional distributions are uniform, and the Gibbs sampler is easily applicable. Marginal likelihoods for model selection can be obtained at the expense of additional Gibbs cycles. The technique is extended and can be applied with nominal or ordinal polychotomous data.
When modelling binary data, the outcome variable Y has a Bernoulli distribution with probability of success π. If the probability of success depends on a set of covariates, then we have a distinct probability πi, specific to the ith observation, Yi. The probability πi is regressed on the covariates through a link function that preserves the properties of probability. So πi=H(βxi) where xi is the vector of covariates associated with the ith observation, 0⩽H(.)⩽1, and H(.) is a continuous non-decreasing function. Usually the link function is taken as the cumulative distribution function (CDF) of some continuous random variable, defined on the whole real line. The two link functions in common use are the CDF of the standard normal distribution, the probit model, and the CDF of the logistic distribution, the logit model. These kinds of models are described in detail in a number of books. See, for example, Cox (1971) or Maddala (1983). For a sample of n observations, the likelihood function is given by
equation(1.1)
When using maximum likelihood estimation, inferences about the model are usually based on asymptotic theory. Griffiths et al. (1987) found that the MLEs have significant bias for small samples. With the Bayesian approach and prior π(β), the posterior of β is given by
equation(1.2)
which is intractable in the case of the probit and logit models. In the past, asymptotic normal approximations were used for the posterior of β. Zellner and Rossi (1984) used numerical integration when the number of parameters is small. Albert and Chib (1993) introduced a simulation-based approach for the computation of the exact posterior distribution of β in the case of the probit model. The approach is based on the idea of data augmentation ( Tanner and Wong, 1987), where a normally distributed latent variable is introduced into the problem. This approach also enables them to model binary data using a t link function.
In this paper we apply the data augmentation approach of Albert and Chib (1993) to the logit model. This enables us to use Gibbs sampling to obtain samples from the posterior distribution of β, drawing only from uniform distributions. The technique is extended in Section 3 to multiple response categories, and in Section 4 applied to ordinal responses where the thresholds, or cut off points, must also be estimated. Again, only simulation from uniform distributions is required to obtain marginal posterior distributions.
Gibbs sampling is a simplified version of the Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970), and applicable when it is possible to sample directly from all conditional distributions. The Metropolis–Hastings algorithm is usually employed in the case of logistic regression. Other Markov chain Monte Carlo techniques in use are adaptive rejection sampling (ARS), which is used in the WinBugs software, and adaptive rejection metropolis sampling (ARMS).
While marginal posterior distributions of parameters in logistic regression can be obtained using WinBugs, it cannot provide marginal likelihoods. In Section 5 the data augmentation technique is applied to model selection via Bayes factors. Based on a method proposed by Chib (1995), the marginal likelihood under a particular model can be calculated by running additional Gibbs cycles, one for each parameter in the model. In Section 6 the technique is illustrated by two applications.
The main purpose of this paper is to illustrate a relatively simple method of simulating values from the marginal posterior distributions of the parameters in a logit model using the Gibbs sampler. This model is also very suitable for calculating marginal likelihoods and thus Bayes factors when comparing competing models. As the full conditional distributions of the parameters are intractable, Bayesian analyses usually employed the Metropolis–Hastings algorithm to obtain posterior distributions. The Gibbs sampler is much easier to apply, and, from experience, converges quickly. The method is based on the data augmentation approach of Tanner and Wong (1987), which is applied by Albert and Chib (1993) to the probit model, where a normally distributed latent variable is introduced. In the case of the logit model a logistic variable is transformed so that we work with a uniformly distributed latent variable.
In the applications of Section 6 the marginal posterior of a parameter was obtained by the “Rao-Blackwell” method (see Gelfand and Smith, 1990, Section 2.6) of averaging over the conditional distributions, given the generated values of the other parameters. Because of the nature of the conditional distributions (uniform) it needs a relatively large number of cycles to obtain a smooth marginal posterior distribution. In the case of several covariates it may be more efficient to approximate the posteriors by smoothed histograms of the generated values.
In this paper we used exchangeable logistic priors for all regression parameters when calculating Bayes factors. This was done for convenience (still drawing from uniform distributions), and because the logistic distribution is close to the normal distribution in shape. The differences in posteriors are negligible if the prior variances are not too small. However, this can be generalised at a slight expense of computational effort. If the β's are assumed to have some exchangeable proper priors, then the full conditional distribution of a particular β will be the prior distribution, truncated at the same endpoints as the uniform distribution derived in this paper. The conditional distribution of the latent variable u remains unaffected.