Efficient methods for estimating constrained parameters with applications to regularized (lasso) logistic regression
| Article code | Publication year | English article pages |
|---|---|---|
| 24760 | 2008 | 15 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal: Computational Statistics & Data Analysis, Volume 52, Issue 7, 15 March 2008, Pages 3528–3542
English Abstract
Fitting logistic regression models is challenging when their parameters are restricted. In this article, we first develop a quadratic lower-bound (QLB) algorithm for optimization with box or linear inequality constraints and derive the fastest QLB algorithm corresponding to the smallest global majorization matrix. The proposed QLB algorithm is particularly suited to problems to which the EM-type algorithms are not applicable (e.g., logistic, multinomial logistic, and Cox’s proportional hazards models), while it retains the same EM ascent property and thus assures monotonic convergence. Secondly, we generalize the QLB algorithm to penalized problems in which the penalty functions may not be totally differentiable. The proposed method thus provides an alternative algorithm for estimation in lasso logistic regression, where the convergence of the existing lasso algorithm is not generally ensured. Finally, by relaxing the ascent requirement, convergence speed can be further accelerated. We introduce a pseudo-Newton method that retains the simplicity of the QLB algorithm and the fast convergence of the Newton method. Theoretical justification and numerical examples show that the pseudo-Newton method is up to 71 (in terms of CPU time) or 107 (in terms of number of iterations) times faster than the fastest QLB algorithm and thus makes bootstrap variance estimation feasible. Simulations and comparisons are performed and three real examples (Down syndrome data, kyphosis data, and colon microarray data) are analyzed to illustrate the proposed methods.
English Introduction
Logistic regression is one of the most widely used statistical tools in many areas such as biomedicine, social sciences, economics and business (Collett, 1991; Agresti, 2002). If we know that parameters are restricted by some constraints, then it is reasonable to expect that we should be able to do better by incorporating such additional information than by ignoring it (Robertson et al., 1988; Silvapulle and Sen, 2005). Fitting logistic models becomes challenging when some model parameters are restricted to a convex region in Euclidean space (e.g., parameter estimation for lasso regression). When the maximum likelihood estimates (MLEs) of the parameters lie on the boundary of the region, or when the region can be represented by a set of equality/inequality restrictions, the constrained optimization problem may reduce to a penalized problem that is closely related to the posterior mode (or maximum a posteriori estimate) in a Bayesian framework. The situation can be further complicated if the penalty function is not totally differentiable. We consider the well-known lasso logistic regression, which motivates the present problem of interest.

Variable selection is one of the most pervasive problems in statistical applications. Classic methods for model/variable selection have not had much success in biomedical applications, especially in high-dimensional data analysis including gene or protein expression data analysis, partly due to their numerical instability. A novel method that mitigates some of this instability and has good predictive performance is lasso regression (Tibshirani, 1996). For logistic models, lasso regression finds

$$\hat{\theta}_{\mathrm{lasso}} = \arg\max\,\ell(\theta) \quad \text{subject to} \quad \sum_{j=1}^{q}|\theta_j| \le u, \tag{1.1}$$

where $\theta = (\theta_1,\ldots,\theta_q)^{\top}$ is a $q \times 1$ vector of unknown parameters, $\ell(\theta)$ is the log-likelihood function defined in (1.4), and $u$ is a tuning parameter. Although a quadratic approximation to $\ell(\theta)$ can lead to a simpler iteratively reweighted least squares procedure (Tibshirani, 1996), convergence of this procedure is not generally ensured. One possible extension of (1.1) is to formulate the so-called bridge regression (Frank and Friedman, 1993). In this case, one would like to find

$$\hat{\theta}_{\mathrm{bridge}} = \arg\max\,\ell(\theta) \quad \text{subject to} \quad \sum_{j=1}^{q}|\theta_j|^{\gamma} \le u, \tag{1.2}$$

where $\gamma > 0$. However, a solution to (1.2) was not given for arbitrary $u$ and $\gamma$ in Frank and Friedman (1993).

Motivated by the constrained optimization problems (1.1) and (1.2), we consider the following logistic model with constrained parameters,

$$y_i \stackrel{\mathrm{ind}}{\sim} \mathrm{Binomial}(n_i, p_i), \qquad \mathrm{logit}(p_i) = x_{(i)}^{\top}\theta, \qquad 1 \le i \le m, \tag{1.3}$$

where $y_i$ denotes the number of subjects with positive response in $n_i$ trials and $\{y_i\}_{i=1}^{m}$ are independent, $p_i$ is the probability that a subject gives a positive response, $x_{(i)}$ is the vector of covariates, and $\theta$ is a $q \times 1$ vector of unknown coefficients restricted by simple constraints $a \le \theta \le b$ for some $q \times 1$ constant vectors $a$ and $b$ (e.g., the lasso regression) or by linear inequalities of the form $c \le P_{k \times q}\,\theta \le d$ for some $k \times 1$ constant vectors $c$ and $d$ (see the examples in Sections 5.3 and 5.4). When $\mathrm{rank}(P) = q$, letting $\mu = P\theta$ yields the box constraints $c \le \mu \le d$ and $\theta = (P^{\top}P)^{-1}P^{\top}\mu$.
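To make the reparameterization above concrete, the following small numerical check (an illustrative sketch with made-up dimensions, not code from the paper) verifies that with a full column-rank $P$ the map $\mu = P\theta$, $\theta = (P^{\top}P)^{-1}P^{\top}\mu$ recovers $\theta$ exactly, so box constraints on the working parameter $\mu$ stand in for the linear inequalities on $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
k, q = 5, 3
P = rng.normal(size=(k, q))                       # k x q constraint matrix; rank(P) = q almost surely
theta = rng.normal(size=q)                        # original coefficient vector

mu = P @ theta                                    # working parameter, subject to box constraints c <= mu <= d
theta_back = np.linalg.solve(P.T @ P, P.T @ mu)   # theta = (P'P)^{-1} P' mu

print(np.allclose(theta, theta_back))             # True: theta is recovered exactly
```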
In other words, linear inequality constraints with a full column-rank matrix $P$ can be reparameterized into simple box constraints (Khuri, 1976; Tan et al., 2003; Tan et al., 2007). Hence, the log-likelihood function of $\theta$ in (1.3) is

$$\ell(\theta) = \sum_{i=1}^{m}\left\{ y_i\, x_{(i)}^{\top}\theta - n_i \log\!\left[1 + \exp\!\left(x_{(i)}^{\top}\theta\right)\right]\right\}, \tag{1.4}$$

and the goal is to find the constrained MLE $\hat{\theta}$ or the penalized MLE $\tilde{\theta}$ given by

$$\hat{\theta} = \arg\max_{\theta \in [a,b]} \ell(\theta), \tag{1.5}$$

or

$$\tilde{\theta} = \arg\max \{\ell(\theta) - \lambda J_1(\theta)\}, \tag{1.6}$$

where $J_1(\theta)$ is a penalty function and $\lambda > 0$ is a smoothing parameter controlling the trade-off between the accuracy of the model fit and smoothness. When $J_1(\theta)$ is not totally differentiable, the penalized problem (1.6) can sometimes be reformulated as

$$\tilde{\theta} = \arg\max_{\theta \in [a,b]} \{\ell(\theta) - \lambda J_2(\theta)\}, \tag{1.7}$$

where $J_2(\theta)$ is differentiable everywhere.

When the log-likelihood is well behaved (e.g., well approximated by a quadratic function), natural algorithms for finding the MLE are the Newton–Raphson (NR) and scoring methods, because they converge quadratically. For logistic models with a large number of variables (e.g., genes), both methods require tedious calculation of the Hessian or expected information matrix at each iteration. In addition, the log-likelihood does not necessarily increase at each iteration of the NR method, which may sometimes diverge (Cox and Oakes, 1984, p. 172). Böhning and Lindsay (1988, pp. 645–646) provided an example of a concave function for which the NR method does not converge. Although EM-type algorithms (Dempster et al., 1977; Meng and Rubin, 1993) possess the ascent property that ensures monotone convergence, they are not applicable to logistic regression owing to the absence of a missing-data structure. Therefore, for problems in which a missing-data structure does not exist or is not readily available, the quadratic lower-bound (QLB) algorithm (Böhning and Lindsay, 1988) is often an alternative. However, when the model has constrained parameters, the QLB algorithm is not applicable. In addition, like EM-type algorithms, the QLB algorithm is usually criticized for its slow convergence, especially for complicated problems or in high-dimensional data analysis.

In this paper, we first develop a QLB algorithm that is generally applicable to optimization with box or linear inequality constraints. The fastest QLB algorithm, corresponding to the smallest global majorization matrix, is also derived. In brief, the QLB algorithm consists of an optimization transfer (T-step) and a constrained maximization (M-step). The T-step transfers the optimization from the intractable log-likelihood function to a quadratic surrogate function $Q(\theta\,|\,\theta')$ such that both functions share the same maximizer. The M-step can often be accomplished via built-in S-PLUS functions (e.g., nnls.fit or nlregb), which is one of the advantages of the algorithm. The QLB algorithm is especially suited to those problems (e.g., logistic, multinomial logistic, and Cox's models) in which the EM-type algorithms are not applicable, while it retains the same EM ascent property and thus assures monotonic convergence. Secondly, we generalize the QLB algorithm to penalized problems in which the penalty functions may not be totally differentiable.
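The following sketch illustrates one T-step/M-step pass of a QLB-style iteration for the box-constrained binomial logistic model (1.3)–(1.5), assuming the classical global bound $B = \tfrac{1}{4}\sum_i n_i x_{(i)} x_{(i)}^{\top}$ on the negative Hessian of $\ell$ (Böhning and Lindsay, 1988). The constrained M-step is delegated here to a generic box-constrained optimizer rather than the S-PLUS routines mentioned above, so this is an illustration of the idea, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def loglik_grad(theta, X, y, n):
    """Binomial logistic log-likelihood (1.4) and its gradient (score)."""
    eta = X @ theta
    p = 1.0 / (1.0 + np.exp(-eta))
    ll = np.sum(y * eta - n * np.log1p(np.exp(eta)))
    return ll, X.T @ (y - n * p)

def qlb_step(theta, X, y, n, a, b):
    """One optimization-transfer (T) and constrained-maximization (M) step over the box [a, b]."""
    B = 0.25 * (X.T * n) @ X                 # global majorization matrix: B >= -Hessian(l) for every theta
    _, g = loglik_grad(theta, X, y, n)
    # Quadratic surrogate Q(t | theta) = l(theta) + (t - theta)'g - 0.5 (t - theta)' B (t - theta)
    neg_Q = lambda t: -(g @ (t - theta) - 0.5 * (t - theta) @ B @ (t - theta))
    res = minimize(neg_Q, x0=theta, method="L-BFGS-B", bounds=list(zip(a, b)))
    return res.x                             # maximizer of the surrogate over the box
```

Because the surrogate lies below $\ell$ everywhere and touches it at the current iterate, each such step cannot decrease $\ell$, which is exactly the EM-type ascent property referred to above.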
The proposed method therefore provides an alternative algorithm for estimation in lasso logistic regression, where the convergence of the existing lasso algorithm is not generally ensured. Finally, to accelerate the convergence rate, we introduce a pseudo-Newton algorithm that does not necessarily have the ascent property but retains the simplicity of the QLB algorithm and the fast convergence of the Newton method. We show both theoretically and numerically that the pseudo-Newton algorithm is dramatically faster than the fastest QLB algorithm (up to 71 times in CPU time or 107 times in number of iterations) and thus makes bootstrap variance estimation feasible. Another merit of the pseudo-Newton method is that the Cholesky decomposition of the surrogate matrix is calculated only once, whereas the Newton method must update and refactorize the corresponding matrix at each iteration (a rough sketch of this factorize-once idea is given at the end of this section).

The rest of this article is organized as follows. Section 2 develops the QLB algorithm for optimization with box or linear inequality constraints and derives the fastest QLB algorithm. Section 3 generalizes the QLB algorithm to penalized problems and investigates some convergence properties. Section 4 introduces a pseudo-Newton method. We apply the fastest QLB and pseudo-Newton algorithms to estimate constrained parameters and to select variables in logistic regression in Sections 5 and 6, respectively. Simulations and comparisons are performed and three published data sets are analyzed to illustrate the proposed methods. We conclude with a discussion in Section 7.
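As a rough illustration of the factorize-once point made for the pseudo-Newton method above (a sketch under stated assumptions; the paper's actual pseudo-Newton update and its acceleration scheme may differ), the unconstrained updates below contrast reusing a single Cholesky factorization of the surrogate matrix $B$ with Newton–Raphson, which must rebuild and refactorize the observed information at every iteration.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def fit_with_fixed_cholesky(theta, X, y, n, n_iter=100):
    """Surrogate-matrix updates: B is factorized once and reused at every iteration."""
    B = 0.25 * (X.T * n) @ X                 # global surrogate matrix, independent of theta
    cB = cho_factor(B)                       # Cholesky decomposition computed only once
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        g = X.T @ (y - n * p)                # score vector at the current iterate
        theta = theta + cho_solve(cB, g)     # solve B d = g with the cached factorization
    return theta

def fit_newton_raphson(theta, X, y, n, n_iter=100):
    """Newton-Raphson: the observed information is rebuilt and refactorized each pass."""
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        g = X.T @ (y - n * p)
        W = n * p * (1.0 - p)                # binomial weights
        H = (X.T * W) @ X                    # observed information (negative Hessian of l)
        theta = theta + cho_solve(cho_factor(H), g)
    return theta
```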