Download ISI English Article No. 8047
Persian Translation of the Article Title

Meta-heuristic algorithms for parameter estimation of the semi-parametric linear regression model

English Title
Meta-heuristic algorithms for parameter estimation of semi-parametric linear regression models
Article Code: 8047 | Publication Year: 2006 | Length: 8 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Computational Statistics & Data Analysis, Volume 51, Issue 2, 15 November 2006, Pages 801–808

Persian Translation of Keywords
Generalized likelihood - Simulated annealing - Semi-parametric models
English Keywords
Generalized likelihood, Simulated annealing algorithm, Semi-parametric models
Article Preview

English Abstract

Consider the semi-parametric linear regression model Y = β′X + ε, where ε has an unknown distribution F_0. The semi-parametric MLE β̃ of β under this set-up is called the generalized semi-parametric MLE (GSMLE). Although GSML estimation of the linear regression model is statistically appealing, it had never been attempted, owing to the difficulty of obtaining the GSML estimates of β and F, until recent work on linear regression for complete data and for right-censored data by Yu and Wong [2003a. Asymptotic properties of the generalized semi-parametric MLE in linear regression. Statistica Sinica 13, 311–326; 2003b. Semi-parametric MLE in simple linear regression analysis with interval-censored data. Commun. Statist.—Simulation Comput. 32, 147–164; 2003c. The semi-parametric MLE in linear regression with right censored data. J. Statist. Comput. Simul. 73, 833–848]. However, after obtaining all candidates, their algorithm simply performs an exhaustive search to find the GSML estimators. In this paper, it is shown that Yu and Wong's algorithm leads to the so-called dimension disaster. Based on their idea, a simulated annealing algorithm for finding the semi-parametric MLE is proposed, along with techniques to reduce computation. Experimental results show that the new algorithm runs much faster for multiple linear regression models while keeping the nice features of Yu and Wong's original one.
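To see the scale of the "dimension disaster" concretely, here is a small illustration under an assumption made only for this sketch: that each candidate estimator is determined by choosing m distinct pairs of observations and solving the resulting linear system, so an exhaustive search must examine on the order of C(C(n,2), m) candidate systems.

```python
from math import comb

# Hypothetical candidate count for an exhaustive search, assuming each
# candidate beta is obtained from m of the C(n, 2) observation pairs.
for n, m in [(50, 2), (100, 3), (200, 5)]:
    n_pairs = comb(n, 2)  # C(n, 2) residual pairs
    print(f"n={n}, m={m}: {comb(n_pairs, m):.3e} candidate systems")
```

Even at n = 100 and m = 3 this count is already on the order of 10^10, which is why a stochastic search such as SA becomes attractive.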

English Introduction

Linear regression is one of the most powerful tools among statistical methods. Its applications cover almost every field, from economics, engineering, physics, and the life and biological sciences to the social sciences. In this paper, we consider the following multiple linear regression model:

(1.1)  Y = β′X + ε,

where β is the unknown regression coefficient of dimension m and ε has an unknown cdf F_0. Model (1.1) is a semi-parametric problem since no further assumption is made on (β, ε).

For the linear regression model (1.1), several estimators of β exist in the literature: the least squares estimator (LSE), the Theil–Sen estimator (Theil, 1950; Sen, 1968), various M-estimators (Huber, 1964; Ritov, 1990), the adaptive estimators (Bickel, 1982), and the L-estimators and R-estimators (Montgomery and Peck, 1992). While enjoying desirable statistical properties, the LSE is not efficient unless F_0 is a normal distribution. Besides the LSE, the Theil–Sen estimator, the L-estimators, and the R-estimators are based on order statistics, such as the median of regression slopes, offering computationally simple alternatives. The Theil–Sen estimator has good small-sample efficiency, particularly when the error term is heteroscedastic (Wilcox, 1998). While having desirable computational and statistical properties, R-estimators (and also L-estimators) cannot reject outliers properly. It is observed (Hampel et al., 1986) that R-estimators either "reject" always or never, in the sense of giving zero influence.

The maximum likelihood method provides an alternative approach to estimation. Recently, Yu and Wong (2003a, 2003b, 2003c) studied the GSML estimators under the semi-parametric setting. To begin with, define the generalized maximum likelihood function of observations T_1, …, T_n to be (Kiefer and Wolfowitz, 1956)

(1.2)  L = ∏_{i=1}^n f(T_i),   f(t) = F(t) − F(t−)   and   F ∈ 𝓕,

where T_i = Y_i − β′X_i and 𝓕 = {F : F is an increasing function, F(−∞) = 0 and F(∞) = 1}. The MLE of the cdf, also called the generalized MLE, is the function F that maximizes L over 𝓕. It is well known that the generalized MLE is the empirical distribution function. Moreover, the semi-parametric MLE of (F_0, β), denoted by (F̃_0, β̃), is a pair (F, β) that maximizes

(1.3)  L(F, β) = ∏_{i=1}^n f(T_i)

over all (F, β) ∈ 𝓕 × R^m, where f(t) = F(t) − F(t−). For fixed β, the likelihood function L in (1.3) is maximized by the empirical cdf

(1.4)  F̃_β(t) = (1/n) ∑_{i=1}^n 1(Y_i − β′X_i ≤ t),

where 1(·) is the indicator function. Then any β̃ that maximizes l(β) ≡ L(F̃_β, β) is a semi-parametric MLE of β, and the corresponding F̃_β̃ is the semi-parametric MLE of the cdf F_0. A non-iterative algorithm for finding all semi-parametric MLEs is given in Yu and Wong (2003a, 2003c), and some asymptotic properties of the semi-parametric MLE are discussed in Yu and Wong (2003b).

Note that the GSMLE is akin to a class of efficient M-estimators studied in Zhang and Li (1996). Their M-estimators are zero-crossing points of a function Φ(β), which has the form

(1.5)  Φ(β) = ∑_{i=1}^n φ(Y_i − Ȳ − β′(X_i − X̄)) (X_i − X̄),   φ(t) = (∂/∂t) ln f(t),

where f is a pdf. In the M-estimator approach, f is replaced by some estimator f̂.
Since (∂/∂t) F̃_β(t) = 0 a.e., β̃ is trivially an M-estimator with f̂ = F̃_β. Thus, the M-estimator approach provides no information about the GSMLE F̃_β. To find the GSML estimators, some exhaustive search algorithms and their variants are presented in Yu and Wong (2003a, 2003c); these are computationally intensive, especially for high-dimensional models. Since this is essentially an optimization problem featuring a non-smooth objective function, many local extrema, and a huge state space even for moderate dimensions, we adopt the simulated annealing (SA) algorithm to find the GSML estimators. For an introduction to SA, see Aarts and Korst (1990) and van Laarhoven and Aarts (1987); for a survey of applications of SA, see Kang et al. (1998) and van Laarhoven and Aarts (1987).

The rest of the paper is organized as follows. In Section 2, Yu and Wong's algorithm for the semi-parametric MLE problem is reviewed and its running cost is analyzed. An introduction to SA and a new SA-based algorithm for finding the semi-parametric MLE are given in Section 3. Applications and discussions of the new algorithm are contained in Section 4. Comparisons with other algorithms are made in Section 5, and final remarks and conclusions are drawn in Section 6.
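To make the objective concrete, note that the profile likelihood l(β) defined via (1.3) and (1.4) is straightforward to evaluate for any fixed β: the maximizing cdf puts mass n_j/n at each distinct residual value of multiplicity n_j, so log l(β) = Σ_j n_j log(n_j/n). Below is a minimal sketch in Python; the function name and the rounding tolerance used to detect tied residuals are illustrative, not from the paper.

```python
import numpy as np
from collections import Counter

def log_generalized_likelihood(beta, X, Y):
    """Profile log-likelihood log l(beta) = log L(F~_beta, beta).

    For fixed beta the likelihood (1.3) is maximized by the empirical
    cdf (1.4) of the residuals T_i = Y_i - beta'X_i; each distinct
    residual value with multiplicity n_j contributes f(T_i) = n_j / n.
    """
    residuals = Y - X @ beta             # T_i = Y_i - beta'X_i
    ties = Counter(residuals.round(12))  # tie counts n_j (rounding as float tolerance)
    n = len(residuals)
    return sum(c * np.log(c / n) for c in ties.values())
```

For a generic β all residuals are distinct, so log l(β) = −n log n; the likelihood rises above this floor only at values of β that force ties among the residuals. This is why the candidate estimators arise from solving systems of equations, and why the objective is non-smooth with many local extrema.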

English Conclusion

In this paper, a simulated annealing algorithm for finding the semi-parametric MLE of the multiple linear regression model is presented. Although Yu and Wong's original algorithm is non-iterative and thus obtains the semi-parametric MLE in finitely many steps, it is inefficient even for moderate n and m from a practical computational standpoint. While keeping the robustness of the semi-parametric MLE with respect to outliers, as well as the consistency of Yu and Wong's original algorithm, the new algorithm searches the whole state space with a Metropolis acceptance criterion to reach the global maximum of the generalized likelihood function. By establishing a mapping between indexes of observations and the candidate estimators, a system of equations is solved only when needed; this trick greatly reduces computation. Moreover, two other modern search algorithms, tabu search (TS) and the genetic algorithm (GA), are briefly discussed. Experimental results show that all of them are much more efficient than the exhaustive search, with SA being the most preferable.
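As a concrete illustration of the two ideas summarized above (Metropolis acceptance plus lazy solving of candidate systems), here is a hypothetical SA sketch. It assumes, purely for illustration, that a state is a set of m observation pairs whose residuals are forced to tie, with the candidate β solving β′(X_i − X_j) = Y_i − Y_j; the paper's exact candidate construction, neighborhood, and cooling schedule may differ. It reuses log_generalized_likelihood from the sketch in the introduction.

```python
import numpy as np

def sa_gsmle(X, Y, n_iter=20_000, t0=1.0, cooling=0.9995, seed=0):
    """Hypothetical SA search for the GSMLE over candidate index sets."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    cache = {}  # pair tuple -> (beta, log-likelihood); each system solved at most once

    def evaluate(pairs):
        key = tuple(sorted(pairs))
        if key not in cache:
            A = np.array([X[i] - X[j] for i, j in key])  # rows (X_i - X_j)'
            b = np.array([Y[i] - Y[j] for i, j in key])
            try:
                beta = np.linalg.solve(A, b)  # forces the m residual pairs to tie
                cache[key] = (beta, log_generalized_likelihood(beta, X, Y))
            except np.linalg.LinAlgError:     # singular system: no candidate here
                cache[key] = (None, -np.inf)
        return cache[key]

    def random_pair():
        i, j = sorted(rng.choice(n, size=2, replace=False))
        return (int(i), int(j))

    state = [random_pair() for _ in range(m)]
    beta_cur, f_cur = evaluate(state)
    beta_best, f_best, t = beta_cur, f_cur, t0
    for _ in range(n_iter):
        neighbor = list(state)
        neighbor[rng.integers(m)] = random_pair()   # perturb one pair
        beta_new, f_new = evaluate(neighbor)
        # Metropolis criterion: accept uphill moves always; accept a
        # downhill move with probability exp((f_new - f_cur) / t).
        if f_new >= f_cur or rng.random() < np.exp((f_new - f_cur) / t):
            state, beta_cur, f_cur = neighbor, beta_new, f_new
            if f_cur > f_best:
                beta_best, f_best = beta_cur, f_cur
        t *= cooling                                # geometric cooling
    return beta_best, f_best
```

The memoization in evaluate is one way to realize the "solved only when needed" trick: a linear system is solved the first time its index set is visited and never again, so the cost of repeated visits during the annealing walk is a dictionary lookup rather than an m × m solve.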