Sensitivity analysis of constrained linear $L_1$ regression: perturbations to constraints and deletion of observations
|Article code|Year of publication|English article|
|25893|2006|19 pages (PDF)|
Publisher: Elsevier - ScienceDirect
Journal: Computational Statistics & Data Analysis, Volume 51, Issue 2, 15 November 2006, Pages 1213–1231
This paper extends the direct sensitivity analysis of Shi and Lukas [2005, Sensitivity analysis of constrained linear $L_1$ regression: perturbations to response and predictor variables. Comput. Statist. Data Anal. 48, 779–802] of linear $L_1$ (least absolute deviations) regression with linear equality and inequality constraints on the parameters. Using the same active set framework of the reduced gradient algorithm (RGA), we investigate the effect on the $L_1$ regression estimate of small perturbations to the constraints (constants and coefficients). It is shown that the constrained estimate is stable, but not uniformly stable, and in certain cases it is unchanged. We also consider the effect of addition and deletion of observations and determine conditions under which the estimate is unchanged. The results demonstrate the robustness of $L_1$ regression and provide useful diagnostic information about the influence of observations. Results characterizing the (possibly non-unique) solution set are also given. The sensitivity results are illustrated with numerical simulations on the problem of derivative estimation under a concavity constraint.
Consider a linear model $y_i = x_i^T\beta + \varepsilon_i$, $i = 1,\dots,n$, where the $p \times 1$ parameter vector $\beta$ is known to satisfy additional linear equality constraints $x_i^T\beta - y_i = 0$ and/or inequality constraints $x_i^T\beta - y_i \le 0$. Important applications occur in parametric (and non-parametric) curve and surface fitting, and in the estimation of solutions of ill-posed and inverse problems from noisy data (see Wahba, 1990). In many such problems there is some extra information about the unknown curve or solution that can be used to constrain the parameters. In particular, if the solution (or some linear functional, e.g., an integral) is known to have a certain fixed value at some point, we obtain an equality constraint on the parameters. If the solution is known to be positive, monotone, concave or convex, this leads to certain linear inequality constraints on the parameters (see Wahba, 1982; O’Leary and Rust, 1986). Constrained regression problems also arise in certain biometric and econometric models (see Judge et al., 1985). Using the method of least absolute deviations, we define the constrained $L_1$ regression estimate of $\beta$ to be the solution of the problem (denoted $LL_1$):
$$\text{minimize} \quad S(\beta) = \sum_{i=1}^{n} \left| x_i^T\beta - y_i \right|, \quad \beta \in \mathbb{R}^p \tag{1.1A}$$
$$\text{subject to} \quad x_i^T\beta - y_i = 0, \quad i \in E = \{n+1,\dots,n+n_E\}, \tag{1.1B}$$
$$x_i^T\beta - y_i \le 0, \quad i \in I = \{n+n_E+1,\dots,n+n_E+n_I\}, \tag{1.1C}$$
where $E$ and $I$ refer to equalities and inequalities, respectively, and we assume that $n_E < p < n + n_E + n_I$. Shi and Lukas (2005) investigated the sensitivity of the solution to (1.1) with respect to perturbations in the responses $y_i$ and row vectors $x_i^T$, $1 \le i \le n$, of the design matrix $X$. The analysis was done using the active set framework of the reduced gradient algorithm (RGA) developed in Shi and Lukas (2002).
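For readers who want to experiment with problem (1.1), the following is a minimal sketch that solves it through the standard linear programming reformulation (auxiliary variables $u_i \ge |x_i^T\beta - y_i|$), not through the RGA used in the paper. The function name `constrained_l1_fit`, its parameter names, the toy data, and the two constraints (a fixed intercept and an upper bound on the slope) are all illustrative assumptions; the constraints are passed in the generic form `A_eq @ beta == b_eq` and `A_ineq @ beta <= b_ineq`, which corresponds to (1.1B) and (1.1C) with the constraint rows as $x_i^T$ and the right-hand sides as $y_i$.

```python
import numpy as np
from scipy.optimize import linprog


def constrained_l1_fit(X, y, A_eq=None, b_eq=None, A_ineq=None, b_ineq=None):
    """Least absolute deviations fit with linear constraints, solved as an LP.

    Decision vector is [beta (p, free), u (n, >= 0)], minimizing sum(u)
    subject to -u <= X @ beta - y <= u and the user-supplied constraints.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), np.ones(n)])

    # |X beta - y| <= u written as two blocks of "<=" rows.
    eye = np.eye(n)
    A_ub = np.vstack([np.hstack([X, -eye]), np.hstack([-X, -eye])])
    b_ub = np.concatenate([y, -y])

    if A_ineq is not None:
        A_ineq = np.atleast_2d(np.asarray(A_ineq, dtype=float))
        pad = np.zeros((A_ineq.shape[0], n))  # constraints do not involve u
        A_ub = np.vstack([A_ub, np.hstack([A_ineq, pad])])
        b_ub = np.concatenate([b_ub, np.atleast_1d(np.asarray(b_ineq, dtype=float))])

    A_e, b_e = None, None
    if A_eq is not None:
        A_eq = np.atleast_2d(np.asarray(A_eq, dtype=float))
        A_e = np.hstack([A_eq, np.zeros((A_eq.shape[0], n))])
        b_e = np.atleast_1d(np.asarray(b_eq, dtype=float))

    bounds = [(None, None)] * p + [(0, None)] * n
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_e, b_eq=b_e,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p], res.fun


# Toy data (made up): a straight line with one gross outlier at t = 4.
t = np.arange(5.0)
X = np.column_stack([np.ones_like(t), t])
y = np.array([1.0, 3.1, 4.9, 7.2, 30.0])

# Illustrative constraints: intercept fixed at 1, slope at most 1.5.
beta, obj = constrained_l1_fit(
    X, y,
    A_eq=[[1.0, 0.0]], b_eq=[1.0],
    A_ineq=[[0.0, 1.0]], b_ineq=[1.5],
)
print(beta, obj)
```

The LP route is convenient for small experiments because it needs only a general-purpose solver; it does not expose the active set information that the sensitivity results in the paper are built on.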
In this paper we use the same framework and extend the sensitivity analysis to cover perturbations to the constraints (both the constants and coefficients) and the addition and deletion of observations. It is well known that unconstrained $L_1$ regression is robust; the $L_1$ estimate is resistant to outliers in the response $y$ and may not be changed at all by some large perturbations in $y$. The same is true for constrained $L_1$ regression, and the estimate is stable, but not uniformly stable, with respect to small perturbations in $y$ and $x_i$, $i = 1,\dots,n$ (Shi and Lukas, 2005; Ellis, 1998). Here we show that similar results apply for perturbations to the constraints in (1.1); the constrained estimate is stable for small perturbations and it may not even change with some large perturbations in the constraints. The stability is not uniform but depends on the degree of ill-conditioning and, for the constraint coefficient vectors, also on how close the estimate is to being non-unique. For unconstrained $L_1$ regression, the effect of deletion of observations has been investigated by Narula and Wellington (1985), Dupačová (1992) and Castillo et al. (2001). These works are based on the LP formulation of the $L_1$ regression problem. Here we use the direct active set approach on the general $LL_1$ problem (1.1) to find conditions under which an existing solution remains optimal after the deletion of an observation. We also show how the RGA can be used to find a new solution to the problem with a deleted observation, starting from the original solution. The $L_1$ version of the Cook distance can then be computed efficiently to determine the influence of each observation on the $L_1$ regression estimate. This provides important diagnostic information about the model. A result about the addition of an observation shows that constrained $L_1$ regression is robust with respect to the new response.
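The claim that the estimate may not change at all under some large perturbations in the response can be checked numerically on a small unconstrained example. The sketch below is an illustration under stated assumptions, not the paper's method: it uses the LP formulation with `scipy.optimize.linprog` rather than the RGA, the helper name `l1_fit` is hypothetical, and the five data points are made up so that the fitted line passes exactly through the observations at t = 1 and t = 3. One response with a positive residual is then moved 100 units further from the fit, keeping the sign of its residual.

```python
import numpy as np
from scipy.optimize import linprog


def l1_fit(X, y):
    """Unconstrained L1 fit via LP: minimize sum(u) with -u <= X b - y <= u."""
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), np.ones(n)])
    A_ub = np.block([[X, -np.eye(n)], [-X, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)] * n  # beta free, u nonnegative
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Made-up data scattered around the line y = 1 + 2 t.
t = np.arange(5.0)
X = np.column_stack([np.ones_like(t), t])
y = np.array([1.5, 3.0, 4.6, 7.0, 9.3])

beta_before = l1_fit(X, y)

# Push the last response far away on the same side of the fitted line.
y_perturbed = y.copy()
y_perturbed[4] += 100.0
beta_after = l1_fit(X, y_perturbed)

print("before:", beta_before)
print("after: ", beta_after)
```

Both fits should return approximately (1, 2): the optimality condition for this data depends only on the signs of the nonzero residuals and on the design matrix, so enlarging a residual without changing its sign leaves the estimate where it was. A least squares fit of the same perturbed data would move substantially.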
A brief description of the RGA framework and optimality conditions for (1.1) is given in Section 2. It is known that the solution of an $L_1$ regression problem can be non-unique (Bloomfield and Steiger, 1983). In Section 3 we derive results characterizing the solution set of (1.1) and show that it can be computed efficiently using the RGA. In Sections 4 and 5, we investigate the effect on the solution to (1.1) of a perturbation to the constants and coefficients of the constraints, respectively. Results relating to the deletion and addition of an observation are derived in Sections 6 and 7, respectively. In Section 8 we consider the problem of estimating the derivative $f(t) = g'(t)$ from data $y_i = g(t_i) + \varepsilon_i$, $i = 1,\dots,n$, under the concavity constraint $f''(t) \le 0$. The estimation of derivatives arises in many applications, in particular in the analysis of growth curves (Gasser et al., 1984; Eubank, 1988) and pharmacokinetic response curves (Song et al., 1995). Numerical differentiation is an ill-posed problem, meaning that the solution is sensitive to errors in the data (see Anderssen and Bloomfield, 1974), and such problems were a major motivation for this work. We use a truncated trigonometric series for the estimate $f_p(t)$ of the derivative and find the estimate by solving a constrained linear $L_1$ regression problem. Results of numerical simulations illustrate the sensitivity results from Sections 4, 5 and 6, and also a result from Shi and Lukas (2005) on perturbations to the responses. We also consider the effect of increasing the number of points at which the constraint $f_p''(t) \le 0$ is evaluated.
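To see why a concavity constraint fits the form of problem (1.1), it may help to write it out for a generic series estimate. The basis functions $\varphi_j$ and the evaluation points $\tau_k$ below are placeholders introduced for illustration; the specific trigonometric basis used in the paper is not given in this excerpt.

```latex
% Assumed form of the estimate: a finite series in (hypothetical) basis functions.
f_p(t) = \sum_{j=1}^{p} \beta_j \, \varphi_j(t)

% Concavity imposed at finitely many points \tau_1, \dots, \tau_{n_I}:
f_p''(\tau_k) = \sum_{j=1}^{p} \beta_j \, \varphi_j''(\tau_k) \le 0,
\qquad k = 1, \dots, n_I .

% Each condition is linear in \beta, i.e. an inequality of the type (1.1C)
% with coefficient vector
% ( \varphi_1''(\tau_k), \dots, \varphi_p''(\tau_k) )^T
% and constant 0.
```

Because the constraint is enforced only at the points $\tau_k$, the number of inequality constraints $n_I$ grows with the number of evaluation points.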