Linear regression model with slash-elliptical errors
Article code | Publication year | Length of English article |
---|---|---|
24651 | 2013 | 12 pages (PDF) |
Publisher: Elsevier - ScienceDirect
Journal : Computational Statistics & Data Analysis, Volume 64, August 2013, Pages 153–164
English abstract
We propose a linear regression model with slash-elliptical errors. The slash-elliptical distribution with parameter $q$ is defined as the ratio of two independent random variables $Z$ and $U^{1/q}$, where $Z$ has an elliptical distribution and $U$ has a uniform distribution on $(0,1)$. The main feature of the slash-elliptical distribution is its greater flexibility in the degree of kurtosis compared with the elliptical distributions. Other advantages of this distribution are symmetry, heavy tails, and the inclusion of the elliptical family as a limiting case when $q \to \infty$. We develop the methodology of estimation, hypothesis testing, generalized leverage and residuals for the proposed model. In the analysis of local influence, we also develop diagnostic measures based on the likelihood displacement under several perturbation schemes. Finally, we present a real example where the slash-Student-$t$ model is more stable than the other models considered.
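The ratio construction in the abstract can be illustrated with a short simulation. The following is a minimal sketch, not the authors' code: it assumes the special case where $Z$ is standard normal (the slash-normal member of the family), and the function names `sample_slash_normal` and `tail_fraction`, the seed, the sample size, and the chosen values of $q$ are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_slash_normal(q, size, rng):
    """Draw S = Z / U**(1/q) with Z ~ N(0, 1) and U ~ Uniform(0, 1), independent."""
    z = rng.standard_normal(size)
    # rng.random() lies in [0, 1); 1 - rng.random() lies in (0, 1], so we never divide by zero.
    u = 1.0 - rng.random(size)
    return z / u ** (1.0 / q)


def tail_fraction(x, threshold=3.0):
    """Share of draws whose absolute value exceeds the threshold."""
    return float(np.mean(np.abs(x) > threshold))


rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 200_000

heavy = sample_slash_normal(q=2.0, size=n, rng=rng)          # small q: heavy tails
near_normal = sample_slash_normal(q=200.0, size=n, rng=rng)  # large q: close to N(0, 1)
normal = rng.standard_normal(n)                              # reference sample

print("P(|S| > 3), q = 2   :", tail_fraction(heavy))
print("P(|S| > 3), q = 200 :", tail_fraction(near_normal))
print("P(|Z| > 3), normal  :", tail_fraction(normal))
```

With a small $q$ the simulated tail mass beyond 3 should be far larger than for the normal reference, while a large $q$ should give a tail mass close to the normal one, in line with the limiting case $q \to \infty$ stated above.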
English introduction
In recent decades, related work has been developed based on symmetrical distributions. A review of some areas where symmetric distributions are applied is given in Chmielewski (1981). In many statistical modeling situations there is a need for models that are less sensitive to outlying observations. Galea et al. (2003) developed diagnostic methods for linear symmetrical models and Cysneiros and Paula (2005) developed restricted methods in symmetrical linear models. Paula et al. (2009) introduced the class of linear models with first-order autoregressive elliptical errors and derived diagnostic methods. Rogers and Tukey (1972) presented the slash distribution as the probability distribution of a standard normal variable divided by an independent standard uniform variable. In the general case, we say that a random variable $S$ has a standard slash distribution with parameter $q > 0$ if it can be expressed as the ratio of two independent random variables $Z$ and $U^{1/q}$, where $Z$ has the standard normal distribution $N(0,1)$ and $U$ has a uniform distribution on $(0,1)$. The slash distribution is symmetric, has heavy tails, and converges to the normal distribution when $q \to \infty$. Rogers and Tukey (1972) and Mosteller and Tukey (1977) discussed the slash distribution and its properties. Kafadar (1982) proposed maximum likelihood estimators for the location-scale parameters of the slash distribution obtained through the linear transformation $Y = \mu + \phi S$. In the work of Andrews et al. (1972), Gross (1973) and Morgenthaler and Tukey (1991), the slash distribution is mainly used in simulation studies whose scenarios involve extreme situations. Wang and Genton (2006), defining a skewed version of the slash distribution, assumed that the random variable $Z$ has a multivariate skew normal distribution. Arslan (2008) introduced a new class of multivariate skew-slash distributions using the normal variance-mean mixture approach.
Later, Arslan and Genç (2009) generalized the family of distributions proposed by Wang and Genton (2006), constructing a family of multivariate distributions as a scale mixture of the multivariate symmetric Kotz-type distribution and the uniform distribution. Lachos et al. (2008) derived diagnostic methods based on local influence for scale-mixture models. Ferreira et al. (2011) developed an EM algorithm for estimating the parameters of scale-mixture models. Gómez et al. (2007) considered the standard slash distribution in the form $Z = U^{1/q} S \sim N(0,1)$ and generalized the slash distribution by replacing the distribution of $Z$ with the family of univariate and multivariate elliptical distributions. This new family of distributions proposed by Gómez et al. (2007), known as the slash-elliptical distribution, is symmetric and has greater flexibility in the degree of kurtosis than the elliptical distribution. These properties were also observed by Genç (2007) for the slash-power-exponential distribution. Another advantage of this family of distributions is that it contains the elliptical family as a limiting case. Given this new family of distributions, we develop the linear regression model with error distribution in the univariate slash-elliptical family, which will be referred to simply as the slash-elliptical distribution. The aim of this paper is to derive a methodology for estimation, hypothesis testing, generalized leverage, residuals and diagnostic analysis based on the local influence approach. Section 2 introduces the slash-elliptical regression model and presents the estimation procedures. Simulation studies of the proposed residual are presented. In addition, diagnostic measures based on the local influence approach are developed in Section 3. Section 4 is devoted to the analysis of a real data set using the slash-elliptical regression model and, finally, some conclusions are presented in the final section.
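The claim about flexibility in kurtosis can be made concrete in the simplest case. The following worked calculation is a sketch that assumes the slash-normal member of the family, $S = Z / U^{1/q}$ with $Z \sim N(0,1)$ and $U \sim U(0,1)$ independent; it is not taken from the paper, and the general elliptical case would replace the normal moments $E[Z^2]$ and $E[Z^4]$ with those of the chosen elliptical law.

```latex
% Moments of U^{-1/q}: for q > k,
\[
  E\!\left[U^{-k/q}\right] = \int_0^1 u^{-k/q}\,du = \frac{q}{q-k}.
\]
% By independence of Z and U, with E[Z^2] = 1 and E[Z^4] = 3:
\[
  E[S^2] = E[Z^2]\,E\!\left[U^{-2/q}\right] = \frac{q}{q-2} \quad (q > 2),
  \qquad
  E[S^4] = E[Z^4]\,E\!\left[U^{-4/q}\right] = \frac{3q}{q-4} \quad (q > 4).
\]
% Kurtosis of the slash-normal distribution:
\[
  \kappa(q) = \frac{E[S^4]}{\left(E[S^2]\right)^2}
            = \frac{3\,(q-2)^2}{q\,(q-4)} \quad (q > 4),
  \qquad
  \lim_{q \to \infty} \kappa(q) = 3 .
\]
% Examples: \kappa(5) = 27/5 = 5.4 and \kappa(6) = 4, both above the normal value 3.
```

Under these assumptions the kurtosis decreases toward the normal value of 3 as $q$ grows, and the fourth moment does not exist for $q \le 4$, which is one way to see how $q$ controls tail weight.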
English conclusion
In this study, we proposed the linear regression model with slash-elliptical errors and parameter $q$ known or fixed. We developed an inferential methodology based on estimated asymptotic standard errors obtained from the observed information matrix. When the parameter $q$ is not known, we suggest using the AIC or BIC criterion to select the slash-elliptical model with the $q$ corresponding to the best fit. We presented the asymptotic tests (likelihood ratio, Wald and score) that can be used to assess the inclusion or exclusion of explanatory variables. We proposed a standardized residual and, based on simulation studies, concluded that the proposed residual has mean and standard error close to zero and one, respectively, and is approximately symmetrical. However, the kurtosis is not three as in the normal distribution and depends on extra parameters, which leads us to believe that the distribution of this residual is not normal. In the diagnostic analysis, we verified that the generalized leverage matrix $GL(\hat{\theta})$ is idempotent, and therefore we can use the criterion $GL_{ii} \geq 2p/n$ to identify leverage points. In the analysis of local influence, we considered the following measures: likelihood displacement under the scale and case-weight perturbation schemes; and a distance measure based on the Pearson residual under perturbation of the response variable and of the regressors. We applied the proposed methodology to the water salinity data set, comparing the elliptical models (normal and Student-$t$ ($\nu=3$)) and the slash-elliptical models (slash-normal ($q=3$), slash-Student-$t$ ($q=3$, $\nu=2$), slash-CN ($q=5$, $\lambda=0.5$, $\sigma=4$) and slash-slash ($q=3$, $\gamma=3$)). We observed that the standard errors of the slash-Student-$t$ ($q=3$, $\nu=2$) model are smaller than in the other models considered.
Observations #16 and #5 are influential points in the slash-elliptical models, as well as in the elliptical models. Although observations #16 and #5 are also highlighted in the slash-Student-$t$ ($q=3$, $\nu=2$) model, this model presents the smallest variation in the parameter estimates when these points are removed. Thus, we can conclude that the slash-Student-$t$ ($q=3$, $\nu=2$) model is more stable than the other models when influential points are dropped.
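The leverage criterion $GL_{ii} \geq 2p/n$ mentioned in the conclusion can be sketched in code. The example below is only an illustration under a loud assumption: it uses the ordinary least-squares hat matrix $H = X(X^\top X)^{-1}X^\top$ as a stand-in, since the paper's generalized leverage matrix $GL(\hat{\theta})$ is not given in this excerpt. The data are synthetic, and the names `hat_matrix` and `high_leverage_points` are hypothetical.

```python
import numpy as np


def hat_matrix(X):
    """OLS hat matrix H = X (X'X)^{-1} X', used here as a stand-in for GL(theta_hat)."""
    # For a full-column-rank X, X @ pinv(X) equals X (X'X)^{-1} X'.
    return X @ np.linalg.pinv(X)


def high_leverage_points(X):
    """Return indices i with H_ii >= 2p/n, together with all diagonal leverages."""
    n, p = X.shape
    leverage = np.diag(hat_matrix(X))
    return np.flatnonzero(leverage >= 2.0 * p / n), leverage


# Synthetic design matrix (assumption): intercept plus one regressor,
# with one deliberately remote design point at index 0.
rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
x[0] = 8.0
X = np.column_stack([np.ones(n), x])

H = hat_matrix(X)
flagged, leverage = high_leverage_points(X)

print("idempotent:", np.allclose(H @ H, H))
print("trace (equals p for a projection):", np.trace(H))
print("flagged indices:", flagged)
```

The cutoff $2p/n$ is twice the average leverage: an idempotent projection matrix has trace equal to its rank $p$, so its diagonal entries average $p/n$.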