A study of partial F tests for multiple linear regression models
|Article code||Publication year||English article||Persian translation||Word count|
|24234||2007||16-page PDF||Available to order||8373 words|
Publisher : Elsevier - Science Direct
Journal : Computational Statistics & Data Analysis, Volume 51, Issue 12, 15 August 2007, Pages 6269–6284
Partial F tests play a central role in model selection in multiple linear regression models. This paper studies partial F tests from the viewpoint of simultaneous confidence bands. It first shows that there is a simultaneous confidence band associated naturally with a partial F test. This confidence band provides more information than the partial F test, and the partial F test can be regarded as a by-product of the confidence band. The confidence band viewpoint also gives insight into the major weakness of partial F tests: a partial F test implicitly requires that the linear regression model holds over the entire range of the covariates concerned. Improved tests are proposed, induced by simultaneous confidence bands over restricted regions of the covariates. Power comparisons between the partial F tests and the new tests have been carried out to assess when the new tests are more or less powerful than the partial F tests. Computer programs have been developed for easy implementation of these new confidence-band-based inferential methods. An illustrative example is provided.
Consider a standard multiple linear regression model given by

$$Y = X\beta + e, \tag{1.1}$$

where $Y=(y_1,\dots,y_n)^T$ is a vector of observations, $X$ is an $n\times(p+1)$ full column-rank design matrix with the first column given by $(1,\dots,1)^T$ and the $l$th ($2\le l\le p+1$) column given by $(x_{1,l-1},\dots,x_{n,l-1})^T$, $\beta=(\beta_0,\dots,\beta_p)^T$ is a vector of unknown coefficients, and $e=(e_1,\dots,e_n)^T$ is a vector of independent random errors with each $e_i\sim N(0,\sigma^2)$, where $\sigma^2$ is an unknown parameter. One important problem for model (1.1) is to assess whether some of the coefficients $\beta_i$ are zero, so that the corresponding covariates $x_i$ have no effect on the response variable $Y$. The model can then be simplified. To be specific, let $\beta=(\beta_1^T,\beta_2^T)^T$, where $\beta_1^T=(\beta_0,\dots,\beta_{p-k})$ and $\beta_2^T=(\beta_{p-k+1},\dots,\beta_p)$ with $1\le k\le p$. If $\beta_2$ is zero then the covariates $x_{p-k+1},\dots,x_p$ have no effect on the response variable $Y$ and model (1.1) reduces to

$$Y = X_1\beta_1 + e, \tag{1.2}$$

where $X_1$ is formed by the first $p-k+1$ columns of the matrix $X$. A commonly used statistical approach to assessing whether $\beta_2$ is zero is to test the hypotheses

$$H_0:\ \beta_2 = 0 \quad\text{against}\quad H_a:\ \beta_2 \ne 0 \tag{1.3}$$

by using the partial F test, which rejects $H_0$ if and only if

$$\frac{[\text{Regression SS of model (1.1)} - \text{Regression SS of model (1.2)}]/k}{\text{MS residual of model (1.1)}} > f^{\alpha}_{k,\nu},$$

where $f^{\alpha}_{k,\nu}$ is the upper $\alpha$ point of an F distribution with $k$ and $\nu=n-(p+1)$ degrees of freedom. This can be found in most textbooks on multiple linear regression models; see, for example, Kleinbaum et al. (1998).
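The rejection rule above can be sketched numerically. The following is a minimal illustration, not the authors' software: the function name `partial_f_test`, its arguments, and the synthetic data are all hypothetical. It assumes the design matrix has a leading column of ones and that the last `k` columns are the covariates under test. Because models (1.1) and (1.2) share the same total sum of squares, the difference in regression sums of squares equals the drop in residual sums of squares, which is what the code computes.

```python
import numpy as np
from scipy import stats


def partial_f_test(X, y, k, alpha=0.05):
    """Partial F test of H0: the last k coefficients of model (1.1) are zero.

    X is the n x (p+1) design matrix with a leading column of ones;
    the reduced model (1.2) uses its first p-k+1 columns.
    """
    n, q = X.shape              # q = p + 1
    nu = n - q                  # residual degrees of freedom of the full model
    X1 = X[:, : q - k]          # design matrix of the reduced model

    def residual_ss(M):
        coef, *_ = np.linalg.lstsq(M, y, rcond=None)
        r = y - M @ coef
        return float(r @ r)

    rss_full = residual_ss(X)
    rss_reduced = residual_ss(X1)
    # Regression SS(1.1) - Regression SS(1.2) = RSS(1.2) - RSS(1.1)
    f_stat = ((rss_reduced - rss_full) / k) / (rss_full / nu)
    crit = float(stats.f.ppf(1 - alpha, k, nu))   # upper alpha point of F(k, nu)
    p_value = float(stats.f.sf(f_stat, k, nu))
    return f_stat, crit, p_value, bool(f_stat > crit)


# Hypothetical synthetic data: three covariates, the last two tested jointly.
rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
y_demo = X_demo @ np.array([1.0, 2.0, 0.5, 0.0]) + rng.normal(size=50)
print(partial_f_test(X_demo, y_demo, k=2))
```

As the text goes on to stress, a non-rejection from this function is not evidence that the tested coefficients are zero.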
The inferences that can be drawn from this partial F test are as follows: if $H_0$ is rejected then $\beta_2$ is deemed to be non-zero, so at least some of the covariates $x_{p-k+1},\dots,x_p$ affect the response variable $Y$; if $H_0$ is not rejected then there is not enough statistical evidence to conclude that $\beta_2$ is not equal to zero. (Unfortunately, this latter case is often misinterpreted as showing that $\beta_2$ is equal to zero, so that model (1.2) is accepted as more appropriate than model (1.1).) Whether $H_0$ is rejected or not, no information on the magnitude of $\beta_2$ is provided directly by this hypothesis-testing approach. The first purpose of this paper is to show that there is a simultaneous confidence band associated naturally with the partial F test and that the partial F test can be interpreted more intuitively via this simultaneous confidence band. Hypotheses (1.3) can be tested by using this confidence band: $H_0$ is accepted or rejected according to whether or not the zero hyperplane lies completely inside the confidence band; by zero hyperplane we mean the graph in $\mathbb{R}^{k+1}$ of the zero-valued function on $\mathbb{R}^{k}$. The advantage of this confidence band approach over the partial F test is that it provides information on the magnitude of $\beta_{p-k+1}x_{p-k+1}+\cdots+\beta_p x_p$, whether or not $H_0$ is rejected. This is discussed in Section 3. However, this confidence band is over the entire range $(-\infty,\infty)$ of each of the covariates $x_{p-k+1},\dots,x_p$. As a linear regression model is often an acceptable approximation only over a restricted region of these covariates, the part of the confidence band outside this restricted region is useless for inference. It is therefore unnecessary to guarantee the $1-\alpha$ simultaneous coverage probability over the entire range of each of these covariates.
Furthermore, inferences deduced from the part of the confidence band outside the restricted region, such as the rejection of $H_0$, may not be valid since the assumed model may be wrong outside the restricted region. This calls for the construction of a $1-\alpha$ simultaneous confidence band only over this restricted region of the covariates. This confidence band is narrower and so allows more precise inferences over the restricted region than the confidence band associated with the partial F test. These results are illuminated in Sections 4 and 5. Section 6 compares the powers of the partial F test and the new test induced from the confidence band over a restricted region considered in Section 4. Some concluding remarks are contained in Section 7. But we first provide some preliminaries in Section 2.
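The link between the partial F test and a confidence band over the whole covariate space, described in the introduction above, can be made explicit under the usual normal-theory assumptions. The sketch below is a standard Scheffé-type argument, not a reproduction of the paper's Section 3. It assumes, since this excerpt does not define it, that $A$ is the $k\times k$ lower-right block of $(X^TX)^{-1}$, so that $\hat\beta_2\sim N(\beta_2,\sigma^2A)$, and that $\hat\sigma^2$ is the residual mean square of model (1.1).

```latex
% Assumptions (not stated in this excerpt): A is the k x k lower-right block
% of (X^T X)^{-1}; \hat\sigma^2 is the residual mean square of model (1.1),
% independent of \hat\beta_2, on \nu = n-(p+1) degrees of freedom.
\begin{align*}
\sup_{x_2 \neq 0} \frac{\{x_2^T(\hat\beta_2-\beta_2)\}^2}{x_2^T A x_2}
  &= (\hat\beta_2-\beta_2)^T A^{-1} (\hat\beta_2-\beta_2)
  \qquad \text{(Cauchy--Schwarz)}, \\[4pt]
P\Big\{ x_2^T\beta_2 \in x_2^T\hat\beta_2
     \pm \sqrt{k f^{\alpha}_{k,\nu}}\;\hat\sigma\,(x_2^T A x_2)^{1/2}
     \ \text{for all } x_2 \in \mathbb{R}^k \Big\}
  &= P\Big\{ \frac{(\hat\beta_2-\beta_2)^T A^{-1} (\hat\beta_2-\beta_2)}
                  {k\,\hat\sigma^2} \le f^{\alpha}_{k,\nu} \Big\}
   = 1-\alpha, \\[4pt]
0 \in x_2^T\hat\beta_2
     \pm \sqrt{k f^{\alpha}_{k,\nu}}\;\hat\sigma\,(x_2^T A x_2)^{1/2}
     \ \text{for all } x_2 \in \mathbb{R}^k
  &\iff \frac{\hat\beta_2^T A^{-1} \hat\beta_2}{k\,\hat\sigma^2}
        \le f^{\alpha}_{k,\nu}.
\end{align*}
```

The left-hand ratio in the last line is the partial F statistic written in terms of $\hat\beta_2$, so under these assumptions the zero hyperplane lies inside the band exactly when the partial F test does not reject $H_0$.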
Conclusion
It is pointed out in this paper that the usual partial F test has a naturally associated confidence band, which is more informative than the test itself. But this confidence band is over the entire range of all the covariates. As regression models often hold only over a restricted range of the covariates, the part of this confidence band outside this range is useless, and guaranteeing an overall $1-\alpha$ confidence level is wasteful of resources. A narrower and hence more efficient confidence band is constructed over a restricted range of the covariates. A by-product of this confidence band is a new test of hypotheses (1.3). This test is an improvement over the partial F test in the sense that the partial F test implicitly requires that model (1.1) holds over the entire range of the covariates $x_2$, while the new test only requires that model (1.1) holds over $x_2\in C$. Setting aside this weakness of the partial F test, the power comparison between these two tests indicates for which alternative hypotheses $\beta_2\ne 0$ the new test can be dramatically more or less powerful than the partial F test. It is our view that the prime factors in choosing the region $C$ should be that model (1.1) holds on $C$ and that $C$ is pertinent to the interests of inference about the model, rather than the power property of the new test. The confidence bands are more informative than the tests and so allow us to make informed decisions in model selection. In this paper the covariates are assumed to be continuous variables with no functional relationship among them. If some covariates are discrete variables, or there are functional relationships among the covariates, the confidence band approach advocated here can be adapted in a natural way via the region $C$ in (4.1) and the corresponding $V$ in (4.2). The partial F test approach simply throws away this kind of information completely.
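A band over a restricted region needs its own critical constant in place of the F-based one. The sketch below is a rough Monte Carlo illustration under loudly stated assumptions, not the authors' programs: it assumes the band has the form $x_2^T\hat\beta_2 \pm c\,\hat\sigma\,(x_2^TAx_2)^{1/2}$ for $x_2$ in a hyper-rectangle $C$, that $A$ is the covariance factor of $\hat\beta_2$ (so $\hat\beta_2\sim N(\beta_2,\sigma^2A)$), and it approximates the supremum over $C$ by a maximum over a finite grid, which slightly underestimates $c$. The function name and all parameters are hypothetical.

```python
import itertools

import numpy as np


def restricted_critical_constant(A, lower, upper, nu, alpha=0.05,
                                 n_sim=5000, grid_points=15, seed=0):
    """Monte Carlo estimate of c such that, approximately,
    P{ sup_{x2 in C} |x2'(b2_hat - b2)| / (sigma_hat * sqrt(x2' A x2)) <= c } = 1 - alpha,
    where C is the hyper-rectangle with the given lower and upper limits.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    k = A.shape[0]

    # Finite grid standing in for the region C (an approximation).
    axes = [np.linspace(lo, hi, grid_points) for lo, hi in zip(lower, upper)]
    grid = np.array(list(itertools.product(*axes)))            # shape (m, k)
    scale = np.sqrt(np.einsum("ij,jk,ik->i", grid, A, grid))   # sqrt(x2' A x2)
    keep = scale > 0                                           # drop x2 = 0 if present
    grid, scale = grid[keep], scale[keep]

    # (b2_hat - b2)/sigma ~ N(0, A); sigma_hat/sigma ~ sqrt(chi2_nu / nu), independent.
    chol = np.linalg.cholesky(A)
    z = rng.standard_normal((n_sim, k)) @ chol.T
    u = np.sqrt(rng.chisquare(nu, size=n_sim) / nu)

    sup_stat = np.abs(z @ grid.T / scale).max(axis=1) / u
    return float(np.quantile(sup_stat, 1 - alpha))


# Hypothetical example: k = 2, identity A, region C = [1, 2] x [1, 2], nu = 20.
print(restricted_critical_constant(np.eye(2), [1.0, 1.0], [2.0, 2.0], nu=20))
```

Because the pivotal ratio does not change when $x_2$ is rescaled, only the directions covered by $C$ matter in this sketch. A narrow region therefore gives a constant closer to a two-sided $t$ critical value than to $\sqrt{k f^{\alpha}_{k,\nu}}$.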
The covariate region $C$ considered in this paper is restricted to a hyper-rectangle, with the limits $a_i$ and $b_i$ being any real values. There may be situations in which an ellipsoidal covariate region is of interest, but it is not clear how to construct a simultaneous confidence band over a given ellipsoidal covariate region in general. Casella and Strawderman (1980) considered the construction of a simultaneous confidence band for a whole linear regression model over an ellipsoidal covariate region that has only a particular center and a particular shape. Further research in this direction is required. Finally, the confidence band plots and rejection region plots are mainly possible for $k\le 2$. For larger values of $k$ we may consider listing the values of

$$s(x_2)=\frac{|x_2^T\hat\beta_2|}{\hat\sigma\,(x_2^TAx_2)^{1/2}}$$

for all observed values of $x_2$. Large values of $s(x_2)$ may correspond to $x_2$ values that lead to a rejection of the restricted test or the partial F test. At the minimum these values may identify regions where this may occur. The list can also include values of $s(x_2)$ for any values of $x_2$, and in particular, one can choose a grid of values of interest.
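The listing of $s(x_2)$ suggested above can be sketched as follows. This is an illustration under assumptions rather than the paper's own code: the function name `s_values` and the synthetic data are hypothetical. It assumes that $A$ is the $k\times k$ lower-right block of $(X^TX)^{-1}$, that $\hat\sigma^2$ is the residual mean square of model (1.1), and that $x_2$ collects the last $k$ covariates.

```python
import numpy as np


def s_values(X, y, k, points=None):
    """Return s(x2) = |x2' b2_hat| / (sigma_hat * sqrt(x2' A x2)).

    By default s is evaluated at the observed x2 rows (last k columns of X);
    pass `points` (an m x k array) to evaluate on a grid of interest instead.
    Rows with x2 = 0 are not handled and would give a 0/0 result.
    """
    n, q = X.shape
    nu = n - q
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / nu)

    beta2_hat = beta_hat[-k:]
    A = xtx_inv[-k:, -k:]        # assumed definition of A (see the lead-in)
    x2 = X[:, -k:] if points is None else np.asarray(points, dtype=float)

    quad = np.einsum("ij,jk,ik->i", x2, A, x2)   # x2' A x2 for each row
    return np.abs(x2 @ beta2_hat) / (sigma_hat * np.sqrt(quad))


# Hypothetical synthetic data with k = 2 covariates under test.
rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(30), rng.normal(size=(30, 3))])
y_demo = X_demo @ np.array([1.0, 0.5, 2.0, -1.0]) + rng.normal(size=30)
s_obs = s_values(X_demo, y_demo, k=2)
for row, s in sorted(zip(X_demo[:, -2:].tolist(), s_obs), key=lambda t: -t[1])[:5]:
    print(row, round(float(s), 3))
```

Sorting the list in decreasing order of $s(x_2)$, as in the usage lines, puts the covariate values most likely to drive a rejection at the top. The comparison against a critical constant is left to the reader, since that constant depends on which test is being used.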