Let us consider the fixed-design regression model $Y_t=m(x_t)+\varepsilon_t$, and assume that the random errors, $\{\varepsilon_t\}$, follow an ARMA-type dependence structure. The purpose of this paper is to study the application of the bootstrap to test whether the unknown regression function, $m$, follows a general linear model of the type $H_0: m\in\mathcal{M}=\{A^t(\cdot)\theta:\ \theta\in\Theta\subseteq\mathbb{R}^p\}$,
with $A$ being a known function from the compact design set $C$ into $\mathbb{R}^p$. In a previous paper, González-Manteiga and Vilar-Fernández (1995) proposed a test statistic, $D=\Psi(\hat m_h, A^t\hat\theta)$, based on the Cramér–von Mises-type functional distance $\Psi$, where $\hat m_h$ is a Gasser–Müller-type non-parametric estimator of $m$, and $A^t\hat\theta$ is the member of the family $\mathcal{M}$ that is closest to $\hat m_h$. In this work, two bootstrap algorithms that take the dependence structure of the errors into account are considered. A broad simulation study and an applied example show the good behavior of the bootstrap tests.
Let us consider the regression model
$$Y_t=m(x_t)+\varepsilon_t,\qquad t=1,\dots,n, \tag{1}$$
where $x_t\in C$, with $C$ a compact set in $\mathbb{R}$, $m$ is the unknown regression function, and $\{\varepsilon_t\}_{t=1}^n$ is a sequence of unobserved zero-mean random variables.
In the last few years, several tests have been developed for testing
$$H_0:\ m\in\mathcal{M}_\Theta=\{m_\theta:\ \theta\in\Theta\} \tag{2}$$
versus a general alternative hypothesis of the type
$H_1$: “$m$ is a function with a certain degree of smoothness”.
Given an initial sample $\{(x_t,Y_t)\}_{t=1}^n$, almost all of these tests are based on a distance between a non-parametric pilot estimator of $m$, $\hat m_h$, and a parametric estimator of $m$ under $H_0$, denoted by $m_{\hat\theta}$. If this discrepancy is statistically significant, $H_0$ is rejected; otherwise, it is not rejected. Among the interesting recent papers that address this problem we can cite those by Firth et al. (1991), Kozek (1991), Eubank and Hart (1993), Eubank et al. (1993), Härdle and Mammen (1993) and Stute and González-Manteiga (1996), where different non-parametric pilot estimators (kernel, spline, etc.) are used.
In this work, we focus on the goodness of fit of linear regression models. That is, for models of type (1), we wish to test the hypothesis
$$H_0:\ m\in\mathcal{M}=\{A^t(\cdot)\theta:\ \theta\in\Theta\subseteq\mathbb{R}^p\} \tag{3}$$
against the general alternative $H_1$ given above, where $A$ is a known function from $C$ into $\mathbb{R}^p$.
The study is carried out taking into account that the errors, $\varepsilon_t$, are dependent. This often happens when analyzing economic data, growth curves and, in general, whenever the observations are collected sequentially in time. The correlation among the errors must be taken into account when the model is analyzed statistically: ignoring it causes inefficiency in the parametric estimation of the model (Seber and Wild, 1989) and in the non-parametric estimation (Chu and Marron, 1991), and it also affects the power of the goodness-of-fit test, as the simulation study will show.
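Dependence in the errors was first considered for the goodness-of-fit problem by González-Manteiga and Vilar-Fernández (1995). In that work, a test was proposed based on the discrepancy between a non-parametric estimator of $m$ of the Gasser and Müller (1979) type,
$$\hat m_h(x)=\frac{1}{h}\sum_{t=1}^{n}\left(\int_{s_{t-1}}^{s_t}K\left(\frac{x-u}{h}\right)du\right)Y_t, \tag{4}$$
and a parametric one. Here $K$ denotes, as usual, a kernel function, $h>0$ is the smoothing parameter, and $s_0\le x_1\le s_1\le\cdots\le s_{n-1}\le x_n\le s_n$ are interpolation points between the ordered design points (typically $s_t=(x_t+x_{t+1})/2$). The parametric estimator $\hat\theta$ minimizes the functional
$$\Psi(\hat m_h, A^t\theta)=\int_C\left(\hat m_h(x)-A^t(x)\theta\right)^2\omega(x)\,dF_n(x), \tag{5}$$
where $\omega$ is a weight function included to prevent boundary effects of the kernel estimator and $F_n$ is the empirical distribution function of the design points. It is then natural to use, as a discrepancy measure between the null hypothesis and the alternative,
$$D=\Psi(\hat m_h, A^t\hat\theta)=\int_C\left(\hat m_h(x)-A^t(x)\hat\theta\right)^2\omega(x)\,dF_n(x), \tag{6}$$
where $\Psi$ is the Cramér–von Mises-type functional distance defined in (5).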
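The estimator, the parametric fit and the distance $D$ described above can be sketched in a few lines. This is a minimal illustration under assumptions of our own, not the authors' code: design points in $C=[0,1]$, the Epanechnikov kernel, $A(x)=(1,x)^t$, and $\omega$ equal to 1 on $[0.1,0.9]$ and 0 elsewhere. Because $F_n$ puts mass $1/n$ on each design point, minimizing the functional reduces to a weighted least-squares fit of $A^t(x_t)\theta$ to the smoothed values $\hat m_h(x_t)$.

```python
import numpy as np

def kernel_mass(a, b):
    """Integral over [a, b] of the Epanechnikov kernel K(v) = 0.75 (1 - v^2), |v| <= 1."""
    def primitive(v):
        v = np.clip(v, -1.0, 1.0)          # K vanishes outside [-1, 1]
        return 0.75 * (v - v**3 / 3.0)
    return primitive(b) - primitive(a)

def gasser_muller(x, y, h):
    """Gasser-Muller estimate at the sorted design points x, assuming C = [0, 1]."""
    # Interpolation points: s_0 = 0, s_t = (x_t + x_{t+1}) / 2, s_n = 1.
    s = np.concatenate(([0.0], (x[:-1] + x[1:]) / 2.0, [1.0]))
    xe = x[:, None]
    # (1/h) * int_{s_{t-1}}^{s_t} K((x - u)/h) du = int of K over [(x - s_t)/h, (x - s_{t-1})/h]
    weights = kernel_mass((xe - s[1:]) / h, (xe - s[:-1]) / h)
    return weights @ y

def cvm_fit(x, y, h, A, w):
    """Return theta_hat minimizing mean(w * (m_hat - A theta)^2) and the resulting D."""
    m_hat = gasser_muller(x, y, h)
    sw = np.sqrt(w)
    # Weighted least squares of the smoothed values on the rows A^t(x_t).
    theta_hat, *_ = np.linalg.lstsq(A * sw[:, None], m_hat * sw, rcond=None)
    D = np.mean(w * (m_hat - A @ theta_hat) ** 2)   # integral with respect to F_n
    return theta_hat, D

# Hypothetical setting: equispaced design, A(x) = (1, x)^t, omega = 1 on [0.1, 0.9].
n, h = 100, 0.1
x = (np.arange(1, n + 1) - 0.5) / n
A = np.column_stack([np.ones(n), x])
w = ((x >= 0.1) & (x <= 0.9)).astype(float)
theta_lin, D_lin = cvm_fit(x, 1 + 2 * x, h, A, w)
theta_quad, D_quad = cvm_fit(x, 1 + 2 * x + 5 * (x - 0.5) ** 2, h, A, w)
```

For the noiseless linear curve, $D$ is essentially zero and $\hat\theta$ recovers $(1,2)$, whereas the quadratic curve yields a clearly positive $D$. Restricting $\omega$ to the interior keeps the kernel support inside $C$, which is exactly the boundary effect the weight function is meant to avoid.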
In Theorem 1 of González-Manteiga and Vilar-Fernández (1995), the asymptotic normality of $\hat\theta$ and $D$ is established under the assumption that the errors follow an MA(∞) dependence structure. They deduce that
equation(7)
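with $\sigma_\varepsilon^2=\sum_{k=-\infty}^{\infty}\gamma(k)$ and an asymptotic variance $\sigma_D^2$ whose expression involves the convolution $K*K$, where $\gamma(k)$ is the lag-$k$ autocovariance of the error process $\{\varepsilon_t\}$ and $*$ denotes the convolution operator.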
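To see how error dependence enters through the autocovariances, assume, as is usual for kernel smoothing with dependent errors, that $\sigma_\varepsilon^2$ stands for the sum $\sum_k\gamma(k)$ (the long-run variance). For illustrative ARMA(1,1) errors $\varepsilon_t=\phi\varepsilon_{t-1}+a_t+\psi a_{t-1}$ with $|\phi|<1$ and $\operatorname{Var}(a_t)=\sigma_a^2$, one has $\gamma(0)=\sigma_a^2(1+2\phi\psi+\psi^2)/(1-\phi^2)$, $\gamma(k)=\phi^{|k|-1}\gamma(1)$ with $\gamma(1)=\sigma_a^2(1+\phi\psi)(\phi+\psi)/(1-\phi^2)$, and the closed form $\sum_k\gamma(k)=\sigma_a^2(1+\psi)^2/(1-\phi)^2$. The sketch below (the ARMA(1,1) parametrization is our assumption) checks the closed form against a truncated sum.

```python
def arma11_autocov(phi, psi, sigma2_a, k):
    """gamma(k) for e_t = phi*e_{t-1} + a_t + psi*a_{t-1}, |phi| < 1, Var(a_t) = sigma2_a."""
    k = abs(k)
    gamma0 = sigma2_a * (1 + 2 * phi * psi + psi**2) / (1 - phi**2)
    gamma1 = sigma2_a * (1 + phi * psi) * (phi + psi) / (1 - phi**2)
    return gamma0 if k == 0 else gamma1 * phi ** (k - 1)

def long_run_variance_truncated(phi, psi, sigma2_a, max_lag=500):
    """Sum of gamma(k) over -max_lag <= k <= max_lag."""
    return sum(arma11_autocov(phi, psi, sigma2_a, k)
               for k in range(-max_lag, max_lag + 1))

def long_run_variance_closed_form(phi, psi, sigma2_a):
    """Closed form of sum_k gamma(k) for the ARMA(1,1) process above."""
    return sigma2_a * (1 + psi) ** 2 / (1 - phi) ** 2

print(arma11_autocov(0.5, 0.3, 1.0, 0))          # gamma(0)
print(long_run_variance_truncated(0.5, 0.3, 1.0))
print(long_run_variance_closed_form(0.5, 0.3, 1.0))
```

With $\phi=0.5$, $\psi=0.3$ and $\sigma_a^2=1$, both sums equal $6.76$ up to rounding, against $\gamma(0)\approx1.85$: positive autocorrelation multiplies the relevant variance several times over, which is why ignoring the dependence distorts the test.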
Using the limit distribution (7), the hypothesis $H_0$ is rejected at significance level $\alpha$ when
equation(8)
$z_\alpha$ being such that $\Phi(z_\alpha)=1-\alpha$, where $\Phi$ is the distribution function of the standard normal.
In practice, expression (8) cannot be used directly, as it depends on two unknown parameters, $\sigma_\varepsilon^2$ and $\sigma_D^2$, which have to be estimated from the sample; replacing them by estimates yields a “plug-in” version of the test. Whether these parameters are known or estimated, the convergence to the normal distribution in (7) is very slow: for the usual values of the smoothing parameter, $h\approx n^{-1/5}$, the rate of convergence is of order $n^{-1/10}$ (see Härdle and Mammen (1993) and González-Manteiga and Vilar-Fernández (1995) for more details on this aspect). Consequently, the “plug-in” test requires very large sample sizes.
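The plug-in test needs an estimate of $\sigma_\varepsilon^2$. One common choice, assumed here and not necessarily the estimator used by the authors, is a Bartlett-weighted sum of sample autocovariances of residuals (for instance $Y_t-\hat m_h(x_t)$): $\hat\gamma(0)+2\sum_{k=1}^{L}\left(1-\frac{k}{L+1}\right)\hat\gamma(k)$, where the truncation lag $L$ is a tuning parameter of this sketch. The example applies it to a simulated AR(1) series with $\phi=0.5$ and unit innovation variance, whose long-run variance is $1/(1-\phi)^2=4$.

```python
import numpy as np

def sample_autocov(e, k):
    """Sample autocovariance at lag k >= 0 (mean-centred, divided by n)."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    n = e.size
    return np.dot(e[: n - k], e[k:]) / n

def bartlett_long_run_variance(e, bandwidth):
    """gamma_hat(0) + 2 * sum_{k=1}^{L} (1 - k/(L+1)) * gamma_hat(k), with L = bandwidth."""
    estimate = sample_autocov(e, 0)
    for k in range(1, bandwidth + 1):
        estimate += 2.0 * (1.0 - k / (bandwidth + 1.0)) * sample_autocov(e, k)
    return estimate

# Simulated AR(1) errors: e_t = 0.5 e_{t-1} + a_t, a_t ~ N(0, 1).
rng = np.random.default_rng(0)
n, phi = 20000, 0.5
a = rng.standard_normal(n)
e = np.empty(n)
e[0] = a[0] / np.sqrt(1 - phi**2)      # start from the stationary distribution
for t in range(1, n):
    e[t] = phi * e[t - 1] + a[t]

sigma2_hat = bartlett_long_run_variance(e, bandwidth=30)
naive_var = sample_autocov(e, 0)       # ignores the dependence
```

The estimate lands near 4, whereas the naive variance $\hat\gamma(0)$ stays near $4/3$; the Bartlett weights guarantee a non-negative estimate at the cost of a small downward bias for finite $L$.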
To overcome this problem, in this work we study two bootstrap algorithms that approximate the distribution of $D$, as an alternative to the normal approximation. In Section 2, we describe the two resampling algorithms under ARMA-type dependence of the errors. In Section 3, we present a broad simulation study comparing the tests with a normal critical region (with known or estimated theoretical parameters) with the tests obtained from the two bootstrap algorithms. In this study, several aspects of interest are considered, such as the selection of the smoothing parameters, the influence of the error structure, the sample size and the parameters of the model. In Section 4, the proposed tests are applied to a numerical example. Finally, Section 5 presents the conclusions of our study.
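Purely as an illustration of the general resampling idea, and not as either of the two algorithms of Section 2, the following sketch implements a hypothetical model-based bootstrap for $D$: it fits an AR(1) model to non-parametric residuals, resamples the centered innovations, regenerates data under $H_0$ from the fitted linear model, and recomputes $D$ on each resample. The kernel (Epanechnikov), design in $C=[0,1]$, $A(x)=(1,x)^t$, $\omega$, the AR(1) order and the number of resamples are all our assumptions.

```python
import numpy as np

def kernel_mass(a, b):
    """Integral over [a, b] of the Epanechnikov kernel K(v) = 0.75 (1 - v^2), |v| <= 1."""
    def primitive(v):
        v = np.clip(v, -1.0, 1.0)
        return 0.75 * (v - v**3 / 3.0)
    return primitive(b) - primitive(a)

def gasser_muller(x, y, h):
    """Gasser-Muller estimate at the sorted design points, assuming C = [0, 1]."""
    s = np.concatenate(([0.0], (x[:-1] + x[1:]) / 2.0, [1.0]))
    xe = x[:, None]
    return kernel_mass((xe - s[1:]) / h, (xe - s[:-1]) / h) @ y

def fit_and_distance(x, y, h, A, w):
    """Weighted LS fit of A theta to m_hat_h and the Cramer-von Mises-type distance D."""
    m_hat = gasser_muller(x, y, h)
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(A * sw[:, None], m_hat * sw, rcond=None)
    return theta, m_hat, np.mean(w * (m_hat - A @ theta) ** 2)

def ar1_bootstrap_test(x, y, h, A, w, n_boot=199, burn_in=100, rng=None):
    """Hypothetical AR(1) residual bootstrap for D under H0; returns (D, p_value)."""
    rng = np.random.default_rng() if rng is None else rng
    theta, m_hat, D = fit_and_distance(x, y, h, A, w)
    # Non-parametric residuals, kept only where w > 0 to avoid boundary effects.
    e = (y - m_hat)[w > 0]
    e = e - e.mean()
    phi = np.dot(e[1:], e[:-1]) / np.dot(e[:-1], e[:-1])   # least-squares AR(1) coefficient
    innov = e[1:] - phi * e[:-1]
    innov = innov - innov.mean()
    n = y.size
    exceed = 0
    for _ in range(n_boot):
        a = rng.choice(innov, size=n + burn_in, replace=True)
        eps = np.zeros(n + burn_in)
        for t in range(1, n + burn_in):
            eps[t] = phi * eps[t - 1] + a[t]
        y_star = A @ theta + eps[burn_in:]    # bootstrap sample generated under H0
        _, _, D_star = fit_and_distance(x, y_star, h, A, w)
        exceed += int(D_star >= D)
    return D, (1 + exceed) / (n_boot + 1)

# Hypothetical example: AR(1) errors with phi = 0.5 and innovation s.d. 0.1.
rng = np.random.default_rng(1)
n, h = 100, 0.1
x = (np.arange(1, n + 1) - 0.5) / n
A = np.column_stack([np.ones(n), x])
w = ((x >= 0.1) & (x <= 0.9)).astype(float)
noise = 0.1 * rng.standard_normal(n)
eps = np.empty(n)
eps[0] = noise[0] / np.sqrt(0.75)
for t in range(1, n):
    eps[t] = 0.5 * eps[t - 1] + noise[t]
D0, p0 = ar1_bootstrap_test(x, 1 + 2 * x + eps, h, A, w, rng=rng)
D1, p1 = ar1_bootstrap_test(x, 1 + 2 * x + 5 * (x - 0.5) ** 2 + eps, h, A, w, rng=rng)
```

Using non-parametric residuals $Y_t-\hat m_h(x_t)$, rather than residuals from the linear fit, keeps a departure from $H_0$ from being absorbed into the estimated error structure, which would otherwise inflate the bootstrap distribution and reduce power.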