Download English ISI article no. 24278
Title

Evaluation of matching noise for imputation techniques based on nonparametric local linear regression estimators
Article code : 24278
Publication year : 2008
Length of the English article : 12 pages (PDF)
Source

Publisher : Elsevier - Science Direct

Journal : Computational Statistics & Data Analysis, Volume 53, Issue 2, 15 December 2008, Pages 354–365

Keywords
matching noise, imputation techniques, nonparametric, local linear regression estimators
Article preview

Abstract

A new matching procedure is introduced that imputes missing data by means of a local linear estimator of the underlying population regression function, which is not assumed to be linear. The procedure is compared with traditional approaches, namely hot deck methods and methods based on kNN estimators. Performance is measured by the matching noise, that is, the discrepancy between the distribution generating the genuine data and the distribution generating the imputed values.

Introduction

In several contexts, e.g. official statistics (D’Orazio et al., 2002, 2006), marketing (Rässler, 2002), and genetics (as for the data sets in repositories like genenetwork.org), data files coming from different sources are frequently available at a moderate cost. Each data file contains the values of only some of the variables of interest. This is a serious limitation when one is interested in the joint analysis of variables that are not jointly observed. The statistical matching problem consists in constructing a complete synthetic data file in which all the variables of interest are present. In a sense, this is a purely “descriptive” objective: representing the multivariate joint distribution, with the aim of creating a data set available to end-users.

The synthetic data set is constructed by using imputation techniques. As a consequence, the joint distribution of the variables of interest in the synthetic data file does not generally coincide with the genuine distribution. This discrepancy is the matching noise. From an end-user perspective, the smaller the matching noise, the better the reconstructed data file.

Different techniques have been proposed in the literature for tackling the statistical matching problem; among them, an important role is played by hot deck methods and kNN methods. Their properties are studied in Paass (1985) and Marella et al. (2008), where both theoretical and simulation results are obtained. In this paper we go further by introducing new nonparametric matching techniques based on local linear regression, which are compared with existing ones.

The paper is organized as follows. In Section 2 the main technical aspects are briefly introduced. In Section 3 a class of nonparametric imputation procedures is described, including the method based on the local linear estimator. In Section 4 the matching noise (for imputation based on local linear regression estimators) is formally evaluated. Finally, in Section 5 a simulation study is implemented.
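
As a rough illustration of the idea outlined above, the following Python sketch imputes Z for recipient records from a donor file with a local linear fit and then measures a simple form of matching noise. It is not the authors' implementation. The Gaussian kernel, the fixed bandwidth h, the resampled donor residual added to each fitted value, the Kolmogorov-Smirnov distance used as the discrepancy measure, and all function names are assumptions made for this example.

```python
import math
import random


def local_linear_fit(x0, xs, zs, h):
    """Local linear estimate of E[Z | X = x0] (Gaussian kernel, bandwidth h)."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, z in zip(xs, zs):
        d = x - x0
        w = math.exp(-0.5 * (d / h) ** 2)  # kernel weight of this donor
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * z
        t1 += w * d * z
    if s0 == 0.0:
        raise ValueError("bandwidth too small: no donor has positive weight")
    denom = s0 * s2 - s1 * s1
    if denom <= 1e-12 * s0 * s2:
        # Degenerate design around x0: fall back to the weighted mean.
        return t0 / s0
    # Closed-form intercept of the weighted least-squares line centred at x0.
    return (s2 * t0 - s1 * t1) / denom


def impute_local_linear(donor_x, donor_z, recipient_x, h, rng):
    """Impute Z for each recipient as fitted value plus a resampled donor residual.

    Adding a residual is an assumption of this sketch, not a documented step.
    """
    residuals = [
        z - local_linear_fit(x, donor_x, donor_z, h)
        for x, z in zip(donor_x, donor_z)
    ]
    return [
        local_linear_fit(x, donor_x, donor_z, h) + rng.choice(residuals)
        for x in recipient_x
    ]


def ks_distance(a, b):
    """Largest gap between the empirical distribution functions of a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap
```

In a simulation, where the genuine Z values of the recipients are known, `ks_distance(genuine_z, imputed_z)` gives one crude numerical summary of the matching noise for the marginal distribution of Z; other discrepancy measures could be substituted.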

Conclusion

In this paper, a method of imputation based on the local linear estimation of the regression function of the variables of interest has been introduced and compared (in terms of matching noise) with other popular imputation techniques (hot deck methods and methods based on kNN estimators). On theoretical grounds, imputation based on local linear regression is asymptotically free of matching noise. Comparisons made by simulation show that the higher the complexity of the functional relationship between the predictor X and the response variable Z, the better the performance of the imputation method based on the local linear regression estimator. The performance of imputation based on the local linear regression estimator is close to that of mean kNN plus random residual for the reconstruction of the marginal distribution of Z, and to that of the distance hot deck when the interest is in the conditional distribution of Z∣X. As a result, this method offers an advantageous compromise for a good preservation of both the marginal and conditional distributions. As far as bandwidth selection is concerned, LRot and LGcv generally give good results. The LGcv method performs better when the population regression function is complex and far from linear. This result parallels analogous results obtained by Marron and Wand (1992) for nonparametric density estimation, where cross validation gives good results when the density function to be estimated is particularly rough.
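
For comparison, the two benchmark methods named in the conclusion can be sketched in a few lines of Python. The definitions used here are assumptions inferred from the method names only: the distance hot deck is taken to copy the Z value of the nearest donor in X, and mean kNN plus random residual is taken to average the Z values of the k nearest donors and add a residual resampled from the donor file. Function names and the tie-breaking rule are likewise hypothetical.

```python
import random


def k_nearest(x0, donor_x, k):
    """Indices of the k donors whose X value is closest to x0 (ties by index)."""
    order = sorted(range(len(donor_x)), key=lambda i: abs(donor_x[i] - x0))
    return order[:k]


def distance_hot_deck(donor_x, donor_z, recipient_x):
    """Impute each recipient with the observed Z of its single nearest donor."""
    return [donor_z[k_nearest(x, donor_x, 1)[0]] for x in recipient_x]


def mean_knn(x0, donor_x, donor_z, k):
    """Average Z over the k donors nearest to x0."""
    idx = k_nearest(x0, donor_x, k)
    return sum(donor_z[i] for i in idx) / len(idx)


def mean_knn_plus_residual(donor_x, donor_z, recipient_x, k, rng):
    """kNN mean plus a residual drawn at random from the donor file.

    Residuals are computed in-sample, so each donor counts among its own
    neighbours; this is a simplification made for the sketch.
    """
    residuals = [
        z - mean_knn(x, donor_x, donor_z, k)
        for x, z in zip(donor_x, donor_z)
    ]
    return [
        mean_knn(x, donor_x, donor_z, k) + rng.choice(residuals)
        for x in recipient_x
    ]
```

Under these assumed definitions, the hot deck always imputes a value that was actually observed, whereas the kNN mean smooths over neighbours and relies on the added residual to restore variability.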