Logistic regression model with a dependent variable subject to randomized response
Publisher: Elsevier - Science Direct
Journal: Computational Statistics & Data Analysis, Volume 51, Issue 12, 15 August 2007, Pages 6060–6069
The univariate and multivariate logistic regression models are discussed for the case where response variables are subject to randomized response (RR). RR is an interview technique that can be used when sensitive questions have to be asked and respondents are reluctant to answer directly. RR variables may be described as misclassified categorical variables where the conditional misclassification probabilities are known. The univariate model is revisited and is presented as a generalized linear model. Standard software can easily be adjusted to take the RR design into account. The multivariate model does not appear to have been considered elsewhere in an RR setting; it is shown how a Fisher scoring algorithm can be used to take the RR aspect into account. The approach is illustrated by analyzing RR data taken from a study of regulatory non-compliance regarding unemployment benefit.
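As a small illustration of what "misclassified with known conditional probabilities" means, the sketch below uses Warner's (1965) design, chosen here only as an example because the text cites it. It assumes respondents comply truthfully with the design. The function names and the numbers (prevalence 0.10, design probability 0.8) are hypothetical and do not come from the paper.

```python
def observed_yes_probability(prevalence, p_direct):
    """Probability of an observed 'yes' under Warner's design (illustrative).

    With probability p_direct the respondent answers the sensitive statement,
    otherwise its complement. A true 'yes' is therefore observed as 'yes'
    with probability p_direct, a true 'no' with probability 1 - p_direct.
    """
    return p_direct * prevalence + (1.0 - p_direct) * (1.0 - prevalence)


def estimate_prevalence(observed_yes_share, p_direct):
    """Moment estimator that inverts the known misclassification."""
    if abs(2.0 * p_direct - 1.0) < 1e-12:
        raise ValueError("p_direct = 0.5 carries no information on prevalence")
    return (observed_yes_share - (1.0 - p_direct)) / (2.0 * p_direct - 1.0)


# Hypothetical numbers: 10% true prevalence, design probability 0.8.
lam = observed_yes_probability(prevalence=0.10, p_direct=0.8)
print(round(lam, 4))                            # 0.26
print(round(estimate_prevalence(lam, 0.8), 4))  # 0.1
```

Because the design probability is fixed by the researcher, the observed share of "yes" answers can be mapped back to the sensitive prevalence even though no individual answer reveals the respondent's true status.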
Randomized response (RR) is an interview technique that can be used when sensitive questions have to be asked and respondents are reluctant to answer directly (Warner, 1965; Chaudhuri and Mukerjee, 1988). Examples of sensitive questions are questions about alcohol consumption, sexual behavior or fraud. RR variables can be seen as misclassified categorical variables where the conditional misclassification probabilities are fixed by design. The misclassification protects the privacy of the individual respondent. A meta-analysis by Lensvelt-Mulders et al. (2005) shows that RR yields more valid prevalence estimates than other methods for sensitive questions.
This paper discusses the univariate and the multivariate logistic regression model when the response variables are subject to RR. The discussion consists of two parts. First, we consider the univariate logistic regression model for binary RR response variables and present this model as a generalized linear model (GLM). Presenting the model as a GLM makes clear two aspects of the model that have not been noted in the literature: (i) the model inherits useful properties from the standard GLM, for example properties of parameter estimates; (ii) the model can be assessed by adjusting standard software for GLMs. The paper shows how routines in R and GLIM can be adjusted, making the assessment of logistic regression models for RR response variables reliable and fast.
The second part of the discussion is the presentation of a multivariate logistic regression model for RR response variables. As far as we know, this model has not been considered elsewhere. The model makes it possible to investigate the relation between several RR response variables and a set of covariates jointly. There are various ways to define a multivariate logistic regression model without RR (Fahrmeir and Tutz, 2001, Section 3.5).
The present paper extends the multivariate logistic regression model as presented by Glonek and McCullagh (1995) and shows how the model can be adapted to take the RR design into account.
We briefly review the literature on univariate logistic regression for RR response variables. Maddala (1983) was the first to present the likelihood of the model with respect to the RR design by Warner (1965). Scheers and Dayton (1988) discuss the model with respect to both the Warner model and the unrelated-question model (Greenberg et al., 1969). Van der Heijden and Van Gils (1996) present the model where the response variable is subject to either the RR design by Boruch (1971) or the RR design by Kuk (1990). A recent application of the univariate model is presented in Lensvelt-Mulders et al. (2006). Chen (1989) describes the link between RR and misclassification in the context of log-linear models. Magder and Hughes (1997) discuss the logistic regression model where the response variable is subject to misclassification comparable to the perturbation induced by the RR design. As far as we know, the encompassing GLM framework has not yet been discussed.
The outline of the paper is as follows. Section 2 describes the RR design. Section 3 presents the logistic regression model given a binary RR response variable. Section 4 discusses the multivariate logistic regression model for RR response variables. In Section 5, applications are discussed using RR data from a Dutch study of regulatory non-compliance regarding unemployment benefit. Section 6 concludes the paper.
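To give a feel for the univariate model described in the introduction, the following is a minimal sketch and not the authors' R or GLIM routines. It assumes Warner's design with truthful compliance, so that the observed answer satisfies P(y_obs = 1 | x) = c + d * expit(b0 + b1 * x), with c = 1 - p and d = 2p - 1 known by design. It fits the two coefficients by plain Fisher scoring on simulated data. The function names, the single covariate, the sample size, and the true coefficients (-1, 1) are all hypothetical choices made for illustration.

```python
import math
import random


def expit(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_rr_logistic(x, y_obs, c, d, n_iter=50, tol=1e-8):
    """Fisher scoring for P(y_obs = 1 | x) = c + d * expit(b0 + b1 * x).

    c and d are the known design constants. No step-halving is used,
    so this is a sketch rather than a robust fitting routine.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        s0 = s1 = 0.0            # score vector
        i00 = i01 = i11 = 0.0    # expected (Fisher) information matrix
        for xi, yi in zip(x, y_obs):
            pi = expit(b0 + b1 * xi)      # P(true status = 1 | x)
            mu = c + d * pi               # P(observed answer = 1 | x)
            dmu = d * pi * (1.0 - pi)     # derivative of mu w.r.t. linear predictor
            var = mu * (1.0 - mu)         # Bernoulli variance of the observed answer
            r = (yi - mu) * dmu / var
            w = dmu * dmu / var
            s0 += r
            s1 += r * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Solve the 2x2 system information * step = score.
        det = i00 * i11 - i01 * i01
        step0 = (i11 * s0 - i01 * s1) / det
        step1 = (i00 * s1 - i01 * s0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Simulated data under hypothetical settings.
random.seed(1)
p_direct = 0.8                        # design probability of the direct statement
c, d = 1.0 - p_direct, 2.0 * p_direct - 1.0
true_b0, true_b1 = -1.0, 1.0
x, y_obs = [], []
for _ in range(20000):
    xi = random.gauss(0.0, 1.0)
    y_true = random.random() < expit(true_b0 + true_b1 * xi)
    direct = random.random() < p_direct
    # Direct statement: answer equals true status. Complement: answer is flipped.
    y_obs.append(int(y_true if direct else not y_true))
    x.append(xi)

b0_hat, b1_hat = fit_rr_logistic(x, y_obs, c, d)
print(b0_hat, b1_hat)  # should lie near the simulated values -1 and 1
```

The only change relative to ordinary logistic regression is that the mean of the observed answer is c + d times the logistic probability, which is why the model can be viewed as a binomial GLM with a modified link. The price of the privacy protection is a loss of efficiency: the estimates are noisier than they would be with directly observed answers at the same sample size.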