Fuzzy logistic regression based on the least squares approach with application in clinical studies
Article code | Publication year | Length of the English article |
---|---|---|
24876 | 2011 | 13 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Computers & Mathematics with Applications, Volume 62, Issue 9, November 2011, Pages 3353–3365
English abstract
To model fuzzy binary observations, this study proposes and discusses a new model named “Fuzzy Logistic Regression”. Because binary observations of this kind are vague, no probability distribution can be assumed for them, so ordinary logistic regression may not be appropriate. The study constructs a fuzzy model based on the possibility of success. These possibilities are defined by linguistic terms such as …, low, medium, high, …. Then, using the extension principle, the logarithmic transformation of the “possibilistic odds” is modeled on a set of crisp observations of the explanatory variables. The model parameters are estimated by the least squares method of fuzzy linear regression. To evaluate the model, a criterion named the “capability index” is calculated. Finally, because logistic regression is widely applied in clinical studies and vague observations abound in clinical diagnosis, cases suspected of Systemic Lupus Erythematosus (SLE) are modeled on some significant risk factors to illustrate the application of the model. The results show that the proposed model can be a rational substitute for the ordinary one in modeling vague clinical status.
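The transformation step in the abstract can be illustrated with a minimal sketch. It assumes the possibility of success is a triangular fuzzy number `(left, peak, right)` whose support lies strictly inside (0, 1). The breakpoints of `MEDIUM` and all function names are hypothetical and are not taken from the paper. Because ln(μ / (1 − μ)) is continuous and increasing on (0, 1), the extension principle maps each α-cut interval endpoint to endpoint.

```python
import math

def alpha_cut(tri, alpha):
    """Interval [lo, hi] of a triangular fuzzy number (left, peak, right) at level alpha."""
    left, peak, right = tri
    return (left + alpha * (peak - left), right - alpha * (right - peak))

def logit(p):
    """Log odds ln(p / (1 - p)); only defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def log_possibilistic_odds_cut(tri, alpha):
    """Alpha-cut of the log odds of a fuzzy possibility.

    The logit is increasing, so the image of [lo, hi] is [logit(lo), logit(hi)].
    """
    lo, hi = alpha_cut(tri, alpha)
    return (logit(lo), logit(hi))

# Hypothetical linguistic term "medium"; the breakpoints are assumptions.
MEDIUM = (0.25, 0.5, 0.75)

for alpha in (0.0, 0.5, 1.0):
    lo, hi = log_possibilistic_odds_cut(MEDIUM, alpha)
    print(f"alpha={alpha:.1f}: [{lo:.4f}, {hi:.4f}]")
```

The intervals shrink toward the single value ln(1) = 0 as α approaches 1, since the peak of this hypothetical term is 0.5. Terms whose support touches 0 or 1 would need clipping before the transformation, because the logit is unbounded there.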
English introduction
Nowadays, vague or non-precise observations make up a large share of research data. Depending on the kind of variable, vague observations arise for different reasons. For quantitative variables, which are measured in terms of real numbers, the lack of suitable or sufficiently developed measuring instruments may lead to non-precise observations. Sometimes the original characteristics are unavailable, which forces approximate measures. In contrast, qualitative variables, which express a qualitative attribute, take on a finite number of codes. These codes carry no numerical properties and refer to the distinct categories of the variable. The definition of the categories is very important for categorical variables, because an ambiguous definition may cause confusion in classification. Indeed, the borderlines between categories are not crisp, and cases near a borderline have a vague status. In addition, some categorical variables are inherently measured on a fuzzy scale. For example, the observations may be described by linguistic terms such as large, heavy, or approximately equal to five. Observations of this kind are best described as fuzzy outputs. Modeling the relationship between such observations and making predictions in a fuzzy environment is a challenge for classical modeling analysis.
English conclusion
In this paper we proposed fuzzy logistic regression and discussed the fuzzy least squares method for estimating the model’s parameters. We aimed to illustrate that the ideal assumptions of ordinary logistic regression, like those of other statistical models, may not hold in practice. We also emphasized the vague nature of boundaries in categorical variables, which leads to non-precise observations. With binary variables in particular, cases must be assigned to one of two distinct categories, yet some cases have a vague status with respect to this categorization. Ignoring these observations in the modeling process is not rational, while including them may violate distributional assumptions. This problem is more important when the observations of the response variable are non-precise. In logistic regression analysis, no assumptions are made about the distribution of the explanatory variables [11], but the binary response variable should follow a Bernoulli probability distribution. Obviously, no probability distribution can be considered for non-precise observations. The inability of ordinary logistic regression to model vague binary observations, and the frequency of such observations in clinical research, motivated us to discuss this new model. The proposed model is recommended for observations with crisp inputs and a fuzzy binary output. Instead of probabilistic odds, we modeled an equivalent term named “possibilistic odds” [20]. Specifically, we defined the possibility of success, $\tilde{\mu}_i$ (the degree to which a case is consistent with the success criteria), for each case and modeled its logarithmic transformation. We also noted that there are two ways to define $\tilde{\mu}_i$: as a real number in (0, 1), or as a linguistic term from the set very low, low, medium, high and very high. These terms are in turn fuzzy sets. They should be defined so that the union of their supports covers the whole (0, 1) interval. We chose the second definition.
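The coverage requirement on the linguistic terms can be checked numerically. The following is a minimal sketch, assuming triangular membership functions. The breakpoints in `TERMS` and the helper names are hypothetical, chosen only so that neighbouring terms overlap; the paper's actual definitions may differ.

```python
def membership(tri, x):
    """Membership degree of x in a triangular fuzzy set (left, peak, right)."""
    left, peak, right = tri
    if x < left or x > right:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical breakpoints for the five linguistic terms.
TERMS = {
    "very low": (0.0, 0.0, 0.3),
    "low": (0.1, 0.3, 0.5),
    "medium": (0.3, 0.5, 0.7),
    "high": (0.5, 0.7, 0.9),
    "very high": (0.7, 1.0, 1.0),
}

def covers_unit_interval(terms, steps=1000):
    """True if every grid point in (0, 1) has positive membership in some term."""
    for k in range(1, steps):
        x = k / steps
        if max(membership(tri, x) for tri in terms.values()) <= 0.0:
            return False
    return True

def best_term(terms, x):
    """Name of the term in which x has the highest membership degree."""
    return max(terms, key=lambda name: membership(terms[name], x))

print(covers_unit_interval(TERMS))
print(best_term(TERMS, 0.32))
```

A grid check is only an approximation of the support condition, but it catches the common mistake of leaving a gap between two adjacent terms.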
By exploiting the intrinsic linearity of the logistic regression model, we used the fuzzy least squares approach to estimate the model’s parameters. We also introduced a goodness-of-fit criterion to evaluate the model and, finally, used a numerical example from the clinical field to illustrate how the model is applied.

Compared with previous work on logistic regression in a fuzzy environment, the proposed model has some advantages. For instance, our method is based on the fuzzy least squares method. It models the fuzzy relation between crisp inputs and a fuzzy output given as a linguistic term. The estimated outputs are fuzzy numbers that represent the possibility values of the event of interest (coded 1). By contrast, Dom et al. [11] and [17] and Nagar and Srivastava [18] modeled the fuzzy relation between crisp inputs and crisp outputs using a possibilistic approach. In addition, Yang and Chen [22] proposed a logistic mixed model with a fuzzy mixture value, which is commonly used for fuzzy clustering rather than fuzzy modeling.

It is worth mentioning that Takemura [21] also proposed a method for fuzzy logistic regression. Our method has the following advantages over his:
1- He used probabilistic odds, while we introduced and applied the concept of possibilistic odds, so our model is a fully fuzzy model.
2- He did not present any index for evaluating his method, while we used an index for this purpose.
3- The observed fuzzy outputs in our method are linguistic terms treated as L–R fuzzy numbers (expressing the possibility of belonging to the desired category, category 1), whereas in Takemura’s method the ambiguous probability of belonging to category 1 was rated by the fuzzy rating method and the rating was then treated as an L–R fuzzy number. Expressing the possibility values as linguistic terms is simpler and also more common in practice.
4- He used three points (first, middle and last) to calculate the distance between the observed and estimated values, while we used all α-cuts of the fuzzy numbers to calculate these distances.

As a result, the method proposed in this study takes a different view of logistic regression in a fuzzy environment and contributes some new methodological aspects to this field. The proposed model may, however, be extended to the case in which both the explanatory and the response variables are fuzzy. Different methods of parameter estimation or model evaluation can also be used to improve the model.
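The contrast drawn in point 4 between a three-point distance and a distance over all α-cuts can be sketched as follows. This is only an illustration under stated assumptions: the fuzzy numbers are triangular, the α-cut distance is the squared difference of interval endpoints averaged over a midpoint grid of levels, and the function names are hypothetical. The exact distance used in the paper is not given in this excerpt and may differ.

```python
def alpha_cut(tri, alpha):
    """Interval [lo, hi] of a triangular fuzzy number (left, peak, right) at level alpha."""
    left, peak, right = tri
    return (left + alpha * (peak - left), right - alpha * (right - peak))

def alpha_cut_sq_distance(a, b, levels=100):
    """Squared endpoint differences averaged over alpha levels in (0, 1).

    Midpoint-rule approximation of the integral over alpha of
    (a_lo - b_lo)^2 + (a_hi - b_hi)^2. An assumed definition, for illustration.
    """
    total = 0.0
    for k in range(levels):
        alpha = (k + 0.5) / levels
        a_lo, a_hi = alpha_cut(a, alpha)
        b_lo, b_hi = alpha_cut(b, alpha)
        total += (a_lo - b_lo) ** 2 + (a_hi - b_hi) ** 2
    return total / levels

def three_point_sq_distance(a, b):
    """Squared distance using only the left, peak and right points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

observed = (0.2, 0.4, 0.6)   # hypothetical observed fuzzy output
estimated = (0.3, 0.5, 0.7)  # hypothetical estimated fuzzy output

print(alpha_cut_sq_distance(observed, estimated))
print(three_point_sq_distance(observed, estimated))
```

For triangular numbers the α-cuts are linear in α, so the two distances carry similar information. The α-cut form matters more for general L–R shapes, where the sides between the three points are curved.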