English ISI article no. 24635
Title (translated from Persian)

Outlier detection in fuzzy linear regression with crisp input and output, based on linguistic variables

English title
Outlier detection in fuzzy linear regression with crisp input–output by linguistic variable view
Article code: 24635
Publication year: 2013
Number of pages (English article): 9 (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Applied Soft Computing, Volume 13, Issue 1, January 2013, Pages 734–742

Keywords (translated from Persian)
Fuzzy regression - ordinary regression - mathematical programming - outlier data - linguistic variables
English keywords
Fuzzy regression, Ordinary regression, Mathematical programming, Outlier data, Linguistic variables

Abstract

Outlier data among the observations lead to inaccurate modeling results. Detecting such data, so that they can be omitted or their impact lessened, plays a significant role in correcting a model. Eliminating outliers and reducing their influence are the two ways to prevent their negative effect on modeling. Both approaches are considered here for fuzzy regression in which both the input and output data are non-fuzzy (crisp). The main idea combines linguistic variables and the possibility concept with ordinary regression to deal with outliers. Several examples and a case study are presented to show the capability of the proposed approach.

Highlights
► Outlier data affect the results of regression analyses and lead to inaccurate estimates and forecasts.
► The problem is addressed here using linguistic variables and the possibility concept together with regression to reduce the impact of outliers.
► The area of the fuzzy number generated for each sample data point is used as a measure of uncertainty for detecting outliers.
► Both the h-level and the spread of the fuzzy numbers contribute to recognizing outliers intelligently and lessening their effects.
► Results on several examples show the high performance of the proposed method.
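The highlights state that the area of a fuzzy number couples its h-level and its spread. The exact area measure used in the paper is not given in this excerpt; as a purely geometric illustration, assuming a symmetric triangular fuzzy number with center $a$ and spread $c$ (our notation, not necessarily the paper's), the relevant areas work out as follows:

```latex
\mu_{\tilde A}(y) = \max\!\left(0,\; 1 - \frac{|y - a|}{c}\right), \qquad c > 0

\int_{-\infty}^{\infty} \mu_{\tilde A}(y)\,dy = \tfrac{1}{2}\cdot 2c \cdot 1 = c

[\tilde A]_h = \big[\,a - (1-h)c,\; a + (1-h)c\,\big], \qquad 0 \le h \le 1

\text{area of } \mu_{\tilde A} \text{ above level } h
  = \tfrac{1}{2}\cdot 2(1-h)c \cdot (1-h) = (1-h)^2\, c
```

A wider spread enlarges both areas, while raising h shrinks the part above the h-cut quadratically, which shows how a single area-based quantity can respond to the h-level and the spread at the same time.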

Introduction

Statistical regression is a common way to find a crisp relationship between a dependent variable (y) and independent variables (x). Ordinary regression analysis explains the variation of the former in terms of the latter, using probability distributions to estimate its parameters. In fuzzy regression, by contrast, possibility theory [2] is applied to extract a fuzzy relationship between the input and output data. The presence of outlier data can make this relationship, and hence the model, inaccurate, so detecting and omitting outliers is an important step toward avoiding untrustworthy models. Fuzzy Linear Regression (FLR) analysis was introduced by Tanaka et al. [1], who based the idea on possibility theory. Since then, many revisions of fuzzy regression models have been proposed. The linear programming method [3], [4], [5] and [6] and the least-squares model [7], [8], [9], [10] and [11] are the two classes of solutions currently known for fuzzy regression models. Tanaka's approach is still used because of its simplicity, but it has problems that fall into two categories: (1) problems caused by differing trends, and (2) the outlier data problem. Chang and Lee [12] and [13] considered the first set of problems. They demonstrated that fuzziness and uncertainty in the structure of a system are two essential factors that strongly affect the trend of the centers and spreads. Peters [14] investigated outliers in order to control the adverse influence of the training data on the estimated interval. For this purpose, he applied fuzzy linear programming with triangular membership functions whose widths depend on adjusting parameters such as the "goodness" of the solution, the tolerance interval, and the desired value of the objective function. Chen [15] showed that Peters' model may produce errors, particularly when the data contain outliers.
Indeed, he found that the PFLR (Possibilistic Fuzzy Linear Regression) and UFLR (Unrestricted sign Fuzzy Linear Regression) models lead to wrong outcomes whenever the estimated confidence interval is too broad. To keep out the influence of outliers, he added a constraint on the k-value, defined as the difference in width between the spread of the estimated data and the spread of the observed dependent data. However, his model is very sensitive to the value of k. Other investigators, including Ortiz et al. [16], indicated that robust regression may be an alternative tool for detecting outliers. Tanaka and Lee [17] combined linear and quadratic programming to handle outliers, based on a combination of central tendency and possibility properties. Because the models of Chang and Lee [12] and [13], Ortiz et al. [16] and Chen [15] assume fuzzy observations, whereas the proposed model deals with crisp data, we consider the results of the Tanaka et al. [1] and Peters [14] models. This paper addresses the outlier problem in models with non-fuzzy input and non-fuzzy output by applying linguistic variables. Outliers are identified using ordinary regression together with the possibility concept, and their effects are then omitted or lessened. The rest of the paper is organized as follows. Section 2 gives preliminary definitions of fuzzy numbers. Section 3 introduces the proposed method. Section 4 applies numerical examples and a case study to demonstrate the ability of the proposed approach. The last section presents the conclusions.
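For reference, the possibilistic model attributed to Tanaka et al. [1] is commonly stated for crisp data as the linear program below. This is the usual textbook form, assuming symmetric triangular coefficients $\tilde A_j = (a_j, c_j)$ with center $a_j$ and spread $c_j$, a chosen level $h$, and $x_{i0} = 1$ for the intercept; the notation is ours and may differ from the cited papers.

```latex
\tilde Y_i = \tilde A_0 + \tilde A_1 x_{i1} + \cdots + \tilde A_n x_{in}, \qquad i = 1, \dots, m

\min_{a,\,c}\; J = \sum_{i=1}^{m} \sum_{j=0}^{n} c_j\, |x_{ij}|

\text{s.t.}\quad \sum_{j=0}^{n} a_j x_{ij} + (1-h) \sum_{j=0}^{n} c_j\, |x_{ij}| \;\ge\; y_i, \qquad i = 1, \dots, m

\phantom{\text{s.t.}}\quad \sum_{j=0}^{n} a_j x_{ij} - (1-h) \sum_{j=0}^{n} c_j\, |x_{ij}| \;\le\; y_i, \qquad i = 1, \dots, m

\phantom{\text{s.t.}}\quad c_j \ge 0, \qquad j = 0, \dots, n
```

Because every observation must fall inside the band at level h, a single outlier forces the spreads $c_j$ to widen for all predictions, which is the sensitivity that the approaches discussed above try to control.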

Conclusion

In this study, the fuzzy concept and linguistic variables, together with ordinary regression, were used to determine outlier data. A base curve was first obtained by ordinary regression, and the fuzzy-number concept was then applied to it to recognize outliers. To lessen their effect and reduce the level of uncertainty, we used the area of the fuzzy number, which accounts for the h-level and the spread of the fuzzy number simultaneously. Finally, the proposed model was applied to several examples, and its capability was demonstrated using the MAD index. Moreover, a case study illustrated the application of the proposed approach to a real-world problem. Although the proposed approach is somewhat time-consuming computationally, software such as MATLAB can carry out all of these steps quickly.
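The conclusion describes a crisp base curve from ordinary regression, a fuzzy number built around it, and evaluation with a MAD index. The Python sketch below illustrates only that general pipeline under our own assumptions: a single regressor, a symmetric triangular fuzzy number of fixed spread centred on each ordinary least-squares prediction, and a simple rule that flags a point when its membership drops below a level h. The names `ols_fit`, `triangular_membership`, `flag_outliers` and `mad`, the `spread` and `h` values, the flagging rule, and the reading of MAD as the mean absolute deviation of residuals are all hypothetical; the paper's area-based measure is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Fit:
    """Crisp base line y = intercept + slope * x."""
    intercept: float
    slope: float

    def predict(self, x: float) -> float:
        return self.intercept + self.slope * x


def ols_fit(xs: Sequence[float], ys: Sequence[float]) -> Fit:
    """Ordinary least-squares fit with a single regressor."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return Fit(intercept=my - slope * mx, slope=slope)


def triangular_membership(y: float, center: float, spread: float) -> float:
    """Membership of y in the symmetric triangular fuzzy number (center, spread)."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    return max(0.0, 1.0 - abs(y - center) / spread)


def flag_outliers(xs: Sequence[float], ys: Sequence[float],
                  spread: float, h: float) -> List[bool]:
    """Hypothetical rule: True where an observation's membership in the fuzzy
    number centred on its OLS prediction is below level h."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie in [0, 1]")
    fit = ols_fit(xs, ys)
    return [triangular_membership(y, fit.predict(x), spread) < h
            for x, y in zip(xs, ys)]


def mad(xs: Sequence[float], ys: Sequence[float], fit: Fit) -> float:
    """Mean absolute deviation between observations and fitted values."""
    return sum(abs(y - fit.predict(x)) for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [2, 4, 6, 8, 30, 12, 14, 16]  # y = 2x except a corrupted point at x = 5
    flags = flag_outliers(xs, ys, spread=10.0, h=0.5)
    kx = [x for x, f in zip(xs, flags) if not f]
    ky = [y for y, f in zip(ys, flags) if not f]
    print(flags)  # [False, False, False, False, True, False, False, False]
    print(mad(xs, ys, ols_fit(xs, ys)), mad(kx, ky, ols_fit(kx, ky)))
```

In the demo, only the corrupted point at x = 5 is flagged, and refitting without it brings the MAD down to essentially zero. The outlier still pulls the base line before it is removed, so a spread chosen too small could also flag good points; this is why the choice of spread and h matters in any scheme of this kind.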