Persian translation of the article title

یک روش ترکیبی برای نسبت دادن ارزش از دست رفته با استفاده از بهینه سازی فازی سی با رگرسیون بردار پشتیبانی و الگوریتم ژنتیک

English title

A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm

Article code	Publication year	Length of the English article
8179	2013	11 pages (PDF)

Source

Publisher: Elsevier - ScienceDirect

Journal : Information Sciences, Volume 233, 1 June 2013, Pages 25–35

Translated keywords

Missing data - Missing values - Imputation - Support vector regression

English keywords

Article preview

English abstract

Missing values in a dataset should either be removed or be estimated during the preprocessing stage of data mining, before the data are used for classification, association rule mining or clustering. In this study, we propose a hybrid approach that combines fuzzy c-means clustering with support vector regression and a genetic algorithm. In this method, the fuzzy clustering parameters (the cluster size and the weighting factor) are optimized and the missing values are estimated. The proposed hybrid method yields sufficient and sensible imputation performance. Its results are compared with those of fuzzy c-means genetic algorithm imputation, support vector regression genetic algorithm imputation and zero imputation.
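The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes a common form of fuzzy c-means imputation: centroids are fitted on complete records, memberships of an incomplete record are computed from its observed features only, and each missing entry is filled with the membership-weighted average of the centroid coordinates. The excerpt does not give the paper's exact procedure, so the function names (`memberships`, `fuzzy_c_means`, `impute_row`), the use of `None` for a missing entry, and the defaults are illustrative assumptions. Support vector regression and the genetic algorithm are not part of this sketch.

```python
import math
import random


def memberships(x, V, m, features):
    """Fuzzy membership of x in each centroid of V, using only the given feature indices."""
    inv = []
    for v in V:
        dist = math.sqrt(sum((x[j] - v[j]) ** 2 for j in features)) + 1e-12
        inv.append(dist ** (-2.0 / (m - 1.0)))  # standard FCM membership kernel
    total = sum(inv)
    return [t / total for t in inv]


def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means on complete rows X; returns the list of c centroids.

    c is the number of clusters and m > 1 is the weighting factor (fuzzifier).
    """
    rnd = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial membership matrix, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        r = [rnd.random() + 1e-9 for _ in range(c)]
        U.append([v / sum(r) for v in r])
    V = []
    for _ in range(n_iter):
        # Centroid update: mean of the data weighted by membership ** m.
        V = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            V.append([sum(w[i] * X[i][j] for i in range(n)) / sum(w) for j in range(d)])
        # Membership update from distances to the new centroids.
        U = [memberships(x, V, m, range(d)) for x in X]
    return V


def impute_row(row, V, m=2.0):
    """Fill None entries of row with the membership-weighted centroid values."""
    observed = [j for j, v in enumerate(row) if v is not None]
    u = memberships(row, V, m, observed)
    return [
        v if v is not None else sum(u[k] * V[k][j] for k in range(len(V)))
        for j, v in enumerate(row)
    ]
```

Because every centroid contributes in proportion to its membership, a record lying between two clusters receives a blended estimate rather than the value of a single nearest cluster.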

English introduction

Missing values are highly undesirable in data mining, machine learning and other information systems [33]. In recent years, much research on missing value estimation and imputation has been performed [3], [9], [24], [33], [35], [47] and [49]. To deal with missing values in datasets, simpler approaches such as ignoring or deleting incomplete observations, or substituting zero or the mean, might be used instead of imputation methods [7] and [30]. However, the primary disadvantages of these approaches are the loss of efficiency due to discarding incomplete observations and biases in estimates when data are missing in a systematic manner [35]; these disadvantages reduce data quality. Quality data mining results can be obtained only with high-quality data [37] and [41]. Therefore, missing values should be estimated to increase data quality. Missing values typically occur because of sensor faults, a lack of response in scientific experiments, faulty measurements, data transfer problems in digital systems or respondents’ unwillingness to respond to survey questions [1], [27], [31], [32] and [36]. In scientific research, especially in psychology, data for some variables in the database to be analyzed may be missing. If the missing values are not treated correctly, they may decrease or even jeopardize the validity of the research [3], [5], [14], [22] and [34].

English conclusion

In this paper, a hybrid method was used to estimate missing values: support vector regression, which is known as a reliable machine learning technique, and a genetic algorithm were combined with fuzzy clustering. Complete training data were clustered based on their similarity, and fuzzy principles were used during clustering. As a result, each missing value is associated with more than one cluster centroid, which yields more sensible imputation results. Six datasets with different characteristics were used, and the cluster size and weighting factor parameters were optimized for each dataset. The proposed fuzzy c-means SvrGa imputation was compared with other representative methods: fuzzy c-means with a genetic algorithm (FcmGa), support vector regression with a genetic algorithm (SvrGa) and zero imputation. In the empirical tests, the proposed method achieved better imputation accuracy than these methods. The experimental results demonstrated that fuzzy c-means SvrGa imputation yields more sufficient and sensible estimation accuracy for data that are suitable for clustering.
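The per-dataset tuning of the cluster size and weighting factor can be sketched as a small genetic search. This is a rough illustration under stated assumptions, not the paper's algorithm: the excerpt does not describe the fitness function or the genetic operators, so the fitness used here is the RMSE of fuzzy c-means imputation on deliberately hidden known values, and support vector regression is left out entirely. All names (`fcm_centroids`, `imputation_rmse`, `genetic_search`), the parameter ranges and the population settings are hypothetical, and "cluster size" is taken to mean the number of clusters `c`.

```python
import math
import random


def memberships(x, V, m, features):
    """Fuzzy membership of x in each centroid, using only the given feature indices."""
    inv = []
    for v in V:
        dist = math.sqrt(sum((x[j] - v[j]) ** 2 for j in features)) + 1e-12
        inv.append(dist ** (-2.0 / (m - 1.0)))
    total = sum(inv)
    return [t / total for t in inv]


def fcm_centroids(X, c, m, n_iter=30, seed=0):
    """Compact fuzzy c-means on complete rows; returns c centroids."""
    rnd = random.Random(seed)
    n, d = len(X), len(X[0])
    U = []
    for _ in range(n):
        r = [rnd.random() + 1e-9 for _ in range(c)]
        U.append([v / sum(r) for v in r])
    V = []
    for _ in range(n_iter):
        V = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            V.append([sum(w[i] * X[i][j] for i in range(n)) / sum(w) for j in range(d)])
        U = [memberships(x, V, m, range(d)) for x in X]
    return V


def imputation_rmse(complete, held_out, c, m):
    """RMSE of FCM imputation; held_out is a list of (row_with_None, true_row) pairs."""
    V = fcm_centroids(complete, c, m)
    sq = []
    for row, truth in held_out:
        observed = [j for j, v in enumerate(row) if v is not None]
        u = memberships(row, V, m, observed)
        for j, v in enumerate(row):
            if v is None:
                estimate = sum(u[k] * V[k][j] for k in range(c))
                sq.append((estimate - truth[j]) ** 2)
    return math.sqrt(sum(sq) / len(sq))


def genetic_search(fitness, c_range=(2, 5), m_range=(1.5, 3.0),
                   pop_size=6, generations=5, seed=0):
    """Minimize fitness(c, m) with selection, crossover and mutation (assumed operators)."""
    rnd = random.Random(seed)

    def clip(value, lo, hi):
        return max(lo, min(hi, value))

    # Each individual is a (c, m) pair drawn from the allowed ranges.
    pop = [(rnd.randint(*c_range), rnd.uniform(*m_range)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged.
        parents = sorted(pop, key=lambda ind: fitness(*ind))[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rnd.sample(parents, 2)
            c = a[0] + rnd.choice([-1, 0, 1])   # c from parent a, then mutated
            m = b[1] + rnd.gauss(0.0, 0.1)      # m from parent b, then mutated
            children.append((clip(c, *c_range), clip(m, *m_range)))
        pop = parents + children
    return min(pop, key=lambda ind: fitness(*ind))
```

A call such as `genetic_search(lambda c, m: imputation_rmse(complete, held_out, c, m))` returns the best `(c, m)` pair found. Keeping the better half of each generation means the best fitness never gets worse from one generation to the next.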