Download English ISI article No. 114156
Title
Online feature selection for high-dimensional class-imbalanced data
Article code: 114156
Year of publication: 2017
Length of English article: 13 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Knowledge-Based Systems, Volume 136, 15 November 2017, Pages 187-199

Keywords
Online feature selection; High dimensional; Class imbalance; Neighborhood rough set

Abstract

When tackling high dimensionality in data mining, online feature selection, which deals with features that flow in one by one over time, offers more advantages than traditional feature selection methods. However, in real-world applications such as fraud detection and medical diagnosis, the data is both high-dimensional and highly class-imbalanced; that is, some classes have many more instances than others. Under such class imbalance, existing online feature selection algorithms usually ignore the small classes, which can be important in these applications. Learning from high-dimensional, class-imbalanced data in an online manner is therefore a challenge. Motivated by this, we first formalize the problem of online streaming feature selection for class-imbalanced data, and then present an efficient online feature selection framework based on the dependency between condition features and decision classes. We also propose a new algorithm, Online Feature Selection based on the Dependency in K nearest neighbors, called K-OFSD. Building on Neighborhood Rough Set theory, K-OFSD uses the information of nearest neighbors to select relevant features that yield higher separability between the majority class and the minority class. Finally, experimental studies on seven high-dimensional, class-imbalanced data sets show that our algorithm can achieve better performance than traditional feature selection methods given the same number of features, and than state-of-the-art online streaming feature selection algorithms in an online manner.
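
The abstract does not give the details of K-OFSD, so the following is only a rough sketch of the general idea of scoring streamed features by a nearest-neighbor dependency, not the authors' algorithm. The function names `knn_dependency` and `stream_select`, the class-balanced averaging, the Euclidean distance, and the "keep a feature only if the score strictly improves" rule are all assumptions made for illustration.

```python
import numpy as np

def knn_dependency(X, y, k=3):
    """Class-balanced share of k nearest neighbors that carry the same label.

    Assumption: this stands in for the dependency between condition
    features and decision classes; the paper's exact measure may differ.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]
    same = (y[nn] == y[:, None]).mean(axis=1)
    # Average per class first so the minority class is not drowned out.
    per_class = [same[y == c].mean() for c in np.unique(y)]
    return float(np.mean(per_class))

def stream_select(X, y, k=3):
    """Visit features one at a time, as if they arrived in a stream.

    Hypothetical acceptance rule: keep a new feature only when it
    strictly raises the dependency score of the selected subset.
    """
    X = np.asarray(X, dtype=float)
    selected, best = [], 0.0
    for j in range(X.shape[1]):
        candidate = selected + [j]
        score = knn_dependency(X[:, candidate], y, k)
        if score > best:
            selected, best = candidate, score
    return selected, best

# Toy imbalanced data: 8 majority samples, 3 minority samples.
y = np.array([0] * 8 + [1] * 3)
informative = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 5.0, 5.1, 5.2])
noise = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
X = np.column_stack([informative, noise])
selected, score = stream_select(X, y, k=2)
```

On this toy data the first feature separates the two classes perfectly, so the second feature cannot improve the score and is discarded. The pairwise distance matrix makes the sketch quadratic in the number of samples, which is acceptable for illustration only.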