Download English ISI Article No. 137870
Article Title

A distributed approach to multi-objective evolutionary generation of fuzzy rule-based classifiers from big data
Article Code: 137870
Publication Year: 2017
English Article Length: 36 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Information Sciences, Volumes 415–416, November 2017, Pages 319–340

Keywords
Multi-objective evolutionary fuzzy systems; Big data; Fuzzy rule-based classifiers; Apache Spark

Abstract

In recent years, multi-objective evolutionary algorithms (MOEAs) have been extensively used to generate sets of fuzzy rule-based classifiers (FRBCs) with different trade-offs between accuracy and interpretability. Since computing the accuracy for each chromosome evaluation requires a scan of the whole training set, these approaches have proved very expensive in terms of execution time and memory occupation, and for this reason they have not yet been applied to very large datasets. On the other hand, it is precisely for such datasets that interpretable classifiers would be most desirable. The recent advent of a number of open-source cluster computing frameworks has, however, opened up interesting new perspectives. In this paper, we exploit one of these frameworks, namely Apache Spark, and propose the first distributed multi-objective evolutionary approach to learn concurrently the rule and data bases of FRBCs by maximizing accuracy and minimizing complexity. During the evolutionary process, the computation of the fitness is divided among the cluster nodes, thus allowing the designer to distribute both the computational load and the dataset storage. We have performed a number of experiments on ten real-world big datasets, evaluating our distributed approach in terms of both classification rate and scalability, and comparing it with two well-known state-of-the-art distributed classifiers. Finally, we have evaluated the achievable speedup on a small computer cluster. We show that the distributed version can efficiently extract compact rule bases with high accuracy, preserving the interpretability of the rule base, and can manage big datasets even with modest hardware support.
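
The architectural idea stated in the abstract, dividing the fitness computation among the cluster nodes while the evolutionary loop runs on the driver, can be illustrated with a short sketch. The snippet below is only an illustration under assumed names and is not the authors' implementation: the rule-base encoding, the classify helper, and the toy dataset are hypothetical placeholders. It shows the data-parallel pattern: the training set is partitioned as a Spark RDD, the candidate rule base is broadcast to the workers, each partition returns local (correct, total) counts, and the driver sums them to obtain the accuracy objective, while the complexity objective is computed locally.

# Minimal sketch (PySpark), assuming a hypothetical rule-base encoding and classify() helper;
# not the paper's actual implementation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("distributed-fitness-sketch").getOrCreate()
sc = spark.sparkContext

# Toy training set: (feature_vector, class_label) pairs, partitioned across the workers.
training_rdd = sc.parallelize(
    [([0.1, 0.7], 0), ([0.9, 0.2], 1), ([0.4, 0.5], 0), ([0.8, 0.8], 1)],
    numSlices=2,
)

def classify(rule_base, features):
    # Hypothetical placeholder: a real FRBC would fire the fuzzy rules encoded in the
    # chromosome against `features` and return the class with the highest association degree.
    return 1 if sum(features) > rule_base["threshold"] else 0

def fitness(chromosome):
    # Accuracy is computed on the cluster; complexity is computed locally on the driver.
    rb = sc.broadcast(chromosome)  # ship the candidate rule base to the nodes once

    def partition_counts(samples):
        correct = total = 0
        for features, label in samples:
            correct += int(classify(rb.value, features) == label)
            total += 1
        yield (correct, total)

    correct, total = training_rdd.mapPartitions(partition_counts) \
                                 .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    accuracy = correct / total
    complexity = chromosome["n_rules"]  # e.g. number of rules or rule conditions
    return accuracy, complexity         # the two objectives fed to the MOEA

print(fitness({"threshold": 1.0, "n_rules": 3}))

A full implementation would plug this accuracy term, together with the complexity term, into the MOEA's Pareto-based selection; the sketch only covers the distributed part of the fitness evaluation described in the abstract.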