دانلود مقاله ISI انگلیسی شماره 22306
ترجمه فارسی عنوان مقاله

روش موازی اثربخش برای داده کاوی ژنتیکی - فازی

عنوان انگلیسی
An effective parallel approach for genetic-fuzzy data mining
کد مقاله سال انتشار تعداد صفحات مقاله انگلیسی
22306 2014 8 صفحه PDF
منبع

Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)

Journal : Expert Systems with Applications, Volume 41, Issue 2, 1 February 2014, Pages 655–662

فهرست مطالب ترجمه فارسی
چکیده

مقدمه

مروری بر تحقیقات مرتبط

الگوریتم‌های ژنتیک

چارچوب داده کاوی فازی – ژنتیکی موازی

نمایش کروموزوم‌ها و ارزیابی برازش

الگوریتم داده کاوی فازی – ژنتیکی موازی پیشنهاد شده

مثال

تحلیل پیچیدگی زمانی

نتایج تجربی

نتیجه گیری و تحقیقات آتی
ترجمه کلمات کلیدی
داده کاوی - مجموعه های فازی - الگوریتم ژنتیک - پردازش موازی - قاعده اتحادیه
کلمات کلیدی انگلیسی
Data mining, Fuzzy set, Genetic algorithm, Parallel processing, Association rule
ترجمه چکیده
مهم‌ترین کاربرد داده کاوی در تلاش‌هایی است که برای استنتاج قواعد وابستگی از داده‌های تراکنشی صورت می‌گیرد. در گذشته، از مفاهیم منطق فازی و الگوریتم‌های ژنتیکی برای کشف قواعد وابستگی فازی سودمند و توابع عضویت مناسب از مقادیر کمی استفاده می‌کردیم. با وجود این، ارزیابی مقادیر برازش نسبتاً زمان بر بود. به دلیل افزایش‌های شگرف در قدرت محاسباتی قابل دسترسی و کاهش همزمان در هزینه‌های محاسباتی در طول یک دهه‌ی گذشته، یادگیری یا داده کاوی با به کارگیری تکنیک‌های پردازشی موازی به عنوان روشی امکان پذیر برای غلبه بر مسئله‌ی یادگیری کند شناخته شده است. بنابراین، در این مقاله الگوریتم داده‌ کاوی موازی فازی – ژنتیکی را بر اساس معماری ارباب - برده ارائه کرده‌ایم تا قواعد وابستگی و توابع عضویت را از تراکنش‌های کمی استخراج کنیم. پردازنده‌ی master مانند الگوریتم ژنتیک از جمعیت یگانه‌ای استفاده می‌کند، و وظایف ارزیابی برازش را بین پردازنده‌های slave توزیع می‌کند. اجرای الگوریتم پیشنهاد شده در معماری ارباب – برده بسیار طبیعی و کارآمد است. پیچیدگی‌های زمانی برای الگوریتم‌های داده کاوی ژنتیکی – فازی موازی نیز مورد تحلیل قرار گرفته است. نتایج این تحلیل تأثیر قابل توجه الگوریتم پیشنهاد شده را نشان داده است. هنگامی که تعداد نسل‌ها زیاد باشد، افزایش سرعت الگوریتم ممکن است نسبتاً خطی باشد. نتایج تجربی تیز این نکته را تأیید می‌کنند. لذا به کارگیری معماری ارباب – برده برای افزایش سرعت الگوریتم داده‌ کاوی ژنتیکی – فازی روشی امکان پذیر برای غلبه بر مشکل ارزیابی برازش کم سرعت الگوریتم‌ اصلی است.
ترجمه مقدمه
با پیشرفت روزافزون فن آوری اطلاعات (IT)، قابلیت ذخیره سازی و مدیریت داده‌ها در پایگاه‌های داده‌ اهمیت بیشتری پیدا می‌کند. به رغم اینکه گسترش IT پردازش داده‌ها را تسهیل و تقاضا برای رسانه‌های ذخیره سازی را برآورده می‌سازد، استخراج اطلاعات تلویحی قابل دسترسی به منظور کمک به تصمیم گیری مسئله‌ای جدید و چالش برانگیز است. از این رو، تلاش‌های زیادی معوف به طراحی مکانیسم‌های کارآمد برای کاوش اطلاعات و دانش از پایگاه داده‌های بزرگ شده است. در نتیجه، داده کاوی، که نخستین بار توسط آگراول، ایمیلنسکی و سوامی (1993) ارائه شد، به زمینه‌ی مطالعاتی مهمی در مباحث پایگاه داده‌ای و هوش مصنوعی مبدل شده است.
پیش نمایش مقاله
پیش نمایش مقاله  روش موازی اثربخش برای داده کاوی ژنتیکی - فازی

چکیده انگلیسی

Data mining is most commonly used in attempts to induce association rules from transaction data. In the past, we used the fuzzy and GA concepts to discover both useful fuzzy association rules and suitable membership functions from quantitative values. The evaluation for fitness values was, however, quite time-consuming. Due to dramatic increases in available computing power and concomitant decreases in computing costs over the last decade, learning or mining by applying parallel processing techniques has become a feasible way to overcome the slow-learning problem. In this paper, we thus propose a parallel genetic-fuzzy mining algorithm based on the master–slave architecture to extract both association rules and membership functions from quantitative transactions. The master processor uses a single population as a simple genetic algorithm does, and distributes the tasks of fitness evaluation to slave processors. The evolutionary processes, such as crossover, mutation and production are performed by the master processor. It is very natural and efficient to run the proposed algorithm on the master–slave architecture. The time complexities for both sequential and parallel genetic-fuzzy mining algorithms have also been analyzed, with results showing the good effect of the proposed one. When the number of generations is large, the speed-up can be nearly linear. The experimental results also show this point. Applying the master–slave parallel architecture to speed up the genetic-fuzzy data mining algorithm is thus a feasible way to overcome the low-speed fitness evaluation problem of the original algorithm.

مقدمه انگلیسی

As information technology (IT) progresses rapidly, its capacity to store and manage data in databases is becoming important. Though IT development facilitates data processing and eases demands on storage media, extraction of available implicit information to aid decision making has become a new and challenging task. Vigorous efforts have thus been devoted to designing efficient mechanisms for mining information and knowledge from large databases. As a result, data mining, first proposed by Agrawal, Imielinksi, and Swami (1993), has become a central field of study in the database and artificial intelligence areas. Deriving association rules from transaction databases is most commonly seen in data mining (Chen et al., 1996, Famili et al., 1997 and Han and Fu, 1995). It discovers relationships among items such that the presence of certain items in a transaction tends to imply the presence of certain other items. In the past, Agrawal and his co-workers proposed a method (Srikant & Agrawal, 1996) for mining association rules from data sets using quantitative and categorical attributes. Their proposed method first determines the number of partitions for each quantitative attribute, and then maps all possible values of each attribute onto a set of consecutive integers. Recently, the fuzzy set theory (Zadeh, 1965) has been used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning (Kandel, 1992). Many fuzzy learning algorithms for inducing rules from given sets of data have been designed and used to good effect with specific domains (Hong and Lee, 1996 and Yuan and Shaw, 1995). Several fuzzy mining algorithms for managing quantitative data have also been proposed (Cai et al., 1998, Kaya and Alhajj, 2003, Luan et al., 2012, Mangalampalli and Pudi, 2009, Mohamadlou et al., 2009, Ouyang and Huang, 2009 and Wang et al., 2012), where the membership functions were assumed to be known in advance. The given membership functions may, however, have a critical influence on the final mining results. Genetic algorithms (GAs) Holland, (1975) have also recently been used in the field of data mining since they are powerful search techniques in solving difficult problems and can provide feasible solutions in a limited amount of time. Hong et al. thus proposed a GA-based fuzzy data-mining method (Hong, Chen, Wu, & Lee, 2006) for extracting both association rules and membership functions from quantitative transactions. In that method, the fitness evaluation is based on the suitability of derived membership functions and the number of large itemsets. The evaluation for fitness values is, however, quite time-consuming. Due to dramatic increases in available computing power and concomitant decreases in computing costs over last decades, learning or mining by applying parallel processing techniques has become a feasible way of overcoming the problem of slow learning (Cordón et al., 2001, Herrera et al., 1997 and Wang et al., 1998). Several parallel approaches to speed up the process of data-mining were also proposed (Agrawal and Shafer, 1996, Chen et al., 2012, Joshi et al., 2000 and Veloso et al., 2003). In addition, some parallel methods with genetic algorithms were also suggested (Abramson and Abela, 1992 and Araujo et al., 1999). They have been applied to solving timetable scheduling and discovering classification rules. Among the parallel architectures, the master–slave architecture is particularly easy to implement. It also usually promises substantial gains in performance (Cantu-Paz, 1998). The master processor allocates the tasks to the slave processors and collects the results from them. It can also do its own work if necessary. As mentioned before, the fitness evaluation in genetic-fuzzy data mining is usually very time-consuming. In this paper, we thus extend our previous work (Hong et al., 2006) by using the master–slave parallel architecture to dynamically adapt membership functions and to use them to deal with quantitative transactions in fuzzy data mining. It is very natural and efficient to design a parallel GA-fuzzy mining algorithm based on the master–slave architecture. The master processor uses a single population as a simple genetic algorithm does, and distributes the tasks of fitness evaluation for suitability of membership functions and large itemsets to slave processors. The evolutionary processes, such as crossover, mutation and production are performed by the master processor. We expect that by appropriately allocating the tasks among the different types of processors, the efficiency of the proposed genetic-fuzzy mining algorithm can greatly be raised. The remainders of this paper are organized as follows. Related works about parallel processing applied to data mining and genetic algorithms are reviewed in Section 2. A parallel genetic-fuzzy mining framework based on the master–slave architecture is described in Section 3. The adopted representation of chromosomes and the genetic operators used in this paper are stated in Section 4. A parallel genetic-fuzzy mining algorithm is proposed in Section 5. A simple example for demonstrating the proposed algorithm is given in Section 6. The time complexity of the proposed algorithm is analyzed in Section 7. The experimental results are shown in Section 8. Finally, conclusion and future work are given in Section 9.

نتیجه گیری انگلیسی

In this paper, we have proposed a parallel genetic-fuzzy mining algorithm based on the master–slave architecture to extract both association rules and membership functions from quantitative transactions. The master and the slaves first cooperate to discover a set of suitable membership functions. The set of membership functions found is then used by the master processor to mine fuzzy association rules. The second phase is not necessary to be processed in parallel since the most time-critical part lies in phase 1. The time complexities for both sequential and parallel genetic-fuzzy mining algorithm have been analyzed, with results showing the good effect of the proposed approach. When the number of generations is large, the speed-up can be nearly linear. The experimental results have also shown this point. Applying the master–slave parallel architecture to speed up the genetic-fuzzy data mining algorithm is thus a feasible way to overcome the low-speed fitness evaluation problem of the original algorithm. The proposed parallel mining algorithm can also be easily modified for execution on a bounded number of processors to meet real-world requirements. In the future, we will continuously attempt to enhance the GA-based framework for more complex mining problems.