Data mining is most commonly used in attempts to induce association rules from transaction data. In the past, we used the fuzzy and GA concepts to discover both useful fuzzy association rules and suitable membership functions from quantitative values. The evaluation for fitness values was, however, quite time-consuming. Due to dramatic increases in available computing power and concomitant decreases in computing costs over the last decade, learning or mining by applying parallel processing techniques has become a feasible way to overcome the slow-learning problem. In this paper, we thus propose a parallel genetic-fuzzy mining algorithm based on the master–slave architecture to extract both association rules and membership functions from quantitative transactions. The master processor uses a single population as a simple genetic algorithm does, and distributes the tasks of fitness evaluation to slave processors. The evolutionary processes, such as crossover, mutation and production are performed by the master processor. It is very natural and efficient to run the proposed algorithm on the master–slave architecture. The time complexities for both sequential and parallel genetic-fuzzy mining algorithms have also been analyzed, with results showing the good effect of the proposed one. When the number of generations is large, the speed-up can be nearly linear. The experimental results also show this point. Applying the master–slave parallel architecture to speed up the genetic-fuzzy data mining algorithm is thus a feasible way to overcome the low-speed fitness evaluation problem of the original algorithm.
As information technology (IT) progresses rapidly, its capacity to store and manage data in databases is becoming important. Though IT development facilitates data processing and eases demands on storage media, extraction of available implicit information to aid decision making has become a new and challenging task. Vigorous efforts have thus been devoted to designing efficient mechanisms for mining information and knowledge from large databases. As a result, data mining, first proposed by Agrawal, Imielinksi, and Swami (1993), has become a central field of study in the database and artificial intelligence areas.
Deriving association rules from transaction databases is most commonly seen in data mining (Chen et al., 1996, Famili et al., 1997 and Han and Fu, 1995). It discovers relationships among items such that the presence of certain items in a transaction tends to imply the presence of certain other items. In the past, Agrawal and his co-workers proposed a method (Srikant & Agrawal, 1996) for mining association rules from data sets using quantitative and categorical attributes. Their proposed method first determines the number of partitions for each quantitative attribute, and then maps all possible values of each attribute onto a set of consecutive integers.
Recently, the fuzzy set theory (Zadeh, 1965) has been used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning (Kandel, 1992). Many fuzzy learning algorithms for inducing rules from given sets of data have been designed and used to good effect with specific domains (Hong and Lee, 1996 and Yuan and Shaw, 1995). Several fuzzy mining algorithms for managing quantitative data have also been proposed (Cai et al., 1998, Kaya and Alhajj, 2003, Luan et al., 2012, Mangalampalli and Pudi, 2009, Mohamadlou et al., 2009, Ouyang and Huang, 2009 and Wang et al., 2012), where the membership functions were assumed to be known in advance. The given membership functions may, however, have a critical influence on the final mining results. Genetic algorithms (GAs) Holland, (1975) have also recently been used in the field of data mining since they are powerful search techniques in solving difficult problems and can provide feasible solutions in a limited amount of time. Hong et al. thus proposed a GA-based fuzzy data-mining method (Hong, Chen, Wu, & Lee, 2006) for extracting both association rules and membership functions from quantitative transactions. In that method, the fitness evaluation is based on the suitability of derived membership functions and the number of large itemsets. The evaluation for fitness values is, however, quite time-consuming.
Due to dramatic increases in available computing power and concomitant decreases in computing costs over last decades, learning or mining by applying parallel processing techniques has become a feasible way of overcoming the problem of slow learning (Cordón et al., 2001, Herrera et al., 1997 and Wang et al., 1998). Several parallel approaches to speed up the process of data-mining were also proposed (Agrawal and Shafer, 1996, Chen et al., 2012, Joshi et al., 2000 and Veloso et al., 2003). In addition, some parallel methods with genetic algorithms were also suggested (Abramson and Abela, 1992 and Araujo et al., 1999). They have been applied to solving timetable scheduling and discovering classification rules.
Among the parallel architectures, the master–slave architecture is particularly easy to implement. It also usually promises substantial gains in performance (Cantu-Paz, 1998). The master processor allocates the tasks to the slave processors and collects the results from them. It can also do its own work if necessary. As mentioned before, the fitness evaluation in genetic-fuzzy data mining is usually very time-consuming. In this paper, we thus extend our previous work (Hong et al., 2006) by using the master–slave parallel architecture to dynamically adapt membership functions and to use them to deal with quantitative transactions in fuzzy data mining. It is very natural and efficient to design a parallel GA-fuzzy mining algorithm based on the master–slave architecture. The master processor uses a single population as a simple genetic algorithm does, and distributes the tasks of fitness evaluation for suitability of membership functions and large itemsets to slave processors. The evolutionary processes, such as crossover, mutation and production are performed by the master processor. We expect that by appropriately allocating the tasks among the different types of processors, the efficiency of the proposed genetic-fuzzy mining algorithm can greatly be raised.
The remainders of this paper are organized as follows. Related works about parallel processing applied to data mining and genetic algorithms are reviewed in Section 2. A parallel genetic-fuzzy mining framework based on the master–slave architecture is described in Section 3. The adopted representation of chromosomes and the genetic operators used in this paper are stated in Section 4. A parallel genetic-fuzzy mining algorithm is proposed in Section 5. A simple example for demonstrating the proposed algorithm is given in Section 6. The time complexity of the proposed algorithm is analyzed in Section 7. The experimental results are shown in Section 8. Finally, conclusion and future work are given in Section 9.