Download English ISI article No. 24416
Translated title (from Persian)

Association rule mining through the ant colony system for the National Health Insurance Research Database in Taiwan

English title
Association rule mining through the ant colony system for National Health Insurance Research Database in Taiwan
Article code: 24416
Year of publication: 2007
Length of English article: 16 pages (PDF)
Source

Publisher : Elsevier - ScienceDirect

Journal : Computers & Mathematics with Applications, Volume 54, Issues 11–12, December 2007, Pages 1303–1318

Translated keywords
Data mining - Multi-dimensional constraints - Ant colony system - Apriori method
English keywords
Data mining, Multiple dimensional constraints, Ant colony system, Apriori
Article preview

English abstract

In the field of data mining, an important issue for association rule generation is frequent itemset discovery, which is the key factor in implementing association rule mining. Therefore, this study considers user-assigned constraints in the mining process. Constraint-based mining enables users to concentrate on mining the itemsets that are of interest to them, which improves the efficiency of mining tasks. In addition, in the real world, users may prefer recording more than one attribute and setting multi-dimensional constraints. Thus, this study intends to solve the multi-dimensional constraints problem for association rule generation. The ant colony system (ACS) is one of the newest meta-heuristics for combinatorial optimization problems, and this study uses it to mine a large database and find association rules effectively. If the system can take multi-dimensional constraints into account, the association rules will be generated more effectively. Therefore, this study proposes a novel approach that applies the ant colony system to extract association rules from the database while taking multi-dimensional constraints into account. The results using a real case, the National Health Insurance Research Database, show that the proposed method is able to provide more condensed rules than the Apriori method. The computational time is also reduced.

English introduction

Mining association rules from large databases of business data, such as transaction records, has been an important issue in the field of data mining. The problem of association rule mining can be divided into two sub-problems: (1) frequent itemset discovery and (2) association rule generation. It has also been shown that the overall performance of mining is largely determined by the first sub-problem. Frequent itemset mining algorithms often generate a very large number of frequent itemsets and rules, which reduces both the efficiency and the effectiveness of the mining algorithms, since only a subset of the complete set of frequent itemsets and association rules is of interest to users. In addition, users need an additional post-processing step to filter the large number of mined rules and determine the useful ones. Recent work [1], [2], [3] and [4] has highlighted the importance of constraint-based mining, which exploits user-specified constraints in the mining process to improve performance or efficiency. With multi-dimensional items, constraints can be imposed on multiple dimensional attributes. We classify multi-dimensional constraints into two cases according to the number of sub-constraints: (1) a single constraint against multiple dimensions, such as $\max(X,\mathrm{cost}) \le (X,\mathrm{price})$, where $X$ is an itemset and each item in $X$ contains two attributes, “cost” and “price”, and (2) a conjunction and/or disjunction of multiple sub-constraints, such as $(C_1: X,\mathrm{cost} \le v_1) \wedge (C_2: X,\mathrm{price} \le v_2)$, where $v_1$ and $v_2$ are constant values. Therefore, this study intends to use the ant colony system, which has recently been shown to be very promising for the traveling salesman problem and scheduling [5] and [6], to mine association rules under multi-dimensional constraints.
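The two sub-problems and the two constraint cases described above can be illustrated with a small sketch. This is not the ant colony system proposed in the paper, nor Apriori; it is a brute-force enumeration meant only to show where a constraint check fits into the mining process. The item table, the transactions, the function names, and the exact reading of each constraint (largest cost compared with smallest price; every item's attribute compared with a constant) are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical item table: each item carries two attributes, "cost" and "price".
ITEMS = {
    "a": {"cost": 2, "price": 5},
    "b": {"cost": 4, "price": 6},
    "c": {"cost": 7, "price": 3},
}

# Hypothetical transactions (sets of item names).
TRANSACTIONS = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a", "b"}]


def single_constraint(itemset):
    """Case 1: one constraint spanning two dimensions.

    Assumed reading: the largest cost in the itemset must not exceed
    the smallest price in the itemset.
    """
    return (max(ITEMS[i]["cost"] for i in itemset)
            <= min(ITEMS[i]["price"] for i in itemset))


def conjunctive_constraint(itemset, v1, v2):
    """Case 2: conjunction of two sub-constraints, one per dimension.

    Assumed reading: every item's cost <= v1 AND every item's price <= v2.
    """
    return (all(ITEMS[i]["cost"] <= v1 for i in itemset)
            and all(ITEMS[i]["price"] <= v2 for i in itemset))


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)


def frequent_itemsets(min_support, constraint):
    """Sub-problem 1: itemsets that satisfy the constraint and min_support."""
    result = {}
    names = sorted(ITEMS)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            candidate = frozenset(combo)
            # Checking the constraint first avoids counting support
            # for itemsets the user is not interested in.
            if constraint(candidate):
                s = support(candidate)
                if s >= min_support:
                    result[candidate] = s
    return result


def rules(frequent, min_confidence):
    """Sub-problem 2: split frequent itemsets into antecedent => consequent."""
    out = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                confidence = s / support(a)
                if confidence >= min_confidence:
                    out.append((a, itemset - a, confidence))
    return out
```

Calling `frequent_itemsets(0.5, single_constraint)` and passing the result to `rules(..., 0.8)` runs both stages on the toy data. The sketch assumes a positive `min_support`, so every antecedent has non-zero support. Filtering candidates by constraint before counting support is the general idea behind constraint-based mining; how the paper's method integrates constraints into the ant colony search is not shown here.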
Furthermore, since data mining has rarely been applied to questions in medical science, this study uses data from the National Health Insurance Research Database of Taiwan to find disease association rules. Here, an important issue is to identify potential diseases and enable early prevention. The evaluation results show that the proposed method, using the ant colony system, can provide more concise and accurate information than the conventional Apriori-based algorithm. The rest of this paper is organized as follows. Section 2 summarizes some important background information, and the proposed method is described in Section 3. Section 4 presents the evaluation results and discussion. Finally, concluding remarks are made in Section 5.

English conclusion

In this century, previously unknown diseases, like SARS, have created disasters for humanity. These may result from human carelessness or environmental harm. Thus, developing a decision support system for medical workers has become a critical issue for patient treatment, and extracting the important relationships or association rules between diseases is especially critical. This can not only reduce medical costs but also improve health. This study has demonstrated that a novel approach, ACS, is able to mine association rules from a health database, since it can deal with the discovery of hidden knowledge from the database. The results present some interacting relationships among the disease items. According to the results of an expert questionnaire survey, the proposed algorithm is much better than Apriori in both efficiency and reliability. Since the proposed ACS scans the database only once, there can be a tremendous saving in computational time. This is especially important for large databases. In addition, the proposed method allows the user to define the search constraints, which makes the extracted rules more appropriate to users’ needs and speeds up the computation. Although this study has yielded very promising results, there are still some issues that remain to be resolved. In the data preparation stage, the proposed method uses constraint conditions to reduce the search time. It is suggested that another method, such as clustering, be used to deal with the raw data. In addition, the mining results contain many similar rules, so it may be feasible to apply another technique, such as fuzzy theory, to merge similar rules.