روش تشخیص تغییر الگوهای متوالی
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
20886 | 2009 | 11 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Decision Support Systems, Volume 46, Issue 2, January 2009, Pages 501–511
چکیده انگلیسی
Recent trends in customer-oriented markets drive many researchers to develop sequential pattern mining algorithms to explore consumer behaviors. However, most of these studies concentrated on how to improve accuracy and efficiency of their methods, and seldom discussed how to detect sequential pattern changes between two time-periods. To help business managers understand the changing behaviors of their customers, a three-phase sequential pattern change detection framework is proposed in this paper. In phase I, two sequential pattern sets are generated respectively from two time-period databases. In phase II, the dissimilarities between all pairs of sequential patterns are evaluated using the proposed sequential pattern matching algorithm. Based on a set of judgment criteria, a sequential pattern is clarified as one of the following three change types: an emerging sequential pattern, an unexpected sequence change, or an added/perished sequential pattern. In phase III, significant change patterns are returned to managers if the degree of change for a pattern is large enough. A practical transaction database is demonstrated to show how the proposed framework helps managers to analyze their customers and make better marketing strategies.
مقدمه انگلیسی
Sequential pattern mining is the technique that explores frequently occurring patterns related to time from a large-scale database. It has been applied to web log mining [5], customer purchase behavior analysis [10], DNA sequence analysis [11], project team coordination [22], retailing management [8], and so on. Previous studies related to sequential pattern mining mainly focused on how to build accurate models or how to discover interesting rules in efficient ways. AprioriAll [1], GSP [29], PrefixSpan [27], FFS [32], and SPAM [3] are some well known algorithms that efficiently identify sequential patterns. In addition, other efforts have concentrated on sequential pattern mining in multi-databases [18] with constraints [7], time-intervals [9], and fuzzy sets [6]. Relatively few attempts have been made to analyze sequential pattern changes in databases collected over time [19]. However, some popular patterns at one time-period may not be valid in another time-period [4]. For example, the sequential pattern “Computer → Memory → Color_Printer” is frequent in the last year. However, this pattern might not be popular in this year but change to “Computer → Memory → Multifunctional_Printer.” If managers cannot capture this dynamic behavior change in time and provide appropriate products or services to their customer, customer attrition will be unavoidable. There have been many existing works focusing on dynamic aspects or comparison between two different datasets. They can be broadly classified as four research trends [4], [23] and [28]. The first trend is to improve rule matching accuracy when new transaction data are constantly added to the original transaction database. El-Sayed et al. [14] introduced an efficient strategy for discovering frequent patterns in sequence databases that requires only two scans of the database. Koh and Shieh [17] proposed an algorithm of AFPIM (Adjusting FP-tree for Incremental Mining) based on adjusting FP-tree structures to maintain frequent itemsets discovered in a changing database. The algorithm only scans changed data in the database, and adjusts FP-tree structures to enhance the efficiency of mining association rules. Masseglia et al. [24] proposed an efficient algorithm, called incremental sequence extraction (ISE), to compute frequent sequences in an updated database. The algorithm minimizes computational costs by re-using the minimal information from the original database. Lee et al. [21] proposed a SWF (sliding-window filtering) algorithm, which uses a filtering thresho ld to deal with the candidate itemset generation. The algorithm has not only significantly reduced I/O and CPU costs but also effectively controlled memory utilization. The second research trend is to discover emerging patterns. Emerging pattern mining gauges significant changes or differences from one database to another [12]. Fan and Ramamohanarao [15] developed an algorithm which applied the minimum support, the minimum growth rate and chi-squared values to extract interesting emerging patterns. Terlecki and Walczak [30] presented the relations between rough set and jumping emerging patterns. They proposed practical applications of these observations to the minimal reduced problem and to test whether a given attribute set is differentiating. The third research trend is subjective interestingness mining. Interestingness mining is used to find unexpected rules with respect to the user's existing knowledge. Padmanabhan and Tuzhilin [25] developed methods to generate unexpected patterns with respect to managerial intuition by eliciting managers' beliefs about the domain and used these beliefs to seed the search for unexpected patterns in data. The methods provide managers with more relevant patterns from data and aid in effective decision making. Liu et al. [23] proposed a new approach to assist users in finding interesting rules from discovered association rules and analyze subjective interestingness. The approach first asks users to specify his/her existing knowledge such as beliefs or concepts, and then analyzes the discovered association rules to conformity and various interestingness criteria. Padmanabhan and Tuzhilin [26] presented a method for discovering unexpected patterns in data. The method combines independent concepts of minimality and unexpectedness to discover a minimal set of unexpected patterns. Lanquillon [20] proposed the concepts of added rules and perished rules. An added rule is the rule that is hard to be found in the past database, while a perished rule is the rule that is difficult to be found in the present database. The fourth research trend is to discover regularity from time series data. Han et al. [16] presented several algorithms for efficient mining of partial periodic patterns by exploring some interesting properties related to partial periodicity. The algorithms shows that mining partial periodicity needs only two scans over a time series database to efficiently mine long periodic patterns. Yu et al. [31] proposed a granulation-based method to find similarities between two time series. The granulation-based method deals with problematic temporal data mining tasks such as similar subsequence searching, clustering and indexing etc. Altiparmark et al. [2] proposed a novel approach for information mining between time series data. They utilized frequent itemset mining as well as clustering and declustering techniques with novel distance metrics for measuring similarities between items series data. Although these researches can efficiently detect the dynamic changes between two datasets, their discussions are limited to association rules only. None of them discusses the change of sequential patterns in two time-periods. To bridge the gap, this study proposes a change detection framework to observe the dynamic alternation of sequential patterns between two time-periods. The rest of this paper is organized as follows. Section 2 defines the dissimilarity measurement for sequential patterns. Section 3 details each phase of the proposed change detection framework. With the framework, a sequential pattern can be classified as one of the three change types: an emerging sequential pattern, an unexpected sequence change, or an added/perished sequential pattern. Section 4 provides an implementation case to demonstrate the benefit of the proposed framework. A summary and future perspective for the study is concluded in Section 5.
نتیجه گیری انگلیسی
Recent trends in customer-oriented markets drive many researches develop sequential pattern algorithms to explore consumer behaviors. Sequential pattern mining is the technique that discovers frequently occurring patterns related to time from a large-scale database. With sequential patterns, enterprises can develop their own strategy for targeted marketing, customer retention, and promotion. However, customer behaviors usually change over time. Behavior patterns popular in the previous time-period may not be valid for the next time-period. To help managers capture the dynamic behavior changes in time, a sequential pattern change detection framework is proposed in this paper. The framework consists of three major phases. In phase I, two sequential pattern sets are generated from two different time-period databases respectively using the GSP algorithm. The algorithm finds all sequences whose support is greater than the user-defined minimum support value (γmin). The smaller the γmin value, the more the number of sequential patterns generated. In phase II, the dissimilarities between all pairs of sequential patterns are evaluated based on a developed sequential pattern matching algorithm. Then, the minimum dissimilarity values between all pairs of sequential patterns are derived and compared with a user-defined minimum dissimilarity value (βmin).Based on a set of judgment criteria, each sequential pattern is clarified as one of the following three change types: an emerging sequential pattern, an unexpected sequence change, or an added/perished sequential pattern. The higher the βmin value, the less the number of added/perish sequential patterns generated but the more the number of unexpected sequence changes. In phase III, each change pattern is determined as significant or not using the user-defined minimum degree of change (αmin).The higher the αmin value, the less the number of changed patterns reported. Finally, the set of significant change patterns are generated and reported to users for further analysis. It is clear that setting these threshold values could be a challenge for users. Unfortunately, it is hard or impossible to know how many or what sequential patterns might be generated except the whole analysis process is completed. Setting the same threshold values to different datasets will generate totally different results. Therefore, this system automatically conducts a set of experiments using different threshold values in the each phase. Then, the system summarizes the experimental results to users and provides proper threshold values as suggestions. Two critical statistical measures are defined in this research. The first measure called “support” is used to evaluate how strong a sequential pattern is. It is defined as the number of tuples in a sequence database containing sequence s. A sequential pattern having high support value means that the pattern appears quit often. Another statistical measure called “degree of change” is used to evaluate how significant a change between two sequential patterns might have. The definition of “degree of change” for different change types is similar but different. A change pattern with high degree of change means that the change is dramatic. Based on the statistical measures “support” and “degree of change”, users can easily observe whether sequential patterns and change patterns are worth to pay attention or not. In this study, the dissimilarity measurement is designed based on three major considerations. They are the number of mismatched elements, the position of the element changed, and the dissimilarity between any mismatched elements. However, some other considerations are worthwhile to try. For example, rule change based on money impact might be very interesting when evaluating the dissimilarity between two sequential patterns. If system can report the money impact for each change, users can easily evaluate the money gain or lost under different decision making. Another limitation of the proposed framework is that the time-intervals between transactions are ignored. Without considering time interval between elements, the generated sequential patterns might not distinguish customer behavior well. In the future, we will consider time interval issue in the proposed framework, so that the change detection task can be more accurate.