Title
Discovering workflow models from activities’ lifespans
Article code: 21746
Publication year: 2004
Length of English article: 14 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Computers in Industry, Volume 53, Issue 3, April 2004, Pages 283–296

Keywords
Workflow mining, Process discovery, Event log, Activity lifespan

Abstract

Workflow systems use a process model to manage business processes. The model is typically a directed graph annotated with activity names. We view the execution of an activity as a time interval and present two new algorithms for synthesizing process models from sets of system executions (an audit log). The model graph that each algorithm generates for a process captures all of the executions and dependencies present in the log and preserves existing parallelism. We compare the model graphs synthesized by our algorithms with those of Agrawal et al. [Mining process models from workflow logs, in: Proceedings of the Advances in Database Technology (EDBT’98), 6th International Conference on Extending Database Technology, Valencia, Spain, 23–27 March 1998, Lecture Notes in Computer Science, Proceedings vol. 1377, Springer, Berlin, 1998] by running them on simulated data. We observe that our graphs are more faithful in the sense that the number of excess and absent edges is consistently smaller, and that this number depends on the size and quality of the log. In other words, we show that our time interval approach permits reconstruction of more accurate workflow model graphs from a log.

Introduction

Constructing business processes is a central issue for companies [8] and [20]. Managing processes in an automatic or semi-automatic fashion results in a significant reduction of cost and improves the efficiency of business operations, enabling, among other benefits, fast adaptation to changing requirements. As a result, developing techniques for constructing and managing business processes is an active research area [2] and [6]. Workflow systems use a visual model of information flow for monitoring and managing systems that execute actions (also called activities or tasks) of predefined situations. The actions, together with the constraints on execution order between them, define the business process [11]. Commercial workflow systems and management consoles need a model of the business process for scheduling agents (e.g., computers) to execute the actions, control production, etc. (see [11] and [8]). For modeling a business process, most ERP/CRM products use an embedded workflow model [7] and [19].

Many organizations that run their systems using legacy applications do not have a model of the processes within the organization. Current tools for model detection operate on the resource level only. Thus, there is a need for tools that build business-process-level models, especially since executive-level measures such as “return on investment” or SLA quality are derived from this level rather than from the resource level.

There are few methods for constructing a business process model from information stored in a workflow log (a collection of process execution logs). We follow the approach that represents the model as a directed graph (workflow graph) in which nodes represent activities and an edge from node A to node B indicates that there is a process execution in which A must finish executing before B starts. In practice, a single business process model can permit one execution that includes a given activity and another execution that does not. Thus, for each process execution, the participating edges are selected by a Boolean function associated with each edge. The function determines whether or not control flows along that edge. Another paradigm deals with workflow evolution, in which process models are updated according to the logs [1], [3], [7], [12], [13], [15] and [17].
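The edge definition above can be illustrated with a short Python sketch. It is not either of the paper's algorithms, which this excerpt does not reproduce. It is a naive illustration under stated assumptions: each execution is a list of hypothetical `(activity, start, end)` tuples, an edge A -> B is kept when A finished before B started in some execution, and a pair of activities that ever overlaps or appears in both orders is treated as parallel and gets no edge. The function name `precedence_edges` and the sample log are invented for this example.

```python
from itertools import permutations


def precedence_edges(executions):
    """Derive candidate edges (a, b) from interval-based executions.

    executions: list of executions, each a list of (activity, start, end).
    Assumption of this sketch: activities that overlap in time, or that
    occur in both orders across the log, are considered parallel.
    """
    before = set()   # (a, b): a finished strictly before b started
    overlap = set()  # frozenset({a, b}): a and b overlapped in time
    for execution in executions:
        for (a, a_start, a_end), (b, b_start, b_end) in permutations(execution, 2):
            if a == b:
                continue  # ignore repeated occurrences of the same activity
            if a_end < b_start:
                before.add((a, b))
            elif b_end < a_start:
                pass  # recorded when the pair is visited in the other order
            else:
                # Intervals intersect (touching endpoints count as overlap).
                overlap.add(frozenset((a, b)))
    return {
        (a, b)
        for (a, b) in before
        if (b, a) not in before and frozenset((a, b)) not in overlap
    }


# Hypothetical two-execution log: B and C run concurrently after A.
log = [
    [("A", 0, 2), ("B", 3, 5), ("C", 3, 6), ("D", 7, 8)],
    [("A", 0, 1), ("C", 2, 4), ("B", 3, 6), ("D", 7, 9)],
]
print(sorted(precedence_edges(log)))
```

In this sample log B and C overlap in both executions, so the sketch places no edge between them, which is the sense in which interval information can preserve parallelism. The sketch also keeps the transitive edge A -> D; pruning such edges is left out for brevity.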

Conclusion

We tested three algorithms: our interval algorithm (interval sorted), described in Sections 3.1 and 3.2; a modified version of our algorithm (interval), in which we merge all the execution graphs into one graph in a single step; and the non-interval algorithm, which is the second algorithm described in [2]. For the last algorithm we use the finish events to represent activity executions. Every value presented in the following histograms is the average over 50 runs of each algorithm. The quality of an algorithm is measured by comparing the graphs synthesized by the reconstruction algorithms with the target process graph (reference) used for generating the logs for the experiments. The edges that are in the target graph but not in the synthesized graph are the absent edges, and their percentage with respect to the target graph is the complement of the recall value. The edges that are in the synthesized graph but not in the target graph are the excess edges, and their percentage with respect to the synthesized graph is the complement of the precision value. We next show, for each of the three algorithms, the average precision and recall values as a function of the log size.

In Fig. 8, each input graph has 10 nodes. The average recall of each of the three algorithms is relatively stable and is independent of the log size. The interval sorted algorithm gives the best results. The average precision is very high and stable for both the interval and the interval sorted algorithms. The precision of the non-interval algorithm is more sensitive to the size of the log but is relatively high.

In Fig. 9, each input graph has 17 nodes. The average recall of each of the three algorithms is relatively stable and is independent of the log size. The interval sorted algorithm gives the best results (its recall is between 89 and 92%). The average precision is relatively high for the three algorithms and is not too sensitive to the log size.

In Fig. 10, each input graph has 25 nodes. The average recall and the average precision are relatively stable for the non-interval algorithm. The recall of the interval sorted algorithm is sensitive to very small logs and improves when the logs have more executions; it provides the best recall results. The average recall is relatively stable for the interval algorithm, yet its precision is somewhat sensitive to the log size. The interval algorithm gives the best precision results for logs with more than 1000 executions.

In Fig. 11, we compare execution times versus log size, and versus the size of the input graph.
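The quality measures defined above (absent edges as the complement of recall, excess edges as the complement of precision) can be sketched in Python as follows. The function name `edge_precision_recall` and the sample edge sets are hypothetical, and returning 1.0 for an empty graph is a convention chosen for this sketch, not something the text specifies.

```python
def edge_precision_recall(target_edges, synthesized_edges):
    """Compare a synthesized graph with the target graph, edge by edge.

    Returns (precision, recall, absent, excess), where
      absent = edges in the target but not in the synthesized graph,
      excess = edges in the synthesized graph but not in the target.
    """
    target = set(target_edges)
    synthesized = set(synthesized_edges)
    common = target & synthesized
    absent = target - synthesized
    excess = synthesized - target
    # Assumption of this sketch: an empty edge set scores 1.0.
    recall = len(common) / len(target) if target else 1.0
    precision = len(common) / len(synthesized) if synthesized else 1.0
    return precision, recall, absent, excess


# Hypothetical graphs: one edge is missed and one spurious edge is added.
target = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
synthesized = {("A", "B"), ("A", "C"), ("B", "D"), ("A", "D")}
precision, recall, absent, excess = edge_precision_recall(target, synthesized)
print(precision, recall, absent, excess)
```

Here three of the four target edges are recovered and three of the four synthesized edges are correct, so both values are 0.75. The fraction of absent edges is then 1 - recall = 0.25 of the target graph, and the fraction of excess edges is 1 - precision = 0.25 of the synthesized graph.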