Title
Log sequence clustering for workflow mining in multi-workflow systems
Article code: 161600
Publication year: 2018
Number of pages: 38 (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Data & Knowledge Engineering, available online 12 April 2018

Keywords
Workflow mining; Sequence clustering; User behavior pattern; Probabilistic suffix tree; Non-negative matrix factorization
Abstract

Current workflow mining efforts aim to discover process knowledge from user-system interaction logs and represent it as high-level workflow models. They assume that a system contains a single workflow model, or they rely on information that explicitly links each log sequence to its underlying workflow model. Such assumptions may not hold in multi-workflow systems, where instances of different workflow models are mixed together without being differentiated. To address this issue, this paper proposes applying sequence clustering methods to group similar log sequences together. Each sequence cluster corresponds to a workflow model, and the log sequences in the cluster are its instances. This paper investigates different similarity measures, including structure-based and user-based measures, as well as different clustering algorithms, including one-sided clustering and co-clustering. To incorporate user factors into sequence clustering, which existing sequence clustering methods do not do, this paper proposes modeling User Behavior Patterns (UBPs) as probability distributions over sequences and learning them from the event log. We represent a UBP as a Probabilistic Suffix Tree and use it to measure sequence similarity. The co-clustering method leverages the dyadic relationship between UBPs and log sequences to improve clustering accuracy. An experimental study indicates that user-based methods outperform structure-based methods in accuracy and are more effective at handling noise in the log and growing log size. The UBP-sequence co-clustering method achieves the best performance, which indicates the effectiveness of incorporating user factors and applying co-clustering.
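To make the user-based similarity idea more concrete, the sketch below learns one simplified suffix-context model per user and scores a log sequence by its length-normalized log-likelihood under each model. This is an illustrative assumption, not the paper's method: the class `SimplePST`, the helper `ubp_profile`, the smoothing parameter `alpha`, the depth limit `max_depth`, and the example logs are all hypothetical. The model stores next-event counts for every context suffix in a flat dictionary instead of an explicit tree, and it omits the context-pruning step used in full Probabilistic Suffix Tree learning.

```python
import math
from collections import defaultdict


class SimplePST:
    """Hypothetical, simplified stand-in for a Probabilistic Suffix Tree.

    Stores next-event counts for every context (suffix of the history)
    up to max_depth events long and predicts with the longest context
    seen at least min_count times. Full PST learning also prunes
    contexts; that step is omitted here.
    """

    def __init__(self, alphabet, max_depth=2, min_count=1, alpha=0.5):
        self.alphabet = sorted(set(alphabet))
        self.max_depth = max_depth
        self.min_count = min_count
        self.alpha = alpha  # additive smoothing so unseen events keep p > 0
        # context tuple -> {next_event: count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i, event in enumerate(seq):
                # Record the event under every suffix of its history,
                # from the empty context up to max_depth events back.
                for k in range(min(i, self.max_depth) + 1):
                    self.counts[tuple(seq[i - k:i])][event] += 1
        return self

    def prob(self, history, event):
        # Back off from the longest admissible suffix to the empty context.
        for k in range(min(len(history), self.max_depth), -1, -1):
            node = self.counts.get(tuple(history[len(history) - k:]))
            total = sum(node.values()) if node else 0
            if node and total >= self.min_count:
                smoothed = node.get(event, 0) + self.alpha
                return smoothed / (total + self.alpha * len(self.alphabet))
        return 1.0 / len(self.alphabet)  # nothing learned yet: uniform

    def avg_log_likelihood(self, seq):
        """Length-normalized log-likelihood of seq under this model."""
        if not seq:
            return 0.0
        ll = sum(math.log(self.prob(seq[:i], e)) for i, e in enumerate(seq))
        return ll / len(seq)


def ubp_profile(seq, models):
    """One row of a hypothetical sequence-by-UBP score matrix."""
    return [m.avg_log_likelihood(seq) for m in models]


if __name__ == "__main__":
    alphabet = "ABCDE"
    # Hypothetical logs: user u1 tends to run A->B->C, user u2 runs A->D->E.
    models = [
        SimplePST(alphabet).fit(["ABC", "ABCABC", "ABC"]),
        SimplePST(alphabet).fit(["ADE", "ADEADE", "ADE"]),
    ]
    for seq in ["ABCAB", "ADEAD"]:
        row = ubp_profile(seq, models)
        best = max(range(len(models)), key=lambda j: row[j])
        print(seq, [round(x, 3) for x in row], "-> UBP", best)
```

Assigning each sequence to its highest-scoring model, as in the usage example, is only a one-sided illustration. Stacking the `ubp_profile` rows for all sequences yields a sequence-by-UBP matrix of the kind a co-clustering step (for example, one based on non-negative matrix factorization, after shifting the scores to be non-negative) could operate on; that step is not shown here.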