ISI Article No. 22022 (English)
Title
A graph distance based metric for data oriented workflow retrieval with variable time constraints
Article code: 22022
Publication year: 2014
Length: 11 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Expert Systems with Applications, Volume 41, Issue 4, Part 1, March 2014, Pages 1377–1388

Keywords
Business process management, Data oriented workflow, Similarity computation, Process mining, Process retrieval

Abstract

There are many applications in business process management that require measuring the similarity between business processes, such as workflow retrieval and process mining. However, most existing approaches and models cannot represent variable constraints or support data-oriented workflow retrieval that takes different QoS requirements into account, and they also do not allow users to express arbitrary constraints based on the graph structures of workflows. These problems impede the customization and reuse of workflows, especially data-oriented workflows. In this paper, we address workflow retrieval with variable time constraints. We propose a graph-distance-based approach for measuring the similarity between data-oriented workflows with variable time constraints. First, a formal structure called the Time Dependency Graph (TDG) is proposed and used as the representation model of workflows, so that comparing two workflows reduces to computing the similarity between their TDGs. Second, we determine whether the TDGs of the two workflows being compared are compatible. A distance-based measure is then proposed that computes their similarity from normalization matrices constructed from their TDGs. We prove theoretically that the proposed measure satisfies all the properties of a distance. In addition, some exemplar processes are studied to illustrate the effectiveness of our similarity comparison approach for workflows.
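The abstract does not define the TDG or its normalization matrices, so the following is only a minimal sketch under stated assumptions, not the paper's measure. It assumes a hypothetical encoding of a TDG as a mapping from (source task, target task) pairs to time constraints, a stand-in "normalization" that divides every weight by the largest one, and the Frobenius norm of the matrix difference as the distance. It illustrates why a distance defined on matrices inherits the distance properties mentioned above (non-negativity, identity, symmetry, triangle inequality). Because proportional weightings map to the same normalized matrix, this stand-in is only a pseudometric on the graphs themselves.

```python
import itertools
import math

# Hypothetical TDG encoding: {(source_task, target_task): time_constraint}.
# This is an illustrative stand-in, not the paper's TDG definition.
TDG = dict[tuple[str, str], float]


def normalization_matrix(tdg: TDG, tasks: list[str]) -> list[list[float]]:
    """Assumed normalization: each edge weight divided by the largest weight."""
    index = {name: i for i, name in enumerate(tasks)}
    largest = max(tdg.values(), default=1.0) or 1.0  # avoid dividing by zero
    matrix = [[0.0] * len(tasks) for _ in tasks]
    for (src, dst), weight in tdg.items():
        matrix[index[src]][index[dst]] = weight / largest
    return matrix


def distance(a: list[list[float]], b: list[list[float]]) -> float:
    """Frobenius norm of the difference: a metric on equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


if __name__ == "__main__":
    tasks = ["fetch", "clean", "align", "report"]  # shared, fixed task order
    w1 = {("fetch", "clean"): 2.0, ("clean", "align"): 4.0, ("align", "report"): 1.0}
    w2 = {("fetch", "clean"): 2.0, ("clean", "align"): 2.0, ("align", "report"): 1.0}
    w3 = {("fetch", "align"): 3.0, ("align", "report"): 3.0}
    mats = [normalization_matrix(w, tasks) for w in (w1, w2, w3)]

    # Spot-check the distance axioms on these three example matrices.
    for x, y, z in itertools.product(mats, repeat=3):
        assert distance(x, y) >= 0.0
        assert math.isclose(distance(x, y), distance(y, x))
        assert distance(x, z) <= distance(x, y) + distance(y, z) + 1e-12
    assert distance(mats[0], mats[0]) == 0.0
    print(round(distance(mats[0], mats[1]), 4))
```

Using one shared task ordering for both matrices matters: it is what makes entry (i, j) refer to the same pair of tasks in each workflow, which in turn presupposes that the two TDGs have already been judged compatible.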

Introduction

Business process management is an established area that aims at automating business processes (Cook & Wolf, 1998), and it has been widely applied in many fields, such as e-Science (Taylor, Deelman, & Gannon, 2007), medical healthcare (Lyng et al., 2009 and Maximini and Schaaf, 2003), search (Freßmann, 2006), and information integration (Hung and Chiu, 2004 and Lee et al., 2011). In recent years, the use of business processes has expanded significantly from its original domain towards new areas of scientific data processing, such as data-oriented workflows. Data-oriented workflows (Ikeda, Park, & Widom, 2011) have been widely applied in scientific areas that involve large amounts of data and complex computational tasks. Data-oriented workflows can be modeled as graphs in which nodes denote tasks/services for data computation and manipulation, and edges denote the flow of data into and out of the tasks/services. Scientific workflows (McPhillips et al., 2009) are a typical kind of data-oriented workflow and are applied in many fields, such as bioinformatics, astronomy, ecology, and earth science. A variety of scientific workflow systems, such as Kepler (Ludäscher et al., 2005) and Taverna (Oinn et al., 2005), have been developed and have accelerated the pace of progress in these areas. Process similarity measures are often used in process retrieval (Bergmann and Gil, 2011, Leake and Kendall-Morwick, 2008 and Madhusudan et al., 2004), process mining (Greco et al., 2005, Huang et al., 2006, Lim et al., 2012, van der Aalst et al., 2003 and Wen et al., 2007), process scheduling (R-Moreno, Borrajo, Cesta, & Oddi, 2007) and process integration (von Berg et al., 2001, Wang et al., 2006, Yan et al., 2001 and Zha et al., 2010). This is especially true for data-oriented workflow retrieval, where what users are really concerned with is how to apply advanced methods to new data in order to discover new facts.
However, in practice users usually do not know how to develop a workflow that applies these advanced methods, and many do not even know how to program. Fortunately, data-oriented workflows are often developed for scientific experimentation and can be executed repeatedly with different data or different parameters (Ellisman, Fahringer, Fox, Gannon, et al., 2007). Because the ways in which data-oriented workflows are used for data analysis and processing are often highly similar, users tend to concentrate on producing large amounts of high-quality analysis data rather than on understanding these methods in depth. In this situation, once data for analysis are available, users are likely to be highly interested in searching a repository of workflows and selecting from it the most suitable workflow, that is, the one that applies the best available methods to their data. In general, data-oriented workflow retrieval is a promising solution: instead of developing a workflow themselves, users can find one that satisfies their requirements for a given problem by matching the constraints they express and ranking the candidates according to some criteria. Some approaches have shown potential and initial success in business workflow retrieval (Awad and Sakr, 2010 and Beeri et al., 2008). However, some important open issues in data-oriented workflow retrieval remain to be resolved. On the one hand, most existing research on workflow retrieval does not take quality of service into account. A complex workflow for data analysis and processing is composed of dozens of distributed tasks/services that are often provided by external service providers. Although some services are free to access, their availability and quality of service (QoS) can be guaranteed only by paying for them. More importantly, the services provided may offer different levels of QoS.
QoS may include many aspects, such as execution time, response speed, and service cost. The price of a service can be determined by its QoS level, and service providers can charge higher prices for higher levels. Users do not always need a workflow to be completed at a higher QoS level than they require; they may instead prefer cheaper services whose lower QoS is still sufficient to meet their requirements. However, most existing approaches and models cannot represent variable constraints or support data-oriented workflow retrieval that takes users' different QoS requirements into account. On the other hand, a graph-based representation is an intuitive and effective way to describe the data and control flows of workflows, so it is desirable for users who perform data analysis and processing to express the QoS requirements of the workflows they need as graph-based constraints. However, most existing approaches do not allow users to express arbitrary constraints based on the graph structures of workflows. Although workflow query languages such as BPQL (Beeri et al., 2008) and BPMN-Q (Awad & Sakr, 2010) have been proposed, users are often not experts in any query language. There is also a lack of tools that let users express their retrieval requests as easily as possible and that support a rich set of graph edit operations (e.g., adding, removing, or replacing a flow or a task), the assignment of QoS constraints, and automatic similarity computation for data-oriented workflows. All these problems impede the customization and reuse of scientific workflows. In this paper, we confine our work to the time constraints of data-oriented workflows. Services developed by external providers have variable execution speeds that correspond to different levels of QoS; a higher execution speed means a shorter execution time.
If a user wants the tasks in a workflow to be completed in a shorter execution time, those tasks must run at higher QoS levels, and the user therefore has to pay more to execute them. However, different users have different execution-time requirements, and they do not always need a workflow to finish earlier than required. This paper addresses workflow retrieval with variable time constraints. We propose a graph-distance-based approach for measuring the similarity between data-oriented workflows with variable time constraints, and we define a graph-based distance metric with which two workflows can be measured and compared under variable time constraints. First, a formal structure called the Time Dependency Graph (TDG) is proposed and used as the representation model of workflows, so that comparing two workflows reduces to computing the similarity between their TDGs. Second, we determine whether the TDGs of the two processes being compared are compatible in terms of workflow functionality. Then, a distance-based measure is proposed that computes their similarity from normalization matrices built from their TDGs. We prove theoretically that the proposed measure satisfies all the properties of a distance. In addition, a case study illustrates our similarity comparison approach for workflow retrieval. The rest of this paper is organized as follows. In Section 2, we review related work. In Section 3, we propose a graph-based representation of workflows with variable time constraints, called the Time Dependency Graph (TDG). In Section 4, we determine the compatibility of two workflows by means of δ-compatibility. In Section 5, we compare two workflows on the basis of their TDGs and propose normalization matrices to represent business processes.
In Section 6, we further propose a distance-based measure d for measuring the similarity between business processes and, most importantly, prove theoretically that d satisfies all the properties of a distance. Section 7 illustrates our approach with a process retrieval case, Section 8 introduces the prototype system we developed, and Section 9 concludes the paper.
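The QoS trade-off described in this introduction (faster service levels shorten execution time but cost more, and users only need to meet their own time requirement) can be illustrated with a small sketch. It is not the paper's model: it assumes, hypothetically, that each task offers a few QoS levels with a fixed duration and price, that the workflow's completion time is the length of its longest (critical) path through a task DAG in which a task starts once all its predecessors finish, and that the user's time constraint is a deadline on that completion time. A brute-force search then picks the cheapest level assignment that meets the deadline. All task names, levels, durations, and prices below are invented for illustration.

```python
from graphlib import TopologicalSorter
from itertools import product

# Hypothetical QoS catalogue: task -> list of (level, duration, price).
QOS_LEVELS = {
    "fetch": [("basic", 4, 1), ("premium", 2, 3)],
    "clean": [("basic", 5, 1), ("premium", 3, 4)],
    "align": [("basic", 6, 2), ("premium", 3, 5)],
    "report": [("basic", 2, 1), ("premium", 1, 2)],
}
# Predecessors of each task: clean and align both follow fetch,
# and report waits for both of them.
PREDECESSORS = {"fetch": [], "clean": ["fetch"], "align": ["fetch"],
                "report": ["clean", "align"]}


def completion_time(durations: dict[str, int]) -> int:
    """Length of the critical (longest) path through the task DAG."""
    finish: dict[str, int] = {}
    for task in TopologicalSorter(PREDECESSORS).static_order():
        start = max((finish[p] for p in PREDECESSORS[task]), default=0)
        finish[task] = start + durations[task]
    return max(finish.values())


def cheapest_plan(deadline: int):
    """Cheapest (price, {task: level}) meeting the deadline, or None.

    Brute force over all level combinations; only viable for a few tasks.
    """
    tasks = list(QOS_LEVELS)
    best = None
    for choice in product(*(QOS_LEVELS[t] for t in tasks)):
        durations = {t: c[1] for t, c in zip(tasks, choice)}
        price = sum(c[2] for c in choice)
        if completion_time(durations) <= deadline and (best is None or price < best[0]):
            best = (price, {t: c[0] for t, c in zip(tasks, choice)})
    return best


if __name__ == "__main__":
    for deadline in (12, 9, 6, 5):
        print(deadline, cheapest_plan(deadline))
```

With these invented numbers, a relaxed deadline is met entirely by basic services, while tighter deadlines force some tasks onto premium levels and raise the price, which is the kind of variable time constraint that motivates retrieving workflows per user requirement rather than always at the highest QoS.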

Conclusion

Workflow retrieval is a promising solution for the reuse of workflows, especially data-oriented workflows. However, most existing approaches and models cannot represent variable time constraints, and therefore cannot support data-oriented workflow retrieval that accounts for users' different QoS requirements on time; they also do not allow users to express arbitrary constraints based on the graph structures of workflows. To tackle these problems, this paper addresses workflow retrieval with time constraints. We propose a novel graph-distance-based approach for measuring the similarity between data-oriented workflows with variable time constraints. The Time Dependency Graph (TDG) is proposed for representing workflows with variable time constraints, so that comparing two workflows reduces to computing the similarity between their TDGs. A distance-based measure is further proposed that computes the similarity of TDGs from their normalization matrices, and we prove theoretically that this measure satisfies all the properties of a distance. A prototype system was also developed for loading and editing processes and for computing the similarity between them. Some exemplar processes are studied to illustrate the efficiency of our approach to similarity comparison of workflows. To the best of our knowledge, this is the first work that measures the similarity between processes with QoS requirements in the form of variable time constraints for process retrieval. Future work will continue in the direction of data-oriented scientific workflow retrieval. First, we will refine the representation model of data-oriented workflows so that input/output data, the relevant data types, and the semantics of services and data can be expressed explicitly for semantic workflow retrieval.
Second, our work in this paper relies on a precondition: if two nodes in two TDGs have the same name, they are taken to be the same task/service with the same functionality. However, this assumption often does not hold. Because workflows are often developed by different providers and executed in distributed, heterogeneous environments, services/tasks with the same name will inevitably sometimes have different functionalities, and services with the same functionality will sometimes have different names. We will also explore workflow retrieval in heterogeneous environments by analyzing and comparing the semantics of services and flows in heterogeneous workflows expressed in different languages. The data semantics of the inputs and outputs of services from heterogeneous workflows will be described by ontologies.
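The name-based precondition described above can be made concrete with a small sketch. It is not the paper's δ-compatibility: it assumes a hypothetical check that two workflows are compatible when they contain the same set of tasks, compared first by exact name and then through a toy synonym table standing in for the ontology-based semantics proposed as future work. The task names and the table are invented. Note that a synonym table only addresses one of the two mismatches (different names, same functionality); same-named services with different functionality would need richer semantic descriptions.

```python
from typing import Optional

# Toy stand-in for an ontology (hypothetical): alternative names -> one concept.
SYNONYMS = {
    "blast_search": "sequence_alignment",
    "seq_align": "sequence_alignment",
    "render_report": "report",
}


def canonical(name: str, synonyms: Optional[dict[str, str]] = None) -> str:
    """Map a task name to its concept; without a table, the name is kept as-is."""
    return (synonyms or {}).get(name, name)


def compatible(tasks_a: set[str], tasks_b: set[str],
               synonyms: Optional[dict[str, str]] = None) -> bool:
    """Hypothetical check: both workflows contain the same tasks after
    name normalization (exact-name matching when no table is given)."""
    return ({canonical(t, synonyms) for t in tasks_a}
            == {canonical(t, synonyms) for t in tasks_b})


if __name__ == "__main__":
    wf1 = {"fetch", "blast_search", "report"}
    wf2 = {"fetch", "seq_align", "render_report"}
    print(compatible(wf1, wf2))            # exact names: False
    print(compatible(wf1, wf2, SYNONYMS))  # via synonym table: True
```

Under exact-name matching the two workflows look incompatible even though, by assumption, they perform the same steps, which is the failure mode of the precondition that the proposed ontology-based descriptions are meant to address.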