شناسایی الگوی پویا در اجرای توزیع جریان کار مبتنی بر تلفن همراه
|کد مقاله||سال انتشار||مقاله انگلیسی||ترجمه فارسی||تعداد کلمات|
|21806||2007||21 صفحه PDF||سفارش دهید||12308 کلمه|
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Journal of Parallel and Distributed Computing, Volume 67, Issue 11, November 2007, Pages 1137–1154
This paper is concerned with the design, implementation, and evaluation of algorithms for communication partner identification in mobile agent-based distributed job workflow execution. We first describe a framework for distributed job workflow execution over the Grid: the Mobile Code Collaboration Framework (MCCF). Based on the study of agent communications during a job workflow execution on MCCF, we identify the unnecessary agent communications that degrade the system performance. Then, we design a novel subjob grouping algorithm for preprocessing the job workflow's static specification in MCCF. The obtained information is used in both static and dynamic algorithms to identify partners for agent communication. The mobile agent dynamic location and communication based on this approach is expected to reduce the agent communication overhead by removing unnecessary communication partners during the dynamic job workflow execution. The proof of the dynamic algorithm's correctness and effectiveness are elaborated. Finally, the algorithms are evaluated through a comparison study using simulated job workflows executed on a prototype implementation of the MCCF on a LAN environment and an emulated WAN setup. The results show the scalability and efficiency of the algorithms as well as the advantages of the dynamic algorithm over the static one.
Data-intensive collaborative scientific computations, such as bioinformatics  and digital imaging survey (e.g., Sloan Digital Sky Survey (SDSS) ), have the following characteristics: • They can be represented as Directed Acyclic Graph (DAG).1 For example, the identification process of galaxy clusters within SDSS can be represented as a DAG . • They often require diverse, high volume and distributed data sets. Hence, huge volume of data motion over the Internet makes it a bottleneck for such applications .Job workflow is a natural technology for developing such computations. To keep data local as much as possible and to provide decentralized control of execution, distributed job workflow execution using mobile agent (MA) is getting increasing research attention. Many systems have been developed (e.g., those described in [9,10,14,23,25,26]). However, in these systems, an MA usually carries all the implementation codes and supporting packages during its migration. When the code size is increased, the overhead caused by code transfer is increased fast, thus prolonging the job workflow execution time . This may compromise the benefits of code mobility. The Mobile Code Collaboration Framework (MCCF) was developed to enable and enhance the productivity of largescale data-intensive computation over the Grid  using MA technology. In order to reduce the overhead caused by code transfer during the distributed job workflow execution, Lightweight Mobile Agent (LMA)  and Code-on-Demand (CoD)  techniques are adopted in the development of MCCF. Similar to the aforementioned distributed workflow systems using MAs, there is no centralized workflow engine in MCCF.Both execution results of subjobs and the job workflow control are directly transferred amongst distributed resources. During a job workflow execution, dynamic LMA replication, parallel LMA migration and execution are employed to support concurrent execution of multiple data independent subjobs so as to improve system performance. This in turn gives rise to the requirement of communication between agents. Two agents communicating with each other are called communication partners or partners in short. For message passing based agent communication, the first step is to identify corresponding partners, the second step is to locate (or discover) them, and the third step is to route the message to them. Agent discovery methods in the existing message passing based MA communication can be classified into discovery infrastructure-based methods and fully decentralized methods. Discovery infrastructure-based methods include home-based , forwarding-based  and hierarchical-based  methods. As pointed out in , all discovery infrastructure-based methods rely on a static and small-scaled distributed lookup server for MA discovery, which does not scale well for largescale MA systems. Fully decentralized methods include flooding-based , distributed hash table (DHT) based , and contact list-based  methods. Flooding-based methods generate excessive flooding messages which will degrade the overall system’s performance considerably and cause a large amount of traffic over the Internet. The DHT-based mechanism takes advantage of the DHT overlay, but the maintenance of the DHT overlay is costly. In addition, the number of hops required for sending a message is log n, which complicates the reliable message delivery besides introducing additional delay in MA migration. A contact list-based mechanism for MA discovery is adopted in MCCF for its simplicity. For contact list-based MA location and communication, each MA maintains a list of all its partners’ locations. Before an MA is migrated or discarded, it will notify its partners so that they can update their contact lists accordingly. Previous work [7,17,19,27,28,30] on MA communication normally assumes that the partner is already known and focuses mainly on dynamic MA location/discovery and message routing. However, with the dynamic MA replication during the distributed job workflow execution in MCCF, this assumption does not hold any more. Thus, partner identification is an important issue in MA communication in MCCF. Particular, for contact list-based method, it is to identify partners to be included in the contact list. This paper makes three important contributions in the field of partner identification for agent communications during distributed job workflow execution: (i) Via studying agent communications and dynamic characteristics of a job workflow execution over MCCF, we identify the unnecessary agent communication partners. (ii) We propose a subjob grouping algorithm for preprocessing the job workflow’s static specification. The obtained information is then used in both the static and dynamic algorithms for communication partners identification. MA dynamic location and communication based onthis approach is expected to reduce the agent communication overhead. In addition, the proof of the correctness and effectiveness of the dynamic algorithm are elaborated. (iii) We provide a performance comparison study of the simulated job workflows on the actual implementation of the MCCF and the algorithms. This paper is organized as follows: the MCCF overview and further explanation of the problem will be given in Section 2. The subjob grouping algorithm together with the static and dynamic partner identification algorithms based on it will be given and analyzed in Section 3. An experimental study of our contact list-based method will be described in Section 4. Finally, Section 5 concludes the paper and outlines our future work.
نتیجه گیری انگلیسی
With the explosion of scientific data, MA-based distributed job workflow execution is a promising paradigm for multidisplinary scientific computation over the Grid. MCCF is such a system that supports distributed execution of job workflows. LMA and Code on Demand (CoD) technologies are used in the construction of the MCCF. LMA in the MCCF is defined using AC. AC is like a “blueprint’’ . It is migrated amongst computational resources. Agents are created on its behalf and carry out the required work. AC may also be replicated when necessary. Existing research on MA communication focuses on dynamic agent location/discovery and communication, assuming that the partners are already known. However, with the dynamic AC replica creation in MCCF, this assumption does not hold any more. By studying the dynamic characteristics and agent communications during a job workflow execution over MCCF, we identify the unnecessary communication partners. A novel subjob grouping algorithm for preprocessing the static job workflow specification is developed. Information generated from preprocessing is used for partner identification in both the static and dynamic algorithms. The static partner identification algorithm employs the preprocessed information and determines the partner set before the execution of the job workflow. Since AC replicas are dynamically generated, redundant partners will have to be included; whereas, in the dynamic partner identification algorithm, an agent’s partner set is dynamically updated using the preprocessed information during the distributed job workflow execution. Therefore, the partners in the set are limited to only those necessary ones. The correctness and optimality of the dynamic partner identification algorithm have also been proven formally. To evaluate our approach, experiments have been carried out on “d-h-TG’’ shaped job workflows, randomly generated job workflows as well as the workflow of a real application. For the “d-h-TG’’ type of job workflows, the theoretical analysis of our approach shows that the size of partner set will always be less than two regardless of how manyAC replicas are required in the job workflow execution. The experimental results show that the total number of partner update messages increases linearly withthe total number of AC replicas required, which is consistent with the theoretical analysis. Similar behavior is also observed for the randomly generated job workflows. The experiments show that the preprocessing-based partner identification algorithms have better performance than the “Naive’’ algorithm in all the cases. The preprocessing-based dynamic algorithm gives the optimal result in terms of the number of partner update messages generated. It out-performs the preprocessing-based static algorithm when redundant partners have to be included due to the lack of dynamic information. As analyzed in the static partner identification algorithm (see Section 3.2) and the experiments, whether or not redundant partners need to be added depends on the topology of the workflow. Contact list-based method is chosen for agent communication in MCCF because of its simplicity. We believe that the partner identification algorithms can also be used to support other MA communication methods (e.g., the mailbox-based approach). We also believe that our preprocessing-based partner identification algorithms can also be used in other applications areas that require distributed execution of workflows, for example, inter-enterprise business process management . The current optimization on partner identification is based on the assumption that the number of AC replicas and the subjobs assigned to an AC replica are fixed. The preprocessingbased dynamic partner identification algorithm is optimal for the given grouping of subjobs. For a DAG, there may exist other subjob grouping algorithms that are different from the one introduced in Section 3.1. Hence, we will further investigate how the number of AC replicas and the subjobs assigned to an AC replica will affect the cost for communication as well as the execution.