Reputation-based dependable scheduling of workflow applications in Peer-to-Peer Grids
| Article code | Year of publication | English article | Persian translation | Word count |
|---|---|---|---|---|
| 21857 | 2010 | 19-page PDF | Available on order | Not calculated |
Publisher: Elsevier - ScienceDirect
Journal: Computer Networks, Volume 54, Issue 18, 20 December 2010, Pages 3341–3359
Grids facilitate the creation of wide-area collaborative environments for sharing computing or storage resources and various applications. Interconnecting distributed Grid sites through a peer-to-peer routing and information dissemination structure (known as Peer-to-Peer Grids) is essential to avoid the scheduling-efficiency bottleneck and the single point of failure of centralized or hierarchical scheduling approaches. On the other hand, uncertainty and unreliability are inherent in distributed infrastructures such as Peer-to-Peer Grids; they are triggered by multiple factors, including scale, dynamism, failures, and incomplete global knowledge. In this paper, a reputation-based Grid workflow scheduling technique is proposed to counter the effect of the inherent unreliability and temporal characteristics of computing resources in large-scale, decentralized Peer-to-Peer Grid environments. The proposed approach builds upon structured peer-to-peer indexing and networking techniques to create a scalable wide-area overlay of Grid sites that supports dependable scheduling of applications. The scheduling algorithm treats the reliability of a Grid resource as a statistical property, which is computed globally in the decentralized Grid overlay from dynamic feedback, or reputation scores, assigned by individual service consumers and mediated via Grid resource brokers. The proposed algorithm dynamically adapts to changing resource conditions and offers significant performance gains over traditional approaches in the event of unsuccessful job execution or resource failure. The results, evaluated through an extensive trace-driven simulation, show that our scheduling technique can reduce the makespan by up to 50% and successfully isolate failure-prone resources from the system.
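To make the idea of reliability as a statistical property concrete, the sketch below estimates a site's reliability from binary success/failure feedback and prefers the site with the lowest expected completion time when failed runs are retried. The Beta(1, 1) posterior-mean estimator, the independent-retry model, and all names (`SiteFeedback`, `reliability`, `expected_time`, `pick_site`) are illustrative assumptions, not the reputation model used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteFeedback:
    """Hypothetical per-site tally of job outcomes reported by consumers."""
    name: str
    successes: int
    failures: int
    exec_time: float  # assumed estimated run time of the task on this site (s)


def reliability(fb: SiteFeedback) -> float:
    """Posterior mean of a Beta(1, 1) prior updated with binary outcomes.

    Assumption: each feedback is simply 'succeeded' or 'failed'.
    """
    return (fb.successes + 1) / (fb.successes + fb.failures + 2)


def expected_time(fb: SiteFeedback) -> float:
    """Expected time if failed runs are retried on the same site.

    Assumes independent attempts, so the number of attempts is geometric
    with mean 1 / p, and each attempt costs the full exec_time.
    """
    return fb.exec_time / reliability(fb)


def pick_site(sites: List[SiteFeedback]) -> SiteFeedback:
    """Choose the site with the lowest expected completion time."""
    return min(sites, key=expected_time)


sites = [
    SiteFeedback("fast-but-flaky", successes=2, failures=8, exec_time=100.0),
    SiteFeedback("slower-but-stable", successes=18, failures=2, exec_time=150.0),
]
print(pick_site(sites).name)  # prints "slower-but-stable"
```

With these numbers the flaky site's estimated reliability is 0.25, so its expected time (400 s) exceeds the stable site's (about 174 s), and the stable site is chosen despite its slower raw speed.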
Grid computing enables the sharing, selection, and aggregation of geographically distributed heterogeneous resources, such as computational clusters, supercomputers, storage devices, and scientific instruments. These resources are under the control of different Grid sites and are being used to solve many important scientific, engineering, and business problems. Interconnecting distributed Grid sites through a peer-to-peer routing and information dissemination structure (known as Peer-to-Peer Grids) is essential to avoid the scheduling-efficiency bottleneck and the single point of failure of centralized or hierarchical scheduling approaches. The Peer-to-Peer Grid (P2PG) model offers every site an opportunity to pool its local resources as part of a single, massive-scale resource sharing abstraction. P2PG infrastructures are large, heterogeneous, complex, uncertain, and distributed. In a P2PG, both control and decision making are decentralized by nature, and different system components (users, services, application components) interact to adaptively maintain and achieve a desired system-wide behaviour. Furthermore, the availability, performance, and state of resources, applications, and services change continuously during the life cycle of an application. Thus uncertainty and unreliability are facts in P2PG infrastructures, triggered by multiple factors, including: (i) software and hardware failures as systems and applications scale, which lead to severe performance degradation and critical information loss; (ii) dynamism (unexpected failure) arising from temporal behaviours, which must be detected and resolved at runtime to cope with changing conditions; and (iii) lack of complete global knowledge, which hampers efficient decision making with regard to the composition and deployment of application elements.
The aforementioned challenges are addressed in this paper by developing a novel self-managing scheduling algorithm for workflow applications that takes into account a Grid site's prior performance and behaviour to facilitate opportunistic and context-aware placement of application components. The proposed scheduling algorithm is dependable in that it dynamically adapts to changes in system behaviour by taking into consideration the performance metrics of Grid sites (software and hardware capability, availability, failure). The dependability of a Grid site is quantified using a decentralized reputation model, which computes local and global reputation scores for a Grid site based on the feedback provided by the scheduling services that have previously submitted their applications to that site. In particular, this paper contributes the following to the state of the art in Grid scheduling: a novel Grid scheduling algorithm that helps Grid schedulers, such as resource brokers, achieve improved performance and automation through intelligent and opportunistic placement of application elements based on context awareness and dependability. The effectiveness of this contribution is appraised through: (i) a comprehensive simulation-driven analysis of the proposed approach based on realistic, well-known application failure models that capture the transient behaviours prevailing in existing Grid-based e-Science application execution environments; and (ii) a comparative evaluation that demonstrates the self-adaptability of the proposed approach against Grid environments where (1) resource and application behaviours do not change (i.e., no failure occurs), so no self-management is required, and (2) transient conditions exist but the runtime systems and application elements have no capability to self-adapt. The remainder of this paper is organized as follows.
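The split between local and global reputation described above can be sketched as follows, assuming each broker rates a site on a [0, 1] scale after every job, keeps the mean of its own ratings as its local reputation for that site, and that the global reputation is the interaction-weighted mean over all brokers. The class and function names and this aggregation rule are hypothetical; the paper computes its scores in the decentralized overlay rather than in a single process.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Optional


class Broker:
    """Hypothetical resource broker that records feedback about Grid sites."""

    def __init__(self, name: str) -> None:
        self.name = name
        # site name -> this broker's ratings of that site, each assumed in [0, 1]
        self.ratings: Dict[str, List[float]] = defaultdict(list)

    def give_feedback(self, site: str, rating: float) -> None:
        if not 0.0 <= rating <= 1.0:
            raise ValueError("rating must lie in [0, 1]")
        self.ratings[site].append(rating)

    def local_reputation(self, site: str) -> Optional[float]:
        """Mean of this broker's own ratings, or None if it never used the site."""
        r = self.ratings.get(site)  # .get does not insert into the defaultdict
        return fmean(r) if r else None


def global_reputation(site: str, brokers: List[Broker]) -> Optional[float]:
    """Interaction-weighted mean of the brokers' local reputations.

    Summing raw ratings and dividing by their count equals weighting each
    local mean by the number of ratings behind it. A P2P Grid would keep
    this aggregate in the overlay; here it is computed in one place.
    """
    total, count = 0.0, 0
    for b in brokers:
        r = b.ratings.get(site)
        if r:
            total += sum(r)
            count += len(r)
    return total / count if count else None


b1, b2 = Broker("broker-1"), Broker("broker-2")
for rating in (1.0, 0.0):
    b1.give_feedback("site-A", rating)
for rating in (1.0, 1.0, 1.0):
    b2.give_feedback("site-A", rating)
print(b1.local_reputation("site-A"), b2.local_reputation("site-A"))  # 0.5 1.0
print(global_reputation("site-A", [b1, b2]))  # 0.8
```

Weighting by interaction count means a broker that has used a site often influences its global score more than one with a single experience; other weightings (for example, by the rater's own credibility) are equally plausible design choices.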
Related work on dependable application scheduling, distributed reputation models, and Grid workflow management is presented in the next section. Section 3 briefly discusses the key system models for overlay creation, application composition, task failure, and application scheduling. In Section 4, we present the distributed reputation management technique and the algorithms of the proposed dependable scheduling approach, illustrated with an example. The simulation setup, performance metrics, and key experimental findings are analyzed and discussed in Section 5. Finally, we conclude the paper with directions for future work.
Conclusion
In this paper, we have presented a reputation-based dependable scheduling technique for workflow applications in Peer-to-Peer Grids. Using simulation, we measured the performance of the proposed scheduling technique against two cases: Failure without Self-adaptation and No Failure. The results show that our scheduling technique can reduce the makespan by up to 50% and successfully isolate failure-prone resources from the system. Thus, applying the proposed reputation-based scheduling technique not only enables context-aware and opportunistic placement of workflow tasks but also yields significant performance gains (as analyzed in the previous section). Moreover, our results have practical importance because they highlight that schedulers which cannot self-adapt to dynamic Grid conditions deliver degraded performance to application workflows. It is therefore reasonable to conclude that developing self-adapting Grid scheduling and application management techniques is important for fully exploiting Grids. Further, adapting to dynamic resource conditions helps in coping with the unpredictability and uncertainty of Internet-scale, multi-site Peer-to-Peer Grids. In the future, we intend to implement this reputation-based dependable scheduling technique in a real-world P2PG system such as Aneka Federation. Since this paper shows that varying the reputation threshold Rth affects system performance, in future work we also plan to devise an approach in which Rth is adjusted dynamically by the scheduler.
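The isolation of failure-prone sites and the sensitivity to Rth can be illustrated with a small sketch, assuming Rth acts as a minimum global reputation a site must have in order to receive tasks. The functions `eligible_sites` and `schedule`, the reputation values, and the greedy earliest-free-site placement of independent tasks are hypothetical simplifications, not the paper's workflow scheduling algorithm; site failures themselves are not simulated.

```python
import heapq
from typing import Dict, List, Tuple


def eligible_sites(reputation: Dict[str, float], r_th: float) -> List[str]:
    """Sites whose global reputation is at least r_th, most reputable first."""
    kept = [site for site, rep in reputation.items() if rep >= r_th]
    return sorted(kept, key=lambda site: reputation[site], reverse=True)


def schedule(
    task_times: List[float], reputation: Dict[str, float], r_th: float
) -> Tuple[List[Tuple[str, float]], float]:
    """Greedy list scheduling of independent tasks onto eligible sites.

    Each task goes to the eligible site that becomes free earliest; ties go
    to the more reputable site. Returns (site, finish time) per task and the
    makespan. Assumes identical site speeds and no task dependencies.
    """
    sites = eligible_sites(reputation, r_th)
    if not sites:
        raise RuntimeError("no site meets the reputation threshold")
    # Min-heap of (time the site becomes free, reputation rank, site name).
    free_at = [(0.0, rank, site) for rank, site in enumerate(sites)]
    heapq.heapify(free_at)
    placement: List[Tuple[str, float]] = []
    for t in task_times:
        start, rank, site = heapq.heappop(free_at)
        finish = start + t
        placement.append((site, finish))
        heapq.heappush(free_at, (finish, rank, site))
    makespan = max((f for _, f in placement), default=0.0)
    return placement, makespan


reputation = {"site-A": 0.9, "site-B": 0.75, "site-C": 0.3}
tasks = [10.0, 10.0, 10.0, 10.0]
for r_th in (0.5, 0.8):
    placement, makespan = schedule(tasks, reputation, r_th)
    print(r_th, [site for site, _ in placement], makespan)
```

With Rth = 0.5 the low-reputation `site-C` is excluded and the tasks alternate between `site-A` and `site-B` (makespan 20.0); raising Rth to 0.8 also drops `site-B`, leaving a single site and doubling the makespan to 40.0. This trade-off between excluding unreliable sites and losing capacity is one reason a fixed threshold can be suboptimal.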