A methodology towards automatic performance analysis of parallel applications
|Article code||Publication year||English paper||Persian translation||Word count|
|27776||2004||13-page PDF||available on request||not computed|
Publisher : Elsevier - Science Direct
Journal : Parallel Computing, Volume 30, Issue 2, February 2004, Pages 211–223
Tuning and debugging the performance of parallel applications is an iterative process consisting of several steps: the identification and localization of inefficiencies, their repair, and the verification of the achieved performance. In this paper, we address the analysis of the performance of parallel applications from a methodological viewpoint, with the aim of identifying and localizing inefficiencies. Our methodology is based on performance metrics and criteria that highlight the properties of the applications, as well as the load imbalance and dissimilarities in the behavior of the processors. A few case studies illustrate the application of the methodology.
The performance achieved by a parallel application is the result of complex interactions between the hardware and software resources of the system on which the application is executed. The characteristics of the application, e.g., its algorithmic structure, input parameters, and problem size, influence these interactions by determining how the application exploits the available resources and the allocated processors. In this framework, tuning and debugging the performance of parallel applications become challenging issues. A typical approach to these issues is experimental, that is, based on instrumenting the application, monitoring its execution, and analyzing its performance either on the fly or post-mortem. Many tools have been developed for this purpose. These tools analyze the measurements collected at run-time and provide statistics and diagrams describing the performance of the application and of its activities, e.g., computation, communication, I/O. The major drawback of these tools is that they fail to assist users in mastering the complexity inherent in this analysis. To overcome this drawback, various methodological approaches have been proposed, and tools have been developed out of these approaches with the aim of identifying performance bottlenecks, that is, the code regions of the applications, e.g., routines and loops, that are critical from the performance viewpoint. The Poirot project proposed a tool architecture to automatically diagnose parallel applications using a heuristic classification scheme. The Paradyn Parallel Performance tool dynamically instruments the applications to automate bottleneck detection at run-time. The Paradyn Performance Consultant starts a hierarchical search for bottlenecks and refines this search by using stack sampling and by pruning the search space based on the behavior of the application during previous runs.
The Kappa-Pi tool deals with post-mortem automatic performance analysis of message passing applications based on PVM. The analysis of processor utilizations leads to the identification of performance bottlenecks, which are classified by means of a rule-based knowledge system. Aksum automatically performs multiple runs of a parallel application and detects performance bottlenecks by comparing the performance achieved while varying the problem size and the number of allocated processors. In this paper, we address the analysis of the performance of parallel applications from a methodological viewpoint, with the aim of identifying and localizing performance inefficiencies. We define new performance metrics and criteria that highlight the properties of the applications, as well as the load imbalance and dissimilarities in the behavior of the allocated processors. These metrics rely on the measurements collected by monitoring the applications at run-time. The integration of this methodology into a performance tool will help users interpret the performance achieved by their applications. The paper is organized as follows. Section 2 presents the methodology and introduces metrics and criteria for the evaluation of the overall behavior of a parallel application. Section 3 focuses on the behavior of the processors allocated to the application. Section 4 presents an application of the methodology to a few case studies. Finally, Section 5 summarizes the methodology and discusses its integration into a performance analysis tool.
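To make the idea of such metrics concrete, the following is a minimal illustrative sketch (not the paper's actual formulas): two simple statistics over per-processor timings collected at run-time, one exposing load imbalance and one measuring dissimilarity in processor behavior. The function names and the example data are hypothetical.

```python
# Illustrative sketch of imbalance/dissimilarity metrics over per-processor
# timings; these are common textbook definitions, not the paper's own metrics.
from statistics import mean, pstdev

def load_imbalance(times):
    """Ratio of the slowest processor's time to the average, minus one.
    0.0 means a perfectly balanced load across processors."""
    return max(times) / mean(times) - 1.0

def dissimilarity(times):
    """Coefficient of variation of per-processor times: a simple measure
    of how much processor behavior diverges from the mean."""
    return pstdev(times) / mean(times)

# Hypothetical per-processor computation times (seconds) for one code region.
comp_times = [10.0, 10.5, 9.8, 14.2]

print(f"imbalance:     {load_imbalance(comp_times):.3f}")
print(f"dissimilarity: {dissimilarity(comp_times):.3f}")
```

A tool following this approach would compute such metrics per activity (computation, communication, I/O) and per code region, then apply threshold criteria to flag the region with the most severe inefficiency.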
Conclusion
Performance analysis of parallel applications is quite challenging. Many factors influence performance, and it is difficult to assess whether and where an application has experienced poor performance. The methodological approach presented in this paper falls within the framework of automatic performance analysis of parallel applications and is aimed at the identification and localization of their performance inefficiencies. The methodology provides users with guidelines for interpreting the performance achieved by their applications. We define metrics and criteria that characterize the performance of the applications. The metrics, derived from the analysis of measurements collected at run-time, highlight the performance properties of the applications, as well as the load imbalance and dissimilarities in the behavior of the allocated processors. The criteria are used to identify the activity and code region experiencing the most severe performance inefficiencies. We are currently developing a prototype of a performance tool that computes the metrics and implements the criteria proposed in this paper. We believe that the integration of our methodology into a performance tool represents a valuable enhancement towards automatic performance analysis. Users expect tools to answer their performance problems, and our methodology tries to provide a few of these answers. As future work, we also plan to assess the sensitivity of the metrics and criteria used for the identification of performance inefficiencies. For this purpose, we will analyze a large set of measurements collected on different parallel systems for a wide variety of numerical and scientific applications.