ترجمه فارسی عنوان مقاله

بهره برداری از موازی سازی برای حلقه های تو در تو با چرخه های وابستگی

عنوان انگلیسی

Exploitation of parallelism to nested loops with dependence cycles

کد مقاله	سال انتشار	تعداد صفحات مقاله انگلیسی
20267	2004	14 صفحه PDF

منبع

Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)

Journal : Journal of Systems Architecture, Volume 50, Issue 12, December 2004, Pages 729–742

ترجمه کلمات کلیدی

بهینه سازی لوپ - تجزیه و تحلیل وابستگی داده ها - چرخه وابستگی - بهره برداری موازی

کلمات کلیدی انگلیسی

Loop optimization, Data dependence analysis, Dependence cycle, Parallelism exploitation,

دانلود رایگان 2 صفحه اول مقاله لاتین (PDF)

پیش نمایش مقاله

چکیده انگلیسی

In this paper, we analyze the recurrences from the breakability of the dependence links formed in general multi-statements in a nested loop. The major findings include: (1) A sin k variable renaming technique, which can reposition an undesired anti-dependence and/or output-dependence link, is capable of breaking an anti-dependence and/or output-dependence link. (2) For recurrences connected by only true dependences, a dynamic dependence concept and the derived technique are powerful in terms of parallelism exploitation. (3) By the employment of global dependence testing, link-breaking strategy, Tarjan’s depth-first search algorithm, and a topological sorting, an algorithm for resolving a general multi-statement recurrence in a nested loop is proposed. Experiments with benchmark cited from Vector loops showed that among 134 subroutines tested, 3 had their parallelism exploitation amended by our proposed method. That is, our offered algorithm increased the rate of parallelism exploitation of Vector loops by approximately 2.24%.

مقدمه انگلیسی

In high speed computing [20], there are the two most popular parallel computational models, distributed memory multiprocessors and shared memory multiprocessors. Because the technique to memory hardware [20] is improved, therefore, the access time of shared memory for a system of multiprocessors is obviously decreased and the system has been increasingly used for scientific and engineering applications. However, the major shortcoming of shared memory multiprocessors is the difficulty in programming because programmers are responsible for analyzing data dependence relations among statements in programs and exploiting the parallelism of statements in programs among shared memory multiprocessors. A successful vectored/paralleled compiler is capable of exploiting the parallelism of a program. Three features of a vectored/paralleled compiler determine its level of parallelism exploitation: (1) an accurate data dependence testing, (2) efficient loop optimization and (3) efficient removals of undesired data dependences. Techniques for dependence analysis algorithms, which directly support data dependence testing, have been developed and used quite successfully [2], [5], [7], [8], [9], [10], [20], [22], [23], [24], [25], [26] and [27]. Computationally expensive programs in general spend most of their time in the execution of loops. Extracting parallelism from loops in an ordinary program therefore has a considerable effect on the speed-up. Many loop optimizational methods have been developed and broadly fallen into two classes: loop vectorization and loop parallelization [3], [4], [20], [21], [28], [29] and [30]. In terms of the reduction of data dependences, most researches concentrate on loop optimization by the front-end of vectored/paralleled compilers such as scalar renaming, scalar expansion, scalar forward-substitution and dead code elimination [20]. Studies on the back-end of vectored/paralleled compilers primarily deal with separation of parallelism execution and sequential execution of the statements. Relatively less attention has been given to data dependence elimination [1], [9] and [11]. Recurrence is a type of π-blocks in a general nested loop, which is extracted at the time of loop distribution, a back-end phase [17]. Statement(s) involved in a recurrence are strongly connected via various dependence types. Famous techniques, such as node splitting, thresholding, cycle shrinking, etc., are able to eliminate data dependence of a recurrence to a certain extent, depending upon specific dependence types [1], [17], [18], [20] and [21]. In this paper, we study a parallelism exploitation for loops with dependence cycles on basis of breaking dependence links. Formally the breaking strategies for dependence cycles were surveyed in a single loop [11]. Their method is extended to break the dependence links in a nest of loops. In Section 2, the concept of data dependence is reviewed. In Section 3, an analysis of the formation of dependence cycles is provided and three dependence links on the breaking-strategy basis are derived. For each link pattern, its features, breaking techniques and applications are introduced in detail. An algorithm is developed for the resolution of a general multi-statement recurrence. Experimental results showing the advantages of the proposed method are given in Section 4. Finally, Section 5 contains a conclusion.

نتیجه گیری انگلیسی

Parallelism exploitation for statements with dependence cycles in a nest of loops is necessary. The vectorizability of a dependence cycle depends primarily on its dependence links. There exist seven possible dependence relations for one statement on another statement. A dependence cycle with an anti-dependence link is breakable via node splitting [17]. In case the source variable for the anti-dependence relation is itself the sin k variable of another true-dependence, a corrective strategy should be incorporated into the node splitting algorithm. An output-dependence link or an anti- and output-dependence link in a dependence cycle is breakable via a sin k variable renaming technique. This technique is also applicable to break an anti-dependence link. In practice, node splitting and sin k variable renaming techniques should be utilized in a complementary manner. For a dependence cycle formed only by links of true-dependence or all other dependences, a general, simple, but less-efficient partial vectorization algorithm is available. To improve the efficiency, a dynamic dependence concept and its derived technique are powerful. This approach is particularly efficient to deal with a simple dependence cycle. All of the recurrence-resolving strategies can be integrated with the dependence testing technique, Tarjan’s depth-first search algorithm, and a topological sorting to develop an automatic recurrence-resolving system.