Parallel Ant Colony Optimization on Graphics Processing Units
|Article code|Publication year|English article|Persian translation|Word count|
|7840|2013|10-page PDF|Order it|8,910 words|
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Journal of Parallel and Distributed Computing, Volume 73, Issue 1, January 2013, Pages 52–61
The purpose of this paper is to propose effective parallelization strategies for the Ant Colony Optimization (ACO) metaheuristic on Graphics Processing Units (GPUs). The Max–Min Ant System (MMAS) algorithm augmented with 3-opt local search is used as a framework for the implementation of the parallel ants and multiple ant colonies general parallelization approaches. The four resulting GPU algorithms are extensively evaluated and compared on both speedup and solution quality on a state-of-the-art Fermi GPU architecture. A rigorous effort is made to keep parallel algorithms true to the original MMAS applied to the Traveling Salesman Problem. We report speedups of up to 23.60 with solution quality similar to the original sequential implementation. With the intent of providing a parallelization framework for ACO on GPUs, a comparative experimental study highlights the performance impact of ACO parameters, GPU technical configuration, memory structures and parallelization granularity.
The Ant Colony Optimization (ACO) metaheuristic is a constructive, population-based approach inspired by the social behavior of ants. As it is acknowledged as a powerful method for solving combinatorial optimization problems, a considerable amount of work has been dedicated to improving its performance. Among the proposed solutions, we find the use of parallel computing to reduce computation time, improve solution quality, or both. Most parallel ACO implementations can be classified into two general approaches. The first is the parallel execution of the ants' construction phase within a single colony. Initiated by Bullnheimer et al., it aims to accelerate computations by distributing ants to computing elements. The second, introduced by Stützle, is the execution of multiple ant colonies. In this case, entire ant colonies are assigned to processors in order to speed up computations as well as to potentially improve solution quality by introducing cooperation schemes between colonies. These implementations usually follow the message-passing and shared-memory computing paradigms. The relatively high-level abstraction model they provide facilitates the development of effective and portable optimization software on conventional CPU-based parallel architectures. However, as research on parallel architectures is rapidly evolving, new types of hardware have recently become available for high performance computing. Among them, we find Graphics Processing Units (GPUs), which provide great computing power at an affordable cost but are difficult to program. In fact, it is not clear that conventional paradigms are suitable for expressing parallelism in a way that is efficiently implementable on GPU architectures. As academic and industrial combinatorial optimization problems continually increase in size and complexity, the field of parallel metaheuristics has to follow this evolution of high performance computing.
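The ants' construction phase mentioned above, in which each ant independently builds a tour, is what the parallel ants approach distributes across computing elements. A minimal sketch of one ant's probabilistic tour construction for the TSP (the standard ACO transition rule; the parameter values `alpha` and `beta` are illustrative, not the paper's):

```python
import random

def construct_tour(dist, tau, alpha=1.0, beta=2.0, rng=random):
    """One ant builds a TSP tour: at each step the next city j is drawn
    with probability proportional to tau[i][j]**alpha * (1/dist[i][j])**beta.
    Since each ant's loop is independent, many ants can run in parallel."""
    n = len(dist)
    tour = [rng.randrange(n)]          # start from a random city
    unvisited = set(range(n)) - {tour[0]}
    while unvisited:
        i = tour[-1]
        cities = list(unvisited)
        weights = [(tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta)
                   for j in cities]
        tour.append(rng.choices(cities, weights=weights)[0])
        unvisited.discard(tour[-1])
    return tour
```

In the parallel ants approach, each call to a routine like this is mapped to a GPU computing element; in the multiple colonies approach, whole populations of such ants run on separate processors.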
The purpose of this paper is to propose parallel implementations of ACO that are suitable for GPU computing environments. For both the parallel ants and multiple colonies general approaches, two parallelization strategies are designed and experimentally compared on speedup and solution quality. Important algorithmic, technical and programming issues are also addressed in this context. This paper is organized as follows. First, we present the ACO metaheuristic, the Max–Min Ant System (MMAS) algorithm and its application to the Traveling Salesman Problem (TSP). We choose MMAS and TSP to focus on algorithmic aspects of ACO and technical issues of GPU computing that are not problem dependent, as well as to strictly compare our results to the original works of Stützle and Hoos. After a fairly complete review of the literature on parallel ACO, the proposed GPU parallelization strategies for MMAS are explained. Finally, extensive experimental results are presented to evaluate and compare their performance.
Conclusion
The aim of this paper was to propose efficient parallelization strategies for the implementation of Ant Colony Optimization on Graphics Processing Units. Following the parallel ants general approach, the ANTthread and ANTblock strategies associate the ants' tour construction and local search phases with the execution of streaming processors and multiprocessors, respectively. On the other hand, the COLONYblock and COLONYGPU strategies implement the multiple ant colonies approach by assigning entire ant colonies to multiprocessors and to whole GPUs, respectively. We showed that both general approaches can be efficiently implemented on a GPU architecture. In fact, the ANTblock strategy provided speedups as high as 19.47 with the basic MMAS and 12.48 with the addition of a 3-opt local search procedure. Speedups rise even higher with the COLONYGPU strategy, reaching 23.60 with the combination of MMAS and local search. Overall, this shows that it is possible to significantly reduce the execution time of ACO on GPU while preserving solution quality competitive with the sequential MMAS. Still, as is the case in the field of parallel ACO and parallel metaheuristics in general, much can still be done for the effective use of GPUs. In fact, the variety of the proposed strategies and the extensive comparative study provided in this paper bring their share of questions and research avenues. For example, even though the use of the GPU shared memory leads to some of the best speedups, this hardware feature also shows its limits on bigger TSP instances. Moreover, maximal exploitation of GPU resources often requires algorithmic configurations that do not let ACO perform an effective exploration and exploitation of the search space.
Globally, this paper shows that parallel performance is strongly influenced by the combined effects of parameters related to the metaheuristic, the GPU technical architecture and the granularity of the parallelization. As it becomes clear that the future of computers no longer relies on increasing the performance of a single computing core but on using many of them in a single system, it becomes desirable to adapt optimization tools for parallel execution on architectures like GPUs. Following this line of thought, our future work aims at using the framework and knowledge built in this paper to propose an ACO metaheuristic that is specifically tailored to GPU execution. Also, in order to provide better insight into the memory and algorithmic bottlenecks identified in this work, a more formal analysis would likely be required. For example, the requirements for the different types of memory (main, device, shared, etc.) could be analyzed as a function of problem size and of the numbers of ants, colonies, SMs, SPs, etc. Such an analysis could lead to algorithms that automatically determine effective thread/block/GPU configurations for ACO and other metaheuristics. We believe that the global acceptance of GPUs as components of optimization systems requires algorithms and software that are not only effective, but also usable by a wide range of academics and practitioners.