دانلود مقاله ISI انگلیسی شماره 27569
ترجمه فارسی عنوان مقاله

تجزیه و تحلیل عملکرد از تولید لوک هد خودکار توسط نمودار جریان کنترل: برخی آزمایشات

عنوان انگلیسی
Performance analysis of automatic lookahead generation by control flow graph: some experiments
کد مقاله سال انتشار تعداد صفحات مقاله انگلیسی
27569 2001 17 صفحه PDF
منبع

Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)

Journal : Simulation Practice and Theory, Volume 8, Issue 8, 15 July 2001, Pages 511–527

ترجمه کلمات کلیدی
شبیه سازی موازی - نمودار جریان کنترل - صف بندی شبکه - تجزیه و تحلیل عملکرد -
کلمات کلیدی انگلیسی
Lookahead, Automatic lookahead generation, Parallel simulation, Control flow graph, Queuing network, Performance analysis,
پیش نمایش مقاله
پیش نمایش مقاله  تجزیه و تحلیل عملکرد از تولید لوک هد خودکار توسط نمودار جریان کنترل: برخی آزمایشات

چکیده انگلیسی

The performance of parallel discrete event models is highly dependent on lookahead, particularly when a conservative algorithm is employed. Unfortunately lookahead is known to be problem-dependent, which restricts the implementations of conservative algorithms. This paper uses a simple queuing network to show how this lookahead affects performance and discusses various techniques for automatic generation of lookahead using control flow graphs (CFGs). These methods are tested on the queuing network simulation running on a CRAY T3E 1200E. Results indicate that the automatic lookahead techniques, though requiring some time to compute, perform as well as the best manually extracted lookahead injected into the parallel program.

مقدمه انگلیسی

In parallel discrete event simulation PDES [1], the model to be simulated is decomposed into physical processes that are modelled as simulation objects and assigned to logical processes (LPs). The simulator is composed of a set of concurrently executing LPs running on distributed physical processors. The LPs communicate by exchanging two types of time-stamped messages: real messages, which are events in one processor and must be scheduled in another processor; and null messages, which are used to synchronise the processors. In order to maintain causality, the LPs must process messages in strictly non-decreasing time-stamp order. There are two basic synchronisation protocols used to ensure that this condition is not violated: the first conservative and the second optimistic. Conservative protocols, such as Chandy-Misra [2] and Bryant [3] (often known as CMB) avoid causality errors, while optimistic protocols, such as Time Warp [4], allow causality errors to occur, but implement some recovery mechanism. Conservative protocols were the first distributed simulation mechanism. The basic problem conservative mechanisms must address is the determination of safe events. The conservative process must first determine that it is impossible to receive another event with a lower time stamp than the event it is currently trying to execute. Such events are assumed to be safe and can be executed without violation of system causality. Processes containing no safe events must be blocked; this can lead to a deadlock situation if no appropriate precautions are taken. Many optimisations of the basic conservative mechanism can be found in the literature. For instance, instead of creating null messages a mechanism can be used to detect when the simulation is deadlocked and another mechanism to break the deadlock [5] or send null messages on a demand basis, i.e. whenever a process is about to become deadlocked, it requests the next message from the sender process [6]. This approach reduces the null message traffic, though a longer delay may be required, due to transmission of two messages. In an optimistic protocol, each LP operates as a distinct discrete event simulator, maintaining input and output event lists, a state queue, and a local simulation time (called local virtual time or LVT). Each LP processes events optimistically and moves ahead in LVT. As each LP simulates asynchronously, it is possible for an LP to receive an event from the past, termed a straggler, which violates the causality constraints of the events in the simulation. On receipt of a straggler message, the LP must rollback to undo events that have been wrongly processed. Rollback involves two steps (i) restoring the state to a time preceding the time stamp of the straggler and (ii) cancelling any output event messages that were erroneously sent. After rollback, the events are re-executed in the proper order. Optimisations of the basic mechanism have been proposed to overcome the drawback of state-saving. For instance, processes could stop the immediate sending of the anti-messages for any roll back computation and instead, could wait to see if the re-execution of the computation regenerates the same messages. If the same message is regenerated, there is no need to cancel the message [7]. Also combining optimistic and conservative protocols is employed in order to take advantage of less memory usage property of the conservative algorithm. Ideally such a protocol monitors the parallel simulation and estimates the trade-off between the conservatism cost (i.e. blocking cost) and the optimism cost (i.e. state-saving and rollback) and accordingly adjusts the protocol control parameters [8]. Fujimoto [9] outlines the main concerns of PDES and Ferscha [10] comprehensively demonstrates the synchronisation layer of the PDES. The emergence of complicated applications (such as wireless networks [11], telecommunication systems [12], manufacturing [13], and cache design [14]) and widely available parallel and distributed platforms encourages the use of PDES as a tool for decision-makers. However, the development of tools for PDES is in its early stages. The PDES community is working toward providing run-time systems and libraries but despite many efforts, they still are neither flexible nor easy to use, regardless of performance issues. Thus most users stick with programming from scratch. However, efficient implementation of an optimistic protocol is complex, and the conservative protocols are application dependent. Studies of conservative protocols reveal the importance of so-called lookahead, which makes them less general. Lookahead allows different processors to simulate their sub-models asynchronously but usually is application dependent. Control flow graphs (CFGs) are intended to generate lookahead automatically and different algorithms have been proposed which are analysed in this paper.

نتیجه گیری انگلیسی

Using Sargent and Cota's algorithms, CFGs can be used to generate automatic lookahead for problems that contain time delay elements. The CFG paradigm has three attractive features. Firstly, its graphical representation provides an intermediate tool between the conceptual model and the simulation program, which makes it easier to understand and validate the model. Secondly, it separates the model execution from the parallel processing layer. Thirdly extracting the required lookahead in the model can be performed automatically and the methods employed are not application dependent. This paper demonstrates the effects of different automatic lookahead generation techniques on various performance indicators, showing that they perform as well as the best manually extracted lookahead. Further research that makes a closer examination of the above algorithms is required for a more precise evaluation of their individual costs and benefits. In this regard, acquiring the ratio of required time for computing lookahead over the exploited lookahead via various techniques and making comparison between them may provide some insight. It should also be noted that, though large values of lookahead indicate low inter-processor dependency, which should improve the performance of the simulation, the role of lookahead is not wholly straightforward. It is not clear the extent to which lookahead can be used as an indicator for the likely efficiency for simulation parallelisation. More research in this area is needed with a view to develop new indicators (which possibly combine lookahead with other attributes of the parallel program) that can be used as better indicators of this efficiency. Partitioning a model into CFG components still requires an experienced modeller and is application dependent. Therefore, it seems sensible to maintain a model as complex ACs or ensuring that it is always possible to drill down into very simple elements. It seems likely that the degree of complexity in the model affects the efficiency of the automatic lookahead generation techniques. In simple CFGs, employing less complicated techniques like MOC analysis might be more efficient than complicated dynamic techniques due to the computation needed for extracting the lookahead using the latter methods. The generated lookahead in complex CFGs may be more sensitive to the lookahead technique used and this is still under study. For this analysis, a criterion is required to represent the complexity of a CFG model. Otherwise it is impossible to investigate the relationship between model complexity and the different techniques so as to come up with guidelines to help users in choosing a lookahead technique for a specific CFG model. Finally, it should be noted that, in the research described here, the CFGs are statically mapped to the processors which may cause imbalance performance in complex models and the alternative is to use a dynamic load-balancing scheme. However it is possible to prevent imbalanced loads by wisely allocating the CFGs before run time. In complex models with large number of CFGs this is crucial for two reasons. Firstly to obtain a good performance, and secondly to employ the right number of processors. Employing too many processors, may mean that the communication cost can outweigh the parallel computation benefit gained in different processors. Various procedures for the optimal prior allocation of CFGs are being investigated.