قابلیت اطمینان و بهینه سازی عملکرد سیستم های زمان واقعی خط لوله
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
7278 | 2013 | 15 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Journal of Parallel and Distributed Computing, Volume 73, Issue 6, June 2013, Pages 851–865
چکیده انگلیسی
We consider pipelined real-time systems that consist of a chain of tasks executing on a distributed platform. The processing of the tasks is pipelined: each processor executes only one interval of consecutive tasks. We are interested in minimizing both the input–output latency and the period of application mapping. For dependability reasons, we are also interested in maximizing the reliability of the system. We therefore assign several processors to each interval of tasks, so as to increase the reliability of the system. Both processors and communication links are unreliable and subject to transient failures. We assume that the arrival of the failures follows a constant parameter Poisson law, and that the failures are statistically independent events. We study several variants of this multiprocessor mapping problem, with several hypotheses on the target platform (homogeneous/heterogeneous speeds and/or failure rates). We provide NP-hardness complexity results, and optimal mapping algorithms for polynomial problem instances. Efficient heuristics are presented to solve the general case, and experimental results are provided.
مقدمه انگلیسی
A pipelined real-time system [22] and [27] consists of a chain of tasks executing on a distributed platform. Each task is a block of code with a known amount of work to be processed. The role of the first task of the chain is to acquire some data set from the environment (thanks to sensor drivers), to process it, and finally to transmit its result to the second task. Each subsequent task receives its input data from its predecessor task, processes it, and transmits its result to its successor task, except the last task that transmits it to the environment (thanks to actuator drivers). The whole chain of tasks is executed repeatedly, as new data sets enter the system. Each data set is input to the first task and progresses from task to task until its processing is completed. Executing a real-time system in a pipelined way is essential to increase the throughput, by making the best possible usage of available resources in the distributed execution platform. Tasks are assigned to processors using an interval mapping, which groups consecutive tasks of the linear chain and assigns them to the same processor. Interval mappings are more general than one-to-one mappings, which establish a unique correspondence between tasks and processors; they allow communication overheads to be reduced, not to mention the many situations where there are more tasks than processors, and where interval mappings are mandatory. The key performance-oriented metrics to determine the best interval mapping are the period and the latency. The period is the time interval between the beginning of the execution of two consecutive data sets. Equivalently, the inverse of the period is the throughput, which measures the aggregate rate of processing of data. The latency is the time elapsed between the beginning and the end of the execution of a given data set; hence, it measures the response time of the system for processing the data set entirely. Therefore, to minimize the period, we try to create many small intervals so that we can start processing the next data set as soon as possible, while for minimizing the latency, we rather try to reduce the sum of communication costs, and hence to split the chain of tasks in the least possible number of intervals. As a consequence, minimizing the latency is antagonistic to minimizing the period, and trade-offs should be found between these two criteria. Each data set has a deadline on the completion time of its execution (the real-time constraint). The deadlines are related to the period PP and latency LL as follows. Data sets periodically enter the system with a given period PP. Data set 0 enters the system at time 0 and has a deadline equal to LL. Data set KK enters the system at time K×PK×P and has a deadline equal to K×P+LK×P+L. Accordingly, the deadline of each data set will be met as soon as we derive a schedule whose period does not exceed PP, and whose latency does not exceed LL. This model is consistent with those applications found in most safety critical real-time systems (e.g., avionics, railway or nuclear applications [8] and [30]), which enforce a prescribed processing rate and maximum response time. This leads to a global deadline for each application instance (data set), but individual tasks distinct from the output task have no deadlines. Besides constraints on the performance-oriented criteria, expressed as an upper bound on the period and/or the latency, pipelined real-time systems must also meet crucial dependability constraints, which are expressed as a lower bound on the reliability of the mapping. Increasing the reliability is achieved by replicating the intervals onto several processors. Augmenting the replication level (defined as the average number of times each interval is replicated) is good for reliability, but bad for period and latency, because fewer processors will be available for executing the task intervals. We thus have three antagonistic criteria: reliability, period, and latency. This antagonism between the criteria makes the problem very challenging. A typical example of applications with real-time and reliability constraints is encountered in the automotive industry, with the Autosar architecture.1 Autosar consists of a hardware architecture made of several processors (called ECUs—Electronic Computing Units) connected by a bus, and of several software components, each one being an embedded automotive function. Each function is a pipelined real-time system that starts with some input drivers that will generate a new dataset at each invocation (for instance the wheel angular speed), followed by several software blocks, and terminated by some actuator driver (for instance the hydraulic brake pressure). Each such function must meet a latency (also called the end-to-end timing constraint, from the sensor to the actuator), a period, and a reliability constraint. We evaluate the reliability of a single task mapped onto a processor according to the classical model of Shatz and Wang [34], where each hardware component (processor or communication link) is fail-silent and is characterized by a constant failure rate per time unitλλ: the reliability of a task of duration dd is therefore e−λde−λd. For an interval of several tasks mapped onto a single processor, we just have to sum up the task durations, hence obtaining e−λDe−λD, where DD is the sum of task durations in the interval. For a mapping with replication, we compute the reliability by building the Reliability Block Diagram (RBD) [29] and [3] corresponding to this mapping. Here we face the delicate issue that computing the reliability is exponential in the size of the mapping (or equivalently the size of the RBD). To solve this issue, we insert routing operations in the mapping to guarantee that the RBD is by construction serial–parallel, therefore allowing us to compute its reliability in linear time. The models are detailed in Section 2 and we discuss related work in Section 3. Our contribution is multifold. In Section 4, we show how to compute the different objectives (reliability, expected and worst-case latency, expected and worst-case period) for a given multiprocessor mapping. Then, we derive complexity results for homogeneous platforms in Section 5. We prove that: 1. computing a mono-criterion mapping that optimizes the reliability is polynomial (Section 5.1); 2. optimizing both the reliability and the period remains polynomial (Section 5.2); 3. the problem of optimizing both the reliability and the latency is NP-complete (Section 5.3); 4. the problem of assigning processors for a given partition of the task chain into intervals is polynomial (Section 5.5). Moreover, for homogeneous platforms, we provide a linear program to solve the problem of optimization of reliability for given bounds on period and latency in Section 5.4. For heterogeneous platforms, we prove that the mono-criterion problem of optimizing the reliability is NP-complete, and hence all the multi-criteria mapping problems that include the reliability in their criteria are also NP-complete (Section 6). We provide heuristics in Section 7 for the most general problem of optimizing the reliability under constraints on period and latency on a heterogeneous platform, and we conduct experiments on homogeneous and heterogeneous platforms to assess their performance (Section 8). Finally, we state some concluding remarks and future research directions in Section 9.
نتیجه گیری انگلیسی
This paper has addressed difficult multi-criteria optimization problems related to the mapping of pipelined real-time systems onto homogeneous and heterogeneous distributed platforms. The main goal was to optimize the reliability of the mapping through task replication, while enforcing bounds on performance-oriented criteria, namely the period and the latency. The period is a system-oriented criterion, as it measures the aggregate yield of the platform. On the contrary, the latency is a user-oriented criterion, as it characterizes the response time of each individual data set. Period and latency are antagonistic criteria, and achieving good trade-offs for these two objectives is known as a difficult task. However, performance itself is not enough on failure-prone platforms, and a challenging question is how to achieve application resilience without sacrificing efficiency? But adding reliability to the picture dramatically complicates the mapping problem: one has to decide which computing resources will be kept for performance, and which will be spared (replicated) for coping with failures. Even for homogeneous platforms, deciding how many processors must be assigned to each set of tasks turns out to be a difficult resource selection problem when three different objectives are considered simultaneously. Altogether, the results presented in this paper provide a solid theoretical foundation for the study of tri-criteria mappings of pipelined real-time systems. We have derived a comprehensive set of NP-hardness complexity results, together with optimal algorithms for polynomial instances. Another contribution of this paper is the introduction of a realistic communication model that nicely accounts for the inherent physical limitations on the communication capabilities of state-of-the-art processors. In addition, communication failures have been incorporated in the model through routing operations, which guarantee that evaluating the system reliability remains computationally tractable. An interesting future research direction would be to investigate whether it is feasible to remove this routing procedure, and accurately approximate the reliability of general systems (non serial–parallel). On homogeneous platforms, we have presented an integer linear program to solve the problem of maximizing the reliability with bounds on period and on latency, while polynomial-time heuristics are derived for the most general problems. We have proposed two heuristics: Heur-L that attempts to minimize the latency and Heur-P that attempts to minimize the period. Our experiments demonstrate the efficiency of the heuristics, and the supremacy of Heur-P in most cases. It would be interesting to address the approximability of the problems with heterogeneous instances, either establishing approximation factors or proving negative results. Another direction for future work involves the design of heuristics for even more difficult problems that would mix performance-oriented criteria (period, latency) with several other objectives, such as reliability, resource costs, and power consumption.