Column parity based fault detection mechanism for FIFO buffers
Article code | Publication year | Pages (English article) |
---|---|---|
6406 | 2013 | 15 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal : Integration, the VLSI Journal, Volume 46, Issue 3, June 2013, Pages 265–279
Abstract
This paper presents a low cost fault detection mechanism for FIFO buffers. The scheme maintains column parity in a single register, which is updated by monitoring the values written to and read from the FIFO memory array. A non-zero column parity when the FIFO is empty indicates a fault, and this property is exploited for fault detection. The technique yields gains in area, power and critical path delay, at the expense of (1) greater detection latency, because the FIFO must become empty before a violation can be asserted, and (2) a worse Silent Data Corruption (SDC) rate.
Introduction
First In First Out (FIFO) memories are used for buffering and flow control and are indispensable parts of almost every system. They are widely used in on-chip communication fabrics, high-speed communication protocol implementations, image/video accelerators [1], multiple-channel DMA controllers and coprocessor interfaces (e.g. Xilinx FSL [2]). They are also used extensively between different clock domains for synchronization. Modern multicore processors require different clock domains controlled by Dynamic Voltage and Frequency Scaling (DVFS) mechanisms to meet the power requirements (maximum Thermal Design Power, TDP), and communication between these domains is ensured using FIFOs [3]. Globally Asynchronous Locally Synchronous (GALS) systems [4] use FIFOs extensively. The 48-core Intel IA-32 chip contains many FIFOs for communication [5].
Since FIFOs intervene in many system operations, they should be protected properly to ensure reliable operation. This becomes more imperative in current and future technology nodes, in which system failures are becoming more and more dominant. Static and dynamic variations [6] and [7] result in unreliable operation. Moreover, aging mechanisms [8] such as NBTI [9], electromigration and time-dependent dielectric breakdown degrade devices and wires during the system's lifetime and cause faults in the field. Furthermore, soft errors [10] and other types of transient faults affect reliability significantly. Thus, dependability in vital system operations is of utmost importance. Operation at lower voltage, which is needed to meet the power constraints, worsens device reliability and makes process-variation-related problems more intense. SRAM cells seem to be much more vulnerable than logic and flip flops [11], since they are denser and their stability greatly depends on the asymmetry of the threshold voltages of their transistors.
It has been reported in [11] that at 12 nm one in every few thousand SRAM cells will be faulty (due to Random Dopant Fluctuations and aging). In [12] the dependence of the SRAM cell probability of failure (pfail) on voltage is shown for 32 nm technology nodes. Results show that even in small memory arrays, faults due to process variation/aging will occur. On the other hand, soft error resiliency should be ensured. Results also show that combinational logic is less vulnerable to process variation/aging induced faults (unless the timing constraints are very tight). Thus, in this paper we focus mainly on array-related faults. All these effects result from scaling. It would therefore be wise to spend as few resources as possible on protection; otherwise we risk canceling the benefits of scaling.
Information redundancy is a viable solution for protection in FIFOs, but it comes at the expense of state overhead and a combinational circuit path for encoding/decoding. Even byte parity, the simplest fault detection technique, imposes a 12.5% state overhead (one parity bit per eight data bits), which is not negligible. It also imposes critical path (8–9 input XORs in encoding/decoding) and energy overheads.
In this paper we propose a low cost fault detection technique for FIFO buffers. It is based on updating a global parity register, which stores parity on a per-column basis and requires only a flip flop and two XOR gates per column. Fault detection relies on the fact that when the FIFO becomes empty, the accumulated parity of the data read equals that of the data written, so this register should be zero. A non-zero value is an indication of a fault. All faults encountered between two empty states accumulate and are effectively indicated in the column parity register. The requirement to wait for the FIFO to become empty, however, increases the detection latency.
Thus, we trade off detection latency for reductions in area, power and critical path overhead. Because the detection latency of the mechanism is unbounded, it is most easily applied in systems with backward error recovery (BER) schemes or simply as a health monitoring mechanism. The technique is also easily applicable to hardware peripherals that take input data in communication bursts and cannot accept new data until they have processed the data already received. Besides errors in memory arrays, the scheme can also detect addressing problems and metastability-related problems in dual clock domain FIFOs.
In particular, the contributions of this paper are the following:
• A low cost fault detection mechanism for FIFOs is proposed. It allows for low cost error detection, at the expense of greater latency. It eliminates the state overhead of standard horizontal parity codes, while keeping combinational area low and decreasing critical path overheads and power consumption.
• The mechanism was modeled and synthesized using Verilog HDL, and its area, power and delay overheads were evaluated thoroughly and compared to standard parity protection schemes. Single clock and dual clock domain FIFOs were considered.
• The mechanism's fault coverage is analytically estimated.
The remainder of the paper is organized as follows. Section 2 introduces the proposed mechanism and discusses its capabilities and the overheads involved. Section 3 presents experimental results regarding the implementation of the proposed technique in 90 nm ASIC technology, as well as detection latency results in some example systems. Section 4 presents results regarding the fault coverage of the mechanism. Finally, Section 5 lists the related work and Section 6 concludes the paper.
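The column parity idea described in the introduction can be illustrated with a small behavioral model. This is only a sketch in Python, not the authors' Verilog implementation: the class name `ColumnParityFifo`, the method names and the `inject_bit_flip` test hook are all assumptions made for illustration. Every written word and every read word is XORed into one register, so the register should return to zero whenever the FIFO drains.

```python
from collections import deque


class ColumnParityFifo:
    """Behavioral model of a FIFO guarded by a single column parity register."""

    def __init__(self, depth: int, width: int) -> None:
        self.depth = depth
        self.mask = (1 << width) - 1
        self.cells = deque()  # stands in for the memory array
        self.parity = 0       # one bit (flip flop) per column

    def write(self, word: int) -> None:
        if len(self.cells) >= self.depth:
            raise OverflowError("FIFO full")
        word &= self.mask
        self.cells.append(word)
        self.parity ^= word   # first XOR per column: fold in written data

    def read(self) -> int:
        word = self.cells.popleft()
        self.parity ^= word   # second XOR per column: fold in read data
        return word

    def inject_bit_flip(self, index: int, bit: int) -> None:
        """Hypothetical test hook: flip one stored bit to mimic a cell fault."""
        self.cells[index] ^= 1 << bit

    def fault_detected(self) -> bool:
        """The check is only meaningful once the FIFO has drained."""
        return not self.cells and self.parity != 0


fifo = ColumnParityFifo(depth=4, width=8)
for w in (0x3C, 0xA5, 0x0F):
    fifo.write(w)
fifo.inject_bit_flip(index=1, bit=0)   # corrupt the second stored word
drained = [fifo.read() for _ in range(3)]
print(fifo.fault_detected())           # True: column 0 parity is left at 1
```

In this model, two bit flips in the same column between two empty states cancel each other in the register, which is one way an error could go unnoticed; the paper's own fault coverage analysis is the authority on how often that happens.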
Conclusion
A low cost fault detection mechanism has been proposed for FIFO buffers. The mechanism trades off detection latency to gain reductions in area, delay and power overheads without compromising soft error resiliency. In a 256×32 DFF based FIFO, we have reported reductions of 5.63 times in area and 4.09 times in power overheads in comparison with standard horizontal 32-bit parity, while the critical path delay was not affected at all. The area overheads of the proposed protection technique varied between 0.57% and 2.36% in all single clock configurations examined (2.09–5.14% in dual clock), and can be further reduced to 0.17% by grouping many columns. The corresponding overhead ranges of horizontal 32-bit parity and byte parity were 2.44–33.56% and 8.83–39.95%, respectively. No critical path overhead was imposed in any of the proposed configurations, in contrast with horizontal parity schemes (up to 19%). Beyond array cell errors, the mechanism can detect other types of errors (e.g. in addressing), which makes it particularly useful. The simplicity of the mechanism makes it easy to adopt, and it can be a good candidate in designs with many and/or large FIFOs, especially when area, power and frequency constraints cannot sustain expensive FIFO protection schemes.
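The claim that addressing errors are also caught follows from the same XOR bookkeeping, as the short Python sketch below suggests. The function name `column_parity` and the specific fault (a read pointer that fails to advance once) are assumptions chosen for illustration, not a scenario taken from the paper.

```python
from functools import reduce
from operator import xor


def column_parity(written, read):
    """XOR of every word written and every word read; zero in a fault-free run."""
    return reduce(xor, list(written) + list(read), 0)


written = [0x11, 0x22, 0x33]
good_reads = [0x11, 0x22, 0x33]
# Hypothetical addressing fault: the read pointer fails to advance once,
# so the first entry is returned twice and the second is never seen.
bad_reads = [0x11, 0x11, 0x33]

print(hex(column_parity(written, good_reads)))  # 0x0
print(hex(column_parity(written, bad_reads)))   # 0x33
```

In this sketch the fault is visible only because the wrongly returned word differs from the expected one; if both entries held the same value, the register would still end at zero.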