Exploiting the structure of the cost matrix for the hardware implementation of stereo disparity computation using dynamic programming
|Article code||Publication year||English article||Persian translation||Word count|
|25678||2010||13-page PDF||Available to order||Not calculated|
Publisher : Elsevier - Science Direct
Journal : Computer Vision and Image Understanding, Volume 114, Issue 11, November 2010, Pages 1126–1138
Dynamic programming is a powerful method for solving energy minimisation problems in computer vision, such as stereo disparity computation. While it may be desirable to implement this algorithm in hardware to achieve frame-rate processing, a naïve implementation may fail to meet timing requirements. In this paper, the structure of the cost matrix is examined to provide improved methods of hardware implementation. It is noted that by computing cost matrix entries along anti-diagonals instead of rows, the entries can be computed in a pipelined architecture. Further, if only a subset of the cost matrix needs to be considered, for example by placing limits on the disparity range (including neglecting negative disparities by assuming rectified images), the resources required to compute the cost matrix in parallel can be reduced. The boundary conditions required to allow computing a subset of the cost matrix are detailed. Finally, a hardware implementation of Cox's maximum-likelihood dynamic programming stereo disparity algorithm is presented to demonstrate the performance achieved. The design provides high-frame-rate (>123 fps) estimates for a large disparity range (e.g. 128 pixels) at image sizes of 640 × 480 pixels, and can be simply extended to run at well over 200 fps.
Dynamic programming is an optimisation algorithm that exploits optimal substructure to greatly reduce the time required to compute an optimal solution. It has numerous applications, such as biosequence matching (including DNA sequencing), the Viterbi algorithm for estimating hidden Markov models (HMMs), and stereo disparity estimation in computer vision. Many problems that can be solved with dynamic programming involve matching data from two streams, such as the left- and right-image pixels from scanlines in stereo images, or matching string subsequences from two target strings. Other problems share this structure; notable examples are the matrix-chain multiplication problem, the Needleman–Wunsch algorithm for biosequence matching, and the Viterbi algorithm for estimating hidden Markov models. This paper explores a dynamic programming approach applied to the stereo matching problem, and gives a sample implementation using FPGAs.

Fast, efficient implementations for dynamic programming problems are of interest, especially in cases where the data rate is high, as it is in image processing. A number of researchers have attempted to accelerate dynamic programming algorithms using hardware and/or special processors. Much of this work has involved protein and DNA sequence matching using an "edit-distance" cost function for string comparison, which shares the same anti-diagonal parallelism that we exploit in our approach. In one such work the anti-diagonal structure is used to pipeline the comparison of multiple target strings against a reference string, but each cost matrix is computed in a row-wise manner. Martins et al. point out the challenges of adapting processing to the changing sizes of the anti-diagonals, and propose a block-wise anti-diagonal approach on a parallel processor architecture to ensure good processor utilisation. Anvik et al.
identify the anti-diagonal cost matrix structure as a design pattern and introduce a framework generator that minimises the development effort required to adapt this design pattern to specific applications (they demonstrate three such applications) running on a small array of four processors. They use a similar block-wise approach. Two of these approaches involve implementation on FPGAs, with both tested in simulation only. One approach uses multiple FPGAs, but does not re-map the cost matrix as we do, leaving half of the processing elements idle on any given cycle. Hoang and Ayala-Rincón et al. both use systolic arrays in their processing architectures, although adapting such arrays to compute only a diagonal band of the cost matrix (as we do by limiting the disparity range D) is challenging.

Stereo matching has been extensively studied by the computer vision community, and detailed reviews of existing algorithms have been published by a number of researchers. Gong et al. provide an overview of general approaches that utilise sparse, dense, volumetric or level-set algorithms, and Seitz et al. review multi-view matching methods. Some image registration techniques also utilise sparse, feature-based matching and are discussed by Zitova and Flusser. The dynamic programming solution presented in this paper is the "Dynamic Programming Maximum Likelihood" (DPML) stereo disparity algorithm of Cox et al. It is a global, dense disparity estimation algorithm; Scharstein and Szeliski and Brown et al. provide extensive discussions and comparisons of similar dynamic programming approaches. These comparisons demonstrate that dynamic programming provides accurate estimates that compete well even with the best existing matching methods. More accurate approaches do exist; however, they tend to operate at significantly lower speeds.
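The anti-diagonal parallelism exploited by several of the works above, and by this paper, is easy to see in software. The sketch below is illustrative only, using a generic edit-distance recurrence rather than Cox's DPML algorithm: the cost matrix is filled along anti-diagonals k = i + j, and every entry on diagonal k depends only on diagonals k − 1 and k − 2, which is exactly what allows all entries of a diagonal to be computed in the same cycle in a pipelined hardware design.

```python
# Sketch (not the paper's hardware design): an edit-distance-style
# cost matrix filled along anti-diagonals instead of rows. Each entry
# C[i][j] needs C[i-1][j], C[i][j-1] (diagonal k-1) and C[i-1][j-1]
# (diagonal k-2), so all entries on one diagonal are independent.

def dp_antidiagonal(a, b, match_cost):
    """Fill an (len(a)+1) x (len(b)+1) cost matrix diagonal by diagonal."""
    m, n = len(a), len(b)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        C[i][0] = i                       # boundary: deletions only
    for j in range(1, n + 1):
        C[0][j] = j                       # boundary: insertions only
    for k in range(2, m + n + 1):         # anti-diagonal index k = i + j
        for i in range(max(1, k - n), min(m, k - 1) + 1):
            j = k - i
            C[i][j] = min(C[i - 1][j] + 1,     # deletion
                          C[i][j - 1] + 1,     # insertion
                          C[i - 1][j - 1] + match_cost(a[i - 1], b[j - 1]))
    return C[m][n]

# Matches the usual row-wise edit distance:
print(dp_antidiagonal("kitten", "sitting", lambda x, y: 0 if x == y else 1))  # 3
```

The inner loop over i is sequential here, but its iterations share no dependencies, so a hardware implementation can evaluate them with one processing element per matrix row.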
Closely related to the studies by Scharstein and Brown, Lu et al. provide a further survey on cost aggregation for matching problems. In the current literature, the most common approaches to FPGA hardware stereo matching are based on correlation or area-based methods. Of these, the most common make use of SAD-aggregated cost functions applied to pixel intensities. As shown by Scharstein and Szeliski, SAD correlation approaches do not provide very accurate disparity estimates. Typically these implementations make use of line buffering to align pixels in a stereo pair for parallel windowing computations; works by Miyajima and Maruyama, Hariyama et al., Perri et al., Mitéran et al. and Han et al. all utilise such buffering. Both Perri et al. and Mitéran et al. observe that neighbouring SAD windows share many of the same values, which are stored and re-used as required. Hariyama et al. define two levels of parallelism to perform coarse-to-fine refinement of disparities within localised regions, and thus improve accuracy. Both Hariyama et al. and Simhadri et al. utilise adder trees to perform cost and comparison computations; it is worth noting that adder trees are primarily useful in situations such as windowing and do not typically apply to dynamic-programming-style solutions. Lee et al. present an FPGA-based SAD stereo algorithm running at 31 fps for 640 × 480 images with a disparity range of 64 pixels, although no quantitative accuracy results are given. Most of these implementations produce results at approximately 30 fps, although some go as high as 150 fps on smaller images. Typically, the papers presenting these solutions do not analyse the accuracy of their algorithms. Additional hardware methods use phase-based correlation; Díaz et al., Darabiha et al. and Masrani and MacLean provide examples of such approaches implemented in hardware.
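For reference, window-based SAD matching of the kind used by the correlation implementations above can be sketched as follows. This is a minimal illustration, not any of the cited designs; all function and parameter names here are ours.

```python
# Illustrative SAD stereo matching (names are ours, not from the cited works).

def sad_cost(left, right, x, y, d, w):
    """Sum of absolute differences over a (2w+1) x (2w+1) window,
    comparing the left-image window centred at (x, y) against the
    right-image window centred at (x - d, y). Images are row-major
    lists of rows of grey-level integers; rectified scanlines assumed."""
    total = 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total

def best_disparity(left, right, x, y, w, d_max):
    """Winner-take-all selection over the disparity search range 0..d_max."""
    return min(range(d_max + 1),
               key=lambda d: sad_cost(left, right, x, y, d, w))
```

The hardware versions discussed above pipeline exactly this computation, using line buffers to keep the (2w+1) rows of both images on chip and re-using overlapping window sums between neighbouring pixels.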
Phase-based methods have the advantage of producing significantly better results than SAD algorithms, obtaining results that compete well with dynamic programming solutions. The problem with these methods lies in the square-root computations required, which are difficult to implement in hardware and carry a high resource cost, especially when implemented in parallel. Both Darabiha et al. and Masrani and MacLean provide implementations that use multi-scale Locally Weighted Phase Correlation (LWPC). Like SAD-based implementations, these phase-based approaches achieve approximately 30 fps. Díaz et al. demonstrate an algorithm that achieves over 200 fps, but does so by reducing the search range to only four neighbouring pixels. A final approach that produces very high frame rates (over 200 fps), on par with the approach presented in this paper, makes use of the census algorithm (see Woodfill et al.). The census algorithm computes costs within a pixel neighbourhood using the census transform, comparing the resulting bit strings by Hamming distance. The high frame rates and very simple computations associated with this approach come at the cost of poor accuracy. Murphy et al. present a low-cost implementation of the census algorithm on a Xilinx Spartan-3 FPGA, achieving 40 fps with a disparity range of 20 pixels, again with no accuracy results presented. To date, no hardware implementations of dynamic programming algorithms appear to have been explored by the vision community. Furthermore, most existing solutions produce relatively low frame rates of approximately 30 fps at resolutions typically lower than 640 × 480 pixels; those that achieve higher frame rates do so by sacrificing accuracy, resolution or disparity search range.
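The census idea mentioned above can be sketched briefly. This is a minimal illustration under our own naming and bit-ordering conventions; Woodfill et al.'s implementation differs in detail.

```python
# Minimal census-transform sketch (conventions are ours, for illustration).

def census(img, x, y, w):
    """Census transform: encode a (2w+1) x (2w+1) neighbourhood as a bit
    string, one bit per neighbour, set when that neighbour is darker than
    the centre pixel. img is a row-major list of rows of grey levels."""
    centre = img[y][x]
    bits = 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            if dx == 0 and dy == 0:
                continue                      # centre pixel carries no bit
            bits = (bits << 1) | (1 if img[y + dy][x + dx] < centre else 0)
    return bits

def hamming(a, b):
    """Matching cost between two census signatures: count of differing bits."""
    return bin(a ^ b).count("1")
```

The appeal for hardware is clear from the sketch: the transform is pure comparisons and shifts, and the matching cost is a popcount, with no multiplications at all, which is why census designs reach very high frame rates despite their poor accuracy.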
This paper explores a hardware-based dynamic programming solution that achieves high-frame-rate performance with no compromise on accuracy and limited compromise on disparity search range or resolution. The solution provides good scaling characteristics relative to SAD and phase-based correlation approaches. The rest of this paper is structured as follows. In Section 2 we describe the structure inherent in a dynamic programming cost matrix and how this structure can be exploited to design efficient hardware. In Section 3 we describe a hardware implementation of a dynamic programming algorithm for stereo disparity estimation, showing how the cost matrix structure leads to a fast yet efficient hardware design. Finally, in Section 4, we give performance results for the stereo hardware implementation. Further technical details of the implementation can be found in the cited references.
Conclusion (English)
This paper describes useful structure in the cost matrix of a large class of dynamic programming problems: if each anti-diagonal in the cost matrix depends only on the preceding ones, then cost matrix entries can be computed in parallel, in the time it takes the data to arrive. This allows an efficient, pipelined architecture to be used in hardware implementations. An example design that implements stereo disparity estimation using dynamic programming is described. The interleaved and pipelined implementation runs at up to 131.14 fps for 640 × 480 images using a disparity range of D = 64 pixels, and in simulations runs at 123.85 fps for D = 128. For smaller images it is possible to perform disparity computations at frame rates well in excess of 200 fps, and on a larger device the processing rate can be doubled by instantiating two copies of the design in parallel. The system's accuracy is essentially identical to that of the reference software implementation, and it fits on a moderately large FPGA device.
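The disparity-range limitation referred to above, computing only a diagonal band of the cost matrix, can be illustrated in software. The sketch below is ours, not the paper's hardware design: it restricts a generic edit-distance-style recurrence to the band |i − j| ≤ D, treating out-of-band neighbours as infinite-cost, which is one simple form of the boundary conditions the paper details.

```python
import math

# Illustrative band-limited DP (our sketch, edit-distance recurrence).
# Only entries with |i - j| <= D are stored or computed; neighbours
# outside the band contribute infinite cost, so no optimal path leaves
# the band. The result exists only when |len(a) - len(b)| <= D.

def banded_dp(a, b, D, match_cost):
    m, n = len(a), len(b)
    C = {(0, 0): 0}
    get = lambda i, j: C.get((i, j), math.inf)   # out-of-band -> +inf
    for i in range(1, min(m, D) + 1):
        C[(i, 0)] = i                            # boundary: deletions
    for j in range(1, min(n, D) + 1):
        C[(0, j)] = j                            # boundary: insertions
    for i in range(1, m + 1):
        for j in range(max(1, i - D), min(n, i + D) + 1):
            C[(i, j)] = min(get(i - 1, j) + 1,
                            get(i, j - 1) + 1,
                            get(i - 1, j - 1) + match_cost(a[i - 1], b[j - 1]))
    return C[(m, n)]
```

Only about (2D + 1) entries per row are computed instead of n, which is the software analogue of the hardware resource reduction obtained by limiting the disparity range.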