This paper presents a novel algorithm, termed ‘M-PSO’, for the automated simultaneous exploration of datapath and loop Unrolling Factor (UF) during power–performance tradeoff in High Level Synthesis (HLS) of control and data flow graphs (CDFGs), using multi-dimensional particle swarm optimization (PSO). The major contributions of the proposed algorithm are as follows: (a) simultaneous exploration of datapath and loop UF through an integrated multi-dimensional particle encoding process using swarm intelligence; (b) an estimation model that computes the execution delay of a loop-unrolled CDFG (for a given resource configuration visited) without, in most cases, having to unroll the entire CDFG for the specified loop value; (c) balancing of the tradeoff between power and performance metrics as well as between control states and execution delay during loop unrolling; (d) sensitivity analysis of the impact of PSO parameters such as swarm size on the exploration time and Quality of Results (QoR) of the proposed design space exploration (DSE) process, which would assist the designer in pre-tuning the PSO parameters for efficient exploration within a short runtime; (e) analysis of design metrics such as power, execution time and number of control steps of the global best particle found in every iteration with respect to an increase or decrease in unrolling factor.
The proposed approach, when tested on a variety of data flow graphs (DFGs) and CDFGs, showed an average improvement in QoR of >28% and a reduction in runtime of >94% compared to recent works.
When digital systems are built, designers face numerous decisions at various levels of design abstraction (register transfer level, system/high level etc.), such as choosing the type of architectural framework (datapath) required, exploring the best possible implementation alternative and managing the hardware–software tradeoff. More formally, this process is termed design space exploration. Design space exploration performed during high level synthesis is a non-trivial task, as it involves multiple intertwined design decisions, especially when dealing simultaneously with conflicting parameters such as power, area and performance. The process becomes even more intricate when an auxiliary variable, the ‘loop unrolling factor’, joins the decision making process. For these reasons, the complexity of architecture exploration grows exponentially with the number of alternative solutions, making an exhaustive search impractical (Coussy et al., 2009, Coussy and Morawiec, 2008, De Micheli, 1994, Gajski et al., 1992, Mohanty et al., 2008 and Zhang and Ng, 2000).
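The growth of the search space described above can be illustrated with a short count. This is only a sketch: the resource types, their upper bounds and the UF range are hypothetical values chosen for illustration, not figures from this work.

```python
from math import prod


def design_space_size(max_units, unroll_factors):
    """Count candidate designs: every combination of unit counts, times every UF."""
    # Assumption: each resource type may be instantiated 1..max times,
    # so it contributes `max` choices to the datapath configuration.
    datapath_points = prod(max_units.values())
    return datapath_points * len(unroll_factors)


# Hypothetical bounds, for illustration only.
max_units = {"adder": 4, "multiplier": 4, "comparator": 2}
unroll_factors = range(1, 17)  # candidate UFs 1..16

size = design_space_size(max_units, unroll_factors)
print(size)  # 4 * 4 * 2 * 16 = 512
```

Every additional resource type multiplies the count by its number of allowed instances, which is why enumerating all points quickly becomes impractical.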
Other recent DSE approaches in HLS, such as those based on the genetic algorithm (GA) (Sengupta et al., 2012, Harish Ram et al., 2012, Krishnan and Katkoori, 2006, Gallagher et al., 2004 and Mandal et al., 2000), addressed a similar problem of lesser complexity, since they included neither exploration of the unrolling factor nor the tradeoff between power and execution time. These approaches have not been found to be suitable candidates owing to their computationally expensive runtime (growing exponentially) and weaker guarantee of reaching an optimal result. This is also confirmed by comparison with Krishnan and Katkoori, 2006 and Sengupta et al., 2012, where the proposed PSO based approach produces better QoR with less exploration runtime. Moreover, unlike other heuristic approaches, the proposed approach attained true optimal solutions (verified by comparison with golden solutions) in almost all cases. Therefore, to address the exploration problem, a novel framework using PSO for automated parallel exploration of datapath and loop unrolling factor is utilized.
To the best of the authors’ knowledge, this is the first work that proposes a completely automated parallel exploration of datapath and loop unrolling factor using a novel hyper-dimensional particle encoding, as discussed in Section 3. None of the related works so far has performed simultaneous exploration of UF and resource combination. Further, no approach exists in the literature that adapts PSO (Kennedy & Eberhart, 1995) to solve the multi-objective DSE (MO-DSE) problem through an automated process for CDFGs. In addition, the highlights/contributions of this work are as follows:
(a) Simultaneous exploration of datapath and loop UF through multi-dimensional PSO.
(b) An estimation model for delay computation of a loop unrolled CDFG used in most cases.
(c) Balancing tradeoff between power–performance as well as control states and delay.
(d) Sensitivity analysis of swarm size and its impact on exploration time and QoR of DSE.
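To make contribution (a) concrete, the sketch below encodes a particle as one integer vector holding both the resource counts and the UF, and moves it with the standard PSO velocity update. It is a minimal illustration under stated assumptions, not the encoding defined in Section 3: the dimension layout `[adders, multipliers, UF]`, the `Bounds` and `update_particle` names, the rounding and clamping of positions, and the inertia-weight form with `w`, `c1`, `c2` values are all hypothetical choices.

```python
import random
from dataclasses import dataclass


@dataclass
class Bounds:
    lo: int
    hi: int


def clamp(value, b):
    """Keep a dimension inside its allowed integer range."""
    return max(b.lo, min(b.hi, value))


def update_particle(position, velocity, personal_best, global_best,
                    bounds, rng, w=0.7, c1=1.5, c2=1.5):
    """One standard PSO step applied to every dimension of a particle.

    Assumed layout: position = [adders, multipliers, UF], so the datapath
    and the unrolling factor are explored together in a single move.
    """
    new_pos, new_vel = [], []
    for x, v, pb, gb, b in zip(position, velocity, personal_best,
                               global_best, bounds):
        r1, r2 = rng.random(), rng.random()
        # Inertia term + pull towards personal best + pull towards global best.
        v_next = w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        # Positions are design parameters, so round and clamp to valid integers.
        x_next = clamp(round(x + v_next), b)
        new_vel.append(v_next)
        new_pos.append(x_next)
    return new_pos, new_vel


# Hypothetical ranges: 1..4 adders, 1..4 multipliers, UF in 1..16.
bounds = [Bounds(1, 4), Bounds(1, 4), Bounds(1, 16)]
rng = random.Random(0)
pos, vel = update_particle([1, 1, 2], [0.0, 0.0, 0.0],
                           [2, 1, 4], [3, 2, 8], bounds, rng)
print(pos, vel)
```

Because the UF is just one more dimension of the position vector, a single update changes the resource configuration and the unrolling factor at once, rather than exploring them in separate passes.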
The rest of the paper is organized as follows: Section 2 discusses the related works. The problem formulation and proposed framework are explained in Section 3. A demonstration of the proposed algorithm is presented in Section 4. Section 5 reports and analyzes the experimental results, while Section 6 concludes the paper.
This paper introduced a novel methodology for automated parallel exploration of an optimal datapath and unrolling factor using a hyper-dimensional particle encoding mechanism. Moreover, a novel model for execution time was proposed, expressed as a function of UF. With this function, the execution time can be estimated for the resource combination found without, in most cases, completely unrolling the CDFG. Furthermore, a novel model for power was proposed. The proposed framework maintains the tradeoff between conflicting parameters such as power and performance, and simultaneously resolves orthogonal issues such as improving QoR and reducing exploration runtime. Additionally, a novel analysis of power, execution time and control steps with respect to UF was presented. Finally, a novel algorithm for screening UFs was presented, which pruned the design space, reduced the exploration time and improved QoR.
Comparison with recent works (Krishnan and Katkoori, 2006 and Sengupta et al., 2012) showed that the proposed approach achieved an average improvement in QoR of >28% and a reduction in runtime of >94%.
Future work will be directed towards exploration techniques for applications with nested loop based CDFGs during multi-objective tradeoff. The inclusion of more recently considered parameters, such as temperature, in the DSE framework in HLS is also planned as a future research goal.