انتخاب مدلی برای حداقل مربعات رگرسیون بردار پشتیبانی بر اساس استراتژی جهان کوچک
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
25451 | 2011 | 11 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Expert Systems with Applications, Volume 38, Issue 4, April 2011, Pages 3227–3237
چکیده انگلیسی
Model selection plays a key role in the application of support vector machine (SVM). In this paper, a method of model selection based on the small-world strategy is proposed for least squares support vector regression (LS-SVR). In this method, the model selection is treated as a single-objective global optimization problem in which generalization performance measure performs as fitness function. To get better optimization performance, the main idea of depending more heavily on dense local connections in small-world phenomenon is considered, and a new small-world optimization algorithm based on tabu search, called the tabu-based small-world optimization (TSWO), is proposed by employing tabu search to construct local search operator. Therefore, the hyper-parameters with best generalization performance can be chosen as the global optimum based on the powerful search ability of TSWO. Experiments on six complex multimodal functions are conducted, demonstrating that TSWO performs better in avoiding premature of the population in comparison with the genetic algorithm (GA) and particle swarm optimization (PSO). Moreover, the effectiveness of leave-one-out bound of LS-SVM on regression problems is tested on noisy sinc function and benchmark data sets, and the numerical results show that the model selection using TSWO can almost obtain smaller generalization errors than using GA and PSO with three generalization performance measures adopted.
مقدمه انگلیسی
As an important branch of machine learning, least squares support vector machine (LS-SVM), introduced by Suykens and Vandewalle (1999), has been a promising tool in the fields of character recognition, signal processing and stock action prediction, etc. (Baylar et al., 2009, Hanbay, 2009, Kumar and Gopal, 2009 and Shin et al., 2005) in the last decade. Model selection is a key issue in LS-SVM applications. There are two hyper-parameters, i.e. regularization parameter and kernel parameter, in LS-SVM model. In general, the cross validation (CV) errors are widely used as the generalization performance measure to control the selection of hyper-parameters of regression model (Duan, Keerthi, & Poo, 2003). Two typical approaches of this method are 5-fold CV and leave-one-out (LOO) methods. Though these methods have advantages such as simplicity, reliability, etc., they are computationally expensive. Chang and Lin (2005) derived theoretically various LOO bounds for ε − SVR which is introduced by Vapnik (1995) and discussed continuity and differentiability of LOO bounds, e.g. Radius-Margin Bound and Span Bound, using L2 − SVR. These bounds are easy to compute and these two properties supply a theoretical foundation to select optimal parameters using the gradient-based optimization methods. But the model selection for LS-SVM, especially for LS-SVM regression (LS-SVR) problem, has not been studied widely. One of the reasons is replacing the quadratic programming with equality constraints to solving a set of linear equations in LS-SVM. As a result, the existing analytical bounds of classical support vector machines (SVMs) cannot be adopted directly for LS-SVMs. Guo, Yang, Wu, Wang, and Liang (2008) utilized 5-fold CV to carry out model selection of LS-SVM classifications, but high computational cost is unavoidable. Cawley and Talbot (2007) used matrix manipulation to derive a closed form of LOO bound and introduced skillfully the Bayesian method to solve the potential over-fitting problem produced by LOO method. This bound is just a byproduct of single training procedure. Therefore, LOO bound is a simple and mathematically tractable criterion with very low computational expense. However, the effectiveness of this method is tested only for the classification problems and no recommendations are made in the literature on how the LOO bound can be applied to LS-SVM regression. Because the task is to pursue the best generalization performance, model selection can be essentially considered as a single-objective global optimization problem where the generalization performance measure performs as fitness function. In practice, the widely used strategy is grid search over the whole parameter space. It is unavoidably time consuming since this method requires retraining the model many times under different parameter settings. Another strategy is to minimize the theoretical error bounds using iterative gradient descent techniques. This strategy needs the bound functions are differentiable. Although less computational burden is needed, the choice of initial values will heavily affect the final results. The intelligence algorithms represented by genetic algorithm (GA) (Michalewicz, 1996) and particle swarm optimization (PSO) (Kennedy & Eberhard, 1995) can find the global optima effectively, therefore they have been successfully applied to model selection (Avci, 2009 and Guo et al., 2008). However, these heuristic algorithms usually fall into premature of the population when solving complex optimization problems and thus obtain results with low precision. Therefore, the performance of model selection could be improved after the deficiency discussed above has been corrected. As a new optimization method based on the model of complex small-world network, the small-world optimization algorithm (SWA) (Du, Shao, & Feldman, 2006) can find the global optima more effectively. However, the traditional small-world optimization algorithms, proposed in Du, Wu, and Zhuang (2006) and Wang, Yeung, and Lochovsky (2008), adopt binary encoding strategy. In Du and Shao et al. (2006), the decimal-coding SWA needs to be converted from decimal to binary every single time the cost function is calculated. The decimal encoding strategy can increase precision as there will be no loss in resolution during coding, and the problem space can be searched more thoroughly and rigorously. Michalewicz (1996) proved that the performance of decimal encoding strategy is better than binary encoding no matter from the perspective of convergence speed or result’s accuracy. Another advantage of decimal encoding is the gradient information could be utilized directly. In this paper, a new decimal-coding small-world optimization algorithm based on tabu search, called the tabu-based small-world optimization (TSWO), is proposed by employing tabu search to construct local search operator. Compared with GA and PSO, TSWO has better performance of global optimization and further provide the possibility of adopting gradient information which has been computed in Chang and Lin (2005) for ε − SVR and ( Cawley, 2006) for LS-SVR, respectively. Therefore, TSWO is better suited to model selection. This paper is organized as follows. In Section 2, we provide a brief review to traditional SWA and tabu search algorithm. In Section 3, the idea and design of TSWO are elaborated in detail. Adopting LOO bound proposed in Cawley (2006) as the fitness function of TSWO, a new model selection method based on TSWO for LS-SVR is presented in Section 4. Experimental results on multimodal function and benchmark regression data sets are then presented in Section 5, followed by a conclusion of the paper in last section.
نتیجه گیری انگلیسی
In this paper, a new model selection method based on small-world strategy is proposed for LS-SVM regression. As typical selection criteria exhibit multiple local optima (e.g., see Fig. 5 and Friedrichs & Igel, 2005), tabu-based small-world optimization, whose global performance of optimization has been tested on typical multimodal functions in Section 5.1, is generally better suited for LS-SVR model selection than common evolutionary algorithms. Adopting Eq. (5) which is verified as an efficient performance measure in Section 5.2, LS-SVR model selection can be well performed with small generalization errors. TSWO plays an important role in the model selection for LS-SVR. Metaphorically speaking, the idea of TSWO is very similar to the common modal of pursuing criminals in social life. At initial stage, the pursuing starts from several cities (multiple initial candidates) according to clues. Meanwhile, polices perhaps move to another city to continue pursuing (random long connection). If a suspect cannot be determined, then postpone (tabu) it and find others. The range to pursue should be reduced (decrease l) gradually until the criminal is caught or the cost of pursuing is too high to bear (stopping criteria). From this interested metaphor, TSWO emphasizes the local search ability to find the global optimal solution just like SWA presented in Du and Wu et al. (2006). Meanwhile, the main differences between genetic algorithm and small-world optimization have been discussed in Du and Shao et al. (2006) in detail. A future study is to introduce the gradient information to speed up the optimization process. Moreover, the multi-objective optimization based on small-world phenomenon is also interesting.