روش سریع برای آموزش تقریبی رگرسیون بردار پشتیبانی سخت
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
25286 | 2010 | 10 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Neural Networks, Volume 23, Issue 10, December 2010, Pages 1276–1285
چکیده انگلیسی
The hard support vector regression (HSVR) usually has a risk of suffering from overfitting due to the presence of noise. The main reason is that it does not utilize the regularization technique to set an upper bound on the Lagrange multipliers so they can be magnified infinitely. Hence, we propose a greedy stagewise based algorithm to approximately train HSVR. At each iteration, the sample which has the maximal predicted discrepancy is selected and its weight is updated only once so as to avoid being excessively magnified. Actually, this early stopping rule can implicitly control the capacity of the regression machine, which is equivalent to a regularization technique. In addition, compared with the well-known software LIBSVM2.82, our algorithm to a certain extent has advantages in both the training time and the number of support vectors. Finally, experimental results on the synthetic and real-world benchmark data sets also corroborate the efficacy of the proposed algorithm.
مقدمه انگلیسی
As a state-of-the-art tool for classification and regression, the support vector machine (SVM) (Burges, 1998, Cristianini and Shawe-Taylor, 2000, Schölkopf and Smola, 2002 and Vapnik, 1995), which is built on the structural risk minimization principle that minimizes the upper bound of the generalization error which consists of training errors and a confidence interval term which is decided by the Vapnik–Chervonekis dimension, enjoys many successful applications to real-life fields, such as optimal control (Suykens, Vandewalle, & De Moor, 2001), image segmentation (Chen & Wang, 2005), time series prediction (Lau & Wu, 2008) and so on. Generally, there exist two stages for a good machine learning theory, for example, an artificial neural network (ANN), one of which is the model or frame selection, which includes the definition of the optimization problem and the parameter selection, i.e., what to do. Another is that what method or algorithm will be taken to solve the defined model or frame, i.e., how to do. However, these two stages are always dependent and not separate. Undoubtedly, SVM cannot escape from these two stages. Though very successful and familiar, there are shortcomings in each stage for SVM. In the first stage, the selection of parameters, including the kernel parameters and the regularization parameter, is still an open problem. There is no foolproof method to choose them before training. Usually, the cross validation (CV) technique is used to cope with this embarrassment, but the computational complexity is very expensive, even unaffordable. Another block is which optimization technique is employed to solve the constrained quadratic programming for training SVM in the second stage. Although the training SVM is solvable in principle, actually, a sophisticated optimization technique like the interior point method is ineffective to settle this intractable problem, especially for a large scale problem, because the training cost, viz. O(N3)O(N3), where NN is the size of the ensemble training samples, is huge. Over the past few years, many fast algorithms have been developed to accelerate the training of SVM. Altogether, these algorithms are grouped into two categories, one of which trains SVM in the dual, which is the familiar strategy to cope with the training problem. Firstly, Osuna, Freund, and Girosi (1997) proposed the chunking algorithm to speed up the SVM training, but the large number of support vectors limits its application. Hence, Joachims (1999) proposed an efficient algorithm, namely, SV Mlight. As a special case of SV Mlight, Platt (1999) selected two variables as the working set and presented the famous sequential minimal optimization (SMO) so that the analytical solution for subproblem is easily obtained. Subsequently, the SMO algorithm was extended to the regression realm (Flake and Lawrence, 2002 and Shevade et al., 2000). Based on these works, Chang and Lin (2001) developed SMO and encoded it as the well-known software, viz. LIBSVM2.82. Usually, it is very easy to give readers the impression that this is the only possible way to train an SVM. In fact, there exists another way to train an SVM, i.e., the training procedure can be processed in the primal. Fung and Mangasarian (2003), and Mangasarian (2002) presented a finite Newton method to train a linear SVM and revealed that it is rather effective. Furthermore, some appropriate tricks were introduced by Keerthi and DeCoste (2005) to modify and accelerate the finite Newton method. Chapelle (2007) proposed a recursive finite Newton method for nonlinear SVM and shown that an SVM can be solved in the primal as efficiently as the dual methods. Recently, Wang, Jia, and Li (2008) proposed a robust method to solve SVM in the primal. Bo, Wang, and Jiao (2007) presented a recursive finite Newton method for nonlinear support vector regression (SVR), and Zhao and Sun (2008) proposed a robust SVR in the primal with a non-convex loss function. Apart from these training algorithms to accelerate SVM/SVR in the dual or primal, some algorithms (Dong et al., 2005 and Tsang et al., 2005) exist to approximately train an SVM. No matter what algorithm is used to train a SVM/SVR, the aim is to accelerate the training procedure. As we know, SVM is proposed based on a geometrical viewpoint, i.e., SVM is a large margin based classifier for binary classification. It seeks a linear hyperplane to separate the training samples as much as possible in the high dimensional feature space. We are familiar with this viewpoint. However, there exists another viewpoint to interpret SVM, that’s to say, it matches the regularization learning frame View the MathML sourceloss+penalty (Evgeniou et al., 2000, Girosi, 1998 and Poggio and Smale, 2005) with the hinge loss function. Recently, the regularization learning frame has been generalized (Li et al., 2007 and Li, Leung, and Lee, 2007). Thus, besides the kernel parameters, the regularization parameter will be chosen to build an SVM, thus definitely increasing the computational complexity. Although the hard-margin support vector machine (HSVM) does not select the regularization parameter, it easily experiences overfitting in the presence of noise. To obtain a state-of-the-art performance as in the soft-margin support vector machine, viz. the so-called support vector machine, Bo, Wang, and Jiao (2008) proposed a greedy stagewise algorithm to cope with the overfitting of HSVMs without employing the regularization term. That is, the greedy stagewise strategy based HSVM (GS-HSVM) not only obtains a comparable generalization performance to the soft-margin support vector machine but also needs less computational complexity due to the absence of the regularization term. Enlightened by their work, we extend it to regression realm and propose a fast algorithm to approximately train a hard SVR (HSVR) using the greed stagewise strategy, named as GS-HSVR. This algorithm does not need to choose the extra regularization parameter and does to suffer from the overfitting due to the existing noise, so the computational complexity is clearly reduced, especially when using the CV technique to select an appropriate regularization parameter for SVR. As a matter of fact, the experimental results show that the early stopping rule of the GS-HSVR can control the capacity of the regression machine, which equivalently acts in an implicit regularization role. Moreover, the computational complexity of GS-HSVR is O(N|SV|⋅N)O(N|SV|⋅N), where N|SV|N|SV| is the number of support vectors. Even in the worst case, that is, all the training samples are chosen as support vectors, the computational burden is only O(N2)O(N2). Compared with the well-known solver, i.e., LIBSVM2.82, our algorithm to a certain extent has advantages in the training time and the number of support vectors (#SV), which is corroborated by the experiments on the synthetic and real-world benchmark data sets. In this paragraph, we will introduce the paper’s organization. In Section 2, the classical SVR, namely soft SVR, is depicted briefly, and the hard SVR is deduced. Subsequently, we use the greedy stagewise strategy to approximately train a hard SVR and propose the GS-HSVR algorithm. Experiments on synthetic and real-life benchmark data sets confirm the effectiveness of the GS-HSVR in Section 4. Finally, conclusions follow.
نتیجه گیری انگلیسی
In general, there are two problems in using SVR. The first is model selection, i.e., how to determine the appropriate parameters for the model. The second is the choice of optimization technique to solve the SVR. Although the hard SVR can reduce a parameter, viz. the regularization parameter, to alleviate the computational complexity, it runs the risk of overfitting due to the presence of noise. To mitigate this embarrassment, we have enhanced the greedy stagewise strategy to the regression realm and proposed an algorithm, named GS-HSVR, to approximately settle the optimization problem of the hard SVR. The GS-HSVR only allows the weights of the selected samples be updated once, thus avoiding them being magnified infinitely and effectively preventing overfitting from arising. Intuitively, this early stopping rule implicitly controls the capacity of feature space, which equivalently plays a regularization role in the optimization procedure. Furthermore, it causes our algorithm only to get an approximate solution. In addition, the computational complexity of GS-HSVR is only O(N|SV|⋅N)O(N|SV|⋅N). Compared with the excellent software LIBSVM2.82, GS-HSVR to a certain extent has advantages in both the training time and the number of support vectors. Moreover, GS-HSVR does not need extra regularization parameters, which can save a lot of time, especially when using the CV technique for the parameter selection. Experiments on synthetic and real-world benchmark data sets further confirm the effectiveness of this approximate algorithm, namely GS-HSVR, for hard SVR.