رگرسیون بردار پشتیبانی لاگرانژی از طریق به حداقل رساندن محدب نامحدود
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
26140 | 2014 | 13 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Neural Networks, Volume 51, March 2014, Pages 67–79
چکیده انگلیسی
In this paper, a simple reformulation of the Lagrangian dual of the 2-norm support vector regression (SVR) is proposed as an unconstrained minimization problem. This formulation has the advantage that its objective function is strongly convex and further having only mm variables, where mm is the number of input data points. The proposed unconstrained Lagrangian SVR (ULSVR) is solvable by computing the zeros of its gradient. However, since its objective function contains the non-smooth ‘plus’ function, two approaches are followed to solve the proposed optimization problem: (i) by introducing a smooth approximation, generate a slightly modified unconstrained minimization problem and solve it; (ii) solve the problem directly by applying generalized derivative. Computational results obtained on a number of synthetic and real-world benchmark datasets showing similar generalization performance with much faster learning speed in accordance with the conventional SVR and training time very close to least squares SVR clearly indicate the superiority of ULSVR solved by smooth and generalized derivative approaches.
مقدمه انگلیسی
Support Vector Machines (SVMs) are state-of-the-art machine learning techniques based on Vapnik and Chervonenkis structural risk minimization principle (Vapnik, 2000). They are computationally powerful tools showing better generalization ability compared to other machine learning methods on a wide variety of real-world problems like face detection (Osuna, Freund, & Girosi, 1997), gene prediction (Guyon, Weston, Barnhill, & Vapnik, 2002), text categorization (Joachims, Ndellec, & Rouveriol, 1998) and image segmentation (Chen & Wang, 2005). It is well known that SVM formulation (Cristianini and Shawe-Taylor, 2000 and Vapnik, 2000) can be posed as a quadratic programming problem (QPP) with linear inequality constraints which is a convex programming problem having a unique optimal solution. Although SVM has been initially proposed for classification, later it has been extended to regression (Vapnik, 2000). The function approximation by regression is of great importance in many fields of research such as bioinformatics, control theory, economics, information science and signal processing. The main challenge in developing a useful regression model is to capture accurately the underlying functional relationship between the given inputs and their output values. Once the resulting model is obtained it can be used as a tool for analysis, simulation and prediction. For a given set of data points, support vector regression (SVR) aims at determining a linear regressor where the input data are either considered in the original pattern space itself or taken into a higher dimensional feature space. With the introduction of εε-insensitive error loss function proposed by Vapnik (2000), SVR has emerged as a powerful paradigm of choice for regression because of its improved generalization performance in comparison to other popular machine learning methods like artificial neural networks. Considering the minimization of the square of 2-norm of the slack variables instead of the usual 1-norm and maximizing the margin with respect to both the orientation and the relative location to the origin of the bounding parallel planes, Mangasarian and Musicant, 2001a and Mangasarian and Musicant, 2001b studied “equivalent” SVM formulations for classification problems resulting in positive-definite dual problems having only non-negative constraints. For the formulation of SVM as an unconstrained optimization problem whose objective function becomes not twice differentiable, a smoothing technique is adopted to derive a new SVM formulation called Smooth SVM (SSVM) in Lee and Mangasarian (2001). As its extension to εε-insensitive error loss based SVR, termed Smooth SVR (SSVR), we refer the reader to Lee, Hsieh, and Huang (2005). Also, on a finite Newton method of solution for SVM for classification with generalized Hessian approach, see Fung and Mangasarian (2003) and further on its extension to SVR, the interested reader is referred to Balasundaram and Kapil (2011). In addition, for the extension of the Active set SVM (ASVM) proposed for the classification (Mangasarian & Musicant, 2001b) to SVR, we refer Musicant and Feinberg (2004). For a review on optimization techniques used for training SVMs, see Shawe-Taylor and Sun (2011). Finally on the interesting work on multitask learning and multiview learning methods as extension of kernel based methods, we refer Ji and Sun (2013), Sun (2011) and Xie and Sun (2012). It is well-known that the conventional εε-insensitive SVR is formulated as a convex quadratic minimization problem having 2m2m nonnegative variables and 2m2m linear inequality constraints whereas 2-norm SVR introduces only 2m2m linear inequality constraints, where mm is the number of training points. In either case, more variables and constraints enlarge the problem size and therefore increase in the computational cost for solving the regression problem. Recently, a new nonparallel plane regressor termed as twin SVR (TSVR) is proposed in Peng (2010a) wherein a pair of QPPs, each having mm number of constraints, is solved instead of solving a single QPP having 2m2m constraints as in the conventional SVR which makes TSVR works faster than SVR. In our approach, the 2-norm SVR in dual is reformulated as a single, unconstrained minimization problem having only mm variables. We term such reformulation as unconstrained Lagrangian SVR (ULSVR). The proposed ULSVR model can be solved using gradient based methods. However, since the objective function contains a term having non-smooth ‘plus’ function, two approaches are taken to solve the minimization problem: (i) two smooth approximation functions, studied in Lee and Mangasarian (2001) and Peng (2010b), are introduced and their slightly modified unconstrained minimization problems are solved; (ii) directly solve the minimization problem by applying generalized derivative. In both the approaches, by applying simple iterative methods, the zeros of the gradient are computed. Finally, for the study of Lagrangian and implicit Lagrangian SVR solved using block matrices with the advantage of reduced training cost, see the work of Balasundaram and Kapil, 2010 and Balasundaram and Kapil, 2011. The effectiveness of the proposed approach in formulating ULSVR problem and solving it using various iterative methods is demonstrated by performing numerical experiments on a number of synthetic and real-world benchmark datasets and comparing their results with conventional SVR and least squares SVR. Comparable generalization performance in solving ULSVR with less computational training time clearly indicates its suitability on problems of interest. The paper is organized as follows. In Section 2, formulations of the conventional and 2-norm SVR are introduced. Section 3 briefly dwells on least squares SVR. The proposed ULSVR formulation having only mm variables and various algorithms used for solving it are described in Section 4. On a number of synthetic and real-world datasets numerical experiments are performed and the results obtained by ULSVR are compared with SVR and least squares SVR in Section 5. Finally we conclude our work in Section 6. In this work, all vectors are considered as column vectors. For any two vectors x,yx,y in ℜnℜn, their inner product will be denoted by xtyxty where xtxt is the transpose of xx. When xx is orthogonal to yy we write x⊥yx⊥y. The 2-norm of a vector xx will be denoted by, ||x||||x||. For x=(x1,…,xn)t∈ℜnx=(x1,…,xn)t∈ℜn, the plus function x+x+ is defined as: (x+)i=max{0,xi}(x+)i=max{0,xi}, where i=1,…,ni=1,…,n. Further we define the step function x∗x∗ as: (x∗)i=1(x∗)i=1 for xi>0xi>0, (x∗)i=0(x∗)i=0 if xi<0xi<0 and (x∗)i=0.5(x∗)i=0.5 when xi=0xi=0. The identity matrix of appropriate size is denoted by II and the diagonal matrix of order nn whose diagonal elements become the components of the vector x∈ℜnx∈ℜn by diag(x) . The column vector of ones of dimension mm is denoted by ee. If ff is a real valued function of the variable x=(x1,…,xn)t∈ℜnx=(x1,…,xn)t∈ℜn then its gradient vector and Hessian matrix are defined by, View the MathML source∇f=(∂f∂x1,…,∂f∂xn)t and ∇2f=(∂2f/∂xi∂xj)i,j=1,…,n∇2f=(∂2f/∂xi∂xj)i,j=1,…,n respectively.
نتیجه گیری انگلیسی
By a simple reformulation of the Lagrangian dual of SVR, a novel SVR formulation is proposed in this work as an unconstrained minimization problem with the key advantage being the number of unknown variables is equal to the number of input data points. It is further proposed to solve the unconstrained minimization problem using iterative algorithms derived by considering smoothing and generalized derivative approaches. Unlike solving a quadratic programming problem of SVR, all the iterative algorithms lead to solving a system of linear equations. They can be easily coded in MATLAB. Experiments were performed on a number of synthetic and real-world datasets using linear and Gaussian kernels. Comparable generalization performance is achieved and often they are better than those of the conventional SVR and least squares SVR. Further, it has been shown empherically that the iterative algorithms are much faster than the conventional SVR and cost nearly the same as of least squares SVR. In summary, very simple MATLAB coding, comparable accuracy and much faster learning speed indicate the superiority of our novel formulation solved by the iterative algorithms.