خطاهای آموزنده از رگرسیون بردار پشتیبانی برنامه ریزی خطی
|کد مقاله||سال انتشار||مقاله انگلیسی||ترجمه فارسی||تعداد کلمات|
|25244||2011||9 صفحه PDF||سفارش دهید||محاسبه نشده|
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Applied Mathematical Modelling, Volume 35, Issue 4, April 2011, Pages 1820–1828
In this paper, we give several results of learning errors for linear programming support vector regression. The corresponding theorems are proved in the reproducing kernel Hilbert space. With the covering number, the approximation property and the capacity of the reproducing kernel Hilbert space are measured. The obtained result (Theorem 2.1) shows that the learning error can be controlled by the sample error and regularization error. The mentioned sample error is summarized by the errors of learning regression function and regularizing function in the reproducing kernel Hilbert space. After estimating the generalization error of learning regression function (Theorem 2.2), the upper bound (Theorem 2.3) of the regularized learning algorithm associated with linear programming support vector regression is estimated.
The main aim of this paper is the error analysis for the linear programming support vector regression (SVR) problem in leaning theory. To this end, this paper is organized as follows. We start by giving a brief introduction of the basic techniques of support vector machine (SVM) or SVR in Section 1, including the history, motivation, most important publications and some corresponding results with this paper. Moreover, a brief review of linear programming support vector regression (LP-SVR) and quadratic programming one (QP-SVR) in the reproducing kernel Hilbert space is presented in Section 2. Learning error analysis of LP-SVR is given in Section 3. In Section 4, we plus a short conclusion and summary with our research work, also, some future works are attached. First of all, we briefly describe the historic background of support vector learning algorithm (SVLA). The SVLA is a nonlinear generalization of the generalized portrait algorithm developed in the sixties (see,  and ). As such, the SVLA is firmly grounded in the framework of statistical learning theory. In 1995, Cortes and Vapnik (see, ) proposed a learning network named support-vector network. Originally, it is a learning machine for two-group classification problems. One of the most important idea is that input vectors had been non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface was constructed. They showed that the training data (separable and non-separable) could be separated without errors. After that, this learning network became more and more popular and renamed as SVM. Today, SVM has become an important subject in learning theory and has evolved into an active area of research. Mathematically, SVM is from a pattern classification problem based on a given classification of m points (x1, x2, … , xm) in the n -dimensional space RnRn, represented by an m × n matrix A = (x1, x2, … , xm)T, given the membership of each data point xi, i = 1, 2, … , m in the classes 1 or −1 as specified by a given m × m diagonal matrix D with 1 or −1 diagonals. Primarily, SVM (see, ) is given by the following quadratical programming with linear inequalities constraints equation(1) View the MathML sourcemin(α,b)∈Rn+112‖α‖2+12‖y‖2,s.t.D(Aα+be)⩾e+y,y⩾0. Turn MathJax on Model (1) can be seen as the original model of QP-SVM. Here, α is a vector of separator coefficients (direction vector of classification hyperplane), b is an offset (the control parameter of the distance of hyperplane plane to the origin) and e∈Rme∈Rm stands for a vector of ones. The decision function of classification is given by equation(2) f(x)=sign(αTx+b).f(x)=sign(αTx+b). Turn MathJax on By now, many different forms of QP-SVM (1) have been introduced for different purposes (see ). In this work, we mainly pay our attention to the learning error or the convergence rate of the proposed algorithm. For the convergence rate of QP-SVM (1), there are many works. We refer the readers to Steinart , Zhang , Wu and Zhou , Zhao and Yin , Hong , and Wu et al. . Among of forms of SVM, the LP-SVM is important because of its linearity and flexibility for large data setting. Many authors have introduced the LP-SVM. We refer the readers to Bradley and Mangasarian , Kecman and Hadaic , Niyogi and Girosi , Pedroso and Murata  and Vapnik . Its primal optimization model is as follows equation(3) View the MathML sourcemin(λ,y)∈R2m1meTλ+1CeTy,s.t.D(AATλ+be)⩾e-y,λ⩾0,y⩾0. Turn MathJax on The trade-off factor C = C(m) > 0 depends on m and is crucial. Many experiments demonstrate that LP-SVM is efficient and perform even better than QP-SVM for some purposes: capable of solving huge sample size problems (see, ), improving the computational speed (see, ), and reducing the number of support vectors (see, ). However, little is known for the learning error or the convergence of the LP-SVM. We only find that a classification problem of LP-SVM was studied in Wu and Zhou . The primal goal of this paper is to address in the investigation of regression problem and to provide the error analysis for linear programming support vector regression (LP-SVR).
نتیجه گیری انگلیسی
In reproducing Hilbert spaces, with the covering number, a new upper bound for estimating learning errors of linear programming support vector regression has been presented in this paper. This errors bound formulation has been shown that View the MathML sourceB(m,C)=2cstRsm11+s+2tcs′C(2-β)m+1m+C(1-β)m+C-β. Turn MathJax on Here m is the number of sample points, C is the trade-off who control the regulation item in the model of LP-SVR (7), s is the polynomial complexity exponent of the given reproducing Hilbert spaces, c s and View the MathML sourcecs′ are two constants who depend on s and can be estimated by Definition 2.3 and Lemma 2.3, t > 1 is a given constant and 0 < β ⩽ 1 is also a given constant. Observing the B (m , c ), we can see that there is gap between E(fz)E(fz) and E(fρ)E(fρ) because View the MathML sourcelimm→+∞B(m,c)=2tcs′Cβ. Turn MathJax on It means that the gap can not vanish no matter how to select the sample data points. Due to the difficulty of calculating the covering number, large body of work can not be done in the field of experiments. The authors think that it will be a very challenge future work.