رگرسیون بردار پشتیبانی کوتاه شده مقاوم
|کد مقاله||سال انتشار||مقاله انگلیسی||ترجمه فارسی||تعداد کلمات|
|25264||2010||8 صفحه PDF||سفارش دهید||محاسبه نشده|
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Expert Systems with Applications, Volume 37, Issue 7, July 2010, Pages 5126–5133
In this paper, we utilize two ε-insensitive loss functions to construct a non-convex loss function. Based on this non-convex loss function, a robust truncated support vector regression (TSVR) is proposed. In order to solve the TSVR, the concave–convex procedure is used to circumvent this problem though transforming the non-convex problem to a sequence of convex ones. The TSVR owns better robustness to outliers than the classical support vector regression, which makes the TSVR gain advantages in the generalization ability and the number of support vector. Finally, the experiments on the synthetic and real-world benchmark data sets further confirm the effectiveness of our proposed TSVR.
The support vector machine (SVM) (Burges, 1998, Cristianini and Shawe-Taylor, 2000, Schölkopf and Smola, 2002 and Vapnik, 1995), which is based on statistical theory and structural risk minimization principle, proposed by Vapnik and his group enjoys successful real-life applications such as classification and regression. As a state-of-the-art technique, SVM has better generalization ability compared with other machine learning methods like artificial neural network (ANN). However, for SVM, the computational complexity is expensive in the training stage, which needs to solve a quadratic programming problem, and its solution only depends on a small subset of the training samples, viz. the set of support vectors. For classification problem, SVM has already obtained good results, and in the regression domain, there exists a corresponding machine called support vector regression (SVR), whose training computational cost is as expensive as SVM’s, i.e., O(N3), where N is the ensemble number of the training samples. To circumvent their problems, so far, many convex optimization algorithms, such as Chunking ( Osuna, Freund, & Girosi, 1999), SMO ( Platt, 1999 and Shevade et al., 2000), SVMlight ( Joachims, 1999), SVMTorch ( Collobert & Bengio, 2001), and software LIBSVM ( Chang & Lin, 2001), have been proposed to accelerate the training speed by optimizing a small subset of the variables in the dual during the iteration procedure. Generally, most people study the SVM from a geometrical viewpoint, i.e., SVM is a large margin based classifier for binary classification. It seeks a linear hyperplane to separate the training samples as largely as possible in the high dimensional feature space. Thereby, the large margin helps SVM improve its generalization ability. This view is familiar to us. However, there exists another viewpoint to explain SVM, that is, it fits the regularization learning framework of loss + penalty with the hinge loss ( Evgeniou et al., 2000 and Wahba, 1999). In the regularization learning framework, the loss function is utilized to minimize the empirical error so that the resultant model can fit the training samples well. The penalty term is used to regularize the resulting model so as to escape from overfitting. Actually, SVR can be also interpreted from the regularization learning framework. For SVR, the penalty term, which is used to control the model complexity, is the same as that of SVM, while SVR selects the ε-insensitive function to construct the loss term as a surrogate of the hinge loss for SVM. If we distinguish between SVM and SVR only from this point-of-view namely loss + penalty, then we can understand that the loss terms lead to their differences. Practically, apart from the hinge loss in SVM and the ε-insensitive loss in SVR, the loss functions play a significant part in supervised learning, which construct different optimization objectives so as to discriminate different learning algorithms, for example, log loss for logistic regression, exponential loss for Boosting, least squares loss for ridge regression, and so on. Overall these loss functions are all convex which own the application and theory advantages over non-convex ones, because they can be easily used, efficiently computed, and conveniently analysed. However, non-convex loss functions have superiority to convex ones in generalization performance, robustness and scalability, and they have attracted much attention in recent years. Shen, Tseng, Zhang, and Wong (2003) firstly proposed ψ-learning and indicated that non-convex loss functions yield fast convergence rates to the Bayes limit, but Steinwart and Scovel (2004) revealed that SVMs can achieve comparable convergence rates with the hinge loss under a lot of similar assumptions. Subsequently, Liu et al., 2005 and Liu and Shen, 2006 profoundly studied the use of ψ loss, and Collobert, Sinz, Weston, and Bottou (2006) shown that the non-convexity can provide scalability advantages over convexity. Recently, Wu and Liu (2007) proposed a robust truncated hinge loss SVM, which is demonstrated to be more robust to outliers and to deliver more accuracy classifiers. However, these studies only focus on the classification domain, whereas there are few reports on regression domain. Hence, inspired by these studies, in our paper, a robust truncated support vector regression (TSVR) is proposed, which is based on a truncated ε-insensitive loss function. The TSVR cannot only enhance prediction accuracy but also curtail the number of support vectors to shorten prediction time in a way through discarding the outliers existing in the training set. Because the sophisticated convex optimization algorithms are not used to deal with the non-convex optimization problem of the TSVR, thus the concave–convex procedure (CCCP) ( Yuille & Rangarajan, 2003) is employed to transform the non-convex optimization to a sequence of convex ones, which can be solved using the sophisticated convex optimization techniques. Finally, experiments on the synthetic and real-world benchmark data sets confirm the effectiveness, namely robustness to outliers, of our proposed TSVR. In this section, we will introduce the organization of our paper. In Section 2, the classical SVR is introduced. In the following, we construct a non-convex loss function. Based on this non-convex loss function, the truncated SVR is proposed in Section 4. Section 5 gives the convergence and computational complexity of the proposed TSVR. To confirm the effectiveness, we utilize the synthetic and real-world benchmark data sets to test the TSVR in Section 6. Finally, the conclusions follow.
نتیجه گیری انگلیسی
In our paper, a robust TSVR with two walls, namely the inner and the outer wall, is proposed. When we set the outer wall infinitely, the TSVR become a classical SVR. In other words, the classical SVR is only a special case of the TSVR. Because the TSVR is based on the non-convex loss function, the sophisticated convex optimization techniques are not utilized. Hence, the concave–convex procedure is employed to transform the non-convex optimization to a sequence of convex ones, so the convex optimization technique can be used. To confirm the effectiveness of the TSVR, two synthetic data sets are used to test the TSVR. Through comparing the TSVR with the classical SVR, we find that TSVR owns better robustness to outliers than the classical SVR. That is to say, the proposed TSVR has advantages over the classical SVR in the generalization performance and the number of support vector. The experiments on the real-world benchmark data sets are further favorable for our viewpoint. Our experiments are performed on the small data sets due to using the traditional quadratic programming technique, namely the active set method. It is our future work that how the efficient methods like SMO or other methods are used for the TSVR on large data sets.