Download ISI English Article No. 25881
Article Title

An adaptive support vector regression based on a new sequence of unified orthogonal polynomials
Article Code: 25881
Publication Year: 2013
English Article Length: 15 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Pattern Recognition, Volume 46, Issue 3, March 2013, Pages 899–913

Keywords

Chebyshev polynomials, Kernel function, Adaptable measures, Small sample, Generalization ability

English Abstract

In practical engineering, small-scale data sets are usually sparse and contaminated by noise. In this paper, we propose a new sequence of orthogonal polynomials that vary with their coefficient, the unified Chebyshev polynomials (UCP), which have two important properties, namely orthogonality and adaptivity. Based on these new polynomials, a new kernel function, the unified Chebyshev kernel (UCK), is constructed and proven to be a valid SVM kernel. To find the optimal polynomial coefficient and the optimal kernel, we propose an adaptive algorithm based on an evaluation criterion for the adaptive ability of UCK. To evaluate the performance of the new method, we applied it to learning several benchmark regression data sets and compared it with three other algorithms. The experimental results show that the proposed adaptive algorithm has excellent generalization performance and prediction accuracy, and does not cost more time than other SVMs. Therefore, this method is suitable for practical engineering applications.
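As a concrete illustration of how a custom kernel of this kind plugs into a support vector regressor, the sketch below fits scikit-learn's SVR with a user-supplied Gram-matrix callable. Since UCK is defined in the paper itself, a Gaussian kernel serves as a stand-in here, and the toy data set and parameter values are likewise illustrative.

import numpy as np
from sklearn.svm import SVR

def gram(X, Z, sigma=1.0):
    # Gram matrix of a Gaussian kernel; a stand-in for UCK, whose
    # definition is given in the paper's Section 2.
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 1))  # small, noisy sample
y = np.sinc(2.0 * X).ravel() + 0.05 * rng.standard_normal(40)

model = SVR(kernel=gram, C=10.0, epsilon=0.01).fit(X, y)
print(model.predict(X[:5]))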

English Introduction

Support vector regression (SVR), a support vector machine (SVM) for regression, has been widely applied in fields such as machinery fault diagnosis, dynamic environmental forecasting, and earthquake prediction. Based on the VC dimension and structural risk minimization [1], it can handle practical problems such as sparsity, nonlinearity, and high dimensionality. However, in practical applications, the training data sets in some important fields, such as telemetry data of rockets and missiles and data from human vaccine trials, are sparse, small, and contaminated by noise. This may decrease the generalization ability and prediction accuracy of the algorithm.

One way to address this problem is to improve the algorithm structure. Recently, several new SVM learning algorithms have been proposed to obtain more powerful generalization ability. Least squares support vector machine (LSSVM) [2] and [3], for example, clearly shortens training time because equality constraints replace inequality constraints during the modeling process [4]. However, it cannot solve sparsity problems and may increase prediction time because of the large number of support vectors [5]. Other algorithms, such as the proximal support vector machine (PSVM) [6], fuzzy support vector machine (FSVM) [7], possibility support vector machine (PSVM) [8], and semi-supervised support vector machine [9], all have certain shortcomings that limit their application in practical engineering projects.

Another idea is to find an excellent kernel function or hybrid kernel function for the algorithm and optimize it so as to obtain higher generalization ability. The most common kernel functions (sketched in code below) include the Gaussian kernel $K(x,z)=\exp(-\|x-z\|^2/2\sigma^2)$ [1], the polynomial kernel $K(x,z)=((\langle x,z\rangle+1)/\beta)^n$ [1], the sigmoid kernel $K(x,z)=\tanh(a\langle x,z\rangle+r)$ [1], and the wavelet kernel $K(x,z)=\prod_{j=1}^{m}\cos(1.75(x_j-z_j)/a)\exp(-(x_j-z_j)^2/2a^2)$ [10], where $\sigma$, $\beta$, $a$, and $r$ are kernel parameters. These kernels yield the inner product of two given vectors in a high-dimensional feature space where all input data can be linearly separated, without the need for an explicit transformation function $\Phi(\cdot)$ [1]. However, the adaptive ability of any single kernel function is limited, resulting in good prediction accuracy on some data sets but poor accuracy on others. In addition, as the number of kernel parameters in a hybrid kernel function increases, parameter optimization and weight selection require a high-efficiency mechanism, for instance, boosting algorithms [11] and [12], semi-definite programming (SDP) [13], quadratically constrained quadratic programming (QCQP) [14], semi-infinite linear programming (SILP) [15] and [16], hyperkernels [17], the simple multiple kernel learning (SimpleMKL) method [18], and kernel target alignment [19], [20] and [21]. In practice, this task is quite complex and may reduce the efficiency of the algorithm.

In this paper, we propose a kernel function based on a sequence of orthogonal polynomials varying with their coefficient, and design an adaptive algorithm for optimizing the parameters of the kernel function. Our aim is to improve the generalization ability of the algorithm while avoiding complicated parameter optimization and weight selection in the learning process. The first issue we are concerned with is finding a new sequence of orthogonal polynomials.
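For reference, the four standard kernels listed above translate directly into NumPy; this is a minimal sketch with arbitrary default parameter values.

import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2.0 * sigma ** 2))

def polynomial_kernel(x, z, beta=1.0, n=3):
    # K(x, z) = ((<x, z> + 1) / beta)^n
    return ((np.dot(x, z) + 1.0) / beta) ** n

def sigmoid_kernel(x, z, a=1.0, r=0.0):
    # K(x, z) = tanh(a <x, z> + r)
    return np.tanh(a * np.dot(x, z) + r)

def wavelet_kernel(x, z, a=1.0):
    # K(x, z) = prod_j cos(1.75 (x_j - z_j) / a) exp(-(x_j - z_j)^2 / (2 a^2))
    d = np.asarray(x) - np.asarray(z)
    return np.prod(np.cos(1.75 * d / a) * np.exp(-d ** 2 / (2.0 * a ** 2)))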
As is well known, Chebyshev polynomials, a sequence of orthogonal polynomials, have been widely applied in many fields [22] and [23], and have attracted wide attention in machine learning due to their orthogonality. Ye et al. constructed an orthogonal Chebyshev kernel based on the first-kind Chebyshev polynomials [22]. Sedat et al. further proposed a generalized Chebyshev kernel [23] and [24], which extended the orthogonal Chebyshev kernel from single-variable input to vector input, and a modified Chebyshev kernel [25], which replaced the weighting function with an exponential function to better capture the nonlinearity along the decision surface. Compared with traditional kernels, these kernels endow the algorithm with better generalization ability and prediction accuracy, and enable it to create fewer support vectors. However, the key problem in small-sample regression is essentially how the kernel function and the learning algorithm correctly capture the distribution characteristics of small-sample data sets. If only kernels derived from the first-kind Chebyshev polynomials are used, their limited adaptive ability may result in poor prediction accuracy on some data sets. Considering this, we propose a new sequence of unified polynomials based on both the first- and second-kind Chebyshev polynomials, the unified Chebyshev polynomials (UCP), which vary with their coefficient. Once the polynomial coefficient changes, the mathematical formulation of the polynomials changes correspondingly (see the recurrence sketch following this outline). By means of this property, the kernel function constructed from UCP can be applied to different kinds of data sets. The next issue is how to change the polynomial coefficient so as to adapt the kernel to different data sets and obtain higher generalization performance. To this end, an evaluation criterion for the adaptive capability of the proposed kernel and an adaptive algorithm based on a Riemannian manifold are proposed.

The remainder of this paper is organized as follows. Section 2 briefly describes Chebyshev polynomial kernels, introduces the proposed unified polynomials, presents the UCP kernel, and proves its validity. In Section 3, we propose two adaptive measures and their calculation formulas, construct the evaluation criterion for the adaptive capability of the kernel, and design an adaptive algorithm based on the proposed kernel and this criterion. In Section 4, the role of the polynomial coefficient in the proposed kernel is evaluated, and the proposed kernel and the adaptive algorithm are tested on benchmark data sets. The results of the experiments are discussed in Section 5, while conclusions and possible future work are reported in Section 6. Detailed proofs of the orthogonality of the proposed polynomials and of some proposed theorems are provided in the Appendix.
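The exact UCP construction appears in the paper's Section 2 and is not reproduced on this page. As background for the recurrence sketch referenced above: the first- and second-kind Chebyshev polynomials satisfy the same three-term recurrence, $P_{n+1}(x)=2xP_n(x)-P_{n-1}(x)$ with $P_0(x)=1$, and differ only in the first-degree term ($T_1(x)=x$ versus $U_1(x)=2x$), so a single coefficient already suffices to blend the two families. The sketch below is a hypothetical illustration of that idea, not the paper's definition.

def chebyshev_like_sequence(x, degree, a=1.0):
    # Shared recurrence of both Chebyshev kinds: P_{n+1} = 2x P_n - P_{n-1},
    # with P_0 = 1. Setting P_1 = a*x recovers the first kind at a = 1 and
    # the second kind at a = 2; intermediate a blends the two families.
    # (Illustrative only; the paper's UCP is defined in its Section 2.)
    values = [1.0, a * x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]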

English Conclusion

In this paper, we propose a new sequence of polynomials constructed by unifying the two kinds of Chebyshev polynomials, the unified Chebyshev polynomials (UCP). The proposed polynomials have two notable properties, orthogonality and adaptivity. Orthogonality guarantees minimum attribute redundancy in the feature space [22], [23], [24] and [25], allows the kernel to represent the sample space concisely, and reduces the complexity of the kernel matrix and the computation. Adaptivity means that the mathematical formulation of the polynomials changes along with the polynomial coefficient a. We also propose a new kernel, the unified Chebyshev kernel (UCK), based on the new polynomials. Due to the adaptivity of UCP, the form of UCK can be changed to suit different data sets, giving the algorithm better generalization ability. To find the optimal coefficient a of UCK, an evaluation criterion for the adaptive ability of the kernel and an adaptive algorithm based on UCK (ASVM) are proposed, built on a Riemannian metric. Moreover, we have proved that the width measure among the adaptive measures can change the between-class distance and that the inclination measure can change the in-class distance. The experimental results demonstrate that UCK is an excellent kernel and that ASVM has good generalization performance; together, they improve the performance of the regression model. The greatest advantage of the algorithm is that it needs fewer parameters to adapt the kernel to the sample space than the other methods mentioned in Section 1, so the proposed method requires only a small computational cost. We have detected a few unusual points in our experiments, suggesting that the tube bandwidth ε may undermine the effect of the adaptive measures on the generalization ability of ASVM. In our future work, we will further study the relationship between the tube bandwidth ε and the adaptive measures.
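For intuition about the Riemannian-metric machinery behind such adaptive measures: a standard construction (Amari and Wu's kernel geometry, used here as an assumption, since the paper's width and inclination measures are defined in its Section 3) takes the metric a kernel induces on input space to be $g_{ij}(x)=\partial^2 K(x,z)/\partial x_i\,\partial z_j$ evaluated at $z=x$. The finite-difference sketch below estimates this induced metric, with a Gaussian kernel as a stand-in.

import numpy as np

def induced_metric(kernel, x, h=1e-3):
    # Central-difference estimate of g_ij(x) = d^2 K(x, z) / dx_i dz_j at z = x,
    # the Riemannian metric the kernel induces on input space.
    d = len(x)
    e = np.eye(d)
    g = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            g[i, j] = (
                kernel(x + h * e[i], x + h * e[j])
                - kernel(x + h * e[i], x - h * e[j])
                - kernel(x - h * e[i], x + h * e[j])
                + kernel(x - h * e[i], x - h * e[j])
            ) / (4.0 * h * h)
    return g

gauss = lambda u, v, s=1.0: np.exp(-np.sum((u - v) ** 2) / (2.0 * s * s))
print(induced_metric(gauss, np.zeros(2)))  # approximately identity / sigma^2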