Adaptive weighted learning for linear regression problems via the Kullback–Leibler divergence
Article code | Publication year | English paper length |
---|---|---|
24639 | 2014 | 11-page PDF |
Publisher : Elsevier - Science Direct
Journal : Pattern Recognition, Volume 46, Issue 4, April 2013, Pages 1209–1219
English Abstract
In this paper, we propose adaptive weighted learning for linear regression problems via the Kullback–Leibler (KL) divergence. An alternating optimization method is used to solve the proposed model. Meanwhile, we theoretically demonstrate that the solution of the optimization algorithm converges to a stationary point of the model. In addition, we also fuse global linear regression and class-oriented linear regression and discuss the problem of parameter selection. Experimental results on face and handwritten numerical character databases show that the proposed method is effective for image classification, particularly in the case where the samples in the training and testing sets have different characteristics.
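To make the abstract concrete, the display below sketches one plausible form of such a model. It is our own illustrative reconstruction from the abstract, not the paper's exact objective; the regularization parameters λ and γ and the uniform reference distribution u are assumptions.

```latex
% Illustrative objective (an assumed reconstruction, not the paper's exact model):
% jointly learn regression coefficients \beta and a weight distribution p over the
% fitting residuals, with the KL divergence pulling p toward the uniform distribution u.
\min_{\beta,\; p \in \Delta_n}\;
\sum_{i=1}^{n} p_i \bigl( y_i - x_i^{\top}\beta \bigr)^2
\;+\; \lambda\, \mathrm{KL}(p \,\|\, u)
\;+\; \gamma\, \lVert \beta \rVert_2^2,
\qquad
\mathrm{KL}(p \,\|\, u) = \sum_{i=1}^{n} p_i \log \frac{p_i}{u_i}.
```

Under this reading, fixing p gives a weighted ridge regression in β, and fixing β gives a closed-form weight update; a very large KL penalty forces the weights back to uniform, which is consistent with the later remark that ridge regression is an extreme case of the method.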
English Introduction
Image representation and classification have found wide application in areas such as pattern recognition, computer vision and machine learning [1], [2] and [3] over the past years. One important characteristic of images is that their dimensionality is usually high. If images are represented in vector form, they can be considered as points in a very high-dimensional space. However, high-dimensional data suffers from the curse of dimensionality, which has a considerable impact on various classification techniques. In order to alleviate this problem, feature extraction techniques [4], [5] and [6] have been proposed to obtain effective and meaningful features from the original data. These methods not only reduce the computational complexity of algorithms, but also improve the generalization performance of classifiers. Feature extraction methods usually search for low-dimensional representations of the original data. Principal component analysis, linear discriminant analysis, and independent component analysis [7] are typical examples. Note that classical feature extraction methods may not be robust in dealing with contaminated data. To this end, robust feature extraction methods [8], [9], [10], [11], [12], [13], [14] and [15] have been proposed over the past several decades. These robust methods can produce better low-dimensional representations of contaminated data. After the reduced features of the data are obtained, one still needs to resort to classifiers to perform the classification task.

Note that when feature extraction, robust or not, is used as a pre-processing step in a classification problem, a basic assumption is that the samples in the training and testing sets have similar characteristics; specifically, the low-dimensional representations of the training samples should have characteristics similar to those of the testing samples. However, the samples in the training and testing sets may differ. For example, in a real-time face recognition system, the images captured online may differ considerably from the images in the database, since real-time face images may contain occlusions and are taken by different cameras under different conditions. Consequently, directly performing feature extraction on these new images may be inappropriate and can yield undesirable classification results.

In contrast to feature extraction approaches, where the features of samples are reduced, approaches based on the nearest subspace classifier have attracted increasing interest in recent years [16], [17] and [18]. Generally, this representation model implicitly creates virtual prototypes from the training set, and a linear combination of training samples is usually used in real applications. By computing the distance between the testing sample and the virtual prototype produced for each class, the testing sample is assigned to the class with the smallest distance. In [17], Wright et al. adopted this idea for face representation and proposed a sparse representation-based classification scheme, with encouraging results reported. Subsequently, a method dealing with face misalignment and illumination variations was proposed in [19].
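The nearest-subspace idea described above can be stated compactly: represent the testing sample as a linear combination of each class's training samples and assign it to the class with the smallest reconstruction distance. The snippet below is a minimal sketch of that rule, assuming a ridge-regularized per-class least-squares fit; the function name and the `reg` parameter are illustrative and not taken from the paper.

```python
import numpy as np

def nearest_subspace_classify(X_train, y_train, x_test, reg=1e-3):
    """Assign x_test to the class whose training samples reconstruct it best.

    X_train : (d, n) matrix whose columns are training samples
    y_train : (n,) class labels
    x_test  : (d,) test sample
    reg     : ridge regularizer for the per-class fit (an assumed parameter)
    """
    best_label, best_dist = None, np.inf
    for label in np.unique(y_train):
        Xc = X_train[:, y_train == label]            # training samples of this class
        # ridge-regularized coefficients: argmin_a ||x_test - Xc a||^2 + reg ||a||^2
        A = Xc.T @ Xc + reg * np.eye(Xc.shape[1])
        a = np.linalg.solve(A, Xc.T @ x_test)
        dist = np.linalg.norm(x_test - Xc @ a)       # distance to the class "virtual prototype"
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

With the training samples stacked as columns of `X_train`, calling `nearest_subspace_classify(X_train, y_train, x_test)` returns the predicted label; sparse-representation classifiers such as [17] differ mainly in replacing the ridge penalty by an L1 constraint over all classes jointly.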
In order to deal with occlusions in face images, a sparse correntropy framework [20] for computing robust sparse representations of face images was developed, in which the half-quadratic optimization technique is used to maximize the objective function. In practice, exploring the representation of the testing sample in the above methods can be regarded as a linear regression problem under proper constraints. The combination coefficients are usually controlled by the L1 norm in order to enforce sparsity. However, if the dimension of the samples is much larger than their number and the predictors are highly correlated, it has been empirically observed that the prediction performance of the Lasso is dominated by that of ridge regression [21]. Moreover, several researchers [22] and [23] have also pointed out that sparsity in the representation of the testing image may not be crucial as the sample dimension increases.

In order to improve classical linear regression, weighted linear regression methods [24] and [25] have been proposed for dealing with contaminated data. They work by incorporating extra non-negative weights, associated with each data point, into the fitting criterion. However, one disadvantage of some weighted linear regression methods is the assumption that the weights are known exactly. This is almost never the case in real applications, and the effect of using estimated weights is not easy to assess. It is also noted that some robust regression methods [25] can be solved by iteratively reweighted least squares. In this paper, the weights are first regarded as the probability distribution of a random variable. We then use the KL divergence [26] to measure the difference between the true distribution and the estimated distribution, so that the weights can be learned automatically. As a result, applying our method not only avoids determining the weights in advance, but also improves the robustness of linear regression. It is also found that ridge regression is an extreme case of our method. The experimental results show that our method is particularly effective in the case where the samples in the training and testing sets have different characteristics.

Overall, the main contributions of this paper are as follows:
1. We propose adaptive weighted learning for linear regression problems based on the KL divergence.
2. We solve the proposed method using an alternating optimization algorithm and prove the convergence of the algorithm.
3. We fuse global linear regression and class-oriented linear regression and give the weighted classification rule.
4. We conduct experiments on various image data sets to give a comparative analysis of several methods and discuss the criterion for parameter selection.

The rest of this paper is organized as follows: the related work is briefly reviewed in Section 2. In Section 3, we propose adaptive weighted learning for linear regression problems via the KL divergence and use the alternating optimization technique to solve the proposed method. In Section 4, we carry out experiments on face and handwritten numerical character databases and discuss the problem of parameter selection. Conclusions and further work are given in the final section.
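To illustrate how weights and combination coefficients could be learned alternately, here is a minimal sketch under the same assumed objective as in the block after the abstract (weighted squared residuals, a KL penalty toward uniform weights, and a ridge term): fixing the weights gives a weighted ridge regression for the coefficients, and fixing the coefficients gives a closed-form, softmax-style weight update. The function name and the parameters `lam`, `gamma`, `n_iter` are our assumptions, not the paper's exact algorithm.

```python
import numpy as np

def adaptive_weighted_regression(X, y, lam=1.0, gamma=1e-3, n_iter=30):
    """Alternately estimate coefficients and KL-regularized sample weights.

    Illustrative objective (an assumption, not the paper's exact model):
        min_{beta, p in simplex}  sum_i p_i * (y_i - x_i @ beta)**2
                                  + lam * KL(p || uniform) + gamma * ||beta||^2
    X : (n, d) design matrix, y : (n,) targets.
    """
    n, d = X.shape
    p = np.full(n, 1.0 / n)                 # start from uniform weights
    beta = np.zeros(d)
    for _ in range(n_iter):
        # (1) weighted ridge regression for fixed weights p
        W = np.diag(p)
        beta = np.linalg.solve(X.T @ W @ X + gamma * np.eye(d), X.T @ W @ y)
        # (2) closed-form weight update for fixed beta:
        #     p_i ∝ exp(-r_i^2 / lam) minimizes sum_i p_i r_i^2 + lam * KL(p || uniform)
        r2 = (y - X @ beta) ** 2
        logits = -r2 / lam
        logits -= logits.max()              # shift for numerical stability
        p = np.exp(logits)
        p /= p.sum()
    return beta, p
```

As `lam` grows, the weights stay close to uniform and step (1) approaches plain ridge regression, matching the remark that ridge regression is an extreme case of the method; a small `lam` down-weights poorly fitted points, which is where the robustness to contaminated data comes from.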
English Conclusion
This paper proposes adaptive weighted learning for linear regression problems via the KL divergence. In the proposed method, the weights and the combination coefficients can be learned simultaneously. It is worthwhile to point out that the weights in our method can improve the robustness of linear regression. It is also of interest to note that ridge regression is a special case of our method. On the face databases, where the dimension of the samples is typically high, we show that our method is superior to ridge regression and the 1NN classifier. The effectiveness of non-sparse representations of a testing sample by training samples is possibly due to the high dimensionality of face images, which makes sparseness unnecessary for good performance. On the handwritten digit database, where the number of training samples is often large, we show that WCOLR is preferable to WGLR and that sparseness in representing the testing sample by training samples is useful when the training samples are overcomplete. Overall, the significant advantage of our method is its performance improvement on undersampled problems.

Although a strategy for parameter selection was given in Section 4.1, it is not efficient on large data sets since it is computationally demanding. Hence it is worthwhile to further explore how to obtain effective regularization parameters in future work. In order to incorporate sparseness in the combination coefficients of the proposed method, exploring convex combinations of the training samples is another interesting problem. In addition, how to utilize the k nearest neighbors of the testing sample for its representation, and further applications of the proposed method to other object classification problems, are also of interest.