Particle swarm optimized multiple regression linear model for data classification
| Article code | Publication year | English article | Word count |
|---|---|---|---|
| 24735 | 2009 | 7-page PDF | 4,874 words |
Publisher: Elsevier - ScienceDirect
Journal : Applied Soft Computing, Volume 9, Issue 2, March 2009, Pages 470–476
This paper presents a new data classification method based on particle swarm optimization (PSO). It discusses how to build a classifier model using a multiple linear regression approach. The coefficients of the multiple regression linear models (MRLMs) are estimated with both the least-squares estimation (LSE) technique and PSO, and the two sets of estimates are compared by their percentage of correct classification. The mathematical models are developed for many real-world datasets from the UCI Machine Learning Repository, and they give the user insight into how the attributes are interrelated in predicting class membership. The proposed approach is illustrated on many of these real data sets, and the comparison results show that the PSO-based approach is superior to the traditional least-squares approach in classifying multi-class data sets.
Data classification plays a major role in any pattern recognition problem. It is a supervised learning strategy that focuses on building models able to assign new instances to one of a set of well-defined classes. A wide range of machine learning and statistical methods has been applied to classification problems. Many algorithms have been developed, including classical methods such as linear discriminant analysis and Bayesian classifiers; statistical techniques such as MARS (multivariate adaptive regression splines); machine learning approaches for decision trees such as C4.5, CART, C5 and Bayes trees; and neural network approaches such as the multilayer perceptron and neural trees. Approaches such as fuzzy logic, support vector machines (SVM), tolerant rough sets, principal component analysis (PCA) and linear programming have also been very popular for data classification.

Some of the classification techniques mentioned above work well when the classes are linearly separable. However, in many real-world problems the data are not linearly separable and are closely spaced, so a highly nonlinear decision boundary is required to separate them. Techniques such as neural networks (NN), SVMs and fuzzy logic are very useful in such cases. In many cases, however, a simple classifier is desired that gives the user a rough but understandable insight into how the data attributes relate to class membership. This objective can be achieved if the relationships hidden in the data can be learned and expressed mathematically. There have been some attempts to solve classification problems using mathematical programming formulations of linear discriminant analysis. Very recently, genetic programming (GP) has been used to develop mathematical models automatically for classifying multi-class problems.
GP is effective at discovering the underlying relationships in data and at expressing the relationships among attributes in an understandable form for classification. However, the mathematical models obtained in the earlier GP studies require many arithmetic operations to predict the class for data sets with many features and many classes. The objective of this paper is to present an effective mathematical model based on linear regression for the multi-class data classification problem. We describe our approach to developing multiple regression linear models (MRLMs) for different real data sets. The coefficients of each MRLM are first estimated by the least-squares estimation (LSE) method, and the classification accuracy is determined for each dataset. An evolutionary approach called particle swarm optimization (PSO) is then used to estimate the MRLM coefficients for each dataset, and the classification accuracies are computed again. The two approaches are then compared, and the PSO approach is shown to outperform the LSE approach by giving better classification accuracy. Finally, the mathematical models for a few datasets are presented as illustrations of the interrelationships among the attributes of each dataset.

The rest of the paper is organized as follows. Section 2 briefly discusses the MRLM together with the LSE technique. Section 3 describes the basics of PSO. Section 4 gives the data set descriptions and simulation results. Section 5 concludes and gives some directions for future research.
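To make the LSE step concrete, here is a minimal sketch of fitting an MRLM by ordinary least squares and scoring it as a classifier. It rests on assumptions that the text does not state: class labels are integers 1..K used directly as the regression target, the model is y = a0 + a1*x1 + ... + an*xn, and the predicted class is the fitted value rounded to the nearest integer and clipped to [1, K]. The function names (`fit_lse`, `predict`, `percent_correct`) are hypothetical, and the sketch solves the normal equations in pure Python rather than reproducing the paper's implementation.

```python
"""Sketch (assumed, not the paper's code): least-squares MRLM used as a classifier.
Assumptions: labels are integers 1..K, y = a0 + a1*x1 + ... + an*xn,
predicted class = round(y) clipped to [1, K]."""
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):  # eliminate entries below the pivot
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_lse(features: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Return [a0, a1, ..., an] minimizing the sum of squared residuals."""
    rows = [[1.0] + list(x) for x in features]  # prepend an intercept column
    p = len(rows[0])
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs: Sequence[float], x: Sequence[float], n_classes: int) -> int:
    """Evaluate the linear model, round to the nearest label, clip to [1, K]."""
    y = coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))
    return min(max(int(round(y)), 1), n_classes)


def percent_correct(coefs, features, labels, n_classes) -> float:
    """Percentage of samples whose predicted class equals the true class."""
    hits = sum(predict(coefs, x, n_classes) == t for x, t in zip(features, labels))
    return 100.0 * hits / len(labels)


if __name__ == "__main__":
    # Toy data (invented for illustration) where label = 1 + x1 + x2 exactly.
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    t = [1, 2, 2, 3]
    coefs = fit_lse(X, t)
    print(coefs)                            # [1.0, 1.0, 1.0]
    print(percent_correct(coefs, X, t, 3))  # 100.0
```

Solving the normal equations directly is the simplest way to show the estimator; for ill-conditioned real datasets a QR- or SVD-based solver would be the more robust choice.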
Conclusion
In this paper we have proposed an effective PSO-based multiple regression linear classifier design model and applied it to different real data sets. The coefficients of the regression model are estimated separately for each dataset using LSE and particle swarm optimization. The regression classifiers whose coefficients are optimized by PSO were found to perform better than those estimated by least squares, and performance comparisons on many real data sets are presented. The outcome of our approach is a simple linear mathematical model, based on PSO, that outperforms a standard statistical technique such as LSE in terms of percentage of correct classification. The resulting mathematical models give a rough insight into the interrelationships among the attributes of each dataset. Our future work will explore different PSO variants for estimating the coefficients of the regression model and will compare the results of PSO-MRLM with other classification methods. We will also investigate the possibility of developing nonlinear mathematical models using the PSO approach.
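The PSO-estimated MRLM can be sketched as a global-best PSO whose particles are coefficient vectors [a0, a1, ..., an]. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the fitness is taken to be the percentage of correct classification (the paper may instead optimize an error measure), the class rule is rounding the model output and clipping it to [1, K], and the inertia weight `w`, acceleration constants `c1`/`c2`, search bound and swarm size are common textbook values chosen for illustration. The names `pso_fit`, `fitness` and `predict` are hypothetical.

```python
"""Sketch (assumed, not the paper's code): global-best PSO searching for MRLM
coefficients that maximize the percentage of correct classification.
Assumptions: class = round(y) clipped to [1, K]; w, c1, c2, bound are
illustrative textbook values, not the paper's settings."""
import random


def predict(coefs, x, n_classes):
    """Linear model output rounded to the nearest label and clipped to [1, K]."""
    y = coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))
    return min(max(round(y), 1), n_classes)


def fitness(coefs, features, labels, n_classes):
    """Percentage of correctly classified samples (higher is better)."""
    hits = sum(predict(coefs, x, n_classes) == t for x, t in zip(features, labels))
    return 100.0 * hits / len(labels)


def pso_fit(features, labels, n_classes, n_particles=30, iterations=200,
            w=0.729, c1=1.49, c2=1.49, bound=5.0, seed=0):
    """Return (best coefficient vector, its fitness) found by global-best PSO."""
    rng = random.Random(seed)
    dim = len(features[0]) + 1  # intercept plus one coefficient per attribute
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # each particle's best position so far
    pbest_fit = [fitness(p, features, labels, n_classes) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # swarm-wide best
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, keeping the position inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], -bound), bound)
            f = fitness(pos[i], features, labels, n_classes)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        if gbest_fit == 100.0:  # nothing left to improve on the training data
            break
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy one-attribute data, invented for illustration.
    X = [[0.0], [1.0], [2.0], [3.0]]
    t = [1, 2, 3, 3]
    coefs, acc = pso_fit(X, t, n_classes=3, seed=1)
    print(coefs, acc)
```

Because percentage correct is piecewise constant in the coefficients, many particles can tie; the strict `>` comparisons keep the first best found. A smooth fitness such as mean squared error is a common alternative if the swarm stalls.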