A hybrid decision support system based on rough set and extreme learning machine for the diagnosis of hepatitis disease
Article code | Publication year | Pages (English article) |
---|---|---|
6070 | 2013 | 10 pages (PDF) |
Publisher : Elsevier - ScienceDirect
Journal : Applied Soft Computing, Volume 13, Issue 8, August 2013, Pages 3429–3438
Abstract
Hepatitis is a disease seen at all ages. Hepatitis by itself does not have a lethal effect, but early diagnosis and treatment are crucial because it triggers other diseases. In this study, a new hybrid medical decision support system based on rough set (RS) and extreme learning machine (ELM) is proposed for the diagnosis of hepatitis. RS-ELM consists of two stages. In the first, redundant features are removed from the data set using the RS approach. In the second, classification is performed by ELM on the remaining features. The hepatitis data set from the UCI machine learning repository was used to test the proposed hybrid model. A large part of the data set (48.3%) contains missing values. Because removing missing values from the data set causes data loss, feature selection in the first stage is done without deleting them. In the second stage, classification is performed by ELM after missing values are removed from the reduced feature subsets, which have different dimensions. The highest classification accuracy achieved by RS-ELM was 100.00%, and the RS-ELM model was considerably more successful than the other methods in the literature. The study also identifies the most significant features for the diagnosis of hepatitis. The proposed method is expected to be useful in similar medical applications.
Introduction
The liver is an organ with a wide range of functions, including digestion, energy production, glycogen storage, detoxification and regulation of blood glucose. Various diseases and microorganisms, such as viruses and bacteria, damage the liver and prevent it from functioning [1] and [2]. One of these, the hepatitis virus, settles in the cells of the liver tissue and causes them to lose their function: healthy cells infected with the virus become dysfunctional. Recently, the types of hepatitis have become widespread throughout the world [3]. Hepatitis has five different types, termed Hepatitis A, Hepatitis B, Hepatitis C, Hepatitis D and Hepatitis E [1] and [4]. The target organ in all types of hepatitis is the liver. Each type has specific symptoms, and most of them are treated successfully. Hepatitis A is generally seen in children and is known as infectious hepatitis. Protection against Hepatitis A is generally gained by having had the illness or by vaccination. People with Hepatitis B and C may be carriers who show no signs or symptoms of the disease. Hepatitis B, caused by the hepatitis B virus, which attacks and damages the liver, is the most common liver infection in the world. Hepatitis can be transmitted through blood, unprotected sex, and shared or reused syringes. Moreover, during pregnancy or postpartum, it can be transmitted from a mother with hepatitis to her infant [2]. For early diagnosis of these diseases, a blood test is required once a year. Besides clinical tests, machine learning and pattern recognition methods have been widely used by specialists for the early diagnosis of hepatitis. With the help of diagnostic systems, the errors experts may make at the diagnosis stage can be reduced, and medical data can be analysed in less time and in more detail [5]. The classical major steps of an automatic pattern recognition system are feature extraction and classification.
Feature extraction is one of the most important steps in pattern recognition because it directly influences the result of the diagnosis system. Thus, the most significant features need to be extracted from the hepatitis data set for the diagnosis of hepatitis. Feature reduction on the hepatitis data set can be implemented using stochastic techniques, such as genetic algorithm (GA) [6] and simulated annealing (SA) [7], or statistical techniques, such as linear discriminant analysis (LDA) [8], principal component analysis (PCA) [2] and [9], Fisher discriminant analysis (FDA) [10] and local Fisher discriminant analysis (LFDA) [5]. In the classification step, the feature vectors obtained from the feature reduction process are applied as input to a classifier, such as artificial neural network (ANN) [11] and [12], artificial immune system (AIS) [1] and [9], probabilistic neural network (PNN) [13], support vector machine (SVM) [5] and [7] and fuzzy inference system [8]. The machine learning methods that are widely used and highly successful for the diagnosis of hepatitis are discussed in the next section. In this study, a new hybrid approach based on rough set (RS) and extreme learning machine (ELM) is proposed for the diagnosis of hepatitis. The main objectives of this study are (1) to investigate the feasibility of using RS to extract significant information from the attributes in the hepatitis data set and reduce its size, (2) to perform feature selection through RS without removing missing values from the data set, and (3) to improve the classification accuracy for hepatitis diagnosis. In the first of the model's two stages, the feature subsets that best represent the data are obtained by RS. The main contribution of RS is to carry out feature reduction in spite of missing values. In the next stage, classification is performed by the ELM classifier using the reduced feature sets.
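The rough-set idea behind this kind of feature reduction can be illustrated with the standard dependency degree: a feature subset is worth keeping if it separates the decision classes as well as the full feature set does. The following is a minimal sketch under stated assumptions, not the authors' procedure. The toy table, the attribute names `f1` to `f3`, the helper names, and in particular the choice to treat a missing value (`"?"`) as just another symbol are all hypothetical; the excerpt does not say how the paper handles missing values inside RS.

```python
from collections import defaultdict
from itertools import combinations


def dependency(rows, features, decision):
    """Rough-set dependency degree: share of rows whose values on
    `features` determine the decision value unambiguously."""
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        blocks[key].append(row[decision])
    # Rows in blocks with a single decision value form the positive region.
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(rows)


def minimal_reducts(rows, features, decision):
    """Smallest feature subsets that keep the dependency of the full set
    (exhaustive search, only practical for a handful of features)."""
    full = dependency(rows, features, decision)
    for size in range(1, len(features) + 1):
        found = [c for c in combinations(features, size)
                 if dependency(rows, c, decision) == full]
        if found:
            return found
    return []


# Hypothetical toy table; "?" marks a missing value and is ASSUMED to be
# treated as an ordinary symbol so that no row has to be deleted.
rows = [
    {"f1": "yes", "f2": "low",  "f3": "?",   "outcome": "die"},
    {"f1": "yes", "f2": "high", "f3": "no",  "outcome": "live"},
    {"f1": "no",  "f2": "low",  "f3": "no",  "outcome": "live"},
    {"f1": "no",  "f2": "high", "f3": "?",   "outcome": "live"},
    {"f1": "yes", "f2": "low",  "f3": "yes", "outcome": "die"},
    {"f1": "no",  "f2": "low",  "f3": "?",   "outcome": "live"},
]

reducts = minimal_reducts(rows, ["f1", "f2", "f3"], "outcome")
```

In this toy table no single feature determines the outcome, while some pairs do, so the search stops at subsets of size two. An exhaustive search like this grows exponentially with the number of features; practical rough-set tools use heuristics instead.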
ELM, a relatively new learning algorithm for single hidden-layer feedforward networks (SLFNs), was first introduced by Huang et al. [14]. The ELM algorithm has several advantages: (1) it is extremely fast, (2) it has better generalization performance, and (3) it tends to reach solutions directly, without the issues encountered in traditional gradient-based learning algorithms, such as local minima, learning rate, momentum rate and over-fitting [15]. To test the effectiveness of the proposed hybrid model, the hepatitis data set from the UCI machine learning repository was used. A large part of this data set (48.3%) consists of missing values. Removing the missing values from the data set may lead to data loss during both feature reduction and classification. In most studies in the literature, this data set was classified after the missing values had been removed. In this study, feature selection through RS was carried out without removing the missing values from the data set. The RS-ELM hybrid model was tested at various training-test rates; when the training and test sets were selected at rates of 80% and 20% respectively, a classification accuracy of 100.00% was achieved. The experimental results show that the proposed RS-ELM can effectively improve classification performance. They also show that RS-ELM outperforms the other methods and achieves the best predictive classification accuracy with the reduced feature subset. As a result, the proposed hybrid model can be considered a helpful tool for specialists deciding on the diagnosis of hepatitis.
The rest of this study is organized as follows. The next section summarizes other studies performed on the hepatitis data set. Section 3 describes the acquisition of the data set and introduces it. Section 4 gives theoretical information about RS and ELM. Section 5 presents the experimental results. The last section discusses the results of this study.
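The speed advantage attributed to ELM follows from how the basic algorithm trains: the hidden-layer weights are drawn at random and left fixed, and only the output weights are solved in a single least-squares step, so there is no iterative gradient descent. A minimal NumPy sketch of that basic scheme is given here. The class name `SimpleELM`, the sigmoid activation, the hidden-layer size and the synthetic data are assumptions made for illustration, not the configuration used in the paper.

```python
import numpy as np


class SimpleELM:
    """Basic extreme learning machine for classification (illustrative)."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One-hot targets, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and biases are random and never updated.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights: least-squares solution via the pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


# Synthetic two-class data (NOT the hepatitis data set).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 4)),
               rng.normal(2.0, 1.0, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)

model = SimpleELM(n_hidden=20).fit(X, y)
train_accuracy = float(np.mean(model.predict(X) == y))
```

The only tunable quantity in this sketch is the number of hidden nodes; accuracy measured on the training data, as here, says nothing about generalization and is computed only to show the calls.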
Conclusion
Classification is an important tool for diagnosing diseases in clinical practice. In this study, a new medical decision support system based on RS and ELM is proposed for the diagnosis of hepatitis. RS was used for feature reduction, and classification was performed by ELM on the reduced feature sets. To test the performance of RS-ELM, the hepatitis data set, which has been widely used by other researchers with different machine learning methods, was used. A large part of this data set (48.3%) contains missing values. Feature reduction through RS was carried out in spite of the missing values. Twenty separate reduced feature sets were obtained using RS and, after the missing values were deleted from each feature set, classification was performed on the remaining samples. The reduced feature sets were classified at training-test rates of 50–50%, 70–30% and 80–20%. The results showed that a classification accuracy of 100.00% was achieved with only four features at the 80–20% training-test rate. Moreover, RS-ELM was compared with standard ELM, and its classification accuracy was found to be better. Selecting the most appropriate features through RS was seen to have a positive effect on classification accuracy for the diagnosis of hepatitis. As a result of this study, the proposed RS-ELM can be a powerful method for diagnosing hepatitis. In addition, we believe that the proposed hybrid model can be a helpful tool for specialists making decisions on other medical disorders.
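One practical consequence of reducing features before deleting incomplete records is that fewer samples are lost, because a record only has to be complete on the retained features. The sketch below illustrates this on a made-up table and then splits the surviving rows at the three training-test rates mentioned above. The data, the column names, the `"?"` missing-value marker and the helper names are hypothetical, and the shuffle-and-cut split is an assumption, not the paper's protocol.

```python
import random

MISSING = "?"  # assumed marker for a missing value


def complete_rows(rows, features):
    """Keep only rows with no missing value on the given features."""
    return [r for r in rows if all(r[f] != MISSING for f in features)]


def split(rows, train_percent, seed=0):
    """Shuffle, then cut rows into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Made-up table of 20 rows: f2 is missing in every third row and
# f3 in every fourth row.
table = [
    {
        "f1": i % 2,
        "f2": MISSING if i % 3 == 0 else i,
        "f3": MISSING if i % 4 == 0 else i,
        "f4": i,
        "label": i % 2,
    }
    for i in range(20)
]

all_features = ["f1", "f2", "f3", "f4"]
reduced = ["f1", "f3"]  # hypothetical reduced feature set

kept_full = complete_rows(table, all_features)
kept_reduced = complete_rows(table, reduced)

# (training size, test size) for each training percentage.
sizes = {p: tuple(len(part) for part in split(kept_reduced, p))
         for p in (50, 70, 80)}
```

On this toy table the reduced feature set keeps more rows than the full set, which is the effect the two-stage design relies on. With so few samples in a test part, a single accuracy figure is very sensitive to which rows land there, so repeating the split with several seeds is a sensible precaution.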