Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data
|Article code||Year of publication||English article||Persian translation||Word count|
|8080||2013||13-page PDF||Available to order||9220 words|
Publisher : Elsevier - Science Direct
Journal : Knowledge-Based Systems, Volume 37, January 2013, Pages 502–514
Biomarkers that predict patient survival play an important role in medical diagnosis and treatment. Selecting the significant biomarkers from hundreds of protein markers is a key step in survival analysis. In this paper, a novel method is proposed to detect prognostic biomarkers of survival in colorectal cancer patients using wavelet analysis, a genetic algorithm, and a Bayes classifier. The one-dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study, the one-dimensional continuous wavelet transform (CWT) is proposed to extract the features of colorectal cancer data. The one-dimensional CWT cannot reduce the dimensionality of the data, but it captures features that the DWT misses and is therefore complementary to the DWT. A genetic algorithm was run on the extracted wavelet coefficients to select the optimized features, using a Bayes classifier to build its fitness function. The corresponding protein markers were located based on the positions of the optimized features. The Kaplan–Meier curve and the Cox regression model were used to evaluate the performance of the selected biomarkers. Experiments were conducted on a colorectal cancer dataset, and several significant biomarkers were detected. A new protein biomarker, CD46, was found to be significantly associated with survival time.
Survival analysis involves estimating the distribution of the time it takes for death to occur, depending on the biology of the disease. It allows clinicians to plan a suitable treatment and counsel patients about their prognosis. In medical domains, survival analysis is mainly based on the Kaplan–Meier (KM) estimator and the Cox proportional hazards regression model, which are used to evaluate the performance of prognostic markers. However, how to rank these biomarkers is a key step in survival analysis. Normally, the selection of biomarkers is based on medical knowledge and the diagnosis of the clinician. This may ignore potential biomarkers. Machine learning algorithms have been widely used in biomarker analysis of high-dimensional medical data, such as microarray data or mass spectrometry data. Despite their potential advantages over standard statistical methods, their applications to survival analysis are rare because of the difficulty of dealing with censored data. Recent research has shown that machine learning methods, such as neural networks, Bayesian networks, decision trees, and the Naïve Bayes classifier, can be used to improve the survival model. However, none of these methods deals with biomarker selection in survival analysis.
In this study we propose a novel method of biomarker selection based on the one-dimensional continuous wavelet transform (CWT). Normally, the one-dimensional discrete wavelet transform (DWT) is used to reduce dimensionality in the analysis of high-dimensional biomedical data. In biomarker detection, the feature space must correspond to the original data space so that a detected feature can be traced back to the biomarker it came from. The one-dimensional CWT detects features of the data at every scale and position and preserves the local properties of the original data. The wavelet feature vector of the CWT has the same length as the original data and can therefore be used to locate the biomarker in the original data space.
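The length-preserving property of the CWT can be illustrated with a minimal Python sketch that compares a single-scale CWT with one level of a Haar DWT. The Mexican hat wavelet, the Haar wavelet, the function names, and the eight-value profile are all assumptions made for illustration; the text does not state which mother wavelet the authors used.

```python
import math

def mexican_hat(t):
    """Mexican hat (Ricker) wavelet, left unnormalized for simplicity."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt_at_scale(signal, scale):
    """CWT coefficients at one scale: one coefficient per input position."""
    n = len(signal)
    half = int(math.ceil(5 * scale))  # the wavelet is negligible beyond ~5 scales
    coeffs = []
    for b in range(n):
        total = 0.0
        for k in range(max(0, b - half), min(n, b + half + 1)):
            total += signal[k] * mexican_hat((k - b) / scale)
        coeffs.append(total / math.sqrt(scale))
    return coeffs

def haar_dwt_level(signal):
    """One level of Haar DWT approximation coefficients: half as many values."""
    return [(signal[i] + signal[i + 1]) / math.sqrt(2.0)
            for i in range(0, len(signal) - 1, 2)]

# Hypothetical expression profile over 8 protein markers (illustrative values only).
profile = [0.1, 0.2, 0.1, 0.9, 0.2, 0.1, 0.3, 0.2]
cwt_coeffs = cwt_at_scale(profile, scale=3)
dwt_coeffs = haar_dwt_level(profile)
print(len(profile), len(cwt_coeffs), len(dwt_coeffs))
```

Because index i of `cwt_coeffs` is centered on marker i of `profile`, a selected CWT feature points directly at a marker, whereas each Haar coefficient already mixes two neighboring markers.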
First, we perform the one-dimensional continuous wavelet transform at different scales on colorectal cancer data to extract the discriminant features. Then we use a genetic algorithm (GA) and a Bayes classifier to select the optimized features from the extracted wavelet coefficients. Because the wavelet transform reveals the local features of the data (the time features) without losing the position information of the original data, the corresponding protein markers in the original data space are obtained from the positions of the optimized wavelet features. Finally, the Kaplan–Meier (KM) estimator and the Cox regression model are used to evaluate the performance of the selected protein markers. A new protein biomarker, CD46, was found to have independent prognostic significance. Recent research suggests that “the immune system might be involved in the development and progression of colorectal cancer”. The detection of CD46 supports this conclusion.
The rest of the paper is organized as follows. In Section 2, we describe the colorectal cancer data. Our proposed method is introduced in Section 3. Wavelet feature extraction for colorectal cancer data is described in Section 4. In Section 5, a genetic algorithm based on a Bayes classifier is used to select the optimized features. Survival models are used to evaluate the selected biomarkers in Section 6. The experiments are presented in Section 7, followed by discussion and concluding comments in Section 8.
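The feature-selection step, a genetic algorithm whose fitness function is built on a Bayes classifier, might be sketched as follows. This is not the authors' implementation: the Gaussian naive Bayes model, truncation selection, one-point crossover, the mutation rate, the population size, and the synthetic data are all assumptions chosen to keep the example short and self-contained.

```python
import math
import random

def gnb_fit(X, y):
    """Per-class log prior plus (mean, variance) of each feature."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / len(y)), stats)
    return model

def gnb_predict(model, x):
    """Return the class with the highest Gaussian naive Bayes log posterior."""
    def log_post(label):
        prior, stats = model[label]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(x, stats))
    return max(model, key=log_post)

def cv_accuracy(X, y, mask, folds=5):
    """Fitness: k-fold cross-validated accuracy on the features selected by mask."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    Xs = [[row[j] for j in cols] for row in X]
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(Xs)) if i % folds != f]
        test = [i for i in range(len(Xs)) if i % folds == f]
        model = gnb_fit([Xs[i] for i in train], [y[i] for i in train])
        correct += sum(gnb_predict(model, Xs[i]) == y[i] for i in test)
    return correct / len(Xs)

def ga_select(X, y, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Evolve binary feature masks; return the fittest mask of the final population."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: cv_accuracy(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            children.append([bit ^ (rng.random() < p_mut)  # bit-flip mutation
                             for bit in a[:cut] + b[cut:]])
        pop = parents + children
    return max(pop, key=lambda m: cv_accuracy(X, y, m))

# Synthetic stand-in for wavelet coefficients: 40 samples, 10 features,
# of which only feature 2 separates the two hypothetical survival groups.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[data_rng.gauss(0, 1) + (4.0 * label if j == 2 else 0.0) for j in range(10)]
     for label in y]
best = ga_select(X, y)
print(best, cv_accuracy(X, y, best))
```

The positions of the 1 bits in `best` play the role of the optimized wavelet features; with a length-preserving transform, those positions index the original protein markers.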
Conclusion
In this study we propose a novel method of biomarker detection in survival analysis. Two groups of patients were used to select the biomarkers of colorectal cancer data. One group comprised patients with a survival time of less than 30 months, and the other comprised patients with a survival time of more than 70 months. First, continuous wavelet analysis was used to extract the discriminant features between the two groups of patients. The best discriminant features were obtained from the CWT at scale 3. A genetic algorithm was run on the extracted wavelet coefficients to select the optimized features, using a Bayes classifier in its fitness function. The best performance, 78.6%, was obtained with 6 optimized wavelet features in five-fold cross-validation experiments. After the genetic algorithm was run several times to select the best features, several groups of biomarkers were detected. Entries that referred to the same biomarker were removed, and 18 unique biomarkers were selected. The Kaplan–Meier curve and Cox regression models were used to evaluate the performance of the selected biomarkers. The protein markers CD46, chk1, p53, FLIP-L, nuclear stat1, RAET-IG and RAET-IE were found to be significant for survival using the KM estimator with the log-rank test. The Cox regression model showed that CD46 and nuclear stat1 were independent prognostic biomarkers. The proportion of censored data affects the selection of biomarkers in survival analysis. Using the machine learning methods of wavelet analysis and genetic algorithm, the protein marker CD46 was found to play an important role in survival analysis for colorectal cancer patients. The KM curve shows that CD46 was not significant for survival using 47% censored data with survival time of more than 40 months, but was significant for survival using 32% censored data with survival time of more than 70 months.
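To make the role of censoring concrete, the following minimal Python sketch computes the Kaplan–Meier product-limit estimate together with the censored fraction. The function names and the ten follow-up records are hypothetical and are not taken from the colorectal cancer dataset.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed death time.

    times  : follow-up time per patient (for example, months)
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(u for u, e in zip(times, events) if e == 1)):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

def censored_fraction(events):
    """Share of patients whose death was not observed."""
    return events.count(0) / len(events)

# Hypothetical follow-up records (illustrative values only).
times = [5, 12, 20, 28, 35, 45, 50, 60, 72, 80]
events = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
print("censored fraction:", censored_fraction(events))
```

Censored patients contribute to the at-risk count until they leave the study but never trigger a drop in the curve, so a high censored proportion leaves few death events from which to estimate differences between groups.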
In this study the number of censored cases with survival times between 40 and 70 months was much larger than the number of uncensored cases (dead patients). This biases the KM curve, and thus some significant biomarkers were not detected. Machine learning methods reveal hidden information in colorectal cancer data that cannot be detected by traditional survival analysis methods, in particular the KM curve and the Cox regression model applied to data with a large proportion of censoring.
One of the primary innate mechanisms to prevent tumor growth is activation of the complement cascade. Activation of complement occurs via a cascade of enzyme activity, initiated by either the antibody-dependent classical pathway or the antibody-independent alternative and lectin pathways. These lead to a common activation of the C3 component of complement, and in turn to the formation and membrane insertion of a terminal C5b-9 membrane attack complex (MAC), causing direct lysis of the target cell. To protect themselves from bystander attack by complement, cells express membrane-bound complement regulatory proteins (mCRP), which act predominantly either at the C3/C5 convertase level, as with membrane cofactor protein (MCP; CD46) and decay accelerating factor (DAF; CD55), or further downstream to inhibit assembly of the MAC, as with protectin (CD59). Expression of one or more mCRP (frequently at a greater level than in the corresponding normal tissue) has been demonstrated for most solid tumor types and confers resistance to tumor elimination by complement-dependent mechanisms. The mCRP CD46 has been identified on all human cells exposed to complement except erythrocytes. Unlike CD55 and CD59, which are GPI-anchored, the CD46 molecule inserts into the cell membrane via a transmembrane domain, acting as a cofactor for the factor-I-mediated cleavage of C3b and C4b into inactive forms and clearing these molecules from the surface of host cells.
Previous attempts to characterize CD46 expression in both normal and neoplastic tissues and in tumor cell lines have been limited to the analysis of small numbers of cases. In colorectal tissues, expression of CD46 by normal colonic epithelium and colonic adenomas has been shown to display predominantly basal and baso-lateral membrane staining, with circumferential membrane CD46 expression seen in colonic adenocarcinomas. Both Koretz et al. and Thorsteinsson et al. described consistent membrane expression of CD46 in analyses of 71 and 18 colonic adenocarcinomas respectively, with both sets of authors noting a higher antigen density in the neoplastic than in the non-neoplastic epithelium, leading to the conclusion that CD46 is generally upregulated during malignant colorectal tumor progression. Similar findings of ubiquitous CD46 expression have been reported for tumors of the breast and stomach.
Experimental results show that our proposed method, which combines the machine learning methods of wavelet analysis, genetic algorithm, and Bayes classifier with the survival analysis methods of the KM curve and the Cox regression model, provides an efficient way to select potentially significant prognostic markers. A new protein marker, CD46, was found to be significant for survival based on our proposed method.