Comparison of prognostic indexes using data mining techniques and Cox regression analysis in breast cancer data
Article code | Publication year | Pages (English article) |
---|---|---|
22158 | 2009 | 8 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 36, Issue 4, May 2009, Pages 8247–8254
Abstract
The purpose of this study is to determine new prognostic indexes for differentiating subgroups of breast cancer patients, using decision tree algorithms (C&RT, CHAID, QUEST, ID3, C4.5 and C5.0) and Cox regression analysis for disease-free survival (DFS). A retrospective analysis was performed in 381 patients diagnosed with breast cancer. Age, menopausal status, age at menarche, family history of cancer, histologic tumor type, quadrant of tumor, tumor size, estrogen and progesterone receptor status, histologic and nuclear grading, axillary nodal status, pericapsular involvement of lymph nodes, lymphovascular and perineural invasion, adjuvant radiotherapy, chemotherapy and hormonal therapy were assessed. Based on these prognostic factors, new prognostic indexes were obtained for C&RT, CHAID, QUEST, ID3, C4.5, C5.0 and Cox regression. The prognostic indexes showed a good degree of classification, which suggests that an improvement is possible using standard risk factors. When the methods were compared using Random Survival Forests (RSF), C4.5 performed better than C&RT, CHAID, QUEST, ID3, C5.0 and Cox regression in determining risk groups.
Introduction
The clinicopathologic characteristics of breast cancer patients are heterogeneous. Consequently, survival times differ between subgroups of patients. In general, 5-year recurrence-free survival ranges from 65% to 80% across the whole population of breast cancer patients (Buchholz, Strom, & McNeese, 2003). The purposes of this study were to determine new prognostic indexes for differentiating subgroups of breast cancer patients with various methods (decision trees and Cox regression analysis) and to explore the interactions between clinical variables and their impact on survival. Cheng et al. (2006) used Bayesian classification trees and Cox proportional hazards models to estimate the probability of local regional recurrence after mastectomy for individual breast cancer patients. Sauerbrei, Hübner, Schmoor, and Schumacher (1997) compared Cox regression analysis, Classification and Regression Tree (C&RT) and the Nottingham Prognostic Index for determining a new prognostic classification index in node-negative breast cancer. Decision tree algorithms allow for non-linear relations between predictive factors and outcomes and for mixed data types (numerical and categorical), isolate outliers, and incorporate a pruning process that uses cross-validation as an alternative to testing for unbiasedness with a second data set (Faderl et al., 2002). Decision trees use recursive partitioning to assess the effect of specific variables on survival, ultimately generating groups of patients with similar clinical features and survival times. Partitioning patients into groups with differing survival times using clinical variables generates a tree-structured model that can be analyzed to assess its clinical utility.
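To make the recursive partitioning idea concrete, the following is a minimal sketch of how a single candidate split might be scored. It assumes the entropy-based criteria commonly associated with ID3 (information gain) and C4.5 (gain ratio); the toy data and the names `entropy` and `split_scores` are hypothetical and are not taken from the paper, and the sketch ignores censoring by treating recurrence as a plain class label.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_scores(values, labels):
    """Return (information gain, gain ratio) for splitting labels by values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the child nodes produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises splits with many small branches.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Hypothetical toy data: axillary nodal status versus observed recurrence.
nodal = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
recur = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
gain, ratio = split_scores(nodal, recur)
```

A tree builder would compute these scores for every candidate variable, split on the best one, and repeat within each child node until a stopping or pruning rule applies.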
Therefore, decision tree methods such as C&RT, Chi-squared Automatic Interaction Detector (CHAID), Quick, Unbiased, Efficient Statistical Tree (QUEST), Commercial version 5.0 (C5.0), Commercial version 4.5 (C4.5) and Interactive Dichotomizer version 3 (ID3) are more suitable than classical statistical methods. In our previous study, we evaluated the performance of the C&RT, CHAID, QUEST, C4.5 and ID3 methods according to their predictive values for disease-free survival (DFS) in breast cancer patients. We estimated DFS rates with the decision tree obtained from the C4.5 analysis. According to the multidimensional scaling method, C4.5 performed slightly better than the other methods in predicting risk factors for recurrence (Ture, Tokatli, & Kurt, 2008). In this study, we analyzed the simultaneous relationship among risk factors for breast cancer by C&RT, CHAID, QUEST, C4.5, ID3, C5.0 and Cox regression analysis. We aim to determine new prognostic indexes for differentiating subgroups of breast cancer patients with the decision tree algorithms and Cox regression analysis, using Kaplan–Meier analysis. Random Survival Forests (RSF) was used to choose the best method and prognostic index.
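As an illustration of the Kaplan–Meier analysis mentioned above, here is a minimal sketch of the product-limit estimator applied to two risk groups. The follow-up times, the group names and the function `kaplan_meier` are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  : follow-up time per patient
    events : 1 if recurrence was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for two risk groups defined by an index.
low_risk = kaplan_meier([12, 30, 45, 60, 60], [0, 1, 0, 0, 0])
high_risk = kaplan_meier([6, 10, 14, 20, 60], [1, 1, 0, 1, 0])
```

Comparing the resulting curves between groups is one way to judge how well a prognostic index separates patients with different DFS.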
Conclusion
In this study, we tried to discover risk groups and derive decision rules for the management of breast cancer. We report research in which we developed several prognostic index models for breast cancer. Specifically, we used six decision tree methods and Cox regression analysis. Furthermore, we determined the risk groups of each model according to Kaplan–Meier analysis and evaluated the performance of the methods using RSF. The Cox regression model is the most common tool for simultaneously investigating the influence of several factors on the survival time of patients. However, it was not a good prognostic index for determining DFS in breast cancer patients. Decision trees were more advantageous than Cox regression because they are capable of extracting patterns and relationships hidden deep in medical datasets. Sauerbrei et al. (1997) reported that new prognostic indexes from Cox regression and C&RT showed a better degree of separation, which suggests that an improvement is possible using standard prognostic factors. Cheng et al. (2006) reported that the prognostic index was a useful method for estimating the risk of local regional recurrence in breast cancer patients. Kenneth, Abbruzzese, Lenzi, and Raber (1999) reported that clinicians often have difficulty applying standard statistical methods to assess the interactions between clinical variables, determining the cumulative effect of these variables on survival, and translating this information into appropriate management, because of the complex presentations of patients with unknown primary carcinoma. Hence, they demonstrated the use of Kaplan–Meier analysis together with C&RT in patients with unknown primary carcinoma. Aligayer et al. (2002) investigated whether Src activity is a marker for poor clinical prognosis in colon carcinoma patients, and found a significant association between elevated Src activity and shorter overall survival of all patients by Kaplan–Meier analysis.
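For reference, the Cox regression model discussed in this section is usually written in the following standard form. The notation is an assumption introduced here and does not come from the paper: h_0(t) is the baseline hazard, x_1, ..., x_p are the risk factors, and the linear predictor is what is commonly taken as the prognostic index (PI) of a Cox model.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right),
\qquad
\mathrm{PI} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
```

Patients can then be grouped into risk categories by cut-points on PI, in the same way that the terminal nodes of a decision tree define risk groups.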
Stark and Pfeiffer (1999) reported that ID3, C4.5, CHAID and C&RT were well suited to exploratory data analysis in complex data sets in veterinary epidemiology. In the present study, we used seven different approaches to develop a new prognostic index. All of the new prognostic indexes showed a good degree of classification using standard risk factors. We found that C4.5 performed better than C&RT, CHAID, QUEST, ID3, C5.0 and Cox regression in determining risk groups. In our previous study, the multidimensional scaling method was used to identify homogeneous groups of methods and, as a result, we performed survival analysis only for the superior method (Ture et al., 2008). In the present study, by contrast, survival analysis was used to determine the risk groups of all methods according to their prognostic indexes. In both of our studies, C4.5 performed better than the other methods in determining prognostic indexes and predicting risk factors for DFS. As a result, we suggest that data should be explored and processed with high-performance modelling methods. Researchers should avoid assessing data with only one method in future studies of breast cancer or any other clinical condition. Furthermore, we recommend using data mining techniques to determine risk groups and the effect of risk factors on survival.