In this study, we examined the performance of six regression tree data mining methods in predicting mortality in head injury. Using a data set of 1603 head injury cases, we assessed the performance of: the Classification and Regression Trees (CART) method; the Chi-squared Automatic Interaction Detector (CHAID) method; the Exhaustive CHAID (E-CHAID) method; the Quick, Unbiased, Efficient Statistical Tree (QUEST) method; the Random Forest Regression and Classification (RFRC) method; and the Boosted Tree Classifiers and Regression (BTCR) method, in each case based on sensitivity, specificity, positive and negative predictive values, and accuracy rates. Next, we compared their areas under the Receiver Operating Characteristic (ROC) curves. Finally, we examined whether the methods could be grouped into meaningful clusters using hierarchical cluster analysis. Areas under the ROC curves of the regression tree data mining methods ranged from 0.801 to 0.954 (p < 0.001 for all). In predicting mortality in head injury, the BTCR method achieved both the largest area under the ROC curve (0.954) and the highest accuracy rate (93.0%), while the CART method achieved both the smallest area (0.801) and the lowest accuracy rate (91.1%). All of the regression tree data mining methods fell into the same cluster, but the BTCR method was at the origin of the cluster while the CART and QUEST methods produced results that were least like the others. The BTCR method, with a 93.0% accuracy rate and statistically significant differences from the other methods, may be a helpful tool in medical decision-making for predicting mortality in head injury.
Many classification methods can be employed in data mining. One of them is the regression tree, a method that has been widely used for classification tasks. A regression tree predicts the class of a categorical target variable, a task that arises in many fields. Applications of this method include predicting business failure (Li, Sun, & Wu, 2010), essential hypertension (Ture, Kurt, Kurum, & Ozdamar, 2005), and aggressive prostate cancer on biopsy (Supurgeon et al., 2006).
Head injury is well known to be an important public health issue, affecting all age groups throughout the world. It is one of the major causes of death and disability (Koskinen & Alaranta, 2008; Signorini et al., 1999). As such, patients are generally anxious about their prognosis soon after sustaining a head injury. However, the current methods for predicting the final outcome of head injury patients are imperfect, and they present important questions to physicians regarding the heterogeneity of patient data, the variety of trauma causes, and additional personal factors such as patient age and the prevalence of systemic disease (Schaan, Jaksche, & Boszczyk, 2002). Recent studies have attempted to predict outcomes following head injury based on many factors, including demographic, epidemiological, clinical, and radiological findings such as age, cause of injury, Glasgow Coma Scale score, pupil response, and computerized tomography parameters (Schaan et al., 2002; Signorini et al., 1999).
Predicting the outcome of a head injury is a complex cognitive process. Regression tree methods have proven helpful for medical decision-making in head injury. Many studies have compared risk assessment methods for predicting outcomes in head injury: Andrews et al. (2002) compared the results of a decision tree and logistic regression analysis, Rovlias and Kotsou (2004) used the CART technique, and Choi et al. (1991) used a decision tree.
No study to date has simultaneously compared the accuracy of: the Classification and Regression Trees (CART) method; the Chi-squared Automatic Interaction Detector (CHAID) method; the Exhaustive CHAID (E-CHAID) method; the Quick, Unbiased, Efficient Statistical Tree (QUEST) method; the Random Forest Regression and Classification (RFRC) method; and the Boosted Tree Classifiers and Regression (BTCR) method in the clinical application of predicting mortality from head injuries. We sought to address this gap by comparing the predictive accuracy of these regression tree data mining methods.
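The performance measures used to compare the methods (sensitivity, specificity, positive and negative predictive values, and accuracy) can all be derived from a 2×2 confusion matrix. The following Python sketch illustrates the arithmetic only. The function name `classification_rates`, the label coding (1 = died, 0 = survived), and the toy labels are assumptions made for illustration and are not taken from the study.

```python
from typing import Dict, Sequence


def classification_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute rates from binary labels (assumed coding: 1 = died, 0 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # deaths predicted as deaths
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # survivors predicted as survivors
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # survivors predicted as deaths
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # deaths predicted as survivors

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy example with made-up labels (not study data).
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
rates = classification_rates(toy_true, toy_pred)
```

Because mortality is usually the minority class in such data, accuracy alone can look high even when sensitivity is poor, which is one reason to report all five rates together.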
When we compared the performances of CART, CHAID, E-CHAID, QUEST, RFRC and BTCR for predicting or classifying mortality in head injuries, according to the risk factors of each individual case and on the basis of the above considerations, our conclusions were as follows:
• The BTCR method achieved the highest accuracy rate and the largest area under the ROC curve in classifying mortality in head injury.
• The area under the ROC curve of the BTCR method was statistically significantly higher than that of the other methods, whereas the area under the ROC curve of the CART method was significantly lower than that of the other methods.
• All regression tree methods fell into the same cluster. However, the BTCR method was at the origin of the cluster, while the CART and QUEST methods were least like the others.
The accuracy rates of all the regression tree methods are higher than 90%, so we can offer age, Glasgow Coma Scale score, cause of injury, pupil reaction, traumatic subarachnoid hemorrhage, contusion, intra-cerebral hematoma, and cerebral edema as reliable indicators for predicting mortality in head injuries. Among the six regression tree algorithms, the BTCR algorithm, with a 93.0% accuracy rate and a statistically significant difference from the others, may be a helpful tool in medical decision-making for predicting mortality in head injuries.
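For readers unfamiliar with the area under the ROC curve used in the comparisons above, it can be read as the probability that a randomly chosen fatal case receives a higher predicted risk than a randomly chosen survivor. The sketch below computes it by direct pairwise comparison. The function name `roc_auc`, the label coding (1 = died, 0 = survived), and the toy scores are assumptions for illustration; this is not the software or procedure used in the study, and it does not reproduce the significance tests.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up predicted risks (not study data).
toy_true = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
auc = roc_auc(toy_true, toy_scores)  # 11 of 12 pairs are ranked correctly
```

An area of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported range of 0.801 to 0.954 sits well above chance for every method.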