Using data mining techniques for multi-disease predictive modeling of hypertension and hyperlipidemia with common risk factors
Article code | Publication year | Page count of the English article |
---|---|---|
22198 | 2011 | 12-page PDF |
Publisher: Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 38, Issue 5, May 2011, Pages 5507–5513
English abstract
Many previous studies have employed predictive models for a specific disease, but have failed to note that people often suffer not from a single disease but from associated diseases as well. Because these associated diseases might have reciprocal effects, and abnormalities in physiological indicators can point to several associated diseases at once, common risk factors can be used to predict them together. This approach provides a more effective and comprehensive forecasting mechanism for preventive medicine. This paper proposes a two-phase analysis procedure to simultaneously predict hypertension and hyperlipidemia. First, we used six data mining approaches to select the individual risk factors of these two diseases, and then determined the common risk factors using the voting principle. Next, we used the Multivariate Adaptive Regression Splines (MARS) method to construct a multiple predictive model for hypertension and hyperlipidemia. This study uses data from a physical examination center database in Taiwan that includes 2048 subjects. The proposed analysis procedure shows that the common risk factors of hypertension and hyperlipidemia are Systolic Blood Pressure (SBP), Triglycerides, Uric Acid (UA), Glutamate Pyruvate Transaminase (GPT), and gender. The proposed multi-disease prediction method has a classification accuracy rate of 93.07%. The results of this paper provide an effective and appropriate methodology for simultaneously predicting hypertension and hyperlipidemia.
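The first phase described in the abstract can be illustrated with a minimal Python sketch. It assumes one plausible reading of the "voting principle": for each disease, a factor is kept when at least `min_votes` selection methods chose it, and the common risk factors are the factors kept for both diseases. The abstract does not spell out the exact voting rule, so the threshold, the function names, and the placeholder method and factor names below are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def vote(selections, min_votes):
    """Keep the factors chosen by at least `min_votes` methods.

    `selections` maps a method name to the list of factors it selected.
    """
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # each method votes at most once per factor
    return {factor for factor, n in votes.items() if n >= min_votes}


def common_factors(per_disease, min_votes):
    """Intersect the voted factor sets of all diseases."""
    voted = [vote(selections, min_votes) for selections in per_disease.values()]
    return set.intersection(*voted)


# Hypothetical toy input: placeholder names, not the study's data.
toy = {
    "disease_1": {
        "method_1": ["factor_a", "factor_b", "factor_c"],
        "method_2": ["factor_a", "factor_b"],
        "method_3": ["factor_a", "factor_d"],
    },
    "disease_2": {
        "method_1": ["factor_a", "factor_c"],
        "method_2": ["factor_c", "factor_d"],
        "method_3": ["factor_a", "factor_c", "factor_b"],
    },
}

print(sorted(common_factors(toy, min_votes=2)))  # ['factor_a']
```

In this toy input only `factor_a` reaches two votes for both diseases, so it is the only common factor returned.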
English introduction
According to a World Health Organization (WHO) survey, Cardiovascular Disease (CVD) accounts for nearly one third of all deaths worldwide. Hypertension and hyperlipidemia are both indicators of the metabolic syndrome, and can potentially lead to CVD, cardiopathies, nephrosis, and other diseases (Kannel, 1990). Although many studies have investigated the risk factors of specific diseases and constructed corresponding prediction models, relatively little research considers multiple diseases. However, abnormalities in physiological indicators may indicate not only a single disease, but multiple diseases. Therefore, determining the common risk factors and developing a predictive model for multiple diseases is more important than doing so for only a single disease. For example, a patient with hypertension or hyperlipidemia is more likely to suffer from cardiovascular disease than a normal, healthy individual. Hypertension is also associated with hyperlipidemia (Bonna & Thelle, 1991). The purpose of this paper is to identify the common risk factors of hypertension and hyperlipidemia using data-mining techniques, and then, by applying the Multivariate Adaptive Regression Splines (MARS) method, to construct a predictive model for these two diseases.

In their examination of studies on hypertension and hyperlipidemia in the literature, Staessen, Wang, and Thijs (2001) found that hypertension is the most important risk factor for CVD. The National Library of Medicine defines hypertension as a Systolic Blood Pressure (SBP) value ⩾140 mm Hg and/or a Diastolic Blood Pressure (DBP) value ⩾90 mm Hg. It also reports that the risk factors for hypertension include old age, non-white race, high sodium and total fat intake, family history of hypertension, physical inactivity, excessive alcohol consumption, and smoking. The diabetes care guide of the American College of Physicians (ACP, chap. 10) shows that in many studies, lipid-lowering therapy leads to a 22–24% reduction in major cardiovascular events in patients with type 2 diabetes. Lee and Entzminger (2006) conducted a cross-sectional study of 1398 patients, and found that old age, Body Mass Index (BMI), and low educational attainment are statistically significant risk factors for hypertension. Wu, Lee, Hsu, and Lee (2003) defined hyperlipidemia as serum Total Cholesterol (T-CHO) ⩾ 200 mg/dl, or Low-Density Lipoprotein (LDL) ⩾ 130 mg/dl, or high-density lipoprotein (HDL) ⩾ 200 mg/dl, in combination with either a T-CHO/HDL ratio of >5 or HDL <35 mg/dl. Silverstein et al. (2000) found that age, LDL, triglycerides, and HDL are risk factors of hyperlipidemia. The results of these studies show that the risk factors of hyperlipidemia are unlike those of hypertension, yet both conditions are causes of cardiovascular disease.

From the viewpoint of preventive medicine, monitoring a subject’s risk factors with a predictive model might allow the patient to receive health care advice or early treatment that would prevent disease. Akdag et al. (2006) used the classification-tree method to determine the risk of hypertension among outpatients in a clinic in Denizli province, western Turkey, between January 2002 and July 2004. Their results show that BMI, waist-to-hip ratio, sex, serum triglycerides, serum total cholesterol, hypertension in first-degree relatives, and saturated fat consumption are risk factors for hypertension. In their study of liver complaints, Young, So, and Chang (2003) used a growth curve analysis to construct a liver complaint predictor model. The three kinds of predictors in their study had 75.86%, 76.55%, and 78.62% accuracy rates. Armengol, Palaudaries, and Plaza (2001) identified the long-term risk factors for diabetes, in order to predict complications, based on 370 cases.
They achieved 100%, 90%, and 72.45% accuracy rates in predicting whether or not patients would suffer from apoplexy, amputation, or myocardial infarction, respectively.

Taken together, the literature shows that hypertension and hyperlipidemia not only cause many diseases, but are also themselves caused by some common risk factors, such as age, T-CHO, and triglycerides. Patients suffering from hyperlipidemia are at a higher risk of developing hypertension. This paper uses a two-stage analysis procedure to analyze a database of 2048 subjects from a physical examination center in Taipei. The first stage uses data mining classifier techniques, including logistic regression analysis, discriminant analysis, C5.0, CHAID, and Exhaustive CHAID, to separately determine the risk factors of hypertension and hyperlipidemia. The second stage uses the Multivariate Adaptive Regression Splines (MARS) method developed by Friedman (1991) to build a predictive model that can simultaneously predict these two diseases. Most previous studies use risk factors to construct a predictive model for a specific disease. However, this type of model can only be used to predict the likelihood of a subject acquiring one disease. The predictive model proposed in this paper can predict whether a subject is at risk of both hypertension and hyperlipidemia.
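For readers unfamiliar with MARS, the following Python sketch shows only the general form of a fitted MARS model: an intercept plus a weighted sum of basis functions, each a product of hinge functions of the form max(0, x − t) or max(0, t − x). It does not show the forward and backward passes that choose the knots, and it is not the model fitted in the paper. The feature names `x1` and `x2`, the knots, and the coefficients are made-up values chosen purely for illustration.

```python
def hinge(x, knot, direction=1):
    """Hinge function: max(0, x - knot) if direction is 1, max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a MARS-style model at one observation.

    `x` maps feature names to values. `terms` is a list of
    (coefficient, hinges) pairs, where `hinges` is a list of
    (feature, knot, direction) triples whose hinge values are multiplied.
    """
    total = intercept
    for coef, hinges in terms:
        basis = 1.0
        for feature, knot, direction in hinges:
            basis *= hinge(x[feature], knot, direction)
        total += coef * basis
    return total


# Hypothetical model: the numbers below are illustrative, not fitted values.
intercept = 0.5
terms = [
    (0.25, [("x1", 2.0, 1)]),                  # 0.25 * max(0, x1 - 2)
    (-0.5, [("x1", 2.0, -1), ("x2", 1.0, 1)]), # -0.5 * max(0, 2 - x1) * max(0, x2 - 1)
]

print(mars_predict({"x1": 4.0, "x2": 3.0}, intercept, terms))  # 1.0
print(mars_predict({"x1": 1.0, "x2": 3.0}, intercept, terms))  # -0.5
```

The second term is an interaction between two features, which is how a MARS model can capture the joint effect of two indicators; it contributes only when `x1` is below its knot and `x2` is above its knot.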
English conclusion
Most reports on the selection of disease risk factors focus on only one disease. Although these approaches can determine the key factors for a disease, they cannot identify the common risk factors of two or more diseases or a probable correlation between them. This paper uses logistic regression analysis, C5.0, CHAID, Exhaustive CHAID, and discriminant analysis to identify the risk factors for both hypertension and hyperlipidemia. The most significant contribution of this paper is the determination of the common risk factors of these two conditions. This paper not only confirms that SBP, DBP, age, and triglycerides are risk factors for hypertension, but also shows that BMI, medical records, GPT, and UA indicate the risk of hypertension. Further, the study shows that the risk factors of hyperlipidemia include T-CHO, triglycerides, UA, GPT, gender, SBP, and LDL. Second, this study shows that the common risk factors of these two conditions include SBP, triglycerides, UA, GPT, and gender. Third, although UA and GPT are significant risk factors for both of these conditions, very few researchers have included them in their studies. The results of this study provide some indicators to help physicians diagnose these conditions early in high-risk subjects. Fourth, previous studies have tended to focus on one specific disease, and have not built predictive models for multiple interrelated diseases. This study uses common risk factors to build MARS predictive models for hypertension and hyperlipidemia. The resulting models exhibit 93.07% accuracy in classifying subjects as belonging to one of the four physiological conditions.
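To make the final figure concrete, the Python sketch below shows how a classification accuracy over four conditions can be computed. It assumes that the four physiological conditions are the four combinations of the two diagnoses (neither, hypertension only, hyperlipidemia only, both); the text does not list them explicitly, so this encoding is an assumption. The hypertension rule uses the thresholds quoted in the introduction (SBP ⩾140 mm Hg and/or DBP ⩾90 mm Hg). The function names and the toy labels are hypothetical and are not the study's data.

```python
LABELS = {
    0: "neither",
    1: "hypertension only",
    2: "hyperlipidemia only",
    3: "both",
}


def is_hypertensive(sbp, dbp):
    """Threshold rule quoted in the introduction: SBP >= 140 and/or DBP >= 90 (mm Hg)."""
    return sbp >= 140 or dbp >= 90


def condition_label(hypertension, hyperlipidemia):
    """Encode two yes/no diagnoses as one of four classes (assumed encoding)."""
    return int(hypertension) + 2 * int(hyperlipidemia)


def accuracy(actual, predicted):
    """Fraction of subjects whose predicted class equals the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical toy labels for four subjects.
actual = [0, 1, 2, 3]
predicted = [0, 1, 3, 3]

print(LABELS[condition_label(is_hypertensive(150, 85), False)])  # hypertension only
print(accuracy(actual, predicted))  # 0.75
```

Under this encoding, a prediction counts as correct only when both diagnoses are right at once, which is a stricter criterion than scoring each disease separately.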