Title
Decision tree for selecting retaining wall systems based on logistic regression analysis
Article code: 24852
Year of publication: 2010
Length of English article: 12 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Automation in Construction, Volume 19, Issue 7, November 2010, Pages 917–928

Keywords
Retaining wall systems, Logistic regression, Decision tree

Abstract

Machine learning techniques generally require thousands of cases to derive a reliable conclusion, but such a large number of excavation cases is very difficult to acquire in the construction domain. There have been efforts to develop retaining wall selection systems using machine learning techniques, but they were based on only a couple of hundred cases of excavation work, and the resultant rules were inconsistent and unreliable. This paper proposes an improved decision tree for selecting retaining wall systems. After retaining wall systems were divided into three components, i.e., the retaining wall, the lateral support, and optional grouting, a series of logistic regression analyses, analysis of variance (ANOVA), and chi-square tests was used to derive the variables and a decision tree for selecting retaining wall systems. The prediction accuracy rates for the retaining walls, lateral supports, and grouting were 82.6%, 80.4%, and 76.9%, respectively. These values were higher than the prediction accuracy rate (58.7%) of the decision tree built from the same data set by an automated machine learning algorithm, Classification and Regression Trees (CART).

Introduction

As a result of rapid urbanization, deep excavations have become very common in South Korea. The average number of basement floors in buildings in South Korea, as measured by counting the buildings that appeared in the Korean Institute of Architects magazine [6], has increased from two to five since 1972. Recently, in Seoul, a retaining wall collapsed during deep excavation work, creating a 40-meter-long and 30-meter-deep crater. The pit not only swallowed five cars but also caused extensive damage to nearby buildings and underground utilities, leading to power-supply outages and flooding. The inappropriate selection of a retaining wall system may not be the sole cause of this kind of serious failure, but it is common to find cases where it has led to serious schedule delays and cost increases due to unexpected changes in the construction method during construction. Selecting a retaining wall system is a complex process because of the various geotechnical and non-geotechnical factors involved. To help engineers choose a retaining wall system appropriate for a construction site, previous researchers proposed using machine learning techniques (also known as data-mining techniques) based on excavation case histories [7], [12], [13], [14], [21], [22], [23] and [24]. In general, machine learning techniques require thousands of cases to derive a reliable conclusion [4], but such a large number of excavation cases is very difficult to acquire in the construction domain. Consequently, the number of cases used in previous studies was usually smaller than 300. Another problem with previous studies is that the variables used in the machine learning techniques were chosen without a rigorous validation process for testing the correlations between the variables and the excavation methods. For these reasons, the application of (semi-)automated machine learning techniques revealed several limitations.
Therefore, this paper proposes a statistical approach, which gives researchers more control than automated machine learning techniques over the selection and validation of variables and over rule development when only a relatively small number of cases is available. This paper first reviews previous studies with detailed examples and then explains the proposed statistical approach for building a decision tree for selecting retaining wall systems. Finally, using the same data set collected through this study, this paper compares the prediction accuracy rate (how accurately a prediction model can predict the outcome) of the decision tree developed in this study with that of another decision tree built using Classification and Regression Trees (CART), a common machine learning algorithm.
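The prediction accuracy rate used for the comparison can be illustrated with a minimal sketch. It assumes the rate is the share of cases whose predicted class equals the recorded class. The function name and the five example cases are hypothetical and are not taken from the paper's data set.

```python
from typing import Sequence


def prediction_accuracy_rate(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of cases whose predicted class matches the recorded class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical wall-type labels for five excavation cases (illustration only).
actual = ["wall_A", "wall_B", "wall_B", "wall_C", "wall_A"]
predicted = ["wall_A", "wall_B", "wall_C", "wall_C", "wall_A"]

rate = prediction_accuracy_rate(actual, predicted)
print(f"{rate:.1%}")  # 4 of the 5 hypothetical cases match
```

The same function can score two different decision trees on one data set, which is the kind of like-for-like comparison the paper reports.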

Conclusion

Previous studies employed various machine learning techniques to develop prediction models for selecting retaining walls appropriate for the site conditions considered. However, some of the automatically inferred prediction rules in those studies were questionable. Two potential causes were identified: 1) the explanatory variables were used without validating their significance and correlation with the target variables; and 2) automated machine learning algorithms require thousands of cases to produce reliable results [4], but the sample sizes used were smaller than 300. In this study, we developed a new decision tree for selecting a retaining wall system using a more traditional statistical approach than machine learning methods, one that included logistic regression (LR) and other statistical analyses such as ANOVA and chi-square tests.
The proposed statistical approach was composed of three phases. In the first phase, statistically significant explanatory variables were identified through a series of LR analyses. During the LR analyses, we faced statistical problems due to multicollinearity and to the small number of cases in certain outcome groups. To deal with multicollinearity, we double-checked the coefficients of the collinear explanatory variables through several cycles of LR analyses with different combinations of explanatory variables, which separated out the effects of the collinear variables. To deal with the small groups, we combined certain outcome groups that included a small number of cases into a larger outcome group, after considering the results of the preceding LR analyses and the characteristics of the target variables; this provided stable LR models. The statistical significance of the variables was verified with ANOVA and chi-square tests. In the second phase, the derived explanatory variables were examined against the general findings documented in the existing literature. The variables were also reviewed with domain experts to confirm the practical significance of those that had been shown to be statistically significant. The LR models were refined iteratively by repeating the first and second phases, based on the review results. In the third phase, the selected explanatory variables were developed into a binary decision tree based on their impact on the outcomes. The split (or threshold) value at each node was determined using the receiver operating characteristic (ROC) plot.
The resultant decision tree showed high prediction accuracy rates: 82.6% for retaining walls, 80.4% for lateral supports, and 76.9% for grouting methods. We built another decision tree using an automated machine learning algorithm and compared the prediction accuracy rates of the two decision trees using the same data set, explanatory variables, and target variables. There were many similarities between the two decision trees, but the accuracy rate of the automatically derived decision tree (58.7%) was far lower than that of our decision tree. If a greater number of cases had been used in the automated derivation of the decision tree, the prediction accuracy rate might have been much higher, although obtaining a large number of cases for specific construction methods is difficult.
In conclusion, this study makes three major contributions:
1) If a large number of cases can be collected in the construction domain, automated machine learning techniques have a distinctive advantage over the statistical approach in that they can evolve by themselves as the number of cases increases. However, it may be very difficult to collect thousands of cases for specific construction methods, as new and improved construction methods come out daily and are becoming more and more diverse. By taking a statistical approach, this study was able to develop a reliable decision tree for selecting retaining wall systems based on 139 cases.
2) The explanatory variables considered in selecting retaining wall systems were selected and verified through rigorous statistical analysis processes. These variables can be used as a reliable basis for future relevant studies.
3) In many machine learning techniques, the reasoning process is often invisible, so it is often not possible to investigate the selection logic for retaining wall systems. This study quantitatively and qualitatively examined the reasoning process for selecting retaining wall systems, the variables considered at the splitting nodes, and their threshold values.
Although some variables and threshold values may vary as construction equipment and technologies develop, or because of other factors such as local regulations, the general logic may be used as a basis for examining future relevant studies. As the next step, the derived decision tree will be implemented as a web-based decision support system for novice field engineers, and the input may be reused to generate an improved decision tree. If the number of collected cases is large enough, the data might be used to develop a prediction model using advanced machine learning techniques.
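The conclusion states that the split value at each node was determined from the ROC plot but does not say which criterion was read off the plot. The sketch below assumes one common choice, the threshold that maximizes Youden's J (true positive rate minus false positive rate). The function names, the excavation-depth variable, and the eight example cases are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def roc_points(values: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, TPR, FPR) for every distinct value used as a threshold.

    A case is predicted positive when its value is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes must be present")
    points = []
    for threshold in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= threshold and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= threshold and y == 0)
        points.append((threshold, tp / positives, fp / negatives))
    return points


def best_split(values: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the threshold with the largest Youden's J = TPR - FPR (assumed criterion)."""
    threshold, _, _ = max(roc_points(values, labels), key=lambda p: p[1] - p[2])
    return threshold


# Hypothetical node: excavation depth in metres; 1 means a given wall type was chosen.
depths = [6.0, 8.0, 9.5, 11.0, 12.0, 14.0, 15.5, 18.0]
chosen = [0, 0, 1, 0, 1, 1, 1, 1]

split = best_split(depths, chosen)
print(split)  # threshold with the largest TPR - FPR on the hypothetical data
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, would fit the same structure by changing only the key function in `best_split`.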