# A method for simplifying the analysis of traffic accident injury severity on two-lane highways using Bayesian networks

Article code | Publication year | English article | Persian translation | Word count |
---|---|---|---|---|
29140 | 2011 | 10-page PDF | Available to order | 8,417 words |

**Publisher :** Elsevier - ScienceDirect

**Journal :** Journal of Safety Research, Volume 42, Issue 5, October 2011, Pages 317–326

#### English introduction

A great deal of information on traffic accidents exists, drawn from different sources and covering many variables that are expected to affect injury severity. The number of variables used in research can be enormous, in some cases exceeding 100 (Delen, Sharda, & Bessonov, 2006). This can complicate the analysis, because some of the variables considered might hide the effect of other, more significant ones. Several studies have therefore tried to identify the most significant variables so that only these are considered in the analysis of traffic accidents (Chang and Wang, 2006; Chen and Jovanis, 2000; Kopelias et al., 2007; Xie et al., 2009). Researchers working on traffic accident injury severity in particular have focused on identifying the variables that contribute most to the occurrence of a specific injury severity. Most previous research used regression analysis techniques, such as logistic and ordered probit models (Al-Ghamdi, 2002; Bédard et al., 2002; Kockelman and Kweon, 2002; Milton et al., 2008; Yamamoto and Shankar, 2004; Yau et al., 2006). These techniques have their own drawbacks. Chang and Wang (2006) indicated that these regression models rest on certain assumptions, and that if any of these assumptions is violated, the model's ability to identify the factors that contribute to a specific injury severity is affected. More recently, researchers have used data mining techniques such as artificial neural networks, regression trees, and Bayesian networks. For instance, Abdelwahab and Abdel-Aty (2001) used artificial neural networks to model the relationship between driver injury severity and crash factors related to driver, vehicle, roadway, and environment characteristics.
Thirteen variables were first tested for significance using the χ2 test, and only six were found to be significant: driver gender, fault, vehicle type, seat belt, point of impact, and area type. The authors compared the classification performance of Multi-Layer Perceptron (MLP) neural networks with that of the Ordered Probit Model (OPM). The MLP neural networks outperformed the OPM: they correctly classified 65.6% and 60.4% of cases in the training and testing phases, respectively, compared with 58.9% and 57.1% for the OPM. Another study that used neural networks to model injury severity in traffic accidents (Delen et al., 2006) classified injury severity into five categories (no injury, possible injury, minor non-incapacitating injury, incapacitating, and fatality) and used techniques such as the χ2 test, stepwise logistic regression, and decision tree induction to select the most significant variables. Out of 150 variables, they selected 17 as important in influencing the level of injury severity of drivers involved in accidents. They used MLP neural networks to classify the injury severity level. Because their data contained 10 times more “no injury” cases than “fatal” cases, they faced an unbalanced dataset that affected their overall accuracy (40.71%). Other researchers used classification tree techniques to model injury severity in traffic accidents (Chang & Wang, 2006). They developed a Classification and Regression Tree (CART) model to establish the relationship between injury severity and twenty explanatory variables representing driver/vehicle characteristics, highway/environmental variables, and accident variables, with the aim of modeling the injury severity of an individual involved in a traffic accident.
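The χ2 screening step mentioned above can be illustrated with a minimal sketch. It assumes a Pearson χ2 test of independence between one candidate variable and injury severity; the function name, the 2×2 counts, and the hard-coded critical value (3.841 for one degree of freedom at the 0.05 level) are illustrative assumptions and are not taken from any of the cited studies.

```python
from typing import List, Tuple

# Critical value of the chi-square distribution for 1 degree of freedom, alpha = 0.05.
CRITICAL_95_DF1 = 3.841


def chi_square_statistic(table: List[List[int]]) -> Tuple[float, int]:
    """Return the Pearson chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Made-up counts: rows = seat belt used / not used, columns = slight / severe injury.
toy_table = [[30, 10], [10, 30]]
stat, dof = chi_square_statistic(toy_table)
keep_variable = dof == 1 and stat > CRITICAL_95_DF1
print(f"chi2 = {stat:.2f}, dof = {dof}, keep variable: {keep_variable}")
```

A variable whose statistic exceeds the critical value would be kept as a candidate predictor; tables with more than two categories need the critical value for their own degrees of freedom.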
The use of Bayesian Networks (BNs) as the modeling approach in the analysis of crash-related injury severity has been relatively scarce. De Oña, Mujalli, and Calvo (2011) employed BNs to model the relationship between injury severity and 18 variables related to driver, vehicle, roadway, and environment characteristics. Some of the studies cited above apply their models to the datasets without first selecting the most significant variables (Chang and Wang, 2006; Delen et al., 2006; Simoncic, 2004). However, Chang and Wang (2006) stated that much more useful results could be obtained if the model were applied to a few important variables. Others, such as Abdelwahab and Abdel-Aty (2001), used statistical techniques to choose the most significant variables before applying their model. The scope of this research is to build BNs from selected variables in order to evaluate their performance when only the most significant variables are used, and to compare the results with a base model built from all the variables in the original dataset. The aim is to find out whether using only the most significant variables affects the values of the measures used to assess the model. This paper is organized as follows. Section 2 presents the data used. Section 3 describes the method followed, gives a brief review of variable selection methods and the basic concept of BNs, and describes the indicators used to assess the performance of the BNs. Section 4 provides the results and their discussion. Section 5 gives some conclusions.

#### English conclusion

The main objective of this research was to determine whether the performance of a BN-based model used to predict the injury severity of a traffic accident can be maintained or improved while reducing the number of variables considered in the analysis. The performance of the model was measured using five indicators (accuracy, specificity, sensitivity, HMSS, and ROC area). The analysis used 1,536 records of traffic accidents on rural highways, with information on 18 variables related to accident severity taken from the standard police reports used in Spain. A total of 59 evaluator-search algorithm combinations commonly used in data mining were applied, and 26 subsets of variables were identified. Within these subsets, the variables accident type (ACT), lighting (LIG), and number of injuries (NOI) were selected most often (over 95% of the time). These variables can therefore be regarded as the most significant ones in the classification of injury severity in traffic accidents, since they are included in almost all the selected subsets. For each of these subsets, 10 simplified BNs were built for the training stage and another 10 for the testing stage. In total, 540 BNs were built using the hill climbing search algorithm and the MDL score (de Oña et al., 2011). Comparing the average indicator values of each simplified BN with those obtained for the original BN (BN-18) shows that, in most cases (74%), the performance indicators of the simplified BNs were maintained or improved relative to BN-18. In most cases, therefore, the simplified networks maintain the performance of the original BN. Seven BNs showed statistically significant improvements in their performance indicators with respect to BN-18, and only one value of these indicators worsened.
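The five indicators named above can be sketched for a two-class case as follows. This is a minimal illustration under stated assumptions: HMSS is taken to be the harmonic mean of sensitivity and specificity, the ROC area is computed with the rank-based (Mann-Whitney) formulation, and the class, function names, and counts are hypothetical rather than taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class BinaryConfusion:
    """Confusion-matrix counts for a two-class problem (hypothetical helper)."""
    tp: int  # positive cases classified as positive
    fn: int  # positive cases classified as negative
    fp: int  # negative cases classified as positive
    tn: int  # negative cases classified as negative


def indicators(c: BinaryConfusion) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and their harmonic mean (HMSS)."""
    total = c.tp + c.fn + c.fp + c.tn
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    denom = sensitivity + specificity
    hmss = 2 * sensitivity * specificity / denom if denom else 0.0
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "hmss": hmss,
    }


def roc_area(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC area as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up counts and scores, for illustration only.
print(indicators(BinaryConfusion(tp=40, fn=10, fp=20, tn=30)))
print(roc_area([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

HMSS penalises a model that trades one class for the other, which is why it is useful alongside plain accuracy when the severity classes are unbalanced.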
In more than 50% of these seven BNs, the following variables are repeated: ACT, AGE, ATF, GEN, LIG, NOI, and OI. These seven variables were used to build a new BN (BN-7). The performance indicators of this BN improved with respect to BN-18 in practically all cases, and the improvements are statistically significant (p < 0.05) in 60% of the cases (three of the five indicators: accuracy, sensitivity, and ROC area). This research therefore shows that, when analyzing the severity of road accidents on rural roads with Bayesian networks, the number of variables considered can be reduced by more than 60% (from 18 to 7 variables) while maintaining the performance of the models and reducing their complexity. These findings agree with Chang and Wang (2006), who stated that more useful results could be obtained if a model is applied to only a few important variables. The procedure used to simplify BN models for analyzing the severity of traffic accidents on rural highways could also be applied to other types of infrastructure (intersections, freeways, etc.) and to other models used to assess the severity of traffic accidents (multinomial logit models, hierarchical logit models, probit models, etc.).
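The selection rule described above (keep the variables that appear in more than half of the chosen networks) and the reported reduction from 18 to 7 variables can be sketched as follows. The function names and the toy subsets are hypothetical and serve only to illustrate the rule; they are not the subsets found in the study.

```python
from collections import Counter
from typing import Iterable, List, Set


def frequent_variables(subsets: Iterable[Set[str]], threshold: float = 0.5) -> List[str]:
    """Return the variables that occur in strictly more than `threshold`
    of the given variable subsets, sorted alphabetically."""
    subsets = list(subsets)
    counts = Counter(var for subset in subsets for var in subset)
    return sorted(var for var, n in counts.items() if n / len(subsets) > threshold)


def reduction_fraction(original: int, kept: int) -> float:
    """Fraction of variables removed when going from `original` to `kept`."""
    return (original - kept) / original


# Made-up subsets using variable codes that appear in the text.
toy_subsets = [
    {"ACT", "LIG", "NOI", "AGE"},
    {"ACT", "LIG", "NOI"},
    {"ACT", "NOI", "GEN"},
    {"LIG", "AGE", "OI"},
]
print(frequent_variables(toy_subsets))
print(f"{reduction_fraction(18, 7):.1%} of the variables removed")
```

With the figures reported in the text, going from 18 to 7 variables removes 11 of 18 variables, which is consistent with the stated reduction of more than 60%.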