Persian translation of the article title

Analysis of accidents on rural highways using latent class clustering and Bayesian networks

English title

Analysis of traffic accidents on rural highways using Latent Class Clustering and Bayesian Networks

Article code	Publication year	Pages (English article)
29201	2013	10 (PDF)

Source

Publisher: Elsevier - ScienceDirect

Journal: Accident Analysis & Prevention, Volume 51, March 2013, Pages 1–10

Keywords (Persian translation)

Cluster analysis - Latent class clustering - Bayesian networks - Traffic accidents - Classification - Injury severity - Highway - Road safety

English keywords

Cluster analysis, Latent Class Clustering, Bayesian Networks, Traffic accidents, Classification, Injury severity, Highways, Road safety


Abstract

One of the principal objectives of traffic accident analyses is to identify key factors that affect the severity of an accident. However, heterogeneity in the raw data makes the analysis of traffic accidents difficult. In this paper, Latent Class Clustering (LCC) is used as a preliminary tool for segmenting 3229 accidents that occurred on rural highways in Granada (Spain) between 2005 and 2008. Next, Bayesian Networks (BNs) are used to identify the main factors involved in accident severity for both the entire database (EDB) and the clusters previously obtained by LCC. The results of these cluster-based analyses are compared with the results of a full-data analysis. The results show that the combined use of both techniques is valuable, as it reveals further information that would not have been obtained without prior segmentation of the data. BN inference is used to obtain the variables that best identify accidents with killed or seriously injured (KSI) victims. Accident type and sight distance were identified in all the cases analysed; other variables, such as time, occupant involved and age, were identified in the EDB and in only one cluster; whereas the variables vehicles involved, number of injuries, atmospheric factors, pavement markings and pavement width were identified in only one cluster.

Introduction

Traffic accidents are contingent events, and analysing them requires awareness of the particularities that define them. In general, accidents are described by a series of variables, usually discrete, that explain them. Once the nature of the variables is known, researchers select the most appropriate method for developing and implementing statistical models to analyse the data in each case (Lord and Mannering, 2010, Savolainen et al., 2011 and Mujalli and De Oña, in press). One of the main problems of accident data and their modelling is heterogeneity (Savolainen et al., 2011). If heterogeneity is not taken into account during the analysis, certain relationships in the data may go undetected. Researchers often try to reduce heterogeneity by segmenting traffic accident data on the basis of expert domain knowledge, methodological decisions or the intention to study a specific problem. Although expert knowledge can lead to a workable segmentation, it does not guarantee that each segment consists of a homogeneous group of traffic accidents (Depaire et al., 2008). For this reason, specific analysis techniques, such as cluster analysis (CA), are used as aids in traffic accident segmentation.

CA has been used in road safety analysis as a preliminary tool for several purposes. Karlaftis and Tarko (1998) used it to classify 92 areas of the state of Indiana into urban, suburban and rural areas. They then applied Negative Binomial (NB) regression models to the resulting groups to analyse the influence of driver age on accidents; a model using all the data and models based on the clustered data produced statistically significant differences. Subsequently, Sohn (1999) used a Poisson regression model on data previously clustered by the latitude and longitude of each crash to analyse accident frequency. Using CA, GIS (Geographic Information Systems) and NB models, Ng et al. (2002) developed an algorithm for estimating the number of accidents and evaluating their risk in a specific area. In a later study, Wong et al. (2004) proposed a method for evaluating the effect of a series of road safety strategies implemented in Hong Kong, using CA as a preliminary step to group different road safety programmes and projects into smaller groups with significant road safety strategies. Ma and Kockelman (2006) used CA and a probit model to analyse the relationship between crash frequency and severity, road design, and usage characteristics in the state of Washington. Depaire et al. (2008) used Latent Class Clustering (LCC) and Multinomial Logit (MNL) models to study the severity of traffic accidents. They identified seven clusters representing different types of traffic accidents and then applied an MNL model both to the full dataset and to each of the seven clusters. Their results showed that the clustered data provided information that would not have been obtained if only the full database had been used. More recently, LCC has also been used by Park and Lord (2009) and Park et al. (2010) to segment databases and analyse vehicle crash data. Finally, Pardillo-Mayora et al. (2010) used CA on run-off-road accident data to calibrate a roadside hazardousness index for two-lane roads in Spain. The index considered four characteristics (roadside slope, non-traversable obstacles, safety barrier installation, and alignment), and CA was used to group the 120 combinations of these four indicators into categories with homogeneous effects on severity.

Many previous studies have focused on understanding and identifying the key factors that affect the severity of the consequences of road accidents.
A wide range of methodological approaches has been used to analyse severity (Savolainen et al., 2011): probit models (Bayesian ordered, binary, bivariate binary, bivariate ordered, heteroskedastic ordered, multivariate, ordered, random parameters ordered), logit models (Bayesian hierarchical binomial, binary, generalized ordered, heteroskedastic ordered, Markov switching multinomial, mixed generalized ordered, mixed joint binary, multinomial, nested, ordered, random parameters, random parameters ordered, sequential binary, sequential, simultaneous binary), the log-linear model, the partial proportional odds model, artificial neural networks, and classification and regression trees. More recently, Bayesian Networks (BNs) have been used to analyse traffic accident severity with satisfactory results (Simoncic, 2004, De Oña et al., 2011 and Mujalli and De Oña, 2011). This paper presents an analysis of traffic accidents based on a combination of cluster analysis and Bayesian Networks; to the best of our knowledge, this is the first time the two approaches have been used together. The paper is structured as follows: Section 2 describes the methodology used in the analysis, including the Latent Class Clustering and Bayesian Network techniques; Section 3 describes the key characteristics of the data analysed; Section 4 presents the results and discussion; and the final section sets out the conclusions.
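The heterogeneity argument made in this introduction (relationships can go undetected in unsegmented data) can be illustrated with a toy example. The sketch below uses invented counts, not data from this study, for two hypothetical segments A and B and a binary weather variable; pooling the segments hides opposite weather effects that are clear within each segment.

```python
# Hypothetical counts, invented for illustration only (not the Granada data):
# (segment, weather) -> (KSI accidents, total accidents)
counts = {
    ("A", "adverse"): (60, 100),
    ("A", "good"): (40, 100),
    ("B", "adverse"): (20, 100),
    ("B", "good"): (40, 100),
}


def ksi_rate(pairs):
    """Share of KSI accidents over a list of (ksi, total) pairs."""
    ksi = sum(k for k, _ in pairs)
    total = sum(n for _, n in pairs)
    return ksi / total


def pooled_rate(weather):
    """KSI share for one weather condition, ignoring the segments."""
    return ksi_rate([v for (_, w), v in counts.items() if w == weather])


def segment_rate(segment, weather):
    """KSI share for one weather condition within a single segment."""
    return ksi_rate([counts[(segment, weather)]])


if __name__ == "__main__":
    print("pooled:", pooled_rate("adverse"), pooled_rate("good"))
    for seg in ("A", "B"):
        print(seg, segment_rate(seg, "adverse"), segment_rate(seg, "good"))
```

With these counts the pooled KSI share is 0.4 under both weather conditions, so weather appears unrelated to severity, while segment A shows 0.6 vs 0.4 and segment B shows 0.2 vs 0.4.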

Conclusions

This paper presents an analysis of traffic accident injury severity on rural highways conducted with the combined use of LCC and BNs. The study uses the records of 3229 traffic accidents on rural highways, taken from the standard police reports used in Spain, with information on 18 variables related to the injury severity of the accidents. LCC analysis identified four clusters (C1–C4) based on the variables accident type, shoulder type, paved shoulder, occupant involved and number of vehicles involved. The main differences between the clusters lie in accident type (collision or run-off-road) and the existence of paved shoulders on the highway, which suggests that these two variables are important in accident analyses. BNs were built for each of the four clusters and for the entire database (EDB). Accuracy, sensitivity, specificity, ROC area and HMSS were used as indicators for comparing model fit (the EDB's BN vs. the clusters' BNs). The models of clusters C1 and C3 (which showed the highest homogeneity) give global results that are identical to or better than those of the EDB model, which indicates that increasing homogeneity improves the models' overall fit. The BN built on the EDB and the BNs generated for each cluster were then compared in terms of (i) the direct dependence relationships between severity (SEV) and all the other variables, for the EDB and for every cluster, and (ii) inference, for the EDB and for the two clusters whose performance indicators improved on those of the EDB (C1 and C3). This comparison has provided information and insights that would not have been obtained if only the EDB had been analysed, without a prior LCC analysis.
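Most of the comparison indicators listed above can be computed from a classifier's predictions. The following is a minimal sketch, assuming a binary severity variable with states KSI and SI, with KSI treated as the positive class and a hypothetical 0.5 threshold on P(KSI). It covers accuracy, sensitivity, specificity and ROC area, but does not reproduce HMSS, whose definition is not given in this section. The function names and the example labels and scores are invented for illustration.

```python
def classify(scores, threshold=0.5, positive="KSI", negative="SI"):
    """Turn predicted P(KSI) values into class labels (hypothetical threshold)."""
    return [positive if s >= threshold else negative for s in scores]


def binary_metrics(y_true, y_pred, positive="KSI"):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores, positive="KSI"):
    """ROC area as the probability that a random positive case gets a higher
    score than a random negative case (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = ["KSI", "KSI", "SI", "SI", "SI"]  # invented observed severities
    s = [0.8, 0.4, 0.6, 0.3, 0.1]         # invented predicted P(KSI)
    print(binary_metrics(y, classify(s)), roc_auc(y, s))
```

In the invented example one KSI case is missed and one SI case is flagged, giving accuracy 0.6, sensitivity 0.5, specificity 2/3 and ROC area 5/6. Because ROC area uses the scores rather than the thresholded labels, it does not depend on the 0.5 cut-off.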
For instance, a set of variables (month, time, day, number of injuries, accident type, cause, age, gender, pavement width, shoulder type, pavement markings and sight distance) shows direct dependence relationships with severity both in the EDB and in all the clusters, which suggests that these variables are strongly related to crash severity. On the other hand, no direct link is observed between severity and atmospheric factors in the EDB, whereas such a link does exist in all the clusters identified. This highlights the important relationship between these two variables, which has also been identified in previous studies (Mujalli and De Oña, 2011 and Xie et al., 2009). The inference analysis identifies several variables that influence killed or seriously injured (KSI) accidents. Two of them, accident type (ACT) and sight distance (SID), are identified in the EDB as well as in C1 and C3. In all three cases (EDB, C1 and C3), when a collision with pedestrians (CP) occurs on a rural highway, the probability of KSI is very high (0.6663–0.8747, in Table 6). Therefore, where pedestrians are frequent on such highways (e.g. on roads linking two villages that are close to each other), precautions against such accidents are advisable (e.g. safety barriers on stretches of road where pedestrians walk on the shoulder). In all three cases it is also observed that when SID is severely restricted by topography (TOP), the probability of KSI is very high (0.6243–0.7497, in Table 6). Horizontal and vertical signage generally takes limited visibility into account (e.g. overtaking restrictions). However, the results also reveal that when SID is restricted by buildings (BUI), the probability of KSI is very high for the EDB and C1. It would therefore be advisable to take limited visibility into account on rural highways and to reassess visibility where buildings are close to the road.
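BN inference of the kind summarized in Table 6 amounts to computing posterior probabilities such as P(KSI | ACT = CP). The sketch below does this by exact enumeration over a hypothetical three-node network (ACT → SEV ← SID). The structure, the state names "other" and "none", and all probabilities are invented for illustration; they are not the networks or values estimated in the paper. Enumeration is only practical for tiny networks, and BN software typically relies on more efficient algorithms.

```python
from itertools import product

# Hypothetical toy network (invented structure and numbers):
# ACT (accident type) -> SEV <- SID (sight distance)
P_ACT = {"CP": 0.1, "other": 0.9}   # CP = collision with pedestrian
P_SID = {"TOP": 0.2, "none": 0.8}   # TOP = restricted by topography
P_KSI = {                           # P(SEV = KSI | ACT, SID)
    ("CP", "TOP"): 0.9, ("CP", "none"): 0.7,
    ("other", "TOP"): 0.4, ("other", "none"): 0.2,
}


def joint(act, sid, sev):
    """P(ACT=act, SID=sid, SEV=sev) from the network factorization."""
    p_ksi = P_KSI[(act, sid)]
    return P_ACT[act] * P_SID[sid] * (p_ksi if sev == "KSI" else 1.0 - p_ksi)


def posterior_ksi(evidence):
    """P(SEV=KSI | evidence), summing the joint over all assignments
    consistent with the evidence (a dict such as {"ACT": "CP"})."""
    num = den = 0.0
    for act, sid, sev in product(P_ACT, P_SID, ("KSI", "SI")):
        if evidence.get("ACT", act) != act or evidence.get("SID", sid) != sid:
            continue  # assignment contradicts the observed evidence
        p = joint(act, sid, sev)
        den += p
        if sev == "KSI":
            num += p
    return num / den


if __name__ == "__main__":
    print(posterior_ksi({}), posterior_ksi({"ACT": "CP"}), posterior_ksi({"SID": "TOP"}))
```

With these made-up tables the prior P(KSI) is 0.29, and observing a pedestrian collision raises it to 0.74. This is the same kind of shift that the inference analysis reads off the learned networks.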
Inference also shows that certain variables that the EDB's BN does not identify as significant in determining whether or not an accident is KSI are identified as significant for a specific cluster. For example, in cluster C3, when only one vehicle is involved in a collision (i.e. a fixed-object collision, a run-off-road collision or a collision with a pedestrian), the probability of KSI is higher than the probability of slightly injured (SI). In cluster C1, the variables number of injuries, atmospheric factors, pavement markings and pavement width are found to have a significant impact on the probability of KSI. These results point to specific road safety improvements. For example, to reduce the severity of collisions on highways with shoulders (cluster C1), road markings should be repainted, and narrow-lane warning signs should be used when road markings are missing or worn away, or when the pavement width is less than 6 m. None of these results would have been obtained if only the EDB had been analysed, with no prior LCC analysis. This study shows that the combined use of LCC and BNs provides new information and insights into the main causes of accident severity that could be useful for road safety analysts. It therefore agrees with previous research (Sohn, 1999, Karlaftis and Tarko, 1998, Ng et al., 2002, Wong et al., 2004, Depaire et al., 2008 and Pardillo-Mayora et al., 2010) in showing that, when analysing traffic accidents, it is worthwhile to segment the accident records to increase data homogeneity before applying other analysis techniques. Several considerations should be kept in mind when interpreting and generalizing the results of this study. The results obtained in this paper depend strongly on the initial data (accidents on two-lane highways involving 1, 2 or 3 vehicles) and on the methods used (Latent Class Clustering and Bayesian Networks).
Different results might have been obtained if other data and methods had been used. Like many clustering techniques, LCC is sensitive to converging to a local maximum instead of the global maximum, so the solution found depends on the initial parameter values. To avoid ending up with a local solution, the Latent GOLD program uses 10 sets of random start values (Vermunt and Magidson, 2005). Learning Bayesian Networks requires large datasets. The numbers of cases in the EDB and C1 are comparable to those in previous studies (De Oña et al., 2011 and Mujalli and De Oña, 2011). However, because of the clustering procedure, C2, C3 and C4 contain a limited number of cases, so the BN results for these three clusters should be interpreted with caution.
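The local-maximum problem and the multiple-random-start remedy can be illustrated with a basic latent class model for categorical variables fitted by expectation-maximization (EM). This is a minimal sketch under stated assumptions (integer-coded variables, a fixed number of classes, a fixed number of EM iterations, a tiny pseudo-count for numerical safety). It is not the Latent GOLD implementation, and `fit_lcc` and its parameters are hypothetical names. Each start draws random class sizes and category probabilities, and the start with the highest final log-likelihood is kept, mirroring the idea of using 10 random start sets.

```python
import numpy as np


def _log_joint(pi, theta, onehot):
    """n x K matrix: log P(class k) + sum_j log P(x_ij | class k)."""
    return np.log(pi) + sum(oh @ np.log(th).T for oh, th in zip(onehot, theta))


def fit_lcc(X, n_classes, n_starts=10, n_iter=300, seed=0):
    """Latent class model for integer-coded categorical data X (n_obs x n_vars),
    fitted by EM from several random starts; returns the best start."""
    X = np.asarray(X)
    n, m = X.shape
    n_levels = X.max(axis=0) + 1
    onehot = [np.eye(n_levels[j])[X[:, j]] for j in range(m)]  # n x L_j indicators
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        # Random start: class sizes and per-class category probabilities.
        pi = rng.dirichlet(np.ones(n_classes))
        theta = [rng.dirichlet(np.ones(L), size=n_classes) for L in n_levels]
        for _ in range(n_iter):
            # E-step: posterior class membership probabilities (responsibilities).
            lj = _log_joint(pi, theta, onehot)
            resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
            # M-step: re-estimate class sizes and conditional probabilities;
            # the tiny pseudo-count keeps every probability strictly positive.
            nk = resp.sum(axis=0)
            pi = (nk + 1e-9) / (n + 1e-9 * n_classes)
            theta = [(resp.T @ oh + 1e-9) / (nk[:, None] + 1e-9 * oh.shape[1])
                     for oh in onehot]
        # Score this start by its log-likelihood under the final parameters.
        lj = _log_joint(pi, theta, onehot)
        log_norm = np.logaddexp.reduce(lj, axis=1, keepdims=True)
        loglik = log_norm.sum()
        if best is None or loglik > best["loglik"]:
            best = {"loglik": loglik, "pi": pi, "theta": theta,
                    "resp": np.exp(lj - log_norm)}
    return best


if __name__ == "__main__":
    # Invented toy data: two dominant response patterns plus a little noise.
    X = [[0, 0, 0]] * 40 + [[1, 1, 1]] * 40 + [[0, 1, 0]] * 5 + [[1, 0, 1]] * 5
    fit = fit_lcc(X, n_classes=2)
    print(fit["loglik"], fit["pi"], fit["resp"].argmax(axis=1))
```

Keeping the best of several starts does not guarantee the global maximum, but it makes a poor local solution less likely. Each start is inexpensive compared with the cost of reporting a misleading segmentation.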