Multi-Level Crash Prediction Models Considering Influence of Adjacent Zonal Attributes

This study investigates factors affecting accidents across transport facilities and modes, using micro- and macro-level variables simultaneously while accounting for the influence of adjacent zones on accident occurrence in a zone. To this end, 15968 accidents in 96 traffic analysis zones (TAZs) of Tehran were analyzed. Because the accident data have a multilevel structure, the study adopts a multilevel model. The effects of adjacent zones on the accidents occurring in a zone were assessed using independent variables obtained from the zones adjacent to that zone. A Negative Binomial (NB) model was also developed for comparison. The results show that the multilevel model that considers the effect of adjacent zones performs better than both the multilevel model that does not consider this effect and the NB model. Moreover, the final models show that, at intersections and on road segments, the significant independent variables differ for each mode of transport. The contribution of this paper to accident studies is a comprehensive approach that combines a multilevel, multi-resolution (micro/macro) model with the influence of adjacent zones for accidents across multiple modes and facility types.


Introduction
Accidents have always been regarded as one of the sad consequences of transportation systems. In 2015, 19.9 people per 100000 died in accidents in Iran, compared to 5.1 in Europe [1,2]. Hence, paying more attention to transportation safety and carrying out pertinent investigations is necessary. Since the nature and mechanism of accidents vary across transportation facilities and modes, it is essential to investigate the accidents of each mode separately. Studies in recent years have mostly addressed accidents of all modes together or have considered only a single mode (vehicle, motorcycle, or pedestrian) [3]. This has led to an inaccurate understanding of the factors affecting accidents, since one factor might increase accidents in one mode while decreasing them in another. Therefore, an accurate understanding of the factors leading to accidents requires separate investigation of accidents across different transportation modes and facilities.
Most previous studies on transportation safety have addressed micro-level factors related to accidents, such as road geometry or road lighting quality. Most of them are carried out during the operation of transport facilities and are based on data from existing accidents. Engineering solutions are then proposed based on the interpretation of model results [4]. On the other hand, some researchers have recently investigated the factors affecting accidents at a macro level. These factors, for example economic and demographic variables, have been considered at different geographical levels such as Traffic Analysis Zones (TAZs) and census blocks. The development of accident prediction models based on macro-level variables has increased the attention paid to safety in studies related to road network planning. These models make it possible to enhance safety before transportation facilities enter operation.
To get an accurate understanding of the factors leading to accidents, both micro-level and macro-level factors need to be considered simultaneously. For this reason, a number of recent studies have addressed the factors affecting accidents at both levels simultaneously [5]. The present research investigates both micro and macro variables across different types of facilities (intersections and road segments) and modes of transport (vehicle, motorcycle, and pedestrian). The macro-level variables used in this study were collected at TAZ level. TAZs are used more commonly than other geographical units (e.g. census blocks) because zone divisions are more in line with transport planning studies, and because model variables (e.g. trip generation and trip distribution) are more readily accessible at this level.
Since the macro-level variables in the present study are extracted at TAZ level, they are the same for all accidents occurring at the intersections and road segments of a given zone. Therefore, the present study adopts a multilevel model to investigate the intra-zonal correlation resulting from these shared macro variables. Multilevel models are more suitable for data sets that are multilevel, in which lower-level data are nested in higher-level data [6].
In this study, the data were categorized in two levels. The first level accommodated the micro-level variables related to each accident and the second level included the macro-level variables related to TAZs. The hierarchical structure of the database is shown in Figure 1.

Figure 1. Hierarchical structure of the variables
On the other hand, assigning the same macro-level variables to all accidents in a single TAZ, regardless of where in the zone they occurred, might lead to an inaccurate understanding of the factors affecting accidents, because some accidents might have been affected by factors present in adjacent TAZs. For instance, an accident occurring at the border of a TAZ can be affected more by the factors of adjacent TAZs than by those of its own zone.
There are two options for measuring the influence of adjacent TAZs on the accidents occurring in a TAZ. The first is through spatial error correlation effects, in which the unobserved exogenous variables of one zone influence the dependent variables of the target zone as well as of adjacent zones [7,8]. The second is through spatial spillover effects, in which the observed exogenous variables of one zone affect the dependent variables of both the target zone and the adjacent zones [9,10,11].
In this study, the influence of neighboring zones on the accidents of a zone was evaluated through the spatial spillover approach. First, the TAZs adjacent to each zone were identified. Then, for each independent variable, a new variable was defined based on the values of that variable in the neighboring TAZs. The final models were developed using these variables.
In summary, the objectives of the present study are: (1) investigating the factors affecting accidents across transport facilities and modes; (2) considering the factors affecting accidents at both micro and macro levels simultaneously and measuring the influence of each; (3) investigating and measuring the intra-zone correlation effect using a multilevel model; and (4) studying the influence of the zones adjacent to a TAZ on the accidents occurring in that TAZ. Figure 2 shows the steps followed in this research. The next section reviews related studies. Section 3 addresses data collection and Section 4 explains the methodology. The results obtained from the final models are then presented, followed by the conclusion.

Literature Review
Finding a suitable method of analysis and selecting influential independent variables are two factors that affect the development of safety models. In past years, researchers have proposed numerous methods for developing accident prediction models using different variables. The details of these methods and their results are presented in review papers [12,13]. Accident prediction models are mostly developed using micro-level variables [14,15,16]. These studies have helped to identify the factors affecting accidents at this level and to determine solutions for decreasing the number of accidents at different transport facilities such as intersections or road segments. On the other hand, extensive efforts have been made in recent years to develop accident prediction models using macro-level variables. These variables include the length of roads of each functional class in a zone [17,18], and trip generation and trip distribution data for each TAZ [19]. They also include environmental conditions such as land use characteristics [20], and socioeconomic factors such as household income [21]. The results of this research have led to the consideration of safety indices in road network planning.
Cai et al. investigated the influence of macro-level variables at TAZ level on pedestrian and cyclist accidents using dual-state models. They also sought to measure the influence of neighboring zones on the accidents of a zone. According to their results, factors such as population density, employment rate, and the number of public transport users in a TAZ increase the number of accidents. Moreover, the influence of adjacent zones on the accidents of a zone turned out to be significant, and dual-state models, especially the Zero-Inflated Negative Binomial model, performed better than single-state models [9].
To reach an accurate understanding of the factors affecting accidents, it seems necessary to consider suitable variables at both micro and macro levels simultaneously and to develop appropriate models. Although extensive research has been carried out on the factors affecting accidents at the micro and macro levels, few studies have so far merged the data at both levels to develop accident prediction models.
Huang et al. developed accident prediction models at micro and macro levels and compared the performance of these models in predicting hot zones. In that study, accident prediction models were developed for TAZs using macro-level variables, and for intersections and road segments using micro-level variables. The results indicated that models perform better at the micro level and give a better picture of the micro variables affecting traffic accidents, whereas for investigating safety at TAZ level, macro-level accident models are more practical because less detailed data are needed [22].
Guo et al. developed accident prediction models for signalized intersections based on variables at micro and macro levels. They found that, at corridor level, Poisson models perform better than other models [23]. Mitra and Washington investigated the effect on the number of accidents of micro-level variables such as Annual Average Daily Traffic (AADT) and of macro-level variables such as population across different age groups, total population, and the number of schools in zones adjacent to intersections. According to their results, the population in the 16-64 age group and the annual average number of rainy days have a significant effect on the occurrence of accidents. Moreover, by comparing the models developed from traffic parameters only with the models developed from all variables, the researchers found that omitting macro-level variables can significantly inflate the estimated effect of other factors such as AADT [24].
In this study, the variables are considered at both micro and macro levels. Since the macro-level variables are the same for all accidents in a TAZ, the data in this study have a multilevel structure. Multilevel data exhibit correlation among observations within a group and independence between groups, with lower-level data nested in higher-level data. When accident data are multilevel, using multilevel models, which account for the intra-group correlation of accident data, is beneficial [6]. Detailed information about multilevel data and the adoption of multilevel models in safety studies can be found in [25].
Huang and Abdel-Aty adopted a five-level structure (geographic region level, traffic site level, traffic crash level, driver-vehicle unit level, and occupant level) as the general structure of accident data. In their framework, macro-level analysis is based on the three higher levels (geographic region, traffic site, and traffic crash), and micro-level analysis on the three lower levels (traffic crash, driver-vehicle unit, and occupant). The authors proposed different methods for investigating multilevel data, including the analysis of accident data at the intersection and time levels [26].
Shi et al. investigated the number of highway accidents using multilevel and Negative Binomial (NB) models. In their research, the highway was divided into 196 segments based on its geometric characteristics. The traffic data were obtained through Automatic Vehicle Identification (AVI) systems installed on the highway. Since the output data of the AVI systems divided the highway into 43 segments, each AVI system represented the data of several geometric segments. Due to this two-level structure of the data, a multilevel model was used for investigating traffic accidents. The results show that the multilevel model performed better than the NB model. Moreover, factors such as higher speed or a higher horizontal degree of curvature decrease the number of accidents [27].
In light of the above, in this study both micro-level and macro-level independent variables for the accidents occurring at intersections and on road segments are collected across transport modes (vehicle, motorcycle, and pedestrian) so that a comprehensive investigation can be carried out. In addition, the performance of multilevel models in estimating the number of traffic accidents is evaluated. Finally, the influence of adjacent zones on accidents is investigated.

Data Collection
To develop accident prediction models using micro and macro variables, accident data for the years 2014 and 2015 were collected for the west and south-west of Tehran, Iran. In total, data on 15968 accidents (1231 at intersections and 14737 on road segments), which occurred at 360 intersections and on 892 road segments, were collected. Tehran, the capital city of Iran, has 5 main areas, which together comprise 22 districts. The west and south-west main areas are composed of districts no. 9, 10, 17, 18, and 19, which include 96 TAZs. The accident data were obtained from the database of the Tehran Traffic Police Center, and the demographic data, i.e. population, education, and employment, were obtained from the Iran National Census Center. The traffic data were collected from the Transportation and Traffic Organization of Tehran Municipality and are based on the results of running the Tehran traffic model. After the required data were collected, all information was imported into a GIS application. Then the traffic, social, and demographic data related to each accident were calculated. Figure 3 shows the districts under study along with the TAZs in those districts.

Figure 3. Districts of Tehran including the study area; study area including the related TAZs
The independent variables are shown in Figure 4. Table 1 and Table 2 list the variables used in this study at the micro and macro levels, at intersections and on road segments respectively, across modes of transport, along with their descriptive statistics.
Since in regression models the relation between independent variables and the response variable is usually logarithmic, using the logarithm of independent variables in the modeling process makes the results easier to interpret. This is also common in previous studies [28,29]. Moreover, the transformation reduces the variance among variables [17,30]. Hence, the present study applies a logarithmic transformation to the population, trip generation, and trip distribution of TAZs and to the traffic volume of road segments.
As already mentioned, to evaluate the effect of adjacent zones on the accidents occurring in a TAZ, all TAZs adjacent to that TAZ were identified. Then, for each independent variable, a new variable was obtained from the values of that variable in the surrounding TAZs. These variables capture the effect of neighboring TAZs on crash frequency in a TAZ. Table 3 gives a descriptive summary of the variables extracted from adjacent TAZs.
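The construction of neighbor-based variables can be sketched as follows. This is only an illustration under stated assumptions: the adjacency list, the zone identifiers, and the variable values are hypothetical, and the mean over adjacent zones is an assumed aggregation rule (a sum or a weighted average would be equally plausible choices).

```python
from statistics import mean

# Hypothetical adjacency list: TAZ id -> ids of TAZs sharing a border with it.
ADJACENCY = {
    1: [2, 3],
    2: [1, 3],
    3: [1, 2, 4],
    4: [3],
}

# Hypothetical zone-level (macro) variables; values are illustrative only.
ZONE_VARS = {
    1: {"log_population": 4.2, "literacy_pct": 88.0},
    2: {"log_population": 4.5, "literacy_pct": 91.0},
    3: {"log_population": 4.0, "literacy_pct": 85.0},
    4: {"log_population": 3.8, "literacy_pct": 90.0},
}


def neighbor_variables(adjacency, zone_vars):
    """Return, for each zone, the mean of every variable over its adjacent zones.

    Assumption: the mean is used as the aggregation rule, and every zone
    has at least one neighbor.
    """
    result = {}
    for zone, neighbors in adjacency.items():
        result[zone] = {
            f"nb_{name}": mean(zone_vars[n][name] for n in neighbors)
            for name in zone_vars[zone]
        }
    return result


NEIGHBOR_VARS = neighbor_variables(ADJACENCY, ZONE_VARS)
print(NEIGHBOR_VARS[1])
```

Each zone then carries two sets of macro variables, its own and the `nb_`-prefixed ones derived from its neighbors, and both sets can enter the model as ordinary covariates.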

Methodology
The Poisson model is a statistical model that has been applied successfully to accident data because of their random, non-negative, and sporadic nature. One of its fundamental assumptions is that the mean and the variance of accident frequency are equal. To account for the over-dispersion of accident data, an NB model is adopted instead. By adding a gamma-distributed error term to the mean of the Poisson model, the NB model accommodates the over-dispersion present in accident data and is therefore preferred over the Poisson model.
The NB model is given by the following equations:
$$Y_i \sim \text{Poisson}(\lambda_i) \tag{1}$$
$$\ln \lambda_i = \beta_0 + X_i \beta + \varepsilon_i \tag{2}$$
where $Y_i$ is the crash frequency by mode at intersection $i$ or road segment $i$; $\lambda_i$ is the expectation of $Y_i$; $X_i$ is a vector of explanatory variables; $\beta_0$ is the intercept; $\beta$ is the vector of estimable parameters; and $\varepsilon_i$ is the error term, which is assumed to be independent of $X_i$ and for which $\exp(\varepsilon_i)$ follows a two-parameter gamma distribution.
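The following sketch illustrates how the gamma-distributed error produces over-dispersion. It assumes the common NB2 parameterization, in which the multiplicative gamma error has mean 1 and variance `alpha`, so that the variance of the count is `mu + alpha * mu**2`; this parameterization, the function names, and the coefficient values are illustrative assumptions, not the specification estimated in this study.

```python
import math


def nb_log_pmf(y, mu, alpha):
    """Log-probability of count y under an NB2 model with mean mu.

    Assumption: the gamma-distributed multiplicative error has mean 1 and
    variance alpha, so Var(Y) = mu + alpha * mu**2.
    """
    r = 1.0 / alpha  # shape of the gamma mixing distribution
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def nb_variance(mu, alpha):
    """Variance implied by the NB2 parameterization."""
    return mu + alpha * mu ** 2


def expected_count(beta0, betas, x):
    """Log-link mean exp(beta0 + x . beta), leaving out the random error."""
    return math.exp(beta0 + sum(b * v for b, v in zip(betas, x)))


# Illustrative (made-up) coefficients and covariate values.
mu = expected_count(0.5, [0.8, -0.1], [1.2, 3.0])
alpha = 0.5
total_prob = sum(math.exp(nb_log_pmf(y, mu, alpha)) for y in range(2000))
mean_count = sum(y * math.exp(nb_log_pmf(y, mu, alpha)) for y in range(2000))
print(mu, nb_variance(mu, alpha), total_prob, mean_count)
```

Under these assumptions the variance always exceeds the mean when `alpha` is positive, which is the property that the Poisson model cannot represent.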
One of the main assumptions in NB models is the independence of observations. However, in practice accidents can hardly be considered independent of one another. For instance, the accidents occurring in one area might share unobserved common factors [12]. To enhance the accident models for intersections and road segments and to account for the correlation among accidents occurring in one zone due to their common macro-level variables, the present study adopted a multilevel modelling approach.
The general equation for the single-level model, or conventional simple regression model, is as follows:
$$y_i = \beta_0 + \beta_1 x_i + e_i \tag{3}$$
In the above equation, the subscript $i$ represents an individual observation, and $y$ and $x$ stand for the dependent and independent variables respectively. There are two fixed parameters ($\beta_0$ and $\beta_1$), the intercept and the slope, and a random part ($e$) that allows fluctuations around the fixed part. The word "random" here means "allowed to vary".
The micro level of the individual observation is the only place where this equation is specified. To develop a multilevel model, this micro model needs to be re-specified by differentiating TAZs with the subscript $j$. This gives the following random intercept and random slope model:
$$y_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + e_{ij} \tag{4}$$
At TAZ level, two macro models exist:
$$\beta_{0j} = \beta_0 + u_{0j} \tag{5}$$
$$\beta_{1j} = \beta_1 + u_{1j} \tag{6}$$
The first macro model allows the TAZ-level intercept ($\beta_{0j}$) to vary from one TAZ to another around the overall intercept ($\beta_0$) through the random component $u_{0j}$. The second macro model allows the TAZ-level slope ($\beta_{1j}$) to vary around the overall slope ($\beta_1$) through the random component $u_{1j}$ [31].
The micro model is thus a within-zone equation, whereas the macro models are between-zone equations in which the parameters of the within-zone model are the responses.
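Substituting the two TAZ-level models into the within-zone model yields a single equation with a fixed part and a random part. The sketch below checks numerically that the nested form (equations 4 to 6) and the combined form give the same value; the function names and all numerical values are hypothetical and serve only to illustrate the algebra.

```python
def nested_form(beta0, beta1, u0j, u1j, x, e):
    """Micro model with a TAZ-specific intercept and slope (equations 4 to 6)."""
    beta0j = beta0 + u0j  # TAZ-level intercept
    beta1j = beta1 + u1j  # TAZ-level slope
    return beta0j + beta1j * x + e


def combined_form(beta0, beta1, u0j, u1j, x, e):
    """Same model written as a fixed part plus a random part."""
    fixed = beta0 + beta1 * x
    random_part = u0j + u1j * x + e
    return fixed + random_part


# Illustrative (made-up) values for one observation i in one TAZ j.
args = dict(beta0=1.0, beta1=0.5, u0j=0.2, u1j=-0.1, x=3.0, e=0.05)
print(nested_form(**args), combined_form(**args))
```

Setting `u1j` to zero for every zone reduces this to a random intercept model, in which zones differ only in their baseline level.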
Note that $e_{ij}$ is part of the micro model rather than the macro models: the micro model carries both subscripts $i$ and $j$, which denotes a within-zone situation, whereas the macro models carry only the subscript $j$, which denotes a between-zone situation. The completely random two-level model combines all three equations:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + u_{1j} x_{1ij} + e_{ij}) \tag{7}$$
The best accident model for intersections and road segments in each mode of transport was chosen based on three criteria, namely log-likelihood, the corrected Akaike Information Criterion (AICC), and the Bayesian Information Criterion (BIC). The formulae for AICC and BIC are:
$$\text{AICC} = -2\,LL(\text{full}) + 2k + \frac{2k(k+1)}{n-k-1}$$
$$\text{BIC} = -2\,LL(\text{full}) + k \ln(n)$$
In the above formulae, $k$ represents the number of parameters, $n$ the number of observations, and $LL(\text{full})$ the log-likelihood of the full model.
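As an illustration of how these criteria rank competing models, the sketch below computes AICC and BIC using their standard definitions. The log-likelihoods, parameter counts, and number of observations are invented for the example and are not results of this study; lower values indicate a better trade-off between fit and complexity.

```python
import math


def aicc(log_likelihood, k, n):
    """Corrected AIC: AIC plus a small-sample penalty; requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICC needs n > k + 1")
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion."""
    return -2.0 * log_likelihood + k * math.log(n)


# Hypothetical fits: model name -> (log-likelihood, number of parameters).
candidates = {
    "NB": (-520.0, 6),
    "multilevel": (-505.0, 8),
    "multilevel + neighbors": (-495.0, 11),
}
n_obs = 360  # hypothetical number of observations

scores = {
    name: (aicc(ll, k, n_obs), bic(ll, k, n_obs))
    for name, (ll, k) in candidates.items()
}
best_by_aicc = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(scores, best_by_aicc, best_by_bic)
```

Because BIC penalizes each additional parameter by `ln(n)` rather than 2, it can favor a smaller model than AICC when the gain in log-likelihood is modest.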

Results and Discussion
Accident prediction models were developed for intersections and road segments and for three modes of transport (vehicle, motorcycle, and pedestrian). After developing numerous models of three types (NB models and multilevel models with and without the neighboring zones' effect), 18 final models were created in total. To compare the performance of the multilevel model with that of the NB model and to determine how much influence the neighboring zones exert on the accidents occurring in a zone, the final models were compared based on model goodness of fit, log-likelihood, AICC, and BIC. The performance of the intersection and road segment models is summarized in Table 4 for each mode of transport (vehicle, motorcycle, and pedestrian).
The results show that multilevel models which take into account the effect of neighboring zones perform better than the other models. Since the independent variables used in the modeling process proved significant in some modes of transport yet insignificant in others, it seems necessary to investigate the factors affecting accidents across different facilities and modes of transport. This agrees with previous safety research showing that the sets of significant variables in crash frequency analysis differ across transportation modes [32]. Moreover, multilevel models can better estimate the number of accidents because they account for the multilevel structure of the data. Considering the effect of the neighboring zones on the accidents occurring in a TAZ has a significant effect on both model performance and goodness of fit. Table 5 lists the coefficients and average marginal effects of the significant variables (P-value<0.05) in the six final models. Figure 5 shows the spatial distribution of modeled vs. observed accidents per TAZ by transportation mode. Based on the final models, the following results were obtained.

Micro Variables
Based on the final model results for vehicle accidents at intersections, higher driver age leads to fewer accidents. Since older drivers are more experienced and drive more cautiously, they are involved in fewer accidents. This result is similar to the findings of Kazazi et al., who concluded that older drivers had fewer accidents because of their more cautious behavior [33].
Drivers' level of education is negatively correlated with the number of pedestrian accidents in road segments, i.e. as the level of education among drivers increases, fewer pedestrian accidents occur in road segments. A possible explanation is that more educated people observe the rules more meticulously, which reduces the number of accidents. This is consistent with a previous study which found that drivers with a lower education level had more accidents [34].
Rainy and snowy weather decreases the number of motorcycle accidents in road segments. This variable is not significant for the other modes of transport or for the accidents occurring at intersections. The effect can be explained by the fact that motorcycle use decreases when it rains or snows, which naturally leads to fewer accidents for this mode in such weather conditions. Mitra and Washington showed that the annual average number of rainy days had a negative relationship with crash occurrence at intersections, and attributed it to the decreased driving population during the rainy season in their sample data [24]. On the other hand, some previous studies found that rainy weather increases crashes [35]. Further studies that consider all related factors, such as the friction of the road surface, are therefore required.
Higher traffic volumes lead to more accidents in road segments in all three modes of transport (vehicle, motorcycle, pedestrian), which is consistent with previous studies [36,37,38]. An increase in traffic increases the amount of activity on roads, which in turn results in a higher likelihood of accidents.

Macro Variables
Higher ratios of bus lines to road length in a TAZ lead to fewer motorcycle accidents at intersections and fewer accidents in road segments for all modes of transport. Since more bus lines mean more public transportation facilities in a zone, which in turn encourages people to use these facilities, the traffic of vehicles and motorcycles would naturally decrease in that zone. This lower traffic volume (i.e. lower exposure) also lowers the likelihood of accidents.
Some researchers found that population is a statistically significant variable for predicting crashes [39,24,5]. Lee et al. observed that a higher population density tended to increase pedestrian and bicycle crashes at intersections [5]. In this study, a population increase in a TAZ also increases the number of pedestrian accidents at intersections and the number of motorcycle accidents in road segments. Moreover, a population increase decreases the number of vehicle accidents in road segments. Since the study area is a densely populated urban area, the road network is mainly composed of collectors, and due to the low capacity of these roads the volume of vehicle traffic is low. Therefore, in densely populated areas, due to the structure of their road network and the composition of the passing traffic, the number of pedestrian and motorcycle accidents increases and that of vehicle accidents decreases. A population increase in the neighboring zones of a TAZ leads to more pedestrian accidents at intersections and more vehicle accidents in road segments. Since a population increase leads to more people moving about in that zone as well as in the neighboring zones, the likelihood of pedestrian-vehicle accidents increases.
Trip generation increase in one TAZ leads to a higher number of pedestrian and vehicle accidents in road segments and this increase in the neighboring zones leads to more vehicle accidents in intersections. Increase of trip generation in one zone and in its neighboring zones leads to a higher volume of vehicle passing traffic in the roads network which, in turn, increases the likelihood of vehicle collisions. Several previous studies found that trip generations and attractions have a significant impact on crash frequency [40]. For example, trip generations and attractions per area are positively associated with segment crash frequency; however, this factor has no significant effect on intersection crashes [41].
According to the final models, a higher percentage of literate people in a TAZ leads to more vehicle accidents at intersections and more motorcycle and pedestrian accidents in road segments. Moreover, a higher percentage of literate people in the neighboring zones of a zone leads to more pedestrian accidents at intersections and more accidents in road segments in all three modes of transport. Since higher levels of education usually lead to higher levels of public welfare, an increase in the percentage of literate people in a zone reflects an increase in public welfare, which normally leads to the possession of more personal vehicles. The volume of vehicle traffic therefore increases, which in turn results in more accidents in the mentioned transportation modes and facilities.
The number of employed people plays a significant role in the probability of an accident [42,17]. According to the results of this study, as the number of employed people in a TAZ increases, the number of vehicle accidents at intersections and the number of motorcycle and pedestrian accidents in road segments increase as well. Also, as the number of employed people increases in the neighboring zones of a TAZ, the number of motorcycle accidents at intersections and in road segments and the number of vehicle accidents in road segments increase. When employment increases in a TAZ, not only does the vehicle traffic volume increase in that zone and its neighboring zones, but the level of public welfare and, in turn, the number of personal vehicles increase too. This leads to a higher volume of passing motor vehicle traffic and more accidents in those zones.

Sensitivity Analysis of Variables
To compare quantitatively the effects of different variables on the accidents occurring at intersections and in road segments, a sensitivity analysis of the variables was carried out. This helps in understanding how important a variable is for the accidents of each mode of transport (vehicle, motorcycle, and pedestrian). The sensitivity analysis shows by what factor the expected number of accidents changes for a one-unit change in an independent variable. The results are presented in Table 5.
As can be seen, for intersection accidents, trip generation in the neighboring zones has the greatest impact on the number of accidents: adding 1 unit to the logarithm of trip generation in the neighboring zones of a zone increases vehicle accidents in that zone by a factor of 79.04. In other words, when trip generation in the neighboring zones becomes 10 times larger, the expected number of vehicle crashes becomes 79.04 times larger. The percentage of literate people in a TAZ has the smallest effect: each one-percentage-point increase in the literate share of a TAZ increases the expected number of accidents by a factor of 1.03.
Regarding crashes in road segments, the logarithm of traffic volume in a road segment has different effects on accidents depending on the transportation mode: adding 1 unit to this variable increases vehicle, motorcycle, and pedestrian accidents by factors of 3.03, 2.29, and 1.54 respectively. The influence of this variable on vehicle accidents is therefore approximately 2 times stronger than that on pedestrian accidents. Regarding the logarithm of the population of a TAZ, adding 1 unit to this variable multiplies the number of vehicle accidents by a factor of 0.22 (a reduction of about 78%), whereas it increases motorcycle accidents by a factor of 1.4.
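The reported factors can be converted into more familiar quantities, as sketched below. The sketch assumes that the logarithmic covariates use base 10, which is inferred from the statement that a tenfold increase in trip generation corresponds to one unit of the transformed variable; the function names are hypothetical and the calculation is purely arithmetic on the factors quoted in the text.

```python
import math


def percent_change(factor):
    """Percent change in expected crashes implied by a multiplicative factor."""
    return (factor - 1.0) * 100.0


def factor_for_ratio(factor_per_log10_unit, ratio):
    """Factor on expected crashes when the raw variable is multiplied by `ratio`.

    Assumption: the covariate enters the model as log10(x), so one unit of
    the transformed variable corresponds to a tenfold change in x.
    """
    elasticity = math.log10(factor_per_log10_unit)
    return ratio ** elasticity


# Factors quoted in the text for road segments.
print(percent_change(0.22))          # vehicle crashes vs. log population
print(factor_for_ratio(3.03, 10.0))  # tenfold traffic volume, vehicle crashes
print(factor_for_ratio(3.03, 2.0))   # doubling traffic volume, vehicle crashes
```

Expressing the effect for a doubling rather than a tenfold change is often easier to relate to realistic variations in traffic volume or population.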
The same interpretation is true for the other variables available in Table 5. Based on the results obtained from analyzing the sensitivity of the significant variables of the final models, it is observable that a variable influences the accidents in various modes of transport differently. Therefore, the necessity of investigating the factors influencing accidents across different modes of transport becomes evident.

Conclusion
Since the factors affecting accidents differ across transport facilities and modes, the accident prediction models of the present study were developed separately for each mode of transport (vehicle, motorcycle, and pedestrian). The independent variables were considered at both micro and macro levels. To account for the intra-zone correlation caused by the macro variables shared by the accidents occurring in one TAZ, a multilevel model was adopted in the modeling process. Since the accidents occurring in one TAZ might be affected by the variables of neighboring zones, a multilevel model was also developed using the variables extracted from the neighboring zones of each TAZ. To this end, data on 15968 accidents occurring in 96 TAZs of Tehran were collected. The traffic, social, and demographic data of the study area were also collected, and a database was created in a GIS application.
Multilevel models were developed with/without considering the effect of neighboring zones, and for comparison purposes, an NB model was also developed and its results were compared with those of the multilevel models.
The final models were developed for the accidents occurring in intersections and road segments for each mode of transport based on the criteria model fit, Log-Likelihood, AICC, and BIC.
According to the results, the multilevel model which considered the influence of neighboring zones performed better than the other two models, namely the multilevel model that did not take into account the effect of neighboring zones and the NB model. According to the final models, some variables, such as higher driver age, higher level of driver education, rainy and snowy weather, and a higher ratio of bus lines to road length in a TAZ, decrease the number of accidents, while others, such as higher vehicle traffic and a higher percentage of employed and literate people in a TAZ, increase it. The variables used in the modeling process proved significant at intersections and in road segments for some modes of transport but insignificant for others, and the sensitivity analysis of the significant variables showed that their effects differed in direction and magnitude across facilities and modes. Investigating the factors affecting accidents separately across modes and transport facilities is therefore necessary.

Acknowledgements
The authors would like to thank the Transportation and Traffic Organization of Tehran Municipality for providing the traffic data.