Ranking and Determining the Factors Affecting the Road Freight Accidents Model

The rapid growth of population, particularly in developing countries, has increased the number of trips, especially those involving loads and freight. Hence, expanding existing facilities or developing new freight and transportation networks is essential. Among the various transportation systems, road freight has secured a significant place in sub-urban transportation, as it is responsible for transporting loads, decreasing transportation costs, and increasing the safety of highway users. Despite these advantages, poor and nonstandard design and performance of sub-urban highways, transport fleets, and equipment increase the number of accidents and reduce the efficiency of these facilities. The primary aim of the present study is therefore to examine the factors affecting road freight accident severity. For this purpose, data on road freight accidents that occurred in 2016, 2017, and 2018 in Gilan Province, Iran, were used to analyze frequencies, rank and determine the influential factors, and build accident severity models. According to the accident severity model for 2016, factors such as the autumn season, daylight, drivers aged 18 to 60, and pickup trucks affected road freight accident severity. In 2017, severity was affected by factors such as rural roads, freight trucks, non-faulty passenger cars, motorcycles, and pedestrians. In 2018, the influential variables were the accident time (usually between 12 p.m. and 6 p.m.), rural and major roads, freight trucks, non-faulty motorcycles, and careless driving without due attention to the road ahead. Moreover, not following safety guidelines during freighting is the most effective variable in road freight accidents.


Introduction
Over the last decade, road accidents have shown an increasing trend, and various studies on accidents have been conducted around the world [1][2][3]. Road accidents are a common phenomenon worldwide, and approximately 1.3 million people die each year as a result. Moreover, approximately 20 to 50 million people are injured in these accidents, the majority of whom are young people aged 15 to 45. Road accidents are reported to be the ninth leading cause of death in the world, accounting for 2.2% of global mortality. The costs of accidents are estimated to be about 500 million dollars worldwide, which is equal to 1 to 2 percent of GDP in low- and middle-income countries. The current trend indicates that if urgent measures are not taken, road injuries are likely to become the seventh leading cause of death by 2030, with 90% of these deaths occurring in low- and middle-income countries. The emotional and mental injuries as well as the permanent disabilities resulting from these accidents must be added to these consequences [4].
Analysis of accidents shows that several parameters normally account for them, which can be categorized into human, environmental, and vehicle-dependent factors [5]. Various studies have revealed that parameters such as annual average daily traffic (AADT), traffic congestion, exceeding the speed limit, the number of lanes, driver distraction, and weather conditions are among the factors that affect accidents [6][7][8]. An increase in the severity of any of these factors and/or any inappropriateness of these conditions can lead to an increased number of accidents [9][10][11].
An increase in the number of motor vehicle owners results in an increase in the number of accidents. This growth has been about 65% over the last two decades, and in developing countries it has occurred even faster [12]. Because of this, road accidents are considered one of the most significant public health issues in any society. Moreover, the problem is more serious than other public health issues because the majority of its victims are young and healthy people [13,14]. Ghaffar et al. (2004) [15] evaluated the effects of road traffic injuries (RTI) in Pakistan. The results indicated that most accidents happened between 12:00 and 18:00, and that the level of RTI was higher among people aged between 16 and 45 years. Furthermore, RTI was almost three times higher in males than in females. Labinjo et al. (2009) [16] conducted a population-based survey to explore the epidemiology of RTI in Nigeria. The results showed that motorcycle accidents accounted for 54.33% of all RTI and that the risk of crashes was higher among males aged between 18 and 44. Hu et al. (2012) [17] explored the characteristics of traffic accidents on rural roads using quantitative analysis. Their research shows that 92.68% and 5.42% of casualties occurred on tangents and curves, respectively. Regarding time, casualties during the daytime were more serious than at night. Crashes causing injuries were also more common during the day than at night, and motor vehicle accidents accounted for the majority of casualties. Zangooei Dovom et al. (2013) [18] explored the distribution of fatal accidents in Mashhad, Iran. According to the results, males had more fatalities than females. For both genders, accidents peaked at the ages of 21-30. The overall male to female casualty ratio was 3.41. Among all road users, the group at highest risk was male motorcyclists.
Lee and Jeong (2016) [19] investigated the characteristics of traffic collisions involving truck drivers on expressways and rural roads. The results showed that, with respect to the day of the week, the accident rate was higher in the middle of the week. On rural roads, the accident rate was much higher during the daytime (81.7%) than at night. The accidents occurred mostly in clear/cloudy weather (76.2%). In addition, the majority of accidents occurred on straight roads (62.2%), followed by intersections (15.4%) and curved roads (9.4%).
Road freight is the oldest way of transporting cargo in the world. In terms of price and speed, it is the most appropriate way to transport a variety of goods and cargo. Nowadays, the majority of cargo is transported by road freight, which, along with its merits, has some disadvantages. One of these is the accidents caused by the failure of freight vehicles on roads, which result in fatalities or injuries among road users in addition to economic losses. Hence, in this study, the effective factors are identified by considering the severity of road freight accidents, particularly accidents involving vehicles such as pickup trucks, trucks, and trailers. Then, through modelling and statistical analysis of the accident data, appropriate approaches to decrease road accidents and increase civil welfare and travellers' safety are identified. For this purpose, after analyzing the frequency of accidents, the factors affecting accident severity are ranked and determined. Moreover, the impact of the independent variables on the severity of freight vehicle accidents is modelled. The purposes and innovations of the present study can be summarized as follows:
• Application of the Friedman test and Factor analysis methods to road freight accidents,
• Investigating the effect of independent variables on the severity of freight vehicle accidents,
• Frequency analysis of variables affecting freight vehicle accidents,
• Ranking the independent variables affecting the severity of freight vehicle accidents,
• Modelling the effect of the independent variables on the dependent variable, i.e. the severity of freight vehicle accidents.

Research Methodology
In this section, the specifications of the study area are introduced first. Then, the statistical analysis methods and the accident severity modelling are carried out according to Figure 1.

Case study
Gilan is one of the northern provinces of Iran, and its capital is the megacity of Rasht. The province lies along the Caspian Sea and shares an international border with Azerbaijan via Astara in the north. It is located to the west of Mazandaran Province, east of Ardabil Province, and north of Zanjan and Qazvin Provinces. Gilan covers an area of 14044 square kilometres and, based on the census carried out in 2012, has a population of 2480874. Gilan is the tenth most populated province in Iran and the second most populated province in northern Iran, ranking second only to Mazandaran Province. Its population density is 177 persons per square kilometre, which places it third in Iran. Rasht, which accounts for 46% of the total population of the province, is the center of the province, the most populated city in the north of the country, and the 11th most populated city in Iran [20,21].
The sub-urban highways of Gilan Province are 2573 kilometres in length, of which 1682 km, i.e. 65% of the total length of highways within the province and 2.2% of the total length of highways in Iran, can be used for freight purposes. Of this 1682 km, 363, 256, and 1063 kilometres function as highways, main roads, and rural roads, respectively. Figure 2 displays the existing roads in Gilan Province that can be used for freight transport. Owing to the abundance of detail, roads with a rural function are excluded from the figure [22].

Statistical Analysis and Modeling Methods
To analyse the road freight accidents of Gilan Province, several statistical analysis and modelling methods were used, including the Kolmogorov-Smirnov test (K-S test), the Friedman test, Factor analysis, and Logit modelling. The statistical analyses were performed using SPSS software.

Kolmogorov-Smirnov Test (K-S test)
One of the main assumptions of most statistical tests is that the data are normally distributed, and the Kolmogorov-Smirnov test (K-S test) is used to check this assumption. It is a nonparametric test of the data distribution, i.e. a goodness-of-fit test for the distribution of quantitative data. Normality is assessed by comparing the approximate significance (P-value) in the test output with the significance level α. With α = 0.05 (i.e. 95% confidence), the data distribution can be assumed to be normal if the P-value is greater than 0.05. It is the most commonly used test for examining the normality of a given distribution [23].
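The decision rule described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: the sample is a hypothetical coded variable, the normal parameters are estimated from the sample, and the P-value comes from the asymptotic Kolmogorov series, which is only approximate when the parameters are estimated. The function names are illustrative.

```python
import math
import statistics

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, mu, sigma):
    """Largest gap D between the empirical CDF and the normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_asymptotic_p(d, n, terms=100):
    """Approximate two-sided P-value from the asymptotic Kolmogorov series."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # the series is ~1 here and converges slowly
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Hypothetical coded variable (assumption, not data from the study),
# e.g. 1 = damage, 2 = injury, 3 = fatality.
sample = [1] * 60 + [2] * 30 + [3] * 10
mu = statistics.mean(sample)
sigma = statistics.stdev(sample)

d = ks_statistic(sample, mu, sigma)
p_value = ks_asymptotic_p(d, len(sample))
alpha = 0.05
is_normal = p_value > alpha  # normality is assumed only if P-value > alpha

print(f"D = {d:.3f}, p = {p_value:.3g}, normal at alpha = {alpha}: {is_normal}")
```

Coded categorical variables with few distinct values, like the hypothetical one here, produce large steps in the empirical distribution, so a significant result of this kind would point towards nonparametric methods.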

Friedman Test
The Friedman test is a statistical test used to compare several groups: it ranks the groups by their mean ranks and tests whether or not they belong to the same population. It is the non-parametric counterpart of the F test and is usually preferred to the F test for ranking scales [24], because the F test requires homogeneity of variances, which is rarely observed in ranking scales. The Friedman test performs a two-way analysis of variance by ranks (for non-parametric data) and is also used to compare the mean ranks of different groups.
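As a rough illustration of ranking by mean ranks, the sketch below computes the Friedman chi-square statistic for a small table. The table values are hypothetical, the function names are illustrative, and the statistic is the textbook form without a correction for ties, which statistical packages may apply.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0  # mean of the 1-based positions pos+1..end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks

def friedman(table):
    """Return (mean ranks per column, chi-square statistic, degrees of freedom).

    Each row is one block (e.g. one accident record) and each column one
    group (e.g. one variable). No tie correction is applied.
    """
    n = len(table)
    k = len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return mean_ranks, chi2, k - 1

# Hypothetical scores for 4 records and 3 variables (assumption for illustration).
table = [
    [1, 2, 3],
    [1, 3, 2],
    [1, 2, 3],
    [2, 1, 3],
]
mean_ranks, chi2, df = friedman(table)
print(mean_ranks, chi2, df)
```

The chi-square value is compared with a chi-square distribution with k - 1 degrees of freedom; a significance below the chosen level rejects the hypothesis that all groups have equal mean ranks.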

Factor Analysis
The factor analysis method is used to find the underlying variables of a phenomenon or to summarize a set of data. The primary input for factor analysis is the matrix of correlations between variables. Factor analysis does not have predetermined dependent variables. It is applied for two general purposes: exploratory and confirmatory. If there is no prior hypothesis about the structure of the relationships between dimensions, exploratory factor analysis is used. Otherwise, confirmatory factor analysis is used [25].
In exploratory factor analysis, the researcher investigates the empirical data to discover and identify the indices as well as the relationships between them. There is no pre-defined model here. In other words, exploratory analysis, in addition to its exploratory or suggestive value, can build structures, produce models, or generate hypotheses. Exploratory factor analysis is used when the researcher does not have sufficient prior evidence to form a hypothesis about the number of underlying factors and wants to determine the number or nature of the factors that account for the covariance of the variables. Therefore, exploratory analysis is regarded more as a method for developing and generating a theory than as a method for testing one.

Multiple Logit Regression
Establishing a relationship between a set of variables x and a dependent variable Y leads to a multivariable problem. In analysing such a problem, various types of mathematical models have been used to account for the complexity of the relationship between these variables. The logit regression method is a mathematical method used to describe the relationship between multiple variables denoted by x and a two-valued dependent variable. The function used in this method is an S-shaped function, which can also be extended to multi-valued problems [26], so that the logit regression method can be used when Y is a multi-valued variable. In the simplest case, the probability P_i could be taken as a linear function of x_i (P_i = x_i β), where β is the vector of regression coefficients. The difficulty with this equation is that the probability P_i on the left side must lie between zero and one, whereas the linear product x_i β on the right side can take any real value. A simple way to solve this problem is to transform the probability so as to remove the range restriction and then model the transformed quantity as a linear function of the parameters. This conversion occurs in two steps. First, the probability P_i is converted to the odds of success according to Equation 1:

$$\mathrm{odds}_i = \frac{P_i}{1 - P_i} \qquad (1)$$

In the second step, the logarithm of the odds is taken to obtain the logit, or log-odds of success, which is modelled as a linear function (Equation 2):

$$\mathrm{logit}(P_i) = \ln\!\left(\frac{P_i}{1 - P_i}\right) = x_i \beta \qquad (2)$$

The inverse transfer function, also called the antilogit, is applied to calculate the probability in terms of the logit (Equation 3):

$$P_i = \frac{e^{x_i \beta}}{1 + e^{x_i \beta}} = \frac{1}{1 + e^{-x_i \beta}} \qquad (3)$$

The fact that the value of this function, f(z) = 1/(1 + e^{-z}) with z = x_i β, varies between zero and one is the first reason for using it in probability problems.
The second reason is the shape of this function: starting from negative infinity and moving to the right, as z increases, the value of f(z) changes little and remains close to zero until the growth threshold is reached. Beyond this threshold, the value of the function increases rapidly and approaches unity, after which further increases in z have little effect on the function value. Therefore, the logit is a transfer function that maps the probabilities in the interval (0, 1) onto all the real numbers. A negative logit represents a probability of less than 50%, and a positive logit represents a probability of more than 50%. Thus, the logit model is a generalized linear model with a logit transfer (link) function. In other words, the logit of the probability P_i, rather than the probability itself, follows the linear model [27].
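The two-step transformation and its inverse can be sketched in a few lines. The coefficient values below are hypothetical and are not taken from the fitted models of this study; they only show how a linear predictor is turned into a probability.

```python
import math

def odds(p):
    """Odds of success for a probability p in (0, 1)."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds of success; maps (0, 1) onto all real numbers."""
    return math.log(odds(p))

def inv_logit(z):
    """Antilogit (logistic function); maps any real z back into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)

# Hypothetical coefficients (assumption): an intercept and one binary predictor,
# e.g. 1 if the accident occurred on a rural road and 0 otherwise.
intercept = -1.0
beta_rural = 0.8

z_rural = intercept + beta_rural * 1      # linear predictor x_i * beta
p_rural = inv_logit(z_rural)              # negative logit -> probability below 50%
p_other = inv_logit(intercept + beta_rural * 0)

print(f"P(rural) = {p_rural:.3f}, P(other) = {p_other:.3f}")
```

In this sketch a positive coefficient raises the predicted probability of the outcome coded 1, while the sign of the whole linear predictor decides whether that probability lies above or below 50%.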

Results and Analysis
In this part, accident data from 2016, 2017, and 2018 were used to identify the variables affecting accidents involving freight vehicles that lead to damage, injury, and fatality. The data were first examined in terms of descriptive statistics and frequency. The K-S test, Factor Analysis, the Friedman test, and Logit modelling were then employed to examine the variables affecting the severity of accidents.

Frequency Analysis of the Accidents Results
In this study, there is one dependent variable, i.e. accident severity, and there are 12 independent variables: time of the accident, day of the accident, season of the accident, road function type, road pavement condition, accident point geometry, lighting status, type of freight vehicle responsible for the accident, age of the faulty driver, type of the non-faulty vehicle, weather condition, and the main cause of the accident. Their frequency analysis is presented in this part. The results of Figures 3 to 14 indicate that more than 50% and 40% of road freight accidents are damage and injury accidents, respectively, while accidents resulting in fatality are only about 10% of the total. Furthermore, the majority of accidents occur between 6 a.m. and 12 p.m. and between 12 p.m. and 6 p.m. on weekdays in autumn. In more than 70% of the cases, the accidents occur in daylight and sunny weather on tangent sections with no curve and with good, dry pavement, i.e. under conditions that do not discourage the driver from accelerating. Among all the faulty freight vehicles in the road accidents, pickup trucks and trucks are responsible for 60% and 30% of the accidents, respectively. Other freight vehicles cause 10% of the road freight accidents.

Results of Kolmogorov Smirnov Test
Before selecting an appropriate test to evaluate the data, it is essential to check whether the data are normally distributed. Thus, the K-S test was used to examine normality. Table 1 summarizes the results of this test. The results indicate that the test is significant for all three years, namely 2016, 2017, and 2018. As a result, these variables do not have a normal distribution and nonparametric tests should be used for inference.

Results of Friedman Test
The Friedman test can be used to test whether the variables have equal mean ranks. In this study, there are 12 independent variables for ranking the accident severity, and the Friedman test is used to determine the rank of each variable. Table 2 presents the data for each variable, the Chi-Square statistic, the degrees of freedom, and the significance (sig), in that order. As sig is less than 5%, H0 is rejected, i.e. the hypothesis that these 12 factors have equal ranks (priorities) is not accepted. Table 3 displays the descriptive statistics, which indicate the mean rank of each variable. The smaller the mean rank, the more important the corresponding variable. According to the rankings in Table 3, the most important variables affecting road freight accidents in all three years under study are the accident point geometry (straight, horizontal curve, and intersections), road pavement status, and lighting status, respectively.

Exploratory Factor Analysis
Any study inevitably involves a large number of variables. To obtain more accurate analyses and results that are both scientific and practical, researchers attempt to reduce the number of variables and establish a new structure for them. Factor Analysis is typically used to achieve this goal. It tries to identify the basic variables or factors that explain the correlation pattern among the observed variables, and thus plays a pivotal role in identifying hidden variables or factors from the observed ones.
When performing Factor Analysis, one should first check whether the available data can be used for the analysis. In other words, it should be determined whether the data (sample size and the relationships between variables) are appropriate for Factor Analysis. In this study, the KMO index and the Bartlett test were used for this purpose, and Table 4 displays their results. The closer the KMO index is to 1, the more appropriate the data are for Factor Analysis. Conversely, if the KMO index is smaller than 0.5, Factor Analysis is not appropriate for the data and should not be used to interpret the results. In addition, the sig value obtained from the Bartlett test is smaller than 5% in all cases, and therefore the hypothesis that the correlation matrix is an identity matrix is rejected. The tables obtained from Factor Analysis consist of two parts. The first part contains the eigenvalues, which determine the factors that are included in the analysis. Factors whose eigenvalues are smaller than 1 are excluded from the analysis. In this study, factors 1, 2, 3, 4, and 5, whose eigenvalues are larger than 1, are included in the analysis.
The second part gives the eigenvalues of the factors extracted through rotation. It should be noted that under rotation of the remaining factors, the proportion of the total variance explained by these 5 factors remains fixed (approximately 60%). Tables 5 to 7 present the eigenvalues for road freight accidents from 2016 to 2018, respectively. Tables 8 to 10 show the rotated component matrices from 2016 to 2018, which contain the factor loadings of each variable on the factors remaining after rotation. The higher the absolute value of these coefficients in each row, the greater the role of the related factor in the total variance of the given variable. Based on the Factor Analysis performed for the 12 variables affecting road freight accidents in 2016, as indicated in Table 8, five factors are identified as major factors.
Based on the Factor Analysis carried out for the 12 variables affecting road freight accidents in 2017, as indicated in Table 9, five factors are identified as major factors. The first factor includes the variables related to road pavement status and weather conditions; the second includes time of the accident and lighting status; the third includes accident point geometry, non-faulty vehicle, and main cause; the fourth is the faulty vehicle; and the fifth includes season of the accident, weekdays, type of highway, and age of the driver, in order of their importance for the severity of road freight accidents in 2017.
Based on the Factor Analysis conducted for the 12 variables affecting road freight accidents in 2018, as presented in Table 10, five factors are identified as major factors. The first factor includes season of the accident, road pavement status, and weather conditions; the second includes time of the accident, lighting status, and main cause; the third includes type of road and non-faulty vehicle; the fourth is the faulty vehicle; and the fifth includes weekdays and age of the driver, in order of their importance for the severity of road freight accidents in 2018.
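The rule of retaining only factors whose eigenvalue exceeds 1 can be illustrated with a short sketch. The eigenvalues below are hypothetical values chosen for illustration and are not those of Tables 5 to 7; for a correlation matrix of 12 variables the eigenvalues sum to 12, which the made-up list respects.

```python
# Hypothetical eigenvalues of a 12 x 12 correlation matrix (assumption,
# not the values reported in the study); they sum to 12.
eigenvalues = [2.10, 1.60, 1.30, 1.15, 1.05, 0.95,
               0.90, 0.80, 0.70, 0.60, 0.50, 0.35]

def retained_factors(eigs, threshold=1.0):
    """1-based indices of the factors whose eigenvalue exceeds the threshold."""
    return [i + 1 for i, ev in enumerate(eigs) if ev > threshold]

def explained_share(eigs, kept):
    """Share of the total variance explained by the kept factors."""
    return sum(eigs[i - 1] for i in kept) / sum(eigs)

kept = retained_factors(eigenvalues)
share = explained_share(eigenvalues, kept)
print(f"retained factors: {kept}, explained variance: {share:.1%}")
```

Rotation redistributes the explained variance among the retained factors, but the total share they explain together stays the same.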

Road Freight Accidents Severity Model
To establish a model for road freight accident severity, 12 independent variables and 1 dependent variable were defined. They were then converted into binary (0 and 1) nominal variables for use in SPSS. The dependent variable, i.e. accident severity, was initially defined as injury, fatality, and damage accidents. As there were only a small number of fatal accidents, these were grouped with injury accidents. Ultimately, the dependent variable had 2 categories and there were 12 independent variables. The variables include the type of the faulty vehicle, type of the road function, the main cause, type of collision, age of the faulty driver, time of the accident, road pavement status, etc. The enter, backward, and forward methods can be used to establish the Logit model. The question is which of these methods yields the most appropriate output, i.e. which one gives the better model for road freight accidents in Gilan Province.
To answer this question, two criteria were considered: the correct percentage and the goodness of fit of the model. The goodness of fit is indicated by the R² parameter, which shows the proportion of the variation in the dependent variable that is explained by the independent variables of the Logit model. The correct percentage shows to what extent the model predictions are correct. These two criteria are used to compare the models and identify the better one: the closer the R² value is to unity, the better the fit of the model, and the higher the correct percentage, the more powerful the model is in predicting accidents.
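Both criteria can be sketched for a binary outcome as follows. The observed outcomes and predicted probabilities are hypothetical, and the text does not state which R² variant was used; the sketch assumes the Cox and Snell and Nagelkerke pseudo-R² measures that are commonly reported for logistic regression, with a classification cutoff of 0.5.

```python
import math

def percent_correct(observed, probs, cutoff=0.5):
    """Percentage of cases whose predicted class matches the observed class."""
    hits = sum(1 for y, p in zip(observed, probs) if (p >= cutoff) == (y == 1))
    return 100.0 * hits / len(observed)

def log_likelihood(observed, probs):
    """Bernoulli log-likelihood of the observed outcomes."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(observed, probs))

def pseudo_r2(observed, probs):
    """Return (Cox and Snell R^2, Nagelkerke R^2) against the intercept-only model."""
    n = len(observed)
    base_rate = sum(observed) / n
    ll_null = log_likelihood(observed, [base_rate] * n)
    ll_model = log_likelihood(observed, probs)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return cox_snell, nagelkerke

# Hypothetical data (assumption): 1 = injury/fatality accident, 0 = damage accident.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.9, 0.1]

accuracy = percent_correct(observed, probs)
cox_snell, nagelkerke = pseudo_r2(observed, probs)
print(f"correct: {accuracy:.1f}%, Cox-Snell: {cox_snell:.3f}, Nagelkerke: {nagelkerke:.3f}")
```

When two candidate models are fitted to the same data, the one with the higher correct percentage and the pseudo-R² closer to unity would be preferred under the criteria described above.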
It should be noted that in the first method (enter), all the variables are entered into the equation simultaneously, so no selection of the most significant variables takes place; hence, it is not an appropriate method for this purpose. For this reason, the forward and backward methods are used to enter the data into the Logit equation, and whichever method has the highest accuracy in predicting the accidents is adopted as the best method. In the backward and forward methods, variables are respectively removed from or added to the equation step by step: a variable is removed when its exit causes the minimum change in the R² value of the equation, and a variable is added when its entrance improves, i.e. increases, the R² value. These methods thus determine how the independent variables enter the analysis. Applying different methods makes it possible to establish different equations from the same data and ultimately select the best one. Table 11 summarizes the Logit models obtained with the forward and backward methods. The best model is determined by its accuracy in making predictions. Accordingly, the backward method, with its higher correct percentage in all cases, is selected as the best method to establish the Logit model for the severity of road freight accidents. The remainder of this section is therefore devoted to introducing this model. The Chi-Square statistic is used to determine the effect of the independent variables on the dependent variable and the fit of the overall model; it is comparable with the F statistic in ordinary regression analysis. Tables 12 to 14 present the backward method model coefficients for the vehicle accidents in 2016, 2017, and 2018, respectively.
According to these tables, the model Chi-Square shows whether or not the independent variables affect the dependent variable. In all the models, the Sig. value of the Chi-Square statistic is zero, i.e. below the 5% level. Therefore, the independent variables affect the dependent variable and the models show a high degree of fit. Based on Tables 15 to 17, in 2016, variables such as autumn, daylight, drivers aged 18 to 60, pickup trucks, and not following safety guidelines when freighting goods were the most effective variables for the severity of road freight accidents. In 2017, the most effective variables were rural roads, trucks and trailers, non-faulty private vehicles, motorcycles, pedestrians, and not following safety guidelines. In 2018, the effective variables were accidents occurring between 12 p.m. and 6 p.m., rural roads, highways, trailers, non-faulty motorcycles, inattention to the road ahead as the main cause, and not following safety guidelines when freighting. The most notable findings on the types of accidents over these years call for special and greater attention from the authorities responsible for road accidents, including the police, the ministry of road and city planning, and rural planners. Rural roads, trailers, motorcycles, and drivers' compliance with safety guidelines while freighting loads all affected road freight accidents over the years 2016, 2017, and 2018.

Conclusion
By examining the data on road freight accidents, the present article analyzed the effects of various variables on the severity of accidents from 2016 to 2018. Based on the frequency analysis of the variables, more than 50% and 40% of road freight accidents were damage and injury accidents, respectively, while accidents leading to fatality constituted just 10% of the total number of accidents. The majority of accidents occur between 6 a.m. and 12 p.m. and between 12 p.m. and 6 p.m. on weekdays in autumn. In more than 70% of the cases, the road pavement is dry, the weather is sunny, the route is straight, and it is daytime. Among the faulty freight vehicles in road accidents, pickup trucks and trucks are involved in 60% and 30% of the accidents, respectively. Other freight vehicles account for 10% of the road freight accidents. Furthermore, according to the obtained ranking, the variables of accident point geometry (tangent, horizontal curve, and intersections), road pavement status, and lighting status have the highest influence on road freight accident severity.
Using the Factor Analysis method, the 5 most effective factors for road accidents in 2016, 2017, and 2018 were identified. Moreover, the results of the Logit model indicate that the severity of accidents increases with the presence of trailers and motorcycles. Special measures should also be taken regarding the freighting of goods in rural areas. Unfortunately, the main cause of road freight accidents is the failure of freight vehicle drivers to follow safety guidelines. Hence, strict measures should be adopted and more effective fining strategies should be applied for such drivers.