Optimized Feature Selection for Predicting the Number of Casualties in Traffic Crashes

Traffic Crash Analysis; Feature Selection; Machine Learning; Traffic Safety; Predictive Analytics.

Authors

  • Muamer Abuzwidah
    mabuzwidah@sharjah.ac.ae
    Department of Civil and Environmental Engineering, College of Engineering, University of Sharjah, Sharjah, United Arab Emirates https://orcid.org/0000-0002-5605-3609
  • Ahmed Elawady Department of Civil and Environmental Engineering, College of Engineering, University of Sharjah, Sharjah, United Arab Emirates
  • Jaeyoung Jay Lee 2) School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan 410075, China. 3) Department of Civil, Environmental & Construction Engineering, University of Central Florida, United States. 4) Queensland University of Technology, School of Civil and Environmental Engineering, Australia.
  • Ghazi G. Al-Khateeb Department of Civil and Environmental Engineering, College of Engineering, University of Sharjah, Sharjah, United Arab Emirates
  • Salah Haridy 5) Department of Industrial Engineering and Engineering Management, University of Sharjah, Sharjah, United Arab Emirates. 6) Benha Faculty of Engineering, Benha University, Benha, Egypt.
  • Waleed Zeiada 1) Department of Civil and Environmental Engineering, College of Engineering, University of Sharjah, Sharjah, United Arab Emirates. 7) Department of Public Works Engineering, Mansoura University, Mansoura 35516, Egypt.

Traffic crash prediction remains a critical challenge in transportation safety management, with growing emphasis on machine learning techniques for accurate casualty prediction. This study develops an optimized feature selection framework for traffic crash casualty prediction by comparing six selection techniques: Design of Experiments (DOE), Forward Sequential Feature Selection, Backward Sequential Feature Selection, Information Gain, Lasso Regularization, and Random Forest (RF) Feature Importance, and then integrating their results with the Borda count method. Using 517,000 UK traffic crash records (2019-2023), 25 machine learning models (linear models, decision trees, ensemble methods, and neural networks) were evaluated across 12 critical attributes. eXtreme Gradient Boosting (XGBoost) performed best, achieving a Root Mean Square Error (RMSE) of 0.671 and a Mean Absolute Error (MAE) of 0.372 with the proposed Borda count integration method while keeping computation time low (11.3 minutes compared with 17 minutes for the baseline). Five factors consistently emerged as the most influential predictors across all selection methods: number of vehicles involved, speed limit, police officer attendance, day of the week, and urban/rural classification. Environmental factors showed lower importance than traditionally assumed. The novel integration of multiple feature selection techniques through Borda count provides a more robust feature subset than any individual method and balances computational efficiency against prediction accuracy. The framework enables transportation safety authorities to implement more efficient crash prediction systems and provides actionable insights into key risk factors for targeted interventions, in particular to support development of the Highway Safety Manual.
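To make the Borda count integration step concrete, the following is a minimal Python sketch of how several best-first feature rankings could be merged into one consensus ranking. It is an illustration under stated assumptions, not the authors' implementation: the function name `borda_count`, the feature labels, and the three example rankings are hypothetical, and the abstract does not say which Borda scoring variant or tie-breaking rule the study used. Here a feature in position p of a ranking earns n - 1 - p points, where n is the total number of distinct features, and ties are broken alphabetically.

```python
def borda_count(rankings):
    """Merge best-first feature rankings into a consensus ranking.

    Assumed scoring (not confirmed by the source): with n distinct
    features overall, position p in a ranking earns n - 1 - p points.
    A feature absent from a ranking earns 0 points from that ranking.
    """
    features = sorted({f for ranking in rankings for f in ranking})
    n = len(features)
    scores = {f: 0 for f in features}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    # Highest total first; alphabetical order breaks ties deterministically.
    consensus = sorted(features, key=lambda f: (-scores[f], f))
    return consensus, scores


# Hypothetical rankings for illustration only; these are NOT the study's results.
example_rankings = {
    "lasso": ["num_vehicles", "speed_limit", "police_attended",
              "day_of_week", "urban_rural", "weather"],
    "info_gain": ["speed_limit", "num_vehicles", "urban_rural",
                  "police_attended", "weather", "day_of_week"],
    "rf_importance": ["num_vehicles", "police_attended", "speed_limit",
                      "urban_rural", "day_of_week", "weather"],
}

consensus, scores = borda_count(list(example_rankings.values()))
for feature in consensus:
    print(f"{feature}: {scores[feature]}")
```

In this toy input, a feature that ranks near the top in every list accumulates the most points, so the consensus favors features on which the individual selection methods agree rather than features that a single method ranks highly.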

DOI: 10.28991/CEJ-2025-011-04-01
