A Comparative Study of a Series of Supervised Learning Models for Motorcycle Crash Injury Severity Prediction
Motorcycle crashes pose a major public health challenge in Thailand, where motorcyclists account for most traffic fatalities. This study evaluates and compares the predictive performance of four supervised learning models, namely Decision Tree (DT), K-Nearest Neighbor (KNN), Naïve Bayes (NB), and Random Forest (RF), for predicting motorcycle crash injury severity, using data from the Highway Accident Information Management System (2020–2022). After preprocessing, 36 explanatory variables covering roadway and environmental factors, accident causes, crash characteristics, and vehicle involvement were analyzed. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) and cost-sensitive learning were applied, and the models were validated using train–test splits with cross-validation. The Random Forest model achieved the best performance, with an AUC of 0.726, a balanced accuracy of 0.649, and a Matthews Correlation Coefficient (MCC) of 0.308, outperforming the other algorithms. SHapley Additive exPlanations (SHAP) were used to interpret the RF model, identifying nighttime crashes, large truck involvement, and roadway features (e.g., depressed medians and two-lane roads) as key predictors of severe outcomes. These findings suggest countermeasures such as improving nighttime safety, providing dedicated truck lanes, and designing safer medians. The novelty of this study lies in integrating model comparison, imbalance-aware metrics, and SHAP interpretability to provide actionable, context-specific policy recommendations for motorcycle safety in Thailand.
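To illustrate why imbalance-aware metrics such as balanced accuracy and MCC are reported alongside AUC, the following is a minimal sketch in plain Python. The confusion-matrix counts are hypothetical values chosen only for illustration and are not taken from the study's data; the function names are likewise illustrative and do not reflect the authors' implementation.

```python
import math


def accuracy(tp, fn, tn, fp):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (severe class) and specificity (non-severe class)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def mcc(tp, fn, tn, fp):
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention),
    e.g. when the classifier predicts a single class for every case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical imbalanced test set: 100 severe and 900 non-severe crashes.
# Classifier A labels every crash as non-severe.
always_non_severe = dict(tp=0, fn=100, tn=900, fp=0)
# Classifier B finds 60 of the 100 severe crashes at the cost of 180 false alarms.
imbalance_aware = dict(tp=60, fn=40, tn=720, fp=180)

for name, counts in [("A", always_non_severe), ("B", imbalance_aware)]:
    print(
        name,
        round(accuracy(**counts), 3),
        round(balanced_accuracy(**counts), 3),
        round(mcc(**counts), 3),
    )
```

In this toy setting, classifier A has the higher plain accuracy even though it never detects a severe crash, whereas balanced accuracy and MCC both rank classifier B higher. This is the behavior that makes these metrics suitable for imbalanced severity data.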
- Authors retain all copyrights. Authors are not required to sign any copyright transfer agreement.
- This work (including HTML and PDF files) is licensed under a Creative Commons Attribution 4.0 International License.