Predicting Mortality Rates of Foodborne Bacteria Using Machine Learning: A Comparative Study of Regression Models

Authors

  • DreShawn Bradford, Plant Genomics and Bioinformatics Lab, Department of Biological and Forensic Sciences, Fayetteville State University, Fayetteville, NC 28301, USA
  • My Abdelmajid Kassem, Plant Genomics and Bioinformatics Lab, Department of Biological and Forensic Sciences, Fayetteville State University, Fayetteville, NC 28301, USA. ORCID: https://orcid.org/0000-0003-3478-0327

DOI:

https://doi.org/10.5147/jaimlb.vi.258

Keywords:

Foodborne bacteria, Machine learning (ML), Mortality prediction, Antimicrobial resistance, Gradient Boosting Regressor (GBR), Random Forest (RF), SHAP analysis, Public health

Abstract

Foodborne bacterial infections remain a major public health concern and contribute to significant morbidity and mortality worldwide. Understanding the genomic and epidemiological factors that influence the mortality rates associated with these bacteria is crucial for developing effective risk assessment strategies. In this study, we applied machine learning (ML) models to predict the mortality rates of 50 foodborne bacterial species from genomic, virulence, antimicrobial resistance (AMR), and epidemiological features. Five regression models were evaluated: Linear Regression (LR), Random Forest (RF), Gradient Boosting Regressor (GBR), Support Vector Regressor (SVR), and K-Nearest Neighbors (KNN). Our results indicate that the ensemble models (RF and GBR) outperform traditional linear regression in capturing the complex relationships between bacterial features and mortality rates. Feature importance analysis revealed that annual reported cases worldwide, genome size, GC content, and virulence gene count are the strongest predictors of mortality. AMR gene count had a lower-than-expected impact, suggesting that antibiotic resistance alone does not strongly determine mortality outcomes. SHapley Additive exPlanations (SHAP) analysis confirmed the significance of genomic and epidemiological factors in shaping model predictions. However, all models exhibited low R² scores and high Mean Absolute Error (MAE), indicating limited predictive accuracy. Residual analysis suggests that outliers and data variability may be limiting model performance. Future research should explore larger datasets, feature engineering, and advanced deep learning approaches to improve predictive accuracy. Despite these limitations, this study demonstrates the potential of ML for quantifying bacterial pathogenicity and informing food safety and public health decision-making.
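
The two evaluation metrics named in the abstract, R² and MAE, can be illustrated with a minimal sketch. This is not the authors' code or data: the function names, the toy feature values (a made-up genome size and virulence gene count per species), and the choice of a leave-one-out comparison between a KNN regressor and a predict-the-mean baseline are all assumptions made purely for illustration. It uses only the Python standard library.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot. Equals 0 for a predict-the-mean model on the
    same data and is negative when the model does worse than that."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_X, train_y, x, k=2):
    """Predict the mean target of the k nearest training rows
    (squared Euclidean distance on unscaled features)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return mean(train_y[i] for i in order[:k])


def mean_predict(train_X, train_y, x):
    """Baseline: ignore the features and predict the training mean."""
    return mean(train_y)


def leave_one_out(X, y, predict):
    """Predict each row from a model fitted on all the other rows."""
    preds = []
    for i in range(len(X)):
        preds.append(predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]))
    return preds


# Invented toy data: (genome size in Mb, virulence gene count) -> mortality (%).
X = [(2.0, 10), (2.5, 12), (4.6, 30), (5.0, 35), (3.0, 15), (4.8, 33)]
y = [1.0, 1.5, 4.0, 5.0, 2.0, 4.5]

knn_preds = leave_one_out(X, y, knn_predict)
baseline_preds = leave_one_out(X, y, mean_predict)

for name, preds in [("KNN (k=2)", knn_preds), ("mean baseline", baseline_preds)]:
    print(f"{name}: MAE={mean_absolute_error(y, preds):.2f}, "
          f"R2={r2_score(y, preds):.2f}")
```

Read together, the two numbers say different things: MAE is in the units of the target, while R² compares the model against simply predicting the mean, so a low or negative R² means the features add little beyond that baseline on held-out data.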

Published

06/06/2025 — Updated on 07/04/2025

Section

Articles

How to Cite

Predicting Mortality Rates of Foodborne Bacteria Using Machine Learning: A Comparative Study of Regression Models. (2025). Journal of Artificial Intelligence, Machine Learning, and Bioinformatics, 40-46. https://doi.org/10.5147/jaimlb.vi.258 (Original work published 2025)