Muhammad Yasir1,2, Asad Mustafa Karim3, Sumera Kausar Malik3, Amal A Bajaffer1, Esam I Azhar1,2.
Abstract
The minimum inhibitory concentration (MIC) is the lowest concentration of an antimicrobial agent that inhibits the visible growth of a microorganism after overnight incubation. Drug prescriptions are based on MIC data to ensure successful treatment outcomes, so reliable antimicrobial susceptibility data are crucial for helping clinicians decide which drug to prescribe. Although a few prediction studies have been conducted, no machine learning (ML) modelling has been carried out to predict MICs in N. gonorrhoeae. In this study, we propose an ML-based approach that predicts the MIC of a specific antibiotic from unitig sequence data. We retrieved N. gonorrhoeae genomes from the European Nucleotide Archive and NCBI, analysed them together with their respective MIC data for cefixime, ciprofloxacin, and azithromycin, and constructed unitigs using de Bruijn graphs. We built and compared 35 different ML regression models to predict MICs. Our results demonstrate that the RandomForest and CATBoost models performed best in predicting MICs of the three antibiotics. The coefficient of determination, R2 (a statistical measure of how well the regression predictions approximate the real data points), for cefixime, ciprofloxacin, and azithromycin was 0.75787, 0.77241, and 0.79009, respectively, using RandomForest. For the CATBoost model, the R2 values were 0.74570, 0.77393, and 0.79317 for cefixime, ciprofloxacin, and azithromycin, respectively. Lastly, using feature importance, we explored the genomic regions that the models identified as important for predicting MICs. For each antibiotic, the ML models selected the major mutations responsible for resistance as top features. The CATBoost, DecisionTree, GradientBoosting, and RandomForest regression models chose the same resistance-associated unitigs. 
This unitig-based strategy for developing models for MIC prediction, clinical diagnostics, and surveillance can be applied to other critical bacterial pathogens.
Keywords: Antimicrobial resistance; Machine learning; Minimum inhibitory concentration; Neisseria gonorrhoeae
Year: 2022 PMID: 35844400 PMCID: PMC9280306 DOI: 10.1016/j.sjbs.2022.02.047
Source DB: PubMed Journal: Saudi J Biol Sci ISSN: 2213-7106 Impact factor: 4.052
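The abstract describes predicting MICs from unitig sequence data but does not state how unitigs are turned into model inputs. One common encoding, assumed here purely for illustration, is a presence/absence matrix with one row per isolate and one column per unitig. The function names, isolate names, and sequences below are hypothetical toy values, not data from the study.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def presence_matrix(genomes: dict[str, str], unitigs: list[str]) -> dict[str, list[int]]:
    """Map each isolate to a 0/1 vector: 1 if the unitig occurs on either strand."""
    matrix = {}
    for isolate, sequence in genomes.items():
        row = []
        for unitig in unitigs:
            found = unitig in sequence or reverse_complement(unitig) in sequence
            row.append(int(found))
        matrix[isolate] = row
    return matrix


# Hypothetical toy data (assumption): two short "genomes" and three unitigs.
genomes = {
    "isolate_A": "ATGCGTACGTTAGC",
    "isolate_B": "ATGCGTTCGTTAGC",
}
unitigs = ["GCGTACG", "GCGTTCG", "TTAGC"]

features = presence_matrix(genomes, unitigs)
for isolate, row in features.items():
    print(isolate, row)
```

Each row could then be paired with the isolate's measured MIC and passed to a regression model. Real pipelines typically use a dedicated unitig-calling tool rather than plain substring search, which is used here only to keep the sketch self-contained.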
Metrics comparison of 35 different ML regression models. Models were compared on the basis of their performance to predict MIC values of three antibiotics on test datasets.
| No. | Names of models used | Ciprofloxacin RMSE | Ciprofloxacin R2 | Cefixime RMSE | Cefixime R2 | Azithromycin RMSE | Azithromycin R2 |
|---|---|---|---|---|---|---|---|
| 1 | ADABoostRegressor | 13.61701 | 0.12067 | 0.04143 | 0.66328 | 6.28431 | 0.14474 |
| 2 | ADARegressor | 9.58793 | 0.62305 | 0.04086 | 0.76301 | 1.61560 | 0.78313 |
| 3 | BaggingRegressor | 6.10417 | 0.69089 | 0.04038 | 0.67140 | 1.37854 | 0.7369 |
| 4 | BayesianRidge | 6.06775 | 0.69462 | 0.03931 | 0.68590 | 1.54538 | 0.70676 |
| 5 | CATBoostRegressor | 3.87531 | 0.77393 | 0.03807 | 0.74570 | 1.40781 | 0.79317 |
| 6 | DecisionTreeRegressor | 7.82284 | 0.54359 | 0.04396 | 0.61863 | 1.91145 | 0.66770 |
| 7 | DummyRegressor | 9.12458 | 0.31455 | 0.07010 | 1.2696e-32 | 2.84645 | −0.00093 |
| 8 | ElasticNet | 7.39072 | 0.58821 | 0.07011 | 1.8281e-19 | 2.72904 | 0.65383 |
| 9 | ElasticNetCV | 0.03914 | 0.68890 | 0.68433 | 0.68433 | 1.51770 | 0.71596 |
| 10 | ExtraTreeRegressor | 8.01472 | 0.53616 | 0.04531 | 0.59814 | 1.81783 | 0.68063 |
| 11 | ExtraTreesRegressor | 0.04073 | 0.66861 | 0.04373 | 0.62071 | 1.75033 | 0.70090 |
| 12 | GaussianProcessRegressor | 0.04942 | 0.68436 | 0.04259 | 0.65792 | 1.99184 | 0.54625 |
| 13 | GradientBoostingRegressor | 6.02614 | 0.70108 | 0.03914 | 0.68890 | 1.44105 | 0.74478 |
| 14 | HistGradientBoostingRegressor | 5.71240 | 0.73384 | 0.03816 | 0.70486 | 1.35686 | 0.77633 |
| 15 | HuberRegressor | 6.68917 | 0.65186 | 0.04073 | 0.66861 | 1.59841 | 0.68702 |
| 16 | KNeighborsRegressor | 0.08358 | 0.63807 | 3.49214 | 0.63424 | 1.21033 | 0.71268 |
| 17 | KernelRidge | 9.60091 | 0.46778 | 0.03990 | 0.67779 | 1.62867 | 0.68134 |
| 18 | LarsCV | 9.51242 | 0.45764 | 0.04045 | 0.66715 | 1.55067 | 0.70463 |
| 19 | Lasso | 7.58347 | 0.57166 | 0.07010 | 1.883e-15 | 2.84645 | 0.73652 |
| 20 | LassoCV | 8.64752 | 0.54124 | 0.03942 | 0.68436 | 1.50917 | 0.71937 |
| 21 | LassoLarsCV | 0.03989 | 0.67792 | 0.04104 | 0.65787 | 1.52546 | 0.71292 |
| 22 | LassoLarsIC | 6.72328 | 0.63309 | 0.04039 | 0.66812 | 1.53725 | 0.70890 |
| 23 | LinearRegression | 8.41567 | 0.54621 | 0.34524 | 0.01717 | 1.76178 | 0.64022 |
| 24 | MLPRegressor | 6.61081 | 0.68461 | 0.05356 | 0.53807 | 1.57986 | 0.73443 |
| 25 | NuSVR | 7.65251 | 0.58619 | 0.03863 | 0.69966 | 2.20857 | 0.66841 |
| 26 | OrthogonalMatchingPursuit | 11.75760 | 0.31796 | 0.03991 | 0.67679 | 1.55638 | 0.70262 |
| 27 | OrthogonalMatchingPursuitCV | 6.73327 | 0.62534 | 0.04989 | 0.66592 | 1.54569 | 0.70547 |
| 28 | PoissonRegressor | 6.10017 | 0.69149 | 0.06490 | 0.58745 | 2.30000 | 0.46240 |
| 29 | RandomForestRegressor | 2.69214 | 0.77241 | 0.04104 | 0.75787 | 1.33418 | 0.79009 |
| 30 | Ridge | 9.60075 | 0.46806 | 0.03989 | 0.67792 | 1.62883 | 0.68132 |
| 31 | RidgeCV | 6.79794 | 0.64527 | 0.03931 | 0.68599 | 1.53935 | 0.70838 |
| 32 | SGDRegressor | 4.29214 | 0.68241 | 0.04483 | 0.67187 | 1.82504 | 0.70766 |
| 33 | SVR | 7.71597 | 0.58373 | 0.07486 | 0.57187 | 2.21276 | 0.66886 |
| 34 | XGBoostRegressor | 6.01090 | 0.70446 | 0.03859 | | 1.44787 | 0.76031 |
| 35 | XGBoostRFRegressor | 5.81138 | 0.72186 | 0.44421 | 0.68538 | 1.47658 | 0.73711 |
RMSE: root mean square error, R2: Coefficient of determination, NuSVR: Nu Support Vector Regression, SVR: Support Vector Regression.
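The two metrics reported in the table can be written out directly from their standard definitions. The sketch below is a minimal illustration using hypothetical MIC values, not the study's data or code. It also shows the usual reference point for R2: a predictor that always returns the mean of the observed values scores 0, and a perfect predictor scores 1.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical MIC values (assumption), chosen only to exercise the functions.
y_true = [0.25, 0.5, 1.0, 2.0, 4.0]
perfect = list(y_true)
mean_baseline = [sum(y_true) / len(y_true)] * len(y_true)

print("perfect:  RMSE =", rmse(y_true, perfect), " R2 =", r2_score(y_true, perfect))
print("baseline: RMSE =", rmse(y_true, mean_baseline), " R2 =", r2_score(y_true, mean_baseline))
```

RMSE is expressed in the units of the MIC values, so it is not directly comparable across antibiotics with different MIC ranges, whereas R2 is unitless.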
Fig. 1. Graph shows machine learning-based predicted and true MIC values of azithromycin in the dataset for the CATBoost model. Blue dots show the true MIC values of azithromycin for N. gonorrhoeae, while orange dots show the MIC values predicted by the CATBoost machine learning-based regression model.
Fig. 2. Most important features (unitigs) selected by different machine learning-based models for MIC predictions of azithromycin. The figure shows the feature importance results selected by the ADABoostRegressor model.
Fig. 3. Performance of ML models during each of the 10 folds of cross-validation. The results are plotted as box and whisker plots for the best performing models for each of the three antibiotics. For azithromycin (A), the BR model shows the smallest variation and the GB model the largest. For ciprofloxacin and cefixime, the smallest variation was shown by GB (B) and HGB (C), respectively. Orange lines within the boxes show median MSE values. LGBM: Light Gradient Boosted Machine, RF: Random Forest, CATB: CATBoost, HGB: Hist Gradient Boosting, GB: Gradient Boosting, BR: Bagging Regressor, XGB: XGBoost.
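The per-fold MSE values summarised in such box plots come from k-fold cross-validation. The sketch below shows the general procedure with k = 10. It is an assumption-laden illustration: the data are hypothetical, and a training-mean predictor stands in for the actual regression models, whose configuration is not given here.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_predictor(train_y, n_test):
    """Placeholder model (assumption): predict the training mean for every test sample."""
    mean = sum(train_y) / len(train_y)
    return [mean] * n_test


def cross_validated_mse(y, k=10, seed=0):
    """Return one test-fold MSE per fold."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        test_y = [y[i] for i in test_idx]
        preds = mean_predictor(train_y, len(test_y))
        mse = sum((t - p) ** 2 for t, p in zip(test_y, preds)) / len(test_y)
        scores.append(mse)
    return scores


# Hypothetical target values (assumption), not MIC data from the study.
y = [0.1 * i for i in range(1, 51)]
fold_mse = cross_validated_mse(y, k=10)

print("per-fold MSE:", [round(m, 3) for m in fold_mse])
print("median MSE:", round(statistics.median(fold_mse), 3))
```

The spread of `fold_mse` is what a box plot visualises, and its median corresponds to the line drawn inside each box.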