Learning From Limited Data: Towards Best Practice Techniques for Antimicrobial Resistance Prediction From Whole Genome Sequencing Data.

Lukas Lüftinger, Peter Májek, Stephan Beisken, Thomas Rattei, Andreas E Posch.

Abstract

Antimicrobial resistance prediction from whole genome sequencing (WGS) data is an emerging application of machine learning, promising to improve antimicrobial resistance surveillance and outbreak monitoring. Despite significant reductions in sequencing cost, the availability and sampling diversity of WGS data with matched antimicrobial susceptibility testing (AST) profiles, which are required for training WGS-AST prediction models, remain limited. Best practice machine learning techniques are required to ensure that trained models generalize to independent data. Limited data restricts the choice of machine learning training and evaluation methods and can result in overestimation of model performance. We demonstrate that the widely used random k-fold cross-validation method is ill-suited for application to small bacterial genomics datasets and offer an alternative cross-validation method based on genomic distance. We benchmarked three machine learning architectures previously applied to the WGS-AST problem on a set of 8,704 genome assemblies from five clinically relevant pathogens across 77 species-compound combinations collated from public databases. We show that individual models can be effectively ensembled to improve model performance. By combining models via stacked generalization with cross-validation, a model ensembling technique suitable for small datasets, we improved average sensitivity and specificity of individual models by 1.77% and 3.20%, respectively. Furthermore, stacked models exhibited improved robustness and were thus less prone to outlier performance drops than individual component models. In this study, we highlight best practice techniques for antimicrobial resistance prediction from WGS data and introduce the combination of genome distance aware cross-validation and stacked generalization for robust and accurate WGS-AST.
Copyright © 2021 Lüftinger, Májek, Beisken, Rattei and Posch.
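
The idea of cross-validation based on genomic distance can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe in detail. It assumes a precomputed pairwise genome distance matrix, groups genomes closer than a chosen cutoff by single-linkage clustering, and assigns whole clusters to folds so that near-identical isolates never appear on both sides of a train/test split. The function names, the toy distance matrix, and the 0.05 threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def cluster_by_distance(dist: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Single-linkage clustering with union-find.

    Genomes whose pairwise distance is below `threshold` (an assumed,
    illustrative cutoff) end up with the same cluster label.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def distance_aware_folds(
    dist: Sequence[Sequence[float]], threshold: float, k: int
) -> List[List[int]]:
    """Assign whole clusters to k folds, largest cluster first,
    always adding to the fold that currently holds the fewest genomes."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(cluster_by_distance(dist, threshold)):
        clusters[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds


# Toy example (fabricated distances, for illustration only):
# genomes 0-2 are near-identical, 3-4 are close, 5 is distinct.
dist = [
    [0.00, 0.01, 0.02, 0.40, 0.45, 0.50],
    [0.01, 0.00, 0.01, 0.42, 0.44, 0.52],
    [0.02, 0.01, 0.00, 0.41, 0.46, 0.51],
    [0.40, 0.42, 0.41, 0.00, 0.03, 0.48],
    [0.45, 0.44, 0.46, 0.03, 0.00, 0.47],
    [0.50, 0.52, 0.51, 0.48, 0.47, 0.00],
]
folds = distance_aware_folds(dist, threshold=0.05, k=2)
print(folds)  # [[0, 1, 2], [3, 4, 5]]
```

With random k-fold splitting, genomes 0, 1 and 2 could be spread across training and test folds, which would let a model score well simply by recognizing near-duplicates. Keeping each cluster in a single fold makes the held-out data more independent of the training data, at the cost of less evenly sized folds.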

Keywords:  antibiotics; antimicrobial resistance; genomics; machine learning; whole genome sequencing (WGS)

Year:  2021        PMID: 33659219      PMCID: PMC7917081          DOI: 10.3389/fcimb.2021.610348

Source DB:  PubMed          Journal:  Front Cell Infect Microbiol        ISSN: 2235-2988            Impact factor:   5.293

