BACKGROUND: Recent years have witnessed the development of several k-mer-based approaches aiming to predict phenotypic traits of bacteria on the basis of their whole-genome sequences. While often convincing in terms of predictive performance, the underlying models are in general not straightforward to interpret, the interplay between the actual genetic determinant and its translation as k-mers being generally hard to decipher.

RESULTS: We propose a simple and computationally efficient strategy allowing one to cope with the high correlation inherent to k-mer-based representations in supervised machine learning models, leading to concise and easily interpretable signatures. We demonstrate the benefit of this approach on the task of predicting the antibiotic resistance profile of a Klebsiella pneumoniae strain from its genome, where our method leads to signatures defined as weighted linear combinations of genetic elements that can easily be identified as genuine antibiotic resistance determinants, with state-of-the-art predictive performance.

CONCLUSIONS: By enhancing the interpretability of genomic k-mer-based antibiotic resistance prediction models, our approach improves their clinical utility and hence will facilitate their adoption in routine diagnostics by clinicians and microbiologists. While antibiotic resistance was the motivating application, the method is generic and can be transposed to any other bacterial trait. An R package implementing our method is available at https://gitlab.com/biomerieux-data-science/clustlasso.
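
To make the correlation issue mentioned above concrete, the following minimal Python sketch shows how a single nucleotide substitution yields several distinct k-mers that always occur together across strains. The sequences, strain names, and helper functions are hypothetical toy values chosen for illustration. Grouping k-mers by identical presence/absence pattern is only one simple way to expose this redundancy; it is not taken from the clustlasso package and does not reproduce its method.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers (length-k substrings) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_by_pattern(genomes: dict[str, str], k: int) -> dict[tuple, list[str]]:
    """Group k-mers that share the same presence/absence pattern across genomes."""
    per_genome = {name: kmers(seq, k) for name, seq in genomes.items()}
    all_kmers = set().union(*per_genome.values())
    groups = defaultdict(list)
    for kmer in sorted(all_kmers):
        # One 0/1 entry per genome, in insertion order of the dict.
        pattern = tuple(int(kmer in per_genome[name]) for name in genomes)
        groups[pattern].append(kmer)
    return dict(groups)


# Hypothetical toy data: the mutant differs from the wild type
# by one substitution (T -> G at 0-based position 11).
wild_type = "ACGTTGCAAGCTTAGGCTAACGT"
mutant = wild_type[:11] + "G" + wild_type[12:]
k = 5

genomes = {"strain_1": wild_type, "strain_2": mutant, "strain_3": mutant}
groups = group_by_pattern(genomes, k)

# k-mers found only in strains carrying the substitution.
variant_specific = kmers(mutant, k) - kmers(wild_type, k)

for pattern, members in groups.items():
    print(pattern, len(members), members)
```

In this toy setting, every k-mer overlapping the substituted position shares the same presence/absence pattern, so a model working on individual k-mers sees several perfectly correlated features for what is a single genetic event.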