Edi Prifti (1,2), Yann Chevaleyre (3), Blaise Hanczar (4), Eugeni Belda (2), Antoine Danchin (5), Karine Clément (6,7), Jean-Daniel Zucker (1,2,6).
1. IRD, Sorbonne University, UMMISCO, 32 Avenue Henri Varagnat, F-93143 Bondy, France.
2. Institute of Cardiometabolism and Nutrition, ICAN, Integromics, 91 Boulevard de l'Hopital, F-75013, Paris, France.
3. Paris-Dauphine University, PSL Research University, CNRS, UMR 7243, LAMSADE, place du Mal. de Lattre de Tassigny, F-75016, Paris, France.
4. IBISC, University Paris-Saclay, University Evry, Evry, 23 Boulevard de France, F-91034, France.
5. Institut Cochin INSERM U1016-CNRS UMR8104-Université Paris Descartes, 24 Rue du Faubourg Saint-Jacques, F-75014, Paris, France.
6. Sorbonne University, INSERM, Nutrition and Obesities; Systemic Approach Research Unit (NutriOmics), 91 Boulevard de l'Hopital, F-75013, Paris, France.
7. Assistance Publique-Hôpitaux de Paris, Nutrition Department, CRNH Ile de France, Pitié-Salpêtrière Hospital, 91 Boulevard de l'Hopital, F-75013, Paris, France.
Abstract
BACKGROUND: Microbiome biomarker discovery for patient diagnosis, prognosis, and risk evaluation is attracting broad interest. Selected groups of microbial features provide signatures that characterize host disease states such as cancer or cardio-metabolic diseases. Yet current predictive models stemming from machine learning still behave as black boxes and seldom generalize well. They are hard for physicians and biologists to interpret, which makes them difficult to trust and to use routinely in physician-patient decision-making. Novel methods that provide interpretability and biological insight are needed. Here, we introduce "predomics", an original machine learning approach inspired by microbial ecosystem interactions and tailored to metagenomics data. It discovers accurate predictive signatures and provides unprecedented interpretability. The decision made by the predictive model is based on a simple yet powerful score computed by adding, subtracting, or dividing cumulative abundances of microbiome measurements.
RESULTS: Testing on more than 100 datasets, we demonstrate that predomics models are simple and highly interpretable. Despite this simplicity, they are at least as accurate as state-of-the-art methods. The family of best models discovered during the learning process makes it possible to distil biological information and to decipher the predictive signatures of the studied condition. In a proof-of-concept experiment, we successfully predicted body corpulence and metabolic improvement after bariatric surgery using pre-surgery microbiome data.
CONCLUSIONS: Predomics is a new algorithm that helps provide reliable and trustworthy diagnostic decisions in the microbiome field. It is in accord with societal and legal requirements that call for an explainable artificial intelligence approach in the medical field.
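The score described in the abstract (adding, subtracting, or dividing cumulative abundances) can be illustrated with a minimal Python sketch. This is not the predomics implementation: the function names, the feature names, the `epsilon` guard against division by zero, and the thresholding rule are all assumptions made for illustration only.

```python
from typing import Mapping, Sequence


def cumulative_abundance(sample: Mapping[str, float], features: Sequence[str]) -> float:
    """Sum the abundances of the listed features (missing features count as 0)."""
    return sum(sample.get(name, 0.0) for name in features)


def difference_score(sample: Mapping[str, float],
                     positive: Sequence[str],
                     negative: Sequence[str]) -> float:
    """Hypothetical additive/subtractive score: sum(positive) - sum(negative)."""
    return cumulative_abundance(sample, positive) - cumulative_abundance(sample, negative)


def ratio_score(sample: Mapping[str, float],
                numerator: Sequence[str],
                denominator: Sequence[str],
                epsilon: float = 1e-9) -> float:
    """Hypothetical ratio score: sum(numerator) / sum(denominator).

    epsilon is an assumption of this sketch, added to avoid division by zero.
    """
    return cumulative_abundance(sample, numerator) / (
        cumulative_abundance(sample, denominator) + epsilon
    )


def classify(score: float, threshold: float) -> int:
    """Assumed decision rule: predict class 1 when the score exceeds the threshold."""
    return 1 if score > threshold else 0


# Made-up relative abundances for one sample (illustrative values only).
sample = {"species_a": 0.5, "species_b": 0.25, "species_c": 0.125}

diff = difference_score(sample, ["species_a", "species_b"], ["species_c"])
ratio = ratio_score(sample, ["species_a", "species_b"], ["species_c"])
label = classify(diff, threshold=0.0)
```

The appeal of such a score is that a reader can see directly which features push the decision in each direction, which is the kind of interpretability the abstract emphasizes.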