Congmin Xu1,2, Man Zhou1,2, Zhongjie Xie1, Mo Li1, Xi Zhu3, Huaiqiu Zhu4,5. 1. State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China. 2. Center for Quantitative Biology, Peking University, Beijing, 100871, China. 3. Department of Critical Care Medicine, Peking University Third Hospital, Beijing, 100191, China. xizhuccm@163.com. 4. State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China. hqzhu@pku.edu.cn. 5. Center for Quantitative Biology, Peking University, Beijing, 100871, China. hqzhu@pku.edu.cn.
Abstract
BACKGROUND: Diagnosing inflammatory bowel disease (IBD) and discriminating between its types are clinically important. IBD is associated with marked changes in the intestinal microbiota. Advances in next-generation sequencing (NGS) technology and the improved bioinformatics analysis capabilities of hospitals motivated us to develop a diagnostic method based on the gut microbiome. RESULTS: Using whole-genome sequencing (WGS) data from 349 human gut microbiota samples covering two types of IBD and healthy controls, we assembled and aligned WGS short reads to obtain strain-level and genus-level feature profiles. The genus profiles were used to construct the 16S-based diagnostic modules, and the strain profiles were used to construct the WGS-based modules. We designed a novel feature selection procedure to select case-specific features and, with these features, built discrimination models using different machine learning algorithms. LightGBM outperformed the other algorithms in this study and was therefore chosen as the core algorithm. Specifically, we identified two small sets of biomarker strains, one for the WGS-based health vs IBD module and one for the ulcerative colitis vs Crohn's disease module, which contributed to optimizing model performance during pre-training. We released LightCUD, an IBD diagnostic program built with LightGBM. Its high performance was validated through five-fold cross-validation and on an independent test data set. LightCUD was implemented in Python and is packaged for free installation with customized databases. Given WGS or 16S rRNA sequencing data from gut microbiome samples as input, LightCUD can discriminate IBD from healthy controls with high accuracy and further identify the specific type of IBD. The executable program LightCUD was released as open source, with instructions, at http://cqb.pku.edu.cn/ZhuLab/LightCUD/ .
The identified strain biomarkers could be used to study the critical factors in disease development and to recommend treatments in light of changes in the gut microbial community. CONCLUSIONS: As the first released human gut microbiome-based IBD diagnostic tool, LightCUD demonstrates high performance on both WGS and 16S sequencing data. The strains that either distinguish healthy controls from IBD patients or distinguish the specific type of IBD are expected to be clinically important as biomarkers.
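The abstract reports that model performance was validated through five-fold cross-validation. The following is a minimal, hypothetical sketch of how stratified five-fold cross-validation works on abundance profiles; it is not LightCUD's code. A simple nearest-centroid classifier (the hypothetical `fit_centroids`/`predict` helpers) stands in for LightGBM so the example runs with the standard library only, and the toy profiles and labels are invented for illustration.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(k)]
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        # Deal each class's samples round-robin so every fold gets a share.
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)
    return folds


def fit_centroids(X, y):
    """Hypothetical stand-in model: the mean feature profile of each class."""
    sums, counts = {}, defaultdict(int)
    for row, lab in zip(X, y):
        if lab not in sums:
            sums[lab] = [0.0] * len(row)
        sums[lab] = [s + v for s, v in zip(sums[lab], row)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, row):
    """Assign the class whose centroid has the smallest squared distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
    )


def cross_validate(X, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold; return per-fold accuracy."""
    scores = []
    for test_idx in stratified_k_folds(y, k, seed):
        if not test_idx:
            continue  # skip empty folds (only possible with very few samples)
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Toy, invented profiles: two well-separated groups of samples.
X = [[1.0, 0.0]] * 10 + [[0.0, 1.0]] * 10
y = ["healthy"] * 10 + ["IBD"] * 10
print(cross_validate(X, y))
```

Stratification keeps each held-out fold's mix of healthy and IBD samples close to that of the full cohort. This matters when class sizes are unequal, because otherwise a fold could contain few or no samples of one class.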
Keywords:
Biomarker; Human gut microbiome; IBD; Machine learning algorithm