Janet C Siebert (1), Martine Saint-Cyr (2), Sarah J Borengasser (2), Brandie D Wagner (3), Catherine A Lozupone (4), Carsten Görg (3).
(1) Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA. jsiebert@cytoanalytics.com.
(2) Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
(3) Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA.
(4) Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
Abstract
BACKGROUND: One goal of multi-omic studies is to identify interpretable predictive models for outcomes of interest, with analytes drawn from multiple omes. Such findings could support refined biological insight and hypothesis generation. However, standard analytical approaches are not designed to be "ome aware." Thus, some researchers analyze data from one ome at a time and then combine predictions across omes. Others resort to correlation studies that catalog pairwise relationships but lack an obvious approach for summarizing these catalogs cohesively and interpretably.

METHODS: We present a novel workflow for building predictive regression models from network neighborhoods in multi-omic networks. First, we generate pairwise regression models across all pairs of analytes from all omes, encoding the resulting "top table" of relationships in a network. Then, we build predictive logistic regression models using the analytes in network neighborhoods of interest. We call this method CANTARE (Consolidated Analysis of Network Topology And Regression Elements).

RESULTS: We applied CANTARE to previously published data from healthy controls and patients with inflammatory bowel disease (IBD) consisting of three omes: gut microbiome, metabolomics, and microbial-derived enzymes. We identified 8 unique predictive models with AUC > 0.90. The number of predictors in these models ranged from 3 to 13. We compared the results of CANTARE to random forests and elastic-net penalized regressions, analyzing AUC, predictions, and predictors. CANTARE AUC values were competitive with those generated by random forests and penalized regressions. The top 3 CANTARE models had a greater dynamic range of predicted probabilities than did random forests and penalized regressions (p-value = 1.35 × 10^-5). CANTARE models were significantly more likely to prioritize predictors from multiple omes than were the alternatives (p-value = 0.005). We also showed that predictive models from a network based on pairwise models with an interaction term for IBD have higher AUC than predictive models built from a correlation network (p-value = 0.016). R scripts and a CANTARE User's Guide are available at https://sourceforge.net/projects/cytomelodics/files/CANTARE/ .

CONCLUSION: CANTARE offers a flexible approach for building parsimonious, interpretable multi-omic models. These models yield quantitative and directional effect sizes for predictors and support the generation of hypotheses for follow-up investigation.
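The two-step workflow described in METHODS can be illustrated with a minimal Python sketch. It is not the authors' R implementation. Several simplifications are assumptions made here for illustration: pairwise models are plain linear regressions ranked by R^2 (the published method also considers pairwise models with an interaction term for IBD, which is not reproduced), the "top table" cutoff is a hypothetical R^2 threshold, logistic regression is fit by simple gradient descent, and AUC is computed in-sample. All function names, analyte names, and values are invented toy examples, not data from the study.

```python
import math
from itertools import combinations

def r_squared(x, y):
    """R^2 of the simple linear regression y ~ x (equals squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def build_network(data, threshold):
    """Step 1: fit all pairwise models; keep pairs with R^2 >= threshold as edges.

    The threshold is an assumed stand-in for the paper's top-table criteria.
    """
    network = {name: set() for name in data}
    for a, b in combinations(data, 2):
        if r_squared(data[a], data[b]) >= threshold:
            network[a].add(b)
            network[b].add(a)
    return network

def neighborhood(network, center):
    """Analytes in the immediate network neighborhood of `center`, inclusive."""
    return sorted({center} | network[center])

def predict(weights, row):
    """Predicted probability from weights = [intercept, coef_1, coef_2, ...]."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], row))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Step 2: logistic regression via batch gradient descent (illustrative)."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            err = predict(weights, row) - y
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def auc(scores, labels):
    """Fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (invented): three analytes from three omes, eight subjects.
data = {
    "microbe_A":    [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2],
    "metabolite_B": [0.2, 0.4, 0.5, 0.8, 1.9, 2.1, 2.0, 2.5],
    "enzyme_C":     [1.0, 0.2, 0.9, 0.1, 0.8, 0.3, 1.1, 0.2],
}
status = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = IBD (toy labels)

network = build_network(data, threshold=0.5)
hood = neighborhood(network, "microbe_A")
rows = [[data[name][i] for name in hood] for i in range(len(status))]
weights = fit_logistic(rows, status)
scores = [predict(weights, row) for row in rows]
print(hood, auc(scores, status))
```

In this toy example the neighborhood of `microbe_A` spans two omes because only `metabolite_B` passes the edge threshold, and the fitted coefficients in `weights` give the sign and size of each predictor's effect, which is the kind of interpretability the abstract emphasizes. A real analysis would use held-out or cross-validated AUC rather than the in-sample value computed here.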
Keywords:
IBD; Metabolome; Metagenome; Microbiome; Multi-omic; Predictive model