Qunyuan Zhang, Haley Abel, Alan Wells, Petra Lenzini, Felicia Gomez, Michael A Province, Alan A Templeton, George M Weinstock, Nita H Salzman, Ingrid B Borecki.
Affiliations: Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA; Department of Biology, Washington University, St. Louis, MO, USA; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA; Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI, USA.
Abstract
MOTIVATION: Establishing statistical associations between microbiome features and clinical outcomes is of growing interest because such associations can yield insights into biological mechanisms and pathogenesis. Extracting the microbiome features relevant to a disease is challenging, and existing variable selection methods are limited by the large number of risk factor variables in microbiome sequence data and by their complex biological structure.

RESULTS: We propose a tree-based scanning method, Selection of Models for the Analysis of Risk factor Trees (SMART-scan), for identifying taxonomic groups that are associated with a disease or trait. SMART-scan is a model selection technique that uses a predefined taxonomy to organize the large pool of possible predictors into optimized groups, and hierarchically searches for and determines the variable groups to be tested for association. We investigate the statistical properties of SMART-scan through simulations, comparing it with a regular single-variable analysis and three commonly used variable selection methods: stepwise regression, least absolute shrinkage and selection operator (LASSO), and classification and regression tree (CART). When there are taxonomic group effects in the data, SMART-scan can significantly increase power by using bacterial taxonomic information to split large numbers of variables into groups. Through an application to microbiome data from a vervet monkey diet experiment, we demonstrate that SMART-scan can identify important phenotype-associated taxonomic features missed by single-variable analysis, stepwise regression, LASSO and CART.
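The idea of using a taxonomy to pool predictors into groups can be illustrated with a small sketch. This is not the SMART-scan procedure itself: it performs no model selection, group optimization or multiple-testing correction. It only sums abundances at every node of a taxonomy and scores each node against a trait with a Pearson correlation. The function names, the two-level lineages and the toy counts are all hypothetical, chosen so that a group-level effect is visible where single-variable effects are weak.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def scan_taxonomy(counts, lineage, trait):
    """Score every taxonomic node against the trait.

    counts:  {variable: [abundance per sample]}
    lineage: {variable: (rank1, rank2, ...)} from coarse to fine
    trait:   [trait value per sample]
    Returns {node (tuple prefix of a lineage): correlation}.
    """
    groups = defaultdict(lambda: [0.0] * len(trait))
    for var, values in counts.items():
        path = lineage[var]
        # Add this variable's abundances to each ancestor node and its own leaf.
        for depth in range(1, len(path) + 1):
            agg = groups[path[:depth]]
            for i, v in enumerate(values):
                agg[i] += v
    return {node: pearson(agg, trait) for node, agg in groups.items()}


# Toy data (hypothetical): two genera whose summed abundance tracks the trait.
counts = {
    "otu1": [1, 0, 2, 0],
    "otu2": [0, 1, 0, 2],
    "otu3": [5, 5, 5, 5],
}
lineage = {
    "otu1": ("Firmicutes", "GenusA"),
    "otu2": ("Firmicutes", "GenusB"),
    "otu3": ("Bacteroidetes", "GenusC"),
}
trait = [1, 1, 2, 2]

result = scan_taxonomy(counts, lineage, trait)
for node, r in sorted(result.items()):
    print("/".join(node), round(r, 3))
```

In this toy example the phylum-level node pools two weakly correlated genera into a group that correlates strongly with the trait, which is the kind of taxonomic group effect the abstract refers to.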