Shirin Moossavi (1,2,3,4,5), Kelsey Fehr (6,7), Ehsan Khafipour (8,9), Meghan B Azad (10,11,12)
1. Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada. Shirin.moossavi@ucalgary.ca
2. Children's Hospital Research Institute of Manitoba, Winnipeg, MB, Canada.
3. Developmental Origins of Chronic Diseases in Children Network (DEVOTION), Winnipeg, MB, Canada.
4. Digestive Oncology Research Center, Digestive Disease Research Institute, Tehran University of Medical Sciences, Tehran, Iran.
5. Department of Physiology and Pharmacology & Mechanical and Manufacturing Engineering, University of Calgary, Calgary, AB, Canada.
6. Children's Hospital Research Institute of Manitoba, Winnipeg, MB, Canada.
7. Department of Pediatrics and Child Health, University of Manitoba, Winnipeg, MB, Canada.
8. Department of Animal Science, University of Manitoba, Winnipeg, MB, Canada.
9. Microbiome Research and Technical Support, Cargill Animal Nutrition, Diamond V brand, Cedar Rapids, USA.
10. Children's Hospital Research Institute of Manitoba, Winnipeg, MB, Canada. Meghan.Azad@umanitoba.ca
11. Developmental Origins of Chronic Diseases in Children Network (DEVOTION), Winnipeg, MB, Canada.
12. Department of Pediatrics and Child Health, University of Manitoba, Winnipeg, MB, Canada.
Abstract
BACKGROUND: Quality control, including assessment of batch variability and confirmation of repeatability and reproducibility, is an integral component of high-throughput omics studies, including microbiome research. Batch effects can mask true biological results and/or lead to irreproducible conclusions and interpretations. Low biomass samples in microbiome research are prone to reagent contamination; yet quality control procedures for low biomass samples in large-scale microbiome studies are not well established.
RESULTS: In this study, we propose a framework for an in-depth, step-by-step approach to address this gap. The framework consists of three independent stages: (1) verification of sequencing accuracy by assessing technical repeatability and reproducibility of the results using mock communities and biological controls; (2) contaminant removal and batch variability correction by applying a two-tier strategy using statistical algorithms (e.g. decontam) followed by comparison of the data structure between batches; and (3) corroboration of the repeatability and reproducibility of microbiome composition and downstream statistical analysis. Applying this approach to the milk microbiota data from the CHILD Cohort, generated in two batches (extracted and sequenced in 2016 and 2019), we identified potential reagent contaminants that were missed by standard algorithms and substantially reduced contaminant-induced batch variability. Additionally, we confirmed the repeatability and reproducibility of our results in each batch before merging them for downstream analysis.
CONCLUSION: This study provides important insight to advance quality control efforts in low biomass microbiome research. Within-study quality control that takes advantage of the data structure (i.e. differential prevalence of contaminants between batches) would enhance the overall reliability and reproducibility of research in this field.
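The idea of using differential prevalence between batches as a second screening tier can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not call decontam: the function names, the input layout (taxon mapped to per-sample read counts), the `min_diff` threshold of 0.5, and the toy counts are all assumptions made for illustration. A real analysis would more likely use a statistical test than a fixed cut-off.

```python
from typing import Dict, List


def prevalence(counts: List[int]) -> float:
    """Fraction of samples in which a taxon was detected (count > 0)."""
    if not counts:
        return 0.0  # taxon absent from this batch entirely
    return sum(1 for c in counts if c > 0) / len(counts)


def flag_batch_specific_taxa(
    batch_a: Dict[str, List[int]],
    batch_b: Dict[str, List[int]],
    min_diff: float = 0.5,  # hypothetical threshold, not from the paper
) -> Dict[str, float]:
    """Return taxa whose prevalence differs between batches by >= min_diff.

    A large difference suggests a batch-specific source, such as a reagent
    contaminant, and marks the taxon for manual review.
    """
    flagged = {}
    for taxon in sorted(set(batch_a) | set(batch_b)):
        diff = abs(prevalence(batch_a.get(taxon, []))
                   - prevalence(batch_b.get(taxon, [])))
        if diff >= min_diff:
            flagged[taxon] = round(diff, 3)
    return flagged


# Toy read counts (made up): four samples per batch.
batch_2016 = {
    "taxon_1": [120, 80, 95, 60],
    "taxon_2": [0, 0, 1, 0],
    "taxon_3": [40, 0, 55, 30],
}
batch_2019 = {
    "taxon_1": [110, 70, 0, 90],
    "taxon_2": [15, 22, 9, 30],
    "taxon_3": [35, 20, 0, 45],
}

candidates = flag_batch_specific_taxa(batch_2016, batch_2019)
print(candidates)
```

In this toy example only `taxon_2` is flagged, because it is detected in one of four samples in the first batch but in all four samples of the second. Flagged taxa are candidates for review, not confirmed contaminants.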