Matthew Y Cho, Marc Oliva, Anna Spreafico, Bo Chen, Xu Wei, Yoojin Choi, Rupert Kaul, Lillian L Siu, Bryan Coburn, Pierre H H Schneeberger.
Abstract
When determining human microbiota composition, shotgun sequencing is a powerful tool that can generate high-resolution taxonomic and functional information at once. However, the technique is limited by missing information about the host-to-microbe ratios observed in different body compartments. This limitation makes it difficult to plan shotgun sequencing assays, especially in the context of high sample multiplexing and limited sequencing output, and it is of particular importance for studies employing the recently described shallow shotgun sequencing technique. In this study, we evaluated the use of a quantitative PCR (qPCR)-based assay to predict the host-to-microbe ratio prior to sequencing. Using a two-target assay involving the bacterial 16S rRNA gene and the human beta-actin gene, we derived a model to predict human-to-microbe ratios from two sample types: stool samples and oropharyngeal swabs. We then validated it on two independently collected sample types: rectal swabs and vaginal secretion samples. This assay enabled accurate prediction in the validation set across a range of sample compositions between 4% and 98% nonhuman reads, and observed proportions varied between -18.8% and +19.2% from the expected values. We hope that this easy-to-use assay will help researchers plan their shotgun sequencing experiments more efficiently.
IMPORTANCE When determining human microbiota composition, shotgun sequencing is a powerful tool that can generate large amounts of data. However, in sample compositions with low or variable microbial density, shallow sequencing can negatively affect microbial community metrics. Here, we show that variable sequencing depth decreases measured alpha diversity at differing rates based on community composition. We then derived a model that can determine sample composition prior to sequencing using quantitative PCR (qPCR) data and validated the model using a separate sample set. We have made a tool that uses this model available for researchers to use when gauging the shallow sequencing viability of samples.
Keywords: host DNA proportion; metagenomics; microbiome; sample composition; shallow shotgun; shotgun sequencing
Year: 2021 PMID: 34254823 PMCID: PMC8409737 DOI: 10.1128/mSystems.00552-21
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
FIG 1 Alpha diversity indices are shown across a range of simulated sequencing depths from 10^3 to 10^6 reads per sample. Each sample was subsampled 10 times for a range of sequencing depths. Each resulting rarefaction was profiled using MetaPhlAn 2.0. Richness, Shannon index, and Berger-Parker index were calculated for each rarefaction. The mean value of each index was calculated per sample per depth. Displayed are the median values and interquartile ranges of these means by sample origin. (A) Sample-specific rarefaction curves of species richness. (B) Shannon index calculated across a range of rarefactions, by sample type. (C) Sample dominance, measured with the Berger-Parker index, across a range of sequencing depths, stratified by sample type.
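The three indices named in the FIG 1 caption can be illustrated with a minimal Python sketch. It is a simplification and not the authors' pipeline: it subsamples a per-taxon count vector directly instead of subsampling reads and profiling them with MetaPhlAn 2.0, it assumes the natural logarithm for the Shannon index, and all function names (`richness`, `shannon`, `berger_parker`, `subsample`, `mean_indices`) are hypothetical.

```python
import math
import random
from collections import Counter


def richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index; natural log is an assumption, the source does not state the base."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def berger_parker(counts):
    """Dominance: proportion of reads assigned to the most abundant taxon."""
    return max(counts) / sum(counts)


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from per-taxon counts."""
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    picked = Counter(rng.sample(pool, depth))
    return [picked.get(taxon, 0) for taxon in range(len(counts))]


def mean_indices(counts, depth, repeats=10, seed=0):
    """Mean of each index over `repeats` subsamples at one depth (10 repeats, as in the caption)."""
    rng = random.Random(seed)
    draws = [subsample(counts, depth, rng) for _ in range(repeats)]
    return {
        "richness": sum(richness(d) for d in draws) / repeats,
        "shannon": sum(shannon(d) for d in draws) / repeats,
        "berger_parker": sum(berger_parker(d) for d in draws) / repeats,
    }
```

Calling `mean_indices(counts, depth)` over a list of depths for one sample would give the per-sample, per-depth means that the caption then summarizes as medians and interquartile ranges by sample origin.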
FIG 2 Statistical models to predict sample composition using qPCR prior to high-throughput sequencing. (A) Sigmoidal model derived from oropharyngeal swabs and stool samples depicting the relationship between the difference of human (ACTB) and bacterial (16S) qPCR values (Ct) and the percentage of microbial reads (R^2 = 0.990). The nonlinear regression line (solid) is based on the following logistic growth equation, where ACTB and 16S denote the respective Ct values: $\text{percent microbial reads} = \dfrac{2.7201549}{99.50267 \times e^{-0.7218 \times (\mathrm{ACTB} - \mathrm{16S})} + 0.02733}$. The one-tailed 95% prediction interval is depicted with a dotted line. (B) Model residuals. (C) Fitting of the validation sample set on the prediction model. The orange dots represent values derived from a validation sample set composed of vaginal secretion and rectal swab samples and correlate well (R^2 = 0.930) with the prediction model (solid black line). (D) Difference between expected and observed composition across the range of microbial content.
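The logistic model in the FIG 2 caption can be evaluated with a short Python sketch. The coefficients are copied from the caption; the exponent is assumed to use the Ct difference ACTB minus 16S, as the caption's description of the x axis suggests. The function names and the read-planning helper `total_reads_needed` are hypothetical illustrations and are not the tool the authors provide.

```python
import math

# Coefficients copied from the FIG 2 caption.
NUMERATOR = 2.7201549
SCALE = 99.50267
RATE = 0.7218
OFFSET = 0.02733


def predict_percent_microbial(ct_actb: float, ct_16s: float) -> float:
    """Predicted percentage of microbial reads from two qPCR Ct values."""
    # Assumption: the model input is human (ACTB) Ct minus bacterial (16S) Ct.
    delta_ct = ct_actb - ct_16s
    return NUMERATOR / (SCALE * math.exp(-RATE * delta_ct) + OFFSET)


def total_reads_needed(target_microbial_reads: int, percent_microbial: float) -> int:
    """Hypothetical helper: total reads whose expected microbial share meets the target."""
    return math.ceil(target_microbial_reads / (percent_microbial / 100.0))
```

For example, `total_reads_needed(500_000, predict_percent_microbial(ct_actb, ct_16s))` would estimate how many total reads to allocate to a sample so that about 500,000 of them are expected to be microbial. The estimate ignores the prediction interval shown in the figure, so it should be read as a rough planning figure only.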