Yi-Chun Yeh, David M Needham, Ella T Sieradzki, Jed A Fuhrman.
Abstract
Mock communities have been used in microbiome method development to help estimate biases introduced in PCR amplification and sequencing and to optimize pipeline outputs. Nevertheless, the strong value of routine mock community analysis beyond initial method development is rarely, if ever, considered. Here we report that our routine use of mock communities as internal standards allowed us to discover highly aberrant and strong biases in the relative proportions of multiple taxa in a single Illumina HiSeqPE250 run. In this run, an important archaeal taxon virtually disappeared from all samples, and other mock community taxa showed >2-fold high or low abundance, whereas a rerun of those identical amplicons (from the same reaction tubes) on a different date yielded "normal" results. Although obvious from the strange mock community results, we could have easily missed the problem had we not used the mock communities because of natural variation of microbiomes at our site. The "normal" results were validated over four MiSeqPE300 runs and three HiSeqPE250 runs, and run-to-run variation was usually low. While validating these "normal" results, we also discovered that some mock microbial taxa had relatively modest, but consistent, differences between sequencing platforms. We strongly advise the use of mock communities in every sequencing run to distinguish potentially serious aberrations from natural variations. The mock communities should have more than just a few members and ideally at least partly represent the samples being analyzed to detect problems that show up only in some taxa and also to help validate clustering.

IMPORTANCE Despite the routine use of standards and blanks in virtually all chemical or physical assays and most biological studies (a kind of "control"), microbiome analysis has traditionally lacked such standards. Here we show that unexpected problems of unknown origin can occur in such sequencing runs and yield completely incorrect results that would not necessarily be detected without the use of standards. Assuming that the microbiome sequencing analysis works properly every time risks serious errors that can be detected by the use of mock communities.
Keywords: DNA sequencing; microbiome analysis; mock community
Year: 2018 PMID: 29629423 PMCID: PMC5883066 DOI: 10.1128/mSystems.00023-18
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
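The abstract describes spotting an aberrant run because mock community taxa appeared at more than 2-fold higher or lower abundance than expected. A minimal sketch of such a check is given below. It is not the authors' pipeline: the function name, the taxon labels, and all abundance values are hypothetical, and the 2-fold threshold is simply taken from the wording of the abstract.

```python
def flag_aberrant_taxa(expected, observed, fold=2.0):
    """Return {taxon: observed/expected ratio} for taxa outside the fold range.

    `expected` and `observed` map taxon names to relative abundances.
    A taxon missing from `observed` is treated as abundance 0.
    """
    flagged = {}
    for taxon, exp in expected.items():
        if exp <= 0:
            raise ValueError(f"expected abundance must be positive: {taxon}")
        ratio = observed.get(taxon, 0.0) / exp
        # Flag ratios above `fold` or below 1/`fold` (e.g. >2x or <0.5x).
        if ratio > fold or ratio < 1.0 / fold:
            flagged[taxon] = ratio
    return flagged


# Hypothetical "even" mock community with four members (illustrative only).
expected = {"clone_A": 0.25, "clone_B": 0.25, "clone_C": 0.25, "clone_D": 0.25}
# Hypothetical observed proportions from one sequencing run.
observed = {"clone_A": 0.26, "clone_B": 0.001, "clone_C": 0.60, "clone_D": 0.139}

for taxon, ratio in sorted(flag_aberrant_taxa(expected, observed).items()):
    print(f"{taxon}: observed/expected = {ratio:.3f}")
```

Running a check of this kind on the mock samples of every run gives a per-run signal that does not depend on the natural variation of the field samples.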
FIG 1 Comparisons of “even” mock communities (a) and “staggered” mock communities (b) sequenced by MiSeqPE300 and HiSeqPE250. Values that are significantly different for a clone by MiSeqPE300 versus HiSeqPE250 are indicated with an asterisk before the clone name (P < 0.05 by Wilcoxon rank sum test). Significant differences in the whole-community composition by MiSeqPE300 and HiSeqPE250 were found only in the even mock community (P < 0.05 by ANOSIM test).
Quality statistics of each sequencing run
| Sequencing platform and run | Sample types | No. of mock replicates/run | Avg length of forward reads after QC | Avg length of reverse reads after QC | Sequence error rate (%) | % sequences | R2 |
|---|---|---|---|---|---|---|---|
| MiSeqPE300 | | | | | | | |
| Run 06 | Amplicons + 10 to 15% PhiX | 4 | 286.1 | 261.2 | 0.029 | 98 | 0.94 |
| Run 20 | Amplicons + 10 to 15% PhiX | 3 | 285.6 | 244.7 | 0.03 | 98 | 0.95 |
| Run 31 | Amplicons + 10 to 15% PhiX | 4 | 278.0 | 221.4 | 0.023 | 99 | 0.95 |
| Run 46 | Amplicons + 10 to 15% PhiX | 1 | 252.7 | 216.9 | 0.029 | 98 | 0.95 |
| HiSeqPE250 | | | | | | | |
| Run 36 | 5% amplicons + 95% metagenomes | 1 | 235.9 | 231.7 | 0.02 | 92 | 0.97 |
| Run 44 | Metagenomes + 0.1% mock | 1 | 235.1 | 233.3 | 0.033 | 95 | 0.94 |
| Run 47 | 20% amplicons + 80% metagenomes | 1 | 237.5 | 228.2 | 0.018 | 98 | 0.91 |
The characteristics and results for the aberrant run and the rerun of the library from the aberrant run are shown in boldface type.
Number of mock replicates included in each run.
The trimmed length after quality control (QC) as described in Materials and Methods.
The error rate is defined as the sum of mismatches to the reference divided by sum of bases in query for mock communities using Mothur.
Coefficient of variation of observed staggered mock community versus in silico staggered mock community under log (x + 0.001) at 99% similarity level.
The R2 of the aberrant run is far outside the range of other runs.
The sequencing facility adds 10 to 15% of PhiX174 (phage DNA) for “amplicons-only” runs as recommended by Illumina to increase sample complexity.
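Two of the footnotes above define computed quantities: the error rate (sum of mismatches to the reference divided by the sum of bases in the query) and the agreement between observed and in silico staggered mock communities after a log (x + 0.001) transformation. The sketch below illustrates both under stated assumptions. It is not the Mothur implementation; the function names and all input values are hypothetical, and the agreement statistic is interpreted here as the squared Pearson correlation (R2), which is an assumption based on the footnote that mentions R2. The base of the logarithm is not given in the source; it does not change R2, so base 10 is used arbitrarily.

```python
import math


def error_rate_percent(mismatches, query_bases):
    """Sum of mismatches divided by sum of query bases, as a percentage."""
    total_bases = sum(query_bases)
    if total_bases == 0:
        raise ValueError("no query bases supplied")
    return 100.0 * sum(mismatches) / total_bases


def log_r_squared(observed, expected, pseudocount=0.001):
    """Squared Pearson correlation of log10(x + pseudocount) abundances.

    Assumption: the pseudocount of 0.001 is applied on the same scale as
    the abundances passed in; the source does not state the scale.
    """
    if len(observed) != len(expected) or len(observed) < 2:
        raise ValueError("need two equal-length lists with at least 2 taxa")
    xs = [math.log10(v + pseudocount) for v in expected]
    ys = [math.log10(v + pseudocount) for v in observed]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("abundances must not all be identical")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-read mismatch counts and read lengths (illustrative only).
print(error_rate_percent([1, 0, 2], [250, 250, 250]))

# Hypothetical staggered mock community: in silico vs observed proportions.
in_silico = [0.50, 0.30, 0.15, 0.05]
observed = [0.46, 0.33, 0.17, 0.04]
print(log_r_squared(observed, in_silico))
```

Because the transformation is logarithmic, low-abundance members of a staggered community contribute to the statistic as much as dominant ones, which is presumably why the pseudocount is needed for taxa with zero observed reads.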
FIG 2 Rerun of the same PCR products (same as those shown in Fig. 1) from the “aberrant” sequencing run.
FIG 3 Field community comparisons via 16S and between sequencing runs. (a) Good replication of rank abundance curves between different sequencing runs and with slightly different primers (515F-C is the original EMP primer and 515F-Y is the version where a C is replaced with a Y [16]). The abundance rank was defined by primer 515F-C. (b) Comparison of rank abundance curves from the June 2015 sample analysis, showing the aberrant sequencing run and the exact same PCR products reanalyzed on the other sequencing run on a different day. The abundance rank was defined by the rerun.