Dongsheng Han1,2,3, Peng Gao1,2,3, Rui Li1,2,3, Ping Tan1,2,3, Jiehong Xie1,3, Rui Zhang1,3, Jinming Li1,2,3.
Abstract
INTRODUCTION: Microbiome research based on high-throughput sequencing has grown exponentially in recent years, but methodological variations can easily undermine reproducibility across studies.
Keywords: 16S rRNA gene sequencing; Microbial community profiling; Microbiome; Microbiota; Shotgun metagenomic sequencing
Year: 2020 PMID: 33133687 PMCID: PMC7584675 DOI: 10.1016/j.jare.2020.07.010
Source DB: PubMed Journal: J Adv Res ISSN: 2090-1224 Impact factor: 10.479
Methodological variance reported by the participating laboratories.
| 16Ss (26 laboratories) | | SMs (23 laboratories) | |
|---|---|---|---|
| | N | | N |
| Qiagen | 9 | Qiagen | 7 |
| Tiangen | 3 | Tiangen | 3 |
| Zymo Research | 3 | Zymo Research | 2 |
| Omega | 3 | Omega | 3 |
| Other/Custom | 8 | Other/Custom | 8 |
| Yes | 19 | Yes | 14 |
| No | 7 | No | 9 |
| V3-4 | 13 | Enzymatic | 13 |
| V4 | 6 | Physical (ultrasound) | 10 |
| V1-9 | 3 | | |
| V4-5 | 1 | Illumina NovaSeq | 9 |
| V1-2 | 1 | Illumina HiSeq | 4 |
| V2, V3, V4, V6-7, V8, V9 | 2 | MGISEQ-2000 | 3 |
| | | Illumina MiSeq | 2 |
| Illumina NovaSeq | 3 | DA8600 | 2 |
| Illumina MiSeq | 14 | NextSeq CN500 | 2 |
| Illumina HiSeq | 5 | MGISEQ-200RS | 1 |
| Ion Torrent PGM | 3 | | |
| PacBio Sequel | 1 | Paired-end | 18 |
| | | Single-end | 5 |
| Paired-end | 22 | | |
| Single-end | 4 | 100 | 4 |
| | | 150 | 15 |
| 150 | 5 | 200 | 2 |
| 250 | 10 | 300 | 2 |
| 300 | 7 | | |
| 400 | 1 | MetaPhlAn2 | 12 |
| 600 | 2 | Kraken 2 | 5 |
| 68,866 | 1 | SOAPaligner/soap2 | 3 |
| | | Diamond 0.9.27 | 2 |
| QIIME | 11 | Explify V2.1.0 (IDbyDNA Inc.) | 1 |
| USEARCH | 7 | | |
| Parallel-META Pipeline | 1 | Default database in taxonomic classifiers or NCBI nr database | 20 |
| Mothur | 3 | MetaHIT database | 3 |
| EzBioCloud | 1 | | |
| Ion Reporter™ | 3 | | |
| Greengenes | 11 | | |
| SILVA | 10 | | |
| NCBI 16S rRNA database | 1 | | |
| PrecisionGene Database (PRS-DB) | 2 | | |
| EzBioCloud 16S database | 1 | | |
| Custom | 1 | | |
Generated using a third-generation sequencer (PacBio Sequel) by P24.
Fig. 1 Analysis of the mock community results reported by 35 laboratories. Spearman rank correlation coefficients were calculated for both the 16Ss (A) and SMs laboratories (C) to assess the correlation between the median observed microbial abundances and the expected microbial abundances in samples 201901 and 201902. The scatter plots show that the observed relative abundance of each bacterium, at the genus level for 16Ss (B) and at the species level for SMs (D), varied greatly among laboratories. The observed relative abundance of the designed low-abundance Bifidobacterium spp. in 16Ss or B. bifidum in SMs is displayed on the right axis. The line displays the interquartile range (lower quartile to upper quartile).
Fig. 2 The correlation between the results for sample 201901 reported by any two 16Ss laboratories (A) or SMs laboratories (B) was evaluated by Spearman's rank correlation coefficient. The number in each square is the Spearman r value; the higher the r value, the stronger the correlation. A square marked with a black circle means that there is no significant correlation between the results of the two corresponding laboratories (P value >0.05). “Expected” indicates the value designed by the NCCL.
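The comparisons in Fig. 1 and Fig. 2 rest on Spearman's rank correlation between an observed abundance profile and either the expected profile or another laboratory's profile. The following is a minimal sketch of that calculation in plain Python, not the authors' code: the function names and the abundance values are hypothetical, and the sketch computes only the r value, not the P value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean 1-based rank of positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's r: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical relative abundances (%) for six taxa; not data from the study.
expected = [30.0, 25.0, 20.0, 15.0, 9.9, 0.1]
observed = [35.0, 18.0, 22.0, 14.0, 10.5, 0.5]

r = spearman_r(expected, observed)
print(f"Spearman r = {r:.3f}")
```

Because the coefficient depends only on ranks, a laboratory that over- or under-estimates every taxon but preserves their order still scores r = 1, which is why the captions pair the correlation with plots of the raw relative abundances.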
Fig. 3 The clustered histogram shows the interlaboratory differences in the reported microbiota composition of sample 201901 (A for 16Ss at the genus level and C for SMs at the species level). The number of unexpected bacteria with a relative abundance >0.01% was counted for every laboratory (B for 16Ss at the genus level and D for SMs at the species level). The number on each petal is the number of unexpected bacteria identified by the corresponding laboratory.
Unexpected bacteria found in sample 201901.
| 16S rRNA gene sequencing | | Shotgun metagenomic sequencing | |
|---|---|---|---|
| Bacterium (Genus) | No. of Laboratories | Bacterium (Species) | No. of Laboratories |
| | 17 | | 10 |
| | 9 | | 7 |
| | 8 | | 7 |
| | 8 | | 6 |
| | 7 | | 6 |
| | 7 | | 6 |
| | 6 | | 6 |
| | 6 | Others | ≤5 |
| Others | ≤5 | | |
The full list is shown in Supplementary Table S2.
Fig. 4 Potential sources of variation in 16Ss. (A) The cumulative relative abundance of G+ bacteria detected in samples 201901 and 201902. Each point represents a participating laboratory. The line displays the interquartile range (lower quartile to upper quartile). The performance of the 16Ss laboratories in detecting Enterobacter spp. using different amplification regions (B) and in identifying Enterobacter spp. using different reference databases (C). (D) More accurate results were obtained by reanalyzing the raw data for sample 201901 reported by several laboratories (P2, P25 and P24). The raw data of P2 were reanalyzed by P2 and P10 with pipelines that included the SILVA database. The raw data of P25 were reanalyzed using the EzBioCloud database. The raw data of P24 were reanalyzed by changing the annotation algorithm. The correlations between the observed and expected microbial abundances were calculated, and the corresponding Spearman's R and P values are shown above the histograms.
Fig. 5 Variations detected in the fecal sample 201903. Violin plots show the results reported by 16Ss (A) and SMs (B) laboratories for the 4 dominant phyla. PCA was applied to evaluate the variation introduced by microbial cell wall disruption methods (C) and reference databases (D) in 16Ss laboratories, and by taxonomic classifiers (E) and microbial cell wall disruption methods (F) in SMs laboratories. For 16Ss, the results of bead-beating strategies could be clearly distinguished from those of enzymatic lysis methods (C), and the results generated with the SILVA database and with the Greengenes database also tended to cluster into distinct groups (D). For SMs, the results generated by the two taxonomic classifiers (MetaPhlAn and Kraken) were visibly separated (E), and the results produced using bead-beating methods clustered together (F).
Effect sizes and explained variances of the main factors assessed by PERMANOVA analysis.
| | Factors | Sums of Sqs | Mean Sqs | F. Model | R2 | P value |
|---|---|---|---|---|---|---|
| 16S rRNA gene sequencing | Databases | 1.91 | 0.38 | 2.67 | 0.40 | 0.00 |
| | Microbial cell wall-breaking methods | 0.84 | 0.42 | 2.45 | 0.18 | 0.01 |
| | Classifiers | 1.40 | 0.23 | 1.31 | 0.29 | 0.11 |
| | DNA extraction kits | 1.25 | 0.21 | 1.13 | 0.26 | 0.27 |
| | Sequencers | 0.78 | 0.19 | 1.02 | 0.16 | 0.44 |
| | Primers (amplified regions) | 0.81 | 0.16 | 0.81 | 0.17 | 0.79 |
| Shotgun metagenomic sequencing | Classifiers | 2.96 | 0.74 | 4.69 | 0.51 | 0.00 |
| | Databases | 0.78 | 0.78 | 3.28 | 0.13 | 0.00 |
| | DNA extraction kits | 2.49 | 0.36 | 1.62 | 0.43 | 0.01 |
| | Microbial cell wall-breaking methods | 0.84 | 0.42 | 1.69 | 0.14 | 0.04 |
| | Sequencers | 1.81 | 0.30 | 1.21 | 0.31 | 0.17 |
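To make the columns of the PERMANOVA table easier to read, the sketch below shows how the among-group sum of squares, mean square, pseudo-F statistic, R2, and a permutation P value can be derived for one factor from a sample-by-sample distance matrix. It is an illustration under stated assumptions, not the analysis used in the study: the function name `permanova`, the one-dimensional toy coordinates, and the group labels are all hypothetical, and the software the authors used is not named in this section.

```python
import random
from itertools import combinations


def permanova(dist, groups, permutations=999, seed=0):
    """One-factor PERMANOVA on a symmetric distance matrix (list of lists)."""
    n = len(groups)
    levels = sorted(set(groups))
    a = len(levels)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n

    def ss_within(labels):
        total = 0.0
        for level in levels:
            members = [i for i, g in enumerate(labels) if g == level]
            pairs = combinations(members, 2)
            total += sum(dist[i][j] ** 2 for i, j in pairs) / len(members)
        return total

    def pseudo_f(labels):
        ssw = ss_within(labels)
        return ((ss_total - ssw) / (a - 1)) / (ssw / (n - a))

    ss_among = ss_total - ss_within(groups)
    f_obs = pseudo_f(groups)

    # Permutation test: shuffle labels and count F values at least as large.
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if pseudo_f(labels) >= f_obs:
            hits += 1

    return {
        "SumsOfSqs": ss_among,
        "MeanSqs": ss_among / (a - 1),
        "F": f_obs,
        "R2": ss_among / ss_total,
        "p": (hits + 1) / (permutations + 1),
    }


# Hypothetical toy data: four samples on a line, two per group.
coords = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(x - y) for y in coords] for x in coords]
groups = ["A", "A", "B", "B"]

result = permanova(dist, groups)
print(result)
```

In this reading, R2 is the share of the total sum of squares attributed to the factor, so a row such as Classifiers under shotgun metagenomic sequencing (R2 = 0.51) indicates that the factor accounts for about half of the observed dissimilarity among laboratories.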