Rosalinda D'Amore, Umer Zeeshan Ijaz, Melanie Schirmer, John G Kenny, Richard Gregory, Alistair C Darby, Migun Shakya, Mircea Podar, Christopher Quince, Neil Hall.
Abstract
BACKGROUND: In the last 5 years, the rapid pace of innovations and improvements in sequencing technologies has completely changed the landscape of metagenomic and metagenetic experiments. Therefore, it is critical to benchmark the various methodologies for interrogating the composition of microbial communities, so that we can assess their strengths and limitations. The most common phylogenetic marker for microbial community diversity studies is the 16S ribosomal RNA gene, and in the last 10 years the field has moved from sequencing a small number of amplicons and samples to more complex studies where thousands of samples and multiple different gene regions are interrogated.
Year: 2016 PMID: 26763898 PMCID: PMC4712552 DOI: 10.1186/s12864-015-2194-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Composition of synthetic communities with biological classification of the organisms as well as the proportions used for UM community in this study
| ID | Genome name | Genome size (bp) | Domain | Phylum | Class | Proportion UM community (%) |
|---|---|---|---|---|---|---|
| ACI_CAP | | 4127356 | Bacteria | Acidobacteria | Acidobacteriae | 8.1 |
| AKK_MUC | | 2664102 | Bacteria | Verrucomicrobia | Verrucomicrobiae | 0.9 |
| ANE_THE | | 2919718 | Bacteria | Firmicutes | Clostridia | 1.2 |
| BAC_THE | | 6293399 | Bacteria | Bacteroidetes | Bacteroidia | 0.2 |
| BAC_VUL | | 5163189 | Bacteria | Bacteroidetes | Bacteroidia | 0.9 |
| BOR_BRO | | 5339179 | Bacteria | Proteobacteria | Betaproteobacteria | 9.2 |
| BUR_XEN | | 973113 | Bacteria | Proteobacteria | Betaproteobacteria | 2.6 |
| CAL_SAC | | 2970275 | Bacteria | Firmicutes | Clostridia | 2 |
| CHL_TEP | | 2154946 | Bacteria | Chlorobi | Chlorobia | 0.5 |
| CHL_LIM | | 2763181 | Bacteria | Chlorobi | Chlorobia | 0.4 |
| CHL_PHA226 | | 3133902 | Bacteria | Chlorobi | Chlorobia | 1.9 |
| CHL_PHA265 | | 1966858 | Bacteria | Chlorobi | Chlorobia | 0.3 |
| CHL_AUR | | 5258541 | Bacteria | Chloroflexi | Chloroflexi | 0.9 |
| CLO_THE | | 3843301 | Bacteria | Firmicutes | Clostridia | 0.6 |
| DEI_RAD | | 3284156 | Bacteria | Thermi | Deinococci | 1.7 |
| DES_DES | | 2873437 | Bacteria | Proteobacteria | Deltaproteobacteria | 1.4 |
| DES_PIG | | 2826240 | Bacteria | Proteobacteria | Deltaproteobacteria | 3.1 |
| DIC_TUR | | 1855560 | Bacteria | Dictyoglomi | Dictyoglomia | 3.5 |
| ENT_FAE | | 3359974 | Bacteria | Firmicutes | Bacilli | 4.3 |
| FUS_NUC | | 2174500 | Bacteria | Fusobacteria | Fusobacteria | 0.3 |
| GEM_AUR | | 4636964 | Bacteria | Gemmatimonadetes | Gemmatimonadetes | 0.7 |
| HER_AUR | | 6785430 | Bacteria | Chloroflexi | Chloroflexi | 1.8 |
| HYD_Y04AAS1 | | 1559514 | Bacteria | Aquificae | Aquificae | 1.1 |
| LEP_CHO | | 4909403 | Bacteria | Proteobacteria | Betaproteobacteria | 1.8 |
| NIT_EUR | | 2812094 | Bacteria | Proteobacteria | Betaproteobacteria | 4.3 |
| NOS_PCC7120 | | 7211789 | Bacteria | Cyanobacteria | unclassified | 2.7 |
| PEL_PHA | | 3018238 | Bacteria | Chlorobi | Chlorobia | 0.1 |
| PER_MAR | | 2467104 | Bacteria | Aquificae | Aquificae | 5.5 |
| POR_GIN | | 2354886 | Bacteria | Bacteroidetes | Bacteroidia | 0.2 |
| RHO_BAL | | 7145576 | Bacteria | Planctomycetes | Planctomycetacia | 1 |
| RHO_RUB | | 4406557 | Bacteria | Proteobacteria | Alphaproteobacteria | 1.2 |
| RUE_POM | | 4601053 | Bacteria | Proteobacteria | Alphaproteobacteria | 0.6 |
| SAL_ARE | | 5786361 | Bacteria | Actinobacteria | Actinobacteria | 0.5 |
| SAL_TRO | | 5183331 | Bacteria | Actinobacteria | Actinobacteria | 1.6 |
| SHE_BAL_OS185 | | 5312910 | Bacteria | Proteobacteria | Gammaproteobacteria | 3.1 |
| SHE_BAL_OS223 | | 5358884 | Bacteria | Proteobacteria | Gammaproteobacteria | 1.4 |
| SUL_EE.36 | | 3547243 | Bacteria | Proteobacteria | Alphaproteobacteria | 2 |
| SUL_NAS.14.1 | | 4002069 | Bacteria | Proteobacteria | Alphaproteobacteria | 4.3 |
| SUL_YO3AOP1 | | 1838442 | Bacteria | Aquificae | Aquificae | 1.6 |
| SUL_YEL | | 1534471 | Bacteria | Aquificae | Aquificae | 2.6 |
| THE_PSE | | 2362816 | Bacteria | Firmicutes | Clostridia | 0.8 |
| THE_NEA | | 1884562 | Bacteria | Thermotogae | Thermotogae | 0.7 |
| THE_PET | | 1824357 | Bacteria | Thermotogae | Thermotogae | 1 |
| THE_RQ2 | | 877693 | Bacteria | Thermotogae | Thermotogae | 3.4 |
| THE_THE | | 2116056 | Bacteria | Thermi | Thermi | 0.5 |
| TRE_DEN | | 2843201 | Bacteria | Spirochaetes | Spirochaetes | 0.2 |
| TRE_VIN | | 2512734 | Bacteria | Spirochaetes | Spirochaetes | 0.2 |
| ZYM_MOB | | 2223497 | Bacteria | Proteobacteria | Alphaproteobacteria | 0.8 |
| ARC_FUL | | 2178400 | Archaea | Euryarchaeota | Archaeoglobi | 0.3 |
| IGN_HOS | | 1297538 | Archaea | Crenarchaeota | Thermoprotei | 1.2 |
| MET_JAN | | 1664970 | Archaea | Euryarchaeota | Methanococci | 0.9 |
| MET_MAR_C5 | | 1780761 | Archaea | Euryarchaeota | Methanococci | 0.4 |
| MET_MAR_S2 | | 1661137 | Archaea | Euryarchaeota | Methanococci | 0.5 |
| NAN_EQU | | 490885 | Archaea | Nanoarchaeota | Nanoarchaea | 1 |
| PYR_AER | | 2222430 | Archaea | Crenarchaeota | Thermoprotei | 0.5 |
| PYR_CAL | | 2009313 | Archaea | Crenarchaeota | Thermoprotei | 2.6 |
| PYR_HOR | | 1738505 | Archaea | Euryarchaeota | Thermococci | 1.9 |
| SUL_TOK | | 2694756 | Archaea | Crenarchaeota | Thermoprotei | 0.7 |
Fig. 1 Experimental design. (a) Design of the single- and dual-index sequencing strategies and schematic of the three amplicon designs. The fusion primer design (A) is a one-step PCR that uses a single 12-nt error-correcting Golay index sequence (blue), allowing high multiplexing capability. The tailed tag design (B) is a two-step PCR that uses a universal primer in the first step and a dual-index barcoded primer set in the second step; standard Illumina Nextera 8-nt index sequences were used (pink, Index 5; blue, Index 7). In the PacBio ligated adapter design (C), two hairpin adapters (grey) were ligated to a barcoded template (BF, forward barcode; BR, reverse barcode) to allow multiplexing. (b) Platform-specific amplicon libraries. Illumina paired-end sequencing (1, 2) generates two sequencing reads (R1 and R2) per cluster and can have single (Standard/Golay) or dual indexes (I5, I7). Ion Torrent and 454 (3) produce a single read per bead with a single index (MID). Pacific Biosciences generates a single circular read for each molecule (SMRTbell) and can have one (BF or BR) or two indexes. The starting point and direction of sequencing reads are indicated by a solid blue line and arrows, respectively. For the fusion primer design, custom sequencing primers were used.
Fig. 2 Schematic representation of the combinations of primers covering the 16S rRNA hypervariable regions and the sequencing platforms used in this study
Experimental conditions assessed in this study on the IT, MS, 454 FLX+ and PB sequencing platforms
| Synthetic community | Experimental design | No. of cycles | Template concentration (ng) | Sequencing platform |
|---|---|---|---|---|
| EM/UM | Fusion primer | 25 | 1,5,10 | MS |
| EM | Fusion primer | 25 | 2,5 | IT, 454 |
| EM | Universal tailed tag | 5+15 | 1,10 | MS |
| EM | Universal tailed tag | 8+15 | 1,10 | MS |
| EM/UM | Universal tailed tag | 10+15 | 2,5 | MS |
| EM/UM | Adapter ligation | 25 | 500–750 | PB |
The error-correcting multiplex identifier sequences used with MS technology. A 12 bp reverse index was used for unidirectional tagging in the fusion primer approach (F)
| Name | Type | Design | Sequence |
|---|---|---|---|
| 806rcbc0 | Golay | F | TCCCTTGTCTCC |
| 806rcbc1 | Golay | F | ACGAGACTGATT |
| 806rcbc2 | Golay | F | GCTGTACGGATT |
| 806rcbc3 | Golay | F | ATCACCAGGTGT |
| 806rcbc4 | Golay | F | TGGTCAACGATA |
| 806rcbc5 | Golay | F | ATCGCACAGTAA |
| 806rcbc6 | Golay | F | GTCGTGTAGCCT |
| 806rcbc7 | Golay | F | AGCGGAGGTTAG |
| 806rcbc8 | Golay | F | ATCCTTTGGTTC |
| 806rcbc9 | Golay | F | TACAGCGCATAC |
| 806rcbc10 | Golay | F | ACCGGTATGTAC |
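Error-correcting index sets depend on every pair of indexes differing at several positions, so that a read index carrying a sequencing error is still closer to its true index than to any other. The following is a minimal sketch that measures this separation for the indexes listed above; the function names `hamming` and `min_pairwise_distance` are illustrative and not part of the study's pipeline.

```python
from itertools import combinations

# Golay index sequences copied from the table above.
GOLAY_INDEXES = {
    "806rcbc0": "TCCCTTGTCTCC",
    "806rcbc1": "ACGAGACTGATT",
    "806rcbc2": "GCTGTACGGATT",
    "806rcbc3": "ATCACCAGGTGT",
    "806rcbc4": "TGGTCAACGATA",
    "806rcbc5": "ATCGCACAGTAA",
    "806rcbc6": "GTCGTGTAGCCT",
    "806rcbc7": "AGCGGAGGTTAG",
    "806rcbc8": "ATCCTTTGGTTC",
    "806rcbc9": "TACAGCGCATAC",
    "806rcbc10": "ACCGGTATGTAC",
}


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes):
    """Return (distance, (name_a, name_b)) for the closest pair of indexes."""
    return min(
        (hamming(indexes[p], indexes[q]), (p, q))
        for p, q in combinations(sorted(indexes), 2)
    )


distance, pair = min_pairwise_distance(GOLAY_INDEXES)
print(f"closest pair {pair}: {distance} mismatches")
```

The larger the minimum distance, the more index errors can be tolerated before two samples become confusable.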
PCR primers used in this study
| Primer name | Platform | Library design | Variable region | Sequence |
|---|---|---|---|---|
| 454_27YMF | 454 | F | V1-V3 | CCATCTCATCCCTGCGTGTCTCCGACTCAG |
| 454_515R | | | | CCTATCCCCTGTGTGCCTTGGCAGTCTCAG |
| 454_F341 | 454 | F | V3-V4 | CCATCTCATCCCTGCGTGTCTCCGACTCAG |
| 454_816R1 | | | | CCTATCCCCTGTGTGCCTTGGCAGTCTCAG |
| 454_F515 | 454 | F | V4-V5 | CCATCTCATCCCTGCGTGTCTCCGACTCAG |
| 454_926R | | | | CCTATCCCCTGTGTGCCTTGGCAGTCTCAG |
| 454_F515 | 454 | F | V4-V6 | CCATCTCATCCCTGCGTGTCTCCGACTCAG |
| 454_1061R | | | | CCTATCCCCTGTGTGCCTTGGCAGTCTCAG |
| 454_F515 | 454 | F | V4 | CCATCTCATCCCTGCGTGTCTCCGACTCAG |
| 454_816R1 | | | | CCTATCCCCTGTGTGCCTTGGCAGTCTCAG |
| 454_F515A | 454 | F | V4A | CCATCTCATCCCTGCGTGTCTCCGACTCAG |
| 454_805RA | | | | CCTATCCCCTGTGTGCCTTGGCAGTCTCAG |
| 454_F787 | 454 | F | V5-V9 | CCATCTCATCCCTGCGTGTCTCCGACTCAG |
| 454_1492R | | | | CCTATCCCCTGTGTGCCTTGGCAGTCTCAG |
| 1Round515For | MS | DI | V4 | CTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNN |
| 1Round806Rev | | | | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
| 1RounN515F | MS | DI | V4 | CTACACTCTTTCCCTACACGACGCTCTTCCGATCT |
| 1Round806R | | | | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
| 1Round341For | MS | DI | V4 | CTACACTCTTTCCCTACACGACGCTCTTCCGATCT |
| 1Round805RARev | | | | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
| 1Round515AFor | MS | DI | V4 | CTACACTCTTTCCCTACACGACGCTCTTCCGATCT |
| 1Round805RARev | | | | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
| DI_N5XXFor | MS | DI | V4 | AATGATACGGCGACCACCGAGATCTACACxxxxxxxx |
| DI_N7xxRev | | | | CAAGCAGAAGACGGCATACGAGATxxxxxxxx |
| FG515for | MS | F | V4 | AATGATACGGCGACCACCGAGATCTACACTATGGTAATTGT |
| FG8xxrev | | | | CAAGCAGAAGACGGCATACGAGATxxxxxxxxxx |
| Read 1 Seq Primer | MS | F | V4 | TATGGTAATTGTGTGCCAGCMGCCGCGGTAA |
| Read 2 Seq Primer | MS | F | V4 | AGTCAGTCAGCCGGACTACHVGGGTWTCTAAT |
| Index Seq Primer | MS | F | V4 | ATTAGAWACCCBDGTAGTCCGGCTGACTGACT |
| 454_F341 | IT | F | V3-V4 | CCATCTCATCCCTGCGTGTCTCCGACTCAGCxxxxxxxxxx |
| TtP1_Kn805rev | | | | CCTCTCTATGGGCAGTCGGTGATGGACTACHVGGGTWTCTAAT |
| 454_F515A | IT | F | V4 | CCATCTCATCCCTGCGTGTCTCCGACTCAGxxxxxxxxxx |
| TtP1_Kn805rev | | | | CCTCTCTATGGGCAGTCGGTGATG |
| 454_F515 | IT | F | V4 | CCATCTCATCCCTGCGTGTCTCCGACTCAGACATACGCGTGTGNCAGCMGCCGCGGTAA |
| TtP1_Kn806rev | | | | CCTCTCTATGGGCAGTCGGTGATG |
| PBv1F | PB | LA | V1-V9 | ggtagxxxxxxxxxxxxxxxx |
| PBv9R | | | | ccatcxxxxxxxxxxxxxxxx |
The variable region primer sequence is displayed in bold. The position of the multiplex identifier (MID) is shown as [x] and the respective sequences are shown in Tables 3, 5, 6, and 7. Degenerate bases in the sequence are represented as follows: M: C or A; B: not A; Y: C or T; R: A or G; W: A or T; H: not G; K: G or T; V: not T
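To make the degenerate base codes concrete, the sketch below expands a degenerate primer into every unambiguous sequence it represents and counts them. The helper names `expand` and `degeneracy` are illustrative. The code N (any base) is not in the legend above but occurs in some of the listed primers, so it is added here as an assumption.

```python
from itertools import product

# Degenerate base codes from the table note; N (any base) is an added
# assumption because it appears in some primers but not in the note.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "B": "CGT", "Y": "CT", "R": "AG",
    "W": "AT", "H": "ACT", "K": "GT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct unambiguous sequences encoded by a primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list:
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Sequencing primers copied from the primer table above.
read1 = "TATGGTAATTGTGTGCCAGCMGCCGCGGTAA"
read2 = "AGTCAGTCAGCCGGACTACHVGGGTWTCTAAT"

print(degeneracy(read1))  # one M position: 2 variants
print(degeneracy(read2))  # H, V and W positions: 3 * 3 * 2 = 18 variants
for variant in expand(read1):
    print(variant)
```

Calling `degeneracy` before `expand` is a cheap way to avoid enumerating a primer with many ambiguous positions.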
Unique barcode adaptors specifically designed and validated for optimal performance with Illumina technology
| Name | Type | Design | Sequence |
|---|---|---|---|
| 501 | Illumina | DI | TAGATCGC |
| 502 | Illumina | DI | CTCTCTAT |
| 503 | Illumina | DI | TATCCTCT |
| 504 | Illumina | DI | AGAGTAGA |
| 505 | Illumina | DI | GTAAGGAG |
| 506 | Illumina | DI | ACTGCATA |
| 507 | Illumina | DI | AAGGAGTA |
| 508 | Illumina | DI | CTAAGCCT |
| 701 | Illumina | DI | TCGCCTTA |
| 702 | Illumina | DI | CTAGTACG |
| 703 | Illumina | DI | TTCTGCCT |
| 704 | Illumina | DI | GCTCAGGA |
| 705 | Illumina | DI | AGGAGTCC |
| 706 | Illumina | DI | CATGCCTA |
| 709 | Illumina | DI | AGCGTAGC |
| 710 | Illumina | DI | CAGCCTCG |
| 711 | Illumina | DI | CAGCCTCG |
An 8 bp reverse index (I7) and forward index (I5) were used in the universal tailed tag design (DI) to barcode the reads in both directions
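In a dual-index design each sample is identified by its (I5, I7) pair, so the number of samples that can be told apart is the number of distinct I5 sequences times the number of distinct I7 sequences. The sketch below, with illustrative helper names, checks the table above for repeated sequences before counting pairs. As listed, 710 and 711 carry the same sequence, which may be a transcription issue in the table rather than the indexes actually used.

```python
from collections import defaultdict

# Index sequences copied from the table above, keyed by index name.
I5 = {
    "501": "TAGATCGC", "502": "CTCTCTAT", "503": "TATCCTCT",
    "504": "AGAGTAGA", "505": "GTAAGGAG", "506": "ACTGCATA",
    "507": "AAGGAGTA", "508": "CTAAGCCT",
}
I7 = {
    "701": "TCGCCTTA", "702": "CTAGTACG", "703": "TTCTGCCT",
    "704": "GCTCAGGA", "705": "AGGAGTCC", "706": "CATGCCTA",
    "709": "AGCGTAGC", "710": "CAGCCTCG", "711": "CAGCCTCG",
}


def shared_sequences(indexes):
    """Map each sequence used by more than one index name to those names."""
    by_sequence = defaultdict(list)
    for name, seq in indexes.items():
        by_sequence[seq].append(name)
    return {seq: names for seq, names in by_sequence.items() if len(names) > 1}


def distinct_pairs(i5, i7):
    """Number of (I5, I7) combinations that differ in sequence."""
    return len(set(i5.values())) * len(set(i7.values()))


print(shared_sequences(I5))
print(shared_sequences(I7))
print(len(I5) * len(I7), "named pairs,", distinct_pairs(I5, I7), "distinct by sequence")
```

Running such a check on a sample sheet before sequencing catches index collisions that would otherwise merge two samples during demultiplexing.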
Unique barcode adaptors specifically designed and validated for optimal performance with IT, 454 FLX and FLX+ sequencing technologies
| Name | Type | Design | Sequence |
|---|---|---|---|
| TC20 | 454 | F | ACGACTACAG |
| TC21 | 454 | F | CGTAGACTAG |
| TC22 | 454 | F | TACGAGTATG |
| TC23 | 454 | F | TACTCTCGTG |
| TC24 | 454 | F | TAGAGACGAG |
| TC25 | 454 | F | TCGTCGCTCG |
| TC26 | 454 | F | ACATACGCGT |
A forward 10 bp MID was used in the fusion approach (F) to tag the reads in forward direction
Unique barcode adaptors specifically designed and validated for optimal performance with PB sequencing technology. Forward and reverse 16 bp MIDs were used in the ligation approach (LA) to tag the reads in both directions
| Name | Type | Design | Sequence |
|---|---|---|---|
| F12 | PB | LA | CGCATCGACTACGCTA |
| R13 | PB | LA | TGAGTAGCATGACACG |
| R14 | PB | LA | GACATGCAGTCTCACA |
| R15 | PB | LA | CAGTAGCGCACTGAGC |
| R16 | PB | LA | CTGCGTGCGCGATAGT |
| R17 | PB | LA | CGCGTGCAGAGTGTCA |
| R18 | PB | LA | ATATCAGTCACGTCTG |
Experimental design parameters for MS EM datasets
| Region | Amplicon design method | Primer (f) | Primer (r) | Input (ng) | PCR cycle no. | Taq | No. |
|---|---|---|---|---|---|---|---|
| V4 | DI | 515 | 805RA | 2 | 12 + 18 | HF | 1 |
| V4 | DI | 515 | 806rcb | 2 | 12 + 18 | HF | 1 |
| V4 | DI | 515 | 806rcb | 2 | 10 + 15 | HF | 3 |
| V4 | DI | F515A | 806rcb | 2 | 8 + 15 | HF | 1 |
| V4 | DI | F515A | 806rcb | 2 | 8 + 15 | Q5 | 3 |
| V4 | DI | F515A | 806rcb | 2 | 10 + 15 | HF | 3 |
| V4 | FG | 515 | 806rcb | 1 | 25 | HF | 3 |
| V4 | FG | 515 | 806rcb | 5 | 15 | Q5 | 2 |
| V4 | FG | 515 | 806rcb | 5 | 25 | Q5 | 2 |
| V4 | FG | 515 | 806rcb | 5 | 25 | HF | 1 |
| V4 | FG | 515 | 806rcb | 10 | 15 | HF | 1 |
| V4 | FG | 515 | 806rcb | 10 | 25 | HF | 2 |
| V3-V4 | DI | 341f | 806rcb | 2 | 10 + 15 | HF | 3 |
| V3-V4 | DI | 341f | 805RA | 2 | 10 + 15 | HF | 3 |
Fig. 3 (a) Error rates across four different platforms. Platform had a significant impact on error rate (Kruskal-Wallis non-parametric ANOVA comparing MS, 454, IT and PB ROI, p=0.015), as did the number of CCS cycles for PB (p=0.016). (b) Percentage of reads not matching across the four different platforms. Platform had a significant impact on percentage matching (Kruskal-Wallis non-parametric ANOVA, p=0.0001), as did the number of CCS cycles for PB (p=0.016)
Fig. 4 (a) Impact of overlapping reads on MS error rates for the DI library preparation method. Overlapping reads significantly reduced error rates for the DI library preparation method (t-test comparing forward [mean 1.38 %] and overlapped error rates [0.13 %], p=0.00016). (b) Impact of overlapping reads on MS error rates for the FG library preparation method. Overlapping reads did not significantly reduce the error rate for the FG library preparation method (t-test comparing forward [mean 0.50 %] and overlapped error rates [0.42 %], p=0.36). Not all reads overlapped: for the MS platform, with the given PANDAseq settings (as discussed in the main text), the percentages of reads assembled successfully were 80.93 % (1st quartile), 89.02 % (median), 81.07 % (mean) and 95.67 % (3rd quartile)
Fig. 5 (a) Impact of the number of PCR cycles on the forward MS error rate. Increasing the number of cycles increased the forward error rate with marginal significance for the FG library preparation method with Q5 Taq (t-test, 15 cycles [mean 0.58 %] vs 25 cycles [mean 0.64 %], p=0.11). (b) Impact of PCR starting amount on the percentage of chimeric reads. A decreased starting amount reduced the percentage of chimeras for the FG library preparation method with HiFi Taq, but not significantly (t-test comparing 1 ng [mean 0.08 %] and 10 ng [mean 0.2 %], p=0.20). (c) Impact of the number of PCR cycles on the percentage of chimeric reads. Increasing the cycle number increased the percentage of chimeric reads for the FG library preparation method with Q5 Taq (t-test, 15 cycles [mean 0.00 %] vs 25 cycles [mean 0.66 %], p=0.0245)
Fig. 6 (a) Heatmap for EM communities (showing the bacterial species) reconstructed from different platforms using a range of experimental designs for amplicons. The design parameters are shown on top. (b) NMDS plot based on Bray-Curtis distance comparing the samples shown in (a)
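The Bray-Curtis distance used for the NMDS plot compares two samples by summing the absolute differences in abundance for each taxon and dividing by the total abundance in both samples, giving 0 for identical profiles and 1 for profiles that share no taxa. The following is a minimal sketch of that calculation; the abundance vectors are hypothetical values chosen for illustration and are not data from this study.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both vectors must list the same taxa in the same order.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have equal length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("at least one sample must contain reads")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical read counts for three taxa in two samples (illustration only).
sample_a = [10, 20, 30]
sample_b = [10, 30, 20]

print(round(bray_curtis(sample_a, sample_b), 4))
```

Because the measure depends on total counts, samples are usually normalised to comparable depth or converted to proportions before the distance matrix is passed to an ordination method.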
Fig. 7 Quantitative results for two EM-UM pairs (among a total of 22) for MS and PB are shown. The fitted line through the points is represented by a blue line with R-squared shown on top. The red line is the ground truth, with the slope difference from the blue line also shown on top
Species with significantly different quantification accuracies between MS and PB
| Species | Mean error MS | Mean error PB | |
|---|---|---|---|
| | 0.132 | 0.643 | 0.000056966 |
| | 0.510 | 1.117 | 0.000592824 |
| | 0.012 | 0.459 | 0.000072079 |
| | 0.330 | 0.662 | 0.000132560 |
| | 0.560 | 0.207 | 0.000018219 |