Swee Hoe Ong, Vinutha Uppoor Kukkillaya, Andreas Wilm, Christophe Lay, Eliza Xin Pei Ho, Louie Low, Martin Lloyd Hibberd, Niranjan Nagarajan.
Abstract
The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to amplify more than 90% of sequences in the Greengenes database and with the ability to distinguish nearly twice as many species-level OTUs compared to existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the constituents of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification, thereby opening up potential application of this approach for clinical microbial characterization.
Year: 2013 PMID: 23579286 PMCID: PMC3620293 DOI: 10.1371/journal.pone.0060811
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. In silico evaluation of 16S rRNA PCR primers.
A) Percentage of sequences matching individual primers, with the top two primers highlighted in boxes. B) Percentage of sequences amplifiable by various primer pairs (338F*/1061R is the best pair). Percentage of matched sequences is measured against the Greengenes 16S rRNA sequence database. See Table S4 in File S1 for primer sequences and results measured against the RDP and SILVA databases. Primer numbering is based on the E. coli system of nomenclature as in Brosius et al. [37] and for simplicity the same name (say 784F) is used for both forward and reverse primers at a given position.
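The two quantities in this figure (sequences matched by a single primer, and sequences amplifiable by a primer pair) can be illustrated with a minimal sketch. It assumes exact matching under IUPAC degeneracy codes with no mismatches allowed, and treats a sequence as amplifiable when the forward primer matches upstream of the reverse primer's binding site. The primers and sequences below are invented toy data, not the primers from Table S4, and the matching criteria the authors actually used may differ.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_to_regex(primer):
    """Compile a (possibly degenerate) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def reverse_complement(seq):
    """Reverse-complement a sequence, including IUPAC degenerate codes."""
    table = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")
    return seq.upper().translate(table)[::-1]

def percent_matching(primer, sequences):
    """Percentage of sequences containing an exact match to the primer."""
    pattern = primer_to_regex(primer)
    hits = sum(1 for s in sequences if pattern.search(s.upper()))
    return 100.0 * hits / len(sequences)

def percent_amplifiable(forward, reverse, sequences):
    """Percentage of sequences where the forward primer matches and the
    reverse primer's binding site occurs downstream of that match."""
    fwd = primer_to_regex(forward)
    rev = primer_to_regex(reverse_complement(reverse))
    count = 0
    for s in sequences:
        s = s.upper()
        f = fwd.search(s)
        if f and rev.search(s, f.end()):
            count += 1
    return 100.0 * count / len(sequences)

# Toy data (hypothetical): a degenerate forward primer and a reverse primer.
TOY_FORWARD = "ACTYCTA"
TOY_REVERSE = "GCTGAAC"
TOY_SEQS = [
    "GGACTCCTAGGGGTTCAGCC",  # both primers match
    "GGACTTCTAGGGGTTCAGCC",  # both match (Y resolves to T)
    "GGACTACTAGGGGTTCAGCC",  # forward primer fails
    "GGACTCCTAGGGGTTCTGCC",  # reverse primer fails
]

print(percent_matching(TOY_FORWARD, TOY_SEQS))                  # 75.0
print(percent_amplifiable(TOY_FORWARD, TOY_REVERSE, TOY_SEQS))  # 50.0
```

Running the same functions against a full reference database would only require reading its sequences into a list; tolerance for mismatches would need a different matching strategy than a plain regular expression.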
Species- and genus-level resolution of various sequencing approaches.
| Sequencing Approach | Reads From | Species-level OTUs | Genus-level OTUs |
| --- | --- | --- | --- |
| End sequencing | | | |
| V3 (338F*/533R*) | 5′-end | 7,388 (76%) | 4,526 (83%) |
| | 3′-end | 35,763 (92%) | 27,699 ( |
| V4 (533R*/805R) | 5′-end | 10,971 (83%) | 6,671 (88%) |
| | 3′-end | 15,000 (87%) | 9,993 (92%) |
| V5 (805R/907F) | 5′-end | 23,301 (91%) | 17,138 (96%) |
| | 3′-end | 10,501 (83%) | 6,746 (89%) |
| V6 (907F/1061R) | 5′-end | 3,701 (73%) | 2,221 (77%) |
| | 3′-end | | |
| Shotgun sequencing | | | |
| V3–V6 (338F/1061R) | Whole amplicon | 59,378 (97%) | 34,869 (99%) |
| V3–V6 (338F*/1061R) | Whole amplicon | | |
| V3–V6 (341F/1061R) | Whole amplicon | 59,272 (97%) | 35,109 (99%) |
| V4–V6 (533R*/1061R) | Whole amplicon | 59,436 (97%) | 35,161 (99%) |
Resolution was measured by the number of OTUs/clusters produced using UCLUST [21] at the species (97% identity) and genus level (95% identity) for 16S rRNA sequences in the Greengenes database, based on various end-sequencing (76 bases in length from either the 5′ or 3′ end) and shotgun-sequencing approaches and primer combinations. A higher OTU/cluster number indicates a theoretically higher level of resolution for taxonomic classification. The numbers in parentheses give the purity of clusters, measured as the percentage of clusters with homogeneous taxonomy assignments in Greengenes. Entries with the highest resolution and/or purity for each sequencing approach are marked in bold. The primer sequences can be found in Table S4 in File S1.
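The purity measure described in this note can be sketched as follows. This is a minimal illustration that assumes cluster membership and taxonomy labels are already available as plain dictionaries; it does not parse UCLUST output, and the function name, sequence identifiers and taxon labels are hypothetical.

```python
from collections import defaultdict

def cluster_purity(assignments, taxonomy):
    """Return (number of clusters, percentage of pure clusters).

    assignments: dict mapping sequence ID -> cluster ID
    taxonomy:    dict mapping sequence ID -> taxonomy label
    A cluster counts as pure when all of its members share one label.
    """
    labels_per_cluster = defaultdict(set)
    for seq_id, cluster_id in assignments.items():
        labels_per_cluster[cluster_id].add(taxonomy[seq_id])
    n_clusters = len(labels_per_cluster)
    n_pure = sum(1 for labels in labels_per_cluster.values() if len(labels) == 1)
    return n_clusters, 100.0 * n_pure / n_clusters

# Toy data (hypothetical): six sequences grouped into four clusters.
TOY_ASSIGNMENTS = {"s1": 0, "s2": 0, "s3": 1, "s4": 1, "s5": 2, "s6": 3}
TOY_TAXONOMY = {
    "s1": "Escherichia coli",
    "s2": "Escherichia coli",
    "s3": "Bacteroides fragilis",
    "s4": "Bacteroides vulgatus",  # mixed cluster: two species in cluster 1
    "s5": "Prevotella copri",
    "s6": "Streptococcus mitis",
}

print(cluster_purity(TOY_ASSIGNMENTS, TOY_TAXONOMY))  # (4, 75.0)
```

In the table's terms, the first returned value corresponds to the OTU count and the second to the percentage shown in parentheses.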
Evaluation of EMIRGE, modQIIME and RTAX on different datasets.
| Method | Genus-level recall (%) | Genus-level precision (%) | Species-level recall (%) | Species-level precision (%) |
| --- | --- | --- | --- | --- |
| | | | | |
| EMIRGE (33%) | 88 | 90 | 66 | 96 |
| modQIIME (93%) | 97 | 63 | 66 | 51 |
| RTAX (95%) | 88 | 88 | 61 | 68 |
| | | | | |
| EMIRGE (30%) | 84 | 95 | 69 | 100 |
| modQIIME (92%) | 92 | 82 | 71 | 94 |
| RTAX (92%) | 88 | 76 | 82 | 77 |
| | | | | |
| EMIRGE (13%) | 64 | 100 | 32 | 86 |
| modQIIME (78%) | 100 | 55 | 59 | 59 |
| RTAX (86%) | 76 | 53 | 49 | 38 |
| | | | | |
| EMIRGE (60%) | 83 | 94 | 39 | 93 |
| modQIIME (95%) | 94 | 85 | 48 | 70 |
| RTAX (96%) | 100 | 90 | 52 | 61 |
Precision and recall rates for the “Oral”, “Gut”, “Complex” and ABC33 datasets using EMIRGE, modQIIME and RTAX at a 0.1% relative abundance threshold. The percentage of sequences/OTUs removed because of the abundance threshold is given in parentheses for each method.
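How precision, recall and the removed fraction relate to an abundance threshold can be shown with a small sketch. It assumes predictions are given as relative abundances (fractions of 1), that a taxon is kept when its abundance is at or above the threshold, and that recall is computed against every taxon truly present; these choices, the function name and the toy values are assumptions for illustration, not details taken from the paper.

```python
def precision_recall(predicted, truth, threshold=0.001):
    """Return (precision %, recall %, % of predictions removed).

    predicted: dict mapping taxon -> estimated relative abundance (0..1)
    truth:     set of taxa actually present in the community
    threshold: minimum relative abundance to keep a prediction (0.001 = 0.1%)
    """
    called = {taxon for taxon, abundance in predicted.items()
              if abundance >= threshold}
    true_positives = len(called & truth)
    precision = 100.0 * true_positives / len(called) if called else 0.0
    recall = 100.0 * true_positives / len(truth) if truth else 0.0
    removed = (100.0 * (len(predicted) - len(called)) / len(predicted)
               if predicted else 0.0)
    return precision, recall, removed

# Toy data (hypothetical): five predicted taxa, one below the 0.1% threshold.
TOY_PREDICTED = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.0995, "E": 0.0005}
TOY_TRUTH = {"A", "B", "C", "F"}

print(precision_recall(TOY_PREDICTED, TOY_TRUTH))  # (75.0, 75.0, 20.0)
```

Here taxon E is dropped by the threshold (one of five predictions, 20%), D is a false positive, and F is missed, which gives 75% precision and 75% recall.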
Figure 2. Community composition based on 16S rRNA sequence reconstruction using EMIRGE.
A) Correlation between known and estimated relative abundances of predicted species on three in silico datasets. A log-scaled version of this plot is shown in Figure S1. B) Composition at the phylum level for the throat swab and stool sequencing datasets.