Bonnie L Brown, Mick Watson, Samuel S Minot, Maria C Rivera, Rima B Franklin.
Abstract
Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole-genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10⁴ bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex.
Keywords: Long-read sequencing; Metagenome; MinION™; Oxford Nanopore Technologies; Whole-genome sequencing
Year: 2017 PMID: 28327976 PMCID: PMC5467020 DOI: 10.1093/gigascience/gix007
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Identity of the single species used in this study, as determined by Sanger sequencing of 16S rDNA amplicons from different DNA preparations of each species.
| Culture | Final sequence length (bp) | % | Sequence matches in BlastN organism |
|---|---|---|---|
| | 1440–1696 | 98 | |
| | 1418 | 90 | |
| | 1478–1570 | 96 | |
| | 1431–1719 | 99 | |
Multiple DNA preparations from each bacterial culture were used over the course of the study, and each was tested. For each strain, the final 16S sequence lengths differed slightly between preparations, but the BLAST matches were the same.
Taxonomic assignment accuracy of metagenomic reads across three analysis methods.
Accuracy of assignment to known genus (%):

| Experiment | MG-RAST | Kraken | One Codex |
|---|---|---|---|
| Single species | | | |
| | 74.4 | 99.5 | 98.7 |
| | 84.9 | 84.6 | 84.2 |
| | 53.1 | 85.8 | 95.1 |
| | 87.9 | 98.1 | 97.6 |
| Mixtures | | | |
| Equal (5) | 65.0 | 97.6 | 87.4 |
| Equal (6) | 85.9 | 98.0 | 98.7 |
| Rare (6) | 92.9 | 99.1 | 98.7 |
15% of reads assigned to Shigella.
7–15% of reads assigned to Stenotrophomonas.
7% of reads assigned to Stenotrophomonas.
Accuracy was calculated as the proportion of reads assigned to the known input organism at the genus level out of the total number of reads given any assignment at that rank.
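The accuracy definition above can be written out as a short calculation. The following is a minimal sketch, not the authors' pipeline: the function name `genus_accuracy` and the read counts are hypothetical and chosen only to illustrate the ratio. Reads with no genus-level assignment are assumed to be absent from the input counts, so they do not enter the denominator.

```python
def genus_accuracy(genus_counts: dict[str, int], known_genus: str) -> float:
    """Percent of genus-level assigned reads that were assigned to the known genus."""
    # Denominator: every read that received any assignment at the genus rank.
    total_assigned = sum(genus_counts.values())
    if total_assigned == 0:
        raise ValueError("no reads were assigned at the genus rank")
    # Numerator: reads assigned to the genus that was actually sequenced.
    return 100.0 * genus_counts.get(known_genus, 0) / total_assigned


# Hypothetical counts for illustration only (not taken from the study).
example_counts = {"Escherichia": 850, "Shigella": 150}
example_accuracy = genus_accuracy(example_counts, "Escherichia")
print(f"{example_accuracy:.1f}%")
```

Under this definition, reads placed in a closely related genus count against accuracy even when the assignment is correct at a higher rank such as family.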
Details of MinION™ WGS output for single-species runs and synthetic mixtures. Sequencing experiments used the MinION device and new R7.3 flow cells. Libraries were prepared with kit SQK-MAP005, indicated by (5), or with SQK-MAP006 chemistry, indicated by (6). Columns relating to 2D report bi-directional reads with quality above Q9.
| Experiment (chemistry) | Pores with reads | Run time (h) | Total bp (Mbp) | Total reads | Number of 2D pass reads | Mean 2D read length (bp) | MG-RAST accession | ENA accession |
|---|---|---|---|---|---|---|---|---|
| Single species | | | | | | | | |
| | 430 | 42 | 83.6 | 26 590 | 1112 | 5274 | 4629367.3 | ERR1713483 |
| | 453 | 48 | 119.4 | 25 228 | 777 | 7784 | 4629445.3 | ERR1713487 |
| | 377 | 18 | 40.8 | 22 760 | 569 | 5676 | 4629369.3 | ERR1713486 |
| | 367 | 23 | 18.3 | 6163 | 224 | 5101 | 4629381.3 | ERR1713489 |
| Mixtures | | | | | | | | |
| Equal (5) | 129 | 24 | 26.5 | 10 592 | 714 | 5527 | 4614572.3 | ERR1713484 |
| Equal (6) | 437 | 44 | 77.1 | 12 174 | 1358 | 5202 | 4685746.3 | ERR1713485 |
| Rare (6) | 449 | 18 | 39.0 | 6728 | 899 | 6194 | 4685745.3 | ERR1713488 |
| Staggered (6) | 300 | 33 | 39.0 | 14 711 | 3497 | 2612 | 4705090.3 | ERR1713490 |
Runs were set to either 24 or 48 h and were allowed to continue until either sufficient sequence data were collected or the 2D pass rate was greatly reduced.
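As a rough illustration of how the mixture rows of this table relate to one another, the sketch below derives two quantities from the reported columns. It assumes that "2D pass rate" can be summarized over a whole run as 2D pass reads divided by total reads, and that 2D yield can be approximated as pass reads multiplied by mean 2D read length. Both are simplifying assumptions, and the helper names are hypothetical.

```python
# (total reads, 2D pass reads, mean 2D read length in bp), copied from the mixture rows.
runs = {
    "Equal (5)": (10592, 714, 5527),
    "Equal (6)": (12174, 1358, 5202),
    "Rare (6)": (6728, 899, 6194),
    "Staggered (6)": (14711, 3497, 2612),
}


def pass_rate_percent(total_reads: int, pass_reads: int) -> float:
    """Whole-run share of reads that are 2D pass reads, in percent."""
    return 100.0 * pass_reads / total_reads


def approx_2d_yield_mbp(pass_reads: int, mean_length_bp: int) -> float:
    """Approximate 2D pass yield in Mbp (read count times mean read length)."""
    return pass_reads * mean_length_bp / 1e6


summary = {
    name: (pass_rate_percent(total, passed), approx_2d_yield_mbp(passed, mean_len))
    for name, (total, passed, mean_len) in runs.items()
}

for name, (rate, yield_mbp) in summary.items():
    print(f"{name}: {rate:.1f}% 2D pass, ~{yield_mbp:.1f} Mbp of 2D pass sequence")
```

Computed this way, the 2D pass reads make up a minority of total reads in every mixture run, which is why the pass rate is a natural stopping criterion.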
Figure 1: Result of “What's in my pot” analysis of a mixture with equal DNA mass from four bacterial strains. Rendering of real-time analysis using WIMP [20] of WGSs from a synthetic mixture prepared from equal DNA quantities of four cultured microbe species (experiment ‘Equal’ in Tables 1 and 2) and run on the MinION™ sequencing platform. Arc angle is proportional to the number of reads assigned to the indicated species. Colors (scale at bottom of diagram) refer to the classification score threshold (for this analysis, the threshold for inclusion was 0.01).
Figure 2: PCA of normalized 5-mer frequency (i.e., percentage) within each MinION™ read for a mixture with equal DNA mass from four bacterial strains and a mixture with one rare component. (A) Sequencing run with equal DNA mass from four species. (B) Sequencing run with three equally represented species (33% DNA mass each) and one rare species (1% DNA mass) included in the DNA pool. None: read had no BlastN hits. Other: read had BlastN hits, but not to any of the four species included in the mix.
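The analysis behind Figure 2 can be sketched in a few lines. This is a minimal illustration that assumes overlapping 5-mers counted on the forward strand only, windows containing non-ACGT characters skipped, and PCA computed by SVD of the mean-centered matrix. The source does not state these details, and the function names and toy reads are hypothetical.

```python
from itertools import product

import numpy as np

K = 5
# Fixed column order for all 4**5 = 1024 possible 5-mers.
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}


def kmer_percentages(read: str) -> np.ndarray:
    """Return the normalized 5-mer frequency (percent) vector for one read."""
    counts = np.zeros(len(KMER_INDEX))
    read = read.upper()
    for i in range(len(read) - K + 1):
        idx = KMER_INDEX.get(read[i:i + K])  # None for windows with N, etc.
        if idx is not None:
            counts[idx] += 1
    total = counts.sum()
    return 100.0 * counts / total if total else counts


def pca_project(matrix: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project rows onto the leading principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Toy reads with two different base compositions (synthetic, not study data).
rng = np.random.default_rng(0)
bases = list("ACGT")
toy_reads = (
    ["".join(rng.choice(bases, size=2000, p=[0.35, 0.15, 0.15, 0.35])) for _ in range(3)]
    + ["".join(rng.choice(bases, size=2000, p=[0.15, 0.35, 0.35, 0.15])) for _ in range(3)]
)
profiles = np.vstack([kmer_percentages(r) for r in toy_reads])
coords = pca_project(profiles)  # one (PC1, PC2) point per read
```

Each read becomes one point in the PCA plane, so reads from genomes with distinct k-mer composition tend to separate without any reference database.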
Known composition of 20-species mock staggered community compared with analysis results for WIMP and One Codex. “nd”: not detected; “–” indicates that these species are included in the genus sum shown directly above.
| Organism | Operon count/mL | Quantity pg/mL | % DNA in template | WIMP % species | WIMP % genus | One Codex % species | One Codex % genus |
|---|---|---|---|---|---|---|---|
| | 10 000 | 8.2 | 0.24 | 0.14 | 0.14 | 0.29 | 0.29 |
| | 1000 | 1 | 0.03 | nd | nd | nd | nd |
| | 100 000 | 45 | 1.33 | 0.53 | 0.53 | 0.66 | 0.75 |
| | 1000 | 0.8 | 0.02 | 0.1 | 0.1 | 0.07 | 0.12 |
| | 100 000 | 44 | 1.30 | 0.19 | 0.19 | 0.29 | 0.35 |
| | 1000 | 1 | 0.03 | 0.05 | 0.05 | 0.07 | 0.06 |
| | 1000 | 0.7 | 0.02 | nd | nd | nd | nd |
| | 1 000 000 | 680 | 20.04 | 45.61 | 45.66 | 52.15 | 52.52 |
| | 10 000 | 8.6 | 0.25 | 1.68 | 1.68 | 3.43 | 2.72 |
| | 10 000 | 3.2 | 0.09 | 0.14 | 0.14 | 0.22 | 0.23 |
| | 10 000 | 5 | 0.15 | 0.38 | 0.38 | 0.58 | 0.52 |
| | 10 000 | 5.8 | 0.17 | 0.24 | 0.24 | 0.44 | 0.41 |
| | 10 000 | 8.8 | 0.26 | 0.48 | 0.48 | 0.07 | 0.64 |
| | 100 000 | 160 | 4.71 | 1.25 | 1.25 | 3.07 | 3.18 |
| | 1 000 000 | 1400 | 41.25 | 1.01 | 1.01 | 1.46 | 1.27 |
| | 100 000 | 59 | 1.74 | 0.38 | 3.88 | 1.31 | 12.74 |
| | 1 000 000 | 510 | 15.03 | 7.67 | 7.72 | 6.65 | – |
| | 100 000 | 32 | 0.94 | 0.96 | 1.01 | 0.95 | 16.97 |
| | 1 000 000 | 420 | 12.38 | 10.17 | 10.17 | 19.50 | – |
| | 1000 | 0.6 | 0.02 | nd | nd | nd | – |
| Other | 0 | 0 | | 29.02 | 25.37 | 8.77 | 7.24 |
| Correct assignments | | | | 70.98 | 74.63 | 91.23 | 92.76 |
Theoretical copy number provided by BEI Resources certificate of analysis.
gDNA content provided by BEI Resources certificate of analysis.
Proportion of individual species within the mock community.
Of these, 12.7% were correctly assigned to genus, 86.4% were Enterobacteriaceae, and only 0.7% were misclassifications.
Of these, 86.4% were Enterobacteriaceae and only 0.7% were misclassified.
Of these, 56.8% were Shigella.
Of these, 63.3% were species of Escherichia and Shigella.
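Two of the derived columns in the mock-community table can be checked from the values it reports. The sketch below assumes that "% DNA in template" is each organism's gDNA quantity divided by the summed quantity of all 20 organisms, and that the "Correct assignments" entry for WIMP % species is the sum of the per-organism percentages with "nd" counted as zero. Both readings are inferred from the numbers rather than stated in the source. Values are listed in table row order.

```python
# gDNA quantity per organism, in the table's quantity units, in row order.
quantity = [8.2, 1, 45, 0.8, 44, 1, 0.7, 680, 8.6, 3.2,
            5, 5.8, 8.8, 160, 1400, 59, 510, 32, 420, 0.6]

# WIMP % species per organism, in row order; "nd" (not detected) is entered as 0.
wimp_species = [0.14, 0, 0.53, 0.1, 0.19, 0.05, 0, 45.61, 1.68, 0.14,
                0.38, 0.24, 0.48, 1.25, 1.01, 0.38, 7.67, 0.96, 10.17, 0]

# Share of each organism's DNA in the template, in percent.
total_quantity = sum(quantity)
pct_dna = [100.0 * q / total_quantity for q in quantity]

# Reads assigned to any of the 20 input species, and the remainder ("Other").
correct_wimp_species = sum(wimp_species)
other_wimp_species = 100.0 - correct_wimp_species

print(f"largest template share: {max(pct_dna):.2f}%")
print(f"WIMP correct (species): {correct_wimp_species:.2f}%")
print(f"WIMP other (species):   {other_wimp_species:.2f}%")
```

Comparing `pct_dna` with the classifier columns row by row makes the distortion visible: the organism supplying the largest share of template DNA accounts for only about 1% of WIMP species assignments.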
Figure 3: Log abundance of reads assigned from staggered mixture. DNA of 20 species mixed in various proportions (BEI Resources, ATCC, HM-783D, operon counts μL⁻¹ in original mixture indicated along bottom margin of bars) was preamplified with Φ29 polymerase prior to library preparation and sequenced with MinION™ R7.3 flow cells. The 2D reads that passed quality filtering were assigned to taxa using Kraken. Colored bars are species included in the mix, whereas gray bars indicate species detected but not included in the original DNA mixture.
Figure 4: Read production using a MinION™ device and an R7.3 flow cell. Illustration of reads collected from a synthetic metagenome made with equal DNA mass from four microbial species and a library prepared using the SQK-MAP006 kit. Inflections along the graph correspond to approximate times when additional aliquots of library and fuel were added.