Roxanne R Zascavage, Kelcie Thorson, John V Planz.
Abstract
Mitochondrial DNA sequence data are widely used in disease studies, conservation genetics and forensic identification. Current approaches to sequencing the full mtGenome typically require several rounds of PCR enrichment in Sanger or massively parallel sequencing (MPS) protocols, followed by fairly tedious assembly and analysis. Here we describe an efficient approach to sequencing directly from genomic DNA samples without prior enrichment or extensive library preparation. On the Oxford Nanopore MinION™ device, libraries sequenced directly from native DNA are compared with libraries generated from the same samples using nine overlapping mtDNA amplicons. The native and amplicon library preparation methods, together with alternative basecalling strategies, were assessed to establish error rates and to identify trends of discordance between the two library preparation approaches. For the complete mtGenome (16 569 nucleotides), an overall error rate of approximately 1.00% was observed. As expected with mtDNA, most errors occurred in homopolymeric regions. A modified basecaller that corrects for ambiguous signal in homopolymeric stretches reduced the error rate for both library preparation methods to approximately 0.30%. Our study indicates that direct mtDNA sequencing from native DNA on the MinION™ device gives results comparable to those of common mtDNA sequencing methods and is a reliable alternative to approaches using PCR-enriched libraries.
Keywords: MinION; Nanopore; Sequencing; mtDNA
Year: 2018 PMID: 30511783 PMCID: PMC6590251 DOI: 10.1002/elps.201800083
Source DB: PubMed Journal: Electrophoresis ISSN: 0173-0835 Impact factor: 3.535
Depth of coverage of the mitochondrial genome for whole-genome (native) DNA libraries and PCR-enriched DNA libraries
| Sample | Native: input DNA (ng) | Native: reads* | Native: mapped reads | Native: average coverage (×) | Native: coverage range (×) | PCR-enriched: input DNA (ng) | PCR-enriched: reads* | PCR-enriched: mapped reads | PCR-enriched: average coverage (×) | PCR-enriched: coverage range (×) |
|---|---|---|---|---|---|---|---|---|---|---|
| HL60 | 212 | 307 277 | 888 | 118 | 71–139 | 0.4 | 314 567 | 227 279 | 20 417 | 3213–47 838 |
| 101 | 144.8 | 297 624 | 429 | 92 | 47–113 | 0.4 | 6135 | 6105 | 686 | 52–2042 |
| 102 | 229.5 | 96 710 | 281 | 39 | 23–49 | 0.4 | 2557 | 2550 | 295 | 54–784 |
| 103 | 207 | 160 337 | 390 | 69 | 42–82 | 0.4 | 4357 | 4344 | 507 | 123–1274 |
| 433 | 176.3 | 222 357 | 490 | 33 | 15–44 | 0.4 | 3431 | 3423 | 399 | 74–1004 |
| 441 | 259.5 | 258 867 | 230 | 37 | 20–52 | 0.4 | 6178 | 6114 | 743 | 180–1555 |
| 442 | 237 | 235 930 | 280 | 70 | 42–82 | 0.4 | 6835 | 6767 | 813 | 201–1789 |
| 449 | 204 | 179 255 | 184 | 15 | 8–20 | 0.4 | 2874 | 2852 | 334 | 80–812 |
| 459 | 234 | 175 492 | 230 | 24 | 12–33 | 0.4 | 4393 | 4365 | 487 | 75–1414 |
*Reads: the number of reads for a given sample that passed the basecaller's quality filter, not the total number of reads generated by the sequencing run. For the barcoded PCR-enriched samples, this is the number of reads assigned to the sample's barcode.
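As a minimal sketch of how the average coverage and coverage range columns can be derived, assume the mapped reads have already been aligned to the 16 569-nt mtGenome and reduced to 0-based, half-open (start, end) intervals. Per-position depth is then a difference array followed by a prefix sum. The function names `depth_profile` and `coverage_summary` are hypothetical, and reads that wrap around the origin of the circular genome are not handled in this simplified version.

```python
MT_GENOME_LENGTH = 16_569  # complete human mtGenome length, in nt


def depth_profile(intervals, genome_length=MT_GENOME_LENGTH):
    """Per-position read depth from 0-based, half-open (start, end) intervals.

    Simplifying assumption (hypothetical helper): reads spanning the origin of
    the circular genome are not split, so each interval must satisfy
    0 <= start < end <= genome_length.
    """
    diff = [0] * (genome_length + 1)  # difference array: +1 at start, -1 at end
    for start, end in intervals:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"interval ({start}, {end}) outside the reference")
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]  # running prefix sum = depth at this position
        depth.append(running)
    return depth


def coverage_summary(intervals, genome_length=MT_GENOME_LENGTH):
    """Return (average coverage, (minimum depth, maximum depth))."""
    depth = depth_profile(intervals, genome_length)
    return sum(depth) / len(depth), (min(depth), max(depth))


# Toy example: two overlapping reads on a 10-nt "genome"
print(coverage_summary([(0, 5), (3, 10)], genome_length=10))  # (1.2, (1, 2))
```

The difference array keeps the cost linear in genome length plus read count, which matters when a sample has hundreds of thousands of mapped reads, as the HL60 enriched library does.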
Number of nucleotide differences observed and percent concordance between mitochondrial DNA sequences generated from native libraries and PCR‐amplified libraries
| Sample | Number of differences (out of 16569 nt) | Percent concordance |
|---|---|---|
| HL60 | 77 | 99.54% |
| 101 | 59 | 99.64% |
| 102 | 105 | 99.36% |
| 103 | 65 | 99.61% |
| 433 | 89 | 99.46% |
| 441 | 147 | 99.11% |
| 442 | 118 | 99.28% |
| 449 | 165 | 99.00% |
| 459 | 99 | 99.40% |
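The percent concordance values appear to follow directly from the number of differences over the full 16 569-nt mtGenome, that is, 100 × (16 569 − differences) / 16 569. A short Python check of that arithmetic, with an illustrative function name that is not from the paper:

```python
MT_GENOME_LENGTH = 16_569  # nt, complete mtGenome as stated in the abstract


def percent_concordance(differences, length=MT_GENOME_LENGTH):
    """Percentage of positions at which two consensus sequences agree."""
    if not 0 <= differences <= length:
        raise ValueError("differences must lie between 0 and the sequence length")
    return 100.0 * (length - differences) / length


# Native vs PCR-amplified differences for a few samples from the table above
differences = {"HL60": 77, "101": 59, "103": 65, "449": 165}
for sample, n in differences.items():
    print(f"{sample}: {percent_concordance(n):.2f}%")
# HL60: 99.54%
# 101: 99.64%
# 103: 99.61%
# 449: 99.00%
```

Recomputed this way, every sample in the table agrees with the tabulated percentage to within 0.01 percentage points.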
Figure 1. Classification of the differences observed between mitochondrial DNA sequences generated from native libraries and PCR‐amplified libraries. Color coding: green, errors in a homopolymeric stretch (a single nucleotide repeated three or more times, e.g. CCCCC); blue, a disagreement in the base called from one strand to the other (e.g. A/T); red, errors in a dinucleotide repeat region (two nucleotides repeated at least twice, e.g. ATAT); yellow, a single nucleotide outside homopolymer regions that is present in one consensus sequence but not the other; purple, a single nucleotide repeat error (a nucleotide repeated twice in one sequence but only once in the other, e.g. GN/GG); orange, other errors, including a misalignment before or after a homopolymeric stretch, or a single varying nucleotide within a homopolymeric stretch that was misaligned (e.g. AAATTT/AAAATT; AAAATAAAA/AAATAAAAA).
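The sequence-context categories in this legend rest on simple patterns. As a hedged illustration, not the authors' classification procedure, the sketch below labels a position in a consensus sequence as lying in a homopolymer (one base repeated three or more times) or a dinucleotide repeat (a two-base unit of distinct bases repeated at least twice), using the thresholds stated in the legend. The helper names are hypothetical, and the categories that need alignment or strand information (blue, yellow, purple, orange) are out of scope here.

```python
def homopolymer_run(seq, pos):
    """Length of the single-base run containing seq[pos]."""
    base = seq[pos]
    left = pos
    while left > 0 and seq[left - 1] == base:
        left -= 1
    right = pos
    while right + 1 < len(seq) and seq[right + 1] == base:
        right += 1
    return right - left + 1


def in_dinucleotide_repeat(seq, pos, min_copies=2):
    """True if seq[pos] lies inside >= min_copies tandem copies of a 2-mer
    made of two different bases (e.g. ATAT)."""
    span = 2 * min_copies
    for start in range(max(0, pos - span + 1), pos + 1):
        window = seq[start:start + span]
        if len(window) < span:  # later windows would be truncated too
            break
        unit = window[:2]
        if unit[0] != unit[1] and window == unit * min_copies:
            return True
    return False


def sequence_context(seq, pos):
    """Hypothetical context label using the Figure 1 thresholds."""
    if homopolymer_run(seq, pos) >= 3:
        return "homopolymer"
    if in_dinucleotide_repeat(seq, pos):
        return "dinucleotide repeat"
    return "other"


consensus = "GACCCCCTATATG"
print(sequence_context(consensus, 4))  # homopolymer (inside CCCCC)
print(sequence_context(consensus, 9))  # dinucleotide repeat (inside TATA)
print(sequence_context(consensus, 1))  # other
```

Homopolymers are checked first, so a run such as AAAA is never counted as a dinucleotide repeat; the requirement that the two bases differ enforces the same separation.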
Evaluation of the efficiency of the homopolymer correction algorithm
| Metric | HL60 native, original | HL60 native, corrected | HL60 enriched, original | HL60 enriched, corrected |
|---|---|---|---|---|
| Number of differences (out of 16 569 nt) | 165 | 63 | 151 | 50 |
| Percent concordance | 99.00% | 99.62% | 99.09% | 99.70% |
| Error rate | 1.00% | 0.38% | 0.91% | 0.30% |
The control sample HL60 was re-basecalled using Oxford Nanopore Technologies' (ONT) new homopolymer basecalling algorithm, and the resulting consensus sequences were compared with the reference. In total, 102 incorrectly called nucleotides (nt) were corrected in the native sequencing data and 101 in the enriched sequencing data. This removed about 62% (102/165) and 67% (101/151) of the discordant positions, respectively, lowering the error rate from 1.00% to 0.38% for the native library and from 0.91% to 0.30% for the enriched library.
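The correction figures reduce to simple arithmetic on the difference counts in the table: error rate = differences / 16 569, and the share of errors removed = (before − after) / before. A minimal Python sketch of that calculation, with illustrative function names that are not part of ONT's software:

```python
MT_GENOME_LENGTH = 16_569  # nt, complete mtGenome


def error_rate(differences, length=MT_GENOME_LENGTH):
    """Discordant positions as a percentage of the genome length."""
    return 100.0 * differences / length


def fraction_corrected(before, after):
    """Share of the original discordant positions removed by re-basecalling."""
    if before <= 0:
        raise ValueError("need at least one original difference")
    return (before - after) / before


# HL60 difference counts before and after re-basecalling (table above)
for label, before, after in [("native", 165, 63), ("enriched", 151, 50)]:
    print(f"{label}: {error_rate(before):.2f}% -> {error_rate(after):.2f}%, "
          f"{before - after} nt corrected "
          f"({100 * fraction_corrected(before, after):.0f}% of errors)")
# native: 1.00% -> 0.38%, 102 nt corrected (62% of errors)
# enriched: 0.91% -> 0.30%, 101 nt corrected (67% of errors)
```

Reporting the share of errors removed, rather than a change in accuracy, avoids conflating a 0.6-percentage-point gain in concordance with a relative reduction in the number of errors.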