Kathryn Dumschott, Maximilian H-W Schmidt, Harmeet Singh Chawla, Rod Snowdon, Björn Usadel.
Abstract
DNA sequencing was dominated by Sanger's chain termination method until the mid-2000s, when it was progressively supplanted by new sequencing technologies that can generate much larger quantities of data in a shorter time. At the forefront of these developments, long-read sequencing technologies (third-generation sequencing) can produce reads that are several kilobases in length. This greatly improves the accuracy of genome assemblies by spanning the highly repetitive segments that cause difficulty for second-generation short-read technologies. Third-generation sequencing is especially appealing for plant genomes, which can be extremely large with long stretches of highly repetitive DNA. Until recently, the low basecalling accuracy of third-generation technologies meant that accurate genome assembly required expensive, high-coverage sequencing followed by computational analysis to correct for errors. However, today's long-read technologies are more accurate and less expensive, making them the method of choice for the assembly of complex genomes. Oxford Nanopore Technologies (ONT), a third-generation platform for the sequencing of native DNA strands, is particularly suitable for the generation of high-quality assemblies of highly repetitive plant genomes. Here we discuss the benefits of ONT, especially for the plant science community, and describe the issues that remain to be addressed when using ONT for plant genome sequencing.
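The abstract's core argument is that long reads resolve repeats that short reads cannot. The reason is simple: a read can place a repeat copy unambiguously only if it covers the whole repeat and anchors in unique sequence on both sides. The sketch below illustrates that reasoning under a deliberately simplified model that assumes exact repeat copies and error-free reads. The 500 bp flank requirement, the `Interval` type and all coordinates are hypothetical illustrations, not values from the review.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one chromosome (hypothetical helper)."""
    start: int
    end: int


def can_ever_span(read_length: int, repeat_length: int, min_flank: int = 500) -> bool:
    """A read shorter than the repeat plus both unique flanks can never span it."""
    return read_length >= repeat_length + 2 * min_flank


def spans_repeat(read: Interval, repeat: Interval, min_flank: int = 500) -> bool:
    """True if the read covers the whole repeat plus >= min_flank unique bases on each side."""
    return read.start <= repeat.start - min_flank and read.end >= repeat.end + min_flank


def fraction_spanning(reads: Iterable[Interval], repeat: Interval, min_flank: int = 500) -> float:
    """Fraction of reads that span the repeat under this simplified model."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(spans_repeat(r, repeat, min_flank) for r in reads) / len(reads)


repeat = Interval(100_000, 110_000)      # hypothetical 10 kbp repeat
short_read = Interval(104_000, 104_150)  # 150 bp short read lying inside the repeat
long_read = Interval(95_000, 125_000)    # 30 kbp long read covering repeat and flanks
print(can_ever_span(150, 10_000))
print(fraction_spanning([short_read, long_read], repeat))
```

The `can_ever_span` check shows the key point independently of read position. No 150 bp read can ever span a 10 kbp repeat, however deep the short-read coverage is.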
Keywords: de novo assembly; Basecalling; MinION flow cell; Oxford Nanopore; gene annotation; third-generation sequencing
Year: 2020 PMID: 32459850 PMCID: PMC7501810 DOI: 10.1093/jxb/eraa263
Source DB: PubMed Journal: J Exp Bot ISSN: 0022-0957 Impact factor: 6.992
Plant species sequenced using the ONT platform
| Plant species | Genome size/N50 | Sequencing technology | Assembler | Reference |
|---|---|---|---|---|
|  | 119.5 Mbp/N50 12.3 Mbp (contig) | Illumina, ONT | Canu, Miniasm, Pilon |  |
|  | 116.9 Mbp/N50 155.5 kbp (contig), 17.3 Mbp (scaffold) (Bonn strain); 122.9 Mbp/N50 1.8 Mbp (contig) (Oxford strain) | ONT, Hi-C, Illumina (Bonn strain); ONT, Illumina (Oxford strain) | MaSuRCA, Pilon, HiRise (Bonn strain); Miniasm, Racon, Pilon (Oxford strain) | F.W. |
|  | 132.8 Mbp/N50 1.7 Mbp (contig) | ONT, Illumina | Miniasm, Racon, Pilon |  |
|  | 138.49 Mbp/N50 3.34 Mbp (contig), 7.68 Mbp (scaffold) | ONT, Hi-C | Miniasm; Proximo (for Hi-C data) |  |
|  | 139.7 Mbp/N50 2.9 Mbp (contig) | Illumina, ONT | Miniasm, Racon, Pilon |  |
|  | 317 Mbp/N50 357 kbp (scaffold), 277 kbp (contig) | Illumina, Illumina mate-pair, ONT | MaSuRCA, SSPACE, GapCloser |  |
|  | 367 Mbp/N50 1.6 Mbp (scaffold) | ONT, 10× Genomics | Supernova, Canu |  |
|  | 370 Mbp/N50 36.65 Mbp (scaffold) | Illumina, ONT, Hi-C | MaSuRCA, HiRise |  |
|  | 377 Mbp/N50 1.72 Mbp (scaffold), 1.63 Mbp (contig) | ONT, Illumina | MaSuRCA, Flye |  |
|  | 386.5 Mbp/N50 6.32 Mbp (contig) (Basmati 334); 383.6 Mbp/N50 10.53 Mbp (contig) (Dom Sufid) | ONT, Illumina | Canu, Flye, Medaka, Pilon |  |
|  | 451 Mbp/N50 9.88 Mbp (scaffold), 7.11 Mbp (contig) | ONT, PacBio, Illumina, Bionano optical mapping | Canu, Falcon (for PacBio data only), Pilon, Bionano Solve |  |
|  | 485 Mbp/N50 3.2 Mbp (contig) | ONT, Illumina | Canu, Racon, Pilon |  |
|  | 536.5 Mbp/N50 16.43 Mbp (scaffold), 4.34 Mbp (contig) | ONT, Illumina, Bionano, Hi-C | Canu, wtdbg, Pilon |  |
|  | 547 Mbp/N50 31.49 Mbp (scaffold), 1.36 Mbp (contig) | ONT, Illumina, Hi-C | MaSuRCA, HiRise |  |
|  | 594.87 Mbp/N50 3.23 Mbp | ONT, Illumina | MaSuRCA |  |
|  | 630 Mbp/N50 29.5 Mbp (scaffold), 7.3 Mbp (contig) | Illumina, ONT, Bionano | Ra, (SMARTdenovo, wtdbg), Racon, Pilon, Bionano Solve and Access |  |
|  | 529 Mbp/N50 15.4 Mbp (scaffold), 3.8 Mbp (contig) |  |  |  |
|  | 587 Mbp/N50 36.8 Mbp (scaffold), 4.0 Mbp (contig) |  |  |  |
|  | 665 Mbp/N50 1.86 Mbp (scaffold), 15.13 kbp (contig) | Illumina, ONT, Illumina mate-pair | PLATANUS, SSPACE, GapCloser |  |
|  | 710.15 Mbp/N50 2.19 Mbp (scaffold) | ONT, Illumina, 10× Genomics, Hi-C | Canu, Pilon; LACHESIS (for Hi-C data) | S.F. |
|  | 725.2 Mbp/N50 4.75 Mbp (contig) | ONT, Illumina, Hi-C | Canu, Pilon; LACHESIS (for Hi-C data) |  |
|  | 733.3 Mbp/N50 1.56 Mbp (contig) |  |  |  |
|  | 732 Mbp/N50 33.28 Mbp (scaffold), 3.05 Mbp (contig) | Illumina, ONT, Bionano | Canu, SMARTdenovo, Pilon, Nanopolish, Bionano |  |
|  | 748 Mbp (1.39 Gbp F1 hybrid)/N50 742 kbp (contig) (172 kbp for F1 hybrid) | Illumina, PacBio, ONT | Miniasm, Racon, Pilon |  |
|  | 760.1 Mbp/N50 39.7 (scaffold) | ONT, Illumina, Hi-C | Canu, SMARTdenovo, Racon, Pilon; BWA and LACHESIS (for Hi-C data) |  |
|  | 843.2 Mbp/N50 84.4 Mbp (scaffold) | ONT, Illumina, Hi-C | Canu, SMARTdenovo, Pilon; LACHESIS, SLR, SALSA (for Hi-C data) |  |
|  | 1.0 Gbp/N50 2.45 Mbp (contig) | Illumina, ONT | Canu, SMARTdenovo, Pilon |  |
|  | 2.53 Gbp/N50 130.7 kbp (contig) | Illumina, ONT | Canu, SMARTdenovo, Pilon |  |
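Every assembly in the table above is characterized by its contig or scaffold N50. This is the length L such that contigs (or scaffolds) of length ≥ L together contain at least half of the total assembly length. The minimal sketch below computes it, assuming a plain list of sequence lengths as input. The function name and the toy lengths are illustrative only.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """N50: the largest length L such that sequences >= L hold at least half the total length."""
    sorted_lengths = sorted((int(x) for x in lengths), reverse=True)
    if not sorted_lengths:
        raise ValueError("need at least one contig/scaffold length")
    half = sum(sorted_lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating length until half is reached.
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    return sorted_lengths[-1]  # not reached for non-empty input


# Toy contig lengths (arbitrary units): total 290, half 145; 80 + 70 = 150 >= 145
print(n50([80, 70, 50, 40, 30, 20]))
```

This version uses the total assembly length as the reference. The related NG50 uses an estimated genome size instead, which makes assemblies of different total sizes easier to compare. The table does not state which variant each study reported.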
Fig. 1. ONT offers a variety of important advantages to the wider plant genomics community.
Fig. 2. From plant tissue to genome assembly: the main steps in ONT sequencing. Optimizing each step can significantly increase the sequencing output and assembly quality.
Current challenges and solutions when using ONT to sequence plant genomes
| Challenge | Potential solutions |
|---|---|
| Low DNA quality and quantity | Test multiple extraction protocols and optimize for each plant species. |
| Short-read contamination | Remove short and medium-sized fragments using the BluePippin Prep or Circulomics Short Read Eliminator kits; the latter is easier to use. |
| Basecalling speed and computational requirements | PromethION includes the hardware needed for fast basecalling. MinION basecalling time can be significantly reduced by using GPUs. |
| Long assembly computation time | Newer assemblers (e.g. wtdbg2) can significantly reduce computation time. |
| Remaining uncorrectable base errors | Additional Illumina sequencing and polishing is currently required. |
| Assembly is not (near) chromosome scale | Additional techniques such as optical mapping or Hi-C can be used to order and place contigs and obtain (near) chromosome-scale assemblies, at least for small and medium-sized plant genomes. |
| Genome structural and functional annotation | For structural annotation, long-read technology can be used with programs such as StringTie2 |
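The kits listed for short-read contamination remove short fragments physically, before sequencing. A complementary in silico step can be applied after basecalling. It discards reads below a length threshold and, optionally, below a mean quality. Dedicated tools such as NanoFilt and Filtlong offer this kind of filtering.

The minimal sketch below shows the idea and assumes Phred+33 FASTQ quality strings. The `Read` type, the function names and the default thresholds are hypothetical, not recommendations from the review. Mean quality is computed by averaging per-base error probabilities and converting back to a Phred score, a common convention for nanopore reads.

```python
import math
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str  # FASTQ quality string, assumed Phred+33 encoded


def mean_phred(qual: str) -> float:
    """Mean read quality: average per-base error probabilities, then convert back to Phred."""
    if not qual:
        return 0.0
    errors = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(errors) / len(errors))


def filter_reads(reads: Iterable[Read], min_length: int = 10_000,
                 min_mean_q: float = 7.0) -> Iterator[Read]:
    """Yield reads passing both a minimum length and a minimum mean quality (hypothetical defaults)."""
    for read in reads:
        if len(read.seq) >= min_length and mean_phred(read.qual) >= min_mean_q:
            yield read


# One Q10 base ('+') and one Q40 base ('I'): the mean is pulled towards the worse base.
print(round(mean_phred("+I"), 2))
```

Averaging error probabilities rather than raw Phred values matters here. The naive arithmetic mean of Q10 and Q40 would be Q25, which overstates the read's accuracy. The probability-based mean of about Q13 reflects that most errors come from the low-quality base.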
Fig. 3. Difference in read lengths between an untreated sample and a sample treated with the Circulomics Short Read Eliminator kit. DNA was extracted from rapeseed (Brassica napus) and sequenced on an ONT MinION (image created using NanoComp by De Coster).
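Fig. 3 uses NanoComp to compare read-length distributions between two samples. The effect of short-fragment removal can also be summarized with a few per-sample statistics. The minimal sketch below computes such statistics in plain Python; it is not NanoComp's implementation or output. The 10 kbp "long read" cut-off, the chosen statistics and the toy read lengths are all invented for illustration.

```python
import statistics
from typing import Dict, Sequence


def length_summary(lengths: Sequence[int], long_cutoff: int = 10_000) -> Dict[str, float]:
    """Summarize one run's read-length distribution (hypothetical statistics)."""
    if not lengths:
        raise ValueError("no reads in sample")
    total = sum(lengths)
    long_bases = sum(x for x in lengths if x >= long_cutoff)
    return {
        "reads": len(lengths),
        "yield_bp": total,
        "median_bp": statistics.median(lengths),
        "frac_bases_in_long_reads": long_bases / total if total else 0.0,
    }


def compare_samples(samples: Dict[str, Sequence[int]],
                    long_cutoff: int = 10_000) -> Dict[str, Dict[str, float]]:
    """Apply the same summary to every sample so they can be compared side by side."""
    return {name: length_summary(lengths, long_cutoff) for name, lengths in samples.items()}


# Toy read lengths (bp), invented for illustration only.
toy = {
    "untreated": [500, 1_000, 2_000, 15_000, 30_000],
    "treated": [12_000, 15_000, 25_000, 30_000],
}
for name, stats in compare_samples(toy).items():
    print(name, stats["median_bp"], round(stats["frac_bases_in_long_reads"], 3))
```

Reporting the fraction of bases in long reads alongside the read count shows what short-fragment removal typically changes. The number of reads may drop while the share of sequence contained in long reads rises.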