Yamkela Mgwatyu, Stephanie Cornelissen, Peter van Heusden, Allison Stander, Mary Ranketse, Uljana Hesse.
Abstract
While plant genome analysis is gaining speed worldwide, few plant genomes have been sequenced and analyzed on the African continent. Yet, this information holds the potential to transform diverse industries as it unlocks medicinally and industrially relevant biosynthesis pathways for bioprospecting. Considering that South Africa is home to the highly diverse Cape Floristic Region, local establishment of methods for plant genome analysis is essential. Long-read sequencing is becoming standard procedure for plant genome research, as these reads can span repetitive regions of the DNA, substantially facilitating reassembly of a contiguous genome. With the MinION, Oxford Nanopore offers a cost-efficient sequencing method to generate long reads; however, DNA purification protocols must be adapted for each plant species to generate ultra-pure DNA, essential for these analyses. Here, we describe a cost-effective procedure for the extraction and purification of plant DNA and evaluate diverse genome assembly approaches for the reconstruction of the genome of rooibos (Aspalathus linearis), an endemic South African medicinal plant widely used for tea production. We discuss the pros and cons of nine tested assembly programs, specifically Redbean and NextDenovo, which generated the most contiguous assemblies, and Flye, which produced an assembly closest to the predicted genome size.
Keywords: Canu; Flye; Haslr; MaSuRCA; Medaka; NextDenovo; Nextpolish; Oxford Nanopore; Racon; Raven; Redbean; Wengan; plant genome assembly; rooibos
Year: 2022 PMID: 36015459 PMCID: PMC9416007 DOI: 10.3390/plants11162156
Source DB: PubMed Journal: Plants (Basel) ISSN: 2223-7747
Yields and quality metrics for rooibos DNA samples from two rooibos plants generated using various DNA extraction and purification procedures.
| DNA Extraction Method | Plant | Purification Method + | 260/280 | 260/230 | Qubit (ng/µL) | Nanodrop (ng/µL) | Q:ND * | Starting Material | Total Yield (µg) | Sequencing Run |
|---|---|---|---|---|---|---|---|---|---|---|
| SDS | 1 | C:I extraction | 2.12 | 1.93 | 20.8 | 67.5 | 0.3 | 1 g leaves | 2.0 | n/a |
| CTAB | 1 | None | 2.11 | 2.01 | 358 | 1944 | 0.2 | 1 g leaves | 35.8 | 1 |
| CTAB | 1 | Zymoclean | 2.94 | 0.04 | 6.3 | 13.0 | 0.5 | 3.5 µg DNA | 0.06 | n/a |
| CTAB | 1 | Genomic-tip 1 | 2.07 | 1.91 | 102 | 1816 | 0.1 | 100 µg DNA | 2.0 | 3 |
| CTAB | 1 | Genomic-tip 2 | 2.06 | 2.22 | 347 | 1143 | 0.3 | 100 µg DNA | 6.9 | 4 |
| CTAB | 1 | DNeasy | 1.92 | 2.38 | 323 | 337 | 1.0 | 10 µg DNA | 6.4 | 2 |
| CTAB | 2 | DNeasy | 1.98 | 2.35 | 222 | 260 | 0.9 | 10 µg DNA | 4.4 | 6 |
| CTAB | 1 | DNeasy | 1.92 | 2.33 | 325 | 381 | 0.9 | 10 µg DNA | 6.5 | 7 |
| CTAB | 1 | DNeasy | 1.97 | 2.35 | 324 | 336 | 1.0 | 10 µg DNA | 6.4 | 8 |
+ Purification procedures include chloroform:isoamyl alcohol (C:I) extraction, the Zymoclean™ Large Fragment DNA Recovery Kit, the QIAGEN® DNeasy PowerClean CleanUp Kit, and the QIAGEN® Genomic-tip 500/G (Genomic-tip 1 and Genomic-tip 2). * Qubit:Nanodrop ratio.
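
The Q:ND column is simply the Qubit concentration divided by the Nanodrop concentration. As a minimal sketch, the arithmetic can be reproduced for three rows of the table; the sample labels and the function name are illustrative and not taken from the paper. A ratio close to 1 means the fluorometric and spectrophotometric readings agree, whereas a low ratio is commonly read as a sign that something other than double-stranded DNA contributes to the absorbance reading.

```python
# Qubit and Nanodrop concentrations (ng/µL) copied from the table above.
# The dictionary keys are illustrative labels, not names used in the paper.
samples = {
    "SDS, C:I extraction": (20.8, 67.5),
    "CTAB, unpurified": (358.0, 1944.0),
    "CTAB, DNeasy (sequencing run 2)": (323.0, 337.0),
}


def qubit_nanodrop_ratio(qubit_ng_ul: float, nanodrop_ng_ul: float) -> float:
    """Return the Qubit:Nanodrop concentration ratio, rounded to one decimal."""
    if nanodrop_ng_ul <= 0:
        raise ValueError("Nanodrop concentration must be positive")
    return round(qubit_ng_ul / nanodrop_ng_ul, 1)


ratios = {name: qubit_nanodrop_ratio(q, n) for name, (q, n) in samples.items()}

for name, ratio in ratios.items():
    print(f"{name}: Q:ND = {ratio}")
```

The rounded values match the Q:ND entries listed for these three rows.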
Figure 1. Agarose gel electrophoresis of 120 ng DNA extracted from rooibos leaves using either SDS or CTAB extraction protocols. Lane M: HindIII molecular weight marker; Lane 1: SDS, unpurified; Lane 2: CTAB, unpurified; Lane 3: CTAB, Zymoclean™ Large Fragment DNA Recovery Kit; Lane 4: CTAB, QIAGEN® DNeasy PowerClean CleanUp Kit; Lane 5: CTAB, Genomic-tip 1; Lane 6: CTAB, Genomic-tip 2.
Sequencing statistics for seven MinION runs with genomic DNA samples from rooibos (Quality ≥ 7).
| | Run1 | Run2 | Run3 | Run4 | Run5 | Run6 | Run7 |
|---|---|---|---|---|---|---|---|
| total gigabases (Gbp) | 1.01 | 4.56 | 6.62 | 19.08 | 11.24 | 4.26 | 2.24 |
| # reads (×1 M) | 0.30 | 1.21 | 1.56 | 4.98 | 3.91 | 4.25 | 2.30 |
| N50 length (bp) | 4372 | 6002 | 6641 | 6483 | 5642 | 1666 | 1480 |
| median length (bp) | 2656 | 2519 | 3027 | 2496 | 1607 | 569 | 587 |
| max length (bp) | 60,435 | 115,278 | 75,142 | 79,588 | 180,920 | 116,028 | 110,672 |
| median Quality | 10 | 11.1 | 11.4 | 11.3 | 11.2 | 11.2 | 11 |
| Number of reads by length | | | | | | | |
| >10 kb | 11,270 | 86,876 | 136,418 | 401,054 | 167,363 | 10,551 | 4616 |
| >20 kb | 598 | 13,964 | 17,790 | 54,087 | 6700 | 460 | 215 |
| >50 kb | 2 | 91 | 61 | 123 | 2 | 3 | 4 |
| >100 kb | 0 | 1 | 0 | 0 | 1 | 1 | 2 |
| >200 kb | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Longest reads: length in bp (quality) | | | | | | | |
| 1 | 60,435 (9.0) | 115,278 (10.0) | 75,142 (10.9) | 79,588 (8.5) | 180,920 (19.6) | 116,028 (7.9) | 110,672 (14.0) |
| 2 | 57,561 (8.8) | 84,582 (9.8) | 71,014 (9.0) | 75,403 (11.6) | 135,178 (7.1) | 64,587 (14.9) | 102,055 (14.7) |
| 3 | 45,472 (10.8) | 74,614 (12.7) | 69,281 (10.6) | 74,146 (13.1) | 91,826 (7.9) | 61,155 (7.7) | 99,741 (6.7) |
| 4 | 42,770 (9.6) | 71,567 (10.0) | 65,014 (11.0) | 70,447 (9.3) | 47,153 (10.2) | 44,661 (10.3) | 95,441 (14.8) |
| 5 | 42,761 (9.6) | 71,536 (11.0) | 64,036 (10.0) | 68,961 (10.1) | 46,147 (12.4) | 39,799 (9.9) | 90,125 (14.6) |
Run1: plant 1 unpurified DNA; run2: plant 1 Genomic-tip 1; run3: plant 1 Genomic-tip 2; run4: plant 1 DNeasy; run5: plant 2 DNeasy; runs 6 and 7: oxidized plant 1 DNeasy.
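
The N50 read length reported above is the length L such that reads of length L or longer contain at least half of all sequenced bases. The following is a minimal sketch of that calculation; the function name and the toy read lengths are hypothetical and are not data from the sequencing runs. It also shows why the N50 sits well above the median read length when a run contains many short reads and a few long ones.

```python
from statistics import median


def read_n50(lengths):
    """Return the N50 of a collection of read lengths.

    Reads are visited from longest to shortest; the N50 is the length of
    the read at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy data: six short reads and two long ones (32 bases in total).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]

print("N50:", read_n50(toy_lengths))
print("median:", median(toy_lengths))
```

The same definition applies to the contig and scaffold N50 values in the assembly tables, with contig lengths in place of read lengths.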
Relevant assembly and BUSCO statistics for the assemblies of the rooibos genome generated using Illumina and MinION sequencing data with short-read and hybrid assembly programs.
| Assembly Parameters | Platanus (Illumina) | MaSuRCA (Illumina) | MaSuRCA (Illumina + MinION) | Haslr (Illumina + MinION) | Wengan (Illumina + MinION) |
|---|---|---|---|---|---|
| Number of scaffolds | 78,315 | 70,462 | 29,263 | 19,723 | 12,972 |
| Largest scaffold (bp) | 154,059 | 257,349 | 1,225,954 | 77,419 | 222,008 |
| Total length (Mbp) | 693.7 | 857.9 | 1482.6 | 173.6 | 218.6 |
| N50 length (bp) | 10,871 | 17,142 | 80,888 | 10,950 | 22,199 |
| Complete BUSCOs (%) | 56.1 | 45.5 | 84.3 | 67.0 | 49.4 |
| Complete single BUSCOs (%) | 51.8 | 36.9 | 49.0 | 62.7 | 48.6 |
| Complete duplicated BUSCOs (%) | 4.3 | 5.9 | 35.3 | 4.3 | 0.8 |
| Fragmented BUSCOs (%) | 34.5 | 18.8 | 2.0 | 12.9 | 8.6 |
| Missing BUSCOs (%) | 9.4 | 35.7 | 13.7 | 20.1 | 42.0 |
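
The BUSCO categories are related by simple arithmetic: complete BUSCOs are the sum of single-copy and duplicated BUSCOs, and complete, fragmented and missing BUSCOs together account for 100%. The sketch below checks both relations for the values in the table above; the tolerance, the function name and the dictionary labels are assumptions made for illustration.

```python
# BUSCO percentages copied from the table above, in the order
# (complete, single, duplicated, fragmented, missing).
busco = {
    "Platanus (Illumina)": (56.1, 51.8, 4.3, 34.5, 9.4),
    "MaSuRCA (Illumina)": (45.5, 36.9, 5.9, 18.8, 35.7),
    "MaSuRCA (Illumina + MinION)": (84.3, 49.0, 35.3, 2.0, 13.7),
    "Haslr (Illumina + MinION)": (67.0, 62.7, 4.3, 12.9, 20.1),
    "Wengan (Illumina + MinION)": (49.4, 48.6, 0.8, 8.6, 42.0),
}

# Assumed tolerance: each value is rounded to one decimal place.
TOLERANCE = 0.15


def check_busco(complete, single, duplicated, fragmented, missing):
    """Return (parts_ok, total_ok) for one assembly's BUSCO percentages."""
    parts_ok = abs(single + duplicated - complete) <= TOLERANCE
    total_ok = abs(complete + fragmented + missing - 100.0) <= TOLERANCE
    return parts_ok, total_ok


results = {name: check_busco(*values) for name, values in busco.items()}

for name, (parts_ok, total_ok) in results.items():
    print(f"{name}: single + duplicated = complete? {parts_ok}; sums to 100? {total_ok}")
```

With the values as listed here, every column sums to 100%, and all columns except the Illumina-only MaSuRCA assembly satisfy the first relation; in that column, single plus duplicated gives 42.8 against a listed 45.5.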
Relevant assembly and BUSCO statistics for the unpolished and polished assemblies of the rooibos genome generated using the five long-read assembly programs: Flye, Canu, Raven, Redbean and NextDenovo.
| Assembler | Assembly Parameters | Unpolished | Racon_Long | Medaka | Racon_Short | Nextpolish |
|---|---|---|---|---|---|---|
| Flye | Number of contigs | 33,346 | 32,286 | 32,393 | 31,921 | 32,392 |
| | Largest contig (bp) | 921,299 | 925,383 | 928,488 | 926,118 | 925,075 |
| | Total length (Mbp) | 1118.5 | 1107.2 | 1112.5 | 1107.4 | 1110.7 |
| | N50 length (bp) | 77,709 | 78,172 | 78,358 | 78,512 | 78,176 |
| | Complete BUSCOs (%) | 97.3 | 96.5 | 98.0 | 99.2 | 99.2 |
| | Complete single BUSCOs (%) | 65.9 | 71.8 | 70.2 | 65.5 | 64.3 |
| | Complete duplicated BUSCOs (%) | 31.4 | 24.7 | 27.8 | 33.7 | 34.9 |
| | Fragmented BUSCOs (%) | 2.4 | 2.7 | 1.6 | 0.8 | 0.8 |
| | Missing BUSCOs (%) | 0.3 | 0.8 | 0.4 | 0.0 | 0.0 |
| Canu | Number of contigs | 33,477 | 33,351 | 33,357 | 33,320 | 33,356 |
| | Largest contig (bp) | 434,840 | 443,413 | 444,838 | 443,863 | 443,487 |
| | Total length (Mbp) | 949.0 | 965.0 | 970.8 | 968.1 | 969.1 |
| | N50 length (bp) | 40,175 | 41,134 | 41,324 | 41,225 | 41,247 |
| | Complete BUSCOs (%) | 96.1 | 98.8 | 99.2 | 99.6 | 99.6 |
| | Complete single BUSCOs (%) | 71.4 | 72.5 | 72.5 | 57.6 | 56.9 |
| | Complete duplicated BUSCOs (%) | 24.7 | 26.3 | 26.7 | 42.0 | 42.7 |
| | Fragmented BUSCOs (%) | 2.4 | 0.0 | 0.4 | 0.0 | 0.0 |
| | Missing BUSCOs (%) | 1.5 | 1.2 | 0.4 | 0.4 | 0.4 |
| Raven | Number of contigs | 11,675 | 11,674 | 11,674 | 11,672 | 11,674 |
| | Largest contig (bp) | 760,847 | 765,889 | 767,221 | 765,347 | 764,913 |
| | Total length (Mbp) | 905.6 | 909.8 | 915.2 | 912.6 | 913.1 |
| | N50 length (bp) | 99,352 | 99,903 | 100,315 | 100,069 | 100,060 |
| | Complete BUSCOs (%) | 96.9 | 95.7 | 96.9 | 97.6 | 98.0 |
| | Complete single BUSCOs (%) | 79.6 | 76.9 | 79.6 | 69.0 | 69.4 |
| | Complete duplicated BUSCOs (%) | 17.3 | 18.8 | 17.3 | 28.6 | 28.6 |
| | Fragmented BUSCOs (%) | 1.2 | 2.4 | 1.2 | 0.8 | 0.8 |
| | Missing BUSCOs (%) | 1.9 | 1.9 | 1.9 | 1.6 | 1.2 |
| Redbean | Number of contigs | 16,753 | 16,347 | 16,350 | 16,275 | 16,350 |
| | Largest contig (bp) | 1,686,036 | 1,739,027 | 1,738,674 | 1,734,655 | 1,734,062 |
| | Total length (Mbp) | 962.0 | 993.5 | 996.8 | 994.0 | 995.1 |
| | N50 length (bp) | 142,771 | 148,572 | 148,398 | 148,157 | 148,018 |
| | Complete BUSCOs (%) | 83.9 | 96.5 | 98.8 | 99.2 | 99.2 |
| | Complete single BUSCOs (%) | 80.4 | 82.4 | 85.9 | 74.9 | 75.3 |
| | Complete duplicated BUSCOs (%) | 3.5 | 14.1 | 12.9 | 24.3 | 23.9 |
| | Fragmented BUSCOs (%) | 8.6 | 2.7 | 0.4 | 0.4 | 0.4 |
| | Missing BUSCOs (%) | 7.5 | 0.8 | 0.8 | 0.4 | 0.4 |
| NextDenovo | Number of contigs | 5431 | 5431 | 5431 | 5431 | 5431 |
| | Largest contig (bp) | 3,406,093 | 3,359,923 | 3,371,738 | 3,361,759 | 3,361,805 |
| | Total length (Mbp) | 824.2 | 817.1 | 820.9 | 818.6 | 818.9 |
| | N50 length (bp) | 218,190 | 217,600 | 218,885 | 218,094 | 218,285 |
| | Complete BUSCOs (%) | 90.2 | 92.1 | 93.7 | 94.5 | 94.5 |
| | Complete single BUSCOs (%) | 72.9 | 72.5 | 76.1 | 69.8 | 69.8 |
| | Complete duplicated BUSCOs (%) | 17.3 | 19.6 | 17.6 | 24.7 | 24.7 |
| | Fragmented BUSCOs (%) | 3.1 | 1.2 | 0.4 | 0.4 | 0.4 |
| | Missing BUSCOs (%) | 6.7 | 6.7 | 5.9 | 5.1 | 5.1 |
Computational requirements for assemblers and assembly polishing algorithms.
| Program | Data | Cores | CPU Time (h) | RAM (GB) | Disk Space (GB) |
|---|---|---|---|---|---|
| Platanus | Illumina | 56 | 1308 | 624 | 5 |
| MaSuRCA | Illumina | 56 | 1420 | 426 | 245 |
| Haslr | Illumina + MinION | 56 | 37 | 39 | 50 |
| Wengan | Illumina + MinION | 56 | 465 | 599 | 37 |
| MaSuRCA | Illumina + MinION | 56 | 2462 | 864 | 543 |
| Flye | MinION | 56 | 863 | 584 | 205 |
| Raven | MinION | 56 | 44 | 36 | 24 |
| Redbean | MinION | 56 | 205 | 54 | 16 |
| Canu | MinION | 112 | 35,401 | 2960 | 57 |
| NextDenovo | MinION | 56 | 244 | 804 | 7.2 |
| Racon_long * | Polishing | 56 | 186 ± 1.5 | 776 ± 0.0 | 577 ± 0.0 |
| Medaka * | Polishing | 56 | 33 ± 0.0 | 106 ± 0.1 | 40 ± 0.0 |
| Racon_short * | Polishing | 56 | 167 ± 0.3 | 654 ± 0.0 | 554 ± 0.0 |
| Nextpolish * | Polishing | 56 | 37 ± 0.5 | 54 ± 0.0 | 51 ± 0.0 |
* Average values for polishing of the Flye, Canu, Raven, Redbean and NextDenovo assemblies.
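
To put the CPU times in perspective, they can be converted into a rough lower bound on wall-clock time by dividing by the number of cores. This is only a sketch under two assumptions that the table does not state: that CPU time is summed over all cores, and that the work parallelizes perfectly. Real run times would be longer. The function name is illustrative.

```python
# (program, cores, CPU time in hours) copied from the table above.
runs = [
    ("Raven", 56, 44),
    ("Flye", 56, 863),
    ("Canu", 112, 35401),
]


def min_wall_clock_hours(cpu_hours: float, cores: int) -> float:
    """Lower bound on wall-clock hours, ASSUMING CPU time is summed over
    all cores and the workload scales perfectly across them."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cpu_hours / cores


estimates = {
    name: round(min_wall_clock_hours(cpu_hours, cores), 1)
    for name, cores, cpu_hours in runs
}

for name, hours in estimates.items():
    print(f"{name}: at least {hours} h wall-clock under these assumptions")
```

Under these assumptions Raven finishes in under an hour, while Canu needs on the order of two weeks even on twice as many cores.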