M Renee Bellinger1, Roshan Paudel2, Steven Starnes3, Lukas Kambic3, Michael B Kantar2, Thomas Wolfgruber2, Kurt Lamour4, Scott Geib5, Sheina Sim5, Susan C Miyasaka2, Martin Helmkampf6, Michael Shintaku7.
Abstract
Taro (Colocasia esculenta) is a food staple widely cultivated in the humid tropics of Asia, Africa, the Pacific, and the Caribbean. One of the greatest threats to taro production is Taro Leaf Blight (TLB), caused by the oomycete pathogen Phytophthora colocasiae. Here we describe a de novo taro genome assembly and use it to analyze sequence data from a Taro Leaf Blight resistant mapping population. The genome was assembled from linked-read sequences (10x Genomics; ∼60x coverage), then gap-filled and scaffolded with contigs assembled from Oxford Nanopore Technologies long reads and with linkage map results. The haploid assembly was 2.45 Gb in total, with a maximum contig length of 38 Mb and a scaffold N50 of 317,420 bp. A family-level (Araceae) comparison of genome features shows that the repeat content of taro is 82%, >3.5x greater than the 23% in great duckweed (Spirodela polyrhiza). Both genomes recovered a similar percentage of Benchmarking Universal Single-Copy Orthologs, 80% and 84%, based on a 3,236-gene database for monocot plants. More nucleotide-binding leucine-rich repeat disease resistance genes were present in the taro genome than in the duckweed genome, ∼391 vs. ∼70 (∼182 and ∼46 complete). The mapping population data revealed 16 major linkage groups with 520 markers and 10 quantitative trait loci (QTL) significantly associated with Taro Leaf Blight disease resistance. The genome sequence of taro enhances our understanding of resistance to TLB and provides markers that may accelerate breeding programs. This genome project may provide a template for developing genomic resources in other understudied plant species.
Keywords: Colocasia esculenta; Taro Leaf Blight; disease resistance genes; linkage mapping; linked-read genome assembly
Year: 2020 PMID: 32546503 PMCID: PMC7407455 DOI: 10.1534/g3.120.401367
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Descriptive characteristics for three draft taro genome assemblies. The pseudochromosome-level taro assembly (“Ps_chr”) was composed from a linked-read assembly (“LR”) that was gap-filled using contigs assembled from nanopore MinION long-reads (merge step, “Merged”), filtered for assembly artifacts, and then concatenated into pseudochromosomes using a linkage map. Kilobase = kb
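| Assembly | Assembled bases | Largest scaffold | # Scaffolds > 5 kb (% of assembled nucleotides) | # Scaffolds > 10 kb (% of assembled nucleotides) | # Scaffolds > 50 kb (% of assembled nucleotides) | # Scaffolds total (% of assembled nucleotides) | N50 | GC content (%) |
|---|---|---|---|---|---|---|---|---|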
| Ps_chr | 2,448,853,100 | 38,380,923 | 55,692 (92%) | 25,535 (83%) | 5,180 (68%) | 140,400 (100%) | 317,420 | 41.98 |
| Merged | 2,451,639,670 | 6,251,291 | 56,504 (92%) | 25,535 (83%) | 5,527 (67%) | 142,854 (100%) | 270,514 | 42.04 |
| LR | 2,331,885,920 | 3,965,393 | 53,768 (91%) | 22,039 (81%) | 4,782 (68%) | 144,852 (100%) | 336,981 | 42.13 |
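The N50 column can be read as follows: sort the scaffolds from longest to shortest and accumulate their lengths until half of the assembled bases are covered; the length of the scaffold that crosses that halfway point is the N50. The sketch below illustrates this common definition in Python. The function name `n50` and the toy lengths are hypothetical and are not taken from the taro assemblies or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths given")


# Toy (hypothetical) scaffold lengths, not real taro data.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

Because N50 depends on the whole length distribution, joining scaffolds into a few very large pseudochromosomes can raise the largest-scaffold value sharply while changing N50 only modestly.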
Figure 1. Genome assembly completeness assessed by the recovery of Benchmarking Universal Single-Copy Orthologs (BUSCOs). The percentage of BUSCO genes found in each genome is listed for the categories single-copy (S), multiple-copy (D), fragmented (F), and missing (M). Analyses are based on the BUSCO liliopsida_odb10 dataset, which represents the monocots (n = 3,236 genes). See text for scientific names and NCBI genome accession identifiers.
Repetitive content of taro (Colocasia esculenta) and great duckweed (Spirodela polyrhiza) genome assemblies. Total repeat content was quantified using de novo repeat libraries constructed with RepeatModeler and screened with RepeatMasker. The percent (%) of sequence is relative to each individual assembly’s total length, excluding runs of Ns between scaffolded contigs. Short and long interspersed elements are denoted as SINEs and LINEs
| Element | Length occupied (bp), taro | Length occupied (bp), duckweed | % Genome, taro | % Genome, duckweed |
|---|---|---|---|---|
| SINEs | 0 | 2,139 | 0.0 | <0.01 |
| LINEs | 36,753,946 | 366,596 | 1.6 | 0.3 |
| LINE1 | 33,001,257 | 366,596 | 1.4 | 0.3 |
| LTR elements | 834,904,293 | 12,263,206 | 36.3 | 9.1 |
| DNA elements | 91,787,447 | 221,956 | 4.0 | 0.2 |
| Unclassified | 874,609,262 | 14,284,860 | 38.1 | 10.6 |
| Total interspersed repeats | 1,838,054,948 | 27,138,757 | 80.0 | 20.2 |
| Simple repeats | 35,678,844 | 3,145,222 | 1.6 | 2.3 |
| Low complexity repeats | 8,251,536 | 671,992 | 0.4 | 0.5 |
| Total assembly length (L) | 2,451,787,670 | 135,172,123 | | |
| L excluding runs of Ns | 2,297,336,160 | 134,368,765 | | |
| Bases masked | 1,881,985,328 | 30,955,971 | | |
| Total % repeat content | | | 81.9 | 23.0 |
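The total repeat percentages in this table are consistent with dividing the bases masked by the assembly length excluding runs of Ns. A minimal Python sketch of that arithmetic follows, using only the values reported in the table; the helper name `percent_masked` is hypothetical.

```python
def percent_masked(bases_masked, length_excluding_n_runs):
    """Percent of non-N assembly sequence that was masked as repeats."""
    return 100 * bases_masked / length_excluding_n_runs


# Values copied from the table rows "Bases masked" and
# "L excluding runs of Ns" for each assembly.
taro_pct = percent_masked(1_881_985_328, 2_297_336_160)
duckweed_pct = percent_masked(30_955_971, 134_368_765)
fold_difference = taro_pct / duckweed_pct

print(f"taro: {taro_pct:.1f}%  duckweed: {duckweed_pct:.1f}%")
print(f"fold difference: {fold_difference:.2f}x")
```

The resulting ratio of the two percentages is slightly above 3.5, which matches the ">3.5x" comparison stated in the abstract.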
Genotyping by sequencing (GBS) mapping data and variant calls for 86 taro samples. Summary data include sample-averaged counts and percentages for total, mapped, and unmapped reads, and for reads that mapped uniquely (1x) or were multi-mapped (>1x). Numbers for variant calls include raw variable sites, INDELs, variable sites that passed filters (see text), and the number of variable sites present in the mapping population (MP) after applying a minor allele frequency (maf) threshold of 0.012
| Total reads | Mapped | Unmapped | Mapped 1x | Mapped >1x | Raw variable sites | INDELs | Filtered sites | MP sites |
|---|---|---|---|---|---|---|---|---|
| 2,723,124 | 2,257,353 (82.9%) | 465,771 (17.1%) | 1,502,585 (66.4%) | 754,767 (33.6%) | 15,021,591 | 10,982 | 7,018 | 1,519 |
Linkage mapping results for GBS data mapped to the merged taro genome reference. Markers passing the initial test for segregation distortion are listed as number of initial markers. Homozygous, redundant, and distorted SNPs were removed from the subsequent analysis. After binning of unique markers and filtering for segregation distortion, a suggested logarithm of the odds (LOD) score was calculated and used for linkage group (LG) formation
| # Initial markers | # Homozygous markers | # Unique bins | % distorted markers | Suggested LOD | # LG | # markers final map | # LG >= 8 markers |
|---|---|---|---|---|---|---|---|
| 1,423 | 107 | 802 | 22% | 5.97 | 31 | 558 | 14 |
Descriptors for linkage groups (LG) constructed from ‘1025’ taro mapping population genotypes. Sixteen major linkage groups were present in the final linkage map. The SNP markers were called using the “merged” taro genome assembly as a reference (see text for details)
| Linkage groups | Total markers | LG length (cM) | Average distance (cM) | Genetic length |
|---|---|---|---|---|
| LG1 | 37 | 269.93 | 7.3 | 24,124,590 |
| LG2 | 73 | 590.87 | 8.1 | 42,680,486 |
| LG3 | 65 | 491.26 | 7.6 | 30,369,817 |
| LG4 | 25 | 205.08 | 8.2 | 16,485,140 |
| LG5 | 35 | 256.4 | 7.3 | 18,072,045 |
| LG6 | 43 | 305.31 | 7.1 | 13,916,264 |
| LG7 | 43 | 237.94 | 5.5 | 18,481,439 |
| LG8 | 43 | 514.12 | 12 | 23,744,905 |
| LG9 | 38 | 220.57 | 5.8 | 21,816,219 |
| LG10 | 35 | 374.35 | 10.7 | 15,471,071 |
| LG11 | 8 | 43.42 | 5.4 | 4,286,715 |
| LG12 | 5 | 22.09 | 4.4 | 2,732,664 |
| LG13 | 10 | 127.25 | 12.7 | 4,820,723 |
| LG14 | 5 | 31.51 | 6.3 | 621,528 |
| LG15 | 23 | 179.72 | 7.8 | 11,651,500 |
| LG16 | 32 | 224.56 | 7 | 16,237,138 |
| Total | 520 | 4,094.38 | 123.3 | 265,512,244 |
| Average | 33 | 255.90 | 7.7 | 16,594,515 |
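The per-group "Average distance (cM)" values appear to equal the LG length divided by the number of markers in that group. The Python sketch below checks this reading against three rows of the table; the function name `average_spacing` is hypothetical, and the divisor (markers rather than intervals between markers) is an inference from the tabulated numbers, not a statement of the authors' method.

```python
def average_spacing(n_markers, length_cm):
    """Average marker spacing in cM, taken as map length / marker count."""
    return length_cm / n_markers


# (total markers, LG length in cM) copied from three rows of the table.
linkage_groups = {
    "LG1": (37, 269.93),
    "LG2": (73, 590.87),
    "LG8": (43, 514.12),
}

spacing = {
    name: round(average_spacing(n, length), 1)
    for name, (n, length) in linkage_groups.items()
}
print(spacing)

# Whole-map spacing from the "Total" row: 4,094.38 cM over 520 markers.
whole_map_spacing = average_spacing(520, 4094.38)
print(round(whole_map_spacing, 2))
```

Under this reading, the "Total" entry of 123.3 in the average-distance column is the sum of the 16 per-group averages, and the "Average" entry of 7.7 is their mean, which differs slightly from the whole-map value obtained by dividing total length by total markers.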
Figure 2. Genetic map of taro based on 520 high-quality SNP markers covering 16 linkage groups.
A list of QTL significant for TLB resistance in the ‘1025’ TLB-resistant mapping population. The QTL naming convention includes the isolate name preceded by a q, DPI4, the linkage group identifier, and a number representing the number of significant QTL in the linkage group. PVE = Phenotypic variance explained, a = additive effect, d = dominance effect, d/a = QTL mode of action
| QTL name | Isolate | LG | Left Marker | Right Marker | LOD | PVE (%) | a | d | d/a |
|---|---|---|---|---|---|---|---|---|---|
| qS1_DPI4-3-1 | S1 | 3 | TIG00056386_694420 | 1544_160709 | 22.38 | 10.25 | 0.03 | 0.46 | 16.17 |
| qS1_DPI4-6-1 | S1 | 6 | TIG00044844_224587 | TIG01551418_41957 | 29.56 | 17.52 | 0.29 | 0.02 | 0.06 |
| qS1_DPI4-8-1 | S1 | 8 | 3452_96171 | TIG01552475_45417 | 25.74 | 13.64 | 0.29 | 0.00 | 0.00 |
| qS1_DPI4-8-2 | S1 | 8 | TIG00020381_142717 | TIG00037610_138242 | 19.66 | 8.82 | 0.00 | 0.43 | ∞ |
| qS1_DPI4-9-1 | S1 | 9 | TIG01555954_659795 | 270115_508 | 22.68 | 10.74 | 0.25 | −0.02 | −0.09 |
| qS3_DPI4-2-1 | S3 | 2 | 399_398671 | 399_398779 | 7.70 | 7.75 | −0.16 | −0.04 | 0.23 |
| qS3_DPI4-3-1 | S3 | 3 | 293503_21603 | 285165_4595 | 9.60 | 9.97 | 0.24 | −0.06 | −0.24 |
| qS3_DPI4-3-2 | S3 | 3 | 285165_4624 | 283183_2064 | 19.66 | 27.86 | −0.34 | −0.05 | 0.15 |
| qS3_DPI4-6-1 | S3 | 6 | TIG00044844_224587 | TIG01551418_41957 | 9.81 | 9.85 | 0.24 | −0.02 | −0.08 |
| qS3_DPI4-7-1 | S3 | 7 | 1068_74730 | TIG00030638_157187 | 6.75 | 6.55 | −0.03 | 0.32 | −9.81 |
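The d/a column is the ratio of the dominance effect to the additive effect, so it is undefined in the ordinary sense when a = 0, which is why one row reports ∞. The Python sketch below shows one way to compute the ratio while handling that case. The function name `dominance_ratio` is hypothetical, and returning a signed infinity (or NaN when both effects are zero) is an illustrative choice rather than the authors' procedure.

```python
import math


def dominance_ratio(a, d):
    """Return d/a, the ratio of dominance to additive effect.

    When a is zero the ratio is reported as a signed infinity,
    or NaN if d is also zero (an assumption made for this sketch).
    """
    if a == 0:
        if d == 0:
            return math.nan
        return math.copysign(math.inf, d)
    return d / a


# Rounded a and d values copied from two rows of the table.
ratio_8_1 = dominance_ratio(0.29, 0.00)  # qS1_DPI4-8-1
ratio_8_2 = dominance_ratio(0.00, 0.43)  # qS1_DPI4-8-2
print(ratio_8_1, ratio_8_2)
```

Recomputing d/a from the rounded a and d values in the table will not always reproduce the tabulated ratio (for example, 0.46 / 0.03 is about 15.3, whereas the table lists 16.17), presumably because the published ratios were derived from unrounded estimates.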
Figure 3. Genome-wide representation of major linkage groups, including significant QTL and their LOD scores, for the S1 (A) and S3 (B) isolates of Phytophthora colocasiae exposed to the ‘1025’ mapping population. The Y-axis indicates the logarithm of odds (LOD) values. Peaks above the threshold (dotted line) of LOD = 15.32 (S1 isolate) and LOD = 6.01 (S3 isolate) represent QTL significantly associated with TLB tolerance. Linkage groups >100 centimorgans (cM) are labeled, with total length shown on y-axis. Approximate locations of nucleotide-binding leucine-rich repeat proteins (NLRs) (complete, partial, and pseudogenes) are indicated by colored symbols.