Tingting Zhu, Le Wang, Juan C Rodriguez, Karin R Deal, Raz Avni, Assaf Distelfeld, Patrick E McGuire, Jan Dvorak, Ming-Cheng Luo.
Abstract
Wild emmer (Triticum turgidum ssp. dicoccoides) is the progenitor of all modern cultivated tetraploid wheat. Its genome is large (>10 Gb) and contains over 80% repeated sequences. The successful whole-genome-shotgun assembly of the wild emmer (accession Zavitan) genome sequence (WEW_v1.0) was an important milestone for wheat genomics. In an effort to improve this assembly, an optical map of accession Zavitan was constructed using Bionano Direct Label and Stain (DLS) technology. The map spanned 10.4 Gb. This map and another map that we produced earlier with Bionano's Nick Label Repair and Stain (NLRS) technology were used to improve the current wild emmer assembly. The WEW_v1.0 assembly consisted of 151,912 scaffolds. Of these, 3,102 could be confidently aligned on the optical maps, and forty-seven were chimeric. The chimeric scaffolds were disjoined, and new scaffolds were assembled with the aid of the optical maps. The total number of scaffolds was reduced from 151,912 to 149,252, and N50 increased from 6.96 Mb to 72.63 Mb. Of the 149,252 scaffolds, 485 scaffolds, which accounted for 97% of the total genome length, were aligned and oriented on genetic maps, and new WEW_v2.0 pseudomolecules were constructed. The new pseudomolecules included 333 scaffolds (68.51 Mb) that were originally unassigned; in addition, 226 scaffolds (554.84 Mb) were placed into new locations and 332 scaffolds (394.83 Mb) were re-oriented. The improved wild emmer genome assembly is an important resource for understanding the genomic modifications that occurred during domestication.
Keywords: DLS; Genome assembly; Pseudomolecules; Triticum dicoccoides
Year: 2019 PMID: 30622124 PMCID: PMC6404602 DOI: 10.1534/g3.118.200902
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Overview of the strategy for the construction of WEW_v2.0 pseudomolecules. The major steps include scaffold chimera resolving, scaffolding with optical maps, and pseudomolecule construction.
Characteristics of optical maps generated using different protocols
| Feature | DLS protocol | NLRS protocol |
|---|---|---|
| | DLE-1 | Nt. |
| | 284 | 341 |
| | 150 | 180 |
| | 1,107 | 1,101 |
| | 110x | 110x |
| | 601 | 7,098 |
| | 296.90 | 19.42 |
| | 10.37 | 10.25 |
| | 56.79 | 2.14 |
Figure 2. Detection of chimeras and reconstruction of pseudomolecules. (A) Discrepancy (pink shade) between scaffold3886 (pink rectangle) and DLS ctg93 (blue rectangle). Three copies of repeat 1 (red boxed) and two copies of repeat 2 (green boxed) were in tandem in an ∼287 kb region in DLS ctg93, but only two copies of repeat 1 were present in the ∼114 kb region in scaffold3886, which was therefore disjoined into two scaffolds. (B) Illustration of pseudomolecule reconstruction. For a portion of Chr2A (green rectangle) in WEW_v2.0, 9 scaffolds of WEW_scf_v5.1 (pink rectangle) were ordered and oriented with the aid of DLS ctg93 (blue rectangle). This portion is 9.1 Mb in WEW_v2.0 but 7.9 Mb in WEW_v1.0 (purple rectangle); three scaffolds (scaffold24368, scaffold100813, and scaffold103979) were re-oriented (green shades); three scaffolds (scaffold31939, scaffold24368, and scaffold100813) were re-ordered; scaffold3886 showed a discrepancy compared to DLS ctg93 (blue rectangle) and was disjoined (see detail in A); and two scaffolds with a total length of 490 kb in ChrUn of the WEW_v1.0 assembly were anchored onto Chr2A based on their alignments on DLS ctg93. The scaffolds in WEW_v1.0 were linked with 100 Ns, whereas in WEW_v2.0 they were linked with the number of Ns estimated from the optical maps.
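The linking convention described in the caption (a fixed run of 100 Ns when the gap size is unknown, an estimated number of Ns otherwise) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `placements` tuple layout, and the toy sequences are hypothetical, and gap estimates are assumed to be non-negative integers.

```python
# Minimal sketch of joining ordered, oriented scaffolds into a pseudomolecule.
# All names here are hypothetical; this is not the authors' pipeline.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
UNKNOWN_GAP = 100  # run of Ns used when no gap-size estimate is available


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def build_pseudomolecule(placements, sequences):
    """Concatenate scaffolds with N-gaps between neighbours.

    placements: list of (scaffold_id, orientation, gap_after) tuples, where
        orientation is "+" or "-" and gap_after is the estimated gap length
        in bp, or None if the gap size is unknown. The gap_after value of
        the last scaffold is ignored.
    sequences: dict mapping scaffold_id to its sequence string.
    """
    parts = []
    for i, (scaffold_id, orientation, gap_after) in enumerate(placements):
        seq = sequences[scaffold_id]
        if orientation == "-":
            seq = reverse_complement(seq)
        parts.append(seq)
        if i < len(placements) - 1:
            gap = UNKNOWN_GAP if gap_after is None else gap_after
            parts.append("N" * gap)
    return "".join(parts)


# Toy example: an estimated 3 bp gap, second scaffold placed in reverse.
toy_sequences = {"scfA": "ACGT", "scfB": "AAGG"}
toy_placements = [("scfA", "+", 3), ("scfB", "-", None)]
print(build_pseudomolecule(toy_placements, toy_sequences))
```

Passing `None` as the gap estimate reproduces the fixed 100-N convention, so the same function can emit either style of linking.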
Scaffold characteristics at each step of their improvement with optical maps
| Feature | WEW_scf_v5 | WEW_scf_v5.1 | Scaffolded using DLS map (WEW_scf_v5.2) | Further scaffolded using NLRS map (WEW_scf_v5.3) |
|---|---|---|---|---|
| No. of scaffolds | 151,912 | 151,968 | 149,550 | 149,252 |
| | 43,781,372 | 43,781,372 | 238,732,153 | 278,440,484 |
| | 10,494,678,545 | 10,494,611,785 | 10,650,512,398 | 10,661,158,675 |
| N50 (bp) | 6,955,166 | 6,888,339 | 48,768,823 | 72,632,893 |
| | 1.63 | 1.63 | 3.07 | 3.30 |
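The abstract reports scaffold N50 rising from 6.96 Mb to 72.63 Mb. As a reminder of what that statistic measures, here is a minimal sketch using the common definition: the length of the scaffold at which the cumulative length, counted from the longest scaffold down, first reaches half of the total assembly length. The function name and the toy lengths are illustrative only, and the source does not state which tool computed the reported values.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest; N50 is the length of the
    scaffold at which the running total first reaches half of the overall
    length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (not data from the paper): total is 20, and the two longest
# scaffolds (6 + 5 = 11) already cover half of it.
print(n50([2, 3, 4, 5, 6]))  # 5
```

Because N50 depends only on the length distribution, merging scaffolds with optical maps raises it even when the amount of sequence is nearly unchanged.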
Summary of the WEW_v2.0 and WEW_v1.0 pseudomolecules (Psm)
| Psm | WEW_v2.0 length (bp) | WEW_v2.0 effective length (bp) | WEW_v2.0 N% | WEW_v1.0 length (bp) | WEW_v1.0 effective length (bp) | WEW_v1.0 N% |
|---|---|---|---|---|---|---|
| Chr1A | 609,493,238 | 589,191,139 | 3.33 | 593,586,810 | 585,358,717 | 1.39 |
| Chr2A | 788,782,410 | 766,375,931 | 2.84 | 775,183,943 | 764,437,182 | 1.39 |
| Chr3A | 767,616,973 | 747,178,907 | 2.66 | 754,274,518 | 743,839,968 | 1.38 |
| Chr4A | 751,837,965 | 724,085,122 | 3.69 | 726,427,787 | 715,660,361 | 1.48 |
| Chr5A | 715,386,202 | 694,794,407 | 2.88 | 700,855,599 | 691,202,877 | 1.38 |
| Chr6A | 633,698,003 | 616,090,333 | 2.78 | 621,432,051 | 612,835,755 | 1.38 |
| Chr7A | 747,227,478 | 721,432,789 | 3.45 | 727,576,108 | 716,586,138 | 1.51 |
| Chr1B | 712,626,289 | 683,358,120 | 4.10 | 690,537,804 | 679,507,080 | 1.60 |
| Chr2B | 825,750,385 | 798,504,965 | 3.30 | 803,365,466 | 791,358,810 | 1.49 |
| Chr3B | 865,950,040 | 834,300,602 | 3.65 | 841,096,276 | 827,748,505 | 1.59 |
| Chr4B | 684,047,826 | 666,197,808 | 2.61 | 673,896,466 | 664,082,181 | 1.46 |
| Chr5B | 726,095,352 | 704,902,457 | 2.92 | 712,180,895 | 700,915,297 | 1.58 |
| Chr6B | 724,204,431 | 699,071,820 | 3.47 | 703,217,322 | 692,164,878 | 1.57 |
| Chr7B | 777,835,607 | 749,691,077 | 3.62 | 755,408,349 | 742,865,000 | 1.66 |
| Total | 10,330,552,199 | 9,995,175,477 | 3.25 | 10,079,039,394 | 9,928,562,749 | 1.49 |
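The N% column appears to follow from the other two columns if "effective length" is taken to mean the pseudomolecule length excluding Ns. That reading is an assumption, not something the table states, but it reproduces the tabulated values. A minimal sketch, with a hypothetical function name:

```python
def n_percent(length_bp, effective_length_bp):
    """Percentage of a pseudomolecule made up of Ns.

    Assumes effective length = total length minus the number of N bases.
    """
    return 100.0 * (length_bp - effective_length_bp) / length_bp


# Chr1A values copied from the table above.
chr1a_v2 = n_percent(609_493_238, 589_191_139)
chr1a_v1 = n_percent(593_586_810, 585_358_717)
print(f"Chr1A WEW_v2.0: {chr1a_v2:.2f}%")
print(f"Chr1A WEW_v1.0: {chr1a_v1:.2f}%")
```

Under this assumption the Chr1A rows round to 3.33 and 1.39, matching the table. The higher N% in WEW_v2.0 is consistent with gaps being filled with optically estimated numbers of Ns rather than a fixed 100.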
Figure 3. An overview of gap closing and gap size estimation in the 14 improved WEW_v2.0 pseudomolecules. Gray bars represent each of the 14 pseudomolecules. For each pseudomolecule, the upper ticks (blue) indicate ChrUn scaffolds of the WEW_v1.0 assembly that were anchored onto chromosomes in the WEW_v2.0 pseudomolecules; the lower ticks in red indicate gaps of unknown length in the WEW_v1.0 pseudomolecules whose sizes were estimated by optical maps in the WEW_v2.0 pseudomolecules; the lower ticks in black indicate gaps of unknown size in both versions of the pseudomolecules.
Numbers of annotated high-confidence genes in each WEW_v2.0 and WEW_v1.0 pseudomolecule
| Psm | WEW_v2.0 | WEW_v1.0 |
|---|---|---|
| | 3,974 | 3,804 |
| | 4,441 | 4,232 |
| | 5,121 | 4,963 |
| | 5,834 | 5,544 |
| | 4,731 | 4,565 |
| | 5,254 | 5,072 |
| | 4,523 | 4,350 |
| | 3,725 | 3,639 |
| | 4,900 | 4,818 |
| | 5,168 | 5,026 |
| | 3,685 | 3,594 |
| | 4,315 | 4,187 |
| | 4,817 | 4,636 |
| | 4,504 | 4,383 |
| Total | 64,992 | 62,813 |
Numbers of gaps of unknown size in each WEW_v2.0 and WEW_v1.0 pseudomolecule (Psm)
| Psm | WEW_v2.0 | WEW_v1.0 |
|---|---|---|
| | 15 | 133 |
| | 15 | 157 |
| | 15 | 167 |
| | 26 | 209 |
| | 21 | 141 |
| | 11 | 144 |
| | 10 | 172 |
| | 48 | 233 |
| | 34 | 218 |
| | 44 | 270 |
| | 57 | 209 |
| | 48 | 218 |
| | 60 | 235 |
| | 67 | 261 |
| Total | 471 | 2,767 |