Le Wang1, Tingting Zhu1, Juan C Rodriguez1, Karin R Deal1, Jorge Dubcovsky1, Patrick E McGuire1, Thomas Lux2, Manuel Spannagl2, Klaus F X Mayer2, Patricia Baldrich3, Blake C Meyers3,4, Naxin Huo5, Yong Q Gu5, Hongye Zhou6, Katrien M Devos7,8,9, Jeffrey L Bennetzen10, Turgay Unver11, Hikmet Budak12, Patrick J Gulick13, Gabor Galiba14,15, Balázs Kalapos14, David R Nelson16, Pingchuan Li17, Frank M You17, Ming-Cheng Luo1, Jan Dvorak1.
Abstract
Aegilops tauschii is the donor of the D subgenome of hexaploid wheat and an important genetic resource. The reference-quality genome sequence Aet v4.0 for Ae. tauschii acc. AL8/78 was therefore an important milestone for wheat biology and breeding. Further advances in sequencing acc. AL8/78 and release of the Aet v5.0 sequence assembly are reported here. Two new optical maps were constructed and used in the revision of pseudomolecules. Gaps were closed with Pacific Biosciences long-read contigs, decreasing the gap number by 38,899. Transposable elements and protein-coding genes were reannotated. The number of annotated high-confidence genes was reduced from 39,635 in Aet v4.0 to 32,885 in Aet v5.0. A total of 2245 biologically important genes, including those affecting plant phenology, grain quality, and tolerance of abiotic stresses in wheat, was manually annotated and disease-resistance genes were annotated by a dedicated pipeline. Disease-resistance genes encoding nucleotide-binding site domains, receptor-like protein kinases, and receptor-like proteins were preferentially located in distal chromosome regions, whereas those encoding transmembrane coiled-coil proteins were dispersed more evenly along the chromosomes. Discovery, annotation, and expression analyses of microRNA (miRNA) precursors, mature miRNAs, and phasiRNAs are reported, including miRNA target genes. Other small RNAs, such as hc-siRNAs and tRFs, were characterized. These advances enhance the utility of the Ae. tauschii genome sequence for wheat genetics, biotechnology, and breeding.
Keywords: Pacific Biosciences; disease resistance; hc-siRNA; miRNA; optical map; phasiRNA; tRF; tRNA; transposable elements
Year: 2021 PMID: 34515796 PMCID: PMC8664484 DOI: 10.1093/g3journal/jkab325
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Metrics for the genome-wide optical maps of Aegilops tauschii acc. CIae 1 and AS75
| Metric | CIae 1 | AS75 |
|---|---|---|
| Contigs (no.) | 3603 | 5507 |
| Map total length (Gb) | 4.29 | 4.69 |
| Contig N50 (Mb) | 1.72 | 1.12 |
| Max contig length (Mb) | 14.66 | 12.62 |
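The contig N50 values in the table follow the usual assembly convention: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total map length. A minimal Python sketch of that definition is given here; the function name `n50` and the example lengths are illustrative assumptions, not data from the optical maps.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical contig lengths in Mb, for illustration only.
example_lengths_mb = [6, 5, 4, 3, 2]
example_n50 = n50(example_lengths_mb)
```

A higher N50 at a similar total length, as for CIae 1 relative to AS75 in the table, indicates a more contiguous map.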
Figure 1. Differences in gap distribution on optical maps for different Ae. tauschii accessions. (Middle) Four separate optical contigs (ctg59, ctg1501, ctg149, and ctg1050) of Ae. tauschii accession CIae 23 (lineage 2) (green rectangles) aligned on a single optical contig (ctg475) of Ae. tauschii accession AS75 (lineage 1) (blue rectangles). (Top) A detail of the alignment of optical contigs ctg1501 and ctg149 on optical contig ctg475. The vertical lines connect corresponding restriction sites in aligned contigs. (Bottom) An overlap between AS75 optical contig ctg454 and CIae 23 optical contig ctg59 closes a gap between AS75 optical contigs ctg475 and ctg454 and extends the contiguity of optical map alignments.
Figure 2. Gaps in the Aet v4.0 and v5.0 assemblies. (A) Number of gaps filled with 10, 100, and 1000 Ns in the two assemblies. (B) A dot plot showing details of a 10-N gap and duplication created by erroneous scaffold assembly in two regions harboring tandem repeats in pseudomolecule Chr1 of Aet v4.0, region 1,227,510–1,251,751 bp.
Comparison of characteristics of the Aet v4.0 and v5.0 pseudomolecules
| Pseudomolecule | Assembly | Total length (bp) | Length of gaps (bp) | Effective length (bp) | Gap number |
|---|---|---|---|---|---|
| Chr1 | Aet v4.0 | 502,330,251 | 10,377,979 | 491,952,272 | 11,744 |
| Chr1 | Aet v5.0 | 501,967,303 | 8,406,331 | 493,560,972 | 6,647 |
| Chr2 | Aet v4.0 | 651,661,114 | 10,780,418 | 640,880,696 | 14,522 |
| Chr2 | Aet v5.0 | 650,458,083 | 8,558,830 | 641,899,253 | 8,139 |
| Chr3 | Aet v4.0 | 627,182,665 | 11,118,990 | 616,063,675 | 14,404 |
| Chr3 | Aet v5.0 | 627,456,150 | 9,202,775 | 618,253,375 | 9,437 |
| Chr4 | Aet v4.0 | 526,018,785 | 9,454,858 | 516,563,927 | 10,892 |
| Chr4 | Aet v5.0 | 525,206,139 | 8,086,205 | 517,119,934 | 6,175 |
| Chr5 | Aet v4.0 | 577,375,663 | 14,628,972 | 562,746,691 | 13,506 |
| Chr5 | Aet v5.0 | 576,238,907 | 12,629,354 | 563,609,553 | 7,615 |
| Chr6 | Aet v4.0 | 496,019,527 | 9,112,991 | 486,906,536 | 11,357 |
| Chr6 | Aet v5.0 | 495,363,004 | 7,581,178 | 487,781,826 | 6,199 |
| Chr7 | Aet v4.0 | 644,716,137 | 17,324,389 | 627,391,748 | 15,384 |
| Chr7 | Aet v5.0 | 644,841,383 | 14,867,162 | 629,974,221 | 8,698 |
| Total | Aet v4.0 | 4,025,304,142 | 82,798,597 | 3,942,505,545 | 91,809 |
| Total | Aet v5.0 | 4,021,530,969 | 69,331,835 | 3,952,199,134 | 52,910 |
| Difference | | −3,773,173 | −13,466,762 | 9,693,589 | −38,899 |
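The derived columns of this table can be checked with simple arithmetic. The sketch below assumes that effective length is total length minus gap length and that the Difference row is Aet v5.0 minus Aet v4.0; both assumptions are consistent with the reported totals. The dictionary and function names are illustrative.

```python
# Totals copied from the table: (total length bp, gap length bp, gap number).
ASSEMBLIES = {
    "Aet v4.0": (4_025_304_142, 82_798_597, 91_809),
    "Aet v5.0": (4_021_530_969, 69_331_835, 52_910),
}


def effective_length(total_bp, gap_bp):
    """Effective length, assumed to be total length minus gap length."""
    return total_bp - gap_bp


def summarize(assemblies):
    """Return effective length, gap percentage, and gap number per assembly."""
    summary = {}
    for name, (total_bp, gap_bp, gap_number) in assemblies.items():
        summary[name] = {
            "effective_bp": effective_length(total_bp, gap_bp),
            "gap_percent": 100 * gap_bp / total_bp,
            "gap_number": gap_number,
        }
    return summary


summary = summarize(ASSEMBLIES)
old, new = summary["Aet v4.0"], summary["Aet v5.0"]

# Differences are taken as v5.0 minus v4.0, matching the sign of the Difference row.
delta_effective_bp = new["effective_bp"] - old["effective_bp"]
delta_gap_number = new["gap_number"] - old["gap_number"]
```

Under these assumptions the gap reduction of 38,899 quoted in the abstract is the difference between the two total gap numbers.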
Numbers, total length, and percentages of the Aet v5.0 genome sequence represented by classes of transposable elements
| Class | Elements (no) | Sequence length (bp) | Genome length (%) |
|---|---|---|---|
| Class I | | | |
| LTR/Gypsy | 853,285 | 1,333,833,165 | 33.17 |
| LTR/Copia | 383,555 | 644,052,368 | 16.02 |
| LTR/Unknown | 456,559 | 611,381,110 | 15.20 |
| Non-LTR/LINE | 192,648 | 80,369,034 | 2.00 |
| Non-LTR/SINE | 19,503 | 3,344,990 | 0.08 |
| Class II | | | |
| CACTA | 201,749 | 203,285,995 | 5.05 |
| Mutator | 29,706 | 10,397,597 | 0.26 |
| PIF-Harbinger | 77,072 | 22,128,071 | 0.55 |
| Tc1-Mariner | 166,095 | 48,513,995 | 1.2 |
| hAT | 17,127 | 3,870,055 | 0.09 |
| Helitron | 219,796 | 113,044,338 | 2.81 |
| MITE/Stowaway | 1,139 | 174,827 | 0.00 |
| MITE/Tourist | 453 | 80,546 | 0.00 |
| DNA/unknown | 1,245 | 296,555 | 0.01 |
| Unspecified | 393,632 | 410,074,883 | 10.20 |
| Total interspersed | 3,013,564 | 3,484,847,529 | 86.65 |
| Low complexity | 20,328 | 1,104,950 | 0.03 |
| Simple repeat | 173,933 | 7,922,189 | 0.20 |
| Total | 3,207,825 | 3,493,874,668 | 86.88 |
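The percentage column appears to be each class's sequence length divided by the total length of the Aet v5.0 pseudomolecules. The denominator used below, 4,021,530,969 bp, is taken from the pseudomolecule comparison table and is an assumption about how the percentages were derived; the helper name `genome_percent` is illustrative.

```python
# Assumed denominator: total length of the Aet v5.0 pseudomolecules (bp).
AET_V5_TOTAL_BP = 4_021_530_969

# Sequence lengths (bp) copied from three rows of the table.
TE_LENGTH_BP = {
    "LTR/Gypsy": 1_333_833_165,
    "LTR/Copia": 644_052_368,
    "Total": 3_493_874_668,
}


def genome_percent(length_bp, genome_bp=AET_V5_TOTAL_BP):
    """Percentage of the genome covered by a sequence class, to two decimals."""
    return round(100 * length_bp / genome_bp, 2)


te_percent = {name: genome_percent(bp) for name, bp in TE_LENGTH_BP.items()}
```

For these three rows the computed values agree with the tabulated percentages, which supports the assumed denominator.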
Comparison of complete LTR-retrotransposons in the Aet v4.0 and v5.0 pseudomolecules
| Pseudomolecule | Copia | | Gypsy | | Unknown | |
|---|---|---|---|---|---|---|
| | Aet v4.0 | Aet v5.0 | Aet v4.0 | Aet v5.0 | Aet v4.0 | Aet v5.0 |
| Chr1 | 1,674 | 1,941 | 3,139 | 3,530 | 1,637 | 1,795 |
| Chr2 | 2,030 | 2,434 | 3,922 | 4,427 | 2,184 | 2,375 |
| Chr3 | 1,942 | 2,254 | 3,777 | 4,091 | 2,067 | 2,197 |
| Chr4 | 1,625 | 1,905 | 3,461 | 3,879 | 1,736 | 1,884 |
| Chr5 | 1,975 | 2,392 | 3,485 | 3,888 | 1,890 | 2,099 |
| Chr6 | 1,603 | 1,931 | 3,019 | 3,429 | 1,663 | 1,799 |
| Chr7 | 2,081 | 2,439 | 3,689 | 4,080 | 2,134 | 2,307 |
| Total | 12,930 | 15,296 | 24,492 | 27,324 | 13,311 | 14,456 |
High-confidence (HC) and low-confidence (LC) genes generated by automated annotation in the Aet v4.0 and Aet v5.0 assemblies
| Metric | Aet v4.0 | | Aet v5.0 | |
|---|---|---|---|---|
| | HC | LC | HC | LC |
| Total genes (no.) | 39,635 | 43,495 | 32,885 | 35,903 |
| Single-exon genes (no.) | 15,389 | 36,567 | 9,837 | 25,186 |
| Multi-exon genes (no.) | 24,246 | 6,928 | 23,048 | 10,717 |
| Mean CDS length (bp) | 1,133 | 319 | 1,250 | 593 |
| Median CDS length (bp) | 942 | 258 | 1,071 | 435 |
| Mean exons per transcript (no.) | 3.9 | 1.2 | 4.45 | 1.68 |
| Median exons per transcript (no.) | 2 | 1 | 3 | 1 |
| Missing BUSCO genes | 93 | | 50 | |
Comparison of numbers of resistance gene analogues (RGAs) identified in the Ae. tauschii genome and the D subgenome of wheat (T. aestivum cv Chinese Spring)
| | NBS | CNL | TNL | CN | TN | NL | RLP | RLK | TM-CC |
|---|---|---|---|---|---|---|---|---|---|
| | 119 | 215 | 0 | 85 | 3 | 221 | 145 | 991 | 142 |
| | 145 | 235 | 0 | 105 | 4 | 280 | 206 | 1094 | 174 |
NBS, nucleotide-binding site domain; LRR, leucine-rich repeat; TIR, Toll/interleukin-1 receptor-like domain; CC, coiled-coil; CNL, CC–NBS–LRR; TNL, TIR–NBS–LRR; CN, CC–NBS; TN, TIR–NBS; NL, NBS–LRR; RLK, receptor-like protein kinase; RLP, receptor-like protein; TM-CC, transmembrane coiled-coil protein.
Computed from Mayer .
Figure 3. Distribution of resistance gene analogs (RGAs) along the Ae. tauschii pseudomolecules. Genes encoding (A) nucleotide-binding site domains (NBS), (B) receptor-like protein kinases (RLK), (C) receptor-like proteins (RLP), and (D) transmembrane coiled-coil proteins (TM-CC) in the Aet v5.0 genome sequence are indicated by ticks to the right of the chromosome bars. A registry in bp is shown to the left of each figure.
Figure 4. Abundance of sRNAs in Ae. tauschii tissues. Blue, purple, orange, ochre, and yellow represent sRNAs derived from CDS, miRNAs, phasiRNAs, TE-derived sRNAs, and tRNA-derived sRNAs, respectively.
Figure 5. miRNA and phasiRNA expression. Heatmaps of miRNA (A) and phasiRNA (B) expression in different Ae. tauschii tissues. The rows represent mature miRNAs and PHAS loci, respectively, and the columns represent samples from mature leaves (2 replicates), roots (2 replicates), seedlings, and spikes. The miRNAs and PHAS loci were grouped into four different clades (I–IV) based on the expression level.
Figure 6. Composition and size distribution of hc-siRNA in Ae. tauschii. The hc-siRNAs were categorized by their size and their source in mature leaves (2 replicates), roots (2 replicates), seedlings, and spikes. Yellow, green, cyan, and purple bars represent hc-siRNAs 21, 22, 23, and 24 nt long, respectively.
Figure 7. Characterization of tRNA-derived fragments. (A) Heatmap of tRNA-derived fragments in Ae. tauschii tissues. The rows of the heat map represent tRNA genes and the columns represent samples. (B) Origins of the tRNA-derived fragments in the six Ae. tauschii samples. The tRFs are classified depending on their original position within the tRNA molecule. The x-axis represents the positions within each tRNA (divided into nine different bins) and the y-axis represents the abundance (in %).