Istvan Nagy1, Elisabeth Veeckman2,3,4, Chang Liu5,6, Michiel Van Bel3,7,8, Klaas Vandepoele3,7,8, Christian Sig Jensen9, Tom Ruttink2, Torben Asp10.
Abstract
BACKGROUND: The availability of chromosome-scale genome assemblies is fundamentally important for advancing genetics and breeding in crops, as well as for evolutionary and comparative genomics. Improvements in long-read sequencing technologies and the advent of optical mapping and chromosome conformation capture technologies in the last few years have significantly promoted the development of chromosome-scale genome assemblies of model plants and crop species. In grasses, chromosome-scale genome assemblies have recently become available for cultivated and wild species of the Triticeae subfamily. Development of state-of-the-art genomic resources in species of the Poeae subfamily, which includes important crops like fescues and ryegrasses, is lagging behind the progress in the cereal species.
Keywords: Chromosome-scale assembly; Comparative genomics; Festuca-Lolium complex; Lolium perenne; Perennial ryegrass
Year: 2022 PMID: 35831814 PMCID: PMC9281035 DOI: 10.1186/s12864-022-08697-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 4.547
Fig. 1 Structural genome annotation, including genome-wide distribution of gene content, transposable elements (TEs), localization of centromere-specific transposons (Cereba, Quinta and Abia, relative frequencies in 1 Mb windows), and k-mer frequencies (median frequencies of 20-mers in 1 Mb windows) on L. perenne pseudo-chromosomes and unanchored scaffolds (chrUn)
Classification of LTR retrotransposons of the perennial ryegrass genome
| Order | Superfamily | Code | All transposons (Nr.) | All transposons (%) | Full-length transposons (Nr.) | Full-length transposons (%) |
|---|---|---|---|---|---|---|
| Class I | | | | | | |
| LTR | Gypsy | RLG | 227472 | 72.07 | 30760 | 73.09 |
| LTR | Copia | RLC | 63194 | 20.02 | 7881 | 18.72 |
| LTR | - | RLX | 1193 | 0.38 | 721 | 1.71 |
| LINE | - | RIX | 1072 | 0.34 | 221 | 0.53 |
| LINE | L1 | RIL | 4 | <0.01 | 4 | <0.01 |
| LINE | R2 | RIR | 3 | <0.01 | 2 | <0.01 |
| SINE | - | RSX | 1095 | 0.35 | 4 | <0.01 |
| Class II | | | | | | |
| | CACTA | DTC | 15579 | 4.94 | 229 | 0.54 |
| | Pif-Harbinger | DTH | 1942 | 0.62 | 45 | 0.17 |
| | Mutator | DTM | 1247 | 0.39 | 33 | 0.08 |
| | Tc1-Mariner | DTT | 375 | 0.11 | 12 | 0.03 |
| | Helitron | DHH | 79 | 0.02 | 7 | 0.02 |
| | hAT | DTA | 10 | <0.01 | - | - |
| | - | DTX | 9 | <0.01 | - | - |
| | - | DXX | 37 | 0.01 | - | - |
| Other/Unknown | | | | | | |
| | - | XXX | 1954 | 0.62 | 2165 | 5.14 |
| Total | | | 315265 | | 42085 | |
Characterization of genes and gene features of the v2 and v3 annotations
| Lolium_2.6.1 gene models | v2 | v3 |
|---|---|---|
| Genes | | |
| Total number of genes | 139003 | 80821 |
| High confidence genes | 48812 | 54629 |
| Low confidence genes | 90191 | 15905 |
| lncRNA genes | - | 10287 |
| Gene features | | |
| Single-exon genes | 44091 (31.7%) | 23581 (29.2%) |
| Multi-exon genes | 94912 (68.3%) | 57240 (70.8%) |
| Mean exons per gene | 3.16 | 3.73 |
| Median gene length, bp | 1434 | 2330 |
| Median exon length, bp | 207 | 233 |
| Median intron length, bp | 128 | 127 |
Completeness of the v2 and v3 annotations of L. perenne
| Completeness categories | v2 gene models (Nr. of hits) | v2 gene models (% of total) | v3 gene models (Nr. of hits) | v3 gene models (% of total) |
|---|---|---|---|---|
| (A) BUSCO completeness (n=1440) | | | | |
| Complete BUSCOs (C) | 1340 | 93.1 | 1391 | 96.6 |
| Complete and single copy BUSCOs (S) | 1291 | 89.7 | 1331 | 92.4 |
| Complete and duplicated BUSCOs (D) | 49 | 3.4 | 60 | 4.2 |
| Fragmented BUSCOs (F) | 44 | 3.1 | 27 | 1.9 |
| Missing BUSCOs (M) | 56 | 3.8 | 22 | 1.5 |
| (B) coreGF completeness (n=7076) | | | | |
| Represented gene families | 6762 | 95.6 | 6851 | 96.8 |
| Missing gene families | 314 | 4.4 | 225 | 3.2 |
| coreGF completeness score | 0.938 | | 0.956 | |
| (C) BLAST to reference proteomes | | | | |
| Barley MIPS HC proteins (26159 sequences) | 23524 | 89.9 | 23977 | 91.7 |
| Barley Morex_V2 HC proteins (32787 sequences) | 27637 | 84.3 | 28233 | 86.1 |
| | 26330 | 84.9 | 26815 | 86.4 |
(A): Completeness scores assessed by BUSCO (v3.0.2 [23]) using the embryophyta_odb9 reference set (1440 single-copy orthologs)
(B): Core Gene Families (coreGFs) completeness scores using the monocot reference set of PLAZA v4 (7076 coreGFs from five species, [25]). The representation across all individual coreGFs is summarized in a global weighted coreGF score
(C): Transcript nucleotide sequences were searched by BLASTx against reference protein sequences. Top hits at an e-value threshold of 1e-4 with at least 70% subject coverage were considered significant matches
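The filtering rule in note (C) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes a custom seven-column tabular BLAST output (query, subject, e-value, bit score, subject start, subject end, subject length), takes the top hit per query to be the one with the highest bit score, and measures subject coverage as the aligned subject span divided by the subject length. The names `Hit`, `parse_hits`, `subject_coverage`, `significant_subjects` and the `EXAMPLE` records are hypothetical and made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple, Set


class Hit(NamedTuple):
    query: str      # transcript ID
    subject: str    # reference protein ID
    evalue: float
    bitscore: float
    sstart: int     # alignment start on the subject
    send: int       # alignment end on the subject
    slen: int       # full subject length


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated lines in the assumed seven-column layout."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        q, s, ev, bs, ss, se, sl = line.split("\t")
        yield Hit(q, s, float(ev), float(bs), int(ss), int(se), int(sl))


def subject_coverage(hit: Hit) -> float:
    """Fraction of the subject protein covered by the alignment."""
    return (abs(hit.send - hit.sstart) + 1) / hit.slen


def significant_subjects(hits: Iterable[Hit],
                         max_evalue: float = 1e-4,
                         min_cov: float = 0.70) -> Set[str]:
    """Reference proteins matched by a top hit that passes both thresholds."""
    best = {}
    for h in hits:
        # Assumption: "top hit" means highest bit score per query.
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {h.subject for h in best.values()
            if h.evalue <= max_evalue and subject_coverage(h) >= min_cov}


# Made-up records, for illustration only.
EXAMPLE = [
    "tx1\tprotA\t1e-50\t200.0\t1\t90\t100",    # top hit of tx1, 90% coverage
    "tx1\tprotB\t1e-10\t80.0\t1\t100\t100",    # weaker hit of tx1, ignored
    "tx2\tprotB\t1e-3\t40.0\t1\t100\t100",     # e-value above threshold
    "tx3\tprotC\t1e-20\t120.0\t10\t60\t100",   # only 51% coverage
    "tx4\tprotD\t1e-30\t150.0\t100\t31\t100",  # exactly 70% coverage
]

matched = significant_subjects(parse_hits(EXAMPLE))
print(sorted(matched))
```

Dividing the size of the returned set by the number of sequences in the reference proteome would give a percentage comparable in spirit to the "% of total" columns, under the assumptions stated above.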
Fig. 3 Localization of 10,368 single-copy orthologs on homologous pseudo-chromosomes of L. perenne and H. vulgare (Morex_V2). Green lines show mappings in collinear orientation, red lines mappings in inverted orientation of homologous chromosome pairs
Fig. 2 Pairwise whole-genome alignments of pseudo-chromosomes of L. perenne against H. vulgare (Morex_V2) (left panels) and L. perenne against B. distachyon (right panels). Colors representing B. distachyon pseudo-chromosomes: red: Bd chr1; blue: Bd chr2; green: Bd chr3; orange: Bd chr4; purple: Bd chr5. Axis labels show sizes in Mb
Fig. 4 Comparative gene family analysis. a Gene family expansion plots show the relatively stable gene family size in L. perenne compared to seven closely related grass species. b Phylogenetic analysis of the ELF4 family shows that one clade contains multiple duplicated L. perenne genes. c Constans / VRN2 gene family analysis (CCT domain; HOM05M000693) shows the presence of multiple copies of VRN2 genes (ZCCT domain; ORTHO05M004293) in Ae. tauschii, S. cereale, and L. perenne, a single-copy gene in B. distachyon, Z. mays, O. sativa ssp. japonica, and absence in S. bicolor and H. vulgare Morex_V2. The phylogenetic tree further shows that a sister clade to CO is specifically expanded in the Pooideae (Bd, Lp, Sc, Ae, Hv). See Fig. S5 for complete gene names. d FT gene family analysis (PEBP domain; HOM05M000217) shows relatively stable numbers of genes across species, and identifies the MFT, TFL, FT-I and FT-II clades [52]. e Histone demethylase gene family analysis (histone demethylase; HOM05M000330) identifies the FLD clade with single orthologous members across the grass species. Eight species are included in the comparative analysis: 5 of the Pooideae (Ae. tauschii (Ae), L. perenne (Lp), H. vulgare Morex_V2 (Hv), B. distachyon (Bd), S. cereale (Sc)) and O. sativa ssp. japonica (Os), S. bicolor (Sb), Z. mays (Zm) as more distantly related grass species