Shan-Shan Zhou1, Xue-Mei Yan1, Kai-Fu Zhang2, Hui Liu1, Jie Xu1, Shuai Nie1, Kai-Hua Jia1, Si-Qian Jiao1, Wei Zhao1, You-Jie Zhao2, Ilga Porth3, Yousry A El Kassaby4, Tongli Wang4, Jian-Feng Mao5.
Abstract
LTR retrotransposons (LTR-RTs) are ubiquitous and represent the dominant repeat element in plant genomes, playing important roles in functional variation, genome plasticity and evolution. With the advent of new sequencing technologies, a growing number of whole-genome sequences have been made publicly available, making it possible to carry out systematic analyses of LTR-RTs. However, a comprehensive and unified annotation of LTR-RTs in plant groups is still lacking. Here, we constructed a plant intact LTR-RTs dataset, which is designed to classify and annotate intact LTR-RTs with a standardized procedure. The dataset currently comprises a total of 2,593,685 intact LTR-RTs from genomes of 300 plant species representing 93 families of 46 orders. The dataset is accompanied by sequence, diverse structural and functional annotation, age determination and classification information associated with the LTR-RTs. This dataset will contribute valuable resources for investigating the evolutionary dynamics and functional implications of LTR-RTs in plant genomes.
Year: 2021 PMID: 34267227 PMCID: PMC8282616 DOI: 10.1038/s41597-021-00968-x
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Schematic diagram illustrating the overall process of intact LTR-RT characterization in plant genomes. The top section shows the data sources of plant genomes, and the four modules below it represent different analyses.
Fig. 2 Illustration of the data structure.
Description of metadata keys for the plain text (.txt) files.
| Key | Type | Description |
|---|---|---|
| | string | Species name |
| | string | ID of intact LTR-RTs |
| | string | Chromosome of intact LTR-RTs |
| | int | Start position of domain in intact LTR-RTs |
| | int | End position of domain in intact LTR-RTs |
| | string | Type of domain in intact LTR-RTs |
| | int | Length of intact LTR-RTs |
| | string | Type of superfamilies |
| | string | Type of lineages |
| | float | Sequence divergence of intact LTR-RTs |
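As a rough illustration of how one record with these ten typed fields might be read, the sketch below parses a single line of a plain text file into a typed object. The key names did not survive in the table above, so every field name here is hypothetical; the tab delimiter and the column order (taken from the row order of the table) are also assumptions, not something the dataset description confirms.

```python
from dataclasses import dataclass


@dataclass
class LtrRecord:
    # All field names are hypothetical; only the types and meanings
    # come from the metadata table.
    species: str        # species name
    ltr_id: str         # ID of the intact LTR-RT
    chromosome: str     # chromosome carrying the intact LTR-RT
    domain_start: int   # start position of the domain
    domain_end: int     # end position of the domain
    domain_type: str    # type of domain
    length: int         # length of the intact LTR-RT
    superfamily: str    # superfamily
    lineage: str        # lineage
    divergence: float   # sequence divergence of the intact LTR-RT


def parse_record(line: str, sep: str = "\t") -> LtrRecord:
    """Parse one delimited line into an LtrRecord.

    Assumes exactly ten fields in the order of the metadata table
    and a tab delimiter (both assumptions).
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return LtrRecord(
        species=fields[0],
        ltr_id=fields[1],
        chromosome=fields[2],
        domain_start=int(fields[3]),
        domain_end=int(fields[4]),
        domain_type=fields[5],
        length=int(fields[6]),
        superfamily=fields[7],
        lineage=fields[8],
        divergence=float(fields[9]),
    )
```

If the actual files carry a header row or a different column order, the mapping in `parse_record` would need to be adjusted accordingly.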
Fig. 3 Intact LTR-RT (Gypsy and Copia) occupation of plant genomes. Resolved intact LTR-RT lineages were identified in 300 plant genomes of diverse systematic assignment. The presence of intact LTR-RT lineages is shown as a heatmap of the log-transformed (log10) intact LTR-RT copy number. The realized phylogenetic relationship of LTR-RT lineages[24] is shown in the bottom right corner. (a) Gypsy superfamily. (b) Copia superfamily.
Fig. 4 Density map of the age distribution of intact LTR-RTs in representative Triticum species. For each species, intact LTR-RTs were grouped by both superfamily and lineage (only the first few dominant lineages are shown here). The proportion of intact LTR-RTs in each age bin is shown, and subgenomes (A, B and D) from three Triticum species are colored red, blue and yellow, respectively.
Comparison of LTR-RTs annotated in two Oryza sativa genome assemblies.
| Key | Nip-BRI | Nip-MSU7 |
|---|---|---|
| | 380.70 | 373.25 |
| | 17 | 7.7 |
| | 18 | 905 |
| | 2,941 | 2,636 |
| | 29,589,668 | 25,529,468 |
| | 7.78 | 6.84 |
| | 0 | 2 |
Fig. 5 Comparison of LTR-RT length and insertion time identified in two rice genome assemblies. (a) Difference in LTR-RT length between Nip-BRI and Nip-MSU7. *** indicates a P-value of less than 0.001. (b) Insertion time of LTR-RTs estimated from the sequence divergence of the two LTRs in Nip-BRI and Nip-MSU7.
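Estimating insertion time from the divergence between the two LTRs of one element is commonly done with T = K / (2r), where K is the divergence and r is the substitution rate per site per year. The sketch below shows that calculation under stated assumptions: the function names are hypothetical, the Jukes-Cantor correction is one common choice and not necessarily the one used for this dataset, and the rate is left as a parameter because the record above does not state which value was applied.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from an observed proportion of
    differing sites p (an assumed correction, not confirmed by the source)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time(k: float, rate: float) -> float:
    """Insertion time in years, T = K / (2r).

    k    : divergence between the 5' and 3' LTRs (substitutions per site)
    rate : substitution rate per site per year (must be supplied by the user)
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate)


# Illustrative values only: 1.3e-8 is a rate often used for rice LTR-RTs
# in the literature, and is an assumption here.
example_years = insertion_time(0.026, 1.3e-8)
```

The factor of two reflects that both LTRs accumulate substitutions independently after insertion, so the observed divergence is spread over two lineages.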
| Measurement(s) | LTR_retrotransposon • genome |
| Technology Type(s) | bioinformatics method • digital curation |
| Factor Type(s) | Species |
| Sample Characteristic - Organism | Rhodophyta • Chlorophyta • Bryophyta • Tracheophyta |