Hongliang Mao1, Hao Wang1,2.
Abstract
Instances of highly conserved plant short interspersed nuclear element (SINE) families and their enrichment near genes have been well documented, but little is known about the general patterns of such conservation and enrichment or the underlying mechanisms. Here, we perform a comprehensive investigation of the structure, distribution, and evolution of SINEs in the grass family by analyzing 14 grass and 5 other flowering plant genomes using comparative genomics methods. We identify 61 SINE families composed of 29,572 copies, of which 46 families are described for the first time. We find that, compared with other grass TEs, grass SINEs show a much higher level of conservation in terms of genomic retention: the origin of at least 26% of families can be traced to early grass diversification, and these families are among the most abundant SINE families in 86% of species. We find that these families show a much higher level of enrichment near protein-coding genes than families of relatively recent origin (51% vs. 28%), and that 40% of all grass SINEs are near genes, a higher percentage than for other types of grass TEs. The pattern of enrichment suggests that differential removal of SINE copies in gene-poor regions plays an important role in shaping the genomic distribution of these elements. We also identify a sequence motif located at the 3' end of SINEs that is shared by 17 families. In short, this study provides insights into the structure and evolution of SINEs in the grass family.
Keywords: SINE; comparative genomics; genome evolution; transposable elements
Year: 2017 PMID: 28903462 PMCID: PMC5585668 DOI: 10.1093/gbe/evx145
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
F—Taxonomic distribution and abundance of 61 grass SINE families. Each column shows the CPN of a SINE family in the 19 genomes. Presence of a family is shown by rectangles with a gray gradient; absence is represented by white rectangles. Family names are abbreviated by excluding the prefix “Grass_”; for example, Grass_0 is abbreviated as 0.
F—Key characteristics of the genomic distribution, abundance, and evolutionary speed of grass SINEs. (a) Chromosomal distribution of grass SINEs. Pseudo-chromosomes of seven genomes are used. The short and long arms of chromosomes are normalized separately, with the centromere located at 0 and the short and long arms ending at −1 and 1, respectively. The locations of genes are calculated as the distance from the centromere divided by the total length of the chromosome arm, where the distance is a negative number for the short arm. (b) Distribution of SINEs around genes. The 10 kb up- and downstream of genes are shown. (c) Correlation of SINE content and genome size. (d) Comparison of sequence divergence between highly conserved gene families and SINE families at five time points. At every time point, the distributions of sequence identity of gene and SINE families are shown side by side. Key features of these distributions are captured by standard box plots.
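The arm-wise normalization described for panel (a) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `normalized_arm_position` is hypothetical, and the sketch assumes that the short arm occupies the coordinates before the centromere and the long arm those after it.

```python
def normalized_arm_position(position, centromere, chromosome_length):
    """Map a chromosomal coordinate to [-1, 1] with the centromere at 0.

    Assumption (not stated in the source): the short arm spans
    0..centromere and the long arm spans centromere..chromosome_length.
    """
    if not 0 < centromere < chromosome_length:
        raise ValueError("centromere must lie strictly inside the chromosome")
    if not 0 <= position <= chromosome_length:
        raise ValueError("position must lie on the chromosome")

    # Signed distance from the centromere: negative on the short arm.
    distance = position - centromere

    # Each arm is normalized by its own length, so both arms end at |1|.
    arm_length = centromere if distance < 0 else chromosome_length - centromere
    return distance / arm_length


# Hypothetical chromosome of length 100 with the centromere at coordinate 20.
for pos in (0, 10, 20, 60, 100):
    print(pos, normalized_arm_position(pos, 20, 100))
```

Normalizing each arm separately lets chromosomes with very different arm ratios be overlaid on one axis, at the cost of making distances on the two arms not directly comparable in base pairs.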
Retained Conserved TE Families in Species Diverged at Different Times
| Species Pair | Divergence Time (Ma) | TE Category | No. of Families in A | No. of Families in B | No. of Common Families | % of Common Families |
|---|---|---|---|---|---|---|
| | 60 | LTR | 27 | 364 | 0 | 0.00 |
| | 60 | TIR | 414 | 412 | 8 | 1.94 |
| | 60 | SINE | 8 | 12 | 4 | 40.00 |
| | 43 | LTR | 382 | 27 | 0 | 0.00 |
| | 43 | TIR | 373 | 414 | 3 | 0.76 |
| | 43 | SINE | 20 | 8 | 1 | 7.14 |
| | 28 | LTR | 74 | 177 | 2 | 1.59 |
| | 28 | TIR | 78 | 609 | 6 | 1.75 |
| | 28 | SINE | 8 | 12 | 6 | 60.00 |
| | 12 | LTR | 364 | 177 | 2 | 0.74 |
| | 12 | TIR | 412 | 609 | 26 | 5.09 |
| | 12 | SINE | 12 | 12 | 7 | 58.33 |
| | 5 | LTR | 382 | 234 | 93 | 30.19 |
| | 5 | TIR | 373 | 229 | 57 | 18.94 |
| | 5 | SINE | 20 | 23 | 20 | 93.02 |
% of Common Families = 100 × 2 × (No. of Common Families) / (No. of Families in A + No. of Families in B).
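As a check on the percentage formula, the short sketch below recomputes a few table values from the family counts. The function name `percent_common` is hypothetical, and the denominator is read as the sum of the family counts in the two species, an interpretation that reproduces the tabulated percentages.

```python
def percent_common(families_a, families_b, common):
    """Percentage of common families for one species pair.

    Each common family is counted once per species (hence the factor 2),
    relative to the total number of families in both species.
    """
    total = families_a + families_b
    if total == 0:
        raise ValueError("at least one species must have families")
    return 100 * 2 * common / total


# (divergence time in Ma, TE category, families in A, families in B, common),
# copied from the table above.
rows = [
    (60, "TIR", 414, 412, 8),
    (60, "SINE", 8, 12, 4),
    (12, "SINE", 12, 12, 7),
    (5, "LTR", 382, 234, 93),
    (5, "SINE", 20, 23, 20),
]

for time_ma, category, n_a, n_b, n_common in rows:
    pct = percent_common(n_a, n_b, n_common)
    print(f"{time_ma:>3} Ma  {category:<4}  {pct:.2f}")
```

The factor of 2 makes the measure symmetric and bounded by 100: the value reaches 100 only when every family in both species is shared.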