Roy N Platt, Laura Blanco-Berdugo, David A Ray.
Abstract
Transposable elements (TEs) are mobile genetic elements with the ability to replicate themselves throughout the host genome. In some taxa, TEs reach copy numbers in the hundreds of thousands and can occupy more than half of the genome. The increasing number of reference genomes from nonmodel species has begun to outpace efforts to identify and annotate TE content, and the methods used vary significantly between projects. Here, we demonstrate the variation that arises in TE annotations when less than optimal methods are used. We found that, across a variety of taxa, the ability to accurately identify TEs based solely on homology decreased as the phylogenetic distance between the queried genome and a reference increased. Next, we annotated repeats using homology alone, as is often the case in new genome analyses, and using a combination of homology and de novo methods together with an additional manual curation step. Reannotation using these methods identified a substantial number of new TE subfamilies in previously characterized genomes, recognized a higher proportion of the genome as repetitive, and decreased the average genetic distance within TE families, implying recent TE accumulation. Finally, this finding (increased recognition of younger TEs) was confirmed via an analysis of the postman butterfly (Heliconius melpomene). These observations imply that complete TE annotation relies on a combination of homology and de novo-based repeat identification, manual curation, and classification, and that relying on simple, homology-based methods is insufficient to accurately describe the TE landscape of a newly sequenced genome.
Keywords: Heliconius melpomene; Heterocephalus glaber; Microtus ochrogaster; genome annotation; transposable elements
Year: 2016 PMID: 26802115 PMCID: PMC4779615 DOI: 10.1093/gbe/evw009
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure. Homology-based TE annotations using human TEs. (A) TEs in several mammalian genomes were identified and quantified using human TEs. The TEs identified using human TEs are given as a percentage of the known repeat content. Time since divergence from the human lineage for each taxon was taken from TimeTree.org. Taxonomically related species are grouped by color. The dotted line represents 100% recognition. (B) A phylogram depicting the radiation of the mammals, modified from Murphy et al. (2007).
Transposable Element Load in the Naked Mole Rat (
| TE category | Naked mole rat, rodent library (Mb) | Naked mole rat, de novo (Mb) | Prairie vole, rodent library (Mb) | Prairie vole, de novo (Mb) |
|---|---|---|---|---|
| Class I retrotransposons | 594.79 | 661.63 | 646.21 | 790.68 |
| LTRs | 157.39 | 175.2 | 210.39 | 346.24 |
| ERV | 7.55 | 7.45 | 2.02 | 1.74 |
| ERV1 | 17.05 | 15.47 | 10.28 | 13.04 |
| ERV2 | 21.35 | 14.61 | 89.97 | 221.59 |
| ERV3 | 110.65 | 84.39 | 105.43 | 102.27 |
| Gypsy | 0.54 | 0.51 | 0.1 | 0.1 |
| LTR | 0.25 | 52.77 | 2.6 | 7.5 |
| LINEs | 368.83 | 400.35 | 213.68 | 230.1 |
| CR1 | 16.18 | 15.94 | 2.29 | 2.29 |
| L1 | 352.16 | 383.94 | 211.24 | 227.66 |
| L2 | 0.12 | 0.11 | 0.03 | 0.03 |
| Penelope | 0.01 | 0.01 | 0 | 0 |
| R4 | 0.01 | 0.01 | 0 | 0 |
| RTE | 0.02 | 0.02 | 0.01 | 0.01 |
| RTEX | 0.33 | 0.31 | 0.1 | 0.1 |
| Tx1 | 0.01 | 0.01 | 0 | 0 |
| SINEs | 68.5 | 86.03 | 222.12 | 214.31 |
| SINE1/7SL | 68.42 | 74.29 | 84.4 | 77.34 |
| SINE2 | 0 | 11.66 | 137.64 | 136.89 |
| SINE3/5S | 0.04 | 0.04 | 0.04 | 0.04 |
| Unk | 0.05 | 0.05 | 0.03 | 0.03 |
| Unclassified non-LTRs | 0.06 | 0.06 | 0.02 | 0.02 |
| Unclassified | 0.06 | 0.06 | 0.02 | 0.02 |
| Class II DNA transposons | 33.17 | 51.36 | 17.2 | 17.2 |
| PiggyBac | 0 | 1.73 | 0 | 0 |
| TcMariner | 14.45 | 30.07 | 4.88 | 4.88 |
| hAT | 15.33 | 16.42 | 9.18 | 9.17 |
| MuDR | 1.43 | 1.24 | 0.39 | 0.39 |
| Helitron | 0.13 | 0.13 | 0.03 | 0.03 |
| Kolobok | 0.02 | 0.02 | 0.01 | 0.01 |
| Unk | 1.8 | 1.76 | 2.71 | 2.71 |
| Unclassified TEs | 5.61 | 8.89 | 26.63 | 13.51 |
| Unclassified | 5.61 | 8.89 | 26.63 | 13.51 |
| Total | 633.57 | 721.88 | 690.04 | 821.39 |
Note.—Rodent-specific libraries were taken from Repbase (August 2014). De novo libraries were combined with the rodent-specific libraries in an effort to generate the complete annotations.
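To make the difference between the two annotation strategies concrete, the "Total" row of the table can be turned into an absolute and a relative gain. The following is a minimal sketch, not part of the original analysis: the function name `annotation_gain` and the dictionary layout are assumptions introduced here for illustration, and only the four totals are taken from the table.

```python
# Totals (Mb) copied from the "Total" row of the table above.
# The dictionary layout and key names are illustrative assumptions.
totals_mb = {
    "naked mole rat": {"rodent": 633.57, "de_novo": 721.88},
    "prairie vole": {"rodent": 690.04, "de_novo": 821.39},
}


def annotation_gain(rodent_mb, de_novo_mb):
    """Return (absolute gain in Mb, relative gain in percent)
    of the combined annotation over the rodent-library annotation."""
    gain = de_novo_mb - rodent_mb
    return gain, 100.0 * gain / rodent_mb


for species, t in totals_mb.items():
    gain, pct = annotation_gain(t["rodent"], t["de_novo"])
    print(f"{species}: +{gain:.2f} Mb ({pct:.1f}% more sequence annotated as TE)")
```

Under these numbers the combined libraries annotate roughly 14% more sequence as TE in the naked mole rat and roughly 19% more in the prairie vole than the rodent-specific libraries alone.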
Figure. Differences in TE accumulation histories of the (A, D) naked mole rat (Heterocephalus glaber), (B, E) prairie vole (Microtus ochrogaster), and (C, F) postman butterfly (Heliconius melpomene) before and after de novo TE identification and curation. RepeatMasker searches against the (A) mole rat and (C) prairie vole genomes used all known mammal TEs, and searches against the (E) postman butterfly genome used all known arthropod TEs, to identify known TEs based on homology only. De novo identification and curation altered the content, quantity, and distribution of elements identified for the (B) mole rat, (D) prairie vole, and (F) postman butterfly genomes. Divergence from a consensus sequence was calculated for each element and binned to show the accumulation profile for each taxon. For the mole rat and prairie vole, highly mutable CpG sites were excluded from analyses.
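The binning step described in this caption can be sketched as follows. This is a hedged illustration only: the function `divergence_profile`, the tuple layout of the input, and the 1% bin width are assumptions made here, the example hits are made up rather than taken from the study, and the CpG exclusion mentioned in the caption is not implemented.

```python
from collections import defaultdict


def divergence_profile(hits, bin_width=1.0):
    """Sum annotated base pairs per divergence bin for each TE class.

    hits: iterable of (te_class, divergence_percent, length_bp) tuples,
          where divergence_percent is the distance of one TE copy from
          its consensus sequence.
    Returns {te_class: {bin_start: total_bp}} with bins sorted by start.
    """
    profile = defaultdict(lambda: defaultdict(int))
    for te_class, divergence, length_bp in hits:
        # Floor the divergence to the lower edge of its bin.
        bin_start = int(divergence // bin_width) * bin_width
        profile[te_class][bin_start] += length_bp
    return {cls: dict(sorted(bins.items())) for cls, bins in profile.items()}


# Made-up example hits, not data from the study.
example_hits = [
    ("LINE", 0.4, 6000),
    ("LINE", 0.9, 1200),
    ("LINE", 12.3, 800),
    ("SINE", 3.7, 300),
    ("SINE", 3.1, 280),
]
print(divergence_profile(example_hits))
```

In a profile like this, low-divergence bins correspond to recently inserted copies, so a shift of annotated sequence toward those bins after de novo curation is what "increased recognition of younger TEs" would look like.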