Douglas R Hoen, Glenn Hickey, Guillaume Bourque, Josep Casacuberta, Richard Cordaux, Cédric Feschotte, Anna-Sophie Fiston-Lavier, Aurélie Hua-Van, Robert Hubley, Aurélie Kapusta, Emmanuelle Lerat, Florian Maumus, David D Pollock, Hadi Quesneville, Arian Smit, Travis J Wheeler, Thomas E Bureau, Mathieu Blanchette.
Abstract
DNA derived from transposable elements (TEs) constitutes large parts of the genomes of complex eukaryotes, with major impacts not only on genomic research but also on how organisms evolve and function. Although a variety of methods and tools have been developed to detect and annotate TEs, there are as yet no standard benchmarks, that is, no standard way to measure or compare their accuracy. This lack of accuracy assessment calls into question conclusions from a wide range of research that depends explicitly or implicitly on TE annotation. In the absence of standard benchmarks, toolmakers are impeded in improving their tools, annotators cannot properly assess which tools might best suit their needs, and downstream researchers cannot judge how accuracy limitations might impact their studies. We therefore propose that the TE research community create and adopt standard TE annotation benchmarks, and we call for other researchers to join the authors in making this long-overdue effort a success.
Year: 2015 PMID: 26244060 PMCID: PMC4524446 DOI: 10.1186/s13100-015-0044-6
Source DB: PubMed Journal: Mob DNA
Tools and databases used to annotate TEs in the genomes of multicellular eukaryotes published in 2014
Listed resources (each ✓ marks one listed resource used): Repbase (database); homology-based: RepeatMasker, CENSOR, RepeatProteinMask; de novo: RepeatModeler, RepeatScout, PILER, LTR_FINDER, LTR_STRUC, MITE-Hunter; pipeline: REPET.

| Genome | Listed resources used | Other databases | Other tools (a) | Ref. |
|---|---|---|---|---|
| Plant (monocot) | ✓ ✓ ✓ ✓ ✓ ✓ ✓ | | | [ |
| Animal (bony fish) | ✓ ✓ ✓ | | TEClass | [ |
| Animal (bony fish) | ✓ ✓ ✓ | Genbank, UniprotKB/SwissProt | Custom | [ |
| Plant (monocot) | ✓ ✓ ✓ ✓ | MSU Repeats, custom (rice-specific) | Custom | [ |
| Animal (primate) | ✓ ✓ | | | [ |
| Plant (dicot) | ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ | | | [ |
| Plant (dicot) | ✓ ✓ | TIGR, SGN (Solanaceae-specific) | | [ |
| Animal (insect) | ✓ ✓ ✓ ✓ | Genbank | RECON, TARGeT | [ |
| Animal (bony fish) | ✓ ✓ ✓ ✓ ✓ | | E-inverted, Manual | [ |
| Animal (bird) | ✓ ✓ | | | [ |
| Plant (gymnosperm) | ✓ ✓ ✓ ✓ ✓ | PIER 2.0 (conifer-specific) | Custom | [ |
| Plant (monocot) | ✓ ✓ | MipsREdat, MIPS PlantsDB | Custom | [ |
| Animal (flatfish) | ✓ ✓ ✓ ✓ | RepBase (for classification) | E-inverted, Custom | [ |
| Plant (dicot) | ✓ ✓ ✓ ✓ ✓ | MSU repeats | | [ |
| Plant (dicot) | ✓ ✓ ✓ | | | [ |
| Animal (insect) | ✓ ✓ | Efam (mosquito-specific) | | [ |
(a) Not all tools used in building TE libraries are listed (e.g., UCLUST, MUSCLE).
Fig. 1 Variation among TE annotation tools. (a) TE coverage in the Arabidopsis thaliana genome resulting from three commonly used repetitiveness-based de novo tools, compared to a reference set of TEs [8]. The total TE coverage differs among the three tools, as do the fraction of reference TEs that were found or missed and the amount of non-reference putative TEs. (b) Full-length LTR TEs in the Drosophila melanogaster X chromosome found by five different LTR-specific de novo tools, compared to a reference set of TEs [24]. As in (a), but even more pronounced, the number of TEs found by the tools and their agreement with the reference set vary widely. (c) A 100-kbp segment of the Arabidopsis lyrata genome (scaffold_1:14,957,501-15,057,500) displayed on a custom UCSC genome browser [76, 77], illustrating differences among TE annotations resulting from several approaches, as well as additional genomic data useful in identifying bona fide TEs. From top to bottom, the tracks represent: RepeatMasker annotations using libraries from Repbase [37], RepeatModeler [30], REPET [44], or de la Chaux et al. [78]; full-length LTR TE predictions by LTR_Finder [33] or LTRharvest [79]; tandem repeat predictions by TRF [29]; gene model predictions by FGenesH [80]; a set of TE-specific domains [13]; mapped mRNA and small RNA short reads [77]; inter-species conservation (alignment percent identity plots) to other Brassicaceae species [77]; and genome self-alignment depth (generated with LASTZ).
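
The quantities compared in the first panel of Fig. 1 (reference TEs found, reference TEs missed, and non-reference putative TEs) can be illustrated with a small base-pair-level overlap calculation. The sketch below is only one possible way to compute such numbers and is not the authors' method. It assumes half-open `[start, end)` coordinates on a single sequence. The function names (`merge_intervals`, `covered_bp`, `overlap_bp`, `compare_to_reference`) are hypothetical, and the coordinates are invented toy data.

```python
from typing import Dict, List, Tuple

# Half-open [start, end) interval in bp on a single sequence (an assumption).
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of distinct base pairs covered by a set of intervals."""
    return sum(end - start for start, end in merge_intervals(intervals))


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Base pairs covered by both interval sets (two-pointer sweep)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def compare_to_reference(
    predicted: List[Interval], reference: List[Interval]
) -> Dict[str, float]:
    """Summarize a predicted annotation against a reference annotation."""
    shared = overlap_bp(predicted, reference)
    pred_bp = covered_bp(predicted)
    ref_bp = covered_bp(reference)
    return {
        "reference_found_bp": shared,
        "reference_missed_bp": ref_bp - shared,
        "non_reference_bp": pred_bp - shared,
        "sensitivity": shared / ref_bp if ref_bp else 0.0,
        "precision": shared / pred_bp if pred_bp else 0.0,
    }


# Invented toy coordinates, for illustration only.
reference = [(100, 200), (300, 400)]
predicted = [(150, 250), (300, 350), (340, 360), (500, 520)]
result = compare_to_reference(predicted, reference)
print(result)
```

Counting at the base-pair level is a deliberate simplification. An element-level comparison, such as the full-length LTR TE counts in the second panel, would also need a rule for when a predicted element counts as matching a reference element, and a shared benchmark would have to fix that rule.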