Claire Hoede, Sandie Arnoux, Mark Moisset, Timothée Chaumier, Olivier Inizan, Véronique Jamilloux, Hadi Quesneville.
Abstract
SUMMARY: The classification of transposable elements (TEs) is a key step towards deciphering their potential impact on the genome. However, this process is often based on manual sequence inspection by TE experts. With the wealth of genomic sequences now available, this task requires automation, which would make it accessible to most scientists. We propose a new tool, PASTEC, which classifies TEs by searching for structural features and similarities. This tool outperforms currently available software for TE classification. The main innovation of PASTEC is the search for HMM profiles, which is useful for inferring the classification of unknown TEs on the basis of conserved functional domains of the proteins. In addition, PASTEC is the only tool providing an exhaustive spectrum of possible classifications to the order level of the Wicker hierarchical TE classification system. It can also automatically classify other repeated elements, such as SSRs (Simple Sequence Repeats), rDNA or potential repeated host genes. Finally, the output of this new tool is designed to facilitate manual curation by providing biologists with all the evidence accumulated for each TE consensus.

AVAILABILITY: PASTEC is available as a REPET module or as standalone software (http://urgi.versailles.inra.fr/download/repet/REPET_linux-x64-2.2.tar.gz). It requires a Unix-like system. There are two standalone versions: one is parallelized (requiring Sun Grid Engine or Torque), and the other is not.
Year: 2014 PMID: 24786468 PMCID: PMC4008368 DOI: 10.1371/journal.pone.0091929
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of the classifications obtained with the PASTEC, TEClass and RepClass tools.
| PASTEC (Wicker's) Class | PASTEC (Wicker's) Order | TEClass Class | TEClass Order | RepClass Class | RepClass Order |
| Class I | LTR | Retro | LTR | Class I | LTR/DIRS/PLE |
| Class I | DIRS | Retro | LTR | Class I | LTR/DIRS/PLE |
| Class I | PLE | Retro | LINE | Class I | LTR/DIRS/PLE |
| Class I | LARD* | Retro | | Class I | |
| Class I | TRIM* | Retro | | Class I | |
| Class I | LINE | Retro | LINE | Class I | LINE/SINE |
| Class I | SINE | Retro | SINE | Class I | LINE/SINE |
| Class II | TIR | DNA | DNA | Class II | TIR/Crypton/Polinton |
| Class II | MITE* | DNA | DNA | Class II | TIR/Crypton/Polinton |
| Class II | Crypton | DNA | | Class II | TIR/Crypton/Polinton |
| Class II | Helitron | DNA | | Class II | Helitron |
| Class II | Maverick | DNA | | Class II | TIR/Crypton/Polinton |
Notes: (*) Non-autonomous element. na: not considered by the tool.
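The correspondence in the table above can be written as a simple lookup. The sketch below is illustrative only: it assumes each table row reads as Wicker order, TEClass class, TEClass order, RepClass class, RepClass order grouping, and it uses `None` for cells that are blank in the table as reproduced here. The names `ORDER_MAP` and `map_order` are hypothetical and are not part of PASTEC.

```python
from typing import Dict, Optional, Tuple

# Wicker order -> (TEClass order, RepClass order grouping), transcribed from the table.
# None marks a cell that is blank in the table as reproduced here (assumption:
# left unfilled rather than guessed).
ORDER_MAP: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "LTR": ("LTR", "LTR/DIRS/PLE"),
    "DIRS": ("LTR", "LTR/DIRS/PLE"),
    "PLE": ("LINE", "LTR/DIRS/PLE"),
    "LARD": (None, None),
    "TRIM": (None, None),
    "LINE": ("LINE", "LINE/SINE"),
    "SINE": ("SINE", "LINE/SINE"),
    "TIR": ("DNA", "TIR/Crypton/Polinton"),
    "MITE": ("DNA", "TIR/Crypton/Polinton"),
    "Crypton": (None, "TIR/Crypton/Polinton"),
    "Helitron": (None, "Helitron"),
    "Maverick": (None, "TIR/Crypton/Polinton"),
}


def map_order(wicker_order: str, tool: str) -> Optional[str]:
    """Return the category a Wicker order corresponds to in the given tool."""
    teclass_order, repclass_order = ORDER_MAP[wicker_order]
    if tool == "TEClass":
        return teclass_order
    if tool == "RepClass":
        return repclass_order
    raise ValueError(f"unknown tool: {tool!r}")


if __name__ == "__main__":
    for order in ("PLE", "Maverick", "LARD"):
        print(order, map_order(order, "TEClass"), map_order(order, "RepClass"))
```

A lookup of this kind makes explicit that several Wicker orders collapse into one category in the other tools, which is why a direct order-level comparison requires a mapping step.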
Figure 1. Agents implemented in the system.
Orange agents are retriever agents, blue agents are classifier and filter agents. The super-agent is shown in green. The arrows indicate the principal communications between the different agents, with only requests shown.
Comparison of the performances of PASTEC, REPCLASS, and TECLASS for classification into TE classes.
| Dataset (sequence #) | Performance | PASTEC | REPCLASS | TECLASS |
| Repbase-atha (318) | Well classified | 80.7 | 83.6 | 98.4 |
| Repbase-atha (318) | Misclassified | | 3.5 | 1.3 |
| Repbase-atha (318) | Not classified | 18.4 | 12.9 | 0.3 |
| Repbase-diff (5546) | Well classified | 63.7 | 26.1 | 59.2 |
| Repbase-diff (5546) | Misclassified | | 31.8 | |
| Repbase-diff (5546) | Not classified | 33.4 | 42.1 | 0.5 |
| Repbase-all (9665) | Well classified | 52.7 | 20.35 | 53.8 |
| Repbase-all (9665) | Misclassified | | 27.09 | |
| Repbase-all (9665) | Not classified | 41.3 | 52.6 | 0.3 |
Comparison of the performances of PASTEC, REPCLASS, and TECLASS for classification to TE order level.
| Dataset (sequence #) | Performance | PASTEC | PASTEC mapped to REPCLASS order (1) | PASTEC mapped to TECLASS order (2) | REPCLASS (3) | TECLASS (4) |
| Repbase-atha (318) | Well classified | 71.4 | 79.7 | 93.7 | 85.5 | 97.5 |
| Repbase-atha (318) | Misclassified | | | 2.5 | 2.8 | 1.9 |
| Repbase-atha (318) | Not classified | 18.4 | 18.4 | 3.8 | 11.6 | 0.6 |
| Repbase-diff (5546) | Well classified | 51.3 | 59.1 | 49.7 | 66.9 | 47.7 |
| Repbase-diff (5546) | Misclassified | | | 3.5 | 11.1 | |
| Repbase-diff (5546) | Not classified | 33.4 | 33.4 | 46.8 | 17.6 | 1.9 |
| Repbase-all (9665) | Well classified | 22.4 | 31.9 | 8.6 | 12.4 | 40 |
| Repbase-all (9665) | Misclassified | | | 6.7 | 33.3 | |
| Repbase-all (9665) | Not classified | 61.8 | 61.8 | 84.7 | 50.84 | 0.25 |
Note that the classification scheme differs between the three tools. We therefore mapped the PASTEC classification results onto those of REPCLASS (1) and TECLASS (2). (2) Mapped onto TECLASS class I orders only. (3) Orders considered: DNA transposon, LTR retrotransposon, Helitron, non-LTR retrotransposon. (4) Orders considered: LTR and LINE/SINE only.
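The three performance rows used in the tables (well classified, misclassified, not classified) can be illustrated with a short scoring routine. This is a minimal sketch under stated assumptions, not the evaluation procedure of the paper: it assumes that a prediction of `None` means the tool made no call, that labels have already been mapped to a common scheme, and that each figure is a percentage of all sequences in the dataset. The function name `score` and the toy labels are hypothetical.

```python
from typing import Dict, Optional, Sequence


def score(reference: Sequence[str],
          predicted: Sequence[Optional[str]]) -> Dict[str, float]:
    """Percentage of sequences well classified, misclassified, or not classified."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("reference and predicted must be non-empty and equal in length")

    well = mis = unclassified = 0
    for ref, pred in zip(reference, predicted):
        if pred is None:        # assumption: None means the tool made no call
            unclassified += 1
        elif pred == ref:       # labels are assumed to share one scheme already
            well += 1
        else:
            mis += 1

    n = len(reference)
    return {
        "well_classified": 100.0 * well / n,
        "misclassified": 100.0 * mis / n,
        "not_classified": 100.0 * unclassified / n,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    reference = ["LTR", "LINE", "TIR", "Helitron"]
    predicted = ["LTR", "SINE", None, "Helitron"]
    print(score(reference, predicted))
```

Under these assumptions the three percentages always sum to 100, which gives a quick sanity check for any column in which all three rows are reported.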