Abstract
Miniature inverted-repeat transposable elements (MITEs) are a special type of Class 2 non-autonomous transposable element (TE) that are abundant in the non-coding regions of the genes of many plant and animal species. The accurate identification of MITEs has been a challenge for existing programs because they lack coding sequences and, as such, evolve very rapidly. Because of their importance to gene and genome evolution, we developed MITE-Hunter, a program pipeline that can identify MITEs as well as other small Class 2 non-autonomous TEs from genomic DNA data sets. The output of MITE-Hunter is composed of consensus TE sequences grouped into families that can be used as a library file for homology-based TE detection programs such as RepeatMasker. MITE-Hunter was evaluated by searching the rice genomic database and comparing the output with known rice TEs. It discovered most of the previously reported rice MITEs (97.6%), and found sixteen new elements. MITE-Hunter was also compared with two other MITE discovery programs, FINDMITE and MUST. Unlike MITE-Hunter, neither of these programs can search large genomic data sets including whole genome sequences. More importantly, MITE-Hunter is significantly more accurate than either FINDMITE or MUST as the vast majority of their outputs are false-positives.
Year: 2010 PMID: 20880995 PMCID: PMC3001096 DOI: 10.1093/nar/gkq862
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The five main steps of the MITE-Hunter pipeline. Gray bars are genomic sequences, black and red triangles are TSDs and TIRs, respectively, blue bars are predicted TEs, white bars are homolog sequences, dashed lines are gaps and yellow bars are sequences that are similar to each other but not to those represented by green bars (and vice versa). (A) Identification of candidate TEs. Three predicted candidate TEs are shown. (B) Filtering of false-positives based on the PSA. Four types of alignments are shown (a–d). Except for the candidates in (d), all the others are filtered as false-positives. (C) Selection of TE exemplars. (D) Filtering of false-positives based on the MSA, predicting TSDs and generating consensus sequences. (e) and (f) are two special types of MSA (see text for detail). (E) Selecting new exemplars and grouping TEs into families.
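Step (A) of the figure, the identification of candidate TEs bounded by TIRs and flanked by TSDs, can be illustrated with a minimal sketch. This is not MITE-Hunter's implementation: the function names, the requirement of perfect TIRs, and the default lengths (`tir_len`, `tsd_len`, `min_len`, `max_len`) are assumptions chosen only to make the idea concrete.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def find_candidates(genome, tir_len=10, tsd_len=2, min_len=50, max_len=800):
    """Naive scan for elements with perfect TIRs flanked by identical TSDs.

    Returns a list of (start, end, tsd) with half-open element coordinates.
    All thresholds are illustrative assumptions, not MITE-Hunter defaults.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for start in range(tsd_len, n - min_len + 1):
        left_tir = genome[start:start + tir_len]
        right_tir = reverse_complement(left_tir)
        left_tsd = genome[start - tsd_len:start]
        # The right TIR must end at least min_len bases after the element start.
        pos = genome.find(right_tir, start + min_len - tir_len)
        while pos != -1 and pos + tir_len - start <= max_len:
            end = pos + tir_len
            # A candidate needs the same short duplication on both flanks.
            if genome[end:end + tsd_len] == left_tsd:
                hits.append((start, end, left_tsd))
            pos = genome.find(right_tir, pos + 1)
    return hits


# Synthetic example: a 60-bp element with 10-bp TIRs and a TA duplication.
tir = "GGCCAGTCAC"
element = tir + "A" * 40 + reverse_complement(tir)
toy_genome = "CCCCC" + "TA" + element + "TA" + "GGGGG"
candidates = find_candidates(toy_genome)
```

A scan of this kind accepts any inverted repeat with matching flanks, which is why the later steps in the figure filter candidates using pairwise and multiple sequence alignments.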
Figure 2. Flowchart of the manual curation of rice Class 2 non-autonomous TEs from MITE-Hunter output. The authentication process began with 700 consensus TEs and was reduced by the number shown for each step. The numbers on the right are the remaining consensus TEs after each step (see text for detail). Three different types of compound TEs are shown (a, b and c). Open and solid bars represent different TEs from different families. (a) One TE inserted into another. (b) Two different adjacent TEs. (c) Two adjacent copies from the same TE family.
Comparison between MITE-Hunter output and rice TEs in Repbase
| Superfamily | Repbase data masked by MITE-Hunter output (%): All (a) | Repbase data masked by MITE-Hunter output (%): MITEs only (b) | MITE-Hunter output masked by Repbase data (%): All (c) | MITE-Hunter output masked by Repbase data (%): MITEs only (d) |
|---|---|---|---|---|
| | 93.3 | 100.0 | 72.5 | 99.9 |
| | 83.8 | 94.6 | 53.1 | 93.0 |
| | 85.8 | 100.0 | 25.6 | 28.4 |
| | 81.0 | 99.3 | 49.5 | 80.0 |
| | 88.2 | – | 81.7 | – |
| Together | 84.9 | 97.6 | 47.9 | 83.4 |
a: 185 rice Class 2 non-autonomous TEs that are <1.7 kb in Repbase.
b: 101 MITEs identified and isolated from data set a.
c: 551 Class 2 non-autonomous TE consensus sequences curated from the MITE-Hunter output.
d: 132 MITEs identified and isolated from data set c.
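The "masked by" percentages in this table can be read as the share of bases in one sequence set that is covered by matches to the other set. The sketch below shows that calculation under this assumption; the function name `masked_percent`, the half-open `(name, start, end)` hit format, and the toy numbers are hypothetical and are not taken from the paper.

```python
def masked_percent(seq_lengths, hits):
    """Percentage of bases covered by at least one hit.

    seq_lengths: dict mapping sequence name to its length in bases.
    hits: iterable of (name, start, end) half-open intervals on those sequences.
    Overlapping hits on the same sequence are merged so no base is counted twice.
    """
    by_seq = {}
    for name, start, end in hits:
        by_seq.setdefault(name, []).append((start, end))

    covered = 0
    for intervals in by_seq.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent interval: extend the current block.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start

    total = sum(seq_lengths.values())
    return 100.0 * covered / total


# Toy data: two 100-bp sequences, 80 + 20 bases covered after merging overlaps.
toy_lengths = {"te1": 100, "te2": 100}
toy_hits = [("te1", 0, 60), ("te1", 50, 80), ("te2", 10, 30)]
toy_percent = masked_percent(toy_lengths, toy_hits)
```

Because the calculation is directional, swapping which set is masked and which set supplies the hits gives a different value, consistent with the two halves of the table reporting different percentages.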
Comparisons of MITE-Hunter with FINDMITE and MUST
| Program | Running time | Predicted TEs | False-positives (%) |
|---|---|---|---|
| MITE-Hunter | 1.7 h | 114 | 4.4 |
| FINDMITE | <1 min | 10 864 | 85.0 |
| MUST | 5.5 h | 5485 | 86.0 |
a: Rice chromosome 12 was used as the input data (∼28.2 Mb).
b: Parameters were set to find only Stowaway MITEs.
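The percentages in the table can be converted into approximate counts to make the comparison more tangible. The short calculation below does this arithmetic; the resulting counts are estimates implied by the rounded percentages, not figures reported in the source, and the helper name `implied_counts` is hypothetical.

```python
# Predicted TEs and false-positive rates (%) as listed in the comparison table.
table = {
    "MITE-Hunter": (114, 4.4),
    "FINDMITE": (10864, 85.0),
    "MUST": (5485, 86.0),
}


def implied_counts(predicted, fp_percent):
    """Return (false_positives, remaining) implied by a rounded percentage."""
    false_positives = round(predicted * fp_percent / 100)
    return false_positives, predicted - false_positives


estimates = {name: implied_counts(*row) for name, row in table.items()}

for name, (fp, remaining) in estimates.items():
    print(f"{name}: ~{fp} false-positives, ~{remaining} remaining predictions")
```

On these estimates, roughly 5 of the 114 MITE-Hunter predictions would be false-positives, compared with about 9234 of 10 864 for FINDMITE and about 4717 of 5485 for MUST.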