Mikhail Biryukov, Kirill Ustyantsev.
Abstract
Retrotransposons comprise a substantial fraction of eukaryotic genomes, reaching the highest proportions in plants. Therefore, identification and annotation of retrotransposons is an important task in studying the regulation and evolution of plant genomes. The majority of computational tools for mining transposable elements (TEs) are designed for subsequent genome repeat masking, often leaving aside the lineage classification of each element and its protein domain composition. Additionally, studies focused on the diversity and evolution of a particular group of retrotransposons often require substantial customization efforts from researchers to adapt existing software to their needs. Here, we developed DARTS (Domain-Associated Retrotransposon Search), a computational pipeline to mine sequences of protein-coding retrotransposons based on the sequences of their conserved protein domains. Using long terminal repeat (LTR) retrotransposons (LTR-RTs), the most abundant group of TEs in plants, we show that DARTS has radically higher sensitivity for LTR-RT identification compared to the widely accepted tool LTRharvest. DARTS can be easily customized for specific user needs. As output, DARTS returns a set of structurally annotated nucleotide and amino acid sequences which can be readily used in subsequent comparative and phylogenetic analyses. DARTS may assist researchers interested in the discovery and detailed analysis of the diversity and evolution of retrotransposons, LTR-RTs, and other protein-coding TEs.
Keywords: LTR retrotransposons; automatic pipeline; domain annotation; retroelements; software
Year: 2021 PMID: 35052350 PMCID: PMC8775202 DOI: 10.3390/genes13010009
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Principal scheme of the DARTS (Domain-Associated Retrotransposon Search) workflow. A detailed description of each step is given in the text.
Figure 2. Sensitivity of LTR retrotransposon (LTR-RT) identification by the DARTS and LTRharvest pipelines. Orange circles: number of elements found by DARTS; green circles: number of LTR-RTs found by DARTS with predicted LTRs; purple circles: number of elements found by LTRharvest. Circle sizes are proportional to the number of elements, scaled relative to the minimum and maximum values. Exact numbers of elements are indicated to the left of the circles. The LTRharvest search parameters were the same for both approaches (a,b). (a) Prediction of LTR-RTs by DARTS when the search was initiated from the RT domain; LTRharvest elements were retained if the RT domain was present. (b) Prediction of LTR-RTs by DARTS when the search was initiated from the aRNH domain; DARTS and LTRharvest elements were retained if both the RT and aRNH domains were present.
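The retention rule in the Figure 2 caption (keep an element only if it carries the required domains) can be illustrated with a minimal Python sketch. This is not the DARTS or LTRharvest implementation. The `Element` record, the `retain` helper, and the candidate data are all hypothetical and serve only to show the set-inclusion logic behind panels (a) and (b).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical record: a candidate element and the domains annotated on it."""
    name: str
    domains: frozenset = frozenset()


def retain(elements, required):
    """Keep elements whose annotated domains include every required domain."""
    required = frozenset(required)
    return [e for e in elements if required <= e.domains]


# Made-up candidates, for illustration only (not data from the paper).
candidates = [
    Element("el1", frozenset({"RT", "aRNH"})),
    Element("el2", frozenset({"RT"})),
    Element("el3", frozenset({"aRNH"})),
    Element("el4"),
]

# Panel (a) style filter: the RT domain must be present.
panel_a = retain(candidates, {"RT"})
# Panel (b) style filter: both RT and aRNH must be present.
panel_b = retain(candidates, {"RT", "aRNH"})

print([e.name for e in panel_a])
print([e.name for e in panel_b])
```

Requiring both domains in (b) is a strictly tighter filter than requiring RT alone in (a), so the set retained under (b) is always a subset of the set retained under (a) for the same candidates.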