
LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons.

Zhao Xu, Hao Wang

Abstract

Long terminal repeat retrotransposons (LTR elements) are ubiquitous eukaryotic transposable elements that play important roles in the evolution of genes and genomes. The ever-growing amount of genomic sequence from many organisms makes identifying them quickly a great challenge, yet identification is the first and indispensable step in studying their structure, distribution, functions and other biological impacts. Tools for efficient LTR retrotransposon discovery have so far been very limited, so we developed the LTR_FINDER web server. Given DNA sequences, it predicts the locations and structure of full-length LTR retrotransposons accurately by considering common structural features. LTR_FINDER is capable of scanning large-scale sequences rapidly and is the first web server for ab initio LTR retrotransposon finding. We illustrate its usage and performance on the genome of Saccharomyces cerevisiae. The web server is freely accessible at http://tlife.fudan.edu.cn/ltr_finder/.

Year:  2007        PMID: 17485477      PMCID: PMC1933203          DOI: 10.1093/nar/gkm286

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

LTR retrotransposons exist in all eukaryotic genomes (1–4) and are especially widespread in plants, where they have been found to be the main components of large genomes (5–8). The dynamics of these elements are now regarded as an important force in genome and gene evolution. For example, their amplification and removal shape the organization and change the size of genomes (9,10); their transposition affects gene expression (11); and cases of gene movement via LTR retrotransposons have also been reported recently (12). High-throughput DNA sequencing technologies provide an unprecedented chance to explore their functions and evolutionary impact on the basis of large-scale genetic information (13–16), so efficient tools for locating these elements in rapidly deposited genomic sequences are urgently needed. To date, the most widely adopted methods of LTR retrotransposon identification in DNA sequences are based on aligning a database of known elements to the target genome. This class of methods detects elements that are in the database well, but can hardly discover elements that are only distantly related to database entries or absent from the database altogether. On the other hand, analysis of many LTR element sequences over nearly 20 years has revealed structural features (signals) common to these elements, including Long Terminal Repeats (LTRs), Target Site Repeats (TSRs), Primer Binding Sites (PBSs), the Polypurine Tract (PPT) and the TG … CA box, as well as sites of Reverse Transcriptase (RT), Integrase (IN) and RNaseH (RH). These results have made ab initio computational discovery of LTR elements possible. However, tools for ab initio detection of LTR retrotransposons are still very limited: to the best of our knowledge, only two programs, LTR_STRUC (17) and LTR_par (18), have been reported, and neither of them is a web server. We present here LTR_FINDER, a web server for efficient discovery of full-length LTR elements in large-scale DNA sequences.
By considering the relationship between neighboring exactly matched sequence pairs, LTR_FINDER applies rapid algorithms to construct reliable LTRs and to predict accurate element boundaries through a multi-refinement process. Furthermore, it detects important enzyme domains to improve the confidence of predictions for autonomous elements. LTR_FINDER is freely available at http://tlife.fudan.edu.cn/ltr_finder/.

INPUT AND OUTPUT

User input

LTR_FINDER accepts DNA sequence files in FASTA or multi-FASTA format. Only the first ungapped string in the description line is recorded to identify the input sequence; the rest of the description is ignored. In the sequence lines, only A, C, G, T and N are allowed, and aligning an ‘N’ with any character is treated as a mismatch. Users can paste sequences in the ‘Sequence’ box or upload a local file in the ‘File upload’ box. An uploaded file should not exceed 50 Mb. For users who need to scan very large sequences, binaries are available on request. When submitting a job, users can choose different parameters for different purposes. We explain some commonly used parameters here. The ‘tRNAs database’ of the target species is used for PBS prediction. Because tRNAs are relatively conserved across organisms, those of a closely related species can be used if those of the target species are not available. Because the PBS is critical for deciding the 3′ boundary of the 5′ LTR, omitting this parameter will probably cause elements to be missed. RT, IN and RH domains are important for an element to transpose, and the occurrence of these sites adds weight to a candidate model being a true autonomous element. If users choose domains in the ‘Domain restriction’ options, only models containing the selected domains are reported. ‘Extension cutoff’ controls whether two neighboring exactly matched pairs should be joined into a longer one, that is, whether the regions covering them are regarded as a single longer, highly similar pair. ‘Reliable extension’ affects the identification of obscure overlapping elements. The higher the value is, the more models will be reported.
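The input rules above can be checked locally before uploading a file. The following is a minimal sketch, not the server's own parser: the function name `read_fasta` is hypothetical, and upper-casing lowercase bases before validation is an assumption, since the text does not say how the server treats lowercase input.

```python
from typing import Dict, Iterable, List

ALLOWED = set("ACGTN")  # the only characters the text says are accepted


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA or multi-FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-free token of the description
    line; the rest of the description is ignored, as described above.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("description line has no identifier")
            current = fields[0]
            if current in records:
                raise ValueError(f"duplicate identifier: {current}")
            records[current] = []
        else:
            if current is None:
                raise ValueError("sequence data before first description line")
            seq = line.upper()  # assumption: lowercase bases are tolerated
            bad = set(seq) - ALLOWED
            if bad:
                raise ValueError(f"illegal characters in {current}: {sorted(bad)}")
            records[current].append(seq)
    return {name: "".join(parts) for name, parts in records.items()}
```

For example, a record headed `>chrX yeast chromosome 10` would be stored under the identifier `chrX`, and a sequence line containing `U` would be rejected.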

Program output

LTR_FINDER offers two types of output: full-output and summary-output. Full-output shows the details of predictions, including LTR sizes, element locations in the input sequence, the similarity of the two LTRs, sharpness (an index of the reliability of the predicted LTR boundaries) and so on. Summary-output is extracted from full-output by omitting some of the detailed information. For each sequence, a diagram can be drawn together with either type of output; it visualizes the location information of full-output. Users can obtain it by clicking on the ‘Output with figure’ button. The diagrams are convenient for human inspection and are very useful when analyzing potentially overlapping elements, because one can view the relative positions of signals inside LTR elements in detail. In a diagram, two background colors, silver and white, are used to show the sizes of objects. The program draws l pixels to represent l bases on the silver background, and n·log(l) pixels to represent l bases on the white background, where n is a constant controlling the overall size of the diagram. If users fill in the ‘Get result by e-mail’ box with a valid email address, the server will send the result instead of displaying it. The output file will be stored on the server for 3 days.
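The two drawing scales can be written out as a small function. This is only an illustrative sketch: the function name, the default value of n and the use of the natural logarithm are assumptions, because the text gives neither the value of n nor the base of the logarithm.

```python
import math


def pixels_for(length_bp: int, background: str, n: float = 10.0) -> float:
    """Width in pixels used to draw a feature of `length_bp` bases.

    'silver' regions are drawn to scale (l pixels for l bases);
    'white' regions are compressed to n * log(l) pixels.
    """
    if length_bp < 1:
        raise ValueError("length must be at least 1 base")
    if background == "silver":
        return float(length_bp)
    if background == "white":
        return n * math.log(length_bp)  # assumption: natural logarithm
    raise ValueError("background must be 'silver' or 'white'")
```

The logarithmic scale keeps long internal regions from dominating the picture, while short features on the silver background keep their true proportions.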

APPLICATION EXAMPLES

We describe an example of running LTR_FINDER on yeast chromosome X (chromosome 10) to show the usage of the server. First, upload the sequence file, which can be obtained from the Saccharomyces Genome Database (http://www.yeastgenome.org/). Here we use the version released on July 27, 1997 in order to compare the results with those described in (19), in which a standard benchmark of 50 full-length LTR retrotransposons on the 16 yeast chromosomes was given. Using the default parameters and choosing ‘Saccharomyces cerevisiae tRNA database’ and ‘Output with figure’, we get the results shown in Figures 1 and 2. Figure 1 gives a complete description of element 1 (diagrams of the same element appear in Figures 2 and 3). The output items are explained in the caption of Figure 1, and more information on the output format can be found in the documents on the webpage. The diagram of this run is shown in Figure 2. Yeast chromosome X contains a region where two tandem elements resulted from recombination. The program reports two sets of RTs and INs, indicating the tandem structure (Figure 2, element 2). A more sensitive search for overlapping elements, made by resetting the ‘Reliable extension’ and ‘Sharpness lower threshold’ parameters, reports the inserted LTR (Figure 3, element 3). Compared with the benchmark, the locations of all elements are accurately predicted.
Figure 1.

LTR_FINDER sample output. ‘Status’ is an 11-bit binary string in which each position indicates the occurrence of a certain signal: a position is recorded as ‘1’ if the signal appears and as ‘0’ otherwise. From left to right, the positions are as follows: [1] TG at the 5′ end of the 5′ LTR; [2] CA at the 3′ end of the 5′ LTR; [3] TG at the 5′ end of the 3′ LTR; [4] CA at the 3′ end of the 3′ LTR; [5] TSR; [6] PBS; [7] PPT; [8] RT; [9] IN(core); [10] IN(c-term) and [11] RH. ‘Score’ is an integer ranging from 0 to 11; each detected signal adds 1 to its value.
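The ‘Status’ and ‘Score’ fields described in this caption can be decoded mechanically. The sketch below follows the position order given above; the function name `decode_status` and the example string are illustrative and are not taken from the actual output in Figure 1.

```python
from typing import List, Tuple

# Signal names in the left-to-right order given in the caption.
SIGNALS = (
    "TG at 5' end of 5' LTR",
    "CA at 3' end of 5' LTR",
    "TG at 5' end of 3' LTR",
    "CA at 3' end of 3' LTR",
    "TSR",
    "PBS",
    "PPT",
    "RT",
    "IN(core)",
    "IN(c-term)",
    "RH",
)


def decode_status(status: str) -> Tuple[List[str], int]:
    """Return the detected signals and the score for an 11-bit status string."""
    if len(status) != len(SIGNALS) or set(status) - {"0", "1"}:
        raise ValueError("status must be a string of exactly 11 '0'/'1' characters")
    present = [name for bit, name in zip(status, SIGNALS) if bit == "1"]
    return present, len(present)  # each detected signal adds 1 to the score
```

With a made-up status of `11110110000`, the function reports both TG … CA boxes, the PBS and the PPT, for a score of 6.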

Figure 2.

Diagram of two predicted elements with default parameters. Information on element 1 is shown in Figure 1. Element 2 is composed of two tandem LTR retrotransposons, which resulted from the recombined insertion of a circular element. Two sets of enzyme domains are detected.

Figure 3.

Diagram of two tandem elements. With ‘Reliable extension’ set to 0.95 and ‘Sharpness lower threshold’ set to 0.2, the inserted element (element 3), whose 5′ LTR is located at 477837–478072, is reported.

Using the whole genome of yeast (∼12 Mb) as input, the web server, implemented on a 600 MHz PC, took only 30 s, with RAM consumption below 18 MB. A total of 52 models were detected and all 50 target elements were found. Of the test set, 48 were identified exactly; the remaining two predictions contained their targets, with only 7 bp and 18 bp more in the 5′ LTRs, respectively. The test thus gave no false negatives and only two false positives, showing high speed, high sensitivity (100%) and specificity (96%).
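The reported percentages can be reproduced from the counts in the text. In the sketch below, the function name is hypothetical, and reading the 96% figure as 50/52 (true positives over all reported models, a quantity often called precision or positive predictive value) is an inference from the numbers rather than a definition stated in the paper.

```python
from typing import Dict


def detection_rates(n_targets: int, n_reported: int, n_targets_found: int) -> Dict[str, float]:
    """Summarize benchmark results from raw counts."""
    return {
        "false_negatives": n_targets - n_targets_found,
        "false_positives": n_reported - n_targets_found,
        # fraction of benchmark elements that were recovered
        "sensitivity": n_targets_found / n_targets,
        # fraction of reported models that match a benchmark element
        "precision": n_targets_found / n_reported,
    }


# Counts from the yeast benchmark: 50 targets, 52 reported models, 50 targets found.
rates = detection_rates(n_targets=50, n_reported=52, n_targets_found=50)
print(rates)
```

With these counts the sensitivity is 50/50 and the second ratio is 50/52, which rounds to 96%.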

LTR ELEMENT DISCOVERY STRATEGIES

LTR_FINDER identifies full-length LTR element models in genomic sequence in four main steps. The first step selects possible LTR pairs. LTR_FINDER begins by searching for all exactly matched string pairs in the input sequence with a linear-time suffix-array algorithm (20). Each pair, say a, is composed of two identical members: the string located upstream (a5′) and the one located downstream (a3′), where upstream and downstream follow the orientation of the input sequence. The program then keeps the pairs for which the distance between a5′ and a3′, as well as the overall size, satisfies the given restrictions. For each pair a and its downstream neighbor b, if their order in the input sequence is 5′ a5′ … b5′ … a3′ … b3′ 3′, the regions [a5′,b5′] and [a3′,b3′] are checked to decide whether they should be regarded as one longer, highly similar pair. Here ‘highly similar’ means that the similarity between the two members of the merged pair is greater than ‘Extension cutoff’. Calculating the similarity involves a global alignment of two regions: the one between the two neighboring upstream strings and the one between the two downstream strings. The pair keeps extending until the similarity between its members falls below ‘Extension cutoff’, and it is then recorded as an LTR candidate for further analysis.
Next, the Smith–Waterman algorithm is used to adjust the near-end regions of LTR candidates to obtain alignment boundaries. These boundaries are then re-adjusted using supporting evidence from the TG … CA box and the TSR. At the end of this step, a set of regions in the input sequence is marked as possible loci for further verification.
In the second step, LTR_FINDER tries to find signals in the near-LTR regions inside these loci. The program detects the PBS by aligning these regions to the 3′ tails of tRNAs, and the PPT by counting purines in a 15 bp sliding window along these regions. This step produces reliable candidates. Additional validation comes from recognizing important enzyme domains.
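The PPT scan mentioned above, counting purines in a 15 bp sliding window, can be sketched as follows. The window size comes from the text; the function name and the `min_purines` threshold are assumptions, since the text does not state how many purines a window must contain to be reported.

```python
from typing import List, Tuple


def purine_windows(seq: str, window: int = 15, min_purines: int = 12) -> List[Tuple[int, int]]:
    """Return (start, purine_count) for every window with enough purines.

    Purines are A and G. The count is updated incrementally as the window
    slides, so the scan takes time linear in the sequence length.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_purine = [1 if base in "AG" else 0 for base in seq]
    count = sum(is_purine[:window])
    hits = []
    if count >= min_purines:
        hits.append((0, count))
    for start in range(1, len(seq) - window + 1):
        # add the base entering the window, drop the base leaving it
        count += is_purine[start + window - 1] - is_purine[start - 1]
        if count >= min_purines:
            hits.append((start, count))
    return hits


# Toy region: a 15 bp purine run flanked by pyrimidines.
region = "CCCC" + "AGAGAGAGAGAGAGA" + "TTTT"
print(purine_windows(region, min_purines=14))
```

In this toy region the windows starting at offsets 3, 4 and 5 pass a threshold of 14, with the fully purine window at offset 4.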
The program locates the most widely shared domain, RT, by first searching for its seven conserved subdomains and then chaining them together under distance restrictions using dynamic programming. This strategy is applied to all six reading frames and can detect the RT domain even when there is a frame shift. For other protein domains such as IN and RH, it calls PS_SCAN (21) to find their locations and possible ORFs. Finally, the program gathers this information and reports possible LTR retrotransposon models at different confidence levels according to how many signals and domains they hit.
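The chaining of RT subdomain hits can be illustrated with a generic dynamic program. This is a sketch of the general technique and not LTR_FINDER's implementation: the hit representation, the scores and the `max_gap` value are hypothetical, and the sketch treats all hits as positions on one strand without modeling reading frames.

```python
from typing import List, Tuple

# (subdomain index 0..6, start position in the sequence, match score)
Hit = Tuple[int, int, float]


def best_chain(hits: List[Hit], max_gap: int = 300) -> Tuple[float, List[Hit]]:
    """Find the highest-scoring chain of subdomain hits.

    A hit may follow another only if its subdomain index is larger, it
    starts further downstream, and the two starts are at most `max_gap` apart.
    """
    if not hits:
        return 0.0, []
    hits = sorted(hits, key=lambda h: (h[1], h[0]))
    best = [h[2] for h in hits]   # best chain score ending at each hit
    prev = [-1] * len(hits)       # back-pointer to the previous hit in that chain
    for j, (dom_j, pos_j, score_j) in enumerate(hits):
        for i in range(j):
            dom_i, pos_i, _ = hits[i]
            if dom_i < dom_j and 0 < pos_j - pos_i <= max_gap:
                candidate = best[i] + score_j
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i
    end = max(range(len(hits)), key=best.__getitem__)
    chain = []
    k = end
    while k != -1:
        chain.append(hits[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain
```

Because hits are chained by position rather than by a single ORF, a chain of this kind can in principle join subdomains found in different reading frames, which is one way a frame-shifted domain could still be recovered.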

DISCUSSION

LTR_FINDER is the first web server devoted specifically to full-length LTR retrotransposon discovery. It processes large-scale genomic sequences efficiently, which makes it applicable to the rapid analysis of large genomes such as those of maize and wheat. A few improvements to the server are under way: (i) To make the interface more user-friendly, we plan to add buttons for automatic retrieval of sequences from GenBank, EMBL and DDBJ by accession number. (ii) LTR elements close to functional units (e.g. tRNAs, genes or centromeres) will be reported specially, and the graphic output of the vicinity of LTR elements will be enhanced to reflect the local organization of functional units and LTR elements. (iii) LTR elements are also known to insert into internal regions of other elements to form nested structures, and we expect LTR_FINDER to incorporate modules for finding nested elements.
REFERENCES (10 of 20 shown)

1.  Plant retrotransposons.

Authors:  A Kumar; J L Bennetzen
Journal:  Annu Rev Genet       Date:  1999       Impact factor: 16.830

2.  Initial sequencing and analysis of the human genome.

Authors:  E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; 
H Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal:  Nature       Date:  2001-02-15       Impact factor: 49.962

3.  Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome.

Authors:  B C Meyers; S V Tingey; M Morgante
Journal:  Genome Res       Date:  2001-10       Impact factor: 9.043

4.  LTR_STRUC: a novel search and identification program for LTR retrotransposons.

Authors:  Eugene M McCarthy; John F McDonald
Journal:  Bioinformatics       Date:  2003-02-12       Impact factor: 6.937

5.  Transcriptional activation of retrotransposons alters the expression of adjacent genes in wheat.

Authors:  Khalil Kashkush; Moshe Feldman; Avraham A Levy
Journal:  Nat Genet       Date:  2002-12-16       Impact factor: 38.330

6.  Efficient algorithms and software for detection of full-length LTR retrotransposons.

Authors:  Anantharaman Kalyanaraman; Srinivas Aluru
Journal:  J Bioinform Comput Biol       Date:  2006-04       Impact factor: 1.122

7.  Evolutionary history of Cer elements and their impact on the C. elegans genome.

Authors:  E W Ganko; K T Fielman; J F McDonald
Journal:  Genome Res       Date:  2001-12       Impact factor: 9.043

8.  Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis.

Authors:  Katrien M Devos; James K M Brown; Jeffrey L Bennetzen
Journal:  Genome Res       Date:  2002-07       Impact factor: 9.043

9.  Molecular paleontology of transposable elements in the Drosophila melanogaster genome.

Authors:  Vladimir V Kapitonov; Jerzy Jurka
Journal:  Proc Natl Acad Sci U S A       Date:  2003-05-12       Impact factor: 11.205

10.  Long terminal repeat retrotransposons of Oryza sativa.

Authors:  Eugene M McCarthy; Jingdong Liu; Gao Lizhi; John F McDonald
Journal:  Genome Biol       Date:  2002-09-13       Impact factor: 13.583
