MOTIVATION: Repetitive DNA makes up large portions of plant and animal nuclear genomes, yet it remains the least-characterized genome component in most species studied so far. Although the recent availability of high-throughput sequencing data provides the resources needed for in-depth investigation of genomic repeats, its utility is hampered by the lack of specialized bioinformatics tools and of computational resources that would let biologically oriented researchers run large-scale repeat analyses.

RESULTS: Here we present RepeatExplorer, a collection of software tools for characterizing repetitive elements, accessible via a web interface. A key component of the server is a computational pipeline that uses a graph-based sequence clustering algorithm for de novo repeat identification, without the need for reference databases of known elements. Because the algorithm takes as input short sequences randomly sampled from the genome, it is ideal for analyzing next-generation sequence reads. Additional tools aid in classifying the identified repeats, investigating phylogenetic relationships of retroelements, and comparing repeat composition across multiple species. The server can analyze several million sequence reads, which typically allows identification of most high- and medium-copy repeats in higher plant genomes.
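The abstract does not spell out the clustering algorithm. As a rough illustration only, the Python sketch below shows the general idea of graph-based clustering of randomly sampled reads: reads become nodes, similar reads are joined by edges, and each connected group of reads is treated as a candidate repeat family. It is not RepeatExplorer's implementation. Shared k-mers stand in for a proper pairwise similarity search, plain connected components stand in for whatever graph partitioning the pipeline actually uses, and the function names (`cluster_reads`, `genome_proportions`) and parameter values (`k=11`, `min_shared=3`) are hypothetical. It also assumes that, because reads are sampled at random, the fraction of reads falling into a cluster roughly tracks that repeat's share of the genome.

```python
from collections import defaultdict
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=11, min_shared=3):
    """Cluster reads as connected components of a read-similarity graph.

    Nodes are reads; an edge joins two reads that share at least
    `min_shared` distinct k-mers (a simplified, hypothetical stand-in for
    a real pairwise similarity search). Returns clusters as lists of read
    indices, largest cluster first.
    """
    # Inverted index k-mer -> reads containing it, so only reads that
    # share at least one k-mer are ever compared.
    index = defaultdict(set)
    for i, read in enumerate(reads):
        for km in kmer_set(read, k):
            index[km].add(i)

    # Count distinct shared k-mers for every candidate read pair.
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    # Union-find: merge the endpoints of every qualifying edge.
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    # Collect reads by component root, then sort components by size.
    groups = defaultdict(list)
    for i in range(len(reads)):
        groups[find(i)].append(i)
    return sorted(groups.values(), key=len, reverse=True)


def genome_proportions(clusters, n_reads):
    """Fraction of all reads in each cluster; under random sampling this
    is assumed to approximate each repeat family's share of the genome."""
    return [len(c) / n_reads for c in clusters]


if __name__ == "__main__":
    # Three overlapping reads from one toy "repeat" plus two unrelated reads.
    repeat = "ACGTACGGTTCAGCTAGGCTTACCGATCGATGGCATTACG"
    reads = [repeat[0:30], repeat[5:35], repeat[10:40], "A" * 30, "C" * 30]
    clusters = cluster_reads(reads)
    print(clusters)                                   # [[0, 1, 2], [3], [4]]
    print(genome_proportions(clusters, len(reads)))   # [0.6, 0.2, 0.2]
```

The inverted k-mer index avoids comparing every read against every other read, which matters once the input reaches millions of reads. Plain connected components are the simplest possible partitioning; they can chain unrelated reads together through a few bridging reads, so a production pipeline would need a stricter similarity test and a more careful way of splitting the graph.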
Authors: Paul G Wolf; Emily B Sessa; Daniel Blaine Marchant; Fay-Wei Li; Carl J Rothfels; Erin M Sigel; Matthew A Gitzendanner; Clayton J Visger; Jo Ann Banks; Douglas E Soltis; Pamela S Soltis; Kathleen M Pryer; Joshua P Der Journal: Genome Biol Evol Date: 2015-08-26 Impact factor: 3.416
Authors: Petr Novák; Laura Ávila Robledillo; Andrea Koblížková; Iva Vrbová; Pavel Neumann; Jiří Macas Journal: Nucleic Acids Res Date: 2017-07-07 Impact factor: 16.971
Authors: Ilya V Kirov; Anna V Kiseleva; Katrijn Van Laere; Nadine Van Roy; Ludmila I Khrustaleva Journal: Mol Genet Genomics Date: 2017-02-01 Impact factor: 3.291
Authors: Francisco J Ruiz-Ruano; Ángeles Cuadrado; Eugenia E Montiel; Juan Pedro M Camacho; María Dolores López-León Journal: Chromosoma Date: 2014-11-12 Impact factor: 4.316
Authors: Francisco J Ruiz-Ruano; Josefa Cabrero; María Dolores López-León; Antonio Sánchez; Juan Pedro M Camacho Journal: Chromosoma Date: 2017-09-04 Impact factor: 4.316
Authors: Juana Gutiérrez; Luz Lamelas; Gaël Aleix-Mata; María Arroyo; Juan Alberto Marchal; Teresa Palomeque; Pedro Lorite; Antonio Sánchez Journal: Genetica Date: 2018-08-25 Impact factor: 1.082