
Identification of repeat structure in large genomes using repeat probability clouds.

Wanjun Gu, Todd A Castoe, Dale J Hedges, Mark A Batzer, David D Pollock.

Abstract

The identification of repeat structure in eukaryotic genomes can be time-consuming and difficult because of the large amount of information (approximately 3 × 10^9 bp) that needs to be processed and compared. We introduce a new approach based on exact word counts to evaluate, de novo, the repeat structure present within large eukaryotic genomes. This approach avoids sequence alignment and similarity search, two of the most time-consuming components of traditional methods for repeat identification. Algorithms were implemented to efficiently calculate exact counts of oligonucleotides of any length in large genomes. Based on these oligonucleotide counts, oligonucleotide excess probability clouds, or "P-clouds," were constructed. P-clouds are composed of clusters of related oligonucleotides that occur, as a group, more often than expected by chance. After construction, P-clouds were mapped back onto the genome, and regions of high P-cloud density were identified as repetitive using a sliding-window approach. This efficient method can analyze the repeat content of the entire human genome on a single desktop computer in less than half a day, at least 10-fold faster than current approaches. The predicted repetitive regions strongly overlap with known repeat elements as well as with other repetitive regions such as gene families, pseudogenes, and segmental duplicons. This method should be extremely useful for the de novo identification of repeat structure in large, newly sequenced genomes.
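The pipeline summarized in the abstract (exact word counts, clustering of over-represented oligonucleotides into P-clouds, mapping them back onto the sequence, and sliding-window density) can be illustrated with a small Python sketch. It is not the authors' implementation. The abstract gives no word length, count thresholds, clustering rule, window size, or density cutoff, so k, core_min, outer_min, window, and min_density below are hypothetical choices; the "more often than expected by chance" criterion is replaced by fixed count thresholds, and growing clouds through one-mismatch neighbors is an assumed rule. The toy genome is invented for illustration, and an in-memory Counter would not scale to a 3 × 10^9 bp genome.

```python
import random
from collections import Counter, deque

ALPHABET = "ACGT"


def count_kmers(seq: str, k: int) -> Counter:
    """Exact count of every length-k word in seq; words containing other symbols (e.g. N) are skipped."""
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(base in ALPHABET for base in word):
            counts[word] += 1
    return counts


def one_mismatch_neighbors(word: str):
    """Yield every word that differs from `word` at exactly one position."""
    for i, base in enumerate(word):
        for alt in ALPHABET:
            if alt != base:
                yield word[:i] + alt + word[i + 1:]


def build_p_clouds(counts: Counter, core_min: int, outer_min: int) -> list[set[str]]:
    """Cluster over-represented words (hypothetical thresholds). Seeds are words counted
    >= core_min times, taken from most to least frequent; each cloud grows breadth-first
    through one-mismatch neighbors counted >= outer_min times. A word joins at most one cloud."""
    assigned: set[str] = set()
    clouds: list[set[str]] = []
    for seed, n in counts.most_common():
        if n < core_min:
            break  # most_common() is sorted by count, so no later word can seed a cloud
        if seed in assigned:
            continue
        cloud, queue = {seed}, deque([seed])
        assigned.add(seed)
        while queue:
            for nb in one_mismatch_neighbors(queue.popleft()):
                # Counter returns 0 for absent words without inserting them
                if nb not in assigned and counts[nb] >= outer_min:
                    assigned.add(nb)
                    cloud.add(nb)
                    queue.append(nb)
        clouds.append(cloud)
    return clouds


def repeat_regions(seq: str, k: int, cloud_words: set[str], window: int, min_density: float) -> list[tuple[int, int]]:
    """Mark every base covered by a cloud word, then return merged half-open intervals
    spanned by windows whose covered fraction is >= min_density."""
    covered = [0] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in cloud_words:
            covered[i:i + k] = [1] * k
    prefix = [0]
    for c in covered:
        prefix.append(prefix[-1] + c)  # prefix[j] = number of covered bases in seq[:j]
    regions: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        end = start + window
        if (prefix[end] - prefix[start]) / window >= min_density:
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # overlapping window: extend current region
            else:
                regions.append((start, end))
    return regions


def random_dna(rng: random.Random, n: int) -> str:
    return "".join(rng.choice(ALPHABET) for _ in range(n))


# Hypothetical toy genome: a 24-bp element, 4 exact copies plus 2 copies carrying a
# C->T substitution at offset 12, separated by random 150-bp spacers.
rng = random.Random(7)
element = "GGCCGTACTTAGCAATGCTCAGGT"
mutant = element[:12] + "T" + element[13:]
parts, starts, pos = [], [], 0
for repeat_copy in [element] * 4 + [mutant] * 2:
    spacer = random_dna(rng, 150)
    parts += [spacer, repeat_copy]
    starts.append(pos + len(spacer))
    pos += len(spacer) + len(repeat_copy)
parts.append(random_dna(rng, 150))
genome = "".join(parts)

# Hypothetical parameters; the abstract does not specify them.
k = 8
counts = count_kmers(genome, k)
clouds = build_p_clouds(counts, core_min=3, outer_min=2)
cloud_words = set().union(*clouds)
regions = repeat_regions(genome, k, cloud_words, window=16, min_density=0.9)
print(regions)
```

In this toy run, every 8-mer of the element occurs at least four times and so enters a cloud, while the 8-mers spanning the substitution occur twice and sit one mismatch away from cloud members, so they join through the lower outer_min threshold. As a result both exact and variant copies fall inside reported regions, whereas random spacer sequence should only rarely reach the 0.9 density cutoff.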


Year:  2008        PMID: 18541131      PMCID: PMC2533575          DOI: 10.1016/j.ab.2008.05.015

Source DB:  PubMed          Journal:  Anal Biochem        ISSN: 0003-2697            Impact factor:   3.365


Cited by: 19 in total

1.  Finding and Characterizing Repeats in Plant Genomes.

Authors:  Jacques Nicolas; Sébastien Tempel; Anna-Sophie Fiston-Lavier; Emira Cherif
Journal:  Methods Mol Biol       Date:  2022

2.  Bioinformatics and genomic analysis of transposable elements in eukaryotic genomes.

Authors:  Mateusz Janicki; Rebecca Rooke; Guojun Yang
Journal:  Chromosome Res       Date:  2011-08       Impact factor: 4.620

3.  Bioinformatics Approaches for Determining the Functional Impact of Repetitive Elements on Non-coding RNAs.

Authors:  Chao Zeng; Atsushi Takeda; Kotaro Sekine; Naoki Osato; Tsukasa Fukunaga; Michiaki Hamada
Journal:  Methods Mol Biol       Date:  2022

4.  The Burmese python genome reveals the molecular basis for extreme adaptation in snakes.

Authors:  Todd A Castoe; A P Jason de Koning; Kathryn T Hall; Daren C Card; Drew R Schield; Matthew K Fujita; Robert P Ruggiero; Jack F Degner; Juan M Daza; Wanjun Gu; Jacobo Reyes-Velasco; Kyle J Shaney; Jill M Castoe; Samuel E Fox; Alex W Poole; Daniel Polanco; Jason Dobry; Michael W Vandewege; Qing Li; Ryan K Schott; Aurélie Kapusta; Patrick Minx; Cédric Feschotte; Peter Uetz; David A Ray; Federico G Hoffmann; Robert Bogden; Eric N Smith; Belinda S W Chang; Freek J Vonk; Nicholas R Casewell; Christiaan V Henkel; Michael K Richardson; Stephen P Mackessy; Anne M Bronikowski; Mark Yandell; Wesley C Warren; Stephen M Secor; David D Pollock
Journal:  Proc Natl Acad Sci U S A       Date:  2013-12-02       Impact factor: 11.205

5.  Considering transposable element diversification in de novo annotation approaches.

Authors:  Timothée Flutre; Elodie Duprat; Catherine Feuillet; Hadi Quesneville
Journal:  PLoS One       Date:  2011-01-31       Impact factor: 3.240

6.  Discovery of highly divergent repeat landscapes in snake genomes using high-throughput sequencing.

Authors:  Todd A Castoe; Kathryn T Hall; Marcel L Guibotsy Mboulas; Wanjun Gu; A P Jason de Koning; Samuel E Fox; Alexander W Poole; Vijetha Vemulapalli; Juan M Daza; Todd Mockler; Eric N Smith; Cédric Feschotte; David D Pollock
Journal:  Genome Biol Evol       Date:  2011-05-13       Impact factor: 3.416

7.  Repetitive elements may comprise over two-thirds of the human genome.

Authors:  A P Jason de Koning; Wanjun Gu; Todd A Castoe; Mark A Batzer; David D Pollock
Journal:  PLoS Genet       Date:  2011-12-01       Impact factor: 5.917

8.  LTR retrotransposons contribute to genomic gigantism in plethodontid salamanders.

Authors:  Cheng Sun; Donald B Shepard; Rebecca A Chong; José López Arriaza; Kathryn Hall; Todd A Castoe; Cédric Feschotte; David D Pollock; Rachel Lockridge Mueller
Journal:  Genome Biol Evol       Date:  2011-12-26       Impact factor: 3.416

9.  LINE dancing in the human genome: transposable elements and disease.

Authors:  Victoria P Belancio; Prescott L Deininger; Astrid M Roy-Engel
Journal:  Genome Med       Date:  2009-10-27       Impact factor: 11.117

10.  Microsatellites for next-generation ecologists: a post-sequencing bioinformatics pipeline.

Authors:  Iria Fernandez-Silva; Jonathan Whitney; Benjamin Wainwright; Kimberly R Andrews; Heather Ylitalo-Ward; Brian W Bowen; Robert J Toonen; Erica Goetze; Stephen A Karl
Journal:  PLoS One       Date:  2013-02-12       Impact factor: 3.240
