Duplication count distributions in DNA sequences.

Suzanne S Sindi, Brian R Hunt, James A Yorke.

Abstract

We study quantitative features of complex repetitive DNA in several genomes by examining sequences long enough that they are unlikely to have been repeated by chance. For each genome we study, we determine the number of identical copies, the "duplication count," of each sequence of length 40, that is, of each "40-mer." We say a 40-mer is "repeated" if its duplication count is at least 2. We focus mainly on "complex" 40-mers, those without short internal repetitions. We find that we can classify most of the complex repeated 40-mers into two categories: one category has its copies clustered closely together on one chromosome; the other has its copies distributed widely across multiple chromosomes. For each genome and each of these two categories, we compute N(c), the number of 40-mers that have duplication count c, for each integer c. In each case, we observe a power-law-like decay in N(c) as c increases from 3 to 50 or higher. In particular, we find that N(c) decays much more slowly than would be predicted by evolutionary models in which each 40-mer is equally likely to be duplicated. We also analyze an evolutionary model that does reflect the slow decay of N(c).
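
The two quantities defined in the abstract, the duplication count of each k-mer and the distribution N(c), can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `duplication_counts` and `count_distribution` are hypothetical, only exact matches on a single strand are counted, and no filter for "complex" 40-mers is applied, since the abstract does not give its precise definition. An in-memory counter like this is practical only for short sequences.

```python
from collections import Counter


def duplication_counts(sequence, k=40):
    """Map each k-mer in `sequence` to its number of exact occurrences.

    Assumption: overlapping windows on one strand, case-insensitive.
    """
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def count_distribution(dup_counts):
    """Return N(c): the number of distinct k-mers with duplication count c."""
    n_of_c = Counter(dup_counts.values())
    return dict(sorted(n_of_c.items()))


if __name__ == "__main__":
    # Toy input with k=4 so the result can be checked by hand.
    toy = "ACGTACGTTTACGT"
    counts = duplication_counts(toy, k=4)
    print(count_distribution(counts))  # {1: 6, 2: 1, 3: 1}
```

In the toy sequence, ACGT occurs three times and TACG twice, so N(3) = 1 and N(2) = 1; the remaining six 4-mers are unique. Under the abstract's definition, a k-mer is "repeated" when its duplication count is at least 2.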


Year:  2008        PMID: 19256873      PMCID: PMC3121164          DOI: 10.1103/PhysRevE.78.061912

Source DB:  PubMed          Journal:  Phys Rev E Stat Nonlin Soft Matter Phys        ISSN: 1539-3755


  7 in total

1.  A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.

Authors:  Guillaume Marçais; Carl Kingsford
Journal:  Bioinformatics       Date:  2011-01-07       Impact factor: 6.937

2.  Algebraic distribution of segmental duplication lengths in whole-genome sequence self-alignments.

Authors:  Kun Gao; Jonathan Miller
Journal:  PLoS One       Date:  2011-07-14       Impact factor: 3.240

3.  Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome.

Authors:  Wentian Li; Jan Freudenberg; Pedro Miramontes
Journal:  BMC Bioinformatics       Date:  2014-01-03       Impact factor: 3.169

4.  How evolution of genomes is reflected in exact DNA sequence match statistics.

Authors:  Florian Massip; Michael Sheinman; Sophie Schbath; Peter F Arndt
Journal:  Mol Biol Evol       Date:  2014-11-13       Impact factor: 16.240

5.  Evolutionary dynamics of selfish DNA explains the abundance distribution of genomic subsequences.

Authors:  Michael Sheinman; Anna Ramisch; Florian Massip; Peter F Arndt
Journal:  Sci Rep       Date:  2016-08-04       Impact factor: 4.379

6.  Genome Sequencing of Hericium coralloides by a Combination of PacBio RS II and Next-Generation Sequencing Platforms.

Authors:  Caixia Zhang; Lijun Xu; Jian Li; Jiansong Chen; Manjun Yang
Journal:  Int J Genomics       Date:  2022-01-31       Impact factor: 2.326

7.  A benchmark study of k-mer counting methods for high-throughput sequencing.

Authors:  Swati C Manekar; Shailesh R Sathe
Journal:  Gigascience       Date:  2018-12-01       Impact factor: 6.524
