How independent are the appearances of n-mers in different genomes?

Yuriy Fofanov, Yi Luo, Charles Katili, Jim Wang, Yuri Belosludtsev, Thomas Powdrill, Chetan Belapurkar, Viacheslav Fofanov, Tong-Bin Li, Sergey Chumakov, B Montgomery Pettitt.

Abstract

MOTIVATION: Analysis of the statistical properties of DNA sequences is important for evolutionary biology as well as for DNA probe and PCR technologies. These technologies, in turn, can be used for organism identification, with applications in the diagnosis of infectious diseases, environmental studies and other areas.
RESULTS: We present a correlation analysis of the presence/absence distributions of short nucleotide subsequences of different lengths ('n-mers', n = 5-20) in more than 1500 microbial and virus genomes, together with five genomes of multicellular organisms (including human). We record whether a given n-mer is present or absent in a given genome (frequency of presence), rather than the usually calculated number of appearances of n-mers in one or more genomes (frequency of appearance). For organisms that are not close relatives of each other, the presence/absence of different 7- to 20-mers in their genomes is not correlated. For close biological relatives, some correlation in the presence of n-mers in this range appears, but it is weaker than expected. These suppressed correlations among the n-mers present in different genomes make it possible to use random sets of n-mers (with an appropriately chosen n) to discriminate the genomes of different organisms, and possibly individual genomes of the same species, including human, with a low probability of error.
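The distinction between frequency of appearance and frequency of presence, and the correlation of presence/absence between genomes, can be illustrated with a short Python sketch. It is not the authors' code: it assumes uppercase, single-strand ACGT sequences, measures correlation with the Pearson (phi) coefficient over all 4^n possible n-mers, and uses toy random sequences instead of real genomes. The helper names, the random probe set and the Hamming-distance comparison of fingerprints are hypothetical illustrations of the idea of discriminating genomes with random n-mer sets.

```python
import random
from math import sqrt

# Assumption: genomes are uppercase strings over ACGT, read on one strand only.
ALPHABET = "ACGT"
VALID = frozenset(ALPHABET)


def nmer_counts(genome: str, n: int) -> dict:
    """Frequency of appearance: how many times each n-mer occurs in one genome."""
    counts = {}
    for i in range(len(genome) - n + 1):
        nmer = genome[i:i + n]
        if set(nmer) <= VALID:  # skip windows containing ambiguous bases such as 'N'
            counts[nmer] = counts.get(nmer, 0) + 1
    return counts


def nmer_presence(genome: str, n: int) -> set:
    """Frequency of presence: the distinct n-mers seen at least once (counts discarded)."""
    return set(nmer_counts(genome, n))


def presence_phi(a: set, b: set, n: int) -> float:
    """Pearson (phi) correlation of two presence/absence vectors indexed by all 4**n n-mers.

    Only the set sizes and their overlap are needed, so the 4**n-long vectors are never built.
    """
    total = 4 ** n
    both = len(a & b)
    denom = sqrt(len(a) * (total - len(a)) * len(b) * (total - len(b)))
    return (total * both - len(a) * len(b)) / denom if denom else 0.0


def random_probes(n: int, m: int, seed: int = 0) -> list:
    """Hypothetical probe set: m random n-mers drawn uniformly from ACGT."""
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(n)) for _ in range(m)]


def signature(present: set, probes: list) -> list:
    """Binary fingerprint: 1 if a probe n-mer is present in the genome, else 0."""
    return [int(p in present) for p in probes]


def hamming(s1: list, s2: list) -> int:
    """Number of probes whose presence/absence differs between two fingerprints."""
    return sum(x != y for x, y in zip(s1, s2))


# Toy data (hypothetical): two unrelated random "genomes" and a "relative"
# that copies the first half of g1 and the second half of g2.
rng = random.Random(1)
g1 = "".join(rng.choice(ALPHABET) for _ in range(20000))
g2 = "".join(rng.choice(ALPHABET) for _ in range(20000))
relative = g1[:10000] + g2[10000:]

n = 7
p1, p2, pr = (nmer_presence(g, n) for g in (g1, g2, relative))
print("phi(g1, g2)       =", round(presence_phi(p1, p2, n), 3))
print("phi(g1, relative) =", round(presence_phi(p1, pr, n), 3))

probes = random_probes(n, 1000, seed=2)
s1, s2, sr = (signature(p, probes) for p in (p1, p2, pr))
print("fingerprint distance g1-g2:", hamming(s1, s2), " g1-relative:", hamming(s1, sr))
```

With these toy inputs the two unrelated sequences should give a phi coefficient near zero, while the partial "relative" should give a clearly positive one and a smaller fingerprint distance. Real genomes would likely also require handling both strands and choosing n so that each probe is neither almost always present nor almost always absent.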

Year:  2004        PMID: 15087315     DOI: 10.1093/bioinformatics/bth266

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937
