Sergey I Mitrofanov, Alexander Y Panchin, Sergei A Spirin, Andrei V Alexeevski, Yuri V Panchin.
Abstract
We studied the distribution of 1-7 bp words in a dataset that includes 139 complete eukaryotic genomes, 33 masked eukaryotic genomes and coding regions from 35 genomes. We tested different statistical models to determine over- and under-represented words. The method described by Karlin et al. had the strongest predictive power of the methods tested. Using this method we identified over- and under-represented words that are consistent across a large array of taxonomic groups. Some of those words have not yet been described as exclusive. For example, CGCG is over-represented in CG-deficient organisms. We also describe exceptions for widely known exclusive words, such as CG and TA.
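
The abstract does not spell out the statistic used. As an illustration only, the simplest member of the family associated with Karlin and co-workers is the dinucleotide odds ratio, rho(XY) = f(XY) / (f(X) * f(Y)), where f denotes an observed frequency. The sketch below is a minimal single-strand version under that assumption; the function name `dinucleotide_odds_ratios` is hypothetical, and the paper's treatment of longer words (3-7 bp) and of both strands is not reproduced here.

```python
from collections import Counter

BASES = set("ACGT")

def dinucleotide_odds_ratios(seq):
    """Return {XY: f(XY) / (f(X) * f(Y))} for each dinucleotide seen in seq.

    Assumption: single-strand frequencies; positions with characters
    other than A, C, G, T (e.g. N or masked bases) are skipped.
    """
    seq = seq.upper()
    # Mononucleotide counts over unambiguous bases only.
    mono = Counter(c for c in seq if c in BASES)
    # Overlapping dinucleotide counts where both bases are unambiguous.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    rho = {}
    for word, count in di.items():
        f_xy = count / n_di
        f_x = mono[word[0]] / n_mono
        f_y = mono[word[1]] / n_mono
        rho[word] = f_xy / (f_x * f_y)
    return rho
```

A ratio well below 1 suggests the word is under-represented relative to its base composition, and a ratio well above 1 suggests over-representation; for example, `dinucleotide_odds_ratios(genome).get("CG")` would be the value to inspect for the CG deficiency mentioned above, with `genome` being any sequence string.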
Year: 2010 PMID: 20556860 DOI: 10.1142/S0219720010004719
Source DB: PubMed Journal: J Bioinform Comput Biol ISSN: 0219-7200 Impact factor: 1.122