Specificity Analysis of Genome Based on Statistically Identical K-Words With Same Base Combination.

Hyein Seo, Yong-Joon Song, Kiho Cho, Dong-Ho Cho

Abstract

Goal: The characteristics of an individual organism are determined by its genome, a complex combination of bases. This combination is reflected in the k-word profile, which records how often each word of k consecutive bases occurs. Analyzing the genome-specific statistical properties of the k-word profile is therefore important for understanding the characteristics of a genome. In this paper, we propose a new k-word-based method for analyzing genome-specific properties.
Methods: We define k-words that contain the same number of each base as statistically identical k-words. Under a statistical prediction, statistically identical k-words are expected to appear at similar frequencies. This may not hold in a real genome, however, because a genome is not a random list of bases. The ratio between the frequencies of two statistically identical k-words can therefore be used to investigate the statistical specificity of the genome reflected in the k-word profile. To find the ratios that best represent genomic characteristics, we calculate for each ratio a reference value that yields the minimum error when the data are classified by that ratio alone. Finally, we propose a genetic algorithm-based search to select a minimal set of ratios useful for classification.
Results: The proposed method was applied to full-length microbial genome sequences for pathogenicity classification. Its classification accuracy was similar to that of conventional methods while using only a few features.
Conclusions: We proposed a new method for investigating the genome-specific statistical specificity in the k-word profile, which can be applied to find important properties of a genome and to classify genome sequences.
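
The ratio and reference-value steps described under Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "statistically identical" means two k-words contain the same count of each of A, C, G and T, that only k-words actually observed in the sequence are paired, and that the reference value is a simple threshold chosen among midpoints of the observed ratios. All function names (`kword_counts`, `frequency_ratios`, `best_threshold`) are hypothetical, and the genetic algorithm-based selection of ratios is not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations


def kword_counts(seq, k):
    """Count every overlapping k-word in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def composition(word):
    """Base composition as counts of (A, C, G, T); base order is ignored."""
    return tuple(word.count(base) for base in "ACGT")


def frequency_ratios(counts):
    """Ratio count(w1) / count(w2) for each pair of observed k-words
    that share the same base composition (assumed definition)."""
    groups = defaultdict(list)
    for word in counts:
        groups[composition(word)].append(word)
    ratios = {}
    for words in groups.values():
        for w1, w2 in combinations(sorted(words), 2):
            ratios[(w1, w2)] = counts[w1] / counts[w2]
    return ratios


def best_threshold(values, labels):
    """Threshold on one ratio that misclassifies the fewest samples.

    Candidate thresholds are midpoints between sorted distinct values.
    Both polarities (label 1 above or below the threshold) are tried.
    Returns (threshold, errors), or None if all values are equal.
    """
    points = sorted(set(values))
    best = None
    for low, high in zip(points, points[1:]):
        t = (low + high) / 2
        wrong = sum((v > t) != bool(y) for v, y in zip(values, labels))
        wrong = min(wrong, len(values) - wrong)  # flipped polarity
        if best is None or wrong < best[1]:
            best = (t, wrong)
    return best


if __name__ == "__main__":
    counts = kword_counts("ACTACTCAG", 2)
    print(frequency_ratios(counts))
    # One ratio measured on four hypothetical genomes with binary labels.
    print(best_threshold([0.5, 1.0, 2.0, 4.0], [0, 0, 1, 1]))
```

In a random sequence each ratio would be expected to lie near 1, so ratios that deviate strongly and consistently within one class are the natural candidates for classification features.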

Keywords:  Alignment-free; genetic algorithm; k-word; microbial pathogenicity; statistical specificity in k-word profile

Year:  2020        PMID: 35402963      PMCID: PMC8983152          DOI: 10.1109/OJEMB.2020.3009055

Source DB:  PubMed          Journal:  IEEE Open J Eng Med Biol        ISSN: 2644-1276

