Specificity Analysis of Genome Based on Statistically Identical K-Words With Same Base Combination.

Hyein Seo¹, Yong-Joon Song¹, Kiho Cho², Dong-Ho Cho¹.

Abstract

Goal: The characteristics of an individual are determined by its genome, which is a complex combination of bases. This combination is reflected in the k-word profile, which records how often each word of k consecutive bases occurs. Analyzing genome-specific statistical specificity in the k-word profile is therefore important for understanding the characteristics of a genome. In this paper, we propose a new k-word-based method for analyzing genome-specific properties.
Methods: We define k-words that contain the same number of each base as statistically identical k-words. Under statistical prediction, statistically identical k-words are expected to appear at similar frequencies. This may not hold in a real genome, however, because a genome is not a random sequence of bases. The ratio between the frequencies of two statistically identical k-words can therefore be used to investigate the statistical specificity of the genome reflected in its k-word profile. To find the ratios that best represent genomic characteristics, we calculate for each ratio a reference value that minimizes the error when the data are classified by that ratio alone. Finally, we propose a genetic-algorithm-based search to select a minimal set of ratios useful for classification.
Results: The proposed method was applied to full-length sequences of microorganisms for pathogenicity classification. The classification accuracy of the proposed algorithm was similar to that of conventional methods while using only a few features.
Conclusions: We proposed a new method for investigating genome-specific statistical specificity in the k-word profile, which can be applied to find important properties of a genome and to classify genome sequences.
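
The core quantities described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kword_profile`, `composition`, `identical_ratios`, `best_threshold`), the pseudocount used to avoid division by zero, and the exhaustive midpoint search for the reference value are all hypothetical choices made for this example. The genetic-algorithm feature search is not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations


def kword_profile(seq, k):
    """Count every overlapping k-word in seq, skipping words with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def composition(word):
    """Base composition as (#A, #C, #G, #T).

    Assumption: two k-words are 'statistically identical' when these tuples match.
    """
    return tuple(word.count(base) for base in "ACGT")


def identical_ratios(profile, pseudocount=1.0):
    """Count ratio for each pair of observed k-words that share a composition.

    The pseudocount is an assumption of this sketch, added to avoid division by zero.
    """
    groups = defaultdict(list)
    for word in profile:
        groups[composition(word)].append(word)
    ratios = {}
    for words in groups.values():
        for w1, w2 in combinations(sorted(words), 2):
            ratios[(w1, w2)] = (profile[w1] + pseudocount) / (profile[w2] + pseudocount)
    return ratios


def best_threshold(values, labels):
    """Reference value that minimises the error of a one-ratio threshold classifier.

    Tries the midpoint between each pair of adjacent distinct values and allows
    either direction of the rule. Returns (misclassified_count, threshold);
    the threshold is None if all values are equal.
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:]) if a != b]
    best = (len(pairs) + 1, None)
    for t in candidates:
        errors = sum((v > t) != bool(y) for v, y in pairs)
        errors = min(errors, len(pairs) - errors)  # mirror rule: value <= t means class 1
        if errors < best[0]:
            best = (errors, t)
    return best


if __name__ == "__main__":
    profile = kword_profile("ACACAC", 2)
    print(identical_ratios(profile, pseudocount=0.0))
    print(best_threshold([1.0, 2.0, 4.0, 5.0], [0, 0, 1, 1]))
```

In this sketch, one ratio is computed per genome for a chosen pair of k-words, and `best_threshold` is then run across genomes with known labels (for example pathogenic versus non-pathogenic) to score how well that single ratio separates the classes.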

Entities: Chemical

Keywords: Alignment-free; genetic algorithm; k-word; microbial pathogenicity; statistical specificity in k-word profile

Year: 2020 PMID: 35402963 PMCID: PMC8983152 DOI: 10.1109/OJEMB.2020.3009055

Source DB: PubMed Journal: IEEE Open J Eng Med Biol ISSN: 2644-1276
