Dapeng Wang, Jiayue Xu, Jun Yu.
Abstract
BACKGROUND: The K-mer approach, which treats genomic sequences as strings of simple characters and counts the relative abundance of each string for a fixed K, has been extensively applied to phylogeny inference, genome assembly, annotation, and comparison.
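As a rough illustration of the counting step described above, the Python sketch below computes the relative abundance of every K-mer in a DNA string for a fixed K. It is not KGCAK's implementation. The function name `kmer_relative_abundance` is hypothetical, and the following are all assumptions of this sketch: windows overlap with a step of 1, the input is upper-cased, and windows containing characters other than A/C/G/T (such as N) are skipped.

```python
from collections import Counter

# Assumption: only unambiguous DNA bases are counted.
VALID_BASES = frozenset("ACGT")


def kmer_relative_abundance(seq: str, k: int) -> dict[str, float]:
    """Relative abundance of every overlapping K-mer in `seq` for a fixed K.

    Hypothetical helper: step-1 overlapping windows, upper-cased input, and
    windows with non-ACGT characters (e.g. N) skipped are all assumptions.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    # Slide a window of width k across the sequence and count each string.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID_BASES
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalise raw counts to relative abundances that sum to 1.
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    # "ACGTAC" has four overlapping 3-mers: ACG, CGT, GTA, TAC.
    print(kmer_relative_abundance("ACGTAC", 3))
    # → {'ACG': 0.25, 'CGT': 0.25, 'GTA': 0.25, 'TAC': 0.25}
```

Whether to also merge each K-mer with its reverse complement is a separate design choice that this sketch leaves out.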
Year: 2015 PMID: 26376976 PMCID: PMC4573299 DOI: 10.1186/s13062-015-0083-4
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Fig. 1 A brief overview of KGCAK functionality.
a A tree built with the kitsch method from 5-mers of protein sequences from nuclear genomes.
b A tree built with the kitsch method from 5-mers of protein sequences from mitochondrial genomes.
c An example of genomic-parameter results from DNA sequences of five nuclear genomes. "A Content", "C Content", "G Content", "T Content", "GC Content", and "Purine Content" are the percentages of A, C, G, T, G + C, and A + G, respectively, among T + C + A + G in the genomic sequences; "N Content" is the percentage of N among T + C + A + G + N.
d An example of K-mer statistics from cDNA sequences of five nuclear genomes, using 5-mers. "Information Entropy" is the Shannon information entropy of a K-mer array, $H = -\sum_i P_i \log P_i$, where $P_i$ is the frequency of the i-th K-mer. "Distance to Even" is the sum of squared differences between each element of the K-mer array and the array's global mean.
e An example of the uniqueness ratio from DNA sequences of five nuclear genomes.
f An example of the frequency distribution from DNA sequences of five nuclear genomes, using 8-mers.
g An example of genome-complexity-3D, comparing entropy-DNA-10mer, genome GC content, and genome size across five nuclear genomes.
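Panels a and b show trees built with kitsch, the PHYLIP program for Fitch-Margoliash and least-squares trees under a molecular clock, which takes a distance matrix as input. The caption does not say which distance KGCAK computes between K-mer profiles, so the Python sketch below uses Euclidean distance purely as a placeholder and writes a square, PHYLIP-style distance matrix. The function names `euclidean` and `phylip_distance_matrix` and the toy profiles are hypothetical.

```python
import math


def euclidean(p: list[float], q: list[float]) -> float:
    """Euclidean distance between two K-mer profiles (placeholder metric, an assumption)."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def phylip_distance_matrix(profiles: dict[str, list[float]]) -> str:
    """Render a square, PHYLIP-style distance matrix for a program such as kitsch.

    The first line holds the number of taxa; each following row holds the name
    truncated and padded to 10 characters, then the distances to every taxon.
    """
    names = list(profiles)
    lines = [f"{len(names):5d}"]
    for a in names:
        row = " ".join(f"{euclidean(profiles[a], profiles[b]):.6f}" for b in names)
        lines.append(f"{a[:10]:<10} {row}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical toy profiles; real input would be 5-mer frequency vectors.
    toy = {"genomeA": [0.5, 0.5], "genomeB": [1.0, 0.0], "genomeC": [0.0, 1.0]}
    print(phylip_distance_matrix(toy))
```

PHYLIP's classic input format limits names to 10 characters, which is why names are truncated and padded here.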
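The genomic parameters in panel c follow directly from the denominators stated in the caption. The Python sketch below applies those definitions to a single sequence. The function name `genomic_parameters` is hypothetical. Upper-casing the input and ignoring any other characters, such as other IUPAC ambiguity codes, are assumptions of this sketch rather than documented KGCAK behaviour.

```python
from collections import Counter


def genomic_parameters(seq: str) -> dict[str, float]:
    """Nucleotide-content percentages as defined in the Fig. 1c caption.

    A/C/G/T, GC and Purine content are percentages of T+C+A+G;
    N content is a percentage of T+C+A+G+N. Upper-casing the input and
    ignoring any other characters are assumptions of this sketch.
    """
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T")

    def pct(n: int) -> float:
        # Percentage relative to T + C + A + G only.
        return 100.0 * n / acgt

    return {
        "A Content": pct(counts["A"]),
        "C Content": pct(counts["C"]),
        "G Content": pct(counts["G"]),
        "T Content": pct(counts["T"]),
        "GC Content": pct(counts["G"] + counts["C"]),
        "Purine Content": pct(counts["A"] + counts["G"]),
        # N content uses T + C + A + G + N as its denominator.
        "N Content": 100.0 * counts["N"] / (acgt + counts["N"]),
    }


if __name__ == "__main__":
    print(genomic_parameters("AAGGCCTTNN"))
    # → {'A Content': 25.0, 'C Content': 25.0, 'G Content': 25.0, 'T Content': 25.0,
    #    'GC Content': 50.0, 'Purine Content': 50.0, 'N Content': 20.0}
```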
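The two K-mer statistics defined for panel d can be sketched in Python as follows. Several details are not given in the caption and are assumptions here. The K-mer array covers all $4^K$ possible DNA K-mers, including those with zero counts. It holds relative frequencies rather than raw counts. The logarithm is base 2. "Distance to Even" is read as the sum of squared deviations from the array mean. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequency_array(seq: str, k: int) -> list[float]:
    """Frequencies of all 4**k DNA K-mers in lexicographic order, zeros kept (assumption)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing N or other characters never match an entry in all_kmers.
    total = sum(counts[km] for km in all_kmers)
    if total == 0:
        raise ValueError("no valid K-mers in sequence")
    return [counts[km] / total for km in all_kmers]


def information_entropy(freqs: list[float]) -> float:
    """Shannon entropy H = -sum(P_i * log2(P_i)); terms with P_i == 0 contribute 0."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def distance_to_even(freqs: list[float]) -> float:
    """Sum of squared differences between each element and the array's global mean."""
    mean = sum(freqs) / len(freqs)
    return sum((p - mean) ** 2 for p in freqs)


if __name__ == "__main__":
    freqs = kmer_frequency_array("ACGT", 1)  # each base once, so the array is uniform
    print(information_entropy(freqs), distance_to_even(freqs))  # → 2.0 0.0
```

A perfectly even array maximises entropy and has a distance to even of zero. A skewed array lowers the first and raises the second. For the 5-mers in panel d the array would have $4^5 = 1024$ entries.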