Hai Ming Ni, Da Wei Qi, Hongbo Mu.
Abstract
Converting DNA sequences to images with chaos game representation (CGR) is an effective genome sequence preprocessing technique that provides a basis for further comparative analysis of different genes. In this paper, we constructed genetic CGR images for 10 mammal species, 48 hepatitis E viruses (HEV), and 10 kinds of bacteria, and calculated the mean structural similarity (MSSIM) coefficient between every pair of CGR images. Our analysis shows that the MSSIM coefficient between gene CGR images accurately reflects the degree of similarity between different genomes. Hierarchical clustering analysis was used to determine class membership and construct a dendrogram. Extensive experiments showed that this method gives results comparable to those of the traditional Clustal X phylogenetic tree construction method and is significantly faster in the clustering step. The combined MSSIM and CGR method was also able to efficiently cluster large genome sequences that traditional multiple sequence alignment methods (e.g., Clustal X, Clustal Omega, and Clustal W) cannot classify.
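The pipeline the abstract describes (sequence to CGR image, pairwise MSSIM, hierarchical clustering) can be illustrated with a minimal sketch. This is not the authors' implementation, and several choices are assumptions: the corner assignment of A/C/G/T, rasterising the CGR onto a 2^k x 2^k count grid (k = 6 here), a uniform 8x8 SSIM window instead of the Gaussian window of the original SSIM definition, the distance 1 - MSSIM, and average linkage (UPGMA), since the abstract does not name a linkage criterion. The function names `cgr_image`, `mssim`, and `average_linkage` and the toy sequences are hypothetical.

```python
import random

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Assumed corner convention; any fixed assignment of bases to corners works.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_image(seq: str, k: int = 6) -> np.ndarray:
    """Rasterise the chaos game representation of `seq` onto a 2^k x 2^k grid.

    Each cell counts the CGR points that fall in it; counts are scaled to [0, 1].
    """
    size = 2 ** k
    img = np.zeros((size, size))
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip N and other ambiguity codes
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway towards the base's corner
        img[min(int(y * size), size - 1), min(int(x * size), size - 1)] += 1
    return img / img.max() if img.max() > 0 else img


def mssim(a: np.ndarray, b: np.ndarray, win: int = 8, data_range: float = 1.0) -> float:
    """Mean SSIM over all win x win windows (uniform window, standard SSIM constants)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    wa = sliding_window_view(a, (win, win))
    wb = sliding_window_view(b, (win, win))
    mu_a = wa.mean(axis=(-2, -1))
    mu_b = wb.mean(axis=(-2, -1))
    da = wa - mu_a[..., None, None]  # deviations from each window's mean
    db = wb - mu_b[..., None, None]
    var_a = (da * da).mean(axis=(-2, -1))
    var_b = (db * db).mean(axis=(-2, -1))
    cov = (da * db).mean(axis=(-2, -1))
    ssim_map = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )
    return float(ssim_map.mean())


def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage; returns a nested-parentheses tree."""
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    d = {(i, j): dist[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of current clusters
        name_i, n_i = clusters.pop(i)
        name_j, n_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster k is the
        # size-weighted mean of the two old distances (UPGMA update rule).
        merged = {
            k: (n_i * d[tuple(sorted((i, k)))] + n_j * d[tuple(sorted((j, k)))]) / (n_i + n_j)
            for k in clusters
        }
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update({(k, next_id): v for k, v in merged.items()})
        clusters[next_id] = (f"({name_i},{name_j})", n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical toy data: s2 is s1 with ~1% random substitutions,
# s3 is an unrelated GC-rich sequence.
rng = random.Random(0)
s1 = "".join(rng.choice("ACGT") for _ in range(5000))
s2 = "".join(rng.choice("ACGT") if rng.random() < 0.01 else b for b in s1)
s3 = "".join(rng.choices("ACGT", weights=[1, 4, 4, 1], k=5000))

labels = ["s1", "s2", "s3"]
images = [cgr_image(s) for s in (s1, s2, s3)]
sim = [[mssim(p, q) for q in images] for p in images]
dist = [[1.0 - s for s in row] for row in sim]  # convert similarity to a distance
print(average_linkage(dist, labels))
```

With this toy data the two related sequences should join first in the tree, because their CGR images share most of their local structure. Average linkage is only one reasonable choice here. Libraries such as SciPy's `scipy.cluster.hierarchy` offer other criteria and can also draw the dendrogram.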
Keywords: Chaos game representation; Genome sequences; Hierarchical clustering analysis; Structural similarity
Year: 2017 PMID: 28941638 DOI: 10.1016/j.ygeno.2017.09.010
Source DB: PubMed Journal: Genomics ISSN: 0888-7543 Impact factor: 5.736