Zu-Guo Yu, Xiao-Wen Zhan, Guo-Sheng Han, Roger W Wang, Vo Anh, Ka Hou Chu.
Abstract
A shortcoming of most alignment-free correlation distance methods that use composition vectors for phylogenetic analysis of complete genomes is that their "distances" are not proper distance metrics in the strict mathematical sense. In this paper we propose two new correlation-related distance metrics to replace the old one in our dynamical language approach. Four genome datasets are employed to evaluate the effects of this replacement from a biological point of view. We find that the two proper distance metrics yield trees whose topologies are the same as or similar to those obtained with the old "distance", and that agree with the 16S rRNA-based tree of life in a majority of the basic branches. Hence the two proper correlation-related distance metrics proposed here improve our dynamical language approach for phylogenetic analysis.
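The figure captions in this record refer to a "pseudo-distance" and a "chord distance" without defining them. As a minimal sketch, assuming the pseudo-distance has the common correlation form D = (1 - C)/2, with C the cosine correlation of two composition vectors, and assuming the chord distance is the Euclidean distance between unit-normalized vectors, sqrt(2 - 2C), the Python below tests the triangle inequality on three toy vectors. Both formulas are illustrative assumptions, not definitions quoted from the paper.

```python
import math
from typing import Sequence


def cosine_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """C(A, B): cosine of the angle between two composition vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pseudo_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED form of the old correlation 'distance': (1 - C) / 2."""
    return (1.0 - cosine_correlation(a, b)) / 2.0


def chord_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED chord distance: Euclidean distance of unit vectors, sqrt(2 - 2C)."""
    # max() guards against a tiny negative argument from rounding when C is ~1.
    return math.sqrt(max(0.0, 2.0 - 2.0 * cosine_correlation(a, b)))


def violates_triangle(d, a, b, c) -> bool:
    """True if d(a, c) > d(a, b) + d(b, c), i.e. the triangle inequality fails."""
    return d(a, c) > d(a, b) + d(b, c) + 1e-12


# Three toy unit vectors at 0, 45 and 90 degrees.
s = 1 / math.sqrt(2)
A, B, C = [1.0, 0.0], [s, s], [0.0, 1.0]

print(violates_triangle(pseudo_distance, A, B, C))  # True
print(violates_triangle(chord_distance, A, B, C))   # False
```

For vectors at 0°, 45° and 90°, the assumed pseudo-distance gives about 0.146 + 0.146 < 0.5, so the triangle inequality fails; the chord values, about 0.765 + 0.765 ≥ 1.414, satisfy it. In general the chord form inherits the triangle inequality from Euclidean distance between points on the unit sphere.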
Keywords: complete genome; composition vector; correlation-related distance metric; phylogenetic analysis
Year: 2010 PMID: 20480005 PMCID: PMC2869232 DOI: 10.3390/ijms11031141
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Mean value of X over all K-strings as a function of K. “Mycge”, “PorpuC” and “Dvir” are abbreviations that each denote one genome from our first three datasets.
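X is not defined in this record. Purely as a hypothetical stand-in, the sketch below uses the Markov-subtracted composition value known from the CVTree method, X(α) = (p(α) - p0(α))/p0(α), where p0 predicts a K-string's frequency from its two overlapping (K-1)-strings and the shared (K-2)-string, and averages it over the K-strings that actually occur. The functions `kstring_freqs`, `markov_subtracted` and `mean_x` are illustrative names; the dynamical language approach may define X differently.

```python
from collections import Counter


def kstring_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every overlapping K-string in seq."""
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    return {s: c / n for s, c in counts.items()}


def markov_subtracted(seq: str, k: int) -> dict[str, float]:
    """HYPOTHETICAL X(alpha) = (p - p0) / p0 for each observed K-string (K >= 3)."""
    if k < 3 or len(seq) < k:
        raise ValueError("need 3 <= k <= len(seq)")
    pk = kstring_freqs(seq, k)
    pk1 = kstring_freqs(seq, k - 1)
    pk2 = kstring_freqs(seq, k - 2)
    x = {}
    for s, p in pk.items():
        # (K-2)-th order Markov prediction from the two overlapping (K-1)-strings;
        # every substring used here occurs in seq, so no lookup can fail.
        p0 = pk1[s[:-1]] * pk1[s[1:]] / pk2[s[1:-1]]
        x[s] = (p - p0) / p0
    return x


def mean_x(seq: str, k: int) -> float:
    """Mean of the hypothetical X over the K-strings observed in seq."""
    x = markov_subtracted(seq, k)
    return sum(x.values()) / len(x)


# Toy peptide, made up for illustration; real input would be a whole proteome.
toy = "MKVLAAGIVALLLAAGCSSSKEETVKKA"
for k in range(3, 7):
    print(k, round(mean_x(toy, k), 4))
```

On real genomes one would pool all protein sequences, and whether unobserved K-strings enter the average is a further modeling choice this sketch does not settle.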
Figure 2. Phylogeny of 109 organisms (prokaryotes and eukaryotes) using the dynamical language approach with chord distance in the case K = 6, based on all protein sequences.
Figure 3. Phylogeny of chloroplast genomes using the dynamical language approach with piecewise distance in the case K = 6, based on all protein sequences.
Figure 4. The NJ tree of mitochondrial genomes based on whole DNA sequences, using the dynamical language approach with chord distance in the case K = 11. In this tree, birds and reptiles group together as Archosauria.
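Figure 4 is a neighbor-joining (NJ) tree. The record does not say which NJ implementation was used; the following is a minimal textbook sketch of the Saitou-Nei algorithm in Python that takes a symmetric distance matrix (for example, pairwise chord distances between genomes) and returns a Newick string. The function name `neighbor_joining` and its interface are illustrative assumptions.

```python
def neighbor_joining(names: list[str], d: list[list[float]]) -> str:
    """Unrooted NJ tree (Saitou and Nei, 1987) in Newick format; needs >= 3 taxa."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    # Dict-of-dicts copy so that new internal nodes can be added by label.
    dist = {a: {b: d[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Choose the pair (a, b) that minimises the Q-criterion (first pair wins ties).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to a and to b.
        la = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        u = f"({a}:{la:.4g},{b}:{lb:.4g})"
        # Distances from u to every remaining node.
        dist[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist[u][c] = dist[c][u] = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes at a single central node.
    a, b, c = nodes
    la = 0.5 * (dist[a][b] + dist[a][c] - dist[b][c])
    lb = 0.5 * (dist[a][b] + dist[b][c] - dist[a][c])
    lc = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


# A textbook five-taxon additive matrix.
names = ["a", "b", "c", "d", "e"]
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, d))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because this toy matrix is additive, NJ recovers its branch lengths exactly. On real, non-additive distances NJ can produce negative branch lengths, which this sketch leaves unclamped.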
Figure 5. Phylogeny of 62 alpha-proteobacteria using the dynamical language approach with chord distance in the cases K = 5 and 6, based on all protein sequences. The trees obtained by the dynamical language approach with the pseudo-distance in [5] and with piecewise distance, in the cases K = 5 and 6 based on all protein sequences, have the same topology as the tree in this figure.
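Whether two trees share "the same topology" can be checked mechanically. One standard option, not stated in this record, is the Robinson-Foulds distance: the number of non-trivial leaf bipartitions present in one unrooted tree but not the other, which is zero exactly when the unrooted topologies agree. The sketch below assumes trees are given as nested Python tuples of leaf names rather than parsed from Newick; the helper names are hypothetical.

```python
from typing import Union

# A tree is a leaf name or a tuple of subtrees; the outermost tuple is treated as unrooted.
Tree = Union[str, tuple]


def splits(tree: Tree) -> set[frozenset[str]]:
    """Non-trivial bipartitions, each stored as the side without a reference leaf."""
    def leaves(t: Tree) -> frozenset[str]:
        if isinstance(t, str):
            return frozenset([t])
        return frozenset().union(*(leaves(c) for c in t))

    everything = leaves(tree)
    ref = min(everything)
    out: set[frozenset[str]] = set()

    def walk(t: Tree) -> frozenset[str]:
        if isinstance(t, str):
            return frozenset([t])
        below = frozenset().union(*(walk(c) for c in t))
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(below) <= len(everything) - 2:
            out.add(below if ref not in below else everything - below)
        return below

    walk(tree)
    return out


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Size of the symmetric difference of the split sets; 0 means same topology."""
    return len(splits(t1) ^ splits(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ("C", ("A", "B"), ("D", "E"))  # same unrooted tree, drawn from another node
t3 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(t1, t2))  # 0
print(robinson_foulds(t1, t3))  # 2
```

Comparing split sets ignores root placement and branch lengths, which matches the sense in which the caption calls two NJ trees topologically identical.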