Xingqin Qi, Qin Wu, Yusen Zhang, Eddie Fuller, Cun-Quan Zhang.
Abstract
Determination of sequence similarity is one of the major steps in computational phylogenetic studies. During evolutionary history, DNA has undergone not only mutations of individual nucleotides but also subsequent rearrangements. A major task for computational biologists has therefore been to develop novel mathematical descriptors for similarity analysis that capture information about these different mutation phenomena simultaneously. Unlike traditional methods, which build mathematical descriptors on, e.g., nucleotide frequencies or geometric representations, we construct novel mathematical descriptors based on graph theory. In particular, for each DNA sequence we set up a weighted directed graph, and the adjacency matrix of this graph is used to induce a representative vector for the sequence. This approach measures similarity based on both the ordering and the frequency of nucleotides, so more information is taken into account. As an application, the method is tested on a set of 0.9-kb mtDNA sequences from twelve primate species. All resulting phylogenetic trees, obtained with various distance estimations, have the same topology and are generally consistent with results reported in earlier studies, which supports the effectiveness of the new method. We also test the method on a simulated data set, which shows that it performs better than the traditional global alignment method when subsequent rearrangements occur frequently during evolutionary history.
Keywords: DNA sequence; mathematical descriptor; similarity analysis; weighted graph
Year: 2011 PMID: 22065497 PMCID: PMC3204935 DOI: 10.4137/EBO.S7364
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
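The abstract describes the pipeline only at a high level: sequence, then a weighted directed graph on the four nucleotides, then its adjacency matrix, then a representative vector, then a distance. The Python sketch below makes that chain concrete under assumptions that are ours, not the paper's: each pair of positions i < j adds weight α^(j−i−1) to the arc from the nucleotide at i to the nucleotide at j, the representative vector is the 4 × 4 adjacency matrix flattened and normalized to sum 1, and two sequences are compared by Euclidean distance. The function names are hypothetical, and none of these choices should be read as the definitions behind the paper's d1, d2 or d3.

```python
import math

NUCLEOTIDES = "ACGT"
INDEX = {base: k for k, base in enumerate(NUCLEOTIDES)}


def adjacency_matrix(seq: str, alpha: float = 0.5) -> list[list[float]]:
    """4x4 weighted adjacency matrix of the simplified directed graph.

    ASSUMED weighting (illustrative only): every pair of positions i < j adds
    alpha ** (j - i - 1) to the arc seq[i] -> seq[j]; parallel arcs are summed.
    """
    codes = [INDEX[b] for b in seq.upper() if b in INDEX]  # ignore non-ACGT symbols
    m = [[0.0] * 4 for _ in range(4)]
    # acc[b] = sum of alpha ** (j - 1 - i) over earlier positions i holding base b,
    # so each new position j is handled in O(1) instead of rescanning the prefix.
    acc = [0.0] * 4
    for head in codes:
        for tail in range(4):
            m[tail][head] += acc[tail]
        acc = [a * alpha for a in acc]  # every earlier position moves one step farther away
        acc[head] += 1.0                # the current position is adjacent to the next one
    return m


def representative_vector(seq: str, alpha: float = 0.5) -> list[float]:
    """Flatten the matrix into 16 entries normalized to sum 1 (an assumption)."""
    flat = [w for row in adjacency_matrix(seq, alpha) for w in row]
    total = sum(flat)
    return [w / total for w in flat] if total > 0 else flat


def graph_distance(a: str, b: str, alpha: float = 0.5) -> float:
    """Euclidean distance between representative vectors (an illustrative choice)."""
    return math.dist(representative_vector(a, alpha), representative_vector(b, alpha))


if __name__ == "__main__":
    print(graph_distance("ACGTATC", "ACGTATC"))  # identical sequences give 0.0
    print(graph_distance("ACGTATC", "ATCACGT"))  # same composition, different order
```

The second call shows the point the abstract makes about ordering: both sequences contain the same nucleotides in the same amounts, so a pure frequency descriptor could not separate them, while an ordering-sensitive graph descriptor like this one gives a nonzero distance.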
Figure 1. Directed multigraph G for S = ACGTATC with α = 1/2.
Figure 2. Simplified graph G for S = ACGTATC.
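Figures 1 and 2 contrast the directed multigraph of S = ACGTATC with its simplified version. The Python sketch below illustrates that step under an assumed weighting that is not taken from the paper: one arc per pair of positions i < j with weight α^(j−i−1), after which parallel arcs between the same ordered pair of nucleotides are merged by summing their weights. The helpers `multigraph_arcs` and `simplify` are hypothetical, and the sketch is not a reconstruction of the arcs actually drawn in the figures.

```python
from collections import defaultdict


def multigraph_arcs(seq: str, alpha: float = 0.5) -> list[tuple[str, str, float]]:
    """One arc (tail, head, weight) per pair of positions i < j.

    ASSUMED weight: alpha ** (j - i - 1), so neighbouring nucleotides get weight 1.
    """
    return [
        (seq[i], seq[j], alpha ** (j - i - 1))
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
    ]


def simplify(arcs: list[tuple[str, str, float]]) -> dict[tuple[str, str], float]:
    """Merge parallel arcs between the same ordered pair of nucleotides by summing weights."""
    merged: dict[tuple[str, str], float] = defaultdict(float)
    for tail, head, weight in arcs:
        merged[(tail, head)] += weight
    return dict(merged)


arcs = multigraph_arcs("ACGTATC", alpha=0.5)
simple = simplify(arcs)
print(len(arcs))    # 21: one arc per pair of the 7 positions
print(len(simple))  # 14 distinct (tail, head) pairs remain after merging
for (tail, head), weight in sorted(simple.items()):
    print(f"{tail} -> {head}: {weight}")
```

Under this assumption, the arc A → C collects three contributions (positions 1→2, 1→7 and 5→7, counting from 1), giving 1 + 1/32 + 1/2 = 1.53125, and the total weight of the simplified graph is 10.03125.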
Table 1. 0.9-kb mtDNA fragments of the 12 species.
| Species | Accession no. | Abbreviation | Length (bp) | Database |
|---|---|---|---|---|
| Macaca fascicularis | M22653 | M.fas | 896 | NCBI |
| Macaca fuscata | M22651 | M.fus | 896 | NCBI |
| Macaca mulatta | M22650 | M.mul | 896 | NCBI |
| Macaca sylvanus | M22654 | M.syl | 896 | NCBI |
| Saimiri sciureus | M22655 | S.sci | 893 | NCBI |
| Chimpanzee | V00672 | Chi | 896 | NCBI |
| Lemur catta | M22657 | Lemur | 895 | NCBI |
| Gorilla | V00658 | Gorilla | 896 | NCBI |
| Hylobates | V00659 | Hyl. | 896 | NCBI |
| Orangutan | V00675 | Ora | 895 | NCBI |
| Tarsius syrichta | M22656 | T.syr | 895 | NCBI |
| Human | L00016 | Human | 896 | NCBI |
Figure 3. Phylogenetic trees for these 12 species previously reported using different methods: (A) Figure 3 in ref. 24; (B) Figure 1 in ref. 26; (C) Figure 1 in ref. 25.
Table 2. Upper triangular part of the similarity/dissimilarity matrix based on d1.
|  | Lemur | Chi | S.sci | M.fas | Gorilla | M.fus | M.mul | M.syl | Hyl | Ora | T.syr | Human |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lemur | 0 | 0.0511 | 0.0169 | 0.0358 | 0.0539 | 0.0373 | 0.0327 | 0.0221 | 0.0510 | 0.0702 | 0.0171 | 0.0591 |
| Chi |  | 0 | 0.0528 | 0.0183 | 0.0072 | 0.0171 | 0.0211 | 0.0325 | 0.0179 | 0.0264 | 0.0654 | 0.0098 |
| S.sci |  |  | 0 | 0.0362 | 0.0545 | 0.0395 | 0.0347 | 0.0286 | 0.0496 | 0.0716 | 0.0201 | 0.0592 |
| M.fas |  |  |  | 0 | 0.0210 | 0.0085 | 0.0059 | 0.0172 | 0.0196 | 0.0391 | 0.0488 | 0.0255 |
| Gorilla |  |  |  |  | 0 | 0.0186 | 0.0233 | 0.0354 | 0.0133 | 0.0211 | 0.0679 | 0.0058 |
| M.fus |  |  |  |  |  | 0 | 0.0063 | 0.0181 | 0.0174 | 0.0342 | 0.0514 | 0.0238 |
| M.mul |  |  |  |  |  |  | 0 | 0.0131 | 0.0212 | 0.0397 | 0.0463 | 0.0284 |
| M.syl |  |  |  |  |  |  |  | 0 | 0.0326 | 0.0509 | 0.0355 | 0.0406 |
| Hyl |  |  |  |  |  |  |  |  | 0 | 0.0243 | 0.0631 | 0.0169 |
| Ora |  |  |  |  |  |  |  |  |  | 0 | 0.0841 | 0.0198 |
| T.syr |  |  |  |  |  |  |  |  |  |  | 0 | 0.0730 |
| Human |  |  |  |  |  |  |  |  |  |  |  | 0 |
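Each row of the d1 matrix lists the diagonal 0 followed by the distances to the species in the later columns. A small Python sketch (helper names hypothetical) of expanding such rows into a full symmetric matrix and picking the closest pair, using only the five hominoid species (Chi, Gorilla, Hyl, Ora, Human) copied from the d1 matrix:

```python
from itertools import combinations

LABELS = ["Chi", "Gorilla", "Hyl", "Ora", "Human"]
# Upper-triangular rows copied from the d1 matrix, restricted to five species;
# each row starts with its diagonal 0, as in the table.
UPPER_ROWS = [
    [0, 0.0072, 0.0179, 0.0264, 0.0098],
    [0, 0.0133, 0.0211, 0.0058],
    [0, 0.0243, 0.0169],
    [0, 0.0198],
    [0],
]


def symmetric_from_upper(rows: list[list[float]]) -> list[list[float]]:
    """Expand left-justified upper-triangular rows into a full symmetric matrix."""
    n = len(rows)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != n - i:
            raise ValueError(f"row {i} should have {n - i} entries, got {len(row)}")
        for offset, value in enumerate(row):
            j = i + offset
            full[i][j] = full[j][i] = float(value)
    return full


def closest_pair(labels: list[str], full: list[list[float]]) -> tuple[str, str, float]:
    """Return the pair of distinct species with the smallest distance."""
    i, j = min(combinations(range(len(labels)), 2), key=lambda p: full[p[0]][p[1]])
    return labels[i], labels[j], full[i][j]


D1 = symmetric_from_upper(UPPER_ROWS)
print(closest_pair(LABELS, D1))  # ('Gorilla', 'Human', 0.0058)
```

The row-length check catches transcription slips: in an upper-triangular layout, row i of an n-species matrix must hold exactly n − i entries.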
Table 3. Upper triangular part of the dissimilarity matrix based on d2.
|  | S.sci | Chi | Lemur | M.fas | Gorilla | M.fus | M.mul | M.syl | Hyl | Ora | T.syr | Human |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S.sci | 0 | 0.0163 | 0.0016 | 0.0080 | 0.0182 | 0.0087 | 0.0067 | 0.0030 | 0.0163 | 0.0305 | 0.0018 | 0.0219 |
| Chi |  | 0 | 0.0177 | 0.0021 | 0.0003 | 0.0018 | 0.0028 | 0.0066 | 0.0020 | 0.0043 | 0.0269 | 0.0006 |
| Lemur |  |  | 0 | 0.0084 | 0.0190 | 0.0099 | 0.0076 | 0.0050 | 0.0160 | 0.0321 | 0.0024 | 0.0225 |
| M.fas |  |  |  | 0 | 0.0028 | 0.0004 | 0.0002 | 0.0018 | 0.0025 | 0.0094 | 0.0150 | 0.0042 |
| Gorilla |  |  |  |  | 0 | 0.0022 | 0.0034 | 0.0078 | 0.0011 | 0.0027 | 0.0290 | 0.0002 |
| M.fus |  |  |  |  |  | 0 | 0.0002 | 0.0020 | 0.0019 | 0.0072 | 0.0166 | 0.0036 |
| M.mul |  |  |  |  |  |  | 0 | 0.0011 | 0.0028 | 0.0098 | 0.0135 | 0.0051 |
| M.syl |  |  |  |  |  |  |  | 0 | 0.0066 | 0.0160 | 0.0079 | 0.0103 |
| Hyl |  |  |  |  |  |  |  |  | 0 | 0.0034 | 0.0252 | 0.0018 |
| Ora |  |  |  |  |  |  |  |  |  | 0 | 0.0438 | 0.0023 |
| T.syr |  |  |  |  |  |  |  |  |  |  | 0 | 0.0336 |
| Human |  |  |  |  |  |  |  |  |  |  |  | 0 |
Table 4. Upper triangular part of the dissimilarity matrix based on d3.
|  | S.sci | Chi | Lemur | M.fas | Gorilla | M.fus | M.mul | M.syl | Hyl | Ora | T.syr | Human |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S.sci | 0 | 0.0749 | 0.0024 | 0.0360 | 0.0840 | 0.0396 | 0.0302 | 0.0136 | 0.0752 | 0.1348 | 0.0082 | 0.1015 |
| Chi |  | 0 | 0.0869 | 0.0098 | 0.0014 | 0.0087 | 0.0134 | 0.0302 | 0.0081 | 0.0183 | 0.1252 | 0.0027 |
| Lemur |  |  | 0 | 0.0429 | 0.0959 | 0.0473 | 0.0366 | 0.0196 | 0.0851 | 0.1485 | 0.0078 | 0.1142 |
| M.fas |  |  |  | 0 | 0.0137 | 0.0016 | 0.0007 | 0.0068 | 0.0123 | 0.0409 | 0.0709 | 0.0205 |
| Gorilla |  |  |  |  | 0 | 0.0103 | 0.0166 | 0.0358 | 0.0046 | 0.0103 | 0.1367 | 0.0011 |
| M.fus |  |  |  |  |  | 0 | 0.0012 | 0.0090 | 0.0076 | 0.0316 | 0.0773 | 0.0171 |
| M.mul |  |  |  |  |  |  | 0 | 0.0044 | 0.0127 | 0.0431 | 0.0629 | 0.0247 |
| M.syl |  |  |  |  |  |  |  | 0 | 0.0284 | 0.0708 | 0.0357 | 0.0476 |
| Hyl |  |  |  |  |  |  |  |  | 0 | 0.0109 | 0.1210 | 0.0083 |
| Ora |  |  |  |  |  |  |  |  |  | 0 | 0.1956 | 0.0086 |
| T.syr |  |  |  |  |  |  |  |  |  |  | 0 | 0.1586 |
| Human |  |  |  |  |  |  |  |  |  |  |  | 0 |
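The abstract reports that the trees from the different distance estimations share one topology. One quick, partial consistency check that can be run on the tables themselves is a Spearman rank correlation between two distances over the same row. The Python sketch below (hypothetical helpers, standard library only, ties given the average of the ranks they occupy) compares the S.sci rows of the d2 and d3 matrices; this check is our illustration, not a procedure described in the paper.

```python
import math

# S.sci rows of the d2 and d3 matrices (diagonal dropped), columns in table order:
# Chi, Lemur, M.fas, Gorilla, M.fus, M.mul, M.syl, Hyl, Ora, T.syr, Human
D2_SSCI = [0.0163, 0.0016, 0.0080, 0.0182, 0.0087, 0.0067, 0.0030, 0.0163, 0.0305, 0.0018, 0.0219]
D3_SSCI = [0.0749, 0.0024, 0.0360, 0.0840, 0.0396, 0.0302, 0.0136, 0.0752, 0.1348, 0.0082, 0.1015]


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


rho = spearman(D2_SSCI, D3_SSCI)
print(round(rho, 4))
```

For this row the two distances order S.sci's neighbours identically except that d2 ties Chi and Hyl at 0.0163 while d3 separates them, so ρ comes out just under 1 (about 0.998).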
Figure 4. Phylogenetic tree for these 12 species based on Tables 2–4.
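The captions do not state which clustering method turned Tables 2–4 into the tree of Figure 4. As a hedged illustration only, the Python sketch below applies UPGMA (average linkage), a standard distance-based method that we swap in for this purpose, to the five hominoid species of the d3 matrix. The helper names are hypothetical, and the result is a tree topology as nested tuples, without branch lengths.

```python
# Pairwise d3 distances for five species, copied from the d3 matrix.
D3 = {
    frozenset({"Chi", "Gorilla"}): 0.0014,
    frozenset({"Chi", "Hyl"}): 0.0081,
    frozenset({"Chi", "Ora"}): 0.0183,
    frozenset({"Chi", "Human"}): 0.0027,
    frozenset({"Gorilla", "Hyl"}): 0.0046,
    frozenset({"Gorilla", "Ora"}): 0.0103,
    frozenset({"Gorilla", "Human"}): 0.0011,
    frozenset({"Hyl", "Ora"}): 0.0109,
    frozenset({"Hyl", "Human"}): 0.0083,
    frozenset({"Ora", "Human"}): 0.0086,
}


def upgma(dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    size = {leaf: 1 for pair in dist for leaf in pair}  # cluster -> number of leaves
    d = dict(dist)
    while len(size) > 1:
        a, b = tuple(min(d, key=d.get))  # closest pair of current clusters
        merged = (a, b)
        for c in size:
            if c not in (a, b):
                # distance to the new cluster = size-weighted mean of the old distances
                d[frozenset({merged, c})] = (
                    size[a] * d[frozenset({a, c})] + size[b] * d[frozenset({b, c})]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
    return next(iter(size))


def leaves(tree):
    """Set of species names below a node."""
    return {tree} if isinstance(tree, str) else leaves(tree[0]) | leaves(tree[1])


def clades(tree):
    """One leaf set per internal node, for comparing topologies regardless of child order."""
    if isinstance(tree, str):
        return set()
    return {frozenset(leaves(tree))} | clades(tree[0]) | clades(tree[1])


tree = upgma(D3)
print(tree)
for clade in sorted(clades(tree), key=len):
    print(sorted(clade))
```

With these values, Gorilla and Human are joined first, then Chi, then Hyl, with Ora last; feeding the function the corresponding d1 distances for the same five species gives the same nested clades. Comparing `clades()` sets rather than printed tuples avoids false mismatches caused by the arbitrary left/right order of children.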
| 3 | 10 | 100 | 1000 | 10000 |
| DET method | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Alignment | 0.2000 | 0.3000 | 0.4000 | 0.2000 | 0.3000 | 0.2667 | 0.3857 | 0.2250 | 0.2222 | 0.2500 |