Ying Guo¹, Yan-Fang Wang¹, Sheng-Li Zhang².
Abstract
We present a novel way to numerically characterize DNA sequences, based on their graphical representation, for sequence comparison and analysis. Instead of calculating the leading eigenvalues of the matrices associated with the graphical representation, we compute the curvature and torsion of the resulting curves and use them as descriptors to numerically characterize DNA sequences. The new method was tested on three data sets: the coding sequences of the β‐globin genes of 11 species, all of their exons, and 24 coronavirus genomes from NCBI. The similarities/dissimilarities and the phylogenetic tree of these species verify the validity of our method. © 2010 Wiley Periodicals, Inc. Int J Quantum Chem, 2011.
Keywords: curvature; graphical representation; phylogenetic tree; torsion
Year: 2010 PMID: 32327765 PMCID: PMC7168550 DOI: 10.1002/qua.22872
Source DB: PubMed Journal: Int J Quantum Chem ISSN: 0020-7608 Impact factor: 2.444
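The abstract uses curvature and torsion as numerical descriptors, but this excerpt does not state how they are discretized for a curve built from a sequence. As a minimal sketch, assuming the 3‐D representation is a polyline of points and using forward differences with unit parameter spacing (our assumption, not necessarily the authors' scheme), the standard space-curve formulas κ = |r′ × r″| / |r′|³ and τ = (r′ × r″) · r‴ / |r′ × r″|² can be evaluated per vertex. The function `curvature_torsion` and its helpers are hypothetical names.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def curvature_torsion(points):
    """Per-vertex (curvature, torsion) of a 3-D polyline by forward differences.

    With edges e_i = P_{i+1} - P_i and unit parameter spacing (an assumption):
      r' ~ e_i,  r'' ~ e_{i+1} - e_i,  r''' ~ e_{i+2} - 2 e_{i+1} + e_i
    kappa = |r' x r''| / |r'|^3,  tau = (r' x r'') . r''' / |r' x r''|^2.
    Consecutive points must be distinct. Torsion is reported as None where
    r' x r'' vanishes (locally straight), because it is undefined there.
    """
    edges = [sub(q, p) for p, q in zip(points, points[1:])]
    values = []
    for i in range(len(edges) - 2):
        d1 = edges[i]                                   # first difference
        d2 = sub(edges[i + 1], edges[i])                # second difference
        d3 = sub(sub(edges[i + 2], edges[i + 1]), d2)   # third difference
        c = cross(d1, d2)
        c2 = dot(c, c)
        kappa = math.sqrt(c2) / math.sqrt(dot(d1, d1)) ** 3
        tau = dot(c, d3) / c2 if c2 > 1e-12 else None
        values.append((kappa, tau))
    return values


# Four points of a small non-planar polyline.
print(curvature_torsion([(0, 0, 0), (1, 1, 1), (0, 0, 2), (-1, 1, 1)]))
```

For these four points the sketch gives κ ≈ 0.544 and τ = −0.5; for a straight run of points κ is 0 and τ is None. How per-vertex values are condensed into one number per sequence (for instance a mean) is likewise an assumption, not something this excerpt specifies.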
Figure 1. The 3‐D graphical representation of the sequence ATGGTGCACC.
Figure 2. The 3‐D graphical representation of the sequence ATGGTGCACC.
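Figures 1 and 2 show a 3‐D curve for ATGGTGCACC, but the nucleotide-to-coordinate mapping is not given in this excerpt. The sketch below builds a cumulative 3‐D walk under a hypothetical mapping that sends A, G, C and T to four vertices of a regular tetrahedron; `STEPS` and `dna_walk` are illustrative assumptions, and the resulting curve need not match the paper's figures.

```python
# Hypothetical nucleotide-to-step mapping (vertices of a regular tetrahedron);
# NOT taken from the paper, whose mapping is not given in this excerpt.
STEPS = {
    "A": (1, 1, 1),
    "G": (-1, -1, 1),
    "C": (-1, 1, -1),
    "T": (1, -1, -1),
}


def dna_walk(seq, steps=STEPS):
    """Return the points of a cumulative walk: P_0 = origin, P_i = P_{i-1} + step(base_i)."""
    x = y = z = 0
    points = [(0, 0, 0)]
    for base in seq.upper():
        if base not in steps:
            raise ValueError(f"unsupported base {base!r}")
        dx, dy, dz = steps[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


walk = dna_walk("ATGGTGCACC")
print(len(walk), walk[-1])  # 11 (-2, 0, 0)
```

Each base adds one vertex, so a sequence of length n yields n + 1 points, which could then be fed to any discrete curvature and torsion estimator.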
Accession numbers, lengths, and locations of the β‐globin genes and their exons
| Species | Database | ID | Location | Length (bp) | Exon locations |
|---|---|---|---|---|---|
| 1 Human | NCBI | U01317 | 62187–63610 | 1424 | 62187–62278, 62409–62631, 63482–63610 |
| 2 Chimpanzee | NCBI | X02345 | 4189–5532 | 1344 | 4189–4293, 4412–4633, 5484–5532 |
| 3 Gorilla | NCBI | X61109 | 4538–5881 | 1344 | 4538–4630, 4761–4982, 5833–5881 |
| 4 Lemur | NCBI | M15734 | 154–1595 | 1442 | 154–245, 376–598, 1467–1595 |
| 5 Rat | NCBI | X06701 | 310–1505 | 1196 | 310–401, 517–739, 1377–1505 |
| 6 Mouse | NCBI | V00722 | 275–1462 | 1188 | 275–367, 484–705, 1334–1462 |
| 7 Goat | NCBI | M15387 | 279–1749 | 1471 | 279–364, 493–715, 1621–1749 |
| 8 Bovine | NCBI | X00376 | 278–1741 | 1464 | 278–363, 492–714, 1613–1741 |
| 9 Rabbit | NCBI | V00882 | 277–1419 | 1143 | 277–368, 495–717, 1291–1419 |
| 10 Opossum | NCBI | J03643 | 467–2488 | 2022 | 467–558, 672–894, 2360–2488 |
| 11 Gallus | NCBI | V00409 | 465–1810 | 1346 | 465–556, 649–871, 1682–1810 |
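The Location and Length columns are consistent with 1‐based, inclusive coordinates, as used in GenBank records, so an interval's length is end − start + 1 (for Human, 63610 − 62187 + 1 = 1424 bp, matching the table). The sketch below checks this for the human row and shows one way to splice exon intervals out of a genomic sequence string; `interval_length` and `splice` are hypothetical helpers, and the GenBank sequences themselves are not included here.

```python
def interval_length(start, end):
    """Length of a 1-based, inclusive interval (GenBank-style coordinates)."""
    return end - start + 1


def splice(seq, intervals):
    """Concatenate 1-based, inclusive intervals taken from seq (hypothetical helper)."""
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return "".join(seq[start - 1:end] for start, end in intervals)


# Human row of the table (accession U01317).
human_gene = (62187, 63610)
human_exons = [(62187, 62278), (62409, 62631), (63482, 63610)]

print(interval_length(*human_gene))                      # 1424
print([interval_length(s, e) for s, e in human_exons])   # [92, 223, 129]
```

The three human exons total 444 bp; with the U01317 sequence loaded into a string (call it `genome`, a hypothetical name), `splice(genome, human_exons)` would join them in order.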
Six‐component vectors (curvature k and torsion τ under each of the three patterns AGTC, ATCG and ACGT) of the β‐globin coding sequences of 11 species
| Species | AGTC (k) | AGTC (τ) | ATCG (k) | ATCG (τ) | ACGT (k) | ACGT (τ) |
|---|---|---|---|---|---|---|
| Human | 0.63200 | 0.05099 | 0.63072 | 0.03132 | 0.62382 | 0.02744 |
| Chimpanzee | 0.63112 | 0.05498 | 0.62941 | 0.03825 | 0.62328 | 0.02229 |
| Gorilla | 0.62998 | 0.05751 | 0.62843 | 0.04263 | 0.62133 | 0.02117 |
| Lemur | 0.61447 | 0.05474 | 0.61306 | 0.01929 | 0.61350 | 0.02304 |
| Rat | 0.64599 | 0.03798 | 0.63502 | 0.04446 | 0.63577 | 0.04387 |
| Mouse | 0.64188 | 0.05029 | 0.63145 | 0.02985 | 0.63260 | 0.04137 |
| Goat | 0.64591 | 0.07325 | 0.63768 | 0.02751 | 0.63712 | 0.02328 |
| Bovine | 0.64807 | 0.06577 | 0.64012 | 0.02722 | 0.63735 | 0.01937 |
| Rabbit | 0.63439 | 0.04962 | 0.62955 | 0.01564 | 0.63016 | 0.02001 |
| Opossum | 0.64708 | 0.04546 | 0.63761 | 0.00544 | 0.63592 | 0.03010 |
| Gallus | 0.64163 | 0.07442 | 0.62966 | −0.00025 | 0.63436 | 0.05336 |
The similarity matrix of the coding sequences of the β‐globin gene of 11 species (upper triangle shown; smaller values indicate more similar sequences)
| Species | Human | Chimpanzee | Gorilla | Lemur | Rat | Mouse | Goat | Bovine | Rabbit | Opossum | Gallus |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | 0 | 0.00965 | 0.01501 | 0.03007 | 0.03112 | 0.01929 | 0.03076 | 0.02881 | 0.01871 | 0.03360 | 0.04922 |
| Chimpanzee | | 0 | 0.00574 | 0.03163 | 0.03467 | 0.02576 | 0.03048 | 0.02909 | 0.02455 | 0.04135 | 0.05531 |
| Gorilla | | | 0 | 0.03308 | 0.03752 | 0.03002 | 0.03271 | 0.03209 | 0.02984 | 0.04688 | 0.05889 |
| Lemur | | | | 0 | 0.05762 | 0.04384 | 0.05063 | 0.05127 | 0.03154 | 0.04996 | 0.05601 |
| Rat | | | | | 0 | 0.02027 | 0.04431 | 0.04126 | 0.04161 | 0.04215 | 0.05888 |
| Mouse | | | | | | 0 | 0.03057 | 0.02943 | 0.02691 | 0.02867 | 0.04048 |
| Goat | | | | | | | 0 | 0.00907 | 0.03094 | 0.03617 | 0.04203 |
| Bovine | | | | | | | | 0 | 0.02731 | 0.03180 | 0.04631 |
| Rabbit | | | | | | | | | 0 | 0.02196 | 0.04528 |
| Opossum | | | | | | | | | | 0 | 0.03883 |
| Gallus | | | | | | | | | | | 0 |
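The entries of this matrix appear to be Euclidean distances between the six‐component vectors of the preceding table: recomputing the Human–Chimpanzee, Human–Gorilla and Human–Gallus entries from those vectors reproduces the tabulated values to within about 10⁻⁵, which is consistent with rounding of the vectors. The sketch below assumes that reading; `VECTORS` holds four rows copied from the table, and `distance_matrix` is a hypothetical helper.

```python
import math

# Six-component vectors copied from the table of coding-sequence descriptors:
# (AGTC k, AGTC tau, ATCG k, ATCG tau, ACGT k, ACGT tau).
VECTORS = {
    "Human":      (0.63200, 0.05099, 0.63072, 0.03132, 0.62382, 0.02744),
    "Chimpanzee": (0.63112, 0.05498, 0.62941, 0.03825, 0.62328, 0.02229),
    "Gorilla":    (0.62998, 0.05751, 0.62843, 0.04263, 0.62133, 0.02117),
    "Gallus":     (0.64163, 0.07442, 0.62966, -0.00025, 0.63436, 0.05336),
}


def distance_matrix(vectors):
    """Assumed measure: Euclidean distance between every pair of descriptor vectors."""
    return {(a, b): math.dist(vectors[a], vectors[b]) for a in vectors for b in vectors}


D = distance_matrix(VECTORS)
for other in ("Chimpanzee", "Gorilla", "Gallus"):
    print(f"Human vs {other}: {D[('Human', other)]:.5f}")
```

Because the vectors in the table are themselves rounded to five decimals, small last-digit differences from the printed matrix are expected.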
The similarity matrix of the coding sequences of the first exon of the β‐globin gene of 11 species
| Species | Human | Chimpanzee | Gorilla | Lemur | Rat | Mouse | Goat | Bovine | Rabbit | Opossum | Gallus |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | 0 | 0.04546 | 0.02183 | 0.15386 | 0.08196 | 0.10383 | 0.08294 | 0.06119 | 0.10542 | 0.10540 | 0.14157 |
| Chimpanzee | | 0 | 0.04459 | 0.14779 | 0.11172 | 0.12163 | 0.11736 | 0.06849 | 0.12709 | 0.08631 | 0.13549 |
| Gorilla | | | 0 | 0.15326 | 0.07260 | 0.09546 | 0.08338 | 0.05888 | 0.11757 | 0.10732 | 0.15115 |
| Lemur | | | | 0 | 0.17836 | 0.13171 | 0.17118 | 0.10132 | 0.11817 | 0.11156 | 0.08159 |
| Rat | | | | | 0 | 0.06226 | 0.04167 | 0.08762 | 0.13894 | 0.13912 | 0.19241 |
| Mouse | | | | | | 0 | 0.06483 | 0.06798 | 0.12206 | 0.11249 | 0.16349 |
| Goat | | | | | | | 0 | 0.08303 | 0.11411 | 0.13033 | 0.17459 |
| Bovine | | | | | | | | 0 | 0.08848 | 0.07362 | 0.10987 |
| Rabbit | | | | | | | | | 0 | 0.12287 | 0.09962 |
| Opossum | | | | | | | | | | 0 | 0.10638 |
| Gallus | | | | | | | | | | | 0 |
The similarity matrix of the coding sequences of the second exon of the β‐globin gene of 11 species
| Species | Human | Chimpanzee | Gorilla | Lemur | Rat | Mouse | Goat | Bovine | Rabbit | Opossum | Gallus |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | 0 | 0.00418 | 0.00992 | 0.05362 | 0.05805 | 0.05955 | 0.05134 | 0.04771 | 0.03977 | 0.05134 | 0.07491 |
| Chimpanzee | | 0 | 0.00838 | 0.05061 | 0.05425 | 0.05626 | 0.05315 | 0.05023 | 0.03945 | 0.04832 | 0.07303 |
| Gorilla | | | 0 | 0.05111 | 0.05425 | 0.05689 | 0.06028 | 0.05695 | 0.04530 | 0.04747 | 0.07298 |
| Lemur | | | | 0 | 0.03670 | 0.02035 | 0.09014 | 0.08925 | 0.05604 | 0.01426 | 0.03782 |
| Rat | | | | | 0 | 0.02628 | 0.08780 | 0.09124 | 0.06804 | 0.03381 | 0.06576 |
| Mouse | | | | | | 0 | 0.09220 | 0.09382 | 0.06639 | 0.01814 | 0.04051 |
| Goat | | | | | | | 0 | 0.01432 | 0.05231 | 0.09081 | 0.11111 |
| Bovine | | | | | | | | 0 | 0.04682 | 0.08976 | 0.10931 |
| Rabbit | | | | | | | | | 0 | 0.05919 | 0.08091 |
| Opossum | | | | | | | | | | 0 | 0.03770 |
| Gallus | | | | | | | | | | | 0 |
The similarity matrix of the coding sequences of the third exon of the β‐globin gene of 11 species
| Species | Human | Chimpanzee | Gorilla | Lemur | Rat | Mouse | Goat | Bovine | Rabbit | Opossum | Gallus |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | 0 | 0.04846 | 0.09099 | 0.12016 | 0.09176 | 0.12910 | 0.11005 | 0.08975 | 0.09191 | 0.04514 | 0.09097 |
| Chimpanzee | | 0 | 0.05847 | 0.13822 | 0.13576 | 0.14872 | 0.10518 | 0.10569 | 0.09840 | 0.06711 | 0.11851 |
| Gorilla | | | 0 | 0.13308 | 0.17006 | 0.15632 | 0.08933 | 0.11427 | 0.10655 | 0.08887 | 0.14676 |
| Lemur | | | | 0 | 0.11037 | 0.06751 | 0.06350 | 0.05128 | 0.06878 | 0.07917 | 0.10755 |
| Rat | | | | | 0 | 0.10577 | 0.13933 | 0.09200 | 0.10699 | 0.08413 | 0.07091 |
| Mouse | | | | | | 0 | 0.09428 | 0.06444 | 0.07437 | 0.09237 | 0.11379 |
| Goat | | | | | | | 0 | 0.05293 | 0.04704 | 0.07162 | 0.11004 |
| Bovine | | | | | | | | 0 | 0.02260 | 0.05066 | 0.07055 |
| Rabbit | | | | | | | | | 0 | 0.05587 | 0.07534 |
| Opossum | | | | | | | | | | 0 | 0.07596 |
| Gallus | | | | | | | | | | | 0 |
Comparison of the similarities between Human and the other 10 species obtained with our method and Yuan's method
| Sequence / measure | Chimpanzee | Gorilla | Lemur | Rat | Mouse | Goat | Bovine | Rabbit | Opossum | Gallus |
|---|---|---|---|---|---|---|---|---|---|---|
| β gene | | | | | | | | | | |
| E (10⁶) | 0.0770 | 0.0770 | 0.0179 | 0.2076 | 0.2142 | 0.0472 | 0.0401 | 0.2506 | 0.7159 | 0.0743 |
| L/L | 0.1143 | 0.1169 | 0.0140 | 0.3584 | 0.3761 | 0.0503 | 0.0409 | 0.4442 | 0.6943 | 0.1193 |
| M/M (10³) | 0.0800 | 0.0800 | 0.0180 | 0.2279 | 0.2359 | 0.0471 | 0.0401 | 0.2810 | 0.5981 | 0.7914 |
| 1st exon | | | | | | | | | | |
| E (10³) | 0.8900 | 0.0642 | 0.0000 | 0.0001 | 0.1293 | 0.3711 | 0.3712 | 0.1266 | 0.0002 | 0.0001 |
| L/L | 0.2647 | 0.0165 | 0.0672 | 0.0111 | 0.0107 | 0.1455 | 0.1275 | 0.0737 | 0.0497 | 0.0177 |
| M/M | 13.0146 | 1.0175 | 0.1072 | 0.0032 | 2.0398 | 5.9294 | 6.0303 | 1.9919 | 0.1728 | 0.0229 |
| 2nd exon | | | | | | | | | | |
| E (10⁴) | 0.0155 | 0.0155 | 0.0000 | 0.0000 | 0.0155 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| L/L | 0.0087 | 0.0084 | 0.0045 | 0.0227 | 0.0165 | 0.0111 | 0.0029 | 0.0076 | 0.0025 | 0.0115 |
| M/M | 0.9990 | 0.9995 | 0.0394 | 0.0592 | 1.0380 | 0.0907 | 0.0524 | 0.0571 | 0.0263 | 0.1075 |
| 3rd exon | | | | | | | | | | |
| E (10³) | 4.9488 | 4.9488 | 0.0000 | 0.0000 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 0.0001 | 0.0001 |
| L/L | 1.9337 | 1.9384 | 0.0150 | 0.0167 | 0.0116 | 0.0079 | 0.0204 | 0.0106 | 0.0113 | 0.0051 |
| M/M | 80.0582 | 80.0664 | 0.0425 | 0.0231 | 0.0701 | 0.0057 | 0.0256 | 0.0599 | 0.0985 | 0.1431 |
Accession numbers, abbreviations, names, and lengths of the 24 coronavirus genomes
| No. | Accession | Abbreviation | Genome | Length (bp) |
|---|---|---|---|---|
| 1 | NC_002645 | HCoV_229E | Human coronavirus 229E | 27317 |
| 2 | NC_002306 | TGEV | Transmissible gastroenteritis virus | 28586 |
| 3 | NC_003436 | PEDV | Porcine epidemic diarrhea virus | 28033 |
| 4 | U00735 | BCoVM | Bovine coronavirus strain Mebus | 31032 |
| 5 | AF391542 | BCoVL | Bovine coronavirus isolate BCoV‐LUN | 31028 |
| 6 | AF220295 | BCoVQ | Bovine coronavirus strain Quebec | 31100 |
| 7 | NC_003045 | BCoV | Bovine coronavirus | 31028 |
| 8 | AF208067 | MHVM | Murine hepatitis virus strain ML‐10 | 31233 |
| 9 | AF201929 | MHV2 | Murine hepatitis virus strain 2 | 31276 |
| 10 | AF208066 | MHVP | Murine hepatitis virus strain Penn 97–1 | 31112 |
| 11 | NC_001846 | MHV | Murine hepatitis virus strain A59 | 31357 |
| 12 | NC_001451 | IBV | Avian infectious bronchitis virus | 27608 |
| 13 | AY278488 | BJ01 | SARS coronavirus BJ01 | 29725 |
| 14 | AY278741 | Urbani | SARS coronavirus Urbani | 29727 |
| 15 | AY278491 | HKU‐39849 | SARS coronavirus HKU‐39849 | 29742 |
| 16 | AY278554 | CUHK‐W1 | SARS coronavirus CUHK‐W1 | 29736 |
| 17 | AY282752 | CUHK‐Su10 | SARS coronavirus CUHK‐Su10 | 29736 |
| 18 | AY283794 | SIN2500 | SARS coronavirus Sin2500 | 29711 |
| 19 | AY283795 | SIN2677 | SARS coronavirus Sin2677 | 29705 |
| 20 | AY283796 | SIN2679 | SARS coronavirus Sin2679 | 29711 |
| 21 | AY283797 | SIN2748 | SARS coronavirus Sin2748 | 29706 |
| 22 | AY283798 | SIN2774 | SARS coronavirus Sin2774 | 29711 |
| 23 | AY291451 | TW1 | SARS coronavirus TW1 | 29729 |
| 24 | NC_004718 | TOR2 | SARS coronavirus | 29751 |
Figure 3. Phylogenetic tree of the 24 coronavirus genomes based on our numerical characterization, constructed with the UPGMA method.
Figure 4. Phylogenetic tree of the 24 coronavirus genomes constructed with Clustal X.
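Figure 3 is built with UPGMA (unweighted pair group method with arithmetic mean). The 24‐genome distance matrix is not reproduced in this excerpt, so the sketch below runs plain UPGMA on four species taken from the β‐globin coding-sequence matrix instead. The `upgma` function and its dictionary-of-`frozenset` input are illustrative choices, not the authors' code.

```python
from itertools import combinations


def upgma(labels, pair_dist):
    """Cluster leaves with UPGMA.

    labels    -- leaf names
    pair_dist -- dict mapping frozenset({a, b}) to the distance between leaves a and b
    Returns (newick_like_string, list of (merged_cluster, node_height)).
    """
    size = {lab: 1 for lab in labels}  # active clusters -> number of leaves
    dist = dict(pair_dist)             # distances between active clusters
    merges = []
    while len(size) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        d_ab = dist[frozenset((a, b))]
        new = "(" + ",".join(sorted((a, b))) + ")"
        # UPGMA update: leaf-count-weighted average of the distances from a and b.
        for c in size:
            if c not in (a, b):
                dist[frozenset((new, c))] = (
                    size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        merges.append((new, d_ab / 2))  # ultrametric node height
    (root,) = size
    return root, merges


# Four species from the beta-globin coding-sequence matrix.
names = ["Human", "Chimpanzee", "Gorilla", "Gallus"]
pairs = {
    ("Human", "Chimpanzee"): 0.00965, ("Human", "Gorilla"): 0.01501,
    ("Human", "Gallus"): 0.04922, ("Chimpanzee", "Gorilla"): 0.00574,
    ("Chimpanzee", "Gallus"): 0.05531, ("Gorilla", "Gallus"): 0.05889,
}
tree, merges = upgma(names, {frozenset(p): d for p, d in pairs.items()})
print(tree)  # (((Chimpanzee,Gorilla),Human),Gallus)
```

On this subset, Chimpanzee and Gorilla merge first (distance 0.00574), then Human joins them, and Gallus is the outgroup. Library routines such as average-linkage hierarchical clustering implement the same update rule and scale better to larger matrices.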