Jie Feng, Yong Hu, Ping Wan, Aibing Zhang, Weizhong Zhao.
Abstract
We introduce a new approach to comparing DNA primary sequences. The core of our method is a new measure of pairwise distance between sequences. Using the primitive discrimination substrings of two sequences S and Q, we define a discrimination measure DM(S, Q) for analyzing their similarity. The proposed method does not require multiple alignment and is fully automatic. To illustrate its utility, we construct phylogenetic trees on two independent data sets. The results indicate that the method is efficient and powerful.
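The abstract does not give the definition of DM(S, Q), so the workflow can only be sketched with a stand-in. The Python sketch below assumes a k-mer Jaccard distance as a placeholder for DM; it is not the authors' measure. The names `placeholder_distance`, `distance_matrix`, and the toy sequences are hypothetical and serve only to show how an alignment-free pairwise distance fills a symmetric matrix that a tree-building step could then consume.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def placeholder_distance(s, q, k=3):
    """Jaccard distance between k-mer sets.

    A stand-in for illustration only; this is NOT the paper's DM(S, Q).
    """
    a, b = kmers(s, k), kmers(q, k)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(seqs, dist=placeholder_distance):
    """Build a symmetric pairwise distance matrix as nested dicts."""
    names = list(seqs)
    matrix = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = dist(seqs[x], seqs[y])
        matrix[x][y] = d
        matrix[y][x] = d
    return matrix


# Made-up toy sequences, not taken from either data set in the paper.
TOY = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAC", "s3": "TTTTGGGGCC"}
MATRIX = distance_matrix(TOY)
```

Swapping in a real implementation of DM would only require passing a different `dist` function; no alignment step appears anywhere in the loop.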
Year: 2010 PMID: 20688082 PMCID: PMC7094107 DOI: 10.1016/j.jtbi.2010.07.040
Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691
Fig. 1 The flow diagram of our method.
The full DNA sequences of gene of 10 species.
| Species | Database | Accession | Location | Length (bp) |
|---|---|---|---|---|
| Human | EMBL | | 62187–63610 | 1424 |
| Goat | EMBL | | 279–1749 | 1471 |
| Opossum | EMBL | | 467–2488 | 2022 |
| Gallus | EMBL | | 465–1810 | 1346 |
| Lemur | EMBL | | 154–1595 | 1442 |
| Mouse | EMBL | | 275–1462 | 1188 |
| Rat | EMBL | | 310–1505 | 1196 |
| Gorilla | EMBL | | 4538–5881 | 1344 |
| Bovine | EMBL | | 278–1741 | 1464 |
| Chimpanzee | EMBL | | 4189–5532 | 1344 |
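The Location and Length columns can be cross-checked with simple arithmetic: an inclusive coordinate range from start to end covers end − start + 1 bases. The sketch below is a minimal check in Python, with the values copied from the table; the names `ROWS`, `span_length`, and `MISMATCHES` are illustrative.

```python
# (species, start, end, listed length in bp), copied from the table above.
ROWS = [
    ("Human", 62187, 63610, 1424),
    ("Goat", 279, 1749, 1471),
    ("Opossum", 467, 2488, 2022),
    ("Gallus", 465, 1810, 1346),
    ("Lemur", 154, 1595, 1442),
    ("Mouse", 275, 1462, 1188),
    ("Rat", 310, 1505, 1196),
    ("Gorilla", 4538, 5881, 1344),
    ("Bovine", 278, 1741, 1464),
    ("Chimpanzee", 4189, 5532, 1344),
]


def span_length(start, end):
    """Length of an inclusive coordinate range."""
    return end - start + 1


# Species whose listed length disagrees with their location range.
MISMATCHES = [
    name for name, start, end, length in ROWS
    if span_length(start, end) != length
]
```

For these ten rows `MISMATCHES` comes out empty, so every listed length agrees with its location range.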
The similarity/dissimilarity matrix for 10 genes based on DM.
| Species | Human | Goat | Oposs. | Gallus | Lemur | Mouse | Rat | Gorilla | Bovine | Chimp. |
|---|---|---|---|---|---|---|---|---|---|---|
| Human | 0 | 0.2394 | 0.2683 | 0.2803 | 0.2233 | 0.2484 | 0.2489 | 0.0294 | 0.2438 | 0.0297 |
| Goat | | 0 | 0.2761 | 0.2844 | 0.2502 | 0.2634 | 0.2570 | 0.2474 | 0.1130 | 0.2471 |
| Oposs. | | | 0 | 0.2895 | 0.2828 | 0.2728 | 0.2698 | 0.2701 | 0.2785 | 0.2733 |
| Gallus | | | | 0 | 0.2827 | 0.2778 | 0.2838 | 0.2895 | 0.2805 | 0.2896 |
| Lemur | | | | | 0 | 0.2591 | 0.2502 | 0.2223 | 0.2545 | 0.2201 |
| Mouse | | | | | | 0 | 0.1863 | 0.2548 | 0.2606 | 0.2530 |
| Rat | | | | | | | 0 | 0.2562 | 0.2628 | 0.2558 |
| Gorilla | | | | | | | | 0 | 0.2472 | 0.0167 |
| Bovine | | | | | | | | | 0 | 0.2483 |
| Chimp. | | | | | | | | | | 0 |
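Because the matrix is listed as an upper triangle only, it helps to mirror it into a full symmetric matrix before reading off relationships. The sketch below is a minimal Python illustration using the values from the table; the names `full_matrix` and `nearest_neighbours` are hypothetical, and the nearest-neighbour lookup is a quick sanity check, not the tree-building procedure used for Fig. 2.

```python
# Species order follows the header row of the matrix above.
SPECIES = ["Human", "Goat", "Opossum", "Gallus", "Lemur",
           "Mouse", "Rat", "Gorilla", "Bovine", "Chimpanzee"]

# Upper triangle of the DM matrix, row by row, diagonal included.
UPPER = [
    [0, 0.2394, 0.2683, 0.2803, 0.2233, 0.2484, 0.2489, 0.0294, 0.2438, 0.0297],
    [0, 0.2761, 0.2844, 0.2502, 0.2634, 0.2570, 0.2474, 0.1130, 0.2471],
    [0, 0.2895, 0.2828, 0.2728, 0.2698, 0.2701, 0.2785, 0.2733],
    [0, 0.2827, 0.2778, 0.2838, 0.2895, 0.2805, 0.2896],
    [0, 0.2591, 0.2502, 0.2223, 0.2545, 0.2201],
    [0, 0.1863, 0.2548, 0.2606, 0.2530],
    [0, 0.2562, 0.2628, 0.2558],
    [0, 0.2472, 0.0167],
    [0, 0.2483],
    [0],
]


def full_matrix(upper):
    """Mirror an upper-triangular listing into a full symmetric matrix."""
    n = len(upper)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + offset
            full[i][j] = value
            full[j][i] = value
    return full


def nearest_neighbours(names, dist):
    """Map each name to (closest other name, distance to it)."""
    result = {}
    for i, name in enumerate(names):
        others = (k for k in range(len(names)) if k != i)
        j = min(others, key=lambda k: dist[i][k])
        result[name] = (names[j], dist[i][j])
    return result


DIST = full_matrix(UPPER)
NEAREST = nearest_neighbours(SPECIES, DIST)
```

With these values the closest pairs are Gorilla and Chimpanzee (0.0167), Human and Gorilla (0.0294), Goat and Bovine (0.1130), and Mouse and Rat (0.1863).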
Fig. 2 The phylogenetic tree for 10 species using the full DNA sequences of gene based on DM.
The accession number, abbreviation, name, and length for each of the 24 coronavirus genomes.
| No. | Accession | Abbreviation | Genome | Length (nt) |
|---|---|---|---|---|
| 1 | | HCoV-229E | Human coronavirus 229E | 27,317 |
| 2 | | TGEV | Transmissible gastroenteritis virus | 28,586 |
| 3 | | PEDV | Porcine epidemic diarrhea virus | 28,033 |
| 4 | | BCoVM | Bovine coronavirus strain Mebus | 31,032 |
| 5 | | BCoVL | Bovine coronavirus isolate BCoV-LUN | 31,028 |
| 6 | | BCoVQ | Bovine coronavirus Quebec | 31,100 |
| 7 | | BCoV | Bovine coronavirus | 31,028 |
| 8 | | MHVM | Murine hepatitis virus strain ML-10 | 31,233 |
| 9 | | MHV2 | Murine hepatitis virus strain 2 | 31,276 |
| 10 | | MHVP | Murine hepatitis virus strain Penn 97-1 | 31,112 |
| 11 | | MHV | Murine hepatitis virus | 31,357 |
| 12 | | IBV | Avian infectious bronchitis virus | 27,608 |
| 13 | | BJ01 | SARS coronavirus BJ01 | 29,725 |
| 14 | | Urbani | SARS coronavirus Urbani | 29,727 |
| 15 | | HKU-39849 | SARS coronavirus HKU-39849 | 29,742 |
| 16 | | CUHK-W1 | SARS coronavirus CUHK-W1 | 29,736 |
| 17 | | CUHK-Su10 | SARS coronavirus CUHK-Su10 | 29,736 |
| 18 | | SIN2500 | SARS coronavirus Sin2500 | 29,711 |
| 19 | | SIN2677 | SARS coronavirus Sin2677 | 29,705 |
| 20 | | SIN2679 | SARS coronavirus Sin2679 | 29,711 |
| 21 | | SIN2748 | SARS coronavirus Sin2748 | 29,706 |
| 22 | | SIN2774 | SARS coronavirus Sin2774 | 29,711 |
| 23 | | TW1 | SARS coronavirus TW1 | 29,729 |
| 24 | | TOR2 | SARS coronavirus | 29,751 |
Fig. 3 The phylogenetic tree for 24 coronaviruses using whole genomes based on DM.