Xingqin Qi, Edgar Fuller, Qin Wu, Cun-Quan Zhang.
Abstract
Sequence comparison is a primary technique for the analysis of DNA sequences. To make quantitative comparisons, one devises mathematical descriptors that capture the essence of a sequence's base composition and distribution. Alignment methods and graphical techniques (in which each sequence is represented by a curve in a high-dimensional Euclidean space) have long been widely used. In this contribution we introduce a new nongraphical, alignment-free approach based on the frequencies of the dinucleotide XY in DNA sequences. The most important feature of this method is that it identifies not only adjacent XY pairs but also nonadjacent ones, in which X and Y are separated by some number of nucleotides. This methodology preserves information in the DNA sequence that other methods ignore. We test our method on the coding regions of exon-1 of the β-globin gene for 11 species and demonstrate its utility.
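The key idea in the abstract is counting an ordered pair XY both when X and Y are adjacent and when some number of nucleotides lie between them. The following is a minimal Python sketch of such gapped counting, not the authors' exact descriptor. It assumes that a gap of k means exactly k intervening nucleotides (so k = 0 is an ordinary dinucleotide) and that counts are normalized to relative frequencies per gap. The function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# All 16 ordered dinucleotides XY: AA, AC, AG, ..., TT.
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def gapped_dinucleotide_counts(seq: str, gap: int) -> dict[str, int]:
    """Count ordered pairs (X, Y) with exactly `gap` nucleotides between them.

    Assumed convention: gap=0 is an ordinary adjacent dinucleotide, so X sits at
    position i and Y at position i + gap + 1. Pairs involving non-ACGT symbols
    (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    step = gap + 1
    for i in range(len(seq) - step):
        pair = seq[i] + seq[i + step]
        if pair in counts:
            counts[pair] += 1
    return counts


def gapped_dinucleotide_frequencies(seq: str, gap: int) -> list[float]:
    """Relative frequencies of the 16 XY pairs at one gap; they sum to 1."""
    counts = gapped_dinucleotide_counts(seq, gap)
    total = sum(counts.values())
    return [counts[xy] / total if total else 0.0 for xy in DINUCLEOTIDES]


# Human exon-1 coding sequence, copied from the sequence table.
HUMAN = "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAG"

if __name__ == "__main__":
    print(gapped_dinucleotide_counts("ACGT", 1))  # only AG and CT are nonzero
    print(gapped_dinucleotide_frequencies(HUMAN, 0))
```

Concatenating these 16-entry vectors over several gaps gives a longer descriptor that keeps the nonadjacent information an adjacent-only dinucleotide count would discard.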
Year: 2012; PMID: 22619571; PMCID: PMC3349307; DOI: 10.1100/2012/104269
Source DB: PubMed; Journal: ScientificWorldJournal; ISSN: 1537-744X
ID information for exon-1 of the β-globin gene of 11 species.
| Species | ID/Accession | Database | Length (nt) |
|---|---|---|---|
| Human | U01317 | NCBI | 92 |
| Chimpanzee | X02345 | NCBI | 105 |
| Gorilla | X61109 | NCBI | 93 |
| Lemur | M15734 | NCBI | 92 |
| Rat | X06701 | NCBI | 92 |
| Mouse | V00722 | NCBI | 93 |
| Rabbit | V00882 | NCBI | 92 |
| Goat | M15387 | NCBI | 86 |
| Bovine | X00376 | NCBI | 86 |
| Opossum | J03643 | NCBI | 92 |
| Gallus | V00409 | NCBI | 92 |
Coding sequence of exon-1 of the β-globin gene for 11 species.
| Species | DNA sequence |
|---|---|
| Human | ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAG |
| Chimpanzee | ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGG |
| Gorilla | ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGG |
| Lemur | ATGACTTTGCTGAGTGCTGAGGAGAATGCTCATGTCACCTCTCTGTGGGGCAAGGTGGATGTAGAGAAAGTTGGTGGCGAGGCCTTGGGCAG |
| Rat | ATGGTGCACCTAACTGATGCTGAGAAGGCTACTGTTAGTGGCCTGTGGGGAAAGGTGAACCCTGATAATGTTGGCGCTGAGGCCCTGGGCAG |
| Mouse | ATGGTTGCACCTGACTGATGCTGAGAAGTCTGCTGTCTCTTGCCTGTGGGCAAAGGTGAACCCCGATGAAGTTGGTGGTGAGGCCCTGGGCAGG |
| Rabbit | ATGGTGCATCTGTCCAGTGAGGAGAAGTCTGCGGTCACTGCCCTGTGGGGCAAGGTGAATGTGGAAGAAGTTGGTGGTGAGGCCCTGGGCAG |
| Goat | ATGCTGACTGCTGAGGAGAAGGCTGCCGTCACCGGCTTCTGGGGCAAGGTGAAAGTGGATGAAGTTGGTGCTGAGGCCCTGGGCAG |
| Bovine | ATGCTGACTGCTGAGGAGAAGGCTGCCGTCACCGCCTTTTGGGGCAAGGTGAAAGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAG |
| Opossum | ATGGTGCACTTGACTTCTGAGGAGAAGAACTGCATCACTACCATCTGGTCTAAGGTGCAGGTTGACCAGACTGGTGGTGAGGCCCTTGGCAG |
| Gallus | ATGGTGCACTGGACTGCTGAGGAGAAGCAGCTCATCACCGGCCTCTGGGGCAAGGTCAATGTGGCCGAATGTGGGGCCGAAGCCCTGGCCAG |
Upper triangular part of the dissimilarity/similarity matrix based on d₁.
| Species | Human | Chimpanzee | Gorilla | Lemur | Rat | Mouse | Rabbit | Goat | Bovine | Opossum | Gallus |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | 0 | 2.5567 | 2.4026 | 6.4922 | 5.6622 | 4.9144 | 4.2904 | 5.3220 | 4.8306 | 6.8358 | 7.4959 |
| Chimpanzee | | 0 | 2.7338 | 6.5340 | 5.9455 | 5.1613 | 4.9587 | 5.6525 | 4.9670 | 7.4568 | 7.9791 |
| Gorilla | | | 0 | 7.0466 | 6.2344 | 5.2819 | 5.0310 | 5.3353 | 4.9340 | 7.8956 | 8.0582 |
| Lemur | | | | 0 | 6.9735 | 6.8419 | 5.6647 | 6.9332 | 6.0195 | 8.2293 | 9.0347 |
| Rat | | | | | 0 | 5.2540 | 6.8004 | 6.5847 | 6.2545 | 7.5359 | 8.2347 |
| Mouse | | | | | | 0 | 6.5730 | 6.7863 | 6.4133 | 7.2900 | 7.8317 |
| Rabbit | | | | | | | 0 | 5.9265 | 5.2974 | 8.0743 | 8.3210 |
| Goat | | | | | | | | 0 | 2.3438 | 8.0158 | 7.7129 |
| Bovine | | | | | | | | | 0 | 7.9847 | 8.2938 |
| Opossum | | | | | | | | | | 0 | 8.0268 |
| Gallus | | | | | | | | | | | 0 |
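The d₁ table stores only the upper triangle, and each row begins at the diagonal. The Python sketch below rebuilds the full symmetric matrix from those rows and reports each species' closest neighbor under d₁. The numbers are copied from the table. The helper names `symmetric_from_upper` and `nearest_neighbors` are hypothetical.

```python
SPECIES = ["Human", "Chimpanzee", "Gorilla", "Lemur", "Rat", "Mouse",
           "Rabbit", "Goat", "Bovine", "Opossum", "Gallus"]

# Rows of the d1 table; row i lists the entries from the diagonal (column i) onward.
D1_UPPER = [
    [0, 2.5567, 2.4026, 6.4922, 5.6622, 4.9144, 4.2904, 5.3220, 4.8306, 6.8358, 7.4959],
    [0, 2.7338, 6.5340, 5.9455, 5.1613, 4.9587, 5.6525, 4.9670, 7.4568, 7.9791],
    [0, 7.0466, 6.2344, 5.2819, 5.0310, 5.3353, 4.9340, 7.8956, 8.0582],
    [0, 6.9735, 6.8419, 5.6647, 6.9332, 6.0195, 8.2293, 9.0347],
    [0, 5.2540, 6.8004, 6.5847, 6.2545, 7.5359, 8.2347],
    [0, 6.5730, 6.7863, 6.4133, 7.2900, 7.8317],
    [0, 5.9265, 5.2974, 8.0743, 8.3210],
    [0, 2.3438, 8.0158, 7.7129],
    [0, 7.9847, 8.2938],
    [0, 8.0268],
    [0],
]


def symmetric_from_upper(upper: list[list[float]]) -> list[list[float]]:
    """Rebuild a full symmetric matrix from upper-triangular rows."""
    n = len(upper)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        if len(row) != n - i:
            raise ValueError(f"row {i} should have {n - i} entries, got {len(row)}")
        for offset, value in enumerate(row):
            j = i + offset
            full[i][j] = full[j][i] = float(value)  # mirror across the diagonal
    return full


def nearest_neighbors(full: list[list[float]], names: list[str]) -> dict[str, str]:
    """Map each species to the other species at the smallest distance."""
    nearest = {}
    for i, name in enumerate(names):
        others = [k for k in range(len(names)) if k != i]
        j = min(others, key=lambda k: full[i][k])
        nearest[name] = names[j]
    return nearest


if __name__ == "__main__":
    full = symmetric_from_upper(D1_UPPER)
    for name, neighbor in nearest_neighbors(full, SPECIES).items():
        print(f"{name} -> {neighbor}")
```

With these values, Human's nearest neighbor is Gorilla, and Goat and Bovine are each other's nearest neighbors. The same helpers apply unchanged to the d₂ rows.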
Upper triangular part of the dissimilarity/similarity matrix based on d₂.
| Species | Human | Chimpanzee | Gorilla | Lemur | Rat | Mouse | Rabbit | Goat | Bovine | Opossum | Gallus |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | 0 | 0.0087 | 0.0074 | 0.0567 | 0.0464 | 0.0372 | 0.0253 | 0.0354 | 0.0287 | 0.0719 | 0.0819 |
| Chimpanzee | | 0 | 0.0112 | 0.0564 | 0.0487 | 0.0383 | 0.0303 | 0.0403 | 0.0320 | 0.0793 | 0.0899 |
| Gorilla | | | 0 | 0.0619 | 0.0538 | 0.0398 | 0.0312 | 0.0357 | 0.0302 | 0.0887 | 0.0877 |
| Lemur | | | | 0 | 0.0691 | 0.0635 | 0.0454 | 0.0616 | 0.0463 | 0.0939 | 0.1139 |
| Rat | | | | | 0 | 0.0417 | 0.0631 | 0.0592 | 0.0552 | 0.0832 | 0.1048 |
| Mouse | | | | | | 0 | 0.0588 | 0.0573 | 0.0528 | 0.0765 | 0.0932 |
| Rabbit | | | | | | | 0 | 0.0444 | 0.0349 | 0.0998 | 0.0933 |
| Goat | | | | | | | | 0 | 0.0109 | 0.0948 | 0.0792 |
| Bovine | | | | | | | | | 0 | 0.0923 | 0.0937 |
| Opossum | | | | | | | | | | 0 | 0.0897 |
| Gallus | | | | | | | | | | | 0 |
Figure 1. Degree of dissimilarity/similarity between human and each of the other 10 species, with the value for the human-gorilla pair normalized to 1.
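Figure 1 rescales each species' distance from human by the human-gorilla distance. The caption does not state the formula. The sketch below assumes the plotted value is simply d(human, s) / d(human, gorilla), computed from the Human rows of the d₁ and d₂ tables. The helper name `relative_to_reference` is hypothetical.

```python
# Species in the column order of the tables, excluding Human itself.
OTHERS = ["Chimpanzee", "Gorilla", "Lemur", "Rat", "Mouse",
          "Rabbit", "Goat", "Bovine", "Opossum", "Gallus"]

# Human rows of the d1 and d2 tables, with the diagonal 0 dropped.
HUMAN_D1 = [2.5567, 2.4026, 6.4922, 5.6622, 4.9144, 4.2904, 5.3220, 4.8306, 6.8358, 7.4959]
HUMAN_D2 = [0.0087, 0.0074, 0.0567, 0.0464, 0.0372, 0.0253, 0.0354, 0.0287, 0.0719, 0.0819]


def relative_to_reference(row: list[float], names: list[str],
                          reference: str = "Gorilla") -> dict[str, float]:
    """Divide every distance by the distance to `reference`, which then maps to 1."""
    ref_value = row[names.index(reference)]
    return {name: value / ref_value for name, value in zip(names, row)}


if __name__ == "__main__":
    for label, row in (("d1", HUMAN_D1), ("d2", HUMAN_D2)):
        rel = relative_to_reference(row, OTHERS)
        print(label, {name: round(r, 3) for name, r in rel.items()})
```

Under this reading, Gallus is the species most dissimilar from human for both measures. Its distance is about 3.1 times the human-gorilla value under d₁ and about 11.1 times under d₂.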
Figure 2. (a) Projection of the 336-dimensional vectors of the 11 species onto the 2D space spanned by the top two principal components; (b) contributions of the first 6 principal components.
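Figure 2 projects each species' feature vector onto its top two principal components. The sketch below reproduces that kind of analysis with NumPy's SVD. Note that 336 = 16 × 21, which would match 16 XY pairs at 21 gap values (0 through 20). That layout, the per-gap normalization, and the helper names are assumptions rather than the authors' stated construction, so the printed coordinates need not match the published figure. Here, a component's "contribution" is taken to be its fraction of the total variance.

```python
from itertools import product

import numpy as np

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]

# Exon-1 coding sequences copied from the sequence table.
SEQUENCES = {
    "Human": "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAG",
    "Chimpanzee": "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGG",
    "Gorilla": "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGG",
    "Lemur": "ATGACTTTGCTGAGTGCTGAGGAGAATGCTCATGTCACCTCTCTGTGGGGCAAGGTGGATGTAGAGAAAGTTGGTGGCGAGGCCTTGGGCAG",
    "Rat": "ATGGTGCACCTAACTGATGCTGAGAAGGCTACTGTTAGTGGCCTGTGGGGAAAGGTGAACCCTGATAATGTTGGCGCTGAGGCCCTGGGCAG",
    "Mouse": "ATGGTTGCACCTGACTGATGCTGAGAAGTCTGCTGTCTCTTGCCTGTGGGCAAAGGTGAACCCCGATGAAGTTGGTGGTGAGGCCCTGGGCAGG",
    "Rabbit": "ATGGTGCATCTGTCCAGTGAGGAGAAGTCTGCGGTCACTGCCCTGTGGGGCAAGGTGAATGTGGAAGAAGTTGGTGGTGAGGCCCTGGGCAG",
    "Goat": "ATGCTGACTGCTGAGGAGAAGGCTGCCGTCACCGGCTTCTGGGGCAAGGTGAAAGTGGATGAAGTTGGTGCTGAGGCCCTGGGCAG",
    "Bovine": "ATGCTGACTGCTGAGGAGAAGGCTGCCGTCACCGCCTTTTGGGGCAAGGTGAAAGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAG",
    "Opossum": "ATGGTGCACTTGACTTCTGAGGAGAAGAACTGCATCACTACCATCTGGTCTAAGGTGCAGGTTGACCAGACTGGTGGTGAGGCCCTTGGCAG",
    "Gallus": "ATGGTGCACTGGACTGCTGAGGAGAAGCAGCTCATCACCGGCCTCTGGGGCAAGGTCAATGTGGCCGAATGTGGGGCCGAAGCCCTGGCCAG",
}


def gapped_feature_vector(seq: str, max_gap: int) -> np.ndarray:
    """Concatenate the 16 XY relative frequencies for every gap 0..max_gap.

    Hypothetical layout: 16 * (max_gap + 1) entries in gap-major order, where
    gap k means X and Y are separated by exactly k nucleotides.
    """
    seq = seq.upper()
    features = []
    for gap in range(max_gap + 1):
        step = gap + 1
        counts = dict.fromkeys(DINUCLEOTIDES, 0)
        for i in range(len(seq) - step):
            pair = seq[i] + seq[i + step]
            if pair in counts:
                counts[pair] += 1
        total = sum(counts.values()) or 1  # avoid division by zero for short sequences
        features.extend(counts[xy] / total for xy in DINUCLEOTIDES)
    return np.array(features)


def pca(data: np.ndarray, n_components: int = 2):
    """PCA via SVD of the column-centered data matrix (rows = samples).

    Returns the projected scores and the fraction of total variance carried
    by each principal component, in decreasing order.
    """
    centered = data - data.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    variance = singular_values ** 2
    return scores, variance / variance.sum()


if __name__ == "__main__":
    X = np.vstack([gapped_feature_vector(s, max_gap=20) for s in SEQUENCES.values()])
    scores, contribution = pca(X, n_components=2)
    print(X.shape)  # (11, 336)
    for name, (pc1, pc2) in zip(SEQUENCES, scores):
        print(f"{name:11s} {pc1:+.4f} {pc2:+.4f}")
    print("first 6 contributions:", np.round(contribution[:6], 4))
```

SVD of the centered matrix is used instead of an explicit covariance eigendecomposition because it avoids forming the 336 × 336 covariance matrix from only 11 samples.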