Xiaojing Xie, Jihong Guan, Shuigeng Zhou.
Abstract
BACKGROUND: DNA sequence analysis is an important research topic in bioinformatics. Evaluating the similarity between sequences, which is crucial for sequence analysis, has attracted much research effort in the last two decades, and dozens of algorithms and tools have been developed. These methods are based on alignment, word frequency, or geometric representation, each of which has its own advantages and disadvantages.
Year: 2015 PMID: 25707937 PMCID: PMC4331808 DOI: 10.1186/1471-2164-16-S3-S5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The pipeline of our method. The parallelograms stand for input/output modules, and the rectangles indicate functional modules. The two modules enclosed in the dashed rectangle run for each block.
Figure 2. Two sequences.
Table 1. Illustration of the mining process of the modified PrefixSpan algorithm.
| current pattern | extended patterns |
|---|---|
| 〈 | 〈 |
| 〈 | 〈 |
| 〈 | 〈 |
| 〈 | 〈 |
| 〈 | 〈 |
| 〈 | 〈 |
| 〈 | 〈 |
| 〈 | 〈 |
| 〈 | 〈 |
| 〈 | 〈 |
Each row represents one recursive step. The numbers after each pattern represent the starting locations of the suffixes, which are the so-called pseudo-projections. Patterns in bold are maximal.
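The recursion described in the note above can be sketched in a few lines. This is only an illustration, not the authors' modified PrefixSpan: it assumes patterns are contiguous, mines a single sequence, counts overlapping occurrences, and uses hypothetical names (`mine_frequent_patterns`, `maximal_patterns`, `min_support`). Each pattern is stored with the start positions of its suffixes, which play the role of the pseudo-projections.

```python
def mine_frequent_patterns(seq, min_support):
    """Return {pattern: suffix_starts} for contiguous patterns that occur
    at least min_support times in seq (assumption: overlaps are counted)."""
    found = {}

    def grow(pattern, suffix_starts):
        # Group the pseudo-projections by the symbol that follows the pattern.
        extensions = {}
        for start in suffix_starts:
            if start < len(seq):
                extensions.setdefault(seq[start], []).append(start + 1)
        # Keep only frequent extensions and recurse on each of them.
        for symbol in sorted(extensions):
            starts = extensions[symbol]
            if len(starts) >= min_support:
                found[pattern + symbol] = starts
                grow(pattern + symbol, starts)

    # The empty pattern is followed by every position of the sequence.
    grow("", list(range(len(seq))))
    return found


def maximal_patterns(found):
    """A frequent pattern is maximal if no other frequent pattern contains it."""
    return sorted(p for p in found if not any(p != q and p in q for q in found))


patterns = mine_frequent_patterns("ACGTACGT", min_support=2)
print(maximal_patterns(patterns))
print(patterns["ACGT"])
```

Storing only suffix start positions avoids copying the projected sequences at each recursive step, which is the point of pseudo-projection.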
Table 2. Probabilities of patterns.
| pattern | occurrences | length | probability |
|---|---|---|---|
| 〈 | 2 | 2 | 2/(20 − 2 + 1) = 0.105263 |
| 〈 | 2 | 4 | 2/(20 − 4 + 1) = 0.117647 |
| 〈 | 2 | 2 | 2/(20 − 2 + 1) = 0.105263 |
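The arithmetic in the last column of the table above has the form count / (sequence length − pattern length + 1), where the denominator is the number of positions at which a pattern of that length could start. A minimal sketch of this calculation follows; the function and parameter names are assumptions made for illustration, and the sequence length of 20 is taken from the rows of the table.

```python
def pattern_probability(count, pattern_length, sequence_length):
    """Occurrences of a pattern divided by the number of positions
    at which a pattern of that length could start."""
    positions = sequence_length - pattern_length + 1
    if positions <= 0:
        raise ValueError("pattern is longer than the sequence")
    return count / positions


# The three rows of the table: (count, pattern length) with sequence length 20.
for count, length in [(2, 2), (2, 4), (2, 2)]:
    print(f"{pattern_probability(count, length, 20):.6f}")
```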
Table 3. Details of β-Globin genes of 11 species.
| | species | accession number | location | length (nt) |
|---|---|---|---|---|
| 1 | Bovine | [GenBank: | 278-1741 | 1464 |
| 2 | Chimpanzee | [GenBank: | 4189-5532 | 1344 |
| 3 | Gallus | [GenBank: | 465-1810 | 1346 |
| 4 | Goat | [GenBank: | 279-1749 | 1471 |
| 5 | Gorilla | [GenBank: | 4538-5881 | 1344 |
| 6 | Human | [GenBank: | 62187-63610 | 1424 |
| 7 | Lemur | [GenBank: | 154-1595 | 1442 |
| 8 | Mouse | [GenBank: | 275-1462 | 1188 |
| 9 | Opossum | [GenBank: | 467-2488 | 2022 |
| 10 | Rabbit | [GenBank: | 277-1419 | 1143 |
| 11 | Rat | [GenBank: | 310-1505 | 1196 |
The location column gives the start/end location of the sequence of each gene.
Figure 3. The distance between human and gorilla vs. block size.
Figure 4. The distances between original human sequence segments and the corresponding shuffled ones. The horizontal blue line indicates the distance between sequences of human and gorilla. The horizontal axis indicates the test label from 1 to 100.
Figure 5. The averaged distance between the original human sequence and the contaminated sequences when noise ratio increases from 0.01 to 0.5. The horizontal blue line indicates the distance between the sequences of human and gorilla. The horizontal axis is the ratio of noise added to the sequence.
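The shuffling and contamination experiments behind these two figures can be set up roughly as follows. The captions do not say how the noise was generated, so this sketch assumes random substitution at a given fraction of positions; `shuffle_sequence`, `add_noise`, and the fixed seed are hypothetical choices for illustration only.

```python
import random


def shuffle_sequence(seq, rng):
    """Return a random permutation of seq (same base composition)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def add_noise(seq, ratio, rng, alphabet="ACGT"):
    """Substitute a fraction `ratio` of positions with a different base.
    Assumption: noise means substitution; the source does not specify it."""
    n_noisy = round(len(seq) * ratio)
    chars = list(seq)
    for i in rng.sample(range(len(seq)), n_noisy):
        chars[i] = rng.choice([base for base in alphabet if base != chars[i]])
    return "".join(chars)


rng = random.Random(0)  # fixed seed so the run is repeatable
original = "ACGT" * 10
shuffled = shuffle_sequence(original, rng)
noisy = add_noise(original, 0.25, rng)
print(shuffled)
print(noisy)
```

A distance function would then be applied to `(original, shuffled)` and `(original, noisy)` and compared against the human-gorilla baseline.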
Table 4. Pairwise distance matrix of β-Globin genes of 11 species.
| | bovine | chimpanzee | gallus | goat | gorilla | human | lemur | mouse | opossum | rabbit | rat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bovine | 0.0000 | 0.0782 | 0.1112 | 0.0474 | 0.0782 | 0.0670 | 0.0824 | 0.0666 | 0.1202 | 0.0666 | 0.0663 |
| Chimpanzee | 0.0782 | 0.0000 | 0.0679 | 0.0957 | 0.0000 | 0.0000 | 0.0558 | 0.0568 | 0.1579 | 0.0569 | 0.0569 |
| Gallus | 0.1112 | 0.0679 | 0.0000 | 0.1239 | 0.0679 | 0.0792 | 0.0962 | 0.0806 | 0.1935 | 0.0805 | 0.0805 |
| Goat | 0.0474 | 0.0957 | 0.1239 | 0.0000 | 0.0957 | 0.0820 | 0.0675 | 0.0939 | 0.0994 | 0.0939 | 0.0938 |
| Gorilla | 0.0782 | 0.0000 | 0.0679 | 0.0957 | 0.0000 | 0.0000 | 0.0558 | 0.0568 | 0.1579 | 0.0569 | 0.0569 |
| Human | 0.0670 | 0.0000 | 0.0792 | 0.0820 | 0.0000 | 0.0000 | 0.0478 | 0.0663 | 0.1537 | 0.0664 | 0.0664 |
| Lemur | 0.0824 | 0.0558 | 0.0962 | 0.0675 | 0.0558 | 0.0478 | 0.0000 | 0.0942 | 0.1428 | 0.0945 | 0.0944 |
| Mouse | 0.0666 | 0.0568 | 0.0806 | 0.0939 | 0.0568 | 0.0663 | 0.0942 | 0.0000 | 0.1643 | 0.0672 | 0.0671 |
| Opossum | 0.1202 | 0.1579 | 0.1935 | 0.0994 | 0.1579 | 0.1537 | 0.1428 | 0.1643 | 0.0000 | 0.1643 | 0.1643 |
| Rabbit | 0.0666 | 0.0569 | 0.0805 | 0.0939 | 0.0569 | 0.0664 | 0.0945 | 0.0672 | 0.1643 | 0.0000 | 0.0003 |
| Rat | 0.0663 | 0.0569 | 0.0805 | 0.0938 | 0.0569 | 0.0664 | 0.0944 | 0.0671 | 0.1643 | 0.0003 | 0.0000 |
Figure 6. The dendrogram of the 11 tested species based on the distance matrix in Table 4. The dendrogram is generated with the Python SciPy library (http://www.scipy.org/).
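A dendrogram of this kind can be built from a pairwise distance matrix with SciPy's hierarchical clustering routines. The sketch below uses only four of the eleven species to stay short, with values copied from the pairwise distance matrix above; the choice of average linkage is an assumption, since the caption does not state which linkage method was used.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

labels = ["Human", "Gorilla", "Goat", "Opossum"]

# Symmetric distances taken from the pairwise distance matrix (subset of rows).
distances = np.array([
    [0.0000, 0.0000, 0.0820, 0.1537],
    [0.0000, 0.0000, 0.0957, 0.1579],
    [0.0820, 0.0957, 0.0000, 0.0994],
    [0.1537, 0.1579, 0.0994, 0.0000],
])

# linkage expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(distances)
tree = linkage(condensed, method="average")  # assumption: average linkage

# no_plot=True returns the layout without requiring matplotlib.
layout = dendrogram(tree, labels=labels, no_plot=True)
print(layout["ivl"])  # leaf order along the dendrogram axis
```

Dropping `no_plot=True` draws the tree with matplotlib if it is installed.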
Table 5. Comparison of the distances between human and the other tested species.
| method | Bovine | Chimpanzee | Gallus | Goat | Gorilla | Lemur | Mouse | Opossum | Rabbit | Rat | Correlation coefficient |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MEGA 5.2 | 0.4485 | 0.0095 | 0.8456 | 0.4696 | 0.0117 | 0.2423 | 0.4815 | 0.8337 | 0.4083 | 0.4935 | - |
| BLASTN 2.2.29+[ | 0.8600 | 0.0896 | 0.9880 | 0.8765 | 0.0896 | 0.6643 | 0.9026 | 1.0000 | 0.8423 | 0.9182 | 0.8912 |
| Method of [ | 22.4257 | 5.3704 | 23.5869 | 26.8209 | 5.3704 | 25.2515 | 25.8007 | 25.9952 | 20.5706 | 27.0102 | 0.7569 |
| Method of [ | 0.1000 | 0.0100 | 0.2150 | 0.1050 | 0.0110 | 0.0550 | 0.0830 | 0.0890 | 0.0700 | 0.0620 | 0.8318 |
| FPE | 0.0670 | 0.0000 | 0.0792 | 0.0820 | 0.0000 | 0.0478 | 0.0663 | 0.1537 | 0.0664 | 0.0664 | |
Note that the results from [14] are based on the sequence of the first exon in each β-globin gene.
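The correlation coefficient column can be explored with a short calculation. The table does not say which coefficient was used or what it is measured against; the sketch below assumes Pearson correlation against the MEGA 5.2 row (suggested by the "-" in that row) and uses the BLASTN row as the example, so its output may or may not match the tabulated value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Rows copied from the comparison table (human vs. the other ten species).
mega = [0.4485, 0.0095, 0.8456, 0.4696, 0.0117,
        0.2423, 0.4815, 0.8337, 0.4083, 0.4935]
blastn = [0.8600, 0.0896, 0.9880, 0.8765, 0.0896,
          0.6643, 0.9026, 1.0000, 0.8423, 0.9182]

r = pearson(mega, blastn)
print(f"{r:.4f}")
```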
Figure 7. Comparing the distances evaluated by different methods. The vertical axis indicates the normalized distances between human and the other tested species, as shown in Table 5.