Lorraine A K Ayad, Solon P Pissis.
Abstract
BACKGROUND: A fundamental assumption of all widely used multiple sequence alignment techniques is that the left- and right-most positions of the input sequences are relevant to the alignment. However, the position where a sequence starts or ends can be entirely arbitrary, for a number of reasons: the linearisation (sequencing) of a circular molecular structure is itself arbitrary, and different linearisation standards introduce inconsistencies into sequence databases. These scenarios arise, for instance, in the multiple sequence alignment of mitochondrial DNA, viroid, viral or other genomes that have a circular molecular structure. One solution to these inconsistencies is to identify a suitable rotation (cyclic shift) for each sequence; the refined sequences may in turn lead to improved multiple sequence alignments with the preferred multiple sequence alignment program.
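The idea of choosing a rotation per sequence can be illustrated with a deliberately naive sketch. This is not the MARS algorithm, which this entry does not describe: it simply tries every cyclic shift of a sequence and keeps the one with the smallest unit-cost edit distance to a reference. The function names are hypothetical, and rotating every sequence against the first one (a star-style scheme) is a simplifying assumption; the keywords suggest the actual method relies on q-grams and a progressive approach instead. Trying all n shifts with an O(n²) edit distance costs roughly O(n³) per sequence, so this only suits short toy inputs.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed with two DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def rotate(s: str, k: int) -> str:
    """Cyclic shift of s that starts at position k."""
    return s[k:] + s[:k]


def best_rotation(reference: str, s: str) -> int:
    """Shift k (s assumed non-empty) minimising edit_distance(reference, rotate(s, k)).

    min() returns the first minimum, so ties resolve to the smallest k.
    """
    return min(range(len(s)), key=lambda k: edit_distance(reference, rotate(s, k)))


def refine(seqs: List[str]) -> List[str]:
    """Hypothetical star-style refinement: rotate each sequence to match seqs[0]."""
    ref = seqs[0]
    return [ref] + [rotate(s, best_rotation(ref, s)) for s in seqs[1:]]


if __name__ == "__main__":
    # Two arbitrary linearisations of the same circular sequence as the reference.
    print(refine(["ACGTTGCA", "TGCAACGT", "GCAACGTT"]))
```

Here both extra inputs are other linearisations of the reference, so each one is rotated back to `ACGTTGCA`. On real data the best rotation is only approximate, because indels and substitutions make no shift match exactly.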
Keywords: Circular sequences; Multiple circular sequence alignment; Progressive alignment; q-grams
Year: 2017 PMID: 28088189 PMCID: PMC5237495 DOI: 10.1186/s12864-016-3477-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Standard genetic measures for Datasets 1–3
| Program | Clustal | MARS+ Clustal | MUSCLE | MARS+ MUSCLE |
|---|---|---|---|---|
| Dataset 1 | 12.2500.5 | 12.2500.5.rot | 12.2500.5 | 12.2500.5.rot |
| Length | 2,503 | 2,503 | 2,503 | 2,503 |
| PM Sites | 698 | 698 | 689 | 689 |
| Transitions | 3,845 | 3,849 | 3,804 | 3,804 |
| Transversions | 4,245 | 4,251 | 4,205 | 4,205 |
| Substitutions | 12,254 | 12,264 | 12,111 | 12,111 |
| Indels | 360 | 360 | 388 | 388 |
| AVPD | 191 | 191 | 189 | 189 |
| Dataset 2 | 12.2500.20 | 12.2500.20.rot | 12.2500.20 | 12.2500.20.rot |
| Length | 2,662 | 2,664 | 2,674 | 2,674 |
| PM Sites | 2,228 | 2,230 | 2,155 | 2,155 |
| Transitions | 16,819 | 16,502 | 16,171 | 16,184 |
| Transversions | 15,374 | 15,719 | 14,422 | 14,422 |
| Substitutions | 47,754 | 47,799 | 44,261 | 44,280 |
| Indels | 10,545 | 10,707 | 8,817 | 8,815 |
| AVPD | 883 | 886 | 804 | 804 |
| Dataset 3 | 12.2500.35 | 12.2500.35.rot | 12.2500.35 | 12.2500.35.rot |
| Length | 2,526 | 2,528 | 2,528 | 2,528 |
| PM Sites | 2,062 | 2,070 | 2,045 | 2,045 |
| Transitions | 18,385 | 18,167 | 18,362 | 18,362 |
| Transversions | 17,642 | 17,728 | 17,316 | 17,316 |
| Substitutions | 54,573 | 54,533 | 53,807 | 53,807 |
| Indels | 2,403 | 2,575 | 2,253 | 2,253 |
| AVPD | 863 | 865 | 849 | 849 |
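The tables report these measures without defining them. The sketch below shows one common way to classify a pairwise difference between aligned bases: a transition exchanges a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), and any other base change is a transversion. Several choices here are assumptions: summing over all pairs of rows, skipping gaps and non-ACGT symbols, and reading "PM Sites" as polymorphic (variable) columns. The tool behind the tables evidently counts differently. For example, the Substitutions row for Dataset 1 with Clustal (12,254) exceeds Transitions plus Transversions (3,845 + 4,245 = 8,090), so this sketch is not expected to reproduce the published numbers.

```python
from itertools import combinations
from typing import Dict, List, Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(x: str, y: str) -> Optional[str]:
    """Return 'transition', 'transversion', or None (identical, gap, or non-ACGT)."""
    if x == y or x == "-" or y == "-":
        return None
    both = {x, y}
    if both <= PURINES or both <= PYRIMIDINES:
        return "transition"
    if both <= PURINES | PYRIMIDINES:
        return "transversion"
    return None  # ambiguity codes such as N are ignored in this sketch


def pairwise_counts(alignment: List[str]) -> Dict[str, int]:
    """Sum transitions/transversions over all row pairs; count variable columns.

    Assumes equal-length, upper-case rows. A column counts as polymorphic here
    if it holds more than one distinct non-gap symbol (an assumed definition).
    """
    counts = {"transition": 0, "transversion": 0}
    for a, b in combinations(alignment, 2):
        for x, y in zip(a, b):
            kind = classify(x, y)
            if kind is not None:
                counts[kind] += 1
    counts["polymorphic_sites"] = sum(
        1 for column in zip(*alignment) if len(set(column) - {"-"}) > 1
    )
    return counts


if __name__ == "__main__":
    print(pairwise_counts(["ACGT-", "GCTTA", "ACGAA"]))
```

For the three-row toy alignment this gives 2 transitions, 4 transversions and 3 polymorphic columns. The last column is not counted as polymorphic because its only non-gap symbol is A.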
Standard genetic measures for Datasets 4–6
| Program | Clustal | MARS+ Clustal | MUSCLE | MARS+ MUSCLE |
|---|---|---|---|---|
| Dataset 4 | 25.2500.5 | 25.2500.5.rot | 25.2500.5 | 25.2500.5.rot |
| Length | 2,515 | 2,515 | 2,515 | 2,515 |
| PM Sites | 1,243 | 1,238 | 1,230 | 1,230 |
| Transitions | 20,438 | 20,422 | 20,353 | 20,353 |
| Transversions | 20,672 | 20,587 | 20,498 | 20,498 |
| Substitutions | 61,780 | 61,523 | 61,289 | 61,289 |
| Indels | 2,582 | 1,932 | 1,842 | 1,842 |
| AVPD | 214 | 211 | 210 | 210 |
| Dataset 5 | 25.2500.20 | 25.2500.20.rot | 25.2500.20 | 25.2500.20.rot |
| Length | 2,600 | 2,595 | 2,590 | 2,591 |
| PM Sites | 2,585 | 2,577 | 2,572 | 2,572 |
| Transitions | 105,738 | 105,596 | 106,070 | 106,256 |
| Transversions | 104,778 | 104,451 | 103,335 | 103,238 |
| Substitutions | 313,329 | 312,311 | 309,953 | 310,056 |
| Indels | 20,524 | 20,658 | 13,678 | 13,784 |
| AVPD | 1,112 | 1,109 | 1,078 | 1,079 |
| Dataset 6 | 25.2500.35 | 25.2500.35.rot | 25.2500.35 | 25.2500.35.rot |
| Length | 2,726 | 2,751 | 2,722 | 2,716 |
| PM Sites | 2,700 | 2,727 | 2,684 | 2,679 |
| Transitions | 101,801 | 102,471 | 104,001 | 103,796 |
| Transversions | 104,993 | 104,632 | 100,595 | 101,078 |
| Substitutions | 310,597 | 311,468 | 304,100 | 304,481 |
| Indels | 47,080 | 58,288 | 35,956 | 35,110 |
| AVPD | 1,192 | 1,232 | 1,133 | 1,131 |
Standard genetic measures for Datasets 7–9
| Program | Clustal | MARS+ Clustal | MUSCLE | MARS+ MUSCLE |
|---|---|---|---|---|
| Dataset 7 | 50.2500.5 | 50.2500.5.rot | 50.2500.5 | 50.2500.5.rot |
| Length | 2,524 | 2,524 | 2,524 | 2,524 |
| PM Sites | 1,875 | 1,882 | 1,861 | 1,861 |
| Transitions | 86,781 | 87,190 | 86,628 | 86,628 |
| Transversions | 91,334 | 91,584 | 91,040 | 91,040 |
| Substitutions | 262,804 | 263,687 | 261,248 | 261,248 |
| Indels | 11,531 | 10,771 | 8,231 | 8,231 |
| AVPD | 223 | 224 | 219 | 219 |
| Dataset 8 | 50.2500.20 | 50.2500.20.rot | 50.2500.20 | 50.2500.20.rot |
| Length | 2,576 | 2,580 | 2,582 | 2,582 |
| PM Sites | 2,568 | 2,573 | 2,575 | 2,575 |
| Transitions | 284,302 | 284,667 | 282,638 | 282,670 |
| Transversions | 283,651 | 284,673 | 279,451 | 279,462 |
| Substitutions | 852,738 | 855,055 | 842,564 | 842,672 |
| Indels | 39,273 | 45,769 | 33,371 | 33,369 |
| AVPD | 728 | 735 | 715 | 715 |
| Dataset 9 | 50.2500.35 | 50.2500.35.rot | 50.2500.35 | 50.2500.35.rot |
| Length | 2,675 | 2,697 | 2,679 | 2,667 |
| PM Sites | 2,675 | 2,696 | 2,678 | 2,666 |
| Transitions | 424,910 | 423,592 | 426,230 | 426,063 |
| Transversions | 431,453 | 428,874 | 423,113 | 422,916 |
| Substitutions | 1,282,515 | 1,278,286 | 1,267,683 | 1,267,699 |
| Indels | 92,060 | 97,398 | 73,890 | 72,718 |
| AVPD | 1,122 | 1,123 | 1,095 | 1,094 |
Relative RF distance between trees obtained with original and refined datasets
| Dataset | BEAR | Cyclope | MARS |
|---|---|---|---|
| 12.2500.5 | 0.000 | 0.000 | 0.000 |
| 12.2500.20 | 0.000 | 0.000 | 0.000 |
| 12.2500.35 | 0.000 | 0.000 | 0.000 |
| 25.2500.5 | 0.000 | 0.000 | 0.000 |
| 25.2500.20 | 0.000 | 0.000 | 0.000 |
| 25.2500.35 | 0.000 | | 0.000 |
| 50.2500.5 | | 0.000 | 0.000 |
| 50.2500.20 | 0.000 | 0.000 | 0.000 |
| 50.2500.35 | 0.000 | 0.000 | 0.000 |
Non-zero values shown in bold
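For reference, the Robinson-Foulds (RF) distance between two unrooted trees on the same taxa counts the non-trivial bipartitions (splits) that appear in one tree but not the other. The sketch below normalises this count by 2(n − 3), the maximum RF distance for two binary unrooted trees with n leaves. The entry does not state which normalisation produced the table, so treat that divisor, the split-set input format, and the function name `relative_rf` as assumptions. Under this definition, a value of 0.000 means the two trees contain exactly the same splits.

```python
from typing import FrozenSet, Iterable, Set


def _canonical(split: Iterable[str], taxa: FrozenSet[str], anchor: str) -> FrozenSet[str]:
    """Represent a bipartition by the side that does NOT contain the anchor taxon."""
    side = frozenset(split)
    return taxa - side if anchor in side else side


def _nontrivial_splits(splits: Iterable[Iterable[str]], taxa: FrozenSet[str]) -> Set[FrozenSet[str]]:
    anchor = min(taxa)  # any fixed taxon works; min() keeps the choice deterministic
    canon = {_canonical(s, taxa, anchor) for s in splits}
    # Keep only splits with at least two taxa on each side.
    return {s for s in canon if 2 <= len(s) <= len(taxa) - 2}


def relative_rf(splits1: Iterable[Iterable[str]],
                splits2: Iterable[Iterable[str]],
                taxa: Iterable[str]) -> float:
    """RF distance between two split sets, divided by 2(n - 3) (assumed normalisation)."""
    taxa = frozenset(taxa)
    if len(taxa) < 4:
        raise ValueError("relative RF needs at least 4 taxa")
    s1 = _nontrivial_splits(splits1, taxa)
    s2 = _nontrivial_splits(splits2, taxa)
    rf = len(s1 ^ s2)  # symmetric difference of the split sets
    return rf / (2 * (len(taxa) - 3))


if __name__ == "__main__":
    # ((A,B),C,(D,E)) versus ((A,C),B,(D,E)): one of two internal splits differs.
    print(relative_rf([{"A", "B"}, {"D", "E"}], [{"A", "C"}, {"D", "E"}], "ABCDE"))
```

The usage line compares ((A,B),C,(D,E)) with ((A,C),B,(D,E)). The trees share the split {D,E} but differ in one other split each, so RF = 2 and the relative distance is 2 / (2 × 2) = 0.5.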
Standard genetic measures for real data
| Program | Clustal | MARS+ Clustal | MUSCLE | MARS+ MUSCLE |
|---|---|---|---|---|
| Mammals | ||||
| Length | 19,452 | 18,829 | 19,784 | 19,180 |
| PM Sites | 12,913 | 12,265 | 13,076 | 12,454 |
| Transitions | 135,380 | 137,589 | 135,794 | 137,835 |
| Transversions | 81,945 | 84,188 | 76,894 | 78,067 |
| Substitutions | 295,684 | 302,331 | 282,608 | 286,747 |
| Indels | 82,494 | 59,348 | 91,164 | 71,042 |
| AVPD | 5729 | 5479 | 5663 | 5421 |
| Primates | ||||
| Length | 18,176 | 17,568 | 18,189 | 17,669 |
| PM Sites | 11,086 | 10,450 | 11,023 | 10,454 |
| Transitions | 259,921 | 261,995 | 262,179 | 264,245 |
| Transversions | 100,708 | 102,336 | 95,403 | 96,010 |
| Substitutions | 439,929 | 445,252 | 429,532 | 432,993 |
| Indels | 80,851 | 52,727 | 82,117 | 55,525 |
| AVPD | 4339 | 4149 | 4263 | 4070 |
| Viroids | ||||
| Length | 566 | 498 | 486 | 476 |
| PM Sites | 555 | 484 | 475 | 459 |
| Transitions | 7567 | 7485 | 9338 | 9101 |
| Transversions | 5837 | 5998 | 5491 | 5393 |
| Substitutions | 19,436 | 19,291 | 20,828 | 20,374 |
| Indels | 19,003 | 18,383 | 14,323 | 13,491 |
| AVPD | 251 | 246 | 229 | 221 |
Elapsed-time comparison using real data
| Dataset | BEAR AVPD | BEAR time (s) | Cyclope AVPD | Cyclope time (s) | MARS AVPD | MARS time (s) |
|---|---|---|---|---|---|---|
| Mammals | 5517 | 262.96 | 5422 | 1367.17 | 5421 | 333.50 |
| Primates | 4167 | 465.17 | 4080 | 2179.68 | 4070 | 463.25 |
| Viroids | 232 | 0.30 | 223 | 1.44 | 221 | 0.82 |