Rohit Kumar Yadav, Haider Banka.
Abstract
In bioinformatics, multiple sequence alignment (MSA) is an NP-hard problem. Hence, nature-inspired techniques are used to approximate the solution. In the current study, a novel biogeography-based optimization (NBBO) is proposed to solve the MSA problem. Biogeography-based optimization (BBO) is a relatively new optimization paradigm, but it has deficiencies when solving complicated problems, such as low population diversity and a slow convergence rate. NBBO is an enhanced version of BBO in which a new migration operation is proposed to overcome these limitations. The new migration adopts more information from other habitats, maintains population diversity, and preserves exploitation ability. In the performance analysis, the proposed technique and existing techniques such as VDGA, MOMSA, and GAPAM are tested on publicly available benchmark datasets (ie, BAliBASE). The proposed method is observed to be superior to, or competitive with, the existing techniques.
Keywords: Multiple sequence alignment (MSA); biogeography-based optimization (BBO); diversity; migration operator
Year: 2016 PMID: 27812276 PMCID: PMC5084829 DOI: 10.4137/EBO.S40457
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Main procedure of BBO
Migration operator
Mutation operator
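The bodies of the three listings above did not survive in this record. As a stand-in, the following is a minimal sketch of textbook BBO with the linear migration model, not the authors' listing. It assumes real-valued habitats rather than the alignment encoding used in the paper, a fixed per-variable mutation rate instead of the species-count-based mutation probability, and simple elitism. All function names and parameters (`migration_rates`, `migrate`, `mutate`, `bbo`, `rate`, `elite`) are hypothetical.

```python
import random

def migration_rates(n, I=1.0, E=1.0):
    """Linear model for habitats ranked best (index 0) to worst (index n-1)."""
    # The best habitat holds the most species: low immigration, high emigration.
    lam = [I * rank / n for rank in range(n)]        # immigration rates
    mu = [E * (n - rank) / n for rank in range(n)]   # emigration rates
    return lam, mu

def migrate(population, fitness, rng, I=1.0, E=1.0):
    """Migration operator: poor habitats import variables from good ones."""
    n = len(population)
    ranked = sorted(population, key=fitness, reverse=True)
    lam, mu = migration_rates(n, I, E)
    new_pop = [list(h) for h in ranked]
    for i in range(n):
        for d in range(len(ranked[i])):
            if rng.random() < lam[i]:
                # Roulette-wheel choice of the emigrating habitat, weighted by mu.
                j = rng.choices(range(n), weights=mu)[0]
                new_pop[i][d] = ranked[j][d]
    return new_pop

def mutate(population, rng, rate, low, high):
    """Mutation operator (simplified): resample a variable with fixed probability."""
    for h in population:
        for d in range(len(h)):
            if rng.random() < rate:
                h[d] = rng.uniform(low, high)
    return population

def bbo(fitness, dim, n=20, generations=100, rate=0.02,
        low=-5.0, high=5.0, elite=2, seed=0):
    """Main loop: migrate, mutate, then restore the previous elites."""
    rng = random.Random(seed)
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elites = [list(h) for h in pop[:elite]]
        pop = mutate(migrate(pop, fitness, rng), rng, rate, low, high)
        pop.sort(key=fitness, reverse=True)
        pop[-elite:] = elites  # the worst habitats are replaced by the saved elites
    return max(pop, key=fitness)
```

For example, `bbo(lambda h: -sum(x * x for x in h), dim=3)` maximizes a simple concave function. Because the elites are copied back each generation, the best fitness never decreases.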
Figure 1. Initial solution.
Figure 2. Encoding scheme.
Figure 3. Graphical representation of migration process.
Figure 4. Graphical representation of mutation process.
Figure 5. Performance of improved BBO and some existing methods per generation with respect to reference set 1. (A) Performance of proposed method and other existing methods with respect to 1ped Data. (B) Performance of proposed method and other existing methods with respect to 1amk Data. (C) Performance of proposed method and other existing methods with respect to 1fieA Data. (D) Performance of proposed method and other existing methods with respect to 1ldg Data.
Figure 6. Performance of improved BBO and some existing methods per generation with respect to reference set 2. (A) Performance of proposed method and other existing methods with respect to 1csy Data. (B) Performance of proposed method and other existing methods with respect to 1cpt Data. (C) Performance of proposed method and other existing methods with respect to 1havA Data. (D) Performance of proposed method and other existing methods with respect to 1sbp Data.
Results of IBBOMSA, MOMSA-W, VDGA, and GAPAM on BAliBASE reference set 1.
| NAME | SEQ NUMBER | SEQ LENGTH | GAPAM | VDGA | MOMSA | IBBOMSA |
|---|---|---|---|---|---|---|
| 1idy | 50 | 58 | 0.5650 | 0.5730 | 0.2154 | |
| 1tvxA | 4 | 69 | 0.3160 | 0.2670 | 0.0526 | |
| 1uky | 4 | 220 | 0.4020 | 0.4490 | 0.5148 | |
| kinase | 5 | 276 | 0.4870 | 0.5450 | 0.7834 | |
| 1ped | 3 | 374 | 0.4980 | 0.4820 | 0.7389 | |
| 2myr | 4 | 474 | 0.3170 | 0.3590 | 0.4372 | |
| 1ycc | 4 | 116 | 0.8450 | 0.7550 | 0.8269 | |
| 3cyr | 4 | 109 | 0.8210 | 0.8154 | 0.8934 | |
| 1ad2 | 4 | 213 | 0.9560 | 0.9410 | 0.9279 | |
| 1ldg | 4 | 675 | 0.9630 | 0.9060 | 0.8256 | |
| 1fieA | 4 | 442 | 0.9630 | 0.9300 | 0.9820 | |
| 1sesA | 5 | 63 | 0.9820 | 0.9620 | 0.9583 | |
| 1krn | 5 | 82 | 0.9600 | 0.9600 | 0.9286 | |
| 2fxb | 5 | 63 | 0.9700 | 0.9780 | 0.9357 | |
| 1amk | 5 | 258 | 0.9840 | 0.9947 | 0.9456 | |
| 1ar5A | 4 | 203 | 0.9380 | 0.9604 | 0.9238 | |
| 1gpb | 5 | 828 | 0.9830 | 0.9840 | 0.9862 | |
| 1taq | 5 | 928 | 0.9450 | 0.9477 | 0.9125 | |
| Avg. score | – | – | 0.7797 | 0.7662 | 0.7926 | 0.8219 |
Results of IBBOMSA, MOMSA-W, VDGA, and GAPAM on BAliBASE reference set 2.
| NAME | SEQ NUMBER | SEQ LENGTH | GAPAM | VDGA | MOMSA | IBBOMSA |
|---|---|---|---|---|---|---|
| 1aboA | 15 | 80 | 0.7960 | 0.6910 | 0.8398 | |
| 1idy | 19 | 60 | 0.9890 | 0.9743 | 0.9270 | |
| 1csy | 19 | 99 | 0.7640 | 0.8536 | 0.8576 | |
| 1r69 | 20 | 76 | 0.9650 | 0.8340 | 0.9450 | |
| 1tvxA | 16 | 69 | 0.9200 | 0.9740 | 0.9365 | |
| 1tgxA | 19 | 71 | 0.8780 | 0.8780 | 0.9522 | |
| 1ubi | 15 | 60 | 0.7670 | 0.7780 | 0.8967 | |
| 1wit | 20 | 106 | 0.8510 | 0.8150 | 0.9119 | |
| 2trx | 18 | 94 | 0.9860 | 0.9860 | 0.9468 | |
| 1sbp | 16 | 262 | 0.7650 | 0.7720 | 0.8808 | |
| 1havA | 26 | 242 | 0.8790 | 0.8460 | 0.8969 | |
| 1uky | 23 | 225 | 0.8080 | 0.8910 | 0.9404 | |
| 2hsdA | 20 | 255 | 0.7960 | 0.8290 | 0.9192 | |
| 2pia | 16 | 294 | 0.8280 | 0.8500 | 0.9345 | |
| 3grs | 15 | 237 | 0.7460 | 0.7510 | 0.8492 | |
| kinase | 18 | 287 | 0.7990 | 0.8880 | 0.9397 | |
| 1ajsA | 18 | 389 | 0.8990 | 0.9050 | 0.9015 | |
| 1cpt | 15 | 434 | 0.8750 | 0.8120 | 0.8862 | |
| 1lvl | 23 | 473 | 0.7810 | 0.8190 | 0.9268 | |
| 1pamA | 18 | 511 | 0.8600 | 0.8630 | 0.9581 | |
| 1ped | 18 | 388 | 0.9120 | 0.9470 | 0.9717 | |
| 2myr | 17 | 482 | 0.8220 | 0.8300 | 0.9659 | |
| 4enl | 17 | 440 | 0.8960 | 0.8890 | 0.9151 | |
| Avg. score | – | – | 0.8513 | 0.8576 | 0.9249 | 0.9270 |
Results of IBBOMSA, MOMSA-W, VDGA, and GAPAM on BAliBASE reference set 3.
| NAME | SEQ NUMBER | SEQ LENGTH | GAPAM | VDGA | MOMSA | IBBOMSA |
|---|---|---|---|---|---|---|
| 1idy | 27 | 60 | 0.6010 | 0.5990 | 0.4600 | |
| 1r69 | 23 | 78 | 0.7090 | 0.7330 | 0.8784 | |
| 1ubi | 22 | 97 | 0.3860 | 0.4140 | 0.6606 | |
| 1wit | 19 | 102 | 0.7580 | 0.8730 | 0.7935 | |
| 1uky | 24 | 220 | 0.4680 | 0.4810 | 0.6393 | |
| kinase | 23 | 287 | 0.8280 | 0.8900 | 0.8345 | |
| 1ajsA | 28 | 396 | 0.3110 | 0.4530 | 0.5422 | |
| 1pamA | 19 | 511 | 0.8350 | 0.7880 | 0.8689 | |
| 1ped | 21 | 388 | 0.8130 | 0.8930 | 0.9131 | |
| 2myr | 21 | 482 | 0.5130 | 0.6510 | 0.7278 | |
| 4enl | 19 | 427 | 0.8000 | 0.8660 | 0.8158 | |
| Avg. score | – | – | 0.6383 | 0.6946 | 0.7583 | 0.7706 |
Results of IBBOMSA, MOMSA-W, VDGA, and GAPAM on BAliBASE reference set 4.
| NAME | SEQ NUMBER | SEQ LENGTH | GAPAM | VDGA | MOMSA | IBBOMSA |
|---|---|---|---|---|---|---|
| 1dynA | 6 | 848 | 0.0330 | 0.0330 | 0.8000 | |
| kinase2 | 18 | 468 | 0.3840 | 0.5420 | 0.8426 | |
| Avg. score | – | – | 0.2085 | 0.2875 | 0.9000 | 0.8702 |
Results of IBBOMSA, MOMSA-W, VDGA, and GAPAM on BAliBASE reference set 5.
| NAME | SEQ NUMBER | SEQ LENGTH | GAPAM | VDGA | MOMSA | IBBOMSA |
|---|---|---|---|---|---|---|
| 2cba | 8 | 328 | 0.8520 | 0.8350 | 0.8687 | |
| s51 | 15 | 301 | 0.8350 | 0.7430 | ||
| Avg. score | – | – | 0.8435 | 0.7890 | 0.9844 | 0.9258 |
Alignment score comparison between MOMSA and IBBOMSA on the BAliBASE version 2.0.
| ALGORITHMS | MOMSA-W (SP) | MOMSA-W (TC) | IBBOMSA (SP) | IBBOMSA (TC) |
|---|---|---|---|---|
| Ref1 (82) | 0.844 | 0.771 | ||
| Ref2 (23) | 0.925 | 0.557 | ||
| Ref3 (12) | 0.766 | 0.442 | ||
| Ref4 (12) | 0.871 | 0.617 | ||
| Ref5 (12) | 0.936 | 0.802 | ||
| Total (141) (mean & SD) | 0.861 ± 0.181 | 0.663 ± 0.290 |
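The SP (sum-of-pairs) and TC (total column) scores reported in these tables compare a test alignment against a reference alignment. The sketch below illustrates the usual definitions: SP is the fraction of residue pairs aligned in the reference that are also aligned in the test, and TC is the fraction of reference columns reproduced exactly. It scores every column, whereas the official BAliBASE scoring restricts itself to annotated core regions, so it will not reproduce the published numbers. The function names are hypothetical.

```python
from itertools import combinations

def columns(alignment, gap="-"):
    """Convert an alignment (equal-length strings) into columns of residue indices.

    Each column is a tuple with one entry per sequence: the index of the
    residue within its ungapped sequence, or None where the column has a gap.
    """
    counters = [0] * len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[c] == gap:
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols

def aligned_pairs(cols):
    """Set of residue pairs ((seq_a, idx_a), (seq_b, idx_b)) sharing a column."""
    pairs = set()
    for col in cols:
        present = [(s, r) for s, r in enumerate(col) if r is not None]
        pairs.update(combinations(present, 2))
    return pairs

def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment also aligns."""
    ref_pairs = aligned_pairs(columns(reference))
    test_pairs = aligned_pairs(columns(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)

def tc_score(test, reference):
    """Fraction of reference columns that appear unchanged in the test alignment."""
    ref_cols = columns(reference)
    test_cols = set(columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)

reference = ["ACGT", "A-GT", "ACG-"]
test = ["ACGT", "AG-T", "ACG-"]
print(sp_score(test, reference))  # 0.75
print(tc_score(test, reference))  # 0.5
```

In the toy example, misplacing one gap in the second sequence leaves six of the eight reference pairs intact but preserves only two of the four columns, which shows why TC is the stricter of the two measures.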
Alignment score comparison between MOMSA and IBBOMSA on the BAliBASE version 3.0.
| ALGORITHMS | MOMSA-W (SP) | MOMSA-W (TC) | IBBOMSA (SP) | IBBOMSA (TC) |
|---|---|---|---|---|
| BB11 (38) | 0.496 | 0.379 | ||
| BB12 (44) | 0.848 | 0.814 | ||
| BB2 (41) | 0.784 | 0.342 | ||
| BB3 (30) | 0.694 | 0.371 | ||
| BB4 (49) | 0.742 | 0.523 | ||
| BB5 (16) | 0.683 | |||
| Total (218) (mean & SD) | 0.722 ± 0.183 | 0.500 ± 0.309 |
Average TC score of several algorithms on BAliBASE version 3.0.
| ALIGNMENT ALGORITHMS | AVERAGE SCORE (218) | BB11 (38) | BB12 (44) | BB2 (41) | BB3 (30) | BB4 (49) | BB5 (16) | TOTAL TIME (S) |
|---|---|---|---|---|---|---|---|---|
| MSAProbs | 0.441 | 0.865 | 0.622 | 12382 | ||||
| Probalign | 0.589 | 0.862 | 0.439 | 0.566 | 0.603 | 0.549 | 10095.2 | |
| MAFFT (auto) | 0.588 | 0.439 | 0.831 | 0.45 | 0.581 | 0.605 | 0.591 | 1475.4 |
| IBBOMSA | 0.571 | 0.411 | 0.418 | 0.592 | 0.635 | 0.498 | 2472.6 | |
| Procons | 0.558 | 0.417 | 0.855 | 0.406 | 0.544 | 0.532 | 0.573 | 13086.3 |
| Clustal Omega | 0.554 | 0.358 | 0.789 | 0.45 | 0.575 | 0.579 | 0.533 | 539.91 |
| T-Coffee | 0.551 | 0.41 | 0.848 | 0.402 | 0.491 | 0.545 | 0.587 | 81041.5 |
| Kalign | 0.501 | 0.365 | 0.79 | 0.36 | 0.476 | 0.504 | 0.435 | |
| MOMSA-W | 0.500 | 0.379 | 0.814 | 0.362 | 0.371 | 0.534 | 0.418 | 110289 |
| MUSCLE | 0.475 | 0.318 | 0.804 | 0.35 | 0.409 | 0.45 | 0.46 | 789.57 |
| MAFFT (default) | 0.458 | 0.318 | 0.749 | 0.316 | 0.425 | 0.48 | 0.496 | 68.24 |
| FSA | 0.419 | 0.258 | 0.818 | 0.187 | 0.259 | 0.474 | 0.398 | 53648.1 |
| Dialign | 0.415 | 0.27 | 0.696 | 0.292 | 0.312 | 0.441 | 0.425 | 3977.44 |
| PRANK | 0.376 | 0.265 | 0.68 | 0.257 | 0.321 | 0.36 | 0.356 | 128355 |
| CLUSTALW | 0.374 | 0.223 | 0.712 | 0.22 | 0.272 | 0.396 | 0.308 | 766.47 |
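The AVERAGE SCORE (218) column appears to be the mean of the subset scores weighted by the number of test cases in each subset (the counts in the column headers). This is an inference from the table rather than a statement in the source. The short check below reproduces the reported averages for the MAFFT (auto) and MUSCLE rows under that assumption. `CASES` and `weighted_average` are hypothetical names.

```python
# Number of test cases per BAliBASE 3.0 subset, taken from the table header.
CASES = {"BB11": 38, "BB12": 44, "BB2": 41, "BB3": 30, "BB4": 49, "BB5": 16}

def weighted_average(subset_scores, cases=CASES):
    """Mean TC score over all test cases, weighting each subset by its size."""
    total = sum(cases.values())  # 218 test cases in all
    return sum(subset_scores[name] * cases[name] for name in cases) / total

# Subset TC scores copied from the MAFFT (auto) and MUSCLE rows of the table.
mafft_auto = {"BB11": 0.439, "BB12": 0.831, "BB2": 0.45,
              "BB3": 0.581, "BB4": 0.605, "BB5": 0.591}
muscle = {"BB11": 0.318, "BB12": 0.804, "BB2": 0.35,
          "BB3": 0.409, "BB4": 0.45, "BB5": 0.46}

print(round(weighted_average(mafft_auto), 3))  # 0.588
print(round(weighted_average(muscle), 3))      # 0.475
```

Both values match the table, which supports reading the first numeric column as a per-test-case average rather than an unweighted mean of the six subsets.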
Main procedure of IBBOMSA
Improved migration operator
Mutation operator