Mohamed Issa, Ahmed M Helmi, Ammar H Elsheikh, Mohamed Abd Elaziz.
Abstract
The longest common consecutive subsequence (LCCS) plays a vital role in revealing the biological relationships between DNA/RNA sequences, especially newly discovered ones such as COVID-19. FLAT (fragmented local aligner technique) is an accelerated version of the local pairwise sequence alignment algorithm based on meta-heuristic algorithms. The performance of FLAT needs to be enhanced because the huge length of biological sequences leads to trapping in local optima. This paper introduces a modified version of FLAT that improves the performance of the bat algorithm (BA) by integrating it with the particle swarm optimization (PSO) algorithm through a novel infection mechanism. The proposed algorithm, named BPINF, finds the best-explored solution using BA operators. During the exploitation phase, this solution can infect agents, which then use PSO operators to move toward it instead of moving toward the best-exploited solution. Moving the solutions toward two best solutions increases the diversity of the generated solutions and avoids trapping in local optima. The infection can propagate through the agents: each infected agent can transfer the infection to non-infected agents, which further diversifies the generated solutions. FLAT using the proposed technique (BPINF) was validated by detecting the LCCS between a set of real biological sequences of huge length, as well as between COVID-19 and other well-known viruses. The performance of BPINF was compared with the enhanced versions of BA in the literature and with the relevant studies of FLAT. BPINF found the LCCS with the highest percentage (88%), outperforming the other state-of-the-art methods.
Keywords: BA algorithm; COVID-19; Computational Biology; Longest common consecutive subsequence (LCCS); Meta-heuristics
Year: 2021 PMID: 34690450 PMCID: PMC8527645 DOI: 10.1016/j.eswa.2021.116063
Source DB: PubMed Journal: Expert Syst Appl ISSN: 0957-4174 Impact factor: 6.954
Fig. 1. The near-exact LCCS versus the exact LCCS (Issa et al., 2018).
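To make the notion of an exact LCCS concrete, the following minimal sketch treats it as the longest run of consecutive characters shared by two sequences (the classic longest common substring problem) and finds it by dynamic programming. The function name `longest_common_substring` is illustrative and does not come from the paper. The near-exact case in the figure, which FLAT obtains through local alignment, is not covered here.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest run of consecutive characters shared by a and b."""
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one character earlier in both strings.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_substring("ACGTTGCA", "TTACGTAG"))
```

This exhaustive approach needs time proportional to the product of the two sequence lengths, which is the cost that motivates a fragment-based search for very long sequences.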
Summary of some BA-based and PSO-based hybrid techniques.
| Ref. | Technique | Hybridization methodology | Application |
|---|---|---|---|
| ( | ASCA-PSO* | PSO exploits the regions around solutions found by SCA | LCCS between biological sequences |
| ( | IMO-PSO* | IMO starts exploring the search space then PSO refines the found solutions (exploitation phase) | LCCS between biological sequences |
| ( | BA-CSA | BA update procedure is applied to agents of CSA where new solutions survive if fitness improves | Global numerical optimization |
| ( | BA-DE | The population is updated randomly using improved BA or DE mechanism to improve both exploration and exploitation | Global numerical optimization |
| ( | BA-PSO | PSO operators are applied to BA solutions in the exploitation phase | ANN training to enhance the image registration process in medical image diagnosis |
| ( | BA-PSO | Swap and update mechanism is applied where best solutions of one algorithm replace worst solutions in the other one | Design of the labyrinth spillway geometry |
| ( | BA-PSO | Non satisfied solutions in the PSO population are updated using BA operators | Location of unified power flow controller in power systems |
| ( | PSO-BFA | PSO is applied as a mutation operator for BFA individuals | Design of power systems stabilizers in multimachine power systems |
| ( | PSO-TS | TS works as a local improvement procedure for PSO solutions | Tumor classification using gene expression data |
| ( | PSO-GSA | Each agent updates its position with the contribution of both algorithms (co-evolutionary technique) | Economic emission load dispatch problems |
| ( | BA-ALO | Updating operators of ALO were embedded into the updating equations of BA | Global numerical optimization |
| ( | BA-LBBA | One of many micro-bats is assigned as a leader instead of only one best solution to influence the other agents of LBBA | The mobile robot localization problem |
| ( | PSO-GA | Balancing exploration and exploitation is achieved via incorporating the crossover and mutation operators within PSO | Solving constrained optimization problems |
*Studies which implement the FLAT technique.
Fig. 2. A simplified example of FLAT (Issa et al., 2018).
Fig. 3. Main procedure of the BA algorithm (Yang, 2010).
Fig. 4. Main procedure of the PSO algorithm (Kennedy, 1995).
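For readers without access to the figures, the sketch below shows the textbook update rules commonly associated with the two algorithms: the frequency-driven move and the loudness and pulse-rate adaptation of the bat algorithm, and the inertia-weighted velocity update of PSO. It is a generic illustration, not the paper's exact procedure. The function names and the parameters `w`, `c1`, and `c2` are assumptions introduced here.

```python
import math
import random


def ba_move(x, v, best, f_min, f_max, rng):
    """Standard bat-algorithm move: a random frequency scales the pull of `best`."""
    beta = rng.random()
    freq = f_min + (f_max - f_min) * beta
    new_v = [vi + (xi - bi) * freq for xi, vi, bi in zip(x, v, best)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v


def ba_adapt(loudness, r0, alpha, gamma, t):
    """Decay the loudness and raise the pulse emission rate at iteration t."""
    return alpha * loudness, r0 * (1.0 - math.exp(-gamma * t))


def pso_move(x, v, pbest, gbest, w, c1, c2, rng):
    """Standard PSO move: inertia plus pulls toward personal and global bests."""
    new_v = []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        new_v.append(w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi))
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v


rng = random.Random(0)
print(ba_move([1.0, 5.0], [0.0, 0.0], [2.0, 4.0], 0.0, 2.0, rng))
print(pso_move([1.0, 5.0], [0.0, 0.0], [1.5, 4.5], [2.0, 4.0], 0.7, 1.5, 1.5, rng))
```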
Fig. 5. An explanation of updating agents based on the proposed infection mechanism in BPINF.
Fig. 6. The flowchart of BPINF for FLAT.
The settings of parameters of various examined MAs.
| Algorithm | Parameter | Value | |
|---|---|---|---|
| SW alignment | +1.0 | ||
| ge | −0.5 | ||
| go | −1.0 | ||
| FLAT | SCA | A | 2.2 |
| ASCA-PSO | 0.25 | ||
| 0.5 | |||
| A | 2.0 | ||
| BA | A0 | 0.8 | |
| 5.0 | |||
| 20 | |||
| α | 0.95 | |||
| γ | 2 | |||
| BPINF, BA-PSO-1, BA-PSO-2, BA-CSA and BA-DE | A0 | 0.8 | |
| 5.0 | |||
| 20 | |||
| α | 0.95 | |||
| γ | 2 | |||
| 0.25 | |||
| 0.5 | |||
| 0.30 | |||
| 0.70 | |||
| 0.30 | |||
| 25 | |||
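The SW alignment rows of the table can be illustrated with a small local-alignment scorer using affine gap penalties. This is a hedged sketch, not the authors' implementation. It assumes that +1.0 is the match score, that `go` and `ge` are the gap-opening and gap-extension penalties, and that the first gap position costs `go` while each further one costs `ge`. The mismatch penalty is not given in the table, so the value −1.0 used here is purely hypothetical.

```python
def sw_affine_score(a, b, match=1.0, mismatch=-1.0, gap_open=-1.0, gap_extend=-0.5):
    """Best local alignment score with affine gaps (Gotoh-style recurrences).

    The mismatch value is a placeholder assumption; the other defaults mirror
    the table under the interpretation stated in the lead-in.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in a
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in b
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0.0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(sw_affine_score("ACGTACGT", "ACGTTACGT"))
```

In the example call, the two sequences differ by one inserted character, so the best local alignment matches all eight characters of the first sequence and pays for a single gap opening.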
Average LCCS similarity percentage (%) measured by FLAT for compared optimizers.
| | | PSO | IMO | IMO-PSO | SCA | ASCA-PSO | BA | BA-PSO-1 | BA-PSO-2 | BA-DE | BA-CSA | BPINF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 250,000 | 40 | 53 | 50 | 87 | 56 | 89 | 39 | 73 | 71 | 81 | 78 | 92 |
| 350,000 | 40 | 52 | 53 | 87 | 55 | 89 | 36 | 75 | 73 | 78 | 75 | 92 |
| 550,000 | 100 | 54 | 58 | 88 | 58 | 85 | 37 | 71 | 70 | 79 | 76 | 90 |
| 750,000 | 120 | 51 | 56 | 91 | 55 | 86 | 34 | 69 | 72 | 77 | 74 | 91 |
| 1,000,000 | 150 | 52 | 51 | 88 | 56 | 82 | 34 | 68 | 74 | 76 | 73 | 90 |
| 1,400,000 | 180 | 48 | 48 | 85 | 50 | 78 | 36 | 70 | 69 | 70 | 65 | 92 |
| 1,800,000 | 200 | 45 | 52 | 84 | 48 | 80 | 35 | 62 | 68 | 67 | 65 | 89 |
| 2,200,000 | 240 | 46 | 47 | 81 | 49 | 78 | 34 | 63 | 67 | 69 | 64 | 91 |
| 2,600,000 | 400 | 39 | 43 | 84 | 44 | 76 | 33 | 58 | 62 | 61 | 56 | 90 |
| 3,000,000 | 400 | 38 | 41 | 87 | 41 | 80 | 32 | 55 | 60 | 62 | 59 | 90 |
| 4,000,000 | 450 | 42 | 44 | 86 | 44 | 75 | 33 | 52 | 61 | 58 | 54 | 91 |
| 5,000,000 | 450 | 43 | 45 | 84 | 45 | 78 | 34 | 53 | 59 | 59 | 56 | 90 |
| 6,000,000 | 450 | 45 | 39 | 89 | 46 | 74 | 34 | 63 | 60 | 60 | 58 | 88 |
| 7,000,000 | 500 | 40 | 38 | 81 | 43 | 75 | 35 | 65 | 62 | 56 | 52 | 87 |
| 8,000,000 | 700 | 39 | 39 | 84 | 40 | 73 | 33 | 64 | 61 | 52 | 47 | 84 |
| 9,000,000 | 900 | 36 | 37 | 75 | 38 | 74 | 34 | 60 | 57 | 50 | 43 | 85 |
| 11,000,000 | 1000 | 36 | 33 | 80 | 39 | 71 | 36 | 50 | 56 | 51 | 46 | 81 |
| 13,000,000 | 1300 | 32 | 30 | 70 | 36 | 70 | 31 | 53 | 57 | 47 | 41 | 81 |
| 15,000,000 | 1600 | 27 | 31 | 71 | 32 | 71 | 28 | 45 | 50 | 42 | 38 | 78 |
| 18,000,000 | 1900 | 31 | 29 | 68 | 34 | 69 | 29 | 42 | 47 | 44 | 38 | 79 |
| 21,000,000 | 2200 | 29 | 28 | 65 | 30 | 70 | 26 | 38 | 48 | 39 | 33 | 80 |
Standard deviation of LCCS similarity percentage measured by FLAT for compared optimizers.
| PSO | IMO | IMO-PSO | SCA | ASCA-PSO | BA | BA-PSO-1 | BA-PSO-2 | BA-DE | BA-CSA | BPINF | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 250,000 | 2.24 | 2.28 | 1.28 | 3.52 | 0.90 | 3.64 | 0.64 | 0.37 | 0.89 | 0.89 | |
| 350,000 | 1.52 | 1.79 | 1.00 | 1.96 | 0.75 | 2.08 | 1.65 | 0.33 | 2.31 | 2.31 | |
| 550,000 | 2.57 | 2.64 | 1.28 | 1.76 | 1.01 | 1.88 | 0.85 | 0.66 | 1.19 | 1.19 | |
| 750,000 | 2.27 | 2.41 | 0.77 | 2.70 | 0.94 | 2.82 | 0.95 | 1.13 | 1.33 | 1.33 | |
| 1,000,000 | 2.93 | 3.12 | 1.11 | 4.48 | 1.15 | 4.60 | 2.49 | 0.92 | 3.73 | 3.73 | |
| 1,400,000 | 1.58 | 1.59 | 0.85 | 3.92 | 0.69 | 4.04 | 0.77 | 0.14 | 1.15 | 1.15 | |
| 1,800,000 | 2.09 | 2.41 | 1.12 | 2.05 | 0.94 | 2.17 | 1.29 | 1.46 | 1.93 | 1.93 | |
| 2,200,000 | 1.77 | 2.22 | 0.79 | 2.26 | 0.88 | 2.38 | 1.43 | 1.08 | 2.14 | 2.14 | |
| 2,600,000 | 1.89 | 2.32 | 1.47 | 1.58 | 0.91 | 1.70 | 1.11 | 0.95 | 1.80 | 1.80 | |
| 3,000,000 | 1.59 | 1.91 | 0.96 | 2.90 | 0.79 | 3.02 | 0.96 | 0.87 | 1.56 | 1.56 | |
| 4,000,000 | 4.31 | 4.45 | 1.31 | 9.22 | 1.55 | 9.34 | 2.73 | 2.49 | 4.44 | 4.44 | |
| 5,000,000 | 2.07 | 2.32 | 1.49 | 3.12 | 0.91 | 3.24 | 4.12 | 0.79 | 6.71 | 6.71 | |
| 6,000,000 | 2.53 | 2.67 | 1.10 | 1.18 | 1.01 | 1.30 | 1.01 | 1.56 | 1.64 | 1.64 | |
| 7,000,000 | 3.55 | 3.72 | 1.27 | 1.35 | 1.33 | 1.47 | 0.84 | 0.44 | 1.36 | 1.36 | |
| 8,000,000 | 0.75 | 1.02 | 0.92 | 2.35 | 0.52 | 2.47 | 1.65 | 0.09 | 2.68 | 2.68 | |
| 9,000,000 | 5.51 | 5.91 | 2.35 | 2.83 | 1.99 | 2.95 | 1.38 | 4.11 | 2.24 | 2.24 | |
| 11,000,000 | 1.4 | 1.79 | 1.00 | 1.79 | 0.75 | 1.91 | 0.48 | 0.49 | 2.75 | 2.75 | |
| 13,000,000 | 1.61 | 1.95 | 1.13 | 1.91 | 0.8 | 2.03 | 1.62 | 0.52 | 2.30 | 2.30 | |
| 15,000,000 | 1.2 | 1.35 | 1.50 | 1.35 | 0.62 | 1.47 | 1.02 | 0.98 | 2.33 | 2.33 | |
| 18,000,000 | 2.08 | 2.20 | 1.31 | 3.39 | 0.88 | 3.51 | 1.92 | 1.06 | 2.70 | 2.70 | |
| 21,000,000 | 2.14 | 2.47 | 1.51 | 2.59 | 0.43 | 2.71 | 0.64 | 0.92 | 1.70 | 1.70 |
Fig. 7. Average standard deviation of BPINF-based FLAT versus other examined versions.
Wilcoxon test results for the numerical results of various FLAT versions against BPINF.
| | PSO | IMO | IMO-PSO | SCA | ASCA-PSO | BA | BA-DE | BA-CSA | BA-PSO-1 | BA-PSO-2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 250,000 | 4.0E-06 | 2.5E-05 | 2.8E-07 | 7.7E-06 | 3.9E-07 | 2.1E-06 | 2.0E-06 | 1.3E-05 | 2.5E-06 | 3.5E-05 |
| 350,000 | 1.4E-06 | 3.2E-06 | 1.8E-06 | 4.7E-06 | 2.3E-06 | 2.6E-06 | 1.6E-07 | 1.5E-06 | 2.9E-07 | 1.9E-06 |
| 550,000 | 8.5E-07 | 4.6E-05 | 1.0E-06 | 5.8E-06 | 8.0E-08 | 2.2E-06 | 3.2E-07 | 3.1E-05 | 2.5E-06 | 3.4E-05 |
| 750,000 | 5.5E-06 | 2.9E-06 | 5.5E-07 | 1.1E-05 | 3.8E-08 | 1.7E-06 | 3.5E-05 | 2.0E-05 | 7.7E-05 | 6.6E-05 |
| 1,000,000 | 1.2E-05 | 4.1E-07 | 2.6E-07 | 2.3E-05 | 2.4E-07 | 1.5E-06 | 5.1E-06 | 3.9E-07 | 7.0E-06 | 4.1E-07 |
| 1,400,000 | 6.2E-06 | 3.2E-06 | 5.9E-06 | 1.3E-05 | 2.4E-06 | 9.5E-06 | 1.6E-06 | 2.2E-05 | 3.5E-06 | 2.3E-05 |
| 1,800,000 | 9.1E-07 | 4.5E-06 | 8.1E-07 | 1.5E-06 | 3.5E-07 | 1.2E-06 | 2.2E-07 | 4.5E-05 | 4.2E-07 | 5.6E-05 |
| 2,200,000 | 1.3E-06 | 1.0E-05 | 1.0E-08 | 1.9E-06 | 1.3E-07 | 1.2E-06 | 1.7E-04 | 2.8E-06 | 3.3E-04 | 4.6E-06 |
| 2,600,000 | 4.7E-06 | 3.8E-05 | 2.5E-06 | 2.8E-05 | 1.2E-06 | 2.9E-06 | 4.8E-05 | 2.6E-04 | 7.1E-05 | 6.2E-04 |
| 3,000,000 | 5.2E-07 | 1.4E-06 | 1.4E-07 | 1.4E-06 | 1.5E-07 | 1.2E-06 | 2.5E-05 | 2.0E-07 | 7.0E-05 | 5.0E-07 |
| 4,000,000 | 1.0E-05 | 1.5E-06 | 1.2E-06 | 1.4E-05 | 3.3E-07 | 1.3E-06 | 2.2E-07 | 6.9E-05 | 7.0E-06 | 7.1E-05 |
| 5,000,000 | 1.1E-06 | 9.5E-06 | 1.8E-06 | 2.4E-06 | 9.7E-07 | 1.9E-06 | 3.0E-04 | 2.3E-07 | 3.5E-04 | 6.3E-07 |
| 6,000,000 | 6.3E-05 | 7.7E-05 | 3.4E-07 | 8.0E-05 | 1.8E-08 | 5.1E-07 | 2.9E-05 | 3.9E-05 | 7.0E-05 | 4.5E-05 |
| 7,000,000 | 9.1E-07 | 1.8E-06 | 1.3E-07 | 1.2E-06 | 4.9E-07 | 1.2E-06 | 3.2E-06 | 5.0E-07 | 7.0E-06 | 5.3E-07 |
| 8,000,000 | 6.0E-07 | 5.6E-07 | 3.4E-07 | 2.3E-06 | 2.1E-08 | 4.5E-07 | 1.2E-05 | 4.4E-04 | 6.0E-05 | 5.3E-04 |
| 9,000,000 | 6.5E-06 | 5.9E-06 | 4.0E-08 | 8.4E-05 | 4.7E-07 | 1.4E-06 | 7.3E-07 | 1.4E-05 | 3.7E-06 | 4.3E-05 |
| 11,000,000 | 6.9E-07 | 3.5E-06 | 8.5E-06 | 2.2E-06 | 5.1E-06 | 1.3E-05 | 9.5E-07 | 2.2E-06 | 2.1E-06 | 2.4E-06 |
| 13,000,000 | 2.3E-07 | 4.0E-07 | 2.8E-07 | 2.7E-07 | 4.0E-07 | 2.5E-06 | 1.8E-08 | 1.7E-05 | 3.5E-06 | 3.2E-05 |
| 15,000,000 | 6.2E-06 | 3.3E-04 | 3.8E-07 | 1.5E-05 | 8.9E-08 | 3.9E-07 | 3.5E-07 | 4.9E-04 | 4.3E-06 | 5.3E-04 |
| 18,000,000 | 6.0E-07 | 1.6E-06 | 9.8E-06 | 1.4E-06 | 5.3E-06 | 2.8E-05 | 2.8E-06 | 1.8E-05 | 5.3E-06 | 4.3E-05 |
| 21,000,000 | 3.2E-06 | 1.8E-06 | 1.0E-07 | 1.7E-05 | 4.5E-08 | 1.4E-07 | 9.3E-07 | 1.3E-04 | 1.6E-06 | 2.3E-04 |
Fig. 8. Convergence curve of BPINF.
Fig. 9. The execution time of FLAT using the BPINF technique against SW for variable lengths.
Fig. 10. The execution time of FLAT using the BPINF technique versus other algorithms.
Results of conducted sensitivity analysis of the BPINF parameters.
| 65 | 85 | 82 | 78 | |||
| 73 | 82 | 79 | 76 | |||
| 68 | 84 | 78 | 78 | |||
| 63 | 80 | 81 | 72 | |||
| 60 | 83 | 76 | 74 | |||
| 61 | 81 | 78 | 73 | |||
| 75 | 60 | 79 | 60 | |||
| 73 | 65 | 77 | 59 | |||
| 69 | 63 | 74 | 57 | |||
| 63 | 61 | 71 | 61 | |||
| 68 | 59 | 65 | 62 | |||
| 72 | 60 | 62 | 58 | |||
| 89 | 72 | 82 | 68 | |||
| 90 | 81 | 75 | 72 | |||
| 87 | 73 | 77 | 66 | |||
| 83 | 79 | 79 | 69 | |||
| 86 | 74 | 76 | 71 | |||
| 83 | 72 | 73 | 67 | |||
| 100 | 400 | 800 | 60 | |||
| 43 | 90 | 91 | 62 | 90 | ||
| 39 | 86 | 90 | 69 | 91 | ||
| 41 | 84 | 89 | 65 | 90 | ||
| 37 | 85 | 91 | 69 | 88 | ||
| 34 | 74 | 88 | 59 | 87 | ||
| 32 | 69 | 86 | 63 | 84 | ||
Detecting the LCCS between the COVID-19 virus and other viruses with FLAT using BPINF and other versions.
| | Virus Protein Name | Technique | Score | LCCS |
|---|---|---|---|---|
| 1 | MERS-CoV | SW | 5 | CVYSV |
| | | SCA | 3 | LAT |
| | | ASCA-PSO | 3 | QVL |
| | | IMO | 2 | NR |
| | | IMO-PSO | 3 | LSA |
| | | BA | 2 | HT |
| | | BA-DE | 3 | YSV |
| | | BA-CSA | 4 | LEGN |
| | | BA-PSO-1 | 3 | NRA |
| | | BA-PSO-2 | 4 | LPTG |
| | | BPINF | 5 | QVLSA |
| 2 | Hepatitis B | SW | 5 | SIFSR |
| | | SCA | 5 | SIFSR |
| | | ASCA-PSO | 5 | SILSP |
| | | IMO-PSO | 3 | LSP |
| | | IMO | 3 | ILS |
| | | BA | 3 | IGD |
| | | BA-DE | 4 | SIFS |
| | | BA-CSA | 4 | IGD |
| | | BA-PSO-1 | 3 | FSR |
| | | BA-PSO-2 | 4 | IGDA |
| | | BPINF | 5 | SIFSR |
| 3 | SARS-CoV | SW | 280 | SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTFQ |
| | | SCA | 12 | TIKGSFLNGSCG |
| | | ASCA-PSO | 30 | YNYEPLTQDHVDILGPLSAQTGIAVLDMCA |
| | | IMO-PSO | 23 | SALLEDEFTPFDVVRQCSGVTFQ |
| | | IMO | 18 | EGCMVQVTCGTTTLNGLW |
| | | BA | 11 | TIKGSFLNGSC |
| | | BA-DE | 15 | EDMLNPNYEDLLIRK |
| | | BA-CSA | 16 | GTTTLNGLWLDDTVYC |
| | | BA-PSO-1 | 14 | SGFRKMAFPSGKVE |
| | | BA-PSO-2 | 16 | FTPFDVVRQCSGVTFQ |
| | | BPINF | 30 | SGFRKMAFPSGKVEGCMVQVTCGTTTLNGL |
| 4 | Dengue virus | SW | 5 | IVTCA |
| | | SCA | 4 | LTGY |
| | | ASCA-PSO | 5 | SGNLL |
| | | IMO | 3 | VLV |
| | | IMO-PSO | 4 | FLNG |
| | | BA | 4 | FDGS |
| | | BA-DE | 4 | FDGS |
| | | BA-CSA | 4 | TLVT |
| | | BA-PSO-1 | 3 | TLV |
| | | BA-PSO-2 | 4 | SGNL |
| | | BPINF | 5 | ETLVT |
| 5 | Cowpox virus | SW | 5 | QAIAS |
| | | SCA | 4 | IKRS |
| | | ASCA-PSO | 5 | SVRVV |
| | | IMO-PSO | 4 | IKRS |
| | | IMO | 3 | VDS |
| | | BA | 2 | VN |
| | | BA-DE | 3 | RVV |
| | | BA-CSA | 4 | VDSA |
| | | BA-PSO-1 | 3 | QVT |
| | | BA-PSO-2 | 4 | VNAS |
| | | BPINF | 5 | SVRVV |
Fig. 11. Case of finding the exact LCCS.
Fig. 12. Case of finding portions of the exact LCCS.
| 1: | |
| 2: | |
| 3: | Set the parameters: fragment length |
| 4: | Initialize a random population where each agent marks two positions, one in each sequence, in the range (1, length ( |
| 5: | |
| 6: | Apply the SW algorithm to every two fragments pointed out by each agent. |
| 7: | Evaluate solutions using Eq. |
| 8: | Move positions of search agents using the update procedure of applied optimizer toward the location of fragments where LCCS is found. |
| 9: | |
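The surviving steps of the procedure above (choose a fragment length, let each agent mark one position in each sequence, score the two fragments, then move the agents toward promising fragments) can be sketched as follows. Several parts are stand-ins and not the paper's method: `common_run` replaces the SW alignment score with the length of the longest shared run inside the two fragments, and the move rule (step halfway toward the best agent plus random jitter) is a hypothetical placeholder for the update procedure of whichever optimizer is applied. All names and default values are illustrative.

```python
import random


def common_run(a, b):
    """Length of the longest run of consecutive characters shared by a and b."""
    best, prev = 0, [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def flat_sketch(seq_a, seq_b, frag_len, n_agents=20, n_iters=50, seed=0):
    """Search for a pair of fragment start positions with a long shared run."""
    rng = random.Random(seed)
    max_a = max(len(seq_a) - frag_len, 0)
    max_b = max(len(seq_b) - frag_len, 0)
    # Each agent marks one start position in each sequence (0-based here).
    agents = [(rng.randint(0, max_a), rng.randint(0, max_b)) for _ in range(n_agents)]

    def fitness(agent):
        pa, pb = agent
        return common_run(seq_a[pa:pa + frag_len], seq_b[pb:pb + frag_len])

    best = max(agents, key=fitness)
    best_fit = fitness(best)
    for _ in range(n_iters):
        moved = []
        for pa, pb in agents:
            # Hypothetical move: step halfway toward the best agent, plus jitter.
            na = (pa + best[0]) // 2 + rng.randint(-frag_len, frag_len)
            nb = (pb + best[1]) // 2 + rng.randint(-frag_len, frag_len)
            moved.append((min(max(na, 0), max_a), min(max(nb, 0), max_b)))
        agents = moved
        cand = max(agents, key=fitness)
        if fitness(cand) > best_fit:
            best, best_fit = cand, fitness(cand)
    return best, best_fit


positions, score = flat_sketch("ACGT" * 50, "TTTT" + "ACGT" * 40, frag_len=12)
print(positions, score)
```

Because only short fragments are compared in each evaluation, the cost per iteration depends on the fragment length and the number of agents rather than on the full sequence lengths.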
| 1: Input SeqA and SeqB; initialize the parameters: |
| 2: Initialize the population ( |
| 3: Cut two fragments starting from the positions of ( |
| SeqB) |
| 4: Apply SW algorithm on each pair of fragments of each search agent ( |
| 5: Compute the alignment score (length of near-exact LCCS found) for each search agent based on Eq. |
| 6: |
| 7: Update the first ( |
| 8: |
| 9: Update each solution ( |
| 10: |
| 11: Check the infection conditions for each solution ( |
| 12: |
| 13: Infection ( |
| 14: |
| 15: Infection ( |
| 16: |
| 17: Infection ( |
| 18: |
| 19: |
| 20: |
| 21: Update the solution ( |
| 22: |
| 23: Select randomly one solution ( |
| 24: |
| 25: Update the solution ( |
| 26: Infection ( |
| 27: |
| 28: |
| 30: |
| 31: |
| 32: |
| 33: Update the solution ( |
| 34: |
| 35: |
| 36: |
| 37: Cut two fragments starting from the positions of ( |
| Compute the alignment score for the search agents based on Eq. |
| 38: |
| 39: |
| 40: Output the near-exact LCCS pointed by the first best solution ( |
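Most of the symbols in the listing above were lost, so the following is only a toy illustration of the infection idea as the abstract describes it: infected agents move toward the best-explored solution, non-infected agents move toward the best-exploited solution, and an infected agent can pass the infection to a non-infected one. Every concrete detail (one-dimensional positions, the PSO-like step with coefficient `c`, the transmission probability `p_transmit`, and the random pairing of agents) is an assumption made for this sketch and should not be read as the BPINF update equations.

```python
import random


def infection_step(positions, infected, best_explored, best_exploited,
                   p_transmit=0.3, c=1.5, rng=None):
    """One toy update: move agents toward their target, then spread infection."""
    rng = rng or random.Random(0)
    n = len(positions)
    new_positions = []
    for x, sick in zip(positions, infected):
        # Infected agents are attracted to the best-explored solution,
        # the others to the best-exploited one (PSO-like attraction step).
        target = best_explored if sick else best_exploited
        new_positions.append(x + c * rng.random() * (target - x))
    new_infected = list(infected)
    for i in range(n):
        if not infected[i]:
            # Hypothetical contact rule: meet one random agent per step.
            j = rng.randrange(n)
            if infected[j] and rng.random() < p_transmit:
                new_infected[i] = True
    return new_positions, new_infected


pos, flags = infection_step([0.0, 10.0, 3.0], [True, False, False],
                            best_explored=4.0, best_exploited=8.0)
print(pos, flags)
```

Splitting the population between two attractors is what keeps the agents from collapsing onto a single best solution, which is the diversity argument made in the abstract.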