Mohamed Issa
Abstract
COVID-19 is a global pandemic that has driven scientists to look for ways to prevent it and to design a drug for it. Low-cost, intelligent tools for biological data analysis are therefore important for analyzing the biological structure of COVID-19. The global alignment algorithm is one of the important bioinformatics tools, because it gives the most accurate similarity measure between a pair of biological sequences. Its main limitation is the huge execution time of the standard algorithm, especially for very long sequences. This work proposes a fast global alignment tool (G-Aligner) based on meta-heuristic algorithms, which estimates similarity measurements close to the exact ones in a reasonable time and at low cost. For very long sequences, G-Aligner based on the standard Sine-Cosine optimization algorithm (SCA) becomes trapped in local minima. Therefore, this work presents an improved version of SCA that integrates it with particle swarm optimization (PSO). In addition, mutation and opposition operators are applied to enhance the exploration capability and to avoid trapping in local minima. The performance of the improved SCA algorithm (SP-MO) was evaluated on a set of IEEE CEC functions. G-Aligner based on the SP-MO algorithm was also tested by measuring the similarity of real biological sequences, and it was used to measure the similarity of the COVID-19 virus to 13 other viruses to validate its performance. The tests concluded that the SP-MO algorithm is superior to the relevant studies in the literature and produces the highest average similarity measurements, about 75% of the exact ones.
Keywords: Bioinformatics; COVID-19; Pairwise global alignment; Particle swarm optimization algorithm; Sine–Cosine optimization algorithm
Year: 2021 PMID: 33642960 PMCID: PMC7895693 DOI: 10.1016/j.asoc.2021.107197
Source DB: PubMed Journal: Appl Soft Comput ISSN: 1568-4946 Impact factor: 6.725
Fig. 1. Aligning two biological sequences using global alignment.
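As a point of reference for the exact method that G-Aligner approximates, the following is a minimal sketch of Needleman-Wunsch (NW) global alignment scoring. The function name `nw_score` and the default scores (`match=1.0`, `mismatch=-0.5`, `gap=-1.0`) are illustrative assumptions, not the settings used in the paper. The double loop over both sequences is what makes the exact algorithm costly for very long inputs.

```python
def nw_score(a, b, match=1.0, mismatch=-0.5, gap=-1.0):
    """Return the optimal global alignment score of sequences a and b.

    Hypothetical helper: the scoring values are illustrative defaults,
    not the paper's configuration. Uses a linear gap penalty and keeps
    only two rows of the dynamic-programming matrix.
    """
    # Row 0: aligning the empty prefix of a with prefixes of b (all gaps).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b (all gaps).
        curr = [i * gap] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("ACGT", "AGT"))
```

The run time grows with the product of the two sequence lengths, which is the cost that motivates a heuristic estimate of the score.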
Fig. 2. The role of G-Aligner with NW alignment and other applications of analyzing COVID-19.
Fig. 3. Representation of the solution of the global alignment based on stochastic algorithms.
Fig. 4. The local minima of the proposed G-Aligner based on the SP-MO algorithm.
Benchmark of mathematical test functions (Dimension 50).
| Function | Bounds | Optimum |
|---|---|---|
| F1 | | 0 |
| F2 | | 0 |
| F3 | | 0 |
| F4 | | 0 |
| F5 | | 0 |
| F6 | | 0 |
| F7 | | 0 |
| F8 | | 0 |
| F9 | | 0 |
| F10 | [−100,100] | 0 |
| F11 | [−10,10] | 0 |
| F12 | [0, | 0 |
| F13 | [−100,100] | 0 |
| F14 | [-2 | 0 |
| F15 | [−10,10] | 0 |
The average results for all algorithms for 30 independent runs.
| F | SP-MO | m-SCA | SCA | PSO | ISCA | SCA-DE | ASCA-PSO | SCA-PSO | SCA-GWO | CSCF |
|---|---|---|---|---|---|---|---|---|---|---|
| F1 | 3.20E−07 | 0.37 | 2.3 | 1.21 | 1.20 | 1.50E−14 | 2.30E−15 | 0.62 | 2.60E−13 | 0.034 |
| F2 | 2.30E−06 | 0.68 | 82.5 | 11.69 | 3.20 | 0.09 | 0.12 | 2.10 | 0.21 | 3.67 |
| F3 | 8.50E−17 | 0.13 | 0.28 | 0.26 | 0.37 | 2.10E−4 | 0.002 | 0.008 | 4.50E−05 | 0.00243 |
| F4 | 0 | 0.80 | 1.10 | 0.7 | 0.34 | 0.007 | 4.20E−3 | 0.06 | 3.01E−03 | 8.98E−03 |
| F5 | 5.20E−68 | 7.00E−06 | 1.85E−14 | 0.08 | 0.07 | 1.90E−16 | 7.30E−71 | 9.10E−20 | 8.50E−12 | 10.4E−03 |
| F6 | 3.40E−15 | 0.71 | 12.1 | 5.20 | 7.89 | 7.60E−08 | 6.20E−06 | 1.20 | 0.98 | 2.45 |
| F7 | 4.5E−128 | 1.52 | 90 | 120.8 | 3.20 | 2.30E−16 | 4.10E−68 | 5.73 | 6.70E−10 | 3.95 |
| F8 | 4.80E−14 | 4.20E−02 | 2.40 | 18.71 | 0.30 | 3.48E−04 | 8.52E−06 | 1.77 | 6.54E−09 | 1.24 |
| F9 | 2.20E−65 | 0.08 | 5.2 | 3.50 | 0.09 | 2.50E−04 | 4.10E−10 | 0.06 | 1.03 | 2.34 |
| F10 | 1.6E−137 | 0.13 | 2.76 | 2.63 | 0.40 | 4.50E−26 | 0.004 | 0.12 | 3.45E−03 | 0.245 |
| F11 | 2.19E−04 | 3.40 | 5.13 | 4.31 | 1.03 | 1.20 | 3.25 | 1.4 | 0.97 | 1.89 |
| F12 | 0 | 2.30 | 4.93 | 6.20 | 1.42 | 4.60E−03 | 3.60E−02 | 0.0008 | 0.004 | 4.9E−03 |
| F13 | 0 | 1.30 | 4.20 | 3.65 | 0.43 | 3.65E−05 | 2.10E−02 | 0.05 | 8.64E−06 | 1.045 |
| F14 | 2.07E−06 | 0.70 | 6.70 | 3.56 | 0.93 | 1.30E−03 | 0.47 | 1.03 | 0.067 | 2.53 |
| F15 | 3.67E−13 | 3.20 | 7.23 | 4.23 | 2.10 | 2.30E−04 | 0.004 | 0.03 | 3.7E−03 | 0.00078 |
Standard Deviation of SP-MO versus comparative algorithms.
| F | SP-MO | m-SCA | SCA | PSO | ISCA | SCA-DE | ASCA-PSO | SCA-PSO | SCA-GWO | CSCF |
|---|---|---|---|---|---|---|---|---|---|---|
| F1 | 3.20E−07 | 0.757 | 3.41 | 1.21 | 0.97 | 0.62 | 0.65 | 0.37 | 0.068 | 0.236 |
| F2 | 0 | 2.09 | 17.6 | 11.69 | 2.68 | 1.36 | 3.64 | 8.68 | 0.543 | 0.326 |
| F3 | 8.84E−22 | 0.234 | 0.37 | 0.26 | 0.30 | 0.03 | 0 | 0.006 | 0.017 | 0.085 |
| F4 | 1.49E−28 | 0.702 | 0.34 | 0.97 | 0.90 | 0.06 | 0 | 0.12 | 0.039 | 0.466 |
| F5 | 1.85E−14 | 0.624 | 1.27 | 0.08 | 0.80 | 0 | 3.12E−12 | 7.00E−06 | 0.0234 | 0.443 |
| F6 | 0 | 0.554 | 14.4 | 3.42 | 0.71 | 2.34 | 2.63 | 0.71 | 2.06 | 0.026 |
| F7 | 0.003 | 0.016 | 0.48 | 2.14 | 0.02 | 6.10E−09 | 6.20E−06 | 0.02 | 6.8E−10 | 0.019 |
| F8 | 0 | 0.001 | 2.03 | 4.20 | 1.80E−03 | 4.50E−06 | 2.80E−13 | 1.80E−16 | 4.3E−06 | 0 |
| F9 | 0 | 2.668 | 18.49 | 120.8 | 3.42 | 0.08 | 11.07 | 0.08 | 0.018 | 3.056 |
| F10 | 0 | 0 | 5.60 | 18.71 | 3.05E−04 | 1.77 | 0.29 | 3.05E−02 | 1.371 | 0 |
| F11 | 6.38E−08 | 0.062 | 18.18 | 0.88 | 0.08 | 0.06 | 0.18 | 0.08 | 0.020 | 0.034 |
| F12 | 0 | 0 | 1.20 | 4.49E−02 | 4.40E−04 | 7.10E−08 | 9.20E−09 | 4.40E−04 | 7.0E−08 | 0 |
| F13 | 2.65E−07 | 0.881 | 7.78 | 2.63 | 1.13 | 0.12 | 0.48 | 0.73 | 0.021 | 0.075 |
| F14 | 0.0052 | 2.652 | 5.13 | 4.31 | 3.40 | 1.4 | 0.02 | 2.31 | 0.699 | 2.136 |
| F15 | 1.90E−16 | 1.638 | 3.54 | 1.23 | 2.10 | 8.30E−07 | 4.36E−8 | 1.00E−03 | 2.1E−07 | 1.284 |
The number of search agents used for G-Aligner according to each technique.
| m | Search agents |
|---|---|
| 100000 | 10 |
| 150000 | 20 |
| 400000 | 30 |
| 700000 | 50 |
| 900000 | 80 |
| 1200000 | 100 |
| 1700000 | 130 |
| 2000000 | 150 |
| 2500000 | 180 |
| 3000000 | 200 |
| 3500000 | 220 |
| 4500000 | 250 |
| 6000000 | 300 |
| 7000000 | 350 |
| 8000000 | 380 |
| 9000000 | 400 |
Parameter settings for NW alignment and for G-Aligner based on the different techniques.
| Algorithm | Technique | Parameter | Value |
|---|---|---|---|
| NW Alignment | | Match | |
| | | | −0.5 |
| | | | −1.0 |
| G-Aligner | SCA | a | 20 |
| | PSO | Inertia Coefficient | 0.2 |
| | | Local coefficient (C1) | 1.5 |
| | | Global coefficient (C2) | 1.5 |
| | SP-MO | Inertia Coefficient | 0.2 |
| | | Local coefficient (C1) | 1.5 |
| | | Global coefficient (C2) | 1.5 |
| | | A | 20 |
| | SCA-DE | Beta | 0.3 |
| | | | 0.3 |
| | | A | 20 |
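To show where the parameters in the table enter the update rules, the sketch below implements the textbook forms of the SCA position update, the PSO velocity update, and the opposition operator. It uses the tabulated values (inertia 0.2, C1 = C2 = 1.5, a = 20). The function names are hypothetical, and the sketch does not reproduce how SP-MO combines these steps or applies mutation, which this record does not describe.

```python
import math
import random

# Values taken from the parameter table above.
W, C1, C2, A = 0.2, 1.5, 1.5, 20.0


def sca_r1(t, t_max, a=A):
    """Standard SCA schedule: r1 decreases linearly from a to 0."""
    return a - t * a / t_max


def sca_step(x, dest, r1, rng):
    """One standard SCA position update of agent x toward destination dest."""
    new_x = []
    for xi, pi in zip(x, dest):
        r2 = rng.uniform(0.0, 2.0 * math.pi)
        r3 = rng.uniform(0.0, 2.0)
        # r4 < 0.5 selects the sine branch, otherwise the cosine branch.
        trig = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
        new_x.append(xi + r1 * trig * abs(r3 * pi - xi))
    return new_x


def pso_velocity(v, x, pbest, gbest, rng, w=W, c1=C1, c2=C2):
    """Standard PSO velocity update with inertia w and coefficients c1, c2."""
    return [
        w * vi + c1 * rng.random() * (pb - xi) + c2 * rng.random() * (gb - xi)
        for vi, xi, pb, gb in zip(v, x, pbest, gbest)
    ]


def opposite(x, lb, ub):
    """Opposition operator: reflect each coordinate within [lb, ub]."""
    return [lb + ub - xi for xi in x]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    x = [3.0, -4.0]
    print(sca_step(x, [0.0, 0.0], sca_r1(10, 100), rng))
    print(pso_velocity([0.0, 0.0], x, [1.0, 1.0], [0.0, 0.0], rng))
    print(opposite(x, -10.0, 10.0))
```

A large `a` makes the early SCA steps wide, favouring exploration, while the linear decay of `r1` narrows them toward the end of the run.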
Fig. 5. Execution time of NW-Alignment versus G-Aligner based on different metaheuristic techniques.
The average similarity scores using G-Aligner based on meta-heuristic techniques versus exact scores of NW global alignment.
| # | Protein ID (length) | NW | SCA | PSO | ISCA | m-SCA | SCA-DE | SCA-PSO | ASCA-PSO | SCA-GWO | CSCF | SP-MO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Q08AH3 | 94 | 36 | 34 | 40 | 41 | 44 | 46 | 39 | 38 | 35 | 69 |
| 2 | P18089 | 53 | 22 | 25 | 24 | 23 | 25 | 25 | 22 | 21 | 18 | 39 |
| 3 | Q9Y2D8 | 96 | 36 | 34 | 39 | 41 | 43 | 46 | 37 | 39 | 36 | 70 |
| 4 | Q9UBJ2 | 107 | 41 | 38 | 41 | 46 | 50 | 50 | 46 | 47 | 44 | 77 |
| 5 | Q9H172 | 131 | 50 | 47 | 52 | 58 | 59 | 63 | 51 | 53 | 50 | 98 |
| 6 | Q12979 | 129 | 53 | 49 | 57 | 58 | 58 | 65 | 58 | 57 | 54 | 90 |
| 7 | Q12979 | 125 | 46 | 41 | 53 | 57 | 59 | 62 | 55 | 52 | 49 | 98 |
| 8 | Q9UG63 | 72 | 28 | 29 | 32 | 30 | 33 | 34 | 28 | 31 | 28 | 56 |
| 9 | Q8WWZ7 | 93 | 34 | 33 | 40 | 39 | 42 | 45 | 38 | 38 | 35 | 66 |
| 10 | O95870 | 83 | 33 | 35 | 38 | 34 | 37 | 40 | 34 | 31 | 28 | 60 |
| 11 | O95342 | 192 | 74 | 69 | 78 | 80 | 83 | 95 | 79 | 77 | 74 | 138 |
| 12 | Q8IUA7 | 209 | 75 | 76 | 81 | 91 | 95 | 104 | 92 | 91 | 87 | 165 |
| 13 | P55198 | 126 | 45 | 40 | 59 | 53 | 59 | 60 | 54 | 54 | 50 | 97 |
| 14 | Q8NFM4 | 156 | 59 | 57 | 68 | 68 | 68 | 78 | 64 | 64 | 60 | 113 |
| 15 | Q9UKV3 | 171 | 63 | 60 | 73 | 70 | 75 | 86 | 71 | 71 | 67 | 124 |
| 16 | A8K2U0 | 172 | 60 | 55 | 67 | 72 | 81 | 87 | 79 | 80 | 76 | 128 |
| 17 | O60706 | 158 | 57 | 55 | 65 | 69 | 74 | 80 | 70 | 74 | 70 | 115 |
| 18 | O43306 | 140 | 53 | 49 | 54 | 59 | 61 | 70 | 56 | 60 | 56 | 106 |
| 19 | Q7Z5R6 | 27 | 13 | 11 | 14 | 12 | 13 | 14 | 8 | 12 | 8 | 20 |
| 20 | A0PJZ0 Q96IX9 | 26 | 39 | 39 | 58 | 50 | 49 | 48 | 46 | 46 | 42 | 80 |
| 21 | Q96IX9 P86434 | 20 | 8 | 8 | 10 | 10 | 10 | 11 | 10 | 7 | 3 | 15 |
| 22 | Q96IU4 Q969K4 | 37 | 16 | 14 | 19 | 19 | 19 | 20 | 14 | 14 | 10 | 27 |
| 23 | P14060 Q7L8J4 | 50 | 20 | 20 | 25 | 26 | 26 | 25 | 22 | 23 | 18 | 38 |
| 24 | J3QRE5 H7C0G5 | 29 | 11 | 12 | 14 | 15 | 15 | 16 | 12 | 15 | 10 | 22 |
| 25 | P04229 | 245 | 94 | 90 | 115 | 127 | 130 | 121 | 122 | 126 | 121 | 183 |
| 26 | P14060 | 157 | 63 | 57 | 76 | 82 | 78 | 79 | 72 | 75 | 70 | 116 |
| 27 | Q8R4X Q8VD53 | 32 | 14 | 12 | 16 | 17 | 16 | 17 | 11 | 12 | 7 | 23 |
| 28 | P68510 | 148 | 61 | 58 | 68 | 73 | 72 | 79 | 69 | 68 | 63 | 105 |
| 29 | A0PJZ0 Q96IX9 | 26 | 11 | 11 | 12 | 13 | 14 | 14 | 11 | 11 | 9 | 20 |
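The table can be read as a ratio of each heuristic score to the exact NW score. The sketch below computes that percentage for SP-MO on the first five rows only, so its mean is an illustration of the arithmetic and not the paper's overall figure. The names `ROWS` and `percent_of_exact` are hypothetical.

```python
# (protein ID, exact NW score, SP-MO score) copied from rows 1-5 of the table.
ROWS = [
    ("Q08AH3", 94, 69),
    ("P18089", 53, 39),
    ("Q9Y2D8", 96, 70),
    ("Q9UBJ2", 107, 77),
    ("Q9H172", 131, 98),
]


def percent_of_exact(exact, estimate):
    """Return the estimated score as a percentage of the exact NW score."""
    return 100.0 * estimate / exact


percents = [percent_of_exact(nw, sp_mo) for _, nw, sp_mo in ROWS]
mean_percent = sum(percents) / len(percents)

if __name__ == "__main__":
    for (protein, _, _), pct in zip(ROWS, percents):
        print(f"{protein}: {pct:.1f}% of the exact score")
    print(f"mean over these rows: {mean_percent:.1f}%")
```

The same calculation applied to any other column gives a like-for-like comparison of the techniques against the exact scores.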
The standard deviation of the G-Aligner scores based on different meta-heuristics over 20 independent runs.
| # | Protein ID (length) | SCA | PSO | ISCA | m-SCA | SCA-DE | SCA-PSO | ASCA-PSO | SCA-GWO | CSCF | SP-MO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Q08AH3 | 2.33 | 3.57 | 2.5 | 0.9 | 0.91 | 0.3 | 0.57 | 1.11 | 1.01 | 0.61 |
| 2 | P18089 | 1.84 | 2.01 | 1.41 | 0.75 | 0.63 | 0.26 | 1.58 | 0.83 | 0.92 | 0.80 |
| 3 | Q9Y2D8 | 2.69 | 1.81 | 1.27 | 1.01 | 0.91 | 0.59 | 0.78 | 1.11 | 1.16 | 0.37 |
| 4 | Q9UBJ2 | 2.46 | 2.75 | 1.93 | 0.94 | 0.4 | 1.06 | 0.88 | 0.6 | 0.57 | 0.24 |
| 5 | Q9H172 | 3.17 | 4.53 | 3.17 | 1.15 | 0.74 | 0.85 | 2.42 | 0.94 | 1 | 0.8 |
| 6 | Q12979 | 1.64 | 3.97 | 2.78 | 0.69 | 0.48 | 0.07 | 0.7 | 0.68 | 0.83 | 0.44 |
| 7 | Q12979 | 2.46 | 2.1 | 1.47 | 0.94 | 0.75 | 1.39 | 1.22 | 1.09 | 1.86 | 0.55 |
| 8 | Q9UG63 | 2.27 | 2.31 | 1.62 | 0.88 | 0.42 | 1.01 | 1.36 | 0.76 | 1.09 | 0.63 |
| 9 | Q8WWZ7 | 2.37 | 1.63 | 1.14 | 0.91 | 1.1 | 0.88 | 1.04 | 1.44 | 1.78 | 0.4 |
| 10 | O95870 | 1.96 | 2.95 | 2.07 | 0.79 | 0.47 | 0.8 | 0.89 | 0.81 | 1.12 | 0.74 |
| 11 | O95342 | 4.5 | 9.27 | 6.49 | 1.55 | 0.82 | 2.42 | 2.66 | 1.16 | 1.29 | 0.91 |
| 12 | Q8IUA7 | 2.37 | 3.17 | 2.22 | 0.91 | 1 | 0.72 | 4.05 | 1.34 | 1.22 | 0.23 |
| 13 | P55198 | 2.72 | 1.23 | 0.86 | 1.01 | 0.61 | 1.49 | 0.94 | 0.95 | 1.7 | 0.4 |
| 14 | Q8NFM4 | 3.77 | 1.4 | 0.98 | 1.33 | 0.78 | 0.37 | 0.77 | 1.12 | 0.99 | 0.48 |
| 15 | Q9UKV3 | 1.07 | 2.4 | 1.68 | 0.52 | 0.43 | 0.02 | 1.58 | 0.77 | 1.4 | 0.72 |
| 16 | A8K2U0 | 5.96 | 2.88 | 2.01 | 1.99 | 1.86 | 4.04 | 1.31 | 2.2 | 2.76 | 0.84 |
| 17 | O60706 | 1.84 | 1.84 | 1.29 | 0.75 | 0.4 | 0.42 | 0.41 | 0.74 | 0.67 | 0.95 |
| 18 | O43306 | 2 | 1.96 | 1.37 | 0.8 | 0.53 | 0.45 | 1.55 | 0.87 | 1.01 | 0.66 |
| 19 | Q7Z5R6 | 1.4 | 1.4 | 0.98 | 0.62 | 0.9 | 0.91 | 0.95 | 1.07 | 1.44 | 0.36 |
| 20 | A0PJZ0 Q96IX9 | 2.25 | 3.44 | 2.41 | 0.88 | 0.71 | 0.99 | 1.85 | 0.88 | 0.86 | 0.61 |
| 21 | Q96IX9 P86434 | 2.52 | 2.64 | 1.78 | 0.43 | 0.21 | 0.85 | 0.97 | 0.38 | 0.3 | 0.28 |
| 22 | Q96IU4 Q969K4 | 2.31 | 2.5 | 1.63 | 0.72 | 0.43 | 0.70 | 0.94 | 0.6 | 0.72 | 0.25 |
| 23 | P14060 Q7L8J4 | 2.58 | 2.41 | 1.51 | 0.73 | 0.38 | 0.88 | 0.84 | 0.55 | 1.16 | 0.16 |
| 24 | J3QRE5 H7C0G5 | 2.48 | 2.28 | 1.64 | 0.88 | 0.25 | 0.73 | 0.96 | 0.42 | 1.23 | 0.45 |
| 25 | P04229 | 2.33 | 2.25 | 1.81 | 0.75 | 0.33 | 0.90 | 1.2 | 0.5 | 1.43 | 0.28 |
| 26 | P14060 | 2.22 | 2.36 | 1.65 | 0.44 | 0.43 | 0.92 | 0.86 | 0.6 | 0.75 | 0.47 |
| 27 | Q8R4X Q8VD53 | 2.24 | 2.34 | 1.44 | 0.6 | 0.43 | 0.93 | 0.78 | 0.6 | 0.63 | 0.36 |
| 28 | P68510 | 2.55 | 2.38 | 1.71 | 0.67 | 0.41 | 0.84 | 1.30 | 0.58 | 0.73 | 0.19 |
| 29 | A0PJZ0 Q96IX9 | 2.48 | 2.33 | 1.64 | 0.87 | 0.47 | 0.94 | 1.03 | 0.64 | 0.97 | 0.55 |
Fig. 6. Similarity scores of aligning COVID-19 against 13 viruses based on G-Aligner using SP-MO versus ASCA-PSO and NW algorithm [33].
Fig. 7. Similarity scores of aligning COVID-19 against 13 viruses based on G-Aligner using SP-MO versus various stochastic techniques in the literature.