Umair Ayub, Hammad Naveed.
Abstract
Motivation: Advances in high-throughput protein-protein interaction (PPI) profiling techniques have generated a large amount of PPI data. Aligning PPI networks uncovers relationships between species that can help in understanding biological systems. Such comparative study reveals the conserved protein interactions across species and can also help in studying the biological pathways and signaling networks of cells. Although several network alignment algorithms have been developed to study and compare PPI data, developing an aligner that aligns PPI networks with both high biological similarity and high coverage remains challenging.
Keywords: Network meta-analysis; computational biology; semantics
Year: 2022 PMID: 35898232 PMCID: PMC9309777 DOI: 10.1177/11769343221110658
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 2.031
Figure 1. Density plot of the local sequence similarity. The distribution is highly skewed: fewer than 5% of all protein pairs have a similarity greater than 2.
Figure 2. The 2 networks (Network 1 on the left, Network 2 on the right). Blue nodes are the aligned pairs. Green and red node pairs are candidates for alignment because they have aligned neighbors. The green node pair (u2, v2) has a score of 2 (it has 2 aligned neighbors), while the red node pair has a score of 3. Under the Stage-3 algorithm, the red pair is aligned first, followed by the green pair. Nodes u7, u8, and v7 are not considered candidates for alignment because they have no common aligned neighbor.
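The candidate scoring described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Stage-3 implementation: the function names (`build_adjacency`, `candidate_scores`, `extend_alignment`), the toy networks, and the choice to recompute scores after every newly aligned pair are all hypothetical. Ties are broken lexicographically only to keep the example deterministic.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (node, node) edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def candidate_scores(adj1, adj2, aligned):
    """Score each unaligned pair (u, v) by its number of aligned neighbor pairs."""
    used_targets = set(aligned.values())
    scores = defaultdict(int)
    for a, b in aligned.items():
        for u in adj1[a]:
            if u in aligned:
                continue  # u already has a partner
            for v in adj2[b]:
                if v not in used_targets:
                    scores[(u, v)] += 1
    return scores


def extend_alignment(adj1, adj2, seed):
    """Greedily align the highest-scoring candidate pair until none remain."""
    aligned = dict(seed)
    order = []
    while True:
        scores = candidate_scores(adj1, adj2, aligned)
        if not scores:
            return aligned, order
        # Assumption: highest score first, lexicographic tie-break.
        best = max(sorted(scores), key=scores.get)
        aligned[best[0]] = best[1]
        order.append(best)


# Toy networks (invented for illustration; not the networks of Figure 2).
edges1 = [("a1", "x"), ("a2", "x"), ("a3", "x"), ("a1", "y"), ("a2", "y"), ("z", "w")]
edges2 = [("b1", "p"), ("b2", "p"), ("b3", "p"), ("b1", "q"), ("b2", "q"), ("r", "s")]
seed = {"a1": "b1", "a2": "b2", "a3": "b3"}

aligned, order = extend_alignment(build_adjacency(edges1), build_adjacency(edges2), seed)
print(order)
```

In this toy case the pair with 3 aligned neighbors is aligned before the pair with 2, and nodes without any aligned neighbor (`z`, `w`, `r`, `s`) are never considered, which mirrors the behavior the caption describes.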
Data statistics: number of nodes, number of edges, and percentage of proteins with a resolved 3D structure.
| Species Name | Mouse | Human | Yeast | Worm | Fly |
|---|---|---|---|---|---|
| Number of nodes | 744 | 10 791 | 5036 | 4486 | 7498 |
| Number of edges | 1229 | 47 427 | 19 085 | 11 496 | 25 679 |
| Nodes with 3D structure (%) | 17 | 43 | 29 | 2 | 3 |
Source: All datasets were extracted from the HINT database.
The results of the different stages and variants of BioAlign.
| Species pair | Measures | Top-nodes | Stage1 | Stage1 + | Stage1 + | Stage2 + 3 (BD) Stage1 + BT |
|---|---|---|---|---|---|---|
| Mouse-Human | AFS (MF) | 0.78 | 0.78 | | | |
| | AFS (BP) | 0.68 | 0.67 | | | |
| | Nodes (MF) | 87 | 88 | | | |
| | Nodes (BP) | 90 | 92 | | | |
| Mouse-Yeast | AFS (MF) | 0.70 | 0.51 | | 0.46 | 0.46 |
| | AFS (BP) | 0.53 | 0.35 | | 0.31 | 0.31 |
| | Nodes (MF) | 20 | 56 | 71 | 69 | |
| | Nodes (BP) | 22 | 65 | 85 | 84 | |
| Mouse-Fly | AFS (MF) | 0.76 | 0.68 | | 0.66 | |
| | AFS (BP) | 0.57 | 0.50 | | | |
| | Nodes (MF) | 58 | 76 | 78 | | 78 |
| | Nodes (BP) | 59 | 81 | | | |
| Mouse-Worm | AFS (MF) | 0.73 | 0.63 | | 0.61 | |
| | AFS (BP) | 0.55 | 0.46 | | | |
| | Nodes (MF) | 43 | 69 | | 72 | |
| | Nodes (BP) | 40 | 63 | 67 | 66 | |
| Yeast-Human | AFS (MF) | 0.74 | 0.60 | | | |
| | AFS (BP) | 0.57 | 0.45 | | 0.41 | 0.41 |
| | Nodes (MF) | 31 | 52 | 62 | 61 | |
| | Nodes (BP) | 31 | 60 | 73 | 72 | |
| Average | AFS (MF) | 0.74 | 0.64 | | 0.61 | |
| | AFS (BP) | 0.58 | 0.48 | | | |
| | Nodes (MF) | 48 | 68 | | 74 | |
| | Nodes (BP) | 49 | 72 | 80 | 79 | |
Abbreviations: B, biology; T, topology.
BV1 and BV2 use biology and topology, respectively, after Stage 1 to align the unaligned nodes. BD uses both biology and topology in the second stage. Bold cells mark the best results among the BioAlign variants.
Alignment of the mouse-human pair is completed in the first stage.
Comparison between the results of BioAlign and existing techniques on 5 network pairs on the basis of AFS and percentage of aligned nodes w.r.t MF and BP. BioAlign produced high-quality alignments in terms of AFS and coverage.
| Sp. pairs | Eval. criteria | BA | TW | BE | MO | SA | PR | HA | MA | IB | SAN | NE | MAG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MH | AFS (MF) | 0.78 | 0.73 | | 0.56 | 0.58 | 0.76 | 0.48 | 0.42 | 0.35 | 0.34 | 0.33 | 0.36 |
| | AFS (BP) | 0.67 | 0.63 | | 0.43 | 0.43 | 0.66 | 0.34 | 0.30 | 0.26 | 0.25 | 0.24 | 0.26 |
| | Nodes (MF) | | 86 | 85 | 82 | 82 | 87 | 78 | 74 | 72 | 75 | 73 | 76 |
| | Nodes (BP) | | 88 | 87 | 87 | 84 | 91 | 82 | 81 | 83 | 82 | 82 | 82 |
| MY | AFS (MF) | 0.46 | 0.44 | | 0.43 | 0.40 | 0.38 | 0.36 | 0.31 | 0.29 | 0.30 | 0.31 | 0.29 |
| | AFS (BP) | | 0.31 | | 0.31 | 0.27 | 0.25 | 0.25 | 0.23 | 0.21 | 0.22 | 0.22 | 0.21 |
| | Nodes (MF) | | 69 | 64 | 67 | 72 | 56 | 71 | 71 | 63 | 67 | 64 | 67 |
| | Nodes (BP) | 88 | 79 | 73 | 77 | | 72 | 90 | 88 | 76 | 84 | 83 | 83 |
| YH | AFS (MF) | | 0.50 | 0.54 | 0.48 | 0.48 | 0.42 | 0.46 | 0.26 | 0.30 | 0.27 | 0.26 | 0.26 |
| | AFS (BP) | | 0.36 | 0.40 | 0.36 | 0.35 | 0.32 | 0.34 | 0.22 | 0.24 | 0.23 | 0.22 | 0.22 |
| | Nodes (MF) | 63 | 60 | 55 | 59 | | 57 | 63 | 60 | 58 | 61 | 60 | 59 |
| | Nodes (BP) | 74 | 70 | 63 | 70 | | 67 | 74 | 72 | 70 | 73 | 72 | 70 |
| MF | AFS (MF) | | 0.62 | | 0.55 | 0.50 | 0.55 | 0.42 | 0.36 | 0.33 | 0.32 | 0.32 | 0.37 |
| | AFS (BP) | | 0.46 | | 0.40 | 0.37 | 0.41 | 0.31 | 0.28 | 0.24 | 0.23 | 0.23 | 0.28 |
| | Nodes (MF) | | 77 | 74 | 73 | 73 | 69 | 67 | 66 | 58 | 67 | 57 | 63 |
| | Nodes (BP) | | 82 | 79 | 80 | 80 | 77 | 76 | 74 | 58 | 76 | 60 | 62 |
| MW | AFS (MF) | | 0.56 | 0.58 | 0.49 | 0.56 | 0.52 | 0.49 | 0.41 | 0.30 | 0.32 | 0.29 | 0.31 |
| | AFS (BP) | | 0.41 | 0.41 | 0.34 | 0.41 | 0.39 | 0.37 | 0.30 | 0.25 | 0.25 | 0.24 | 0.25 |
| | Nodes (MF) | 73 | 68 | 67 | 66 | | 61 | 73 | 71 | 62 | 59 | 62 | 64 |
| | Nodes (BP) | 68 | 64 | 63 | 62 | 69 | 56 | 67 | 66 | 70 | 57 | 72 | |
| Avg | AFS (MF) | | 0.57 | 0.61 | 0.50 | 0.50 | 0.52 | 0.44 | 0.35 | 0.31 | 0.31 | 0.30 | 0.32 |
| | AFS (BP) | | 0.42 | 0.46 | 0.37 | 0.37 | 0.40 | 0.32 | 0.27 | 0.24 | 0.24 | 0.23 | 0.24 |
| | Nodes (MF) | | 72 | 68 | 69 | 73 | 66 | 70 | 68 | 63 | 66 | 63 | 66 |
| | Nodes (BP) | | 77 | 72 | 75 | 80 | 73 | 77 | 76 | 71 | 74 | 74 | 75 |
Abbreviations: BA, BioAlign; BE, BEAMS; HA, HubAlign; IB, IBNAL; MA, ModuleAlign; MAG, MAGNA++; NE, NETAL; MO, MONACO; PR, PROPER; SA, SAlign; SAN, SANA; TW, Twadn.
The percentage of aligned nodes is calculated with respect to the smaller network. AFS is the average functional similarity of the complete alignment, and Nodes is the percentage of nodes aligned. Bold cells represent the best results.
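The two evaluation measures defined in this note can be written out as a short Python sketch. It assumes the per-pair functional similarities have already been computed by some GO-based measure, which this text does not specify; the function names and the example inputs are hypothetical. Only the network sizes (744 mouse nodes, 10 791 human nodes) come from the data statistics table.

```python
def coverage_percent(n_aligned, n_nodes_a, n_nodes_b):
    """Percentage of aligned nodes, relative to the smaller of the two networks."""
    return 100.0 * n_aligned / min(n_nodes_a, n_nodes_b)


def average_functional_similarity(pair_similarities):
    """Mean functional similarity over all aligned pairs (0.0 for an empty alignment)."""
    sims = list(pair_similarities)
    return sum(sims) / len(sims) if sims else 0.0


# Invented example: 372 aligned pairs between mouse (744 nodes) and human (10 791 nodes).
mouse_human_coverage = coverage_percent(372, 744, 10791)

# Invented per-pair similarities, standing in for a real GO-based score.
example_afs = average_functional_similarity([0.5, 1.0, 0.0])

print(mouse_human_coverage, example_afs)
```

Normalizing by the smaller network means a complete alignment of every mouse protein would score 100% even though most human proteins remain unaligned.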
Comparison between the results of BioAlign and existing techniques on 5 network pairs on the basis of NGOC w.r.t MF and BP. BioAlign produced better or comparable results in terms of NGOC.
| Sp. pairs | Eval. criteria | BA | TW | BE | MO | SA | PR | HA | MA | IB | SAN | NE | MAG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MH | NGOC (MF) | 0.54 | 0.49 | | 0.28 | 0.28 | 0.29 | 0.17 | 0.12 | 0.06 | 0.08 | 0.07 | 0.08 |
| | NGOC (BP) | 0.48 | 0.43 | | 0.19 | 0.17 | 0.18 | 0.08 | 0.04 | 0.01 | 0.01 | 0.01 | 0.01 |
| MY | NGOC (MF) | | 0.14 | 0.14 | 0.12 | 0.12 | 0.07 | 0.10 | 0.07 | 0.05 | 0.04 | 0.05 | 0.05 |
| | NGOC (BP) | | 0.06 | 0.06 | 0.05 | 0.04 | 0.02 | 0.03 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 |
| YH | NGOC (MF) | | 0.18 | 0.19 | 0.17 | 0.19 | 0.15 | 0.18 | 0.07 | 0.06 | 0.07 | 0.07 | 0.07 |
| | NGOC (BP) | | 0.08 | 0.09 | 0.08 | 0.08 | 0.05 | 0.08 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| MF | NGOC (MF) | | 0.24 | 0.26 | 0.17 | 0.13 | 0.15 | 0.08 | 0.05 | 0.02 | 0.02 | 0.02 | 0.03 |
| | NGOC (BP) | | 0.12 | 0.13 | 0.08 | 0.07 | 0.08 | 0.04 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 |
| MW | NGOC (MF) | | 0.18 | 0.18 | 0.12 | 0.16 | 0.12 | 0.13 | 0.08 | 0.03 | 0.03 | 0.03 | 0.05 |
| | NGOC (BP) | | 0.08 | 0.09 | 0.05 | 0.08 | 0.06 | 0.06 | 0.03 | 0.02 | 0.01 | 0.01 | 0.02 |
| Avg | NGOC (MF) | | 0.25 | 0.27 | 0.17 | 0.18 | 0.16 | 0.13 | 0.08 | 0.04 | 0.05 | 0.05 | 0.06 |
| | NGOC (BP) | | 0.15 | | 0.09 | 0.09 | 0.08 | 0.06 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 |
Bold cells represent the best results.
Figure 3. The 2D positions of the aligners in terms of average AFS and percentage of aligned nodes. The solutions of the different aligners are shown as lines of different colors. The best line (light blue) represents the Pareto front. The dotted lines show the notably large gap between BioAlign and Twadn, BEAMS, and SAlign. BioAlign outperforms all existing aligners in terms of position, as validated by the Pareto-front technique.
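A Pareto front over two maximized objectives, as used in Figure 3, can be computed with a simple dominance check. The sketch below is illustrative only: the function name `pareto_front` is hypothetical, the points are the Avg-row AFS (MF) and Nodes (MF) values of the existing aligners as they appear in the comparison table above (the figure itself may use a different averaging), and BioAlign is left out because its averages are not reproduced in this text.

```python
def pareto_front(points):
    """Return the names of points not dominated by any other point.

    points maps a name to (afs, coverage); both objectives are maximized.
    A point is dominated if another point is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for name, (afs, cov) in points.items():
        dominated = any(
            o_afs >= afs and o_cov >= cov and (o_afs > afs or o_cov > cov)
            for other, (o_afs, o_cov) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Avg-row values (AFS w.r.t. MF, percentage of aligned nodes w.r.t. MF).
existing_aligners = {
    "Twadn": (0.57, 72),
    "BEAMS": (0.61, 68),
    "MONACO": (0.50, 69),
    "SAlign": (0.50, 73),
    "PROPER": (0.52, 66),
    "HubAlign": (0.44, 70),
    "ModuleAlign": (0.35, 68),
    "IBNAL": (0.31, 63),
    "SANA": (0.31, 66),
    "NETAL": (0.30, 63),
    "MAGNA++": (0.32, 66),
}

print(pareto_front(existing_aligners))
```

Under these inputs the non-dominated existing aligners are BEAMS, SAlign, and Twadn, the same three that the caption singles out for comparison with BioAlign.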
Figure 4. The number of aligned nodes with (A) AFS > 0.50 and (B) AFS > 0.75. BioAlign aligns a much larger number of nodes in both cases, and the margin between BioAlign and the existing aligners is notably large.
Figure 5. AFS for different percentages of aligned nodes (10, 20, ..., 100). BioAlign (blue) outperforms all existing techniques in all cases. The PROPER and BEAMS lines (green and pink) are incomplete in 4 out of 5 cases. Twadn and MONACO are incomplete in the Mouse-Yeast case. The remaining algorithms fail to produce high AFS.
The average execution time of the aligners on 5 datasets.
| Algorithm | Execution time |
|---|---|
| PROPER | 3 s |
| Twadn | 5 s |
| MONACO | 30 s |
| BioAlign | 48 s |
| BEAMS | 54 s |
| HubAlign | 74 s |
| SAlign | 88 s |
| SANA | 6 min |
| ModuleAlign | 26 min |
| MAGNA++ | 58 min |
| Algorithm-1: Seeds-Generation on the basis of biological scoring matrices |
|---|
| 1: Procedure: Seed Generation |

| Algorithm-2: Alignment using Remote Homology |
|---|
| 1: Procedure: Align_Remote_Homologs |

| Algorithm-3: Alignment using Secondary Structure Motifs |
|---|
| 1: Procedure: Alignment_using_SS-Motifs |