Yu-Keng Shih, Srinivasan Parthasarathy.
Abstract
BACKGROUND: Advances in high-throughput technology have led to an increased amount of available protein-protein interaction (PPI) data. Detecting and extracting functional modules that are common across multiple networks is an important step towards understanding the role of functional modules and how they have evolved across species. A global protein-protein interaction network alignment algorithm attempts to find such functional orthologs across multiple networks.
Year: 2012 PMID: 22536895 PMCID: PMC3311098 DOI: 10.1186/1471-2105-13-S3-S11
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The procedure of our method.
Notations in the algorithms
| Whether the protein | |
| { | |
| The network where protein | |
| The match-set containing protein | |
| The networks where at least one protein in |
Figure 2. An example with two networks, A and B. The two tables are the similarity scores with and without preprocessing. The solid lines connecting two proteins in the same network are edges, and bold lines are conserved edges. The arrows across two networks are match-sets. The threshold τ is 50 here. Stage 1 shows the clustering result, which is {{A1, B1}, {A2, B2, B3}, {A3, B4}}. Stage 2 generates seed match-sets. Since there are only two networks, we do not merge any match-sets, and therefore the seed {A2, B3} is ignored. Stage 3 expands the alignment based on the seed {A2, B2}. There are no alignable pairs after stage 3, so stage 4 is not executed in this example.
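The notion of a conserved edge in the example above can be illustrated with a minimal sketch. It assumes the common definition that an edge is conserved when its two endpoints fall into two different match-sets that are also linked by an edge in another network. The function name `conserved_edges`, the edge lists, and the final match-sets used below are hypothetical and chosen only for illustration; the caption does not list the actual edges of the figure.

```python
def conserved_edges(edges_by_network, match_sets):
    """Return, per network, the edges that are conserved across networks.

    Assumed definition (not taken from the source): an edge (u, v) is
    conserved if u and v belong to two different match-sets and some other
    network also has an edge between members of those two match-sets.
    """
    # Map every aligned protein to the index of its match-set.
    owner = {p: i for i, ms in enumerate(match_sets) for p in ms}

    def key(u, v):
        # Unordered pair of match-set indices, or None if not alignable.
        if u in owner and v in owner and owner[u] != owner[v]:
            return frozenset((owner[u], owner[v]))
        return None

    # For each network, collect the match-set pairs linked by an edge.
    linked = {
        net: {key(u, v) for u, v in edges if key(u, v) is not None}
        for net, edges in edges_by_network.items()
    }

    conserved = {}
    for net, edges in edges_by_network.items():
        # Match-set pairs linked in at least one other network.
        others = set().union(*(linked[o] for o in linked if o != net))
        conserved[net] = [(u, v) for u, v in edges if key(u, v) in others]
    return conserved


# Hypothetical edges and match-sets, loosely modeled on the two-network example.
edges_by_network = {
    "A": [("A1", "A2"), ("A2", "A3")],
    "B": [("B1", "B2"), ("B2", "B3"), ("B3", "B4")],
}
match_sets = [{"A1", "B1"}, {"A2", "B2"}, {"A3", "B4"}]

result = conserved_edges(edges_by_network, match_sets)
print(result)
```

With these hypothetical inputs, only the edge between the first two match-sets appears in both networks, so it is the only one reported as conserved. B3 is unaligned, so the edges touching it cannot be conserved under this definition.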
Experimental datasets
| Datasets | Species | # proteins | # PPIs | Percent of proteins with GO terms |
|---|---|---|---|---|
| DOMAIN | D. mela | 5014 | 10884 | 95.8% |
| | S. cere | 3481 | 11186 | 87.2% |
| | C. eleg | 1864 | 2159 | 90.0% |
| DIP | D. mela | 7486 | 22340 | 82.89% |
| | S. cere | 5139 | 24821 | 93.87% |
| | H. sapi | 5025 | 12705 | 95.22% |
| | C. eleg | 3095 | 4891 | 68.27% |
| | E. coli | 2953 | 11759 | 65.09% |
| | M. musc | 1149 | 1171 | 97.39% |
| | H. pylo | 708 | 1354 | 68.05% |
| BioGRID | D. mela | 7210 | 24710 | 86.1% |
| | C. eleg | 3420 | 6339 | 87.0% |
| | S. pomb | 1995 | 12573 | 99.7% |
| | H. sapi | 8282 | 45031 | 93.20% |
| | A. thal | 1609 | 2861 | 94.59% |
Figure 3. The trade-offs in the DIP dataset: the average similarity score and the conserved edge rate for different clustering methods (agglo and rbr) and different criterion functions (i1 and i2).
Comparison of quality between our method and IsoRankN
| Datasets | DOMAIN | | DIP | | BioGRID | |
|---|---|---|---|---|---|---|
| Coverage | 8588 | 3372 | 24119 | 19555 | 21385 | 13928 |
| Average similarity score | 237.06 | 174.32 | 0.00509 | 0.00426 | 0.1735 | 0.0834 |
| Conserved edges rate | 0.260 | 0.160 | 0.2209 | 0.086 | 0.1781 | 0.0685 |
| # total enriched GO terms | 1026 | 90 | 3871 | 1893 | 1123 | 523 |
The average similarity scores for the DOMAIN dataset are computed by equation 2; those for the DIP and BioGRID datasets are normalized BLAST bit scores, computed by equation 1.
Figure 4. The lowest 500 p-values on each dataset. (a) DOMAIN (b) DIP (c) BioGRID.
Figure 5. The best match-set (with the lowest p-value) discovered by our algorithm. Proteins with the same circle line style are in the same match-set formed by IsoRankN. Proteins without any circle are not covered by IsoRankN. The same color represents the same species. (a) DIP (DIP ID) (b) BioGRID (UniprotKB AC).
Figure 6. Execution time for generating several alignments.