Ethan Dh Kim, Ashish Sabharwal, Adrian R Vetta, Mathieu Blanchette.
Abstract
BACKGROUND: Affinity purification followed by mass spectrometry identification (AP-MS) is an increasingly popular approach for observing protein-protein interactions (PPIs) in vivo. One drawback of AP-MS, however, is that it detects indirect interactions mixed in with direct physical interactions. Distinguishing direct interactions from indirect ones is therefore of considerable interest.
Year: 2010 PMID: 21034440 PMCID: PMC2991326 DOI: 10.1186/1748-7188-5-34
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. Indirect interactions in AP-MS PPI data. In a cell, multiple copies of a bait protein b are expressed and interact (directly or indirectly) with other proteins p1, ..., p8 (left). After the pull-down on the bait b, MS detects all prey proteins, including indirect interaction partners (top right). The direct interaction network, however, should contain only the edges between direct interaction partners (bottom right).
Figure 2. Example of a direct interaction network and its connectivity matrix. An example of a direct interaction network G (left) and its connectivity matrix (right), calculated with …. Assuming each edge of G survives with probability …, the probability that each pair of proteins is connected can be estimated by sampling the probabilistic network.
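The sampling idea in the Figure 2 caption can be sketched as a Monte Carlo estimate: draw many random subgraphs in which each edge of G is kept independently with probability p, and record how often each pair of proteins ends up in the same connected component. This is a minimal sketch under stated assumptions, not the authors' code: the survival probability `p` is a free parameter (the value used in the figure is not given), edges are assumed to survive independently, and the helper names `find` and `estimate_connectivity` are hypothetical. Union-find is used here simply as a convenient way to track components in each sample.

```python
import random
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def estimate_connectivity(nodes, edges, p, samples=10000, seed=0):
    """Monte Carlo estimate of C[i][j] = Pr(nodes[i] and nodes[j] are connected)
    when every edge survives independently with probability p (assumption)."""
    rng = random.Random(seed)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    hits = [[0] * n for _ in range(n)]
    for _ in range(samples):
        parent = list(range(n))
        for u, v in edges:
            if rng.random() < p:  # this edge survives in the current sample
                ru, rv = find(parent, index[u]), find(parent, index[v])
                if ru != rv:
                    parent[ru] = rv
        roots = [find(parent, i) for i in range(n)]
        for i, j in combinations(range(n), 2):
            if roots[i] == roots[j]:  # same component in this sample
                hits[i][j] += 1
                hits[j][i] += 1
    # Each node is trivially connected to itself.
    return [[1.0 if i == j else hits[i][j] / samples for j in range(n)]
            for i in range(n)]


# Usage: a path a - b - c with p = 0.5.
C = estimate_connectivity(["a", "b", "c"], [("a", "b"), ("b", "c")], p=0.5)
print(C[0][1], C[0][2])  # roughly 0.5 and 0.25
```

For the path, a and c are connected only if both edges survive, so their exact probability is 0.5 × 0.5 = 0.25; the estimate approaches these values as `samples` grows.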
Performance of detecting weakly connected vertices
| Network | Total | 1-cut real | 1-cut pred | 1-cut FDR (%) | 1-cut FNR (%) | Deg-2 real | Deg-2 pred | Deg-2 FDR (%) | Deg-2 FNR (%) | Remaining |
|---|---|---|---|---|---|---|---|---|---|---|
| Yu | 1090 | 552 | 552 | 0 | 0 | 195 | 207 | 7.7 | 2.05 | 331 |
| DIP | 1406 | 656 | 656 | 0 | 0 | 309 | 326 | 5.82 | 0.64 | 424 |
| PAM | 1000 | 457 | 457 | 0 | 0 | 351 | 363 | 3.58 | 0.28 | 180 |
| DM | 1000 | 323 | 323 | 0 | 0 | 117 | 126 | 11.9 | 5.12 | 551 |
Number of vertices detected as (i) 1-cut vertices and (ii) degree-2 vertices in the real network and in the predicted network. Total: number of vertices in the network. False discovery ratio (FDR) and false negative ratio (FNR) are given as percentages. Remaining: the number of vertices left after the 1-cut vertices and degree-2 vertices have been identified.
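The FDR and FNR columns follow the definitions given with the quasi-clique table: FDR is the percentage of false positives in the predicted set, and FNR is the percentage of false negatives in the real set. A minimal sketch of that computation on sets of items (vertices here, edges in the later tables); the function name `fdr_fnr` and the 0.0 convention for an empty denominator are assumptions.

```python
def fdr_fnr(real, predicted):
    """Return (FDR, FNR) as percentages.

    FDR: share of predicted items that are not real (false positives / |predicted|).
    FNR: share of real items that were not predicted (false negatives / |real|).
    Assumption: an empty denominator yields 0.0.
    """
    real, predicted = set(real), set(predicted)
    true_pos = len(real & predicted)
    fdr = 100.0 * (len(predicted) - true_pos) / len(predicted) if predicted else 0.0
    fnr = 100.0 * (len(real) - true_pos) / len(real) if real else 0.0
    return fdr, fnr


# Toy example: 3 of 5 predicted vertices are real, and 1 of 4 real vertices is missed.
print(fdr_fnr({1, 2, 3, 4}, {2, 3, 4, 5, 6}))  # (40.0, 25.0)
```

As a consistency check on the table (the number of correct predictions is not listed, so it is inferred here), the Yu degree-2 entries (real 195, pred 207, FNR 2.05%) fit 191 correctly predicted vertices: 4/195 ≈ 2.05% missed and 16/207 ≈ 7.7% false discoveries.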
Performance of quasi-clique predictions
| Network | real | pred | FD (%) | FN (%) | real | pred | FD (%) | FN (%) | real | pred | FD (%) | FN (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yu | 42 | 54 | 22.22 | 0 | 66 | 104 | 36.53 | 0 | 146 | 308 | 55.19 | 5.48 |
| DIP | 0 | 42 | 100 | 0 | 86 | 112 | 26.79 | 4.65 | 184 | 266 | 35.34 | 6.52 |
| PAM | 0 | 31 | 100 | 0 | 0 | 45 | 100 | 0 | 0 | 96 | 100 | 0 |
| DM | 254 | 346 | 28.32 | 2.36 | 194 | 267 | 29.21 | 2.57 | 488 | 718 | 34.96 | 4.30 |
Number of edges that belong to maximal cliques of size k. Real: actual number of edges that belong to maximal cliques of size k; pred: predicted number of maximal k-clique edges; FD (false discovery ratio): percentage of false positives in the predicted set; FN (false negative ratio): percentage of false negatives in the real set.
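The "real" columns count edges of the true network that lie in at least one maximal clique of exactly k vertices. The source does not say how these counts were obtained, so the following is only a minimal sketch of one way to compute them on a known graph, enumerating maximal cliques with the Bron-Kerbosch algorithm with pivoting (our choice of technique). It is not the quasi-clique prediction method itself, which works from the connectivity matrix. Function names are hypothetical.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Yield every maximal clique of the graph `adj` (vertex -> set of neighbours)
    using Bron-Kerbosch with pivoting."""
    def expand(r, p, x):
        if not p and not x:
            yield r  # r cannot be extended: it is a maximal clique
            return
        # Pivot on the vertex covering most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(frozenset(), set(adj), set())


def edges_in_maximal_k_cliques(edges, k):
    """Return the edges (as frozensets) lying in some maximal clique of size k."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    result = set()
    for clique in maximal_cliques(adj):
        if len(clique) == k:
            result.update(frozenset(e) for e in combinations(clique, 2))
    return result


# Usage: a triangle a-b-c with a pendant edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(len(edges_in_maximal_k_cliques(edges, 3)))  # 3
print(len(edges_in_maximal_k_cliques(edges, 2)))  # 1 (the edge c-d)
```

Because only maximal cliques are counted, the triangles inside a 4-clique do not contribute to the k = 3 count, which keeps the per-k edge counts from double-counting nested cliques.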
Performance of genetic algorithm and overall algorithm
| Network | Reduced real | Reduced pred | Reduced FDR (%) | Reduced FNR (%) | Overall real | Overall pred | Overall FDR (%) | Overall FNR (%) |
|---|---|---|---|---|---|---|---|---|
| Yu | 563 | 552 | 43.65 | 44.76 | 1318 | 1390 | 14.96 | 10.31 |
| DIP | 931 | 890 | 35.50 | 38.34 | 1967 | 2041 | 17.34 | 14.23 |
| PAM | 473 | 421 | 49.88 | 55.39 | 1538 | 1462 | 16.14 | 20.28 |
| DM | 1138 | 1295 | 43.39 | 35.58 | 1869 | 1804 | 32.81 | 35.15 |
(i) Performance of the genetic algorithm on the reduced network obtained by removing 1-cut vertices and degree-2 vertices. (ii) Overall performance of the combined prediction pipeline on the complete connectivity matrix. Real and predicted give the number of edges in the real and predicted networks, respectively.
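The table reports the genetic algorithm's accuracy, but this summary does not describe its encoding, fitness function, or operators. The following is a minimal sketch under loudly stated assumptions, not the authors' implementation: each candidate network is a bit string over all vertex pairs; fitness is the squared error between the observed connectivity matrix and the connectivity estimated by sampling surviving edges (the estimation idea from Figure 2); the better half of the population is kept unchanged (elitism) and refilled by uniform crossover plus bit-flip mutation; and the run stops after a fixed number of generations instead of a fixed wall-clock budget. All names and parameter values (`make_scorer`, `genetic_algorithm`, `pop_size`, `mutation_rate`, ...) are hypothetical.

```python
import random
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def make_scorer(n, target, p, samples=100, seed=0):
    """Assumed fitness: squared error between `target` and the connectivity
    estimated for the candidate graph encoded by `bits` (lower is better).
    The same edge-survival draws are reused for every candidate (common random
    numbers), so scores are deterministic and directly comparable."""
    pairs = list(combinations(range(n), 2))
    rng = random.Random(seed)
    draws = [[rng.random() < p for _ in pairs] for _ in range(samples)]
    cache = {}

    def score(bits):
        key = tuple(bits)
        if key not in cache:
            hits = [0] * len(pairs)
            for alive in draws:
                parent = list(range(n))
                for k, (u, v) in enumerate(pairs):
                    if bits[k] and alive[k]:  # edge is in the candidate and survives
                        ru, rv = find(parent, u), find(parent, v)
                        if ru != rv:
                            parent[ru] = rv
                for k, (u, v) in enumerate(pairs):
                    hits[k] += find(parent, u) == find(parent, v)
            cache[key] = sum((h / samples - target[u][v]) ** 2
                             for h, (u, v) in zip(hits, pairs))
        return cache[key]

    return pairs, score


def genetic_algorithm(n, target, p, pop_size=20, generations=30,
                      mutation_rate=0.05, seed=0):
    """Evolve candidate graphs toward `target`; returns (edges, squared error)."""
    pairs, score = make_scorer(n, target, p, seed=seed)
    rng = random.Random(seed + 1)
    m = len(pairs)
    # Initial population: the empty graph plus random sparse graphs.
    population = [[0] * m] + [[int(rng.random() < 0.3) for _ in range(m)]
                              for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score)
        elite = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            children.append(child)
        population = elite + children
    best = min(population, key=score)
    return [pairs[k] for k, g in enumerate(best) if g], score(best)


# Usage: the target is the exact connectivity of the path 0-1-2-3 with p = 0.5.
n, p = 4, 0.5
target = [[p ** abs(i - j) for j in range(n)] for i in range(n)]
edges, err = genetic_algorithm(n, target, p)
print(edges, round(err, 4))
```

Keeping the elite unchanged guarantees that the best score never gets worse from one generation to the next; the population lets the search combine partially correct networks rather than following a single trajectory.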
Comparison of our method to a simple hill-climbing approach
| Method | Real | Predicted | FDR(%) | FNR(%) |
|---|---|---|---|---|
| (i) Hill-climbing | 1318 | 2108 | 86.67 | 78.68 |
| (ii) Hill-climbing + weakly conn. nodes | 1318 | 1517 | 43.57 | 35.05 |
| (iii) Our approach (GA) | 1318 | 1390 | 14.96 | 10.31 |
Comparison of our method to a simple hill-climbing approach. (i) Accuracy of the hill-climbing approach applied to the complete network; (ii) accuracy of the hill-climbing approach after the weakly connected nodes are fixed using our algorithm; (iii) accuracy of our combined pipeline using the genetic algorithm.
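The hill-climbing baseline is not specified further in this summary. As a hypothetical sketch for contrast, assume it starts from the empty network and toggles one vertex pair at a time, keeping a toggle only when it lowers the squared error between the observed connectivity matrix and the connectivity estimated by sampling surviving edges, and stops when no single toggle helps. The objective, the starting point, and all names (`make_scorer`, `hill_climb`) are assumptions, not the authors' baseline.

```python
import random
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def make_scorer(n, target, p, samples=100, seed=0):
    """Assumed objective: squared error between `target` and the sampled
    connectivity of the candidate graph; draws are fixed so scores are comparable."""
    pairs = list(combinations(range(n), 2))
    rng = random.Random(seed)
    draws = [[rng.random() < p for _ in pairs] for _ in range(samples)]
    cache = {}

    def score(bits):
        key = tuple(bits)
        if key not in cache:
            hits = [0] * len(pairs)
            for alive in draws:
                parent = list(range(n))
                for k, (u, v) in enumerate(pairs):
                    if bits[k] and alive[k]:  # edge present and survives
                        ru, rv = find(parent, u), find(parent, v)
                        if ru != rv:
                            parent[ru] = rv
                for k, (u, v) in enumerate(pairs):
                    hits[k] += find(parent, u) == find(parent, v)
            cache[key] = sum((h / samples - target[u][v]) ** 2
                             for h, (u, v) in zip(hits, pairs))
        return cache[key]

    return pairs, score


def hill_climb(n, target, p, seed=0):
    """First-improvement hill climbing over single-pair toggles."""
    pairs, score = make_scorer(n, target, p, seed=seed)
    bits = [0] * len(pairs)  # assumption: start from the empty network
    current = score(bits)
    improved = True
    while improved:
        improved = False
        for k in range(len(pairs)):
            bits[k] ^= 1  # tentatively add or remove one candidate edge
            candidate = score(bits)
            if candidate < current:
                current, improved = candidate, True  # keep the toggle
            else:
                bits[k] ^= 1  # undo the toggle
    return [pairs[k] for k, b in enumerate(bits) if b], current


# Usage: the target is the exact connectivity of the path 0-1-2-3 with p = 0.5.
n, p = 4, 0.5
target = [[p ** abs(i - j) for j in range(n)] for i in range(n)]
edges, err = hill_climb(n, target, p)
print(edges, round(err, 4))
```

Because every accepted move strictly lowers the error, the search always terminates, but it can stop at a local optimum where no single toggle helps even though a combination of toggles would. This is one plausible reason such a baseline underperforms; the table itself does not establish the cause.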
Running times of the algorithm
| Network | (i) 1-cut & degree-2 vertices | (ii) Quasi-clique predictions | (iii) Genetic algorithm |
|---|---|---|---|
| Yu | 0:00:29 | 0:31:02 | 15:00:00 |
| DIP | 0:00:41 | 0:48:14 | 15:00:00 |
| PAM | 0:00:19 | 0:29:49 | 15:00:00 |
| DM | 0:00:24 | 1:18:03 | 15:00:00 |
Running times of our method on the model networks for each phase of the algorithm: (i) detecting 1-cut vertices and degree-2 vertices; (ii) predicting quasi-clique clusters; and (iii) running the genetic algorithm. We report the average running time over three runs on each network. The implementation was tested on a Power Mac G5 (2 GHz) with 4 GB of RAM. Note that the genetic algorithm was run for a fixed amount of time, after which the top-scoring candidates achieved the accuracy shown in Table 3. Times are given in h:mm:ss format.
Figure 3. Inferring direct interactions from an actual AP-MS dataset. Overlap between the Y2H interaction network of Yu et al. and various AP-MS-based networks: (a) the high-confidence set of interactions from Krogan et al.; (b) the set of 164 highest-scoring interactions from Krogan et al.; (c) the set of 164 interactions predicted as direct interactions by our algorithms, based on the AP-MS data from Krogan et al.
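Overlaps like those in Figure 3 compare sets of undirected interactions, so a pair listed as (A, B) in one dataset must match (B, A) in another. A minimal sketch of that comparison, assuming each dataset is a list of protein-name pairs; the function names and protein names are hypothetical.

```python
def as_pairs(interactions):
    """Normalize interactions to unordered pairs so (A, B) and (B, A) match."""
    return {frozenset(pair) for pair in interactions}


def overlap(reference, other):
    """Return (shared, only_in_reference, only_in_other) interaction counts."""
    ref, oth = as_pairs(reference), as_pairs(other)
    return len(ref & oth), len(ref - oth), len(oth - ref)


# Hypothetical protein names, for illustration only.
y2h = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
predicted = [("P2", "P1"), ("P4", "P3"), ("P4", "P5")]
print(overlap(y2h, predicted))  # (2, 1, 1)
```

The three counts correspond to the regions of a two-set Venn diagram: interactions in both sets, only in the reference set, and only in the compared set.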