Edoardo M Airoldi, Katherine A Heller, Ricardo Silva.
Abstract
MOTIVATION: Proteins and protein complexes coordinate their activity to execute cellular functions. In a number of experimental settings, including synthetic genetic arrays, genetic perturbations and RNAi screens, scientists identify a small set of protein interactions of interest. A working hypothesis is often that these interactions are the observable phenotypes of some functional process, which is not directly observable. Confirmatory analysis requires finding other pairs of proteins whose interaction may be additional phenotypical evidence about the same functional process. Extant methods for finding additional protein interactions rely heavily on the information in the newly identified set of interactions. For instance, these methods leverage the attributes of the individual proteins directly, in a supervised setting, in order to find relevant protein pairs. A small set of protein interactions provides a small sample to train parameters of prediction methods, thus leading to low confidence.
Year: 2011 PMID: 21685095 PMCID: PMC3117334 DOI: 10.1093/bioinformatics/btr236
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Example with words.
Fig. 2. Example with proteins.
Fig. 3. General framework of the procedure: first, a ‘prior’ over parameters Θ for a link classifier is defined empirically using linked and unlinked pairs of points (the dashed edges indicate that creating a prior empirically is optional, but in practice we rely on this method). Given a query set S of linked pairs of interest, the system computes the predictive likelihood of each linked pair in 𝒟+ and compares it to the conditional predictive likelihood, given the query. This defines a measure of similarity with respect to S by which all pairs in 𝒟+ are sorted.
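The comparison described in the Fig. 3 caption can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the prior over Θ is approximated by a finite list of candidate parameter vectors with weights, the link classifier is assumed to be logistic, the comparison is assumed to be a log ratio, and the names `link_prob`, `analogy_score` and `rank_pairs` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def link_prob(theta, x):
    """P(link | pair features x, parameters theta); logistic form is an assumption."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def analogy_score(candidate, query, thetas, prior):
    """Log conditional predictive likelihood (given the query) minus
    log marginal predictive likelihood, over a finite grid of thetas."""
    # Predictive likelihood of the candidate link under the prior alone.
    marginal = sum(p * link_prob(th, candidate) for th, p in zip(thetas, prior))
    # Posterior weights over thetas after observing that all query pairs are linked.
    weights = [p * math.prod(link_prob(th, x) for x in query)
               for th, p in zip(thetas, prior)]
    total = sum(weights)
    posterior = [w / total for w in weights]
    # Predictive likelihood of the candidate link given the query set.
    conditional = sum(p * link_prob(th, candidate) for th, p in zip(thetas, posterior))
    return math.log(conditional) - math.log(marginal)

def rank_pairs(candidates, query, thetas, prior):
    """Sort candidate pair names by decreasing score with respect to the query."""
    return sorted(candidates,
                  key=lambda name: analogy_score(candidates[name], query, thetas, prior),
                  reverse=True)
```

A positive score means the query makes the candidate link more plausible than it was under the prior alone; pairs whose features relate to each other the way the query pairs do should therefore rise to the top of the ranking.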
Collection of datasets used to generate protein-specific features
| Type of data | Data sources for our study |
|---|---|
| Gene expression | |
| Synthetic genetic int. | |
| Cellular localization | |
| TF binding sites | |
| Sequence data | |
Number of times each method wins when querying pairs of MIPS classes using the MIPS protein–protein interaction network
| Method | #AUC | #TOP10 | #AUC.S | #TOP10.S |
|---|---|---|---|---|
| COS | 240 | 294 | 219 | 277 |
| NNS | 42 | 122 | 28 | 75 |
| MLS | 105 | 270 | 52 | 198 |
| RBS | 542 | 556 | 578 | 587 |

| Method | #AUC | #TOP10 | #AUC.S | #TOP10.S |
|---|---|---|---|---|
| COS | 314 | 356 | 306 | 340 |
| NNS | 75 | 146 | 62 | 111 |
| MLS | 273 | 329 | 246 | 272 |
| RBS | 267 | 402 | 245 | 387 |
The first two columns, #AUC and #TOP10, count the number of times the respective method obtains the best score among the 4 approaches according to the AUC and TOP10 measures, respectively. This count is divided by the number of replications of each query type (5). The last two columns, #AUC.S and #TOP10.S, are ‘smoothed’ versions of this statistic: a method is declared the winner of a round of 5 replications if it obtains the best score in at least 3 of the 5 replications. The top table shows the results when only the continuous variables are used by RBSets; the bottom table shows the results when the discrete variables are also given to RBSets.
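As a rough illustration of how the raw and smoothed win counts in this note could be tallied, consider the sketch below. The input layout, the function name `count_wins`, and the choice to credit every tied method with a win are assumptions made for the example, not details taken from the study.

```python
from collections import Counter

def count_wins(scores, n_reps=5, majority=3):
    """scores maps each query type to a list of replications;
    each replication maps method name -> score (higher is better).
    Returns (raw, smoothed) win counters."""
    raw = Counter()
    smoothed = Counter()
    for reps in scores.values():
        wins_here = Counter()
        for rep in reps:
            best = max(rep.values())
            for method, score in rep.items():
                if score == best:  # assumption: ties credit every tied method
                    wins_here[method] += 1
        for method, wins in wins_here.items():
            # Raw statistic: wins divided by the number of replications.
            raw[method] += wins / n_reps
            # Smoothed statistic: one win per round if best in a majority of replications.
            if wins >= majority:
                smoothed[method] += 1
    return raw, smoothed
```

The smoothed count is less sensitive to a single lucky replication, which is presumably why it is reported alongside the raw count.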
Pairwise comparison of methods according to the AUC and TOP10 criteria
| | COS | NNS | MLS | RBS |
|---|---|---|---|---|
| AUC | | | | |
| COS | – | 0.67 | 0.43 | 0.30 |
| NNS | 0.32 | – | 0.18 | 0.06 |
| MLS | 0.56 | 0.81 | – | 0.25 |
| RBS | 0.69 | 0.93 | 0.74 | – |
| TOP10 | | | | |
| COS | – | 0.70 | 0.46 | 0.30 |
| NNS | 0.29 | – | 0.25 | 0.11 |
| MLS | 0.53 | 0.74 | – | 0.28 |
| RBS | 0.69 | 0.88 | 0.71 | – |
Each cell shows the proportion of the trials where the method in the respective row wins over the method in the column, according to both criteria. In each cell, the proportion is calculated with respect to the 4655 rankings where no tie happened.
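A minimal sketch of one cell of such a table is given below. It assumes that each method has one score per trial, that higher is better, and that ties are dropped pair by pair; the exact tie-handling used in the study may differ, and `pairwise_win_rate` is a hypothetical name.

```python
def pairwise_win_rate(row_scores, col_scores):
    """Proportion of trials in which the row method beats the column method,
    computed only over trials where the two scores differ."""
    wins = sum(1 for r, c in zip(row_scores, col_scores) if r > c)
    losses = sum(1 for r, c in zip(row_scores, col_scores) if r < c)
    decided = wins + losses
    if decided == 0:
        return float("nan")  # every trial was a tie
    return wins / decided
```

Under this definition the two cells for a given pair of methods are complementary, which matches the table up to rounding (for example, 0.67 and 0.32 for COS versus NNS under AUC).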
Distribution across all queries of the number of hits in the top 10 pairs, as ranked by each algorithm
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Proportion of top hits using MIPS categories and links specified by the MIPS database | | | | | | | | | | | |
| COS | 0.12 | 0.15 | 0.12 | 0.10 | 0.08 | 0.07 | 0.06 | 0.05 | 0.04 | 0.07 | 0.08 |
| NNS | 0.29 | 0.16 | 0.14 | 0.10 | 0.06 | 0.05 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 |
| MLS | 0.12 | 0.12 | 0.12 | 0.10 | 0.09 | 0.08 | 0.07 | 0.06 | 0.07 | 0.06 | 0.07 |
| RBS | 0.04 | 0.08 | 0.09 | 0.09 | 0.09 | 0.08 | 0.09 | 0.07 | 0.09 | 0.08 | 0.14 |
| Proportion of top hits using GO categories and links specified by the MIPS database | | | | | | | | | | | |
| COS | 0.12 | 0.13 | 0.11 | 0.10 | 0.11 | 0.09 | 0.06 | 0.06 | 0.04 | 0.06 | 0.06 |
| NNS | 0.53 | 0.23 | 0.07 | 0.02 | 0.02 | 0.02 | 0.04 | 0.01 | 0.00 | 0.00 | 0.01 |
| MLS | 0.16 | 0.11 | 0.12 | 0.10 | 0.08 | 0.08 | 0.08 | 0.06 | 0.05 | 0.06 | 0.05 |
| RBS | 0.09 | 0.09 | 0.10 | 0.10 | 0.08 | 0.08 | 0.06 | 0.08 | 0.08 | 0.07 | 0.12 |
The more mass a distribution places on the right (more hits in the top 10), the better. Notice that using GO categories roughly doubles the proportion of zero hits for RBSets (from 0.04 to 0.09).
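Each row of this table is a histogram over queries. The sketch below shows one way such a row could be computed, assuming each query yields a ranked list of pairs and a set of pairs counted as hits; the helper names `hits_in_top_k` and `hit_distribution` are hypothetical.

```python
from collections import Counter

def hits_in_top_k(ranked, relevant, k=10):
    """Number of relevant pairs among the first k entries of a ranking."""
    return sum(1 for pair in ranked[:k] if pair in relevant)

def hit_distribution(hit_counts, k=10):
    """Proportion of queries with exactly 0, 1, ..., k hits in the top k."""
    counts = Counter(hit_counts)
    n = len(hit_counts)
    return [counts[h] / n for h in range(k + 1)]
```

The returned list has k + 1 entries, one per column of the table, and sums to 1 before rounding.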
Number of times each method wins when querying pairs of GO classes using the MIPS protein–protein interaction network
| Method | #AUC | #TOP10 | #AUC.S | #TOP10.S |
|---|---|---|---|---|
| COS | 58 | 73 | 58 | 72 |
| NNS | 1 | 10 | 0 | 4 |
| MLS | 26 | 55 | 13 | 38 |
| RBS | 93 | 105 | 101 | 110 |
Columns #AUC, #TOP10, #AUC.S and #TOP10.S are defined as in Table 2.
Number of times each method wins when querying pairs of KEGG classes using the KEGG protein–protein interaction network
| Method | #AUC | #TOP10 | #AUC.S | #TOP10.S |
|---|---|---|---|---|
| COS | 159 | 575 | 134 | 507 |
| NNS | 30 | 305 | 17 | 227 |
| MLS | 290 | 506 | 199 | 431 |
| RBS | 1042 | 1091 | 1107 | 1212 |
Columns #AUC, #TOP10, #AUC.S and #TOP10.S are defined as in Table 2.
Distribution across all queries of the number of hits in the top 10 pairs, as ranked by each algorithm
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Proportion of top hits using KEGG categories and links specified by the KEGG database | | | | | | | | | | | |
| COS | 0.56 | 0.21 | 0.08 | 0.03 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| NNS | 0.89 | 0.03 | 0.04 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| MLS | 0.57 | 0.21 | 0.08 | 0.04 | 0.02 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 |
| RBS | 0.29 | 0.24 | 0.16 | 0.09 | 0.06 | 0.03 | 0.02 | 0.01 | 0.03 | 0.02 | 0.01 |
The more mass a distribution places on the right (more hits in the top 10), the better.