Abstract
High-throughput AP-MS methods have allowed the identification of many protein complexes. However, most post-processing methods for this type of data have focused on detecting protein complexes rather than their subcomplexes. Here, we review the results of some existing methods that may allow subcomplex detection and propose alternative methods to detect subcomplexes from AP-MS data. We assessed and compared overlapping clustering methods, methods based on the core-attachment model, and our own prediction strategy (TRIBAL). The hypothesis behind TRIBAL is that subcomplex-building information may be concealed in the multiple edges generated when an interaction is repeated in different contexts in the raw data. The CACHET method offered the best results when the predicted subcomplexes were evaluated using either the hypergeometric or the geometric score. TRIBAL offered the best performance when using a strict meet-min score.
Year: 2014 PMID: 24584908 PMCID: PMC3939454 DOI: 10.1038/srep04262
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. TRIBAL algorithm.
Precision and recall analysis for different complex prediction strategies, using the hypergeometric index as match criterion (p-value < 0.05)
| Methods | #Predicted matches | #All_Predicted_comp | Precision | #Reference_matches | #All_Reference_comp | Recall | F-measure |
|---|---|---|---|---|---|---|---|
| Raw-non-repeated | 1280 | 1849 | 0.69 | 196 | 214 | 0.92 | 0.79 |
| Raw-repeated | 260 | 317 | 0.82 | 96 | 214 | 0.45 | 0.58 |
| Dice-H | 798 | 2293 | 0.35 | 201 | 214 | 0.94 | 0.51 |
| Hart-H | 200 | 544 | 0.37 | 198 | 214 | 0.92 | 0.53 |
| PE-H | 349 | 1353 | 0.26 | 202 | 214 | 0.94 | 0.40 |
| SA-H | 181 | 608 | 0.30 | 187 | 214 | 0.87 | 0.44 |
| Dice-lcomm | 559 | 770 | 0.73 | 165 | 214 | 0.77 | 0.75 |
| PE-lcomm | 489 | 553 | 0.88 | 126 | 214 | 0.59 | 0.71 |
| SA-lcomm | 468 | 694 | 0.67 | 154 | 214 | 0.72 | 0.70 |
| Dice-OCG | 323 | 474 | 0.68 | 187 | 214 | 0.87 | 0.76 |
| Hart-OCG | 127 | 194 | 0.65 | 65 | 214 | 0.30 | 0.41 |
| PE-OCG | 404 | 467 | 0.86 | 182 | 214 | 0.85 | 0.86 |
| SA-OCG | 201 | 249 | 0.81 | 173 | 214 | 0.81 | 0.81 |
Purification Enrichment (PE) scoring seems to offer the best precision: the best results are PE-lcomm (88%) followed by PE-OCG (86%). Regarding recall, hierarchical clustering seems to offer the best results, with Dice-H and PE-H both reaching 94%. OCG outperforms link communities (lcomm) in terms of recall. The PE-OCG combination offers the best F-measure (0.86).
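The precision, recall and F-measure columns can be recomputed from the count columns. The following is a minimal Python sketch, assuming that precision is the number of predicted matches over all predicted complexes, that recall is the number of matched reference complexes over all reference complexes, and that the F-measure is their harmonic mean. The function name `precision_recall_f` is illustrative and does not come from the source.

```python
def precision_recall_f(pred_matches, all_pred, ref_matches, all_ref):
    """Return (precision, recall, F-measure) from the four count columns.

    Assumed definitions (not spelled out in the source):
      precision = #Predicted matches / #All_Predicted_comp
      recall    = #Reference_matches / #All_Reference_comp
      F-measure = harmonic mean of precision and recall
    """
    precision = pred_matches / all_pred
    recall = ref_matches / all_ref
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure


# Counts taken from the PE-OCG row of the table above.
p, r, f = precision_recall_f(404, 467, 182, 214)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Small differences in the second decimal with respect to the table can arise depending on whether values are rounded or truncated.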
Precision and recall analysis for different complex prediction strategies, using the geometric index as match criterion (index > 0.2)
| Methods | #Predicted matches | #All_Predicted_comp | Precision | #Reference_matches | #All_Reference_comp | Recall | F-measure |
|---|---|---|---|---|---|---|---|
| Raw-non-repeated | 323 | 1849 | 0.17 | 118 | 214 | 0.55 | 0.26 |
| Raw-repeated | 47 | 317 | 0.15 | 23 | 214 | 0.11 | 0.12 |
| Dice-H | 264 | 2293 | 0.11 | 149 | 214 | 0.70 | 0.20 |
| Hart-H | 80 | 544 | 0.15 | 99 | 214 | 0.46 | 0.22 |
| PE-H | 153 | 1353 | 0.11 | 148 | 214 | 0.69 | 0.19 |
| SA-H | 87 | 608 | 0.14 | 102 | 214 | 0.48 | 0.22 |
| Dice-lcomm | 227 | 770 | 0.29 | 89 | 214 | 0.42 | 0.34 |
| PE-lcomm | 164 | 553 | 0.30 | 73 | 214 | 0.34 | 0.32 |
| SA-lcomm | 185 | 694 | 0.27 | 84 | 214 | 0.39 | 0.32 |
| Dice-OCG | 101 | 474 | 0.21 | 67 | 214 | 0.31 | 0.25 |
| Hart-OCG | 22 | 194 | 0.11 | 17 | 214 | 0.08 | 0.09 |
| PE-OCG | 73 | 467 | 0.16 | 46 | 214 | 0.21 | 0.18 |
| SA-OCG | 63 | 249 | 0.25 | 41 | 214 | 0.19 | 0.22 |
Results with the stricter geometric criterion show that link communities perform better than the alternatives. The best F-measure belongs to Dice + lcomm (0.34), while the second and third best belong to PE + lcomm and SA + lcomm (both 0.32).
Precision and recall analysis for different subcomplex prediction strategies, using the hypergeometric index as match criterion (p-value < 0.05)
| Methods | #Predicted matches | #All_Predicted_comp | Precision | #Reference_matches | #All_Reference_comp | Recall | F-measure |
|---|---|---|---|---|---|---|---|
| Raw data | 139 | 263 | 0.53 | 108 | 214 | 0.50 | 0.52 |
| Dice-lcomm | 55 | 102 | 0.54 | 64 | 214 | 0.30 | 0.38 |
| PE-lcomm | 24 | 35 | 0.69 | 30 | 214 | 0.14 | 0.23 |
| SA-lcomm | 37 | 67 | 0.55 | 43 | 214 | 0.20 | 0.29 |
| Dice-OCG | 20 | 29 | 0.69 | 11 | 214 | 0.05 | 0.10 |
| Hart-OCG | 4 | 7 | 0.57 | 6 | 214 | 0.03 | 0.05 |
| PE-OCG | 34 | 34 | 1.00 | 19 | 214 | 0.09 | 0.16 |
| CACHET | 231 | 309 | 0.75 | 130 | 214 | 0.61 | 0.67 |
| TRIBAL | 18 | 18 | 1.00 | 14 | 214 | 0.06 | 0.12 |
For subcomplexes under the hypergeometric criterion, CACHET is clearly the best-performing method (highest F-measure, 0.67). Both TRIBAL and PE-OCG reach perfect precision (1.00) but very poor recall (0.06 and 0.09, respectively). The good performance of CACHET is mainly due to its comparatively high recall (0.61).
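The source does not give the formula behind the hypergeometric match criterion. As a hedged illustration, the sketch below uses the usual hypergeometric tail test for set overlap: given a universe of `n_proteins` proteins, it returns the probability of observing at least the actual overlap between a predicted and a reference complex by chance. The function name, the parameter names and the toy sets are assumptions made for this example, not the authors' implementation.

```python
from math import comb


def hypergeometric_pvalue(predicted, reference, n_proteins):
    """P(overlap >= observed) under a hypergeometric null model.

    predicted, reference: sets of protein identifiers.
    n_proteins: size of the protein universe (an assumed parameter).
    """
    k = len(predicted & reference)   # observed overlap
    n = len(predicted)               # proteins drawn
    big_k = len(reference)           # "successes" in the universe
    denom = comb(n_proteins, n)
    tail = sum(
        comb(big_k, i) * comb(n_proteins - big_k, n - i)
        for i in range(k, min(n, big_k) + 1)
    )
    return tail / denom


# Toy data (hypothetical protein names), universe of 1000 proteins.
pred = {"P1", "P2", "P3"}
ref = {"P1", "P2", "P3", "P4", "P5"}
is_match = hypergeometric_pvalue(pred, ref, 1000) < 0.05
print(is_match)
```

Under this reading, a predicted complex counts as a match when the p-value against some reference complex falls below 0.05.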
Precision and recall analysis for different subcomplex prediction strategies, using the geometric index as match criterion (score > 0.2)
| Methods | #Predicted matches | #All_Predicted_comp | Precision | #Reference_matches | #All_Reference_comp | Recall | F-measure |
|---|---|---|---|---|---|---|---|
| Raw data | 78 | 263 | 0.30 | 65 | 214 | 0.30 | 0.52 |
| Dice-lcomm | 29 | 102 | 0.28 | 32 | 214 | 0.20 | 0.38 |
| PE-lcomm | 14 | 35 | 0.40 | 17 | 214 | 0.13 | 0.23 |
| SA-lcomm | 21 | 67 | 0.31 | 21 | 214 | 0.15 | 0.29 |
| Dice-OCG | 1 | 29 | 0.03 | 1 | 214 | 0.01 | 0.10 |
| Hart-OCG | 1 | 7 | 0.14 | 1 | 214 | 0.01 | 0.05 |
| PE-OCG | 3 | 34 | 0.09 | 4 | 214 | 0.03 | 0.16 |
| CACHET | 106 | 309 | 0.34 | 74 | 214 | 0.34 | 0.67 |
| TRIBAL | 14 | 18 | 0.78 | 8 | 214 | 0.07 | 0.12 |
For subcomplexes under the geometric criterion, CACHET is again the best-performing method (highest F-measure). TRIBAL displays the best precision (0.78) but very poor recall. The good performance of CACHET is mainly due to its comparatively high recall.
Figure 2. Number and percentage of validated predicted subcomplexes using TRIBAL and six other methods.
TRIBAL outperforms CACHET and all combinations of scoring strategies and overlapping clustering methods for a meet-min equal to 1.0, that is, in terms of perfect containment of a subcomplex by a reference complex. This applies to both (a) the number of validated subcomplexes and (b) the precision, or percentage, of validated subcomplexes.
Figure 3. An example of similarity and containment metrics.
The Jaccard and geometric indexes measure the similarity between two sets. Their higher values in (a) indicate that the two sets in (a) are more "similar" to each other than the two sets in (b). In contrast, the meet-min index is a better measure of containment: its higher value in (b) shows that the two sets in (b) are a perfect set and subset, while the two sets in (a) only overlap. The scores show that case (a) is an example of good similarity with weaker containment, while (b) is an example of good containment with poor similarity.
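The contrast between similarity and containment can be made concrete with a short Python sketch. It assumes the commonly used definitions of the three indexes, which the text above does not spell out: Jaccard = |A∩B| / |A∪B|, geometric = |A∩B|² / (|A|·|B|), and meet-min = |A∩B| / min(|A|, |B|). The example sets are invented for illustration and are not the ones drawn in the figure.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; 0.0 if both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def geometric(a, b):
    """|A ∩ B|^2 / (|A| * |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) ** 2 / (len(a) * len(b))


def meet_min(a, b):
    """|A ∩ B| / min(|A|, |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Case (a): two overlapping sets of equal size.
overlap_a, overlap_b = {1, 2, 3, 4}, {3, 4, 5, 6}
# Case (b): a small set fully contained in a larger one.
small, large = {1, 2}, set(range(1, 11))

for name, x, y in [("(a)", overlap_a, overlap_b), ("(b)", small, large)]:
    print(name, round(jaccard(x, y), 2), round(geometric(x, y), 2),
          round(meet_min(x, y), 2))
```

In this toy case the overlapping pair scores higher on the Jaccard and geometric indexes, while only the contained pair reaches a meet-min of 1.0, which is the strict criterion used to validate subcomplexes.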