Mohammed El-Kebir, Tobias Marschall, Inken Wohlers, Murray Patterson, Jaap Heringa, Alexander Schönhuth, Gunnar W Klau.
Abstract
BACKGROUND: We study the problem of mapping proteins between two protein families in the presence of paralogs. This problem occurs as a difficult subproblem in coevolution-based computational approaches for protein-protein interaction prediction.
Year: 2013 PMID: 24564758 PMCID: PMC3852051 DOI: 10.1186/1471-2105-14-S15-S18
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Two alignments of protein families . Black and red nodes and edges compose the matching graph G. A matching θ is shown in red. A unit of coevolution ((a, b), (a', b')) within θ is highlighted in bold. For this toy example, we have ℓ(a, a') = 12 (matches + mismatches), ∆(a, a') = 11 (mismatches), ℓ(b, b') = 19 and ∆(b, b') = 15 and a resulting probability . Note the lower score of the unit ((a', b'), (a", b")), which is .
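The counts ℓ (matches + mismatches) and ∆ (mismatches) used in the Figure 1 caption can be obtained directly from two aligned rows. The following is a minimal Python sketch, not the authors' implementation. It assumes that rows are equal-length strings, that `-` is the gap symbol, and that columns with a gap in either row are skipped. The function name `length_and_mismatches` is hypothetical, and the probability derived from these counts is not reproduced here.

```python
from typing import Tuple

GAP = "-"  # assumed gap symbol; the source does not specify one


def length_and_mismatches(row_x: str, row_y: str) -> Tuple[int, int]:
    """Return (ell, delta) for two rows of the same alignment.

    ell   = number of columns where both rows hold a residue
            (matches + mismatches)
    delta = number of those columns where the residues differ
            (mismatches)
    """
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have equal length")

    ell = 0
    delta = 0
    for x, y in zip(row_x, row_y):
        # Assumption: a column with a gap in either row is ignored.
        if x == GAP or y == GAP:
            continue
        ell += 1
        if x != y:
            delta += 1
    return ell, delta


if __name__ == "__main__":
    # Toy rows: three gap-free columns, one of which is a mismatch.
    print(length_and_mismatches("AC-GT", "ATCG-"))
```

With these toy rows, the gap-free columns are A/A, C/T and G/G, so the sketch reports three compared columns and one mismatch.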
The average recall and precision values in percent as well as the runtime in hours of TAG-TSEMA [8], TreeTop [10], MMM [9] and our method CUPID are shown.
| Method | Recall | Precision | Runtime | #Instances |
|---|---|---|---|---|
| TAG-TSEMA [8] | 45 % | | 730 h | 488 |
| TreeTop [10] | 38 % | 48 % | | 488 |
| CUPID | | | 30 h | 488 |
| MMM, | 6 % | 35 % | 55 h | 488 |
| MMM, | 15 % [61 %] | 46 % [55 %] | 121 h | 394 |
| MMM, | 26 % [70 %] | 57 % [64 %] | 250 h | 270 |
| MMM, | 35 % [71 %] | 53 % [65 %] | 323 h | 214 |
| MMM, | 37 % [70 %] | 44 % [65 %] | 363 h | 149 |
CUPID was terminated when either optimality was reached or a time limit of 5 minutes was hit; in the latter case, the best solution found by that time was used. TAG-TSEMA and TreeTop values are taken from [10]. MMM runs were subject to a time limit of 1 hour; the number of instances solved within this time limit is given in the last column. Precision and recall values are determined only for the set of solved instances. For the same set of solved instances, the CUPID quality measure is given in square brackets.
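As an illustration of how the precision and recall figures in such a comparison are typically computed, here is a minimal Python sketch. It assumes the standard set-based definitions, with a matching represented as a set of protein pairs and compared against a reference matching; the source does not spell out its exact definitions, and the function name `precision_recall` and the toy protein identifiers are hypothetical.

```python
from typing import Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def precision_recall(predicted: Iterable[Pair],
                     reference: Iterable[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of a predicted matching.

    precision = |predicted ∩ reference| / |predicted|
    recall    = |predicted ∩ reference| / |reference|
    An empty denominator yields 0.0 (an assumption of this sketch).
    """
    predicted_set = set(predicted)
    reference_set = set(reference)
    correct = len(predicted_set & reference_set)

    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(reference_set) if reference_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy matchings: two of three predicted pairs are in the reference,
    # which contains four pairs in total.
    predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
    reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
    p, r = precision_recall(predicted, reference)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Averaging these per-instance values over a set of instances would give figures of the kind reported in the table, provided the same set of instances is used for every method being compared.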
Effect of time limit on solution quality of CUPID.
| Time limit | 10 sec | 30 sec | 1 min | 5 min | 10 min | 20 min |
|---|---|---|---|---|---|---|
| Total runtime | 1.3 h | 3.8 h | 7.3 h | 30.2 h | 51.6 h | 81.0 h |
| Precision | 46.8 % | 47.8 % | 48.2 % | 49.6 % | 49.8 % | 50.3 % |
| Recall | 52.6 % | 53.7 % | 54.4 % | 55.9 % | 56.2 % | 56.7 % |
| Median relative gap size | 10.4 % | 5.4 % | 3.1 % | 2.1 % | 1.7 % | 1.3 % |
| Instances solved to optimality | 6.1 % | 9.4 % | 11.9 % | 16.0 % | 16.8 % | 17.0 % |
Figure 2. Distribution of the relative gap in percent for the 488 instances.
Figure 3. The plots show the quality of the scoring function as measured by the average log likelihood of a unit of coevolution in our solutions versus the average log likelihood of a unit of coevolution in the reference matchings. Points are colored according to (a) recall and (b) relative gap size.