Abstract
It has been known since the days when relatively few structures had been solved that longer protein chains often contain multiple domains, which may fold separately and act as reusable functional modules found in many contexts. In many structural biology tasks, in particular structure prediction, it is very useful to identify domains within the structure and analyze these regions separately. However, using sequence data alone, this task has proven exceptionally difficult, with relatively little improvement over the naive method of choosing boundaries based on the size distributions of observed domains. The recent significant improvement in contact prediction provides a new source of information for domain prediction. We test several methods for using this information, including a kernel smoothing-based approach and methods based on building alpha-carbon models, and compare their performance with a length-based predictor, a homology search method, and four published sequence-based predictors: DOMCUT, DomPRO, DLP-SVM, and SCOOBY-DOmain. We show that the kernel-smoothing method is significantly better than the other ab initio predictors when both single-domain and multidomain targets are considered and is not significantly different from the homology-based method. Considering only multidomain targets, the kernel-smoothing method outperforms all of the published methods except DLP-SVM. The kernel-smoothing method therefore represents a potentially useful improvement to ab initio domain prediction.
Year: 2012 PMID: 22987736 PMCID: PMC3563215 DOI: 10.1002/prot.24181
Source DB: PubMed Journal: Proteins ISSN: 0887-3585
Figure 1. Smoothing contact profiles with kernel density estimation. The predicted contact profile for protein 3TF4 is shown before (black line) and after (blue line) kernel density estimation smoothing with a bandwidth of 40. Red lines show the positions of the real and predicted domain boundaries. A PyMol [53] ribbon diagram of the structure, colored blue to red from the N-terminus to the C-terminus, is shown above the line.
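The smoothing step in Figure 1 can be illustrated with a minimal sketch. Everything here beyond "kernel density estimation" and "bandwidth of 40" is an assumption rather than the published procedure: the profile is taken to be a non-negative per-residue contact count, the kernel is Gaussian, and candidate boundaries are read off as local minima of the smoothed curve. The names `kde_smooth` and `local_minima` and the synthetic profile are hypothetical.

```python
import math

def kde_smooth(profile, bandwidth):
    """Weighted Gaussian kernel density estimate over residue positions.

    Assumption: profile[j] is a non-negative weight (for example a
    predicted contact count) attached to residue j.
    """
    total = float(sum(profile))
    norm = total * bandwidth * math.sqrt(2.0 * math.pi)
    smoothed = []
    for i in range(len(profile)):
        acc = 0.0
        for j, weight in enumerate(profile):
            u = (i - j) / bandwidth
            acc += weight * math.exp(-0.5 * u * u)
        smoothed.append(acc / norm)
    return smoothed

def local_minima(values):
    """Indices strictly lower than both neighbours (assumed boundary rule)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]

# Synthetic 301-residue profile (not real data): two contact-dense
# regions separated by a sparse linker centred on residue 150.
profile = [1.0] * 301
for j in list(range(20, 121)) + list(range(180, 281)):
    profile[j] = 5.0

smoothed = kde_smooth(profile, bandwidth=40)
boundaries = local_minima(smoothed)
print(boundaries)
```

With this synthetic input, the smoothed curve has two broad peaks over the dense regions and a single valley in the linker, which is where the candidate boundary is reported. A larger bandwidth merges nearby peaks and yields fewer candidate boundaries; a smaller one yields more.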
Figure 2. Domain prediction accuracy using real contacts. The three methods using 3D data (Taylor, domain1.2, PDP) and the kernel-smoothing method (KDE) are plotted.
Figure 3. Domain prediction accuracy. The 10 methods are compared based on the mean NDO scores for the single-domain and multidomain targets from the combined CASP and Bourne datasets. Error bars indicate standard errors.
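As a small illustration of the quantities plotted in Figure 3, the sketch below computes a mean score and its standard error (sample standard deviation divided by the square root of the number of targets). The scores are hypothetical values, not data from the study, and `mean_and_standard_error` is an illustrative helper name.

```python
import math
import statistics

def mean_and_standard_error(scores):
    """Return (mean, standard error of the mean) for a list of scores."""
    mean = statistics.mean(scores)
    # Sample standard deviation (n - 1 denominator) over sqrt(n).
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, standard_error

# Hypothetical NDO scores for one method on five targets.
ndo_scores = [62.0, 71.5, 80.0, 55.5, 91.0]
mean, standard_error = mean_and_standard_error(ndo_scores)
print(f"mean = {mean:.2f}, SE = {standard_error:.2f}")
```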
Statistical Comparisons Between Methods (All Targets)
| | KDE | WT | ISLM | PDP | CUT | PRO | DLP | SCO | HOM | DGS |
|---|---|---|---|---|---|---|---|---|---|---|
| KDE | | 8.49 | 6.00 | 15.0 | 8.49 | 10.5 | 19.8 | 17.0 | 0.00 | 0.00 |
| WT | 3e–03 | | 0.00 | 5.22 | 0.00 | 0.00 | 14.8 | 9.20 | 0.00 | 0.00 |
| ISLM | 3e–06 | 1.00 | | 9.05 | 0.00 | 0.00 | 13.8 | 11.1 | 0.00 | 0.00 |
| PDP | 1e–14 | 2e–08 | 3e–05 | | −6.55 | −4.51 | 0.00 | 0.00 | −9.08 | −5.12 |
| CUT | 0.006 | 1.00 | 1.00 | 9e–05 | | 0.00 | 11.3 | 8.50 | 0.00 | 0.00 |
| PRO | 2e–04 | 1.00 | 1.00 | 2e–03 | 1.00 | | 9.27 | 0.00 | 0.00 | 0.00 |
| DLP | 3e–12 | 9e–06 | 3e–04 | 1.00 | 4e–04 | 0.04 | | 0.00 | −13.8 | −9.94 |
| SCO | 2e–11 | 7e–05 | 0.03 | 1.00 | 4e–03 | 0.20 | 1.00 | | −11.2 | −7.17 |
| HOM | 0.72 | 1.00 | 1.00 | 4e–07 | 1.00 | 1.00 | 1e–07 | 2e–07 | | 0.00 |
| DGS | 1.00 | 1.00 | 1.00 | 1e–07 | 1.00 | 1.00 | 0.01 | 2e–04 | 1.00 | |
NDO scores for all 368 targets were compared using paired Wilcoxon signed-rank tests. Entries below the diagonal show Bonferroni-corrected P-values (N = 45 tests). Entries above the diagonal show the mean difference between the two methods (row minus column). Cells representing significantly different methods (5% threshold) are colored red if the mean difference is positive, blue if negative. Key to methods: KDE: kernel smoothing, WT: DOM-parsing of preliminary structures, ISLM: Domain1.2 parsing of preliminary structures, PDP: PDP parsing of preliminary structures, CUT: DOMCUT, PRO: DomPRO, DLP: DLP-SVM, SCO: SCOOBY-DOmain, HOM: homology method, DGS: naive length-based predictor.
Statistical Comparisons Between Methods (Multidomain Targets)
| | KDE | WT | ISLM | PDP | CUT | PRO | DLP | SCO | HOM | DGS |
|---|---|---|---|---|---|---|---|---|---|---|
| KDE | | 0.00 | 0.00 | 22.1 | 13.7 | 16.0 | 0.00 | 12.4 | 0.00 | 19.7 |
| WT | 1.00 | | 0.00 | 18.5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 14.8 |
| ISLM | 1.00 | 1.00 | | 16.8 | 0.00 | 10.7 | 0.00 | 0.00 | 0.00 | 14.7 |
| PDP | 5e–12 | 7e–07 | 3e–06 | | 0.00 | 0.00 | −16.9 | 0.00 | −25.1 | 0.00 |
| CUT | 3e–03 | 1.00 | 0.540 | 0.07 | | 0.00 | 0.00 | 0.00 | −16.7 | 0.00 |
| PRO | 3e–06 | 0.09 | 8e–03 | 0.87 | 1.00 | | −10.9 | 0.00 | −19.0 | 0.00 |
| DLP | 1.00 | 1.00 | 1.00 | 3e–06 | 0.14 | 0.03 | | 0.00 | 0.00 | 14.8 |
| SCO | 0.03 | 1.00 | 1.00 | 0.06 | 1.00 | 1.00 | 0.12 | | −15.9 | 0.00 |
| HOM | 1.00 | 1.00 | 0.73 | 2e–11 | 2e–06 | 1e–04 | 0.43 | 2e–05 | | 23.2 |
| DGS | 3e–07 | 4e–03 | 3e–04 | 1.00 | 1.00 | 1.00 | 2e–04 | 1.00 | 2e–09 | |
NDO scores for the 165 multidomain targets were compared using paired Wilcoxon signed-rank tests. Entries below the diagonal show Bonferroni-corrected P-values (N = 45 tests). Entries above the diagonal show the mean difference between the two methods (row minus column). Cells representing significantly different methods (5% threshold) are colored red if the mean difference is positive, blue if negative. Key to methods: KDE: kernel smoothing, WT: DOM-parsing of preliminary structures, ISLM: Domain1.2 parsing of preliminary structures, PDP: PDP parsing of preliminary structures, CUT: DOMCUT, PRO: DomPRO, DLP: DLP-SVM, SCO: SCOOBY-DOmain, HOM: homology method, DGS: naive length-based predictor.
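The comparison scheme described in the table legends (paired tests over every pair of methods, Bonferroni correction, mean difference as row minus column) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the Wilcoxon P-value uses the large-sample normal approximation without tie or continuity correction, zero differences are dropped, and the method names and scores are hypothetical. Ten methods give 10 × 9 / 2 = 45 pairs, which matches N = 45 in the legends.

```python
import math
from itertools import combinations

def wilcoxon_signed_rank_p(x, y):
    """Two-sided P-value, normal approximation (assumed; no tie correction)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank |d| in ascending order, giving tied values their average rank.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))

def compare_methods(scores):
    """Map (row, column) to (Bonferroni-corrected P, mean of row - column)."""
    pairs = list(combinations(scores, 2))
    n_tests = len(pairs)
    results = {}
    for a, b in pairs:
        p = wilcoxon_signed_rank_p(scores[a], scores[b])
        mean_diff = sum(u - v for u, v in zip(scores[a], scores[b])) / len(scores[a])
        results[(a, b)] = (min(1.0, p * n_tests), mean_diff)
    return results

# Hypothetical per-target scores for three methods on eight targets.
scores = {
    "A": [80, 75, 90, 60, 85, 70, 95, 65],
    "B": [70, 60, 80, 55, 70, 50, 80, 45],
    "C": [72, 58, 83, 50, 74, 49, 78, 47],
}
results = compare_methods(scores)
for (row, column), (p_corrected, mean_diff) in results.items():
    print(row, column, f"P = {p_corrected:.3f}", f"diff = {mean_diff:.2f}")
```

The normal approximation is only reasonable for large samples such as the 368 and 165 targets reported here; the eight-target example is for illustration only. For real analyses an established statistics library would be the safer choice.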