Reda Rawi¹, Raghvendra Mall², Khalid Kunji², Mohammed El Anbari³, Michael Aupetit², Ehsan Ullah², Halima Bensmail².
Abstract
BACKGROUND: The post-genomic era, with its wealth of sequences, gave rise to a broad range of methods for detecting protein residue-residue contacts. Although coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, their predictions do not completely overlap. Hence, new approaches and improvements of existing methods are needed to drive further development and progress in the field. We present a new contact detecting method, COUSCOus, which combines the best-performing shrinkage approach, the empirical Bayes covariance estimator, with GLasso.
Keywords: GLasso; Residue-residue contact prediction; Shrinkage
Year: 2016 PMID: 27978812 PMCID: PMC5159955 DOI: 10.1186/s12859-016-1400-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
30]. The GLasso approach works well in this context, but the computational time required to reach convergence can be large in some cases, for example for protein families with a low number of sequences. As an alternative to the natural estimator S, several shrinkage estimators have been proposed in the literature [31, 32]. They take a weighted average of the sample covariance matrix S and a suitably chosen diagonal target matrix. Jones et al. applied a smoothed covariance estimator that shrinks the matrix towards such a target [16]. In this work, we applied the empirical Bayes estimator proposed by Haff [24]:
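The weighted-average idea behind these shrinkage estimators can be sketched in a few lines of Python with NumPy. This is a minimal sketch of generic linear shrinkage toward a diagonal target with a fixed, hypothetical intensity `lam`; it is not Haff's empirical Bayes estimator used in this work, and the name `shrink_covariance` is illustrative. It shows why shrinkage helps for families with few sequences: with fewer observations than variables, S is singular, whereas the shrunken estimate stays positive definite and can therefore be inverted (for example, before or instead of running GLasso).

```python
import numpy as np

def shrink_covariance(X, lam):
    """Generic linear shrinkage of the sample covariance toward a diagonal target.

    X   : (n, p) data matrix, n observations of p variables
    lam : shrinkage intensity in [0, 1]; fixed here (hypothetical value),
          whereas data-driven estimators choose it from the data
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    S = np.cov(X, rowvar=False, bias=True)  # natural (ML) estimator S
    T = np.diag(np.diag(S))                 # diagonal target: keeps the variances
    return lam * T + (1.0 - lam) * S        # weighted average of S and T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                 # fewer observations (5) than variables (8)
S = np.cov(X, rowvar=False, bias=True)
S_star = shrink_covariance(X, lam=0.3)

print(np.linalg.matrix_rank(S))             # at most n - 1 = 4, so S is singular
print(np.all(np.linalg.eigvalsh(S_star) > 0))  # shrunken estimate is positive definite
```

Because the target keeps the diagonal of S, the shrunken matrix has the same variances as S and only the off-diagonal covariances are pulled toward zero.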
Fig. 1 The top L/2 (in this case 62) long, medium and short contact predictions for the Immunoglobulin V-set domain family (PFAM ID: PF07686) obtained using PSICOV (a and b) and COUSCOus (c and d) and mapped to the myelin oligodendrocyte glycoprotein 3D crystal structure (PDB ID: 1PKO) (right panel). Correctly predicted contacts are shown in green and incorrect ones in red. Upper triangles of the contact maps display all the native C −C contacts (left panel). The lower triangles show contacts predicted by PSICOV (a and b) and COUSCOus (c and d)
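The top-L/k accuracies reported below can be illustrated with a short Python sketch: rank the scored residue pairs of one sequence-separation range and count how many of the L/k highest-scoring pairs are native contacts. The range boundaries follow the common CASP convention (short 6-11, medium 12-23, long ≥ 24 residues apart), which is an assumption since the excerpt does not define them; `sequence_range`, `top_accuracy` and the toy scores are hypothetical.

```python
# ASSUMED CASP-style sequence-separation ranges (not defined in the excerpt)
def sequence_range(i, j):
    sep = abs(i - j)
    if 6 <= sep < 12:
        return "short"
    if 12 <= sep < 24:
        return "medium"
    if sep >= 24:
        return "long"
    return None  # pairs closer than 6 residues are not scored


def top_accuracy(scores, native, L, k, category):
    """Fraction of the top-L/k highest-scoring pairs in `category` that are native contacts.

    scores : dict mapping residue pair (i, j) -> predicted coupling score
    native : set of residue pairs that are in contact in the structure
    L      : protein length; k : divisor (e.g. 2 for top-L/2)
    """
    candidates = [(pair, s) for pair, s in scores.items()
                  if sequence_range(*pair) == category]
    candidates.sort(key=lambda item: item[1], reverse=True)  # best scores first
    top = [pair for pair, _ in candidates[:L // k]]
    if not top:
        return 0.0
    return sum(pair in native for pair in top) / len(top)


# Hypothetical toy data for a 40-residue protein
scores = {(1, 30): 0.9, (2, 40): 0.8, (5, 35): 0.7, (3, 28): 0.6,
          (10, 38): 0.5, (1, 3): 0.95, (4, 15): 0.85}
native = {(1, 30), (5, 35), (10, 38)}
print(top_accuracy(scores, native, L=40, k=10, category="long"))  # 0.5
```

With L = 40 and k = 10, the four best long-range pairs are kept and two of them are native, giving an accuracy of 0.5; the very high-scoring pair (1, 3) is ignored because it is too close in sequence.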
Mean accuracies of COUSCOus vs. PSICOV on the PSICOV benchmark dataset
| | Accuracy top- | | | Accuracy top- | | | Accuracy top- | | |
|---|---|---|---|---|---|---|---|---|---|
| | Long | Medium | Short | Long | Medium | Short | Long | Medium | Short |
| PSICOV | 0.6724 | 0.5709 | 0.4876 | 0.5816 | 0.4401 | 0.3716 | 0.3016 | 0.1787 | 0.1589 |
| COUSCOus | | | | | | | | | |
Higher mean accuracies in bold
Mean X values of COUSCOus and PSICOV on the PSICOV benchmark dataset
| | Long | Medium | Short | Long | Medium | Short | Long | Medium | Short |
|---|---|---|---|---|---|---|---|---|---|
| PSICOV | 0.2694 | 0.2751 | 0.2239 | 0.2518 | 0.2564 | 0.2068 | 0.1930 | 0.1864 | 0.1422 |
| COUSCOus | | | | | | | | | |
Higher mean values in bold
Mean AUC values
| | Long | Medium | Short |
|---|---|---|---|
| PSICOV | 0.2150 | 0.2630 | 0.2715 |
| COUSCOus | | | |
Higher mean values in bold
Fig. 2 MCC distributions for the PSICOV benchmark proteins for long-, medium- and short-range contacts predicted by PSICOV and COUSCOus. Stars indicate statistical significance: ⋆ denotes P < 0.05 and ⋆⋆⋆ denotes P < 0.001
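The Matthews correlation coefficient (MCC) summarized in Fig. 2 can be computed from confusion-matrix counts over a set of candidate residue pairs. The Python sketch below is illustrative: which pairs count as "predicted" (for example, the top-L/k pairs of one range) and the candidate set are assumptions, and `confusion_counts`, `mcc` and the toy pairs are hypothetical. Returning 0.0 when the denominator vanishes is a common convention, not something the excerpt specifies.

```python
import math

def confusion_counts(predicted, native, candidates):
    """Confusion-matrix counts over a fixed set of candidate residue pairs.

    predicted, native and candidates are sets of (i, j) pairs;
    predicted and native are assumed to be subsets of candidates.
    """
    tp = len(predicted & native)   # predicted and in contact
    fp = len(predicted - native)   # predicted but not in contact
    fn = len(native - predicted)   # in contact but missed
    tn = len(candidates) - tp - fp - fn
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when undefined (common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical toy example: ten long-range candidate pairs
candidates = {(i, i + 30) for i in range(10)}
predicted = {(0, 30), (1, 31), (2, 32), (3, 33)}
native = {(0, 30), (1, 31), (5, 35)}
counts = confusion_counts(predicted, native, candidates)
print(counts)                   # (2, 2, 5, 1)
print(round(mcc(*counts), 4))   # 0.3563
```

Unlike top-L/k accuracy, MCC also rewards correctly rejected pairs (true negatives), so it penalizes both missed contacts and false predictions.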
Fig. 3 Dependence of the performance of PSICOV and COUSCOus on the effective number of sequences (Neff) in the MSAs. Performance is evaluated using the accuracies of the top L/10 long, medium and short contacts. The solid line represents the averaged accuracies of the test set binned into five categories of Neff (ln(Neff): [4, 5), [5, 6), [6, 7), [7, 8), [8, 10)). COUSCOus outperforms PSICOV regardless of ln(Neff) in the test set
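The binning behind the solid lines in Fig. 3 can be sketched in Python: each protein family is assigned to a half-open ln(Neff) interval taken from the caption, and accuracies are averaged within each interval. How Neff itself is computed is not shown here; the sketch assumes it is already known per family, and `neff_bin`, `binned_mean_accuracy` and the toy records are hypothetical.

```python
import math
from collections import defaultdict

# ln(Neff) categories from the Fig. 3 caption (half-open intervals)
BINS = [(4, 5), (5, 6), (6, 7), (7, 8), (8, 10)]

def neff_bin(neff):
    """Return the ln(Neff) interval containing neff, or None if outside all bins."""
    x = math.log(neff)
    for lo, hi in BINS:
        if lo <= x < hi:
            return (lo, hi)
    return None

def binned_mean_accuracy(records):
    """records: iterable of (neff, accuracy) pairs, one per protein family."""
    groups = defaultdict(list)
    for neff, acc in records:
        b = neff_bin(neff)
        if b is not None:
            groups[b].append(acc)
    # mean accuracy per occupied bin, ordered by bin
    return {b: sum(v) / len(v) for b, v in sorted(groups.items())}

# Hypothetical (Neff, top-L/10 accuracy) pairs
records = [(100, 0.2), (120, 0.4), (1000, 0.6), (5000, 0.9)]
print(binned_mean_accuracy(records))
```

Here ln(100) ≈ 4.61 and ln(120) ≈ 4.79 both fall into [4, 5), so their accuracies are averaged, while ln(1000) ≈ 6.91 and ln(5000) ≈ 8.52 land in [6, 7) and [8, 10).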
Mean accuracies of COUSCOus vs. PSICOV on the CASP11 benchmark dataset
| | Accuracy top- | | | Accuracy top- | | | Accuracy top- | | |
|---|---|---|---|---|---|---|---|---|---|
| | Long | Medium | Short | Long | Medium | Short | Long | Medium | Short |
| PSICOV | 0.6687 | 0.5809 | 0.5278 | 0.5872 | 0.4809 | 0.4229 | 0.3383 | 0.2373 | 0.1820 |
| COUSCOus | | | | | | | | | |
Higher mean accuracies in bold
Mean accuracies of the RF meta-classifier including COUSCOus or PSICOV as a feature on the PSICOV benchmark dataset
| | Accuracy top- | | | Accuracy top- | | | Accuracy top- | | |
|---|---|---|---|---|---|---|---|---|---|
| | Long | Medium | Short | Long | Medium | Short | Long | Medium | Short |
| RF-PSICOV | 0.7846 | 0.6946 | 0.6547 | 0.7047 | 0.5500 | 0.5140 | | 0.2439 | 0.2212 |
| RF-COUSCOus | | | | | | | 0.3984 | | |
Higher mean accuracies in bold