Nurit Haspel, Dong Luo, Eduardo González.
Abstract
BACKGROUND: Understanding protein structure and dynamics is essential for understanding protein function. This is a challenging task because the conformational landscapes of proteins are highly complex and their energy surfaces are rugged. In particular, it is important to detect highly populated regions, which could correspond to intermediate structures or local minima.
Keywords: Algebraic topology; Clustering; Dimensionality reduction; Protein conformational sampling; Protein structure
Year: 2017 PMID: 29244007 PMCID: PMC5731496 DOI: 10.1186/s12859-017-1918-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Isomap cluster analysis for Calmodulin, AdK and GroEL. The data is visualized in Fig. 3.
Calmodulin:
| Cluster No. | Size | RMSD to 1CTR (Å) | RMSD to 1CLL (Å) |
|---|---|---|---|
| 1 | 10 | 14.71±0.2 | 1.95±0.2 |
| 2 | 5 | 13.89±0.1 | 2.69±0.1 |
| 3 | 22 | 13.43±0.5 | 3.82±0.7 |
| 4 | 19 | 10.23±1.0 | 6.88±0.9 |
| 5 | 5 | 7.97±0.3 | 8.43±0.3 |
| 6 | 47 | 3.86±1.9 | 11.99±1.7 |

AdK:
| Cluster No. | Size | RMSD to 1AKE (Å) | RMSD to 4AKE (Å) |
|---|---|---|---|
| 1 | 15 | 6.49±0.2 | 1.99±0.2 |
| 2 | 19 | 6.04±0.2 | 2.85±0.3 |
| 3 | 6 | 4.96±0.3 | 3.73±0.1 |
| 4 | 8 | 4.65±0.5 | 3.84±0.2 |
| 5 | 10 | 3.56±0.2 | 4.42±0.2 |
| 6 | 33 | 2.91±0.6 | 5.91±0.6 |

GroEL:
| Cluster No. | Size | RMSD to 1SX4 (Å) | RMSD to 1SS8 (Å) |
|---|---|---|---|
| 1 | 11 | 11.56±0.2 | 2.50±0.6 |
| 2 | 13 | 11.22±0.2 | 3.95±0.5 |
| 3 | 20 | 10.40±0.5 | 5.25±0.5 |
| 4 | 8 | 8.34±0.5 | 6.58±0.4 |
| 5 | 29 | 5.46±1.2 | 8.85±0.7 |
| 6 | 17 | 2.39±0.9 | 11.35±0.6 |
The RMSD is measured from the geometric center of each cluster to each of the two end points. The clusters are numbered in order of their RMSD (in Å) from the end points.
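The measurement described in this note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the conformations are already superimposed (no alignment step is performed), it takes the geometric center to be the atom-wise mean of the cluster members, and the names `rmsd`, `geometric_center` and `rank_clusters` are hypothetical.

```python
import math

def rmsd(a, b):
    """RMSD between two equally sized lists of (x, y, z) points.
    Assumption: the structures are already superimposed."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))

def geometric_center(conformations):
    """Atom-wise mean of the conformations in one cluster (an assumption
    about what 'geometric center' means here)."""
    n = len(conformations)
    n_atoms = len(conformations[0])
    return [tuple(sum(c[i][k] for c in conformations) / n for k in range(3))
            for i in range(n_atoms)]

def rank_clusters(clusters, endpoint):
    """Return (RMSD, cluster index) pairs, sorted by the RMSD of each
    cluster center to the given end-point structure."""
    return sorted((rmsd(geometric_center(c), endpoint), i)
                  for i, c in enumerate(clusters))
```

Calling `rank_clusters(clusters, endpoint)` with a list of clusters, each a list of conformations, yields the ordering used to number the clusters in a table of this kind.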
Spherical-PCA cluster analysis for Calmodulin, AdK and GroEL
Calmodulin:
| Cluster No. | Size | RMSD to 1CLL (Å) | RMSD to 1CTR (Å) |
|---|---|---|---|
| 1 | 14 | 14.63±0.3 | 2.04±0.4 |
| 2 | 8 | 13.28±0.4 | 4.40±0.4 |
| 3 | 6 | 11.89±0.5 | 5.60±0.3 |
| 4 | 13 | 9.55±1.0 | 7.45±0.6 |
| 5 | 41 | 2.73±1.4 | 12.96±1.2 |

AdK:
| Cluster No. | Size | RMSD to 1AKE (Å) | RMSD to 4AKE (Å) |
|---|---|---|---|
| 1 | 26 | 6.28±0.3 | 2.41±0.5 |
| 2 | 47 | 2.56±1.4 | 5.55±1.2 |
| 3 | 5 | 2.36±0.2 | 6.00±0.1 |

GroEL:
| Cluster No. | Size | RMSD to 1SX4 (Å) | RMSD to 1SS8 (Å) |
|---|---|---|---|
| 1 | 25 | 11.42±0.3 | 3.17±0.8 |
| 2 | 14 | 10.20±0.4 | 5.31±0.5 |
| 3 | 39 | 3.74±1.4 | 10.24±1.2 |
The RMSD is measured as in Table 1.
Fig. 3 The hierarchical clustering structure for (a) Calmodulin (1CLL → 1CTR), (b) AdK (1AKE → 4AKE), (c) GroEL (1SS8 → 1SX4). Each plot shows the hierarchy generated by Isomap (left) and by Spherical PCA (right). The clusters are numbered by their RMSD from the end point.
Proteins used in this study. The PDB IDs of two known structures of each protein are listed.
| Protein | Calmodulin I | AdK | GroEL |
|---|---|---|---|
| No. Amino acids | 144 | 214 | 524 |
| Structure 1 | 1CLL | 1AKE | 1SS8 |
| Structure 2 | 1CTR | 4AKE | 1SX4 |
| RMSD between structures 1 and 2 (Å) | 14.84 | 7.13 | 12.21 |
| No. Clusters (Isomap) | 6 | 6 | 6 |
| No. Clusters (PCA) | 5 | 3 | 3 |
Fig. 1 An example of a barcode diagram. The points where three clusters merge into two, and where two merge into one, are marked by vertical bars.
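The merge points that such a barcode marks can be illustrated with a short sketch. It computes the distance scales at which connected components of a point set merge under single linkage, which coincides with the end points of the bars in a zero-dimensional Vietoris-Rips barcode. This is an assumption-laden illustration rather than the construction used in the paper, and the function name `merge_scales` is hypothetical.

```python
import math
from itertools import combinations

def merge_scales(points):
    """Distances at which connected components merge (single linkage).
    Returns len(points) - 1 values in ascending order."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise distances from shortest to longest.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    scales = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge joins two separate components
            parent[ri] = rj
            scales.append(d)
    return scales
```

With this output, `scales[-2]` is the scale at which three clusters merge into two and `scales[-1]` the scale at which two merge into one. A large gap between consecutive values suggests a cluster count that persists over a wide range of scales.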
Fig. 2 Representatives of the six cluster centers generated by Isomap. (a-f) Calmodulin (1CLL → 1CTR). The centers are sorted according to their RMSD from 1CTR. (g-l) AdK. The centers are sorted according to their RMSD from 1AKE. (m-r) GroEL. The centers are sorted according to their RMSD from 1SX4.
Fig. 4 The projection of the clusters for (a-b) Calmodulin (1CLL → 1CTR), (c-d) AdK (1AKE → 4AKE), (e-f) GroEL (1SS8 → 1SX4). In each pair, the left plot shows the hierarchical clustering and the right plot shows the clusters generated by k-means. In each case the Isomap projection along the first three reaction coordinates is used. Every cluster is depicted in a different color.
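For readers unfamiliar with k-means, the following is a minimal sketch of Lloyd's algorithm applied to low-dimensional points such as three-coordinate projections. It is not the authors' code: the initial centers are supplied by the caller (the initialization used in the study is not stated here), and the function name `kmeans` is hypothetical.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm. `centers` holds the initial guesses.
    Returns (labels, centers) once assignments stop changing."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(len(centers)),
                          key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return labels, centers
```

Unlike hierarchical clustering, this procedure needs the number of clusters up front, which is set by the length of `centers`.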
Comparison of clusters of AdK to known intermediates
| PDB | Closest Isomap cluster (RMSD) | Closest PCA cluster (RMSD) |
|---|---|---|
| 1E4Y | Clust 1 (1.7Å) | Clust 1 (2.1Å) |
| 1AK2 | Clust 2 (3.5Å) | Clust 1 (4.1Å) |
| 1DVR | Clust 2 (2.6Å) | Clust 1 (2.8Å) |
| 2RH5A | Clust 6 (2.2Å) | Clust 3 (2.5Å) |
| 2RH5B | Clust 6 (2.3Å) | Clust 3 (2.4Å) |
| 2RH5C | Clust 6 (3.0Å) | Clust 3 (3.1Å) |
For every known intermediate, the RMSD to the closest cluster is shown. The cluster numbers are as in Fig. 2.