Kazi Lutful Kabir, Liban Hassan, Zahra Rajabi, Nasrin Akhter, Amarda Shehu.
Abstract
Significant efforts in wet and dry laboratories are devoted to resolving molecular structures. In particular, computational methods can now compute thousands of tertiary structures that populate the structure space of a protein molecule of interest. These advances allow us to turn our attention to analysis methodologies that organize the computed structures and highlight functionally relevant structural states. In this paper, we propose a methodology that leverages community detection methods, originally designed to detect communities in social networks, to organize computationally probed protein structure spaces. We report a principled comparison of such methods along several metrics on proteins of diverse folds and lengths, and we present a rigorous evaluation in the context of decoy selection in template-free protein structure prediction. The results make the case that network-based community detection methods warrant further investigation to advance the analysis of protein structure spaces for automated selection of functionally relevant structures.
Keywords: community detection; decoy selection; nearest-neighbor graph; protein structure space; template-free protein structure prediction
Year: 2019 PMID: 30823390 PMCID: PMC6429114 DOI: 10.3390/molecules24050854
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
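The nearest-neighbor graph named in the keywords can be illustrated with a minimal sketch. It assumes a precomputed pairwise distance matrix (for decoys this would typically hold lRMSD values, but the toy example uses points on a line) and a neighbor count `k`. The function name `knn_graph` and the tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Set


def knn_graph(dist: List[List[float]], k: int, directed: bool = True) -> Dict[int, Set[int]]:
    """Connect every node to its k nearest neighbors.

    dist is a symmetric matrix of pairwise distances (an assumption of this
    sketch). Ties are broken by node index, which is an arbitrary choice.
    """
    n = len(dist)
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        for j in others[:k]:
            adj[i].add(j)          # edge i -> j
            if not directed:
                adj[j].add(i)      # make the edge mutual in the undirected variant
    return adj


# Toy data: four "decoys" placed on a line; distance is absolute difference.
coords = [0.0, 1.0, 1.5, 10.0]
dist = [[abs(a - b) for b in coords] for a in coords]

directed_graph = knn_graph(dist, k=1, directed=True)
undirected_graph = knn_graph(dist, k=1, directed=False)
print(directed_graph)
print(undirected_graph)
```

The directed variant keeps only each node's own neighbor choices, while the undirected variant makes every such choice mutual; this is one plausible reading of the directed and undirected nngraphs referred to in the captions.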
Column 2 shows the PDB ID of a known native structure for each test case. Columns 3 and 4 show the fold (* indicates native structures with a predominant fold and a short helix) and the length (number of amino acids), respectively. Column 5 shows the size of the decoy set generated via Rosetta, and column 6 shows the lowest lRMSD from the known native structure over the decoy ensemble.
| Difficulty | PDB ID | Fold | Length (# aas) | Decoy set size | min_dist (Å) |
|---|---|---|---|---|---|
| Easy | 1dtdb | | 61 | | |
| | 1tig | | 88 | | |
| | 1dtja | | 74 | | |
| Medium | 1hz6a | | 64 | | |
| | 1c8ca | | 64 | | |
| | 1bq9 | | 53 | | |
| | 1sap | | 66 | | |
| Hard | 2ezk | | 93 | | |
| | 1aoy | | 78 | | |
| | 1isua | | 62 | | |
Figure 1. Comparison of community detection methods (encoded by different colors) on directed nngraphs embedding each of the 10 decoy datasets along (a) modularity, (b) flake odf, (c) conductance, and (d) separability.
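Two of the metrics named in the Figure 1 caption can be made concrete with a small sketch. It assumes an undirected, unweighted graph given as an edge list, Newman's modularity, and conductance defined as cut edges divided by the community's volume (twice its internal edges plus its cut edges). These are common textbook definitions and may differ in detail from the directed-graph variants used for the figure.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def modularity(edges: List[Edge], communities: List[Set[int]]) -> float:
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    q = 0.0
    for comm in communities:
        internal = sum(1 for u, v in edges if u in comm and v in comm)
        # Sum of degrees of the community's nodes (each endpoint counts once).
        degree_sum = sum((u in comm) + (v in comm) for u, v in edges)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


def conductance(edges: List[Edge], comm: Set[int]) -> float:
    """Cut edges over community volume; lower means a better-separated community."""
    internal = sum(1 for u, v in edges if u in comm and v in comm)
    cut = sum(1 for u, v in edges if (u in comm) != (v in comm))
    volume = 2 * internal + cut
    return cut / volume if volume else 0.0


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = [{0, 1, 2}, {3, 4, 5}]

q = modularity(edges, partition)
phi = conductance(edges, partition[0])
print(q, phi)
```

On this toy graph the natural two-triangle partition yields a modularity of 5/14 and a conductance of 1/7 for each triangle.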
The s, n, and p values of the sets of communities selected by different selection strategies over communities identified by the Louvain algorithm on decoy datasets embedded as directed nngraphs. We refer to this setting as Louvain. Recall that s stands for size (number of decoys), and n and p are the two performance metrics described above (and in more detail in Section 4).
| Louvain | Sel-S | Sel-S+E | Sel-PR | Sel-PR+PC |
|---|---|---|---|---|
| | s, n, p (%) | s, n, p (%) | s, n, p (%) | s, n, p (%) |
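The triple s, n, p reported per community can be sketched as follows. The definitions used here are assumptions, since the section that defines them is not part of this text: a decoy counts as near-native when its lRMSD to the native structure is within a threshold, n is the percentage of all near-native decoys in the dataset that fall in the community, and p (purity) is the percentage of the community's decoys that are near-native. The names `community_metrics`, `lrmsd`, and `threshold` are hypothetical.

```python
from typing import Dict, Set, Tuple


def community_metrics(community: Set[str], lrmsd: Dict[str, float],
                      threshold: float) -> Tuple[int, float, float]:
    """Return (s, n, p) for one community under the assumed definitions.

    s: community size (number of decoys)
    n: % of the dataset's near-native decoys captured by the community (assumed)
    p: % of the community's decoys that are near-native, i.e. purity (assumed)
    """
    near_native = {d for d, r in lrmsd.items() if r <= threshold}
    hits = community & near_native
    s = len(community)
    n = 100.0 * len(hits) / len(near_native) if near_native else 0.0
    p = 100.0 * len(hits) / s if s else 0.0
    return s, n, p


# Toy data: lRMSD (in Å) of five hypothetical decoys from the native structure.
lrmsd = {"d1": 1.2, "d2": 1.8, "d3": 4.0, "d4": 1.5, "d5": 6.3}
s, n, p = community_metrics({"d1", "d2", "d3", "d5"}, lrmsd, threshold=2.0)
print(s, round(n, 2), p)
```

A large community can score well on n while scoring poorly on p, which is why the two percentages are worth reading together with s.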
The s, n, and p values of the communities selected by different selection strategies over communities identified by the Louvain algorithm on decoy datasets embedded as undirected nngraphs.
| Louvain | Sel-S | Sel-S+E | Sel-PR | Sel-PR+PC |
|---|---|---|---|---|
| | s, n, p (%) | s, n, p (%) | s, n, p (%) | s, n, p (%) |
The s, n, and p values of the communities selected by different selection strategies over communities identified by the Greedy Modularity Maximization (GMM) algorithm on decoy datasets embedded as undirected nngraphs.
| GMM | Sel-S | Sel-S+E | Sel-PR | Sel-PR+PC |
|---|---|---|---|---|
| | s, n, p (%) | s, n, p (%) | s, n, p (%) | s, n, p (%) |
Figure 2. Comparison of the various selection strategies on the purity of the top community selected over communities detected with (a) the Louvain method on directed nngraph embeddings of decoy data, (b) the Louvain method on undirected nngraph embeddings of decoy data, and (c) the GMM method on undirected nngraph embeddings of decoy data.
Figure 3. Comparison of community detection methods based on the quality of the top community selected by Sel-S and Sel-S+E. In the legend, Lo-D refers to the Louvain method applied to directed nngraphs that embed the decoy datasets. The -S and -S+E suffixes refer to the Sel-S and Sel-S+E selection strategies.
Figure 4. Comparison of community detection methods based on the quality of the top three communities selected by Sel-S and Sel-S+E. In the legend, Lo-D refers to the Louvain method applied to directed nngraphs that embed the decoy datasets. The -S and -S+E suffixes refer to the Sel-S and Sel-S+E selection strategies.
Rank (by size (S), size and energy (S+E), Pareto rank (PR), and Pareto rank and Pareto count (PR+PC)) of the community with the highest purity among those identified by Louvain on directed nngraphs, Louvain on undirected nngraphs, and GMM.
| | Rank by (Lo, directed) | Rank by (Lo, undirected) | Rank by (GMM) |
|---|---|---|---|
| | S, S+E, PR, PR+PC | S, S+E, PR, PR+PC | S, S+E, PR, PR+PC |
| | 3, 4, | | |
| | 691, | 229, | 283, |
| | 71, | | |
| | 647, | 280, | 337, |
| | 818, | 42, | 15, |
| | 1230, | 1223, | 1271, |
| | 3301, | 3298, | 3369, |
| | 6, | 3, | |
| | 3, | | |
| | 135, | 136, | 194, |
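The Pareto-based strategies (PR and PR+PC) can be sketched under stated assumptions. The sketch treats each community as a point with two objectives, larger size and lower energy, defines Pareto rank as the number of communities that dominate it, and Pareto count as the number it dominates. These definitions, the choice of objectives, and the names `Community`, `dominates`, and `pareto_order` are assumptions made for illustration, not details confirmed by this text.

```python
from typing import List, NamedTuple


class Community(NamedTuple):
    name: str
    size: int       # number of decoys; larger is assumed better
    energy: float   # representative energy; lower is assumed better


def dominates(a: Community, b: Community) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a.size >= b.size and a.energy <= b.energy
    strictly_better = a.size > b.size or a.energy < b.energy
    return no_worse and strictly_better


def pareto_order(comms: List[Community]) -> List[str]:
    """Order by Pareto rank (ascending), breaking ties by Pareto count (descending)."""
    scored = []
    for c in comms:
        rank = sum(dominates(o, c) for o in comms)    # how many dominate c
        count = sum(dominates(c, o) for o in comms)   # how many c dominates
        scored.append((rank, -count, c.name))
    return [name for _, _, name in sorted(scored)]


# Toy communities with hypothetical sizes and energies.
comms = [
    Community("A", 50, -120.0),
    Community("B", 30, -150.0),
    Community("C", 20, -100.0),
    Community("D", 50, -90.0),
]
order = pareto_order(comms)
print(order)
```

Here A and B are both non-dominated (rank 0); A is placed first because it dominates two communities while B dominates one, which shows how Pareto count can break ties left by Pareto rank alone.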
Comparison of Size + Energy (S+E) to other selection strategies on best rank via 1-sided Fisher’s and Barnard’s tests. Top panel evaluates the null hypothesis that Sel-S+E does not provide the best rank (based on reported p-values), considering each of the other three selection strategies in turn. Similarly, the lower panel evaluates the null hypothesis that Sel-S+E does not provide a better rank with respect to another particular selection strategy, considering each in turn.
|
| Test | | | |
|---|---|---|---|
| Fisher’s | 6.621 × 10 | 1.626 × 10 | 9.388 × 10 |
| Barnard’s | 2.314 × 10 | 6.33 × 10 | 2.128 × 10 |

| Test | | | |
|---|---|---|---|
| Fisher’s | 0.0001154 | 7.744 × 10 | 4.194 × 10 |
| Barnard’s | 6.738 × 10 | 3.811 × 10 | 8.075 × 10 |
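A one-sided Fisher's exact test of the kind named in the caption can be computed directly from the hypergeometric distribution. The sketch below assumes a 2×2 contingency table `[[a, b], [c, d]]` and tests whether the top-left count is larger than expected by chance. How the counts are derived from the rank data is not specified here, so the example tables are purely illustrative.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P-value of the one-sided (greater) Fisher's exact test on [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same margins
    whose top-left cell is at least a.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # tables that are impossible under the fixed margins contribute nothing.
    tail = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1))
    return tail / total


# Hypothetical counts: one strategy succeeds on 9 of 10 cases, another on 2 of 10.
p_value = fisher_one_sided(9, 1, 2, 8)
print(p_value)

# Classic small example with a 3/1 versus 1/3 split.
p_small = fisher_one_sided(3, 1, 1, 3)
print(p_small)
```

With only ten cases per arm the test remains exact, which is one reason exact tests such as Fisher's and Barnard's suit small benchmark sets better than asymptotic alternatives.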
Comparison of Size + Energy to other selection strategies on best rank via 2-sided Fisher’s and Barnard’s tests. The tests evaluate the null hypothesis (based on reported p-values) that Sel-S+E (or, Size+Energy) provides similar ranking in comparison to other selection strategies.
|
| Test | | | |
|---|---|---|---|
| Fisher’s | 1.324 × 10 | 3.252 × 10 | 1.878 × 10 |
| Barnard’s | 4.629 × 10 | 1.266 × 10 | 4.255 × 10 |

| Test | | | |
|---|---|---|---|
| Fisher’s | 0.000231 | 1.549 × 10 | 8.388 × 10 |
| Barnard’s | 0.0001348 | 7.621 × 10 | 1.615 × 10 |
Figure 5. Purity for k ranging from 10 to 50.
Entropy values for the Louvain and GMM methods.
| Entropy | Entropy | Entropy | |
|---|---|---|---|
|
| 2.054332 |
| 2.007146 |
|
| 5.571811 |
| 5.660448 |
|
| 3.679291 |
| 3.670991 |
|
| 4.2847 |
| 4.400298 |
|
| 3.323009 |
| 3.480973 |
|
| 4.920814 |
| 4.92005 |
|
| 0.961949 |
| 0.914548 |
|
| 1.222933 |
| 1.065169 |
|
| 0.905185 |
| 0.873628 |
|
| 1.725124 |
| 1.680263 |