Michele Seeber, Angelo Felline, Francesco Raimondi, Stefanie Muff, Ran Friedman, Francesco Rao, Amedeo Caflisch, Francesca Fanelli.
Abstract
Wordom is a versatile, user-friendly, and efficient program for manipulation and analysis of molecular structures and dynamics. The following new analysis modules have been added since the publication of the original Wordom paper in 2007: assignment of secondary structure, calculation of solvent accessible surfaces, elastic network model, motion cross correlations, protein structure network, shortest intra-molecular and inter-molecular communication paths, kinetic grouping analysis, and calculation of mincut-based free energy profiles. In addition, an interface with the Python scripting language has been built, and the overall performance and user accessibility have been enhanced. The source code of Wordom (in the C programming language) as well as documentation for usage and further development are available as an open source package under the GNU General Public License from http://wordom.sf.net.
Year: 2010 PMID: 21387345 PMCID: PMC3151548 DOI: 10.1002/jcc.21688
Source DB: PubMed Journal: J Comput Chem ISSN: 0192-8651 Impact factor: 3.376
New Features in Wordom Since the Original Publication (ref. 1)
| Module | Label | Function | Reference |
|---|---|---|---|
| Secondary Structure Assignment | SSA | Assignment of secondary structure based on geometric criteria | |
| Molecular Surface | SURF | Calculation of solvent accessible, solvent excluding and van der Waals surfaces; surface correlation along a trajectory | |
| Elastic Network Model | ENM | Calculation of elastic network models on a protein structure | |
| Cross Correlation | CORR | Correlations of atomic displacements along a trajectory | |
| Protein Structure Network | PSN | Calculation of network of amino acid interactions | |
| PSN Path | PSN-path | Path calculation within protein structure network | |
| Clustering | CLUS | Clustering according to conformation similarity | |
| Cut-based Free Energy Profile | cFEP | Computation of a one-dimensional free energy profile that preserves barriers between free energy basins | |
| Kinetic Grouping Analysis | KGA | Determination of free energy basins based on kinetic behavior | |
Label: abbreviation/acronym used in the text.
Comparison Between the Secondary Structure Assignments Made by Wordom (SSA Module, DLIKE Option) and Those Made by the DSSP Program
| Wordom/SSA (rows) vs. DSSP (columns) | E | B | T | S | L | H | G | I | Total |
|---|---|---|---|---|---|---|---|---|---|
| E | 2103 | 31 | 18 | 21 | 85 | 2 | 0 | 0 | 2260 |
| B | 11 | 58 | 6 | 6 | 38 | 1 | 7 | 0 | 127 |
| T | 16 | 1 | 638 | 51 | 32 | 6 | 3 | 0 | 747 |
| S | 12 | 0 | 5 | 656 | 13 | 0 | 2 | 0 | 688 |
| L | 44 | 3 | 9 | 6 | 1351 | 3 | 0 | 0 | 1416 |
| H | 0 | 1 | 11 | 2 | 7 | 951 | 17 | 0 | 989 |
| G | 1 | 0 | 39 | 0 | 3 | 1 | 163 | 0 | 207 |
| I | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 |
| Total | 2187 | 94 | 728 | 742 | 1529 | 964 | 192 | 0 | 6436 |
The test set consists of 29 proteins (2CCY, 1ECA, 2IFO, 1TPM, 1HRE, 1PHT, 2POR, 3BCL, 2HLA, 1CDQ, 1AFC, 1MSA, 1VMO, 1HXN, 1NSC, 2BBK, 3AAH, 1TSP, 2PEC, 1PPK, 1STD, 4TIM, 1BRS, 1NTR, 1PYA, 2DNJ, 1PLQ, 1BNH, and 1PYP) selected as representatives of common folds (ref. 50). Results were pooled for each program and compared. Each element ij of the matrix reports the number of residues assigned to conformation i by Wordom and to conformation j by DSSP.
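The matrix above can be summarized as an overall and a per-class agreement. The following is a minimal sketch that only re-uses the counts transcribed from the table; the function names are illustrative and not part of Wordom or DSSP.

```python
# Confusion matrix transcribed from the table above.
# Rows: Wordom/SSA assignment; columns: DSSP assignment.
CLASSES = ["E", "B", "T", "S", "L", "H", "G", "I"]
COUNTS = [
    [2103, 31, 18, 21, 85, 2, 0, 0],
    [11, 58, 6, 6, 38, 1, 7, 0],
    [16, 1, 638, 51, 32, 6, 3, 0],
    [12, 0, 5, 656, 13, 0, 2, 0],
    [44, 3, 9, 6, 1351, 3, 0, 0],
    [0, 1, 11, 2, 7, 951, 17, 0],
    [1, 0, 39, 0, 3, 1, 163, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
]


def overall_agreement(matrix):
    """Return (identically assigned residues, all residues, fraction)."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    return agree, total, agree / total


def per_class_agreement(matrix, labels):
    """Fraction of each DSSP class (column) that Wordom assigns identically.

    Classes that DSSP never assigns in the test set map to None.
    """
    result = {}
    for j, label in enumerate(labels):
        column_total = sum(row[j] for row in matrix)
        result[label] = matrix[j][j] / column_total if column_total else None
    return result


agree, total, fraction = overall_agreement(COUNTS)
print(f"{agree} of {total} residues agree ({fraction:.1%})")
for label, value in per_class_agreement(COUNTS, CLASSES).items():
    print(label, "n/a" if value is None else f"{value:.1%}")
```

With the tabulated counts, the diagonal sums to 5920 of 6436 residues, i.e., about 92% identical assignments.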
Speed (in Seconds) Comparison of Secondary Structure Computations
| # Residues | # Frames | DSSP (DCD) | DSSP (PDBs) | Wordom (SSA) |
|---|---|---|---|---|
| 316 | 10,000 | 1460 | 920 | 640 |
| 16 | 10,000 | 238 | 155 | 0.35 |
DSSP (DCD): a script extracted each frame by means of Wordom and called DSSP on the extracted frame.
DSSP (PDBs): a script called DSSP on the already-extracted frames.
Wordom (SSA): calculation through the Wordom SSA module.
Computing Time for Different Modules
| Module | # Selected atoms | Approximate CPU time (s) |
|---|---|---|
| Surface (Wordom ARVO) | 115 | 2980 |
| Surface (ARVO) | 115 | 3690 |
| Surface (Wordom GEPOL–ASURF) | 115 | 2130 |
| Surface (GEPOL–ASURF) | 115 | 2660 |
| Surface (Wordom GEPOL–ESURF) | 115 | 5900 |
| Surface (GEPOL–ESURF) | 115 | 7290 |
| Surface (Wordom GEPOL–WSURF) | 115 | 1890 |
| Surface (GEPOL–WSURF) | 115 | 1970 |
| Correlation (DCC) | 360 | 4 |
| Correlation (LMI) | 360 | 63 |
| PSN | 2593 | 391 |
| PSN-path | – | 15 per pair |
| Clustering (distances only) | 316 | 1461 |
| Clustering (QT-like) | 316 | 100 |
| Clustering (hiero) | 316 | >50,000 |
| Clustering (leader, RMSD) | 316 | 10 |
| Clustering (leader, RMSD, non-Markovian) | 316 | 10 |
| Clustering (leader, DRMS) | 316 | 45 |
The considered system is a 10,000 frame trajectory of the GTP-bound Gα subunit (PDB: 1CIP; 2593 atoms; 316 residues and 1 GTP molecule (44 atoms)).
CPU time (seconds) on an AMD Athlon 64 3000+, 2 GHz, 2 GB RAM.
Solvent accessible surface area computed by the Wordom implementation of the ARVO algorithm.
The selection consisted of GTP and the first 9 residues (selection /*/@(1–10)/*).
Solvent accessible surface area computed by the ARVO program.
Solvent accessible surface area computed by the Wordom implementation of the GEPOL algorithm (highest accuracy).
Solvent accessible surface area computed by the GEPOL program (highest accuracy).
Solvent excluded surface area computed by the Wordom implementation of the GEPOL algorithm; accuracy settings: rmin 0.5, ofac 0.8, ndiv 5.
Solvent excluded surface area computed by the GEPOL program; accuracy settings: rmin 0.5, ofac 0.8, ndiv 5.
van der Waals surface area computed by the Wordom implementation of the GEPOL algorithm; highest accuracy.
van der Waals surface area computed by the GEPOL program; highest accuracy.
Residue-residue correlation by means of the dynamic cross correlation method; masses were not taken into account.
The selection consisted of all Cα atoms and GTP.
Residue-residue correlation by means of the linear mutual information method; masses were not taken into account.
PSN analysis probing 11 different Imin values (from 0.0 to 5.0 with a 0.5 step).
Only the RMSD-based distance matrix was computed at this stage and written to file.
All Cα atoms were selected.
Clustering by the QT-like algorithm, using a precalculated distance matrix (RMSD cutoff 1.0 Å).
Clustering by the hierarchical algorithm, using a pre-calculated distance matrix (RMSD cutoff 1.0 Å).
Clustering by the leader-like algorithm (RMSD cutoff 1.0 Å); distance matrix is not necessary.
Clustering by the leader-like algorithm (RMSD cutoff 1.0 Å) with the non-Markovian option turned on. In this case, the bottleneck is disk speed (CPU usage 18%).
Clustering by the leader-like algorithm (DRMS cutoff 1.0 Å).
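The notes above state that leader-like clustering needs no precomputed distance matrix. The sketch below shows why under the generic definition of leader clustering: each frame is compared only with the current cluster leaders. It is not taken from the Wordom source; `leader_cluster`, `rmsd`, and the toy frames are illustrative, and the RMSD here is computed without structural superposition for brevity.

```python
import math


def rmsd(frame_a, frame_b):
    """Plain coordinate RMSD between two equally sized frames.

    Assumption: frames are already superimposed (no fitting is done here).
    """
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(frame_a, frame_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(frame_a))


def leader_cluster(frames, cutoff, distance=rmsd):
    """Assign each frame to the first leader within `cutoff`.

    A frame that is farther than `cutoff` from every existing leader
    becomes the leader of a new cluster.
    """
    leaders = []      # index of the frame that founded each cluster
    assignment = []   # cluster id of every frame, in trajectory order
    for index, frame in enumerate(frames):
        for cluster_id, leader_index in enumerate(leaders):
            if distance(frame, frames[leader_index]) <= cutoff:
                assignment.append(cluster_id)
                break
        else:
            leaders.append(index)
            assignment.append(len(leaders) - 1)
    return leaders, assignment


# Toy trajectory: one atom per frame, coordinates in Å.
frames = [
    [(0.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0)],
    [(3.0, 0.0, 0.0)],
    [(3.2, 0.0, 0.0)],
    [(0.9, 0.0, 0.0)],
]
leaders, assignment = leader_cluster(frames, cutoff=1.0)
print(leaders, assignment)
```

The number of distance evaluations is bounded by the number of frames times the number of clusters, rather than by the square of the number of frames. This is consistent with the large gap between the leader-like and hierarchical timings in the table.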
Figure 1. Application of the SPM (within the ENM module) to the GTP-bound Gα-subunit (PDB: 1CIP). Each Cα-atom is colored according to the response to the perturbation of the 1st normal mode. Coloring from red to blue indicates maximum (100%) and minimum (0%) perturbations, respectively. Arrows point in the direction of the 1st normal mode.
Figure 2. Cross-correlation matrix of the atomic fluctuations of the Gα-subunit Cα-atoms and the geometrical center of GTP. The regions below and above the matrix main diagonal concern the DCC and LMI correlation methods, respectively. DCC correlation values go from −1.0 (fully anti-correlated motions) to 1.0 (fully correlated motions), whereas LMI correlation values go from 0.0 (fully uncorrelated motions) to 1.0 (fully correlated motions).
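For orientation, the DCC range of −1.0 to 1.0 follows from the commonly used definition of dynamic cross correlation as a normalized covariance of atomic displacements, C_ij = ⟨Δr_i · Δr_j⟩ / sqrt(⟨Δr_i²⟩⟨Δr_j²⟩). The sketch below implements that textbook definition without mass weighting; whether the CORR module matches it in every detail is an assumption, and `dcc_matrix` and the toy trajectory are illustrative. Frames are assumed to be already superimposed, and every atom is assumed to move (otherwise the denominator is zero).

```python
import math


def dcc_matrix(trajectory):
    """Dynamic cross correlation of atomic displacements.

    trajectory[f][i] is the (x, y, z) position of atom i in frame f.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    # Average position of every atom over the trajectory.
    mean = [
        [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        for i in range(n_atoms)
    ]
    # Displacement of every atom from its average position, per frame.
    displacements = [
        [[frame[i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
        for frame in trajectory
    ]
    # Time-averaged dot products of the displacement vectors.
    covariance = [[0.0] * n_atoms for _ in range(n_atoms)]
    for frame in displacements:
        for i in range(n_atoms):
            for j in range(n_atoms):
                dot = sum(frame[i][k] * frame[j][k] for k in range(3))
                covariance[i][j] += dot / n_frames
    # Normalize by the fluctuation amplitudes of both atoms.
    return [
        [
            covariance[i][j] / math.sqrt(covariance[i][i] * covariance[j][j])
            for j in range(n_atoms)
        ]
        for i in range(n_atoms)
    ]


# Toy trajectory: atoms 0 and 1 move in phase, atom 2 moves in anti-phase.
trajectory = [
    [(t, 0.0, 0.0), (5.0 + 2.0 * t, 0.0, 0.0), (-t, 1.0, 0.0)]
    for t in (-1.0, 0.0, 1.0)
]
for row in dcc_matrix(trajectory):
    print([round(value, 3) for value in row])
```

In the toy trajectory, atoms 0 and 1 give a correlation of 1 even though their amplitudes differ, and atoms 0 and 2 give −1, which mirrors the two extremes described in the caption.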
Figure 3. Hub correlation analysis on a 10 ns MD trajectory of GTP-bound Gα-subunit. Each dot corresponds to two amino acids that show a correlated tendency to behave as hubs (i.e., that are synchronized in their hub behavior in more than 50% of the trajectory frames). An Imin of 3.0% was employed for the PSN analysis.
Figure 4. Results of PSN and PATH analyses on a 10 ns MD trajectory of GTP-bound Gα-subunit. (A) Cα-atoms of the 27 stable hub residues, at Imin = 3.0%, are represented as cyan spheres. The GTP molecule, which is a stable hub as well, is shown as a red sphere centered on the C4′ ribose atom. Nodes are considered stable hubs if they are involved in at least four connectivities at a given Imin (3.0% in this case) in more than 50% of the trajectory frames. (B) The 90 nodes that contribute to the largest cluster at Imin = 3.0% are shown as green spheres centered on the Cα-atoms. The GTP molecule, which also participates in this cluster, is shown as a red sphere centered on the C4′ ribose atom. (C) Representation of the most frequent shortest communication path (i.e., frequency = 46%). The amino acids that participate in the path are shown as magenta spheres centered on the Cα-atoms, whereas GTP, which participates in the path as well, is shown as a red sphere centered on the C4′ ribose atom. The two apical residues in this path are A152 and I222, located, respectively, in the α-helical and Ras-like domains.
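The stable-hub criterion in panel (A) can be made concrete with a short sketch. It assumes that the per-frame links have already been determined at the chosen Imin, so the interaction-strength calculation of the PSN module is not reproduced here; `stable_hubs` and the node labels R1 to R6 are hypothetical.

```python
def stable_hubs(frames_edges, min_links=4, min_fraction=0.5):
    """Return nodes that are hubs in more than `min_fraction` of the frames.

    frames_edges: one list of (node_a, node_b) links per trajectory frame,
    assumed to be already filtered at the chosen Imin.
    A node is a hub in a frame if it has at least `min_links` links.
    """
    hub_counts = {}
    for edges in frames_edges:
        degree = {}
        for node_a, node_b in edges:
            degree[node_a] = degree.get(node_a, 0) + 1
            degree[node_b] = degree.get(node_b, 0) + 1
        for node, links in degree.items():
            if links >= min_links:
                hub_counts[node] = hub_counts.get(node, 0) + 1
    n_frames = len(frames_edges)
    return sorted(
        node for node, count in hub_counts.items()
        if count / n_frames > min_fraction
    )


# Hypothetical three-frame example.
frame_a = [("R1", "R2"), ("R1", "R3"), ("R1", "R4"), ("R1", "R5")]
frame_b = frame_a + [("R2", "R3")]
frame_c = [("R6", "R2"), ("R6", "R3"), ("R6", "R4"), ("R6", "R5"), ("R1", "R2")]
print(stable_hubs([frame_a, frame_b, frame_c]))
```

Here R1 is a hub in two of three frames and is reported as stable, whereas R6 is a hub in only one frame and is not.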
Figure 5. Complex network analysis of free energy landscapes. (A) Conformation space network. Nodes and links are protein conformations (i.e., microstates, see main text) and direct transitions sampled during the MD simulation, respectively. Node size is proportional to the population of the microstate, whereas link width is proportional to the transition frequency, i.e., larger link widths indicate more frequent transitions. Densely connected regions of the network represent rapidly interconverting microstates that belong to the same free energy basin (highlighted by a shaded circle). (B) Simplified example of a two-state system. The free energy barrier between the two macro-states is represented by a region of minimum flow in the network (identified by a minimum-cut). (C) Cut-based free energy profile (cFEP). The free energy is projected onto the partition function-based reaction coordinate Z, a projection that preserves the barriers as it takes into account all possible pathways to a reference microstate (ref. 46). The solid vertical line indicates the correspondence between the minimum-cut and the highest free energy barrier.
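Panel (A) describes a network whose node sizes are microstate populations and whose link widths are transition counts. Given a trajectory that has already been mapped to microstate labels (for example by clustering), both quantities are simple counts. The sketch below is illustrative only: `conformation_space_network` is not a Wordom function, and dropping self-transitions is an assumption made here for drawing purposes.

```python
from collections import Counter


def conformation_space_network(microstates):
    """Build node populations and directed link weights from a label series.

    microstates: sequence of microstate labels, one per saved frame.
    Returns (populations, transitions), where transitions counts direct
    jumps between different microstates in consecutive frames.
    """
    populations = Counter(microstates)
    transitions = Counter(
        (source, target)
        for source, target in zip(microstates, microstates[1:])
        if source != target  # assumption: self-transitions are not drawn as links
    )
    return populations, transitions


# Hypothetical microstate trajectory.
labels = ["a", "a", "b", "a", "c", "c", "a"]
populations, transitions = conformation_space_network(labels)
print(populations)
print(transitions)
```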
Figure 6. Network description of MD and evaluation of kinetic distance. The high-dimensional free-energy surface is coarse-grained into nodes of a network. The figure shows a schematic illustration of the transition network of a β-sheet peptide where nodes represent microstates and links represent direct transitions sampled along the MD simulation(s). The size of the nodes and links is proportional to the statistical weight of the microstates and number of transitions, respectively. The cFEP method implemented in Wordom requires a reference microstate. In this simplified illustration, the reference microstate is the large red sphere in the center of the folded state (which is the β-sheet structure, i.e., the basin on the left). The kinetic distance of each node from the reference microstate can be evaluated in Wordom by the folding probability (pfold) or the mean first passage time (mfpt). The kinetic distance is rendered by the continuous coloring from red (folded, i.e., pfold = 1 or mfpt = 0) to blue (unfolded, i.e., pfold = 0 or mfpt = infinity).
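One way to picture the mfpt as a kinetic distance is to treat the transition network as a Markov chain and solve mfpt(reference) = 0, mfpt(i) = 1 + Σ_j P_ij · mfpt(j) for all other nodes. The sketch below does this by simple iteration on a three-state toy network. It is a generic illustration, not the procedure implemented in Wordom; the function name, the state labels, and the counts are hypothetical, and times are in units of the frame-saving interval. It assumes every non-reference state has outgoing transitions and can reach the reference.

```python
def mean_first_passage_times(transition_counts, reference, sweeps=10000,
                             tolerance=1e-12):
    """Mean first passage time of every state to `reference`.

    transition_counts[source][target] is the number of observed jumps.
    Counts are row-normalized into probabilities, then the linear system
    is solved by Gauss-Seidel style sweeps.
    """
    probabilities = {}
    states = {reference}
    for source, row in transition_counts.items():
        total = sum(row.values())
        probabilities[source] = {t: c / total for t, c in row.items()}
        states.add(source)
        states.update(row)
    mfpt = {state: 0.0 for state in states}
    for _ in range(sweeps):
        largest_change = 0.0
        for state, row in probabilities.items():
            if state == reference:
                continue  # the reference is at kinetic distance zero
            updated = 1.0 + sum(p * mfpt[target] for target, p in row.items())
            largest_change = max(largest_change, abs(updated - mfpt[state]))
            mfpt[state] = updated
        if largest_change < tolerance:
            break
    return mfpt


# Hypothetical three-state network: folded <-> intermediate <-> unfolded.
counts = {
    "folded": {"intermediate": 1},
    "intermediate": {"folded": 5, "unfolded": 5},
    "unfolded": {"intermediate": 10},
}
times = mean_first_passage_times(counts, reference="folded")
print({state: round(value, 6) for state, value in sorted(times.items())})
```

In this toy network the intermediate state is 3 steps and the unfolded state 4 steps from the folded reference on average, which corresponds to the red-to-blue ordering described in the caption.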