CARON--average RMSD of NMR structure ensembles.

Kresimir Sikic, Oliviero Carugo.

Abstract

NMR protein structures are often deposited in the Protein Data Bank as ensembles of models that agree with the experimental restraints. Information about stereochemical variability and molecular flexibility can be obtained by systematic comparison of all models. Here we describe CARON, a program that computes the root-mean-square distances between equivalent atoms and residues in all models and writes these values into the occupancy and B-factor fields of PDB-formatted files. This tool allows the user both to obtain a quantitative estimate of the conformational homogeneity of the models and to exploit this information in common computer graphics programs.

Keywords:  Bioinformatics software; CARON; NMR spectroscopy; Protein Data Bank; Root-Mean-Square Deviation; conformational homogeneity; superposition

Year:  2009        PMID: 20198187      PMCID: PMC2828896          DOI: 10.6026/97320630004132

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Although a number of protein three-dimensional structures are determined with NMR spectroscopic methods, the format of the Protein Data Bank (PDB) files [1, 2] was designed to account for crystallographic analyses. The crystallographic occupancy and the atomic displacement parameters (adp; often referred to as B-factors) are not computed in NMR structure determinations, although they are declared in PDB-formatted files. Here we present a computer program that replaces the occupancy and adp fields with quantities that describe the conformational homogeneity of the models deposited in the PDB file. NMR protein structures are deposited in the PDB as ensembles of models. If there are N models, it is thus possible to superpose the N(N-1)/2 unique pairs of models and to compute the distances Di between the same atoms in all the superposed model pairs. These distances are then used for calculating the RMSD of each atom and the average RMSD between the equivalent atoms of the same residue in all the superposed models. These two variables provide a quantitative measure of the spatial dispersion of each atom and residue. By inserting the atom RMSD and the average residue RMSD values in the occupancy and B-factor fields, it is then possible to provide the quantitative information in a format compatible with molecular graphics programs and, for example, to color the molecule accordingly. These computations can be performed with CARON, a program that uses the Ying-Hunk et al. [3] implementation of the Kabsch algorithm [4] for superposing pairs of molecules.
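As a rough illustration of the procedure described above, the following Python sketch superposes every unique pair of models with a NumPy-based Kabsch fit and accumulates the squared distances per atom. It is not CARON's C code. The function names (kabsch_fit, per_atom_rmsd, per_residue_rmsd) are hypothetical, the atom RMSD is assumed to be sqrt(sum of Di^2 / M) over the M = N(N-1)/2 model pairs, and the residue value is assumed to be the plain mean of the atom RMSDs within that residue.

```python
from itertools import combinations

import numpy as np


def kabsch_fit(mobile, target):
    """Return `mobile` rigidly superposed onto `target` (both shaped (n_atoms, 3))."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    return p @ rotation.T + target_center


def per_atom_rmsd(models):
    """RMSD of each atom over all unique superposed model pairs.

    `models` is array-like with shape (n_models, n_atoms, 3); atoms must be
    listed in the same order in every model (an assumption of this sketch).
    """
    models = np.asarray(models, dtype=float)
    squared_sum = np.zeros(models.shape[1])
    n_pairs = 0
    for j, k in combinations(range(len(models)), 2):
        fitted = kabsch_fit(models[j], models[k])
        squared_sum += ((fitted - models[k]) ** 2).sum(axis=1)
        n_pairs += 1
    return np.sqrt(squared_sum / n_pairs)


def per_residue_rmsd(atom_rmsd, residue_ids):
    """Average the atom RMSD values that share the same residue identifier."""
    grouped = {}
    for value, residue in zip(atom_rmsd, residue_ids):
        grouped.setdefault(residue, []).append(value)
    return {residue: float(np.mean(values)) for residue, values in grouped.items()}
```

With N models the loop runs N(N-1)/2 times, so an ensemble of 10 models requires 45 superpositions. Restricting the coordinate arrays to the Cα atoms before calling per_atom_rmsd would correspond to a Cα-only superposition.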

Implementation

CARON is a stand-alone program written in C and can be compiled with any standard ANSI C compiler under Linux or Windows. The input files are PDB files containing atomic coordinates. The output data are stored in PDB-formatted files in which the occupancy and the B-factor fields of the ATOM lines report the average RMSD of the atom and of the residue, respectively, computed on the basis of the superposition of all the unique pairs of molecular models. Although the PDB files of NMR structures contain hydrogen atoms and often also non-protein atoms, these are disregarded by CARON and are absent from the PDB-formatted output files. The user may decide to superpose all the non-hydrogen protein atoms or only the Cα atoms. At the beginning of the PDB-formatted output file, pertinent information is provided about the overall RMSD values and their distribution. An additional parse feature enables the user to split a PDB file containing N models into N PDB files, each containing one of the models.
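The file handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CARON's implementation: the function names (is_hydrogen, split_models, annotate_atom_line) are hypothetical, and the fallback that recognizes hydrogens by atom name when the element columns are empty is a heuristic chosen for this sketch. The column positions follow the fixed-width PDB ATOM record: occupancy in columns 55-60, B-factor in columns 61-66, element symbol in columns 77-78.

```python
def is_hydrogen(line):
    """Guess whether an ATOM line describes a hydrogen (or deuterium) atom."""
    element = line[76:78].strip() if len(line) >= 78 else ""
    if element:
        return element.upper() in ("H", "D")
    # Fallback heuristic (assumption): the atom name, minus leading digits, starts with H.
    return line[12:16].strip().lstrip("0123456789").startswith("H")


def split_models(pdb_text):
    """Split PDB text into one list of non-hydrogen ATOM lines per model.

    HETATM and all other record types are dropped, mirroring the statement
    that non-protein atoms are absent from the output.
    """
    models, current = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            models.append(current)
            current = []
        elif record == "ATOM" and not is_hydrogen(line):
            current.append(line)
    if current:  # file without MODEL/ENDMDL records: treat as a single model
        models.append(current)
    return models


def annotate_atom_line(line, atom_rmsd, residue_rmsd):
    """Return the ATOM line with occupancy and B-factor replaced by RMSD values."""
    padded = line.rstrip("\n").ljust(80)
    return f"{padded[:54]}{atom_rmsd:6.2f}{residue_rmsd:6.2f}{padded[66:]}".rstrip()
```

Each list returned by split_models could be written to its own file to reproduce the one-model-per-file output, and annotate_atom_line leaves the coordinates and atom identifiers untouched, so the result remains readable by standard graphics programs.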

Results

An example of the results obtained with CARON is shown in Figure 1. The three-dimensional structure of the human epidermal growth factor-like domain of heregulin-alpha was determined with NMR methods and deposited in the PDB file 1HRF as an ensemble of 10 models [5]. It is apparent that the termini are conformationally ill-defined, as is the loop in the top-right corner (Figure 1a). This is clearly shown by coloring the trace of a single model as a function of the RMSD values of the residues (Figure 1c). An alternative method, implemented in a script distributed with PyMol, depicts the trace of a single model according to its conformational dispersion (segments with a highly variable stereochemistry are large and red). It produces, on the contrary, different results, with a large conformational dispersion in the middle of the molecule, which does not seem to be a genuine structural feature (Figure 1b). Finally, it is necessary to remember that the conformational dispersion observed on the basis of the PDB files does not necessarily reflect intra-molecular flexibility. The absence of experimental information might also be responsible for the structural divergence of the termini and of some loops. Work is also in progress to support other superposition tools that might be appropriate for multi-domain protein structures and to allow alternative selections of the atoms/residues to be superposed.

Figure 1

All figures were created with the PyMol program (http://www.pymol.org).

(a) Trace of all 10 models of the 1HRF entry of the Protein Data Bank;

(b) First model colored according to the B-factor putty script distributed with PyMol;

(c) First model colored according to the average residue RMSD computed with CARON.

References

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  The Protein Data Bank: a computer-based archival file for macromolecular structures.

Authors:  F C Bernstein; T F Koetzle; G J Williams; E F Meyer; M D Brice; J R Rodgers; O Kennard; T Shimanouchi; M Tasumi
Journal:  J Mol Biol       Date:  1977-05-25       Impact factor: 5.469

5.  Solution structure of the epidermal growth factor-like domain of heregulin-alpha, a ligand for p180erbB-4.

Authors:  K Nagata; D Kohda; H Hatanaka; S Ichikawa; S Matsuda; T Yamamoto; A Suzuki; F Inagaki
Journal:  EMBO J       Date:  1994-08-01       Impact factor: 11.598
