Literature DB >> 17488841

PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations.

Todd J Dolinsky¹, Paul Czodrowski, Hui Li, Jens E Nielsen, Jan H Jensen, Gerhard Klebe, Nathan A Baker.

Abstract

Real-world observable physical and chemical characteristics are increasingly being calculated from the 3D structures of biomolecules. Methods for calculating pK(a) values, binding constants of ligands, and changes in protein stability are readily available, but often the limiting step in computational biology is the conversion of PDB structures into formats ready for use with biomolecular simulation software. The continued sophistication and integration of biomolecular simulation methods for systems- and genome-wide studies requires a fast, robust, physically realistic and standardized protocol for preparing macromolecular structures for biophysical algorithms. As described previously, the PDB2PQR web server addresses this need for electrostatic field calculations (Dolinsky et al., Nucleic Acids Research, 32, W665-W667, 2004). Here we report the significantly expanded PDB2PQR that includes the following features: robust standalone command line support, improved pK(a) estimation via the PROPKA framework, ligand parameterization via PEOE_PB charge methodology, expanded set of force fields and easily incorporated user-defined parameters via XML input files, and improvement of atom addition and optimization code. These features are available through a new web interface (http://pdb2pqr.sourceforge.net/), which offers users a wide range of options for PDB file conversion, modification and parameterization.

Entities: CellLine Chemical Gene Species

Mesh：

Substances：

Year: 2007 PMID： 17488841 PMCID： PMC1933214 DOI： 10.1093/nar/gkm276

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

Due to the importance of electrostatic interactions in biomolecular systems, a variety of computational methods have been developed for evaluating electrostatic forces and energies [see (1–6) and references therein]. Typical computational electrostatics methods for biomolecular systems can be loosely grouped into two categories: ‘explicit solvent’ methods, which treat solvent molecules in full molecular detail, and ‘implicit solvent’ methods, which include solvent–solute interactions in averaged or continuum fashion. Implicit solvent methods are, by definition, limited in detail and therefore lack the atomic-scale accuracy of their explicit solvent counterparts. However, implicit solvent methods have gained increasing popularity, in part due to their elimination of the extensive sampling of solvent configurations required with explicit models (1,3–7). The basic ingredients of an implicit solvent electrostatics calculation are environmental parameters such as temperature, solvent dielectric and ionic strength; biomolecular atomic coordinates; and parameters for atomic charges and radii. While the environmental parameters are relatively straightforward to specify, the remaining two ingredients can often be difficult to supply. In particular, most biomolecular structures in the Protein Data Bank (PDB) (8) do not contain hydrogen atoms, and many are also missing a fraction of the heavy atom coordinates. The addition of hydrogens and the reconstruction of these missing coordinates is not a trivial process; electrostatic properties obtained from the ‘repaired’ structures can often be very sensitive to the manner in which missing atoms are added and protonation states are assigned (9,10). Furthermore, inconsistent atomic nomenclature and other force field idiosyncrasies can often make the assignment of atomic charges and radii a cumbersome task. An additional obstacle to the use of PDB structures in electrostatics calculations and other biomolecular computational tasks is the accurate assignment of parameters to ‘non-standard’ residues and ligands. Previously (9), we introduced the freely available PDB2PQR service (http://pdb2pqr.sf.net/), which was designed to facilitate the setup and execution of continuum electrostatics calculations from PDB data, particularly by non-experts. The original PDB2PQR server automated many of the common tasks of preparing structures for continuum electrostatics calculations, including adding a limited number of missing heavy atoms to biomolecular structures, estimating titration states and protonating biomolecules in a manner consistent with favorable hydrogen bonding, assigning charge and radius parameters from a variety of force fields, and finally generating ‘PQR’ output (a PDB-like format with the occupancy and temperature factor columns replaced with charge ‘Q’ and radius ‘R’, respectively) compatible with several popular computational biology electrostatics [APBS (10) and MEAD (11)], docking [AutoDock (12)], simulation [AMBER (13)] and visualization [VMD (14), PyMOL (15) and PMV (16)] packages. Since its inception, we have continued to expand the capabilities of the PDB2PQR server to address the challenges associated with ligand parameterization in PDB files and to include several new features.

METHODS

The PDB2PQR web service is driven by a modular, Python-based collection of routines, which provides considerable flexibility to the software and permits non-interactive, high-throughput usage. The service is available via a number of web mirrors listed at http://pdb2pqr.sf.net/. The source code is also available for download from this link, and due to the portability of Python, PDB2PQR can be executed on a wide range of platforms. Figure 1 outlines the typical workflow of a PDB2PQR job and summarizes the features described in more detail below. The procedures for reconstruction of missing atoms, hydrogen optimization and APBS input generation were described previously (9) and are essentially unchanged in the current version of the software. Since their initial development, these atom reconstruction options have been greatly improved through a number of bug fixes and code optimization, robust support for separate biomolecular chains, and improved chain termini optimization. The following sections describe modified and new elements of the PDB2PQR pipeline.

Figure 1.

Flowchart demonstrating the sequence of operations performed by the pipeline. The process begins with an input PDB file and ends with a parameterized PQR file and, optionally, an APBS input file.

Titration state assignment by PROPKA

Protonation states for titratable protein groups are assigned by PROPKA 1.0 (http://propka.ki.ku.dk) (17). PROPKA utilizes a very fast empirical method to predict pKa values and is successful at predicting unusual pKa values. Recently, a comparative study of several protein pKa prediction methods showed that PROPKA was the most accurate method overall (18). PROPKA uses a heuristic method to compute the pKa perturbations due to desolvation, hydrogen bonding and charge–charge interactions. In the current version of PROPKA, contributions from nucleic acids as well as heteroatoms such as bound ions or ligands to the pKa values are not included. Note that, during the course of titration state assignment, PROPKA generates statistics on residue hydrogen bonding, location and solvent accessibility and Coulombic interactions. This information is available to users as a downloadable text file provided at the end of the PDB2PQR/PROPKA calculation.

Standard residue parameter assignment

PDB2PQR currently allows users to assign protein and (where available) nucleic acid parameters based on explicit solvent AMBER99 (19) and CHARMM27 (20) force fields, the PARSE continuum electrostatics force field (21), a Poisson–Boltzmann-optimized force field by Tan et al. (22), or user-defined force fields. User-defined parameters can be uploaded to the PDB2PQR server in a simple flat-file format described in the PDB2PQR user guide. Additionally, PDB2PQR output can be customized to include a variety of atom naming schemes, including AMBER99 (19), CHARMM22 (20), PARSE (21) and an internal naming scheme based on the IUPAC naming recommendations (23). This flexibility in nomenclature was included to facilitate import of PDB2PQR output into other modeling packages. Additionally, the web server provides a ‘map’ which is output at the end of every PDB2PQR calculation and presents a table of atoms’ name/number, residue name, chain name, AMBER atom type and CHARMM atom type to aid in the interpretation of parameter assignment and the development of user-defined charges and radii.

Ligand parameter assignment

The calculation of ligand charges necessitates detailed information on molecular structure and protonation states due to the large variation in the covalent structures of small-molecule protein ligands. The current version of PDB2PQR therefore requires the ligand structure, protonation state and formal charge to be specified by the user in the popular MOL2 (24) format. Ligand structures in MOL2 format are readily available from popular molecular modeling software and free web services such as PRODRG (25). Future versions of PDB2PQR will include a pdb2mol2 parser and automatic assignment of default ligand protonation states from a small-molecule pKa database. The calculation of ligand charges in PDB2PQR is based on the partial equalization of orbital electronegativities (PEOE) procedure developed by Gasteiger and Marsili (26). In the PEOE procedure, orbital electronegativities χ are linked to partial atomic charges q by a polynomial expansion (χ = a +b·q + c·q2 + d·q3). The coefficients a, b, c and d were optimized by Gasteiger and Marsili using gas phase data on ionization potentials and electron affinities. We utilize a PEOE algorithm, which has been optimized by Czodrowski et al. to obtain better agreement between theoretical and experimental solvation energies for a set of small molecules including the polar amino acids (27). The resulting PEOE_PB charges have been tested for small-molecule complexes with trypsin, thrombin (28) and HIV protease (29), and have been found to give results that are in agreement with experimental values.

Post-processing

The current version of PDB2PQR supports an ‘extension’ directory for user-defined processing of PDB2PQR output. Such extensions might include alternative naming schemes, identification and parameterization of other molecule types, additional hydrogen bond processing, etc. The web servers listed at (http://pdb2pqr.sf.net/) provide only the default PDB2PQR functionality. However, it is straightforward for users to download the PDB2PQR software and setup their own web servers with additional functionality based on custom extensions.

CONCLUSIONS

We have described a number of new features for the free PDB2PQR web server, a service which helps users prepare molecular structures for further computational work by modeling missing atoms, assigning charges and titration states, and providing a mechanism for assignment of ligand parameters. Readers interested in these tasks might also be interested in other servers, which provide complementary services for biomolecular structure processing (30–32). Planned future developments for PDB2PQR include the construction of a pdb2mol2 parser to allow for the automatic parameterization of non-protein atoms, the correct treatment of protein post-translational modification, and the integration of a Poisson–Boltzmann continuum electrostatics-based pKa calculation algorithm into PDB2PQR. We anticipate that the PDB2PQR service will continue to be a helpful addition to the portfolio of tools available to the structural and computational biology communities.

21 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Python: a programming language for software integration and development.

Authors: M F Sanner
Journal: J Mol Graph Model Date: 1999-02 Impact factor: 2.518

3. Electrostatics of nanosystems: application to microtubules and the ribosome.

Authors: N A Baker; D Sept; S Joseph; M J Holst; J A McCammon
Journal: Proc Natl Acad Sci U S A Date: 2001-08-21 Impact factor: 11.205

4. PDB2PQR: an automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations.

Authors: Todd J Dolinsky; Jens E Nielsen; J Andrew McCammon; Nathan A Baker
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

5. WHAT IF: a molecular modeling and drug design program.

Authors: G Vriend
Journal: J Mol Graph Date: 1990-03

6. Assignment of polar states for protein amino acid residues using an interaction cluster decomposition algorithm and its application to high resolution protein structure modeling.

Authors: Xin Li; Matthew P Jacobson; Kai Zhu; Suwen Zhao; Richard A Friesner
Journal: Proteins Date: 2007-03-01

7. Development, validation, and application of adapted PEOE charges to estimate pKa values of functional groups in protein-ligand complexes.

Authors: Paul Czodrowski; Ingo Dramburg; Christoph A Sotriffer; Gerhard Klebe
Journal: Proteins Date: 2006-11-01

Review 8. Modeling electrostatic effects in proteins.

Authors: Arieh Warshel; Pankaz K Sharma; Mitsunori Kato; William W Parson
Journal: Biochim Biophys Acta Date: 2006-08-25

9. Atypical protonation states in the active site of HIV-1 protease: a computational study.

Authors: Paul Czodrowski; Christoph A Sotriffer; Gerhard Klebe
Journal: J Chem Inf Model Date: 2007-05-16 Impact factor: 4.956

10. Recommendations for the presentation of NMR structures of proteins and nucleic acids.

Authors: J L Markley; A Bax; Y Arata; C W Hilbers; R Kaptein; B D Sykes; P E Wright; K Wüthrich
Journal: J Mol Biol Date: 1998-07-31 Impact factor: 5.469

708 in total

1. Alternaria alternata allergen Alt a 1: a unique β-barrel protein dimer found exclusively in fungi.

Authors: Maksymilian Chruszcz; Martin D Chapman; Tomasz Osinski; Robert Solberg; Matthew Demas; Przemyslaw J Porebski; Karolina A Majorek; Anna Pomés; Wladek Minor
Journal: J Allergy Clin Immunol Date: 2012-06-02 Impact factor: 10.793

2. Engineering of an alternative electron transfer path in photosystem II.

Authors: Shirley Larom; Faris Salama; Gadi Schuster; Noam Adir
Journal: Proc Natl Acad Sci U S A Date: 2010-05-10 Impact factor: 11.205

3. Histidine in continuum electrostatics protonation state calculations.

Authors: Vernon Couch; Alexei Stuchebrukhov
Journal: Proteins Date: 2011-08-30

4. Nonspecific DNA binding and bending by HUαβ: interfaces of the three binding modes characterized by salt-dependent thermodynamics.

Authors: Junseock Koh; Irina Shkel; Ruth M Saecker; M Thomas Record
Journal: J Mol Biol Date: 2011-04-12 Impact factor: 5.469