OpenStructure: a flexible software framework for computational structural biology.

Marco Biasini, Valerio Mariani, Jürgen Haas, Stefan Scheuber, Andreas D Schenk, Torsten Schwede, Ansgar Philippsen.

Abstract

MOTIVATION: Developers of new methods in computational structural biology are often hampered in their research by incompatible software tools and non-standardized data formats. To address this problem, we have developed OpenStructure as a modular open source platform to provide a powerful, yet flexible general working environment for structural bioinformatics. OpenStructure consists primarily of a set of libraries written in C++ with a cleanly designed application programmer interface. All functionality can be accessed directly in C++ or in a Python layer, meeting the requirements for both high efficiency and ease of use. Powerful selection queries and the notion of entity views to represent these selections greatly facilitate the development and implementation of algorithms on structural data. The modular integration of computational core methods with powerful visualization tools makes OpenStructure an ideal working and development environment. Several applications, such as the latest versions of IPLT and QMean, have been implemented based on OpenStructure, demonstrating its value for the development of next-generation structural biology algorithms.

AVAILABILITY: Source code licensed under the GNU Lesser General Public License and binaries for MacOS X, Linux and Windows are available for download at http://www.openstructure.org.

CONTACT: torsten.schwede@unibas.ch

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2010 PMID: 20733063 PMCID: PMC2951092 DOI: 10.1093/bioinformatics/btq481

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

We introduce OpenStructure, a flexible software framework for computational structural biology that serves as a solid yet versatile toolkit for rapid prototyping of new methods as well as their productive implementation. Typically, method development in structural bioinformatics involves combining different independent software tools, and significant effort is devoted to writing code for input/output operations and format conversions between different packages. This effort is greatest when data and algorithms from different domains are to be combined, e.g. protein structures, protein sequence annotation and chemical ligands. Several software tools and frameworks are available today for molecular modeling, e.g. MMTK (Hinsen, 2000), Coot (Emsley et al., 2010), MolIDE (Canutescu and Dunbrack, 2005) and Modeller (Eswar et al., 2008); bioinformatics algorithm libraries, e.g. BALL (Kohlbacher and Lenhof, 2000); workflow automation tools, e.g. Biskit (Grunberg et al., 2007) or KNIME (www.knime.org); and visualization, e.g. VMD (Humphrey et al., 1996), PyMol (www.pymol.org), DINO (www.dino3d.org) or SwissPdbViewer (Guex et al., 2009). OpenStructure combines a C++ based library of commonly used functionality with a Python layer and powerful visualization tools. While PyMol and VMD also combine a scripting environment with sophisticated visualization tools, they are geared primarily toward visualization and less toward providing a clean application programmer interface (API) that is easy to use and allows for rapid development of new algorithms. OpenStructure is also designed to easily accommodate interfaces to already existing software. This allows for rapid, visually enhanced prototyping of new functionality, making OpenStructure an ideal environment for the development of next-generation structural biology algorithms.
For example, new versions of the QMean tools for model quality assessment (Benkert et al., 2009a, b) are based on OpenStructure, as are the structural analysis tools in ProteinModelPortal (Arnold et al., 2009). Further, work is under way to implement the next generation of the SWISS-MODEL pipeline using the OpenStructure framework (Arnold et al., 2006; Bordoli et al., 2009).

2 IMPLEMENTATION

In OpenStructure, molecular or chemical entities, such as macromolecules, sequences, alignments or electron density maps, are represented as objects, offering a comprehensive set of functions for data manipulation and information querying. Typically, users interact with a high-level Python interface, while ‘power users’ with high computational requirements access the API at the level of C++. Functionality in OpenStructure is grouped into modules. Each of these modules consists of a computational core as a shared library of C++ code and a set of Python modules built on top of the exported API. Parts of the computational core and the graphical user interface of the Image Processing Library and Toolkit IPLT (Philippsen et al., 2007) have been incorporated into OpenStructure to offer versatile handling of image data with support for various algorithms in one, two and three dimensions. A graphics module for real-time rendering of molecules, density maps and molecular surfaces provides functionalities for data visualization. Processing and visualization of molecular entities often requires filtering by certain selection criteria. These selections are implemented as so-called EntityViews, containing subsets of atoms, residues, chains and bonds of the respective EntityHandle chosen using selection statements (queries). The EntityView class shares a common interface with the EntityHandle class it points to, and hence they can be used interchangeably. This handle/view concept pertains to the full structural hierarchy, i.e. residue views will only contain the atoms that were part of the selection, etc. The query language supports sophisticated selection criteria (for example, distance-based selection, Boolean operators, selections based on user-defined properties, and so on). 
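
The handle/view idea described above can be illustrated with a small, library-free Python sketch. It does not use or reproduce the OpenStructure API: the class names `Atom`, `AtomContainer`, `Entity` and `View`, and the predicate-based `select` method, are hypothetical stand-ins chosen only to show how a selection can share the interface of the entity it points to and contain just the selected atoms of each residue.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Atom:
    chain: str
    residue_number: int
    residue_name: str
    name: str


class AtomContainer:
    """Interface shared by the full entity and by any view onto it (hypothetical)."""

    def __init__(self, atoms: Iterable[Atom]):
        self._atoms: List[Atom] = list(atoms)

    @property
    def atoms(self) -> List[Atom]:
        return list(self._atoms)

    def select(self, predicate: Callable[[Atom], bool]) -> "View":
        # A selection always yields a view, so selections can be chained.
        return View(self.handle, [a for a in self._atoms if predicate(a)])

    def residues(self) -> Dict[Tuple[str, int], List[str]]:
        # Group atom names by residue; only atoms present in this container appear.
        grouped: Dict[Tuple[str, int], List[str]] = {}
        for atom in self._atoms:
            grouped.setdefault((atom.chain, atom.residue_number), []).append(atom.name)
        return grouped


class Entity(AtomContainer):
    """The complete structure; it acts as its own handle."""

    @property
    def handle(self) -> "Entity":
        return self


class View(AtomContainer):
    """A subset of an entity's atoms that remembers which entity it came from."""

    def __init__(self, handle: Entity, atoms: Iterable[Atom]):
        super().__init__(atoms)
        self._handle = handle

    @property
    def handle(self) -> Entity:
        return self._handle


entity = Entity([
    Atom("A", 1, "GLY", "N"), Atom("A", 1, "GLY", "CA"),
    Atom("A", 2, "ALA", "N"), Atom("A", 2, "ALA", "CA"), Atom("A", 2, "ALA", "CB"),
])
backbone = entity.select(lambda a: a.name in {"N", "CA"})
ala_backbone = backbone.select(lambda a: a.residue_name == "ALA")
```

Because `Entity` and `View` expose the same methods, code written against one works on the other, and the residue of the final view lists only the atoms that survived both selections.
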
In order to infer connectivity and topology when reading molecular coordinate files, we make use of the chemical components dictionary which is part of the official PDB distribution (Berman et al., 2003). Thus, detailed information is available on any of the chemical components, allowing the framework to ensure correct connectivity and topology during the load process and issue appropriate warnings. The connectivity step is extensible and its behavior can be adapted by overloading functions. Additionally, a heuristic method is available as a fallback for loading unknown residues or to handle non-standard residue and atom names.
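
As a rough illustration of dictionary-driven connectivity with a heuristic fallback, the following Python sketch uses a toy lookup table in place of the chemical components dictionary. Everything in it is an assumption made for illustration: the names `COMPONENT_BONDS`, `heuristic_bonds` and `connect`, the single `GLY` template, and the 1.9 Å distance cutoff are not taken from OpenStructure, whose actual heuristic is not described here.

```python
import math
from typing import Dict, List, Set, Tuple

Coordinates = Dict[str, Tuple[float, float, float]]
Bond = Tuple[str, str]

# Toy stand-in for a component dictionary: residue name -> bonded atom-name pairs.
COMPONENT_BONDS: Dict[str, List[Bond]] = {
    "GLY": [("N", "CA"), ("CA", "C"), ("C", "O")],
}


def heuristic_bonds(coords: Coordinates, max_distance: float = 1.9) -> Set[Bond]:
    """Fallback: bond every atom pair closer than max_distance (assumed cutoff)."""
    names = sorted(coords)
    bonds: Set[Bond] = set()
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if math.dist(coords[first], coords[second]) <= max_distance:
                bonds.add((first, second))
    return bonds


def connect(residue_name: str, coords: Coordinates) -> Tuple[Set[Bond], List[str]]:
    """Return (bonds, warnings), using the dictionary when the residue is known."""
    template = COMPONENT_BONDS.get(residue_name)
    if template is None:
        warning = f"unknown residue {residue_name}: using distance heuristic"
        return heuristic_bonds(coords), [warning]
    bonds: Set[Bond] = set()
    warnings: List[str] = []
    for first, second in template:
        if first in coords and second in coords:
            # Store each bond with its atom names in sorted order.
            bonds.add(tuple(sorted((first, second))))
        else:
            warnings.append(f"{residue_name}: missing atom for bond {first}-{second}")
    return bonds, warnings


# A glycine lacking its O atom, and a residue the dictionary does not know.
gly_bonds, gly_warnings = connect(
    "GLY", {"N": (0.0, 0.0, 0.0), "CA": (1.46, 0.0, 0.0), "C": (2.0, 1.4, 0.0)}
)
unk_bonds, unk_warnings = connect(
    "XYZ", {"A": (0.0, 0.0, 0.0), "B": (1.5, 0.0, 0.0), "C": (5.0, 0.0, 0.0)}
)
```

The known residue gets its bonds from the template and a warning about the missing atom, while the unknown residue falls back to the distance criterion.
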

3 APPLICATION EXAMPLE

Most users will interact with OpenStructure using Python. The code fragment in Supplementary Table S1 illustrates the expressiveness of the OpenStructure API in combining data from different domains. In this example, we compare the sequence conservation of residues in contact with a ligand with the rest of the protein, quantifying the visually derived hypothesis that the binding-site residues of the SH2 domain are more conserved than the rest. This is achieved by mapping of a conservation score derived from a multiple sequence alignment of various SH2 domains (‘sh2.aln’) onto a representative structure (PDB: 3IMJ) (DeLorbe et al., 2009) and identifying residues in direct contact with the ligand. Figure 1 shows the results displayed in the DNG (‘DINO/DeepView Next Generation’) graphical user interface, using the conservation score to color a molecular surface representation.
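
The analysis described above can be sketched in plain Python on toy data. This is not the code from Supplementary Table S1 and does not use the OpenStructure API. The conservation measure (fraction of sequences sharing the most common residue in a column), the 4 Å contact cutoff, the function names, and the four-residue example are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import mean
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float, float]


def column_conservation(alignment: Sequence[str]) -> List[float]:
    """Per-column fraction of sequences sharing the most common residue."""
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def contact_positions(
    residue_atoms: Sequence[Sequence[Point]],
    ligand_atoms: Sequence[Point],
    cutoff: float = 4.0,
) -> Set[int]:
    """Indices of residues with any atom within `cutoff` of any ligand atom."""
    return {
        index
        for index, atoms in enumerate(residue_atoms)
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms)
    }


# Toy alignment: one row per sequence, one column per structure residue.
alignment = ["RKAE", "RKGD", "RRSE", "RKTQ"]
# Toy coordinates: a single atom per residue and a single ligand atom.
residue_atoms = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)], [(14.0, 0.0, 0.0)]]
ligand_atoms = [(1.5, 2.0, 0.0)]

scores = column_conservation(alignment)
contacts = contact_positions(residue_atoms, ligand_atoms)
binding_mean = mean(scores[i] for i in contacts)
rest_mean = mean(s for i, s in enumerate(scores) if i not in contacts)
```

Comparing `binding_mean` with `rest_mean` turns the visual impression of a conserved binding site into a number. A real analysis would also need to map alignment columns to structure residues and handle gaps, which this sketch omits.
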

Fig. 1.

Molecular surface representation of an SH2 domain (PDB: 3IMJ) colored by conservation of the positions in a multiple sequence alignment. The color scale ranges from red for conserved residues to blue for residues with high variability. The ligand peptide is shown in yellow stick representation. The image was rendered in OpenStructure; the molecular surface was calculated using MSMS (Sanner et al., 1996). See Supplementary Table S1 for details on the calculation of sequence conservation scores.

The OpenStructure distribution contains several scripting examples to introduce new users to the functionality and usage style of the toolkit, such as scripts to animate molecular dynamics trajectories, calculate electron density maps from atomistic structures or rank short peptide fragments according to their correlation with electron density. Exhaustive documentation and tutorials are provided on the web site. Mailing lists for OpenStructure users and developers provide a forum to ask questions, report problems or suggest new developments.

16 in total

1. BALL--rapid software prototyping in computational molecular biology. Biochemicals Algorithms Library.

Authors: O Kohlbacher; H P Lenhof
Journal: Bioinformatics Date: 2000-09 Impact factor: 6.937

2. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling.

Authors: Konstantin Arnold; Lorenza Bordoli; Jürgen Kopp; Torsten Schwede
Journal: Bioinformatics Date: 2005-11-13 Impact factor: 6.937

3. MolIDE: a homology modeling framework you can click with.

Authors: Adrian A Canutescu; Roland L Dunbrack
Journal: Bioinformatics Date: 2005-04-21 Impact factor: 6.937

4. Biskit--a software platform for structural bioinformatics.

Authors: Raik Grünberg; Michael Nilges; Johan Leckner
Journal: Bioinformatics Date: 2007-01-18 Impact factor: 6.937

5. Collaborative EM image processing with the IPLT image processing library and toolbox.

Authors: Ansgar Philippsen; Andreas D Schenk; Gian A Signorell; Valerio Mariani; Simon Berneche; Andreas Engel
Journal: J Struct Biol Date: 2006-07-14 Impact factor: 2.867

6. Protein structure homology modeling using SWISS-MODEL workspace.

Authors: Lorenza Bordoli; Florian Kiefer; Konstantin Arnold; Pascal Benkert; James Battey; Torsten Schwede
Journal: Nat Protoc Date: 2009 Impact factor: 13.491

7. Protein structure modeling with MODELLER.

Authors: Narayanan Eswar; David Eramian; Ben Webb; Min-Yi Shen; Andrej Sali
Journal: Methods Mol Biol Date: 2008

8. VMD: visual molecular dynamics.

Authors: W Humphrey; A Dalke; K Schulten
Journal: J Mol Graph Date: 1996-02

9. Reduced surface: an efficient way to compute molecular surfaces.

Authors: M F Sanner; A J Olson; J C Spehner
Journal: Biopolymers Date: 1996-03 Impact factor: 2.505

10. Features and development of Coot.

Authors: P Emsley; B Lohkamp; W G Scott; K Cowtan
Journal: Acta Crystallogr D Biol Crystallogr Date: 2010-03-24
