
HOMELETTE: a unified interface to homology modelling software.

Abstract

SUMMARY: Homology modelling, the technique of generating models of 3D protein structures based on experimental structures from related proteins, has become increasingly popular over the years. An abundance of different tools for model generation and model evaluation is available from various research groups. We present HOMELETTE, an interface which provides unified programmatic access to these tools. This allows for the assembly of custom pipelines from pre- or self-implemented building blocks.
AVAILABILITY AND IMPLEMENTATION: HOMELETTE is implemented in Python, compatible with version 3.6 and newer. It is distributed under the MIT license. Documentation and tutorials are available at Read the Docs (https://homelette.readthedocs.io/). The latest version of HOMELETTE is available on PyPI (https://pypi.org/project/homelette/) and GitHub (https://github.com/PhilippJunk/homelette). A full installation of the latest version of HOMELETTE with all dependencies is also available as a Docker container (https://hub.docker.com/r/philippjunk/homelette_template).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Year: 2021 PMID: 34954790 PMCID: PMC8896651 DOI: 10.1093/bioinformatics/btab866

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Access to homology modelling tools has become increasingly simple over recent years. There is a multitude of web services such as SWISS-MODEL offering total automation of the whole process. These are great tools for small homology modelling projects (Waterhouse ). However, medium to large scale projects that aim to model the structures of tens or hundreds of proteins with different homology modelling software in a fully or semi-automated manner face a very tedious exercise. Most of the popular homology modelling services offer command line tools. However, these tools come with different interfaces and work with different file types. The same is true for software aiming to evaluate homology models. The general flow of a homology modelling pipeline is depicted in Figure 1a (Webb and Sali, 2016). Most homology modelling software requires a multiple sequence alignment (MSA) of the target sequence against one or multiple template sequences, as well as the template structures. Using the information from the alignment and the template structure(s), a homology modelling algorithm assembles one or multiple models. Afterwards, these are scored with one or more evaluation metrics in order to select the best model(s).

Fig. 1.

Homology modelling pipeline. (a) General pipeline of homology modelling from left to right. (b) Building blocks implemented in HOMELETTE and how they correspond to the steps in homology modelling

Exchanging components of the pipeline, such as the modelling algorithm or the evaluation metrics, is not trivial due to the problems outlined above. Therefore, the motivation behind HOMELETTE is to provide a modular homology modelling interface that can be used to construct pipelines with diverse modelling and evaluation tools within the same interface. The focus is also on making it easy for the user to implement new building blocks that fit into the framework. This interface can be used to easily assemble custom pipelines and streamline medium to large scale homology modelling projects (Fig. 1b).
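The idea of interchangeable building blocks can be illustrated with a minimal, self-contained Python sketch. It does not use the HOMELETTE API: the names `Model`, `run_pipeline`, `dummy_routine_a`, `dummy_routine_b` and `dummy_length_score` are hypothetical, and the modelling and evaluation blocks are placeholders that only mimic the shape of the pipeline (alignment and templates in, scored models out).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Model:
    """A generated model plus the scores attached by evaluation blocks."""
    name: str
    scores: Dict[str, float] = field(default_factory=dict)


# Hypothetical block signatures (assumptions for this sketch only):
# a modelling block maps (alignment, templates) to a list of models,
# an evaluation block attaches one score to a model in place.
ModellingBlock = Callable[[str, List[str]], List[Model]]
EvaluationBlock = Callable[[Model], None]


def run_pipeline(
    alignment: str,
    templates: List[str],
    modelling_blocks: List[ModellingBlock],
    evaluation_blocks: List[EvaluationBlock],
) -> List[Model]:
    """Run every modelling block, then every evaluation block on every model."""
    models: List[Model] = []
    for build in modelling_blocks:
        models.extend(build(alignment, templates))
    for model in models:
        for evaluate in evaluation_blocks:
            evaluate(model)
    return models


# Placeholder blocks: they stand in for real modelling and evaluation software.
def dummy_routine_a(alignment: str, templates: List[str]) -> List[Model]:
    return [Model(f"a_{template}") for template in templates]


def dummy_routine_b(alignment: str, templates: List[str]) -> List[Model]:
    return [Model(f"b_{template}") for template in templates]


def dummy_length_score(model: Model) -> None:
    # Not a real quality metric; it only shows where a score would be stored.
    model.scores["name_length"] = float(len(model.name))


models = run_pipeline(
    "target_vs_templates.aln",  # placeholder alignment identifier
    ["3NY5", "4G0N"],
    [dummy_routine_a, dummy_routine_b],
    [dummy_length_score],
)
print([m.name for m in models])
```

Because every block shares the same call signature, swapping a modelling or evaluation step only changes the lists passed to `run_pipeline`, which is the property the modular design aims for.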

2 Implementation

The HOMELETTE interface is fully implemented in Python. Python is a popular and accessible programming language extensively used in the scientific community (Van Rossum and Drake, 2009). HOMELETTE is built with modular design principles in mind. Template identification/alignment generation, model generation and model evaluation are designed as interchangeable building blocks that interact with the other components of the pipeline in an identical manner. This allows for the easy assembly of custom pipelines by freely combining these building blocks. Alignment generation and template processing building blocks are available for identifying templates with the RCSB Search Web API using MMseqs2 (Rose ; Steinegger and Söding, 2017) and aligning them with Clustal Omega (Sievers ; Sievers and Higgins, 2018), or for using HHSuite3 (Steinegger ). Model generation building blocks are currently available for MODELLER (Sali and Blundell, 1993; Webb and Sali, 2016), altMOD (Janson ) and ProMod3 (Biasini ; Studer ). Model evaluation building blocks are available for DOPE scores (Shen and Sali, 2006), SOAP scores (Dong ), QMEAN (Benkert ; Benkert ), QMEAN DisCo (Studer ) and MolProbity (Chen , Williams ). A good model is expected to have a low DOPE score, a low SOAP score, a high QMEAN score and a MolProbity score as close to 0 as possible. A list of the implemented building blocks is available in Supplementary Table S1. In addition, new building blocks can be implemented and fitted seamlessly into existing pipelines, allowing for even further customization. This is particularly useful for integrating software for which no building block is available yet. Users are strongly encouraged to share their custom building blocks with the community, and an extension framework has been set up to make this possible.
Extensive documentation and tutorials teach the user how to use these building blocks, how to implement new building blocks and how to assemble them into more complex pipelines. The documentation is available online at https://homelette.readthedocs.io/. The tutorials are hosted together with the documentation, and are also available as interactive Jupyter notebooks on the GitHub page and in the Docker container. HOMELETTE does not have any model building or model evaluation capabilities of its own; its strength comes from the integration of different software. Due to this design choice, it relies on third-party software (Supplementary Table S1). All currently integrated software is freely available for academic research. The documentation gives instructions on how to acquire and install third-party software. Alternatively, HOMELETTE is also available as a Docker container with all third-party software already installed.
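Since the evaluation metrics point in different directions (low DOPE and SOAP, high QMEAN, MolProbity close to 0), comparing models requires a consistent ordering per metric. The following sketch shows one way to express this in plain Python. It is not part of HOMELETTE: the names `BADNESS` and `rank_models` are hypothetical, and the score values are made up purely for illustration.

```python
from typing import Callable, Dict, List

# Map each metric to a "badness" value where lower always means better,
# following the directions stated in the text.
BADNESS: Dict[str, Callable[[float], float]] = {
    "dope": lambda x: x,             # lower is better
    "soap": lambda x: x,             # lower is better
    "qmean": lambda x: -x,           # higher is better
    "molprobity": lambda x: abs(x),  # closer to 0 is better
}


def rank_models(scores: Dict[str, Dict[str, float]], metric: str) -> List[str]:
    """Return model names ordered from best to worst for one metric."""
    to_badness = BADNESS[metric]
    return sorted(scores, key=lambda name: to_badness(scores[name][metric]))


# Made-up scores for three hypothetical models (illustrative values only).
example_scores = {
    "model_1": {"dope": -38000.0, "qmean": 0.62, "molprobity": 1.5},
    "model_2": {"dope": -39500.0, "qmean": 0.71, "molprobity": 1.8},
    "model_3": {"dope": -37200.0, "qmean": 0.55, "molprobity": 3.4},
}

for metric in ("dope", "qmean", "molprobity"):
    print(metric, rank_models(example_scores, metric))
```

Normalising every metric to "lower is better" keeps the ranking code identical across metrics, so adding a further metric only requires one more entry in the mapping.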

3 Application

As an example of assembling alignment generation, homology modelling and model evaluation building blocks into a custom pipeline, the ARAF protein was modelled (Supplementary Fig. S1). Starting from the sequence, the templates 3NY5 (BRAF) and 4G0N (RAF1) were identified, aligned and processed. In order to show how different modelling building blocks can be used interchangeably, two MODELLER building blocks with different parameters for model refinement were used. Evaluation was performed using SOAP scores and MolProbity scores, which were summarized into a combined score using Borda count (Supplementary Fig. S1b). As expected, the modelling routine that spends more time on model refinement generates better models. Differences between the templates can also be observed. The code to execute this example as well as to generate the visualization is made fully available in Tutorial 7.
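A Borda count combines several rankings by awarding points according to rank position and summing them per candidate. The sketch below shows a standard variant in which the best of n models receives n - 1 points and the worst receives 0. It is an illustration only: the exact variant and tie handling used in Tutorial 7 are not specified here, and the function name `borda_count` and the model names and rankings are hypothetical.

```python
from typing import Dict, List


def borda_count(rankings: List[List[str]]) -> Dict[str, int]:
    """Sum Borda points over several best-to-worst rankings of the same models.

    In each ranking of n models, the first entry earns n - 1 points and
    the last entry earns 0 points.
    """
    points: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, name in enumerate(ranking):
            points[name] = points.get(name, 0) + (n - 1 - position)
    return points


# Hypothetical best-to-worst rankings from two evaluation metrics.
soap_ranking = ["model_a", "model_b", "model_c", "model_d"]
molprobity_ranking = ["model_b", "model_c", "model_a", "model_d"]

combined = borda_count([soap_ranking, molprobity_ranking])
best = max(combined, key=combined.get)
print(combined, best)
```

Because the Borda count uses only rank positions, metrics on very different scales (such as SOAP and MolProbity) can be combined without rescaling the raw scores.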

4 Conclusion

There are three major determinants for the quality of a homology model. These are the alignment used, the quality of the template structures and the algorithm chosen for generating the models (Webb and Sali, 2016). HOMELETTE leaves the selection of all three determinants in the hands of the user. The user decides which modelling software to use and compare, and has full control over generating and refining the alignment and selecting templates. We explain and demonstrate the use of HOMELETTE in a series of eight tutorials. The tutorials culminate in a tutorial about pipeline assembly, which has been shown as an example pipeline for a proof of concept in this publication (Supplementary Fig. S1). In conclusion, HOMELETTE offers a unified, simple and well-documented interface to a multitude of popular homology modelling and model evaluation software. Its modular design principles allow users to assemble their own pipelines in an easy and consistent manner. Simple implementation and extensive documentation make it possible to extend HOMELETTE with other software, while retaining the same programmatic interface. This gives users even more freedom to assemble the best custom pipeline for their particular project. This could prove useful for large scale projects such as the structural modelling of whole biological systems.

18 in total

1. Optimized atomic statistical potentials: assessment of protein interfaces and loops.

Authors: Guang Qiang Dong; Hao Fan; Dina Schneidman-Duhovny; Ben Webb; Andrej Sali
Journal: Bioinformatics Date: 2013-09-27 Impact factor: 6.937

2. Comparative Protein Structure Modeling Using MODELLER.

Authors: Benjamin Webb; Andrej Sali
Journal: Curr Protoc Bioinformatics Date: 2016-06-20

3. MolProbity: More and better reference data for improved all-atom structure validation.

Authors: Christopher J Williams; Jeffrey J Headd; Nigel W Moriarty; Michael G Prisant; Lizbeth L Videau; Lindsay N Deis; Vishal Verma; Daniel A Keedy; Bradley J Hintze; Vincent B Chen; Swati Jain; Steven M Lewis; W Bryan Arendall; Jack Snoeyink; Paul D Adams; Simon C Lovell; Jane S Richardson; David C Richardson
Journal: Protein Sci Date: 2017-11-27 Impact factor: 6.725

4. Clustal Omega for making accurate alignments of many protein sequences.

Authors: Fabian Sievers; Desmond G Higgins
Journal: Protein Sci Date: 2017-10-30 Impact factor: 6.725

5. Toward the estimation of the absolute quality of individual protein structure models.

Authors: Pascal Benkert; Marco Biasini; Torsten Schwede
Journal: Bioinformatics Date: 2010-12-05 Impact factor: 6.937

6. QMEANDisCo-distance constraints applied on model quality estimation.

Authors: Gabriel Studer; Christine Rempfer; Andrew M Waterhouse; Rafal Gumienny; Juergen Haas; Torsten Schwede
Journal: Bioinformatics Date: 2020-03-01 Impact factor: 6.937

7. Revisiting the "satisfaction of spatial restraints" approach of MODELLER for protein homology modeling.

Authors: Giacomo Janson; Alessandro Grottesi; Marco Pietrosanto; Gabriele Ausiello; Giulia Guarguaglini; Alessandro Paiardini
Journal: PLoS Comput Biol Date: 2019-12-17 Impact factor: 4.475

8. ProMod3-A versatile homology modelling toolbox.

Authors: Gabriel Studer; Gerardo Tauriello; Stefan Bienert; Marco Biasini; Niklaus Johner; Torsten Schwede
Journal: PLoS Comput Biol Date: 2021-01-28 Impact factor: 4.475

9. OpenStructure: an integrated software framework for computational structural biology.

Authors: M Biasini; T Schmidt; S Bienert; V Mariani; G Studer; J Haas; N Johner; A D Schenk; A Philippsen; T Schwede
Journal: Acta Crystallogr D Biol Crystallogr Date: 2013-04-19

10. RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive.

Authors: Yana Rose; Jose M Duarte; Robert Lowe; Joan Segura; Chunxiao Bi; Charmi Bhikadiya; Li Chen; Alexander S Rose; Sebastian Bittrich; Stephen K Burley; John D Westbrook
Journal: J Mol Biol Date: 2020-11-10 Impact factor: 6.151