
Pcons.net: protein structure prediction meta server.

Björn Wallner, Per Larsson, Arne Elofsson.

Abstract

The Pcons.net Meta Server (http://pcons.net) provides improved automated tools for protein structure prediction and analysis using consensus. It implements essentially all the steps necessary to produce a high quality model of a protein. The whole process is fully automated and a potential user only submits the protein sequence. For PSI-BLAST detectable targets, an accurate model is generated within minutes of submission. For more difficult targets, the sequence is automatically submitted to publicly available fold-recognition servers that use more advanced approaches to find distant structural homologs. The results from these servers are analyzed and assessed for structural correctness using Pcons and ProQ, and the user is presented with a ranked list of possible models. In addition, if the protein sequence contains more than one domain, these are automatically parsed out and resubmitted to the server as individual queries.

Year: 2007 PMID: 17584798 PMCID: PMC1933226 DOI: 10.1093/nar/gkm319

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Reliable and accurate predictions of protein structure are important for many biologists. For many years it was believed that human experts significantly outperformed all automatic methods. However, since consensus-based approaches (1) were introduced, it has been found that at most a handful of experts in the world can outperform the ‘community’ of web servers. It has also been shown consistently in CASP that consensus methods are superior to individual methods in predicting the structure of a protein sequence (2–4). Pcons has been among the top performing automated predictors since CASP5 and was the best method for assessing model quality in CASP7 (5). Here, we introduce the Pcons.net meta server (http://pcons.net), which provides improved automated tools for protein structure prediction and analysis using consensus. The whole process is fully automated and a potential user only submits the protein sequence. This makes it easy to acquire structural information without any prior knowledge of remote homology detection, model building and model quality assessment. Pcons has previously been available as a downloadable program as well as through several other meta servers (genesilico.pl and bioinfo.pl). The Pcons.net meta server provides significant improvements over these servers: it has an improved web interface and improved prediction accuracy, it provides the local accuracy for each residue, and for easy targets an accurate 3D model is built within minutes of submission.

SERVER DESCRIPTION

The Pcons.net Meta Server (http://pcons.net) implements essentially all the steps necessary to produce a high quality model of a protein sequence: (i) finding the best possible template; (ii) aligning the template to the query sequence; (iii) building a 3D structure based on the alignment; and (iv) assessing the quality of the model. An overview of the method is shown in Figure 1. In the first step, domains are assigned using Pfam (6) and a quick database search against known protein structures (PDB90) is performed using BLAST (7) and RPS-BLAST (8). This also establishes the difficulty of the submitted sequence. If a significant hit is found using RPS-BLAST, an all-atom model is produced using Pfrag, a novel rapid homology modeling program based on segment matching and assembly. If the sequence identity is above 50%, this model will be quite close to the native structure, comparable to low-resolution X-ray and NMR structures (9,10). The whole process from sequence to all-atom model takes ∼30 s, making it one of the fastest comparative modeling servers available.

Figure 1.

Flow chart describing the different components of Pcons.net.

RPS-BLAST is also used to parse the sequence into structural domains by analyzing the significance and span of the best RPS-BLAST alignment. If (i) the hit is significant (10⁻⁵) and (ii) the alignment leaves more than 30 residues unaligned, the unaligned residues are parsed out and resubmitted to the servers as a separate submission. In many cases, these domains agree well with the domains obtained using Pfam. Only if no significant hits are found using RPS-BLAST is the sequence submitted to the publicly available, more advanced fold-recognition servers (Table 1). The user has the possibility to force the submission of sequences that have clear RPS-BLAST hits. However, we strongly discourage overuse of this possibility, in order not to overload the external servers with trivial queries.
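The domain-parsing rule can be illustrated with a minimal Python sketch. It assumes that the significance threshold is an E-value cutoff of 10⁻⁵ and that the 30-residue criterion is applied to each unaligned flanking stretch separately; neither detail is stated explicitly in the text, and the function names are hypothetical rather than part of Pcons.net.

```python
from typing import List, Tuple

E_VALUE_CUTOFF = 1e-5   # assumption: the quoted 10^-5 is an E-value cutoff
MIN_UNALIGNED = 30      # a stretch must contain more than 30 residues


def unaligned_regions(seq_len: int, aln_start: int, aln_end: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of the query outside the alignment."""
    regions = []
    if aln_start > 1:
        regions.append((1, aln_start - 1))
    if aln_end < seq_len:
        regions.append((aln_end + 1, seq_len))
    return regions


def regions_to_resubmit(seq_len: int, aln_start: int, aln_end: int,
                        e_value: float) -> List[Tuple[int, int]]:
    """Return the flanking regions that would be resubmitted as separate queries."""
    if e_value > E_VALUE_CUTOFF:
        return []  # hit not significant: no domain parsing
    # Assumption: the length criterion is checked per flanking stretch.
    return [(start, end)
            for start, end in unaligned_regions(seq_len, aln_start, aln_end)
            if end - start + 1 > MIN_UNALIGNED]
```

For a 300-residue query whose best significant alignment covers residues 1 to 200, this sketch would select residues 201 to 300 for resubmission.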

Table 1.

Internal and external servers utilized by the Pcons.net Meta Server. For similar servers, e.g. bas_b and bas_c, only one of them is used in the consensus analysis.

Servers	URL
BLAST (7)	run internally
RPS-BLAST (8)	run internally
FFAS03 (23)	http://bioinfo.pl/meta/
Meta-Basic (24)	http://bioinfo.pl/meta/
bas_c (24)	http://bioinfo.pl/meta/
bas_b (24)	http://bioinfo.pl/meta/
orfeus2 (25)	http://bioinfo.pl/meta/
SAM-T02 (26)	http://www.cse.ucsc.edu/compbio/HMM-apps/T02-query.html
mGenTHREADER (27)	http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
FUGUE (28)	http://tardis.nibio.go.jp/fugue/
SP³ (29)	http://sparks.informatics.iupui.edu/hzhou/anonymous-fold-sp3.html
inub (30)	http://inub.cse.buffalo.edu/
FORTE (31)	http://www.cbrc.jp/htbin/forte-cgi/forte_form.pl
HHpred (32)	http://toolkit.tuebingen.mpg.de/hhpred
PSIPRED (18)	http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
Pfam (6)	http://www.sanger.ac.uk/Software/Pfam/

The alignments from the initial BLAST and RPS-BLAST searches, as well as the alignments from the fold-recognition servers, are collected as they finish, and all-atom models are built using Pfrag. When the model building is finished, the quality of the models is assessed using Pcons (1,2,11). Pcons benefits from the use of as many individual servers as possible. Thus, it is important not to put too much weight on a consensus analysis that is based on the results from only a few servers. In parallel to the consensus analysis, the model quality is also assessed purely on the basis of structural features using ProQ (12). Both Pcons and ProQ give an overall quality score to each model as well as a local quality score to each individual residue (13). In CASP7, Pcons was one of the best methods for assessing the overall quality of protein models and the best method for assessing the local quality of residues (5). In summary, the major advances over other web servers are: (i) for PSI-BLAST detectable targets, a quite accurate homology model is generated within minutes; (ii) a query sequence with PSI-BLAST detectable domains is automatically parsed into domains; (iii) a novel approach to displaying alignment similarity makes it easy to quickly select the best model; and (iv) the overall as well as the local quality of the model is assessed using state-of-the-art methods.

SERVER INPUTS AND OUTPUTS

The server takes a protein sequence in one-letter amino acid format as input. The user has the possibility to name the sequence and to give their e-mail address. Both the name and e-mail address can be used to filter the results in the job queue (http://pcons.net/index.php?queue). Results for a specific job are provided through the web interface by clicking on the job id listed in the job queue table (Figure 2). This page is updated continuously as more predictions are finished. If an e-mail is provided the top 10 ranked model coordinates are e-mailed after 46 h. The 46 h time limit is set to allow for as many fold-recognition servers as possible to finish and provide the basis for the consensus analysis. However, if a significant hit indeed is found using the locally run RPS-BLAST, an accurate model should be ready within minutes of submission.
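As an illustration of the expected input, the following Python sketch normalizes a pasted sequence before submission. It is not the validation performed by the server, which is not described here; accepting an optional FASTA header line and restricting the alphabet to the 20 standard amino acids are assumptions, and the function name is hypothetical.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Strip FASTA header lines and whitespace, upper-case, and check the alphabet."""
    lines = [line.strip() for line in raw.splitlines()]
    # Assumption: lines starting with '>' are FASTA headers and are ignored.
    residues = "".join(line for line in lines if line and not line.startswith(">"))
    residues = "".join(residues.split()).upper()
    if not residues:
        raise ValueError("empty sequence")
    unexpected = sorted(set(residues) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(unexpected))
    return residues
```

A cleaned string of this kind is what would be pasted into the submission form.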

Figure 2.

An example of structure prediction results.

In addition to the web interface, the Pcons.net meta server will also be made available as a web service using the Web Service Description Language (WSDL) (14). The idea behind web services is to allow applications to communicate with each other in a standardized way. WSDL is used to describe the operations available at the service, and expresses the data formats using XML Schema definitions. Communication between web services and clients is done using SOAP (Simple Object Access Protocol) (15). For Pcons.net this will mean that a user who has access to a web service client, such as Taverna (16), will be able to make submissions to the meta server and also build these submissions into more complex analysis workflows.

ALIGNMENT REPRESENTATION

An additional novel feature is the representation of the different alignments (Figure 3), which enables a quick overview of the alignment quality and facilitates comparisons of many alternative alignments.

Figure 3.

Alignment representation that facilitates comparisons of many different alternative alignments.

The alignment is represented as a line that is color-coded according to secondary structure. For the template structure, STRIDE (17) is used to assign secondary structure based on the coordinates; for the target sequence, PSIPRED (18) is used to predict the secondary structure of each residue. Both the target and the template are represented as full-length sequences, making it possible to see which parts of the target and template are covered and whether the alignment spans only part of the template structure. Here, the user can also submit unaligned regions that did not fulfill the criteria for automatic domain resubmission (see above).

MODEL BUILDING

The model building based on the target–template alignment is performed using Pfrag, a reimplementation of the SegMod (19) homology modeling program. It builds models based on segment matching. By searching a database of highly refined protein structures, structural fragments are found that match the template structure as closely as possible. The criteria for evaluating individual fragments are the degree of amino acid sequence homology between the target and the template, the RMSD between a fragment and the template structure, and the Lennard–Jones interaction energy between fragments and the structure. Initial screening of fragments is done using the distance-matching methodology of Jones and Thirup (20). The all-atom models are then energy minimized using the ENCAD force field (21) to enforce proper stereochemistry.

QUALITY SCORES

A key component of any successful protein structure prediction protocol is the ability to assign quality scores to the created models. Pcons.net scores models using the best methods currently available. For each model three global quality scores are provided: one based on consensus (Pcons), one based solely on structure (ProQ) and one using a combination of the two (Pmodeller). All are presented in the job summary page. The reason for providing more than one score is that they contain complementary information. The Pcons score, for instance, is only meaningful if a sufficient number of models are available. If this is not the case, a structural evaluation using ProQ might be more suitable, and in other cases the ProQ score might be a useful aid in the process of choosing the best model. From a user perspective it is important to know when to trust a certain score. Based on results from the quality assessment category in CASP7 (5), the Pcons score correlates well with the correct quality of the models as measured by LGscore (22) (R = 0.96). Moreover, a Pcons score above 1.1 separates correct from incorrect models almost perfectly (only 2.5% false predictions). The ProQ and Pmodeller scores are predicted LGscores, and values above 1.5 correspond to P-values better than 10⁻³. In addition to the global quality scores, each amino acid in the models is given an estimate of the CA–CA error as measured by the local S-score (S = 1/(1 + error²/5)). The S-score varies between 0 and 1, corresponding to high and low error, respectively; e.g. if the S-score is larger than 0.5 the error is predicted to be <2.24 Å (√5). The advantage of this type of score is that it focusses on the regions that have low error and gives the same score value for regions that are wrong. As for the global scores, the local quality is predicted using either consensus (Pcons) or structural features (ProQres). In terms of performance, Pcons is superior to ProQres (13). In fact, no non-consensus-based approach is nearly as good as consensus-based approaches (5). However, ProQres still provides some additional value as a complement when there is no clear consensus or when the consensus is weak. The local quality predictions are accessible by clicking either on the Pcons score or on the ProQ score in the job summary page (Figure 2). The local quality scores predicted by Pcons are also added to the B-factor column of all models for easy visualization in any coordinate viewing program (Figure 4).
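The relation between the local S-score and the predicted CA–CA error can be checked numerically. The short Python sketch below implements the S-score formula given in the text and its algebraic inverse, error = √(5(1/S − 1)); the function names are hypothetical and the code is an illustration, not part of Pcons.

```python
import math

D0_SQUARED = 5.0  # the constant 5 in S = 1 / (1 + error^2 / 5)


def s_score(error: float) -> float:
    """S-score for a CA-CA error given in Angstrom."""
    return 1.0 / (1.0 + error ** 2 / D0_SQUARED)


def error_from_s_score(s: float) -> float:
    """Invert the S-score: error = sqrt(5 * (1/S - 1)), defined for 0 < S <= 1."""
    if not 0.0 < s <= 1.0:
        raise ValueError("S-score must lie in the interval (0, 1]")
    return math.sqrt(D0_SQUARED * (1.0 / s - 1.0))
```

Setting S = 0.5 in the inverse gives error = √5 ≈ 2.24 Å, which is the threshold quoted above; an error of 0 Å gives S = 1, and S approaches 0 as the error grows.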

Figure 4.

Local quality prediction using Pcons. (A) Predicted quality plotted for each residue in the sequence. (B) The structure color-coded from red to blue using the predicted quality, corresponding to poor and good, respectively (picture made using PyMOL (33)). In this particular example, Pcons has identified a region around residue 100 and the C-terminus as incorrect. Although these two regions are far apart in sequence, they end up on the same side of the protein. Since the rest of the protein is correct, this suggests that the C-terminal residues make some interactions with residues in the other region that are not captured by this model. With this information it might be possible to improve the model.
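Because the local quality scores are stored in the B-factor column, they can be handled by any tool that understands the PDB format. The Python sketch below is an illustration and not the Pcons.net implementation: it overwrites the B-factor field (columns 61 to 66 of an ATOM record in the fixed-column PDB format) with a per-residue score. The function name, the example record and the score value are hypothetical, and a single-chain model is assumed so that residue numbers are unique.

```python
from typing import Dict

# Hypothetical ATOM record, built field by field to respect the PDB column layout.
EXAMPLE_RECORD = (
    f"ATOM  {1:5d} {' CA':<4s} {'ALA':>3s} {'A'}{100:4d}    "
    f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}{1.0:6.2f}{0.0:6.2f}"
)


def write_scores_to_bfactor(pdb_text: str, scores: Dict[int, float]) -> str:
    """Replace the B-factor field of ATOM records with a per-residue score."""
    out = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            residue_number = int(line[22:26])      # columns 23-26: residue number
            if residue_number in scores:
                line = line.ljust(66)              # make sure the field exists
                # columns 61-66 hold the B-factor (0-based slice 60:66)
                line = line[:60] + f"{scores[residue_number]:6.2f}" + line[66:]
        out.append(line)
    return "\n".join(out)


updated = write_scores_to_bfactor(EXAMPLE_RECORD, {100: 0.37})
```

A viewer that colors atoms by B-factor will then display the scores directly on the structure.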

THROUGHPUT

The throughput of Pcons.net depends to a large degree on the difficulty of the target. For easy targets, the meta server can easily handle more than 1000 requests per day. For harder targets, however, it can only handle about 50 requests per day, owing to the throughput of the external servers it uses. To avoid overloading the external servers, there is also a limit on the number of pending external server jobs the meta server can have. If this limit is reached, the meta server queues jobs locally until the number of pending jobs decreases.
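The queuing behavior described above can be sketched as a simple throttle in Python. This is a minimal illustration under stated assumptions, not the scheduler used by Pcons.net: the class name, its methods and the first-in, first-out release order are hypothetical, and the actual limit on pending jobs is not given in the text.

```python
from collections import deque
from typing import Deque, List, Set


class ExternalJobThrottle:
    """Forward jobs to external servers up to a limit; queue the rest locally."""

    def __init__(self, max_pending: int) -> None:
        self.max_pending = max_pending          # assumed configurable limit
        self.pending: Set[str] = set()          # jobs currently at external servers
        self.local_queue: Deque[str] = deque()  # jobs waiting for a free slot

    def submit(self, job_id: str) -> bool:
        """Return True if the job was forwarded, False if it was queued locally."""
        if len(self.pending) < self.max_pending:
            self.pending.add(job_id)
            return True
        self.local_queue.append(job_id)
        return False

    def finish(self, job_id: str) -> List[str]:
        """Mark a job as done and forward queued jobs into the freed slots."""
        self.pending.discard(job_id)
        released = []
        while self.local_queue and len(self.pending) < self.max_pending:
            next_job = self.local_queue.popleft()
            self.pending.add(next_job)
            released.append(next_job)
        return released
```

With a limit of two, a third submission is held back until one of the first two jobs finishes, at which point it is forwarded automatically.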

30 in total

1. Protein secondary structure prediction based on position-specific scoring matrices.

Authors: D T Jones
Journal: J Mol Biol Date: 1999-09-17 Impact factor: 5.469

2. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties.

Authors: J Shi; T L Blundell; K Mizuguchi
Journal: J Mol Biol Date: 2001-06-29 Impact factor: 5.469

3. CDD: a database of conserved domain alignments with links to domain three-dimensional structure.

Authors: Aron Marchler-Bauer; Anna R Panchenko; Benjamin A Shoemaker; Paul A Thiessen; Lewis Y Geer; Stephen H Bryant
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

4. Protein structure prediction and structural genomics.

Authors: D Baker; A Sali
Journal: Science Date: 2001-10-05 Impact factor: 47.728

5. Pcons: a neural-network-based consensus predictor that improves fold recognition.

Authors: J Lundström; L Rychlewski; J Bujnicki; A Elofsson
Journal: Protein Sci Date: 2001-11 Impact factor: 6.725

6. Can correct protein models be identified?

Authors: Björn Wallner; Arne Elofsson
Journal: Protein Sci Date: 2003-05 Impact factor: 6.725

7. 3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor.

Authors: Daniel Fischer
Journal: Proteins Date: 2003-05-15

8. Taverna: a tool for the composition and enactment of bioinformatics workflows.

Authors: Tom Oinn; Matthew Addis; Justin Ferris; Darren Marvin; Martin Senger; Mark Greenwood; Tim Carver; Kevin Glover; Matthew R Pocock; Anil Wipat; Peter Li
Journal: Bioinformatics Date: 2004-06-16 Impact factor: 6.937

Review 9. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

10. Improvement of the GenTHREADER method for genomic fold recognition.

Authors: Liam J McGuffin; David T Jones
Journal: Bioinformatics Date: 2003-05-01 Impact factor: 6.937
