JPred4: a protein secondary structure prediction server.

Alexey Drozdetskiy¹, Christian Cole¹, James Procter¹, Geoffrey J Barton².

Abstract

JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials.

Year: 2015 PMID: 25883141 PMCID: PMC4489285 DOI: 10.1093/nar/gkv332

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Knowledge of a protein's three-dimensional structure is central to understanding the protein's detailed function. Although recent developments in structural biology (1–4) have led to an acceleration in the rate of three-dimensional structure determination by X-ray crystallography, nuclear magnetic resonance and 3D-EM techniques, in January 2015 there were still just 105 732 protein structures known (http://www.ebi.ac.uk/pdbe) (5) compared to almost 90 million sequences (http://www.ebi.ac.uk/uniprot/TrEMBLstats) (6). The routine use of massively parallel DNA sequencing technologies today means knowledge of protein sequences will continue to outpace structural biology for the foreseeable future. As a consequence, there is a need for accurate methods to predict structural and functional features from the amino acid sequence. Over the last 30 years, techniques to predict the three-state secondary structure of the protein (α-helix, β-strand and coil: i.e. all other states) have increased in accuracy from around 50% in 1983 (7) to over 80% today (8–11) which is close to the estimated maximum for prediction from multiple alignment (12). Although knowledge of the secondary structure alone is not as useful as a full three-dimensional model, secondary structure predictions provide important constraints for fold-recognition techniques (13–17) as well as in homology modelling (18,19), ab initio (20–24) and constraint-based tertiary structure prediction methods (25–27). Secondary structure predictions can also help in the identification of functional domains and may be used to guide the rational design of site-specific or deletion mutation experiments.

Although hundreds of papers have been published describing methods for protein secondary structure prediction, three of the most widely used are JPred, PSIPRED and PredictProtein. JPred (v.3.0) (11) gave 81.5% three-state accuracy (Q3), PSIPRED v.3.0 (28) reported accuracy of 81.4%, while the current PSIPRED v.3.2 server, which includes a broad suite of prediction algorithms, quotes 81.6% (http://bioinf.cs.ucl.ac.uk/psipred). There is no recent blind prediction test for the PROFphd secondary structure prediction algorithm in the PredictProtein (29) secondary structure prediction method, though the earlier PROFsec reported 76% (30). In this paper we summarize the current performance and features of the upgraded JPred server (JPred4) which incorporates the secondary structure and solvent accessibility prediction program JNet v.2.3.1.
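The three-state accuracy (Q3) quoted above is commonly taken to be the percentage of residues whose predicted state matches the observed state. The following is a minimal sketch of that calculation, not code from JPred or JNet; the function name `q3_score`, the single-letter encoding (H for α-helix, E for β-strand, - for coil) and the example strings are assumptions made for illustration.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the predicted state equals the observed state.

    Both strings are assumed to use one character per residue:
    'H' (alpha-helix), 'E' (beta-strand) or '-' (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 16-residue example: two positions disagree.
predicted = "--HHHHHH--EEEE--"
observed = "--HHHHH---EEEEE-"
print(f"Q3 = {q3_score(predicted, observed):.1f}%")
```

A benchmark figure such as those quoted for the servers above would be an average of this per-sequence score over a test set; how each server averages is not specified here.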

MATERIALS AND METHODS

The basic usage pattern for JPred4 is the same as for JPred3 (11). The user can submit a single protein sequence, a multiple sequence alignment (MSA) or a batch of single protein sequences for prediction. Results are returned either interactively through a web page or as a summary email that directs the user to results on the JPred4 website. The look and feel of the JPred4 web server has been changed significantly compared to JPred3 by embracing contemporary web technologies, the Bootstrap framework (www.getbootstrap.com/) and custom JavaScript. These changes allow smoother user interaction through the use of ‘tooltips’ that pop up to present help on each option in an easy-to-read form without the need to leave the page. The Bootstrap framework provides a modern look and feel to the website as well as improving usability on devices such as tablets and phones with different screen sizes and resolutions. Figure 1 illustrates the appearance of the advanced submission page showing the use of tooltips to get help about each option. As well as updates to the help pages, step-by-step tutorials with screenshots are a new addition that helps users to obtain maximum benefit from the JPred4 server.

Figure 1.

(1) Screenshot of the JPred4 job submission page with single sequence submission field (2) and an example of a tool-tip message (3). Advanced options are opened on request (4) and include input file upload, format selection (5) as well as optional email and query name fields (6). (7) Job progress page with access to the detailed job run log file (8).

Prediction algorithm

As with JPred3, JPred4 makes secondary structure and residue solvent accessibility predictions by the JNet algorithm (11,31). However, in JPred4, the JNet 2.0 neural network-based predictor has been retrained to make JNet 2.3.1 by 7-fold cross-validation using one representative for each of the 1358 SCOPe/ASTRAL v.2.04 superfamily domain sequences (32). Multiple alignments for each sequence were built by PSI-BLAST (33) through searching UniRef90 v.2014_07 (34). In addition to retraining, the HMM building step in JNet was updated to HMMer 3 (35) and some improvements were made to the code to simplify management and future algorithmic developments. The final accuracy of JNet 2.3.1 was assessed in a blind test on 150 sequences from 150 superfamilies not used in training. The 150 superfamily sequences were selected to reproduce a similar distribution of secondary structure compositions as the training structures in order to avoid biasing the reported accuracy of the blind test results. On the blind test, the average secondary structure prediction Q3 score increased to 82.0% from 81.5% for JNet v.2.0, and solvent accessibility prediction accuracy rose to 90.0, 83.6 and 78.1% from 88.9, 82.4 and 77.8% for JNet v.2.0 for each of >0, >5 and >25% relative solvent accessibility thresholds.
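The solvent accessibility figures above are reported at three relative solvent accessibility (RSA) thresholds. One way to read this is as three separate buried/exposed classifications, each scored by simple agreement. The sketch below illustrates that reading only; the function names, the RSA values and the predicted labels are hypothetical and are not taken from JNet.

```python
from typing import List, Sequence

# Relative solvent accessibility cut-offs in percent, as quoted in the text.
THRESHOLDS = (0.0, 5.0, 25.0)


def exposure_labels(rsa_percent: Sequence[float], threshold: float) -> List[bool]:
    """Label each residue True (exposed) if its RSA exceeds the threshold, else False (buried)."""
    return [rsa > threshold for rsa in rsa_percent]


def two_state_accuracy(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Percentage of residues whose predicted buried/exposed label matches the observed label."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    agree = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * agree / len(observed)


# Hypothetical observed RSA values (percent) for four residues.
observed_rsa = [0.0, 3.0, 12.0, 40.0]

# Hypothetical predictions, one list of buried/exposed calls per threshold.
predicted_exposed = {
    0.0: [False, True, True, True],
    5.0: [False, True, True, True],  # second residue is wrongly called exposed
    25.0: [False, False, False, True],
}

accuracy = {
    t: two_state_accuracy(predicted_exposed[t], exposure_labels(observed_rsa, t))
    for t in THRESHOLDS
}
for t in THRESHOLDS:
    print(f"RSA > {t:g}%: {accuracy[t]:.1f}% correct")
```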

JPred4 results reporting

JPred3 has been widely used in teaching and integrated into many bioinformatics pipelines across the world. Accordingly, in order to maintain support for legacy courses and scripts, the results options in JPred4 include all the original formats and styles (PDF, HTML, etc.) as well as the intermediary processing files. In addition to these outputs, JPred4 reports have been enhanced to include more visualization options and to present a complete picture of the alignment generated for prediction including all insertions. Figure 2 summarizes the main results page while Figure 3 shows examples of summary emails returned to a user for single or batch sequence submissions. Unlike previous versions of JPred, the primary visualization of a JPred4 prediction result is a scrollable SVG image. The SVG is generated by Jalview 2.9 (www.jalview.org) (36) run in command-line mode as part of the JPred4 web server processing pipeline so users do not need to run Jalview on their own computers. However, the JalviewLite Java applet result page is still provided for users working with Java-enabled browsers who prefer direct access to Jalview's sophisticated functions.

Figure 2.

JPred4 results summary page (1) with the results of predictions presented in SVG (2). Links to detailed and simple reports in coloured HTML/PS/PDF formats (3). Example summary in HTML format is shown in (4) as well as the new addition of full multiple sequence alignments with and without gaps/insertions (5). On a separate linked page the user is able to run the Jalview applet (6) which allows a more sophisticated and interactive method of viewing the prediction results. Links to all the details for the prediction and an archive of the results are also available (7).

Figure 3.

(1) Illustration of a single sequence job submission secondary structure prediction results summary email with link to full result details (2). (3) Illustration of a batch submission email summary with overall and per job (4) details that give links to individual predictions and an archive with all results for all sequences submitted in the batch.

In all previous versions of JPred, the alignment returned showed the full-length query sequence without gaps necessary to accommodate insertions in sequences returned from the PSI-BLAST search. JPred4 introduces options to view the full multiple alignment including all residues in all sequences or download it for further analysis. For users who have local installations of Jalview (36), Jalview feature files are provided to allow easy annotation and analysis of the alignment and predictions.

In JPred3, a batch job with multiple query sequences would return separate emails for each query. JPred4 condenses these messages into a single email with a summary of success/failure for each sequence (Figure 3) in the batch and a compressed archive of all the predictions. All JPred4 jobs are currently stored on the server for 5 days.
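The difference between the two alignment views described above can be illustrated with a small sketch: the query-anchored view keeps only the columns where the query has a residue, so insertions in the other sequences are dropped, while the full view keeps every column. This is an illustration under assumptions, not the JPred4 implementation; the function name `collapse_to_query`, the use of `-` as the gap character and the toy alignment are all hypothetical.

```python
from typing import Dict


def collapse_to_query(alignment: Dict[str, str], query_id: str) -> Dict[str, str]:
    """Keep only alignment columns in which the query sequence is not a gap.

    `alignment` maps sequence names to equal-length aligned strings that use
    '-' for gaps. Residues inserted relative to the query are discarded.
    """
    query = alignment[query_id]
    if any(len(seq) != len(query) for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i, residue in enumerate(query) if residue != "-"]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Hypothetical full alignment: hit1 carries insertions relative to the query.
full_alignment = {
    "query": "MK-TAY--IA",
    "hit1": "MKLTAYGGIA",
    "hit2": "MR-SAY--LA",
}

query_view = collapse_to_query(full_alignment, "query")
for name, seq in query_view.items():
    print(f"{name:>5} {seq}")
```

In this toy case the collapsed view loses the residues L, G and G from hit1, which is the information the full alignment view retains.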

Time required to complete predictions

The median time for a JPred4 prediction to return results is 5 min calculated over a recent 50 000 consecutive predictions performed by end-users in the autumn of 2014. However, the server can accommodate jobs of up to 3-h duration. Most of the time is spent in the PSI-BLAST search phase which is avoided if the user submits a pre-existing MSA. MSA predictions typically return results within a few seconds.

In summary, the JPred server has been upgraded to provide a richer user experience and to include more accurate secondary structure and solvent accessibility predictions from the JNet 2.3.1 algorithm.

REFERENCES (36 in total; 10 listed)

1. Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training.

Authors: Ofer Dor; Yaoqi Zhou
Journal: Proteins Date: 2007-03-01

2. Jalview Version 2--a multiple sequence alignment editor and analysis workbench.

Authors: Andrew M Waterhouse; James B Procter; David M A Martin; Michèle Clamp; Geoffrey J Barton
Journal: Bioinformatics Date: 2009-01-16 Impact factor: 6.937

3. Protein annotation and modelling servers at University College London.

Authors: D W A Buchan; S M Ward; A E Lobley; T C O Nugent; K Bryson; D T Jones
Journal: Nucleic Acids Res Date: 2010-05-27 Impact factor: 16.971

4. High-resolution structure prediction and the crystallographic phase problem.

Authors: Bin Qian; Srivatsan Raman; Rhiju Das; Philip Bradley; Airlie J McCoy; Randy J Read; David Baker
Journal: Nature Date: 2007-10-14 Impact factor: 49.962

5. Three-dimensional structures of membrane proteins from genomic sequencing.

Authors: Thomas A Hopf; Lucy J Colwell; Robert Sheridan; Burkhard Rost; Chris Sander; Debora S Marks
Journal: Cell Date: 2012-05-10 Impact factor: 41.582

6. HMMER web server: interactive sequence similarity searching.

Authors: Robert D Finn; Jody Clements; Sean R Eddy
Journal: Nucleic Acids Res Date: 2011-05-18 Impact factor: 16.971

7. Protein 3D structure computed from evolutionary sequence variation.

Authors: Debora S Marks; Lucy J Colwell; Robert Sheridan; Thomas A Hopf; Andrea Pagnani; Riccardo Zecchina; Chris Sander
Journal: PLoS One Date: 2011-12-07 Impact factor: 3.240

8. The Jpred 3 secondary structure prediction server.

Authors: Christian Cole; Jonathan D Barber; Geoffrey J Barton
Journal: Nucleic Acids Res Date: 2008-05-07 Impact factor: 16.971

9. Activities at the Universal Protein Resource (UniProt).

Authors:
Journal: Nucleic Acids Res Date: 2013-11-18 Impact factor: 16.971

10. PDBe: Protein Data Bank in Europe.

Authors: Aleksandras Gutmanas; Younes Alhroub; Gary M Battle; John M Berrisford; Estelle Bochet; Matthew J Conroy; Jose M Dana; Manuel A Fernandez Montecelo; Glen van Ginkel; Swanand P Gore; Pauline Haslam; Rowan Hatherley; Pieter M S Hendrickx; Miriam Hirshberg; Ingvar Lagerstedt; Saqib Mir; Abhik Mukhopadhyay; Thomas J Oldfield; Ardan Patwardhan; Luana Rinaldi; Gaurav Sahni; Eduardo Sanz-García; Sanchayita Sen; Robert A Slowley; Sameer Velankar; Michael E Wainwright; Gerard J Kleywegt
Journal: Nucleic Acids Res Date: 2013-11-27 Impact factor: 16.971
