Web-based toolkits for topology prediction of transmembrane helical proteins, fold recognition, structure and binding scoring, folding-kinetics analysis and comparative analysis of domain combinations.

Hongyi Zhou, Chi Zhang, Song Liu, Yaoqi Zhou.

Abstract

We have developed the following web servers for protein structural modeling and analysis at http://theory.med.buffalo.edu: THUMBUP, UMDHMM(TMHP) and TUPS, predictors of transmembrane helical protein topology based on a mean-burial-propensity scale of amino acid residues (THUMBUP), a hidden Markov model (UMDHMM(TMHP)) and their combination (TUPS); SPARKS 2.0 and SP3, two profile-profile alignment methods that match input query sequence(s) to structural templates by integrating the sequence profile with a knowledge-based structural score (SPARKS 2.0) or a structure-derived profile (SP3); DFIRE, a knowledge-based potential for scoring the free energy of monomers (DMONOMER), loop conformations (DLOOP), mutant stability (DMUTANT) and the binding affinity of protein-protein/peptide/DNA complexes (DCOMPLEX and DDNA); TCD, a program for protein-folding rate and transition-state analysis of small globular proteins; and DOGMA, a web server that allows comparative analysis of domain combinations between plants and 55 other organisms. These servers provide tools for the prediction and/or analysis of proteins at the secondary structure, tertiary structure and interaction levels.

Year: 2005 PMID: 15980453 PMCID: PMC1160121 DOI: 10.1093/nar/gki360

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

BACKGROUND

In the post-genomics era, attention is squarely focused on the interconnections between the sequences, structures and functions of proteins. As more sequences from genome-sequencing projects and more structures from structural-genomics projects become available, tools are urgently needed to extract the maximum amount of information from them in order to analyze and predict unknown structures and functions. We present a number of web-based servers, listed in Table 1. They are THUMBUP, UMDHMMTMHP and TUPS for topology prediction of transmembrane helical proteins (1); SPARKS 2.0 (2) and SP3 (3) for sequence-to-structure fold recognition and alignment; the DFIRE energy function (4) for scoring structural monomers (DMONOMER) and loop conformations (DLOOP) (5), and for predicting mutant stability (DMUTANT) (4) and the binding affinity of protein–protein/peptide complexes (DCOMPLEX) (6) and protein–DNA complexes (DDNA) (7); TCD for the analysis of folding kinetics (8,9); and DOGMA for comparative analysis of the plant domain graph (10). These servers can be classified as tools for the prediction and analysis of the secondary structures, tertiary structures and interactions of proteins, as shown in Figure 1. Details are described below.

Table 1

List of web-based toolkits on the services section of the website:

Name (reference)	Input (a)	Output
TM helical topology (secondary structure level)
THUMBUP (1)	Sequence	TMH residue ranges
		N-terminal orientation (in or out)
UMDHMM^TMHP (1)	As above	As above
TUPS	As above	As above
Fold recognition, alignment and structure prediction (tertiary structure level)
SPARKS 2.0 (2)	Sequence	Sequence-to-structure alignment
	No. of models to be built	Models built (in PDB format)
SP3 (3)	As above	As above
Application of DFIRE energy function (interaction level)
DMONOMER (4)	Structure file	Conformation energy score
DLOOP (5)	Structure file	Conformation energy score
	Loop location
DMUTANT (4)	Structure file	Stability change
	Residue mutated
DCOMPLEX (6)	Complex structure file	Binding affinity
	Two chain IDs
DDNA (7)	Complex structure file	Binding affinity
	Two chain IDs
Protein folding kinetics (interaction level)
TCD (8,9)	Structure file	TCD, folding rate, transition-state size
	Chain ID
	Residue range
Domain graph analysis (interaction level)
DOGMA (10)	Organism name	Comparative domain graph
	List of domain names	Shortest path between domains
		Phylogenetic profiling of domain/combination
		Topology analysis of domain graph

(a) The formats for sequence and structural inputs are FASTA and PDB, respectively.

Figure 1

The classification of the web servers.

THUMBUP, UMDHMMTMHP AND TUPS

Overview

Communication between the inside and the outside of the cell membrane, and the regulation of that communication, is controlled mostly by transmembrane (TM) proteins. Most TM proteins are helical (TMH) proteins. Many different methods have been developed to predict the topology of TMH proteins (11–13). Determining the topology of a TMH protein is useful for annotating its function.

Description

THUMBUP uses a simple scale of burial propensity and a sliding-window algorithm to predict TM helical segments, and the positive-inside rule (14) to predict the N-terminal orientation. The use of burial propensity is based on the fact that helical membrane proteins are packed more tightly than helical soluble proteins (15). THUMBUP gives an excellent prediction for TM proteins with known structures (the 3D_helix database), but a relatively poorer prediction for the 1D_helix database, in which topology information was obtained by gene fusion and other experimental techniques (1). The latter was attributed in part to the high inaccuracy of the 1D_helix database employed (16–18).

UMDHMMTMHP uses a modified version of the hidden Markov model software developed at the University of Maryland (version 1.02) for transmembrane-helical-topology prediction. The program differs from typical HMM-based methods for TMH proteins in that its parameters were trained on the 3D_helix database only.

TUPS combines the predictions of THUMBUP and UMDHMMTMHP for TM segments with PHOBIUS (19) for the identification of signal peptides. More specifically, TUPS first takes the results from UMDHMMTMHP. Then, if a TM segment predicted by THUMBUP does not overlap with any TM segment predicted by UMDHMMTMHP, that segment is included in the TUPS prediction. Finally, signal peptides identified by PHOBIUS are removed from the TUPS prediction. TUPS introduces no parameters beyond those determined in THUMBUP and UMDHMMTMHP.
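The three-step TUPS combination rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server's code: segments are represented as inclusive (start, end) residue ranges, the function name `combine_tups` is hypothetical, and "removing" a signal peptide is assumed to mean dropping any predicted segment that overlaps it.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue range, 1-based


def overlaps(a: Segment, b: Segment) -> bool:
    """Return True if two inclusive residue ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def combine_tups(
    hmm_segments: List[Segment],
    thumbup_segments: List[Segment],
    signal_peptides: List[Segment],
) -> List[Segment]:
    """Combine two TM-segment predictions following the rule in the text."""
    # Step 1: start from the UMDHMMTMHP prediction.
    combined = list(hmm_segments)
    # Step 2: add each THUMBUP segment that overlaps no UMDHMMTMHP segment.
    for segment in thumbup_segments:
        if not any(overlaps(segment, h) for h in hmm_segments):
            combined.append(segment)
    # Step 3 (assumed removal rule): drop segments that overlap a signal peptide.
    combined = [
        s for s in combined
        if not any(overlaps(s, sp) for sp in signal_peptides)
    ]
    return sorted(combined)


# Toy input, invented for illustration only.
hmm = [(5, 24), (60, 80)]
thumbup = [(58, 79), (120, 140)]
signal = [(1, 22)]
prediction = combine_tups(hmm, thumbup, signal)
```

In this toy case the THUMBUP segment (58, 79) is ignored because it overlaps an HMM segment, (120, 140) is added, and (5, 24) is discarded as a signal peptide.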

Performance

In addition to the 3D and 1D helix datasets tested in the original paper (1), we tested THUMBUP and UMDHMMTMHP in the static benchmark established by Kernytsky and Rost (20). UMDHMMTMHP and THUMBUP, without any modification, provide 86% and 80% per-segment accuracy on the high-resolution dataset, respectively. These performances ranked #1 and #3, respectively, among the methods compared in the static benchmark. Their performance on the low-resolution dataset was only about average, as expected. The new TUPS server provides 88% per-segment accuracy on the high-resolution dataset in this benchmark, with a significantly lower rate of misidentifying signal peptides as TM helices (3 versus 70 for UMDHMMTMHP and 28 for THUMBUP). TUPS also provides substantially better per-topology accuracy on our 3D_helix test set (1) (86% versus 75% by THUMBUP and 78% by UMDHMMTMHP).

Input and output

The input is a protein sequence in FASTA format. Multiple sequences can also be submitted. For every protein submitted, the output gives the residue ranges of the TM helices (if any) and, if the protein is a TMH protein, the N-terminal orientation (inside or outside of the membrane). The output is now reported in a table format for easy understanding. A graphical interface for visualizing the TM region will be built in the near future. Sample input and output with detailed line-by-line explanations are available online.
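For readers preparing a multi-sequence submission, the following minimal sketch shows how a FASTA text can be split into individual records before upload. The function name `parse_fasta` and the two example records are hypothetical; the servers' own input validation is not described in the text and may differ.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = line[1:].strip()
            records[current] = ""
        elif current is not None:
            # Sequence lines may be wrapped; join them and normalize case.
            records[current] += line.upper()
    return records


# Invented example input with two records.
example = """>protA hypothetical example
MKTLLVAG
LLAVS
>protB
mgsw
"""
sequences = parse_fasta(example)
```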

SPARKS 2.0 AND SP3

Fold recognition refers to the recognition of structural similarity between two proteins with or without significant sequence identity. One way to detect structural similarity is to identify remote sequence homology via sequence comparison. Advances have been made from pairwise to multiple sequence comparison, and from sequence-to-sequence through sequence-to-profile to profile-to-profile comparison. Another way to detect structural similarity is via sequence-to-structure threading. More recent work attempts to combine sequence and structure information optimally for more accurate and sensitive fold recognition. For a recent review, see Ref. (21).

Both fold recognition servers, SPARKS 2.0 (2) and SP3 (3), belong to the profile-based methods that provide sequence-to-structure alignment based on the sequence as well as the structure information of templates. SPARKS 2.0 and SP3 differ in how structural information is integrated with the sequence profile of templates. The former uses a sophisticated knowledge-based, single-body score that includes torsion, contact energy and surface-accessible potentials. The structure score is calculated by threading the query sequence into the template structure. The latter builds two separate sequence profiles from the sequence and the structure of a template. The structure-derived sequence profile is derived from depth-dependent structural alignment of the fragments in the template structure with the fragments in a fragment library. SPARKS 2.0, an upgraded version of SPARKS (2), takes its methods for parameter optimization, dynamic programming and template ranking from SP3 (3). Both SPARKS 2.0 and SP3 automatically update their template and sequence libraries every week, based on new releases from the PDB (structures) and NCBI (sequences), respectively.

Testing on various benchmarks including LiveBench (22) indicates that SP3 is slightly more accurate than SPARKS 2.0. SPARKS 2.0 and SP3 were the two best servers for comparative modeling targets and were among the top single-method servers for all targets in the CASP 6 meeting, which assessed 49 automatic web servers.

The input for both SPARKS 2.0 and SP3 is the query sequence in FASTA format and the number of structure models to be built from the top-ranked templates. The structure models are built by MODELLER (23). It usually takes 30 min to a few hours to complete the fold recognition of a sequence, depending on the size of the query protein and the load on the server computer. The output (in HTML format) contains links to the PSI-BLAST output for the sequence profile, the PSIPRED output for the secondary structure prediction, the top 10 sequence-to-structure alignments and the structure models (in PDB format) built from the alignments. The significance of each sequence-to-structure alignment is indicated by its Z-score. An alignment is significant if the Z-score is >5.6 for SPARKS 2.0 and >6.3 for SP3. The thresholds were based on LiveBench 8 (22) for predicted models with a MaxSub score (24) >0.01 when compared to their respective native structures. The output is now reported in a table format for easy understanding. Sample input and output with detailed line-by-line explanations are available online.
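The Z-score significance rule can be applied to a list of reported alignments with a short helper like the one below. Only the two thresholds come from the text; the function names, the (template, Z-score) tuple layout and the template identifiers are assumptions made for illustration and do not reflect the servers' actual output format.

```python
from typing import Dict, List, Tuple

# Significance thresholds quoted in the text (strictly greater than).
Z_THRESHOLDS: Dict[str, float] = {"SPARKS 2.0": 5.6, "SP3": 6.3}


def is_significant(method: str, z_score: float) -> bool:
    """Return True if an alignment's Z-score exceeds the method's threshold."""
    return z_score > Z_THRESHOLDS[method]


def significant_hits(
    method: str, alignments: List[Tuple[str, float]]
) -> List[Tuple[str, float]]:
    """Keep only the (template, Z-score) pairs that pass the threshold."""
    return [(t, z) for t, z in alignments if is_significant(method, z)]


# Invented template identifiers and scores, for illustration only.
hits = [("tmplA", 9.8), ("tmplB", 6.0), ("tmplC", 4.1)]
sparks_hits = significant_hits("SPARKS 2.0", hits)
sp3_hits = significant_hits("SP3", hits)
```

The same Z-score of 6.0 counts as significant for SPARKS 2.0 but not for SP3, which is why the method name has to travel with the score.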

DFIRE ENERGY-BASED SERVERS

One bottleneck in solving the problems of how proteins fold, bind and function is the lack of an accurate energy function. The energy functions currently used by the computational biology community are obtained through either a physical-based (25) or a 'bioinformatics-based' statistical approach (26). Statistical energy functions are easy to produce and have proven effective in many applications. Our group developed an all-atom statistical potential based on a new reference state named Distance-scaled, Finite, Ideal-gas REference (DFIRE). The DFIRE-based energy function has been successfully applied to structure (4) and docking selections (6), loop scoring (5), prediction of mutation-induced change in stability (4), and prediction of the binding affinity of protein–protein (peptide) (6), protein–ligand (7) and protein–DNA complexes (7). These applications resulted in several servers: DMONOMER and DLOOP for scoring protein monomer and loop conformations, respectively; DMUTANT for predicting mutant stability; and DCOMPLEX and DDNA for predicting the binding affinities of protein–protein/peptide complexes and protein–DNA complexes, respectively.

The DFIRE energy function has been compared with other knowledge-based and physical-based energy functions. For example, it was found to be comparable in accuracy to some physical-based energy functions equipped with various state-of-the-art solvation models [illustrated in loop selection (5)] and to empirical energy functions with many adjustable terms [illustrated in docking (6) and prediction of protein–ligand binding affinities (7)]. The usefulness of the DFIRE energy-based servers was also independently verified in a study that used our web server to predict the stability of arc repressor mutants (27).

The input for DMONOMER, DCOMPLEX and DDNA is the atomic coordinates file in PDB format and the chain ID, while DLOOP needs the loop location as additional input. The outputs of these four servers are the corresponding DFIRE energy scores and/or binding affinities. DCOMPLEX also indicates whether the input complex is a genuine homodimer or a crystal artifact. The inputs for DMUTANT are the structure file, chain ID and residue position. The output is the stability change due to the mutation of the specified residue into each of the 19 other residues. Note that the binding affinities predicted by DCOMPLEX and DDNA were shifted and/or scaled based on the test sets used in the publications. Sample input and output with detailed explanations are available online for each server.
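To make the shape of the DMUTANT output concrete, the sketch below enumerates the 19 single-site substitutions for one residue position, using the common wild-type/position/mutant labeling (for example L45A). It does not compute any DFIRE energies; the function name and the labeling convention are assumptions for illustration, and the server's own report format may differ.

```python
from typing import List

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = tuple("ACDEFGHIKLMNPQRSTVWY")


def single_site_mutations(wild_type: str, position: int) -> List[str]:
    """List labels for mutating one residue into each of the 19 others."""
    wt = wild_type.upper()
    if wt not in AMINO_ACIDS:
        raise ValueError(f"not a standard amino acid code: {wild_type!r}")
    return [f"{wt}{position}{aa}" for aa in AMINO_ACIDS if aa != wt]


# Hypothetical example: leucine at position 45.
labels = single_site_mutations("L", 45)
```

Each label would correspond to one predicted stability change in a DMUTANT-style result.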

TCD

Our group developed a parameter called total contact distance (TCD) to predict the folding rates of small two-state proteins (8). This parameter was built on the observation that both the contact order (CO) and the long-range order (LRO) parameters correlate significantly with the logarithms of folding rates (28,29). The TCD web server takes as input the structure file, chain ID and residue range of interest for a specific protein. Its output is the calculated value of TCD as well as the predicted folding rate. The auxiliary TCD transition-state server presents the predicted TCD, the approximate size of the folding transition state of a given protein (9).
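The text does not spell out how TCD is computed, so the following is only a rough sketch under explicit assumptions: TCD is taken to be the sum of sequence separations |i - j| over all residue pairs in contact, divided by the square of the number of residues. The use of one representative coordinate per residue, the distance cutoff `r_cut` and the minimum sequence separation `l_cut` are illustrative choices, not the published parameters; see Ref. (8) for the actual definition.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def total_contact_distance(
    coords: Sequence[Point], r_cut: float = 6.0, l_cut: int = 1
) -> float:
    """Assumed TCD: sum of |i - j| over contacting pairs, divided by n**2.

    coords holds one representative point per residue (a simplification).
    A pair (i, j) is a contact if |i - j| > l_cut and distance <= r_cut.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("at least one residue is required")
    total_separation = 0
    for i in range(n):
        for j in range(i + 1, n):
            separation = j - i
            if separation > l_cut and math.dist(coords[i], coords[j]) <= r_cut:
                total_separation += separation
    return total_separation / (n * n)


# Toy four-residue chain: only residues 0 and 3 are close in space.
toy = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
tcd = total_contact_distance(toy)
```

Converting a TCD value into a predicted folding rate requires the fitted relation from the original paper, which is deliberately left out here.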

DOGMA

Proteins are made of functional domains. One effective method to uncover the function of proteins on a genomic scale is to analyze the network graph of domain–domain interactions (30). A domain graph consists of all domains found in a given proteome. Each vertex (node) represents a distinct domain, and two vertices are linked by an edge if they occur together in at least one protein. DOGMA is an online server that implements the CADO (Comparative Analysis of Protein Domain Organization) algorithms (31) and applies them to the comparative analysis of domain graphs between plants and 55 other organisms (9 eukaryotes, 30 bacteria and 16 archaea) (10). The input includes the name(s) of Pfam domain(s) (32) and the organism(s) to be compared with plants (with Arabidopsis taken as the representative). Depending on the option chosen, the output can be a domain graph, the shortest path between two given domains, a phylogenetic profile, and others, in both comparative and graphical formats. Although the original paper compares plant proteomes with other proteomes, DOGMA can be used to analyze any one proteome against the other 55.
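The domain-graph definition above (one vertex per domain, an edge whenever two domains occur in the same protein) and the shortest-path query translate directly into a small graph routine. The sketch below is a generic breadth-first search, not the CADO implementation; the function names and the toy protein-to-domain mapping are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set


def build_domain_graph(proteins: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Link two domains by an edge if they co-occur in at least one protein."""
    graph: Dict[str, Set[str]] = {}
    for domains in proteins.values():
        unique = sorted(set(domains))
        for domain in unique:
            graph.setdefault(domain, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_path(
    graph: Dict[str, Set[str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Toy proteome with invented protein and domain names.
toy_proteome = {
    "prot1": ["domA", "domB"],
    "prot2": ["domB", "domC"],
    "prot3": ["domC", "domD"],
    "prot4": ["domE"],
}
graph = build_domain_graph(toy_proteome)
path = shortest_path(graph, "domA", "domD")
```

A domain that never co-occurs with another, such as domE here, stays in the graph as an isolated vertex, so path queries involving it return None.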

29 in total

1. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19 Impact factor: 5.469

2. MPtopo: A database of membrane protein topology.

Authors: S Jayasinghe; K Hristova; S H White
Journal: Protein Sci Date: 2001-02 Impact factor: 6.725

Review 3. Comparative protein structure modeling of genes and genomes.

Authors: M A Martí-Renom; A C Stuart; A Fiser; R Sánchez; F Melo; A Sali
Journal: Annu Rev Biophys Biomol Struct Date: 2000

4. Comparison between long-range interactions and contact order in determining the folding rate of two-state proteins: application of long-range order to folding rate prediction.

Authors: M M Gromiha; S Selvaraj
Journal: J Mol Biol Date: 2001-06-29 Impact factor: 5.469

5. Folding rate prediction using total contact distance.

Authors: Hongyi Zhou; Yaoqi Zhou
Journal: Biophys J Date: 2002-01 Impact factor: 4.033

6. Comparison of helix interactions in membrane and soluble alpha-bundle proteins.

Authors: Markus Eilers; Ashish B Patel; Wei Liu; Steven O Smith
Journal: Biophys J Date: 2002-05 Impact factor: 4.033

7. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction.

Authors: Hongyi Zhou; Yaoqi Zhou
Journal: Protein Sci Date: 2002-11 Impact factor: 6.725

8. Transmembrane helix predictions revisited.

Authors: Chien Peter Chen; Andrew Kernytsky; Burkhard Rost
Journal: Protein Sci Date: 2002-12 Impact factor: 6.725

9. Static benchmarking of membrane helix predictions.

Authors: Andrew Kernytsky; Burkhard Rost
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

Review 10. Fold recognition methods.

Authors: Adam Godzik
Journal: Methods Biochem Anal Date: 2003
