
LOMETS: a local meta-threading-server for protein structure prediction.

Abstract

We developed LOMETS, a local threading meta-server, for quick and automated predictions of protein tertiary structures and spatial constraints. Nine state-of-the-art threading programs are installed and run on a local computer cluster, which ensures quick generation of the initial threading alignments compared with traditional remote-server-based meta-servers. Consensus models are generated from the top predictions of the component-threading servers, and are at least 7% more accurate than the best individual servers based on TM-score at a t-test significance level of 0.1%. Moreover, side-chain and Cα contacts of 42 and 61% accuracy, respectively, as well as long- and short-range distance maps, are automatically constructed from the threading alignments. These data can easily be used as constraints to guide ab initio procedures such as TASSER for further protein tertiary structure modeling. The LOMETS server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/LOMETS.


Year: 2007 PMID: 17478507 PMCID: PMC1904280 DOI: 10.1093/nar/gkm251

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The meta-server technique represents one of the major advances in protein tertiary structure prediction in recent years (1–4). It generates 3D structure predictions by taking consensus models from a variety of individual (mainly threading/fold-recognition) servers. Various benchmarking and blind-test experiments demonstrate that consensus meta-server predictions outperform the best individual threading server (5,6). There are, however, several drawbacks in current meta-servers. First, all the meta-servers, including 3D-Jury (2) and GeneSilico (4), take their initial threading inputs from remote computer servers installed in other laboratories. Because the available computer resources differ among laboratories, it is difficult to collect all the threading results from the individual servers quickly, which limits the usefulness of these meta-servers in large-scale protein structure prediction (7,8). In particular, some remote servers are occasionally shut down or become unavailable. In the 3D-Jury meta-server, for example, only one server, from FFAS03 (9), was available during the CASP7 season. The absence of sufficient initial threading inputs degrades the performance of the final meta-server results. The second drawback of current meta-servers is the instability of the algorithms of the remote servers. To achieve the best performance, a meta-server needs to balance various cutoff parameters for the selection and combination of the final models. This requires careful tuning and training of the meta-server algorithm against all the individual servers. However, inconsistent updates and modifications of the remote servers make it difficult to develop a steady and robust meta-server algorithm.

In this work, we developed a new meta-threading-server, LOMETS, in which all nine individual threading servers are installed locally. This allows us to control and tune our meta-server algorithms in a consistent manner, and allows users to obtain comprehensive predictions from all servers quickly. In addition to constructing the best possible 3D models, the LOMETS server also provides Cα and side-chain contact and distance map predictions, combined from all threading alignments. These constraints can be used to guide structure construction procedures such as MODELLER (10), ROSETTA (11) and TASSER (12) for generating protein tertiary models.

METHODS

Component threading programs in LOMETS

The LOMETS server takes predictions from nine different servers that represent a diverse set of state-of-the-art threading algorithms, i.e. FUGUE (13), HHSEARCH (14), PROSPECT2 (15), SAM-T02 (16), SPARKS2 (17), SP3 (18), PAINT, PPA-I and PPA-II. The first six programs were obtained from other laboratories and the last three were developed in our own lab. All nine servers are installed and run on our local computer cluster, with template libraries updated every week. The algorithms were selected to cover different threading methods. Here, we give a brief introduction to the methods.

FUGUE. FUGUE was developed at the Blundell Lab (13). It aligns the target sequence profile against template structural profiles collected from HOMSTRAD (19). A dynamic programming algorithm (20) is used to find the best sequence–structure match.

PROSPECT2. PROSPECT2 (15) was developed at the Xu Lab. It uses a score function that includes residue mutations, secondary structure propensity, solvent accessibility and a pairwise contact potential. A divide-and-conquer search (15) is used to find globally optimal alignments.

SPARKS2 and SP3. Both methods were developed at the Zhou lab (17,18). In SPARKS2 (17), the authors use a sequence profile–profile alignment combined with a single-body knowledge-based statistical potential; in SP3 (18), they replace the single-body potential of SPARKS2 with a residue depth-dependent structure profile. Both methods use dynamic programming for the sequence–structure alignment search.

SAM-T02. SAM-T02 (16) was developed at the Karplus lab. It starts from a PSI-BLAST sequence database search (21). Based on the PSI-BLAST multiple sequence alignment, a hidden Markov model (HMM) is constructed iteratively and then used to search the whole template library with the Viterbi algorithm (22).

HHSEARCH. HHSEARCH (14) was developed at the Söding lab. It aligns the profile HMM of the target with the profile HMMs of the templates by maximizing the log-sum-of-odds score.

PPA-I. PPA-I is a simple sequence Profile–Profile Alignment approach combined with secondary structure matches. The alignment score between the ith residue of the query sequence and the jth residue of the template structure is defined as

$$\mathrm{Score}(i,j) = \sum_{k=1}^{20} P_{\mathrm{query}}(i,k)\,L_{\mathrm{template}}(j,k) + c_1\,\delta\big(S_{\mathrm{query}}(i),S_{\mathrm{template}}(j)\big) + c_2 \qquad (1)$$

where Pquery(i, k) is the frequency of the kth amino acid at the ith position of the query sequence when a PSI-BLAST search of the query sequence is run against a non-redundant sequence database (ftp://ftp.ncbi.nih.gov/blast/db/nr.Z) with an E-value cutoff of 0.001; Ltemplate(j, k) is the log-odds profile of the template sequence in the PSI-BLAST search; Squery(i) is the secondary structure prediction from PSIPRED (23) for the ith residue of the query sequence and Stemplate(j) is the secondary structure assignment by DSSP (24) for the jth residue of the template; δ(Squery(i),Stemplate(j)) equals 1 if Squery(i) = Stemplate(j) and 0 otherwise. The weight factor c1 is an adjustable parameter that balances the profile term and the secondary structure matches; the shift constant c2 is introduced to avoid the alignment of unrelated regions in the local alignment (18). The Needleman–Wunsch (20) dynamic programming algorithm is used to find the best match between the query and template sequences. A position-dependent gap penalty is employed in the dynamic programming: no gap is allowed inside secondary structure regions; gap opening (go) and gap extension (ge) penalties apply to other regions; the ending gap penalty is neglected. The four parameters [i.e. c1 and c2 in Equation (1), and go and ge of the gap penalties in dynamic programming] are determined by trial and error on the ProSup benchmark (25).

PPA-II. PPA-II is also a profile–profile alignment algorithm. The only difference from PPA-I is that the sequence profiles in PPA-II are collected from SAM-T99 sequence alignments (26).
Here, we do not use SAM-T02 because we found that PPA-II with the SAM-T99 sequence profile generates slightly better alignments as judged by the average TM-score. During the construction of sequence profiles, Henikoff weights (27) are used to re-weight redundant sequences.

PAINT. PAINT is a PAirwise-Interaction-based Threading algorithm similar to RAPTOR (28). There are five terms in PAINT's energy function, which account for environment fitness, residue mutation, secondary structure match, pairwise interactions and gap penalty. A detailed description of the energy terms and the PAINT algorithm can be found in the Supplementary Data. Since the sequence–structure alignment is defined by the integer coefficients (x's) of the energy function, the goal of PAINT threading is to identify the set of integer coefficients that maximizes the total alignment score of Equation (S1). Under the constraints of Equations (S2–S5), the x's can be solved by the established integer programming package GLPK (http://www.gnu.org/software/glpk). Since integer programming is time-consuming for big proteins, we take only a subset of template proteins, consisting of the top 10 templates from each of the other eight threading servers. The average CPU time for the alignment of the 80 template proteins is around 5 min. There are two main differences between the PAINT and RAPTOR algorithms (28). First, for the identification of possible alignment positions, the alignment positions within the top 40% by energy score are considered, in order to reduce the chance of missing possible alignment positions. Second, rather than using an SVM as in RAPTOR, we use a simple scaled score, E/Lali, for the ranking of alignments, where E is the energy score and Lali is the number of aligned residues.
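The PPA-I score described above combines a profile–profile term, a secondary-structure match weighted by c1, and a shift constant c2. The following is a minimal sketch of that scoring step only, assuming the three terms are added together; the function names and the toy values of c1 and c2 are hypothetical (the text states only that the parameters were tuned on the ProSup benchmark), and the dynamic programming step is not shown.

```python
from typing import List, Sequence


def ppa1_score(p_query_i: Sequence[float],
               l_template_j: Sequence[float],
               ss_query_i: str,
               ss_template_j: str,
               c1: float,
               c2: float) -> float:
    """Score for aligning query residue i with template residue j.

    p_query_i:    amino-acid frequencies at query position i (one per residue type)
    l_template_j: log-odds profile values at template position j (same ordering)
    c1, c2:       weight and shift; the values are placeholders chosen by the caller
    """
    # Profile-profile term: sum over amino-acid types k of P_query(i,k) * L_template(j,k)
    profile_term = sum(p * l for p, l in zip(p_query_i, l_template_j))
    # Secondary-structure term: c1 if predicted and assigned states agree, else 0
    ss_term = c1 if ss_query_i == ss_template_j else 0.0
    return profile_term + ss_term + c2


def score_matrix(p_query: List[Sequence[float]],
                 l_template: List[Sequence[float]],
                 ss_query: str,
                 ss_template: str,
                 c1: float,
                 c2: float) -> List[List[float]]:
    """Full query-by-template score matrix, as would be fed to dynamic programming."""
    return [[ppa1_score(p_query[i], l_template[j], ss_query[i], ss_template[j], c1, c2)
             for j in range(len(l_template))]
            for i in range(len(p_query))]
```

A matrix built this way could then be passed to any Needleman–Wunsch implementation with the position-dependent gap penalties described in the text.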

Threading model selection

Models in LOMETS are selected from individual servers purely based on consensus, i.e. the structural similarity of the considered model to the other threading alignments. For the best performance, 30 models are taken from the top predictions of the nine servers sequentially from PPA-I, SP3, PPA-II, SPARKS2, PROSPECT2, FUGUE, HHSEARCH, PAINT and SAM-T02, where the order of the servers is based on their performance in independent test runs. The 30 models are taken as follows. First, the first model of PPA-I is selected, then the first model of SP3, and so on until the first models from all nine servers are collected. Then, the second models from the nine servers are collected in the same order. The collection proceeds until 30 models have been reached. During the collection, templates with very short alignments, i.e. those in which the number of aligned residues is less than a quarter of the query sequence length, are neglected. The consensus score of the ith of the 30 models is calculated as the average TM-score (29) to the other models:

$$\langle \text{TM-score} \rangle_i = \frac{1}{N-1} \sum_{j \neq i} \text{TM-score}(i,j), \qquad N = 30 \qquad (2)$$

We note that, when running the TM-score program with model_i and model_j, the TM-score is by default normalized by the length (L_j) of the second model (i.e. model_j). But in Equation (2) the TM-score should be uniformly normalized by the query sequence length (L). To do this, one can first run the TM-score program with the option ‘−d d0’, with $d_0 = 1.24\sqrt[3]{L-15} - 1.8$, to obtain TM-score(d0). The normalized TM-score can then be obtained as TM-score(d0) · L_j / L. Here, the purpose of the option ‘−d d0’ in the TM-score program is to assign the newly defined length scale d0 to the Levitt–Gerstein score (29). Finally, the models are ranked based on 〈TM-score〉, i.e. the models with a higher average TM-score to the other models are ranked higher.
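The selection procedure described above can be sketched as a round-robin collection followed by a consensus ranking. This is an illustrative sketch rather than the LOMETS code: the function names and input layout are hypothetical, short alignments are assumed to be simply skipped in their round, and the pairwise TM-scores are assumed to have been computed elsewhere (for example with the TM-score program) and already normalized by the query length.

```python
from typing import Dict, List, Sequence, Tuple

# Server order as given in the text
SERVER_ORDER = ["PPA-I", "SP3", "PPA-II", "SPARKS2", "PROSPECT2",
                "FUGUE", "HHSEARCH", "PAINT", "SAM-T02"]


def collect_models(ranked: Dict[str, List[Tuple[str, int]]],
                   query_len: int,
                   n_models: int = 30,
                   order: Sequence[str] = SERVER_ORDER) -> List[Tuple[str, str]]:
    """Collect models rank by rank, cycling through the servers in order.

    ranked maps a server name to its (template_id, n_aligned) list in rank order.
    """
    picked: List[Tuple[str, str]] = []
    max_rank = max((len(v) for v in ranked.values()), default=0)
    for rank in range(max_rank):
        for server in order:
            models = ranked.get(server, [])
            if rank >= len(models):
                continue
            template_id, n_aligned = models[rank]
            # Neglect alignments covering less than a quarter of the query
            if n_aligned < query_len / 4:
                continue
            picked.append((server, template_id))
            if len(picked) == n_models:
                return picked
    return picked


def consensus_rank(tm: List[List[float]]) -> Tuple[List[int], List[float]]:
    """Rank models by their average TM-score to all other models.

    tm[i][j] is the TM-score between models i and j, normalized by query length.
    Returns (indices sorted best first, average score per model).
    """
    n = len(tm)
    if n < 2:
        return list(range(n)), [0.0] * n
    avg = [sum(tm[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    order = sorted(range(n), key=lambda i: avg[i], reverse=True)
    return order, avg
```

With three toy models whose pairwise scores are 0.8, 0.2 and 0.3, the model that is on average closest to the other two is ranked first.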

Spatial constraints

For each protein, threading models are categorized as ‘good’ or ‘bad’ depending on whether the inherent Z-score (the energy in standard deviation units relative to the mean) of the alignment is above or below a threshold, Z-score_cut. The threshold is determined by minimizing the false positive rate (high Z-score but low TM-score) and the false negative rate (low Z-score but high TM-score) of each threading program on an independent benchmark set of 1489 non-redundant proteins (12). For PPA-I, SP3, PPA-II, SPARKS2, PROSPECT2, FUGUE, HHSEARCH, PAINT and SAM-T02, the Z-score_cut values are 8.2, 8.0, 7.0, 8.8, 4.0, 6.0, 11.0, 0.5 and 9.5, respectively. If the total number of ‘good’ models is more than nine (i.e. on average at least one ‘good’ model from each server), the target is defined as an ‘Easy’ target; if there is no ‘good’ model from any server, the target is a ‘Hard’ target; otherwise, it is a ‘Medium’ target. For Easy/Medium/Hard targets, the N (=20/30/50) most confident models are selected from the servers for constraint construction. The ‘good’ models and then the ‘bad’ models are taken in the sequential server order given above until N models are selected. The rationale for the choice of N is as follows. For ‘Easy’ targets, where good templates exist, about the top two (good) templates on average are taken from each program, because including more templates of bad quality would add noise to the good templates. For ‘Medium’ and ‘Hard’ targets, where good templates and constraints are generally lacking, we take more templates to enhance the consensus information, because even low-ranked templates usually contain some partially correct substructures that may be identified by consensus selection.
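The Easy/Medium/Hard classification above reduces to counting ‘good’ models against the per-server Z-score cutoffs. The sketch below uses the cutoffs and N values quoted in the text; the function names are hypothetical, a Z-score exactly at the cutoff is assumed to count as ‘bad’, and the number of models examined per server is left to the caller because the text does not state it.

```python
from typing import Dict, List

# Per-server Z-score cutoffs quoted in the text
Z_CUT: Dict[str, float] = {
    "PPA-I": 8.2, "SP3": 8.0, "PPA-II": 7.0, "SPARKS2": 8.8, "PROSPECT2": 4.0,
    "FUGUE": 6.0, "HHSEARCH": 11.0, "PAINT": 0.5, "SAM-T02": 9.5,
}

# Number of models used for constraint construction, by target class
N_BY_CLASS: Dict[str, int] = {"Easy": 20, "Medium": 30, "Hard": 50}


def count_good(zscores: Dict[str, List[float]]) -> int:
    """Count models whose Z-score is above the cutoff of their own server."""
    return sum(1 for server, zs in zscores.items() for z in zs if z > Z_CUT[server])


def classify_target(zscores: Dict[str, List[float]]) -> str:
    """Return 'Easy', 'Medium' or 'Hard' following the rule in the text."""
    n_good = count_good(zscores)
    if n_good > 9:       # "more than nine" good models
        return "Easy"
    if n_good == 0:      # no good model from any server
        return "Hard"
    return "Medium"
```

For a classified target, `N_BY_CLASS[classify_target(zscores)]` gives the number of models to take, ‘good’ ones first.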
Four types of spatial constraints are collected from the N selected threading alignments:

Side-chain contacts. A pair of side-chains is considered to be in contact if the distance between their centers of mass in the aligned templates is below an amino acid-specific cutoff, given by Equation (3) in terms of d(A, B) and Δ(A, B). Here d(A, B) was obtained by calculating the average distance between the side-chain centers of mass of contacting residues A and B, i.e. residues with at least one pair of heavy atoms in A and B closer than 4.5 Å, in 6379 non-homologous PDB structures. Δ(A, B) is the SD of d(A, B). The data for d(A, B) and Δ(A, B) can be seen at our website http://zhang.bioinformatics.ku.edu/LOMETS/sidechain_contact.txt. In the side-chain contact file of the LOMETS server, we list the identities of all the contacts with contact order ⩾5, as well as the confidence score, which is defined as the number of occurrences of the contact divided by the total number of templates that have both residues aligned.

Cα contacts. The Cα-contact file lists the identities of all predicted Cα pairs in contact with contact order ⩾5, together with the confidence score. A pair of Cα atoms is considered to be in contact if the distance between them is below 6 Å.

Long-range Cα distance map. This file contains the Cα-distances between residues i and i + j × 10 (i = 1, … ,L; j = 1,2, …), which are collected from the top four templates.

Short-range Cα distance map. This file contains the average Cα-distances between residues i and i + j (i = 1, … ,L; j = 2, … ,6), taken from all N templates. It includes only local structure information and can be used to guide the construction of protein-like secondary structure.
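As an illustration of the Cα-contact constraint and its confidence score, the following sketch counts, for each residue pair, how often the pair is within 6 Å among the templates in which both residues are aligned. The data layout (one dictionary per template mapping query residue number to Cα coordinates) and the function name are hypothetical, and "contact order ⩾5" is assumed here to mean a sequence separation of at least five residues.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def ca_contact_confidence(templates: List[Dict[int, Coord]],
                          cutoff: float = 6.0,
                          min_sep: int = 5) -> Dict[Tuple[int, int], float]:
    """Confidence of each predicted Calpha contact.

    templates: one dict per template, mapping query residue number to the
               Calpha coordinate of the aligned template residue.
    Returns {(i, j): occurrences / templates with both i and j aligned}
    for every pair that is in contact in at least one template.
    """
    hits: Dict[Tuple[int, int], int] = defaultdict(int)
    seen: Dict[Tuple[int, int], int] = defaultdict(int)
    for coords in templates:
        residues = sorted(coords)
        for a, i in enumerate(residues):
            for j in residues[a + 1:]:
                if j - i < min_sep:
                    continue
                seen[(i, j)] += 1          # both residues aligned in this template
                if math.dist(coords[i], coords[j]) < cutoff:
                    hits[(i, j)] += 1      # and within the contact cutoff
    return {pair: hits[pair] / seen[pair] for pair in hits}
```

The same counting scheme would apply to side-chain contacts, with the fixed 6 Å cutoff replaced by the amino acid-specific cutoff and the Cα coordinates replaced by side-chain centers of mass.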

RESULTS

For the testing of the LOMETS server, we select 620 non-homologous proteins (<25% sequence identity with lengths from 50 to 600) from PDBSELECT (2006 March) (30). A list of the 620 benchmark proteins and the threading results of all nine programs are available at http://zhang.bioinformatics.ku.edu/LOMETS/benchmark.html.

Threading alignment and consensus selections

In Figure 1, we present the threading results of the nine individual servers on the 620 benchmark proteins, where all homologous templates with sequence identity to targets >30% have been removed from the template library. Since all servers run locally, we could obtain the threading results quickly and the average CPU time for one target is less than 20 min in our computer cluster when we run them at nine nodes in parallel. There is an obvious correlation between the TM-score and the Z-score of each server. We also show the Z-score cutoff in each server in the plot. If we use TM-score ⩾0.5 (or <0.5) to define a correct (or wrong) threading model, the false negative and false positive rates of the Z-score cutoffs are: 0.0444 and 0.0622 (for PPA-I), 0.0515 and 0.0282 (for SP3), 0.0359 and 0.0597 (for PPA-II), 0.0829 and 0.0045 (for SPARKS2), 0.0602 and 0.0376 (for PROSPECT2), 0.0183 and 0.0447 (for FUGUE), 0.0339 and 0.0733 (for HHSEARCH), 0.0219 and 0.0193 (for PAINT) and 0.0154 and 0.0831 (for SAM-T02).
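The false negative and false positive rates quoted above can be computed from (Z-score, TM-score) pairs once a cutoff is fixed. The sketch below is an assumption-laden illustration: the function name is hypothetical, both rates are taken as fractions of all benchmark targets (the text does not state the denominator), and a Z-score exactly at the cutoff is treated as ‘bad’.

```python
from typing import List, Tuple


def cutoff_error_rates(pairs: List[Tuple[float, float]],
                       z_cut: float,
                       tm_cut: float = 0.5) -> Tuple[float, float]:
    """Return (false_negative_rate, false_positive_rate) for one server.

    pairs: one (z_score, tm_score) tuple per benchmark target.
    False negative: Z-score at or below the cutoff but TM-score >= tm_cut.
    False positive: Z-score above the cutoff but TM-score < tm_cut.
    """
    n = len(pairs)
    if n == 0:
        return 0.0, 0.0
    fn = sum(1 for z, tm in pairs if z <= z_cut and tm >= tm_cut)
    fp = sum(1 for z, tm in pairs if z > z_cut and tm < tm_cut)
    return fn / n, fp / n
```

Scanning `z_cut` over a grid and keeping the value that minimizes the sum of the two rates would be one way to choose a cutoff in the spirit of the Methods section, although the exact criterion used by the authors is not given here.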

Figure 1.

TM-score of threading alignments of nine component servers on 620 non-homologous proteins versus the Z-score, where Z-score is defined as the deviation of the inherent raw score from the mean divided by the SD. The vertical line in each box indicates the Z-score cutoff used to distinguish ‘bad’ and ‘good’ predictions.

In Table 1, we list the average TM-score, RMSD and alignment coverage of all threading programs on the 620 proteins. In parentheses, we also list the average TM-score and RMSD of the full-length models built by MODELLER v8.2 (10), where external constraints from LOMETS are incorporated. We found that MODELLER generates slightly better results when using the LOMETS spatial constraints than when run by default. Based on the average TM-score, the improvement is ∼0.8%. Besides the external constraint file, MODELLER has an option to include multiple templates, where MODELLER extracts constraints from the multiple templates by itself. By trial and error, we found that for ‘Easy’ targets the MODELLER program using up to five consensus templates (0.75

Table 1.

Summary of component-threading programs and the meta-server selections

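Threading servers or meta-servers	TM-score of threading alignments (MODELLER models): first model	TM-score of threading alignments (MODELLER models): best in top five models	RMSD (Å) of aligned residues (MODELLER models): first model	RMSD (Å) of aligned residues (MODELLER models): best in top five models	Coverage^a of threading alignments: first model	Coverage^a of threading alignments: best in top five models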
PPA-I	0.4001 (0.4117)	0.4389 (0.4531)	10.11 (16.66)	9.13 (14.02)	0.831	0.846
SP3	0.3991 (0.4138)	0.4391 (0.4551)	10.50 (13.86)	9.62 (12.83)	0.858	0.867
PPA-II	0.3900 (0.4076)	0.4306 (0.4512)	10.72 (14.89)	9.40 (13.02)	0.837	0.847
SPARKS2	0.3855 (0.3973)	0.4283 (0.4441)	11.62 (13.60)	10.03 (12.23)	0.895	0.893
PROSPECT2	0.3793 (0.3914)	0.4245 (0.4384)	12.19 (13.01)	10.68 (12.02)	0.903	0.903
FUGUE	0.3580 (0.3721)	0.4038 (0.4173)	10.78 (19.26)	10.30 (15.82)	0.827	0.872
HHSEARCH	0.3635 (0.3827)	0.4016 (0.4224)	6.92 (22.38)	6.44 (19.04)	0.607	0.643
PAINT	0.3558 (0.3758)	0.4045 (0.4210)	10.35 (15.74)	9.86 (14.21)	0.735	0.786
SAM-T02	0.3402 (0.3575)	0.3798 (0.3971)	10.19 (21.75)	9.83 (17.53)	0.721	0.777
LOMETS	0.4287 (0.4434)	0.4481 (0.4669)	10.18 (10.99)	9.49 (10.61)	0.890	0.882
PCONS5	0.4117 (0.4272)	0.4434 (0.4628)	10.03 (15.39)	9.14 (13.67)	0.840	0.852

aCoverage = length of aligned residues/length of target sequence.

Since the MODELLER models are longer than the threading alignments, the average TM-scores of the full-length MODELLER models are somewhat higher than those of the threading alignments, although the topology of the core regions is unchanged. The increase in TM-score ranges from 2.9% (PPA-I) to 5.6% (PAINT), depending on the threading alignment coverage. In general, the smaller the threading alignment coverage, the larger the increase in the TM-score of the MODELLER models, because more residues have been added in the full-length models.

Although both MODELLER (10) and I-TASSER (31) make use of consensus restraints from templates in their structure modeling, the improvement of I-TASSER models over the templates is much larger. In the recent CASP7 experiment, the average TM-score of the models generated by I-TASSER (‘Zhang-Server’) is 16.9% higher than that of the best template (32). Two factors may contribute to the difference. First, the I-TASSER force field includes a variety of knowledge-based, protein-sequence-specific and nonspecific potentials obtained from various sources (31), which have been optimized using structure decoys. Second, the conformational space of MODELLER is searched using a conjugate gradient algorithm, which is a local minimization method. The advantage of the conjugate gradient method is its quick convergence to a local minimum of the objective function. But if the external restraints differ from the initial templates, the method does not guarantee optimal satisfaction of all the constraints. In contrast, the conformational space in I-TASSER is searched by a parallel Monte Carlo sampling method (33), the goal of which is to identify the lowest free-energy state by global search. The Monte Carlo simulation of I-TASSER, however, takes much more CPU time than MODELLER does.
Since the major purpose of the LOMETS server is to provide a quick collection of the alignments and restraints from multiple local threading servers, we do not include the I-TASSER simulation here. A publicly available server for the I-TASSER algorithm is provided separately at our website: http://zhang.bioinformatics.ku.edu/I-TASSER.

At the bottom of Table 1, we show the results of the LOMETS consensus selections. The average TM-score of the first model in LOMETS is 0.4287, ∼7% better than that of the best individual server (PPA-I). This difference is statistically significant at the 0.1% level based on the t-test. The TM-score of the best in top five models of the LOMETS selection is shown in Column 3, and also outperforms the best individual server. The higher TM-score of the LOMETS models reflects a better balance of RMSD (Columns 4 and 5) and alignment coverage (Columns 6 and 7) in comparison with the individual servers. As a control, we also downloaded PCONS5, the newest version of the PCONS meta-server selection program by Wallner and Elofsson (34), which combines consensus analysis (by LGscore), structural evaluation and the inherent scores of the threading servers. The PCONS5 selection result is listed in the last row of Table 1. The PCONS5 selection is also better (∼3%) than the best individual server, but not better than LOMETS. This result seems to indicate that consensus analysis, which is the only factor used in LOMETS (through TM-score analysis), is the most robust factor in meta-server selection.
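The percentages quoted in this section follow directly from the first-model TM-scores in Table 1. The short check below reproduces them; the dictionary layout and the helper name are hypothetical, while the numbers are copied from Table 1.

```python
from typing import Dict, Tuple

# First-model TM-scores from Table 1: (threading alignment, MODELLER full-length model)
FIRST_MODEL_TM: Dict[str, Tuple[float, float]] = {
    "PPA-I": (0.4001, 0.4117),
    "PAINT": (0.3558, 0.3758),
    "LOMETS": (0.4287, 0.4434),
    "PCONS5": (0.4117, 0.4272),
}


def percent_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


best_single = FIRST_MODEL_TM["PPA-I"][0]

# Meta-server selections versus the best individual server (PPA-I)
lomets_gain = percent_gain(FIRST_MODEL_TM["LOMETS"][0], best_single)
pcons_gain = percent_gain(FIRST_MODEL_TM["PCONS5"][0], best_single)

# Gain from building full-length MODELLER models on top of the alignments
ppa1_modeller_gain = percent_gain(FIRST_MODEL_TM["PPA-I"][1], FIRST_MODEL_TM["PPA-I"][0])
paint_modeller_gain = percent_gain(FIRST_MODEL_TM["PAINT"][1], FIRST_MODEL_TM["PAINT"][0])

print(f"LOMETS vs PPA-I:  {lomets_gain:.1f}%")
print(f"PCONS5 vs PPA-I:  {pcons_gain:.1f}%")
print(f"MODELLER, PPA-I:  {ppa1_modeller_gain:.1f}%")
print(f"MODELLER, PAINT:  {paint_modeller_gain:.1f}%")
```

The four values round to 7.1, 2.9, 2.9 and 5.6%, in line with the ∼7%, ∼3%, 2.9% and 5.6% stated in the text.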

Spatial constraint predictions

The effect of spatial constraints on protein structure modeling is a tradeoff between the prediction accuracy (Acc) and the prediction coverage (Cov) (35). For the quantitative evaluation of the Cα and side-chain contact predictions, we define

$$\mathrm{Acc} = \frac{N_{\mathrm{corr}}}{N_{\mathrm{pred}}}, \qquad \mathrm{Cov} = \frac{N_{\mathrm{pred}}}{L} \qquad (4)$$

where Ncorr is the number of correctly predicted contacts, i.e. predicted contacts that are true contacts in the native structure based on the same distance cutoff as in Equation (3), Npred is the total number of predicted contacts and L is the length of the target sequence. In Figure 2a, we show the accuracy of predicted contacts versus the relative occurrence frequency with which the contacts occur in the models for the 620 testing proteins. Here the relative occurrence frequency of a contact is defined as N0/N, where N0 is the number of templates having the contact and N (=20/30/50) is the total number of selected threading templates. It is worth noting that the accuracy in Figure 2a is non-cumulative, i.e. the accuracy at frequency f is an average accuracy calculated in [f − 0.05, f + 0.05]. As expected, the more often a contact occurs, the more accurate it is, which indicates that the occurrence frequency can be considered as a confidence score for the contact prediction. In Figure 2b, we show how the prediction coverage decreases as the relative occurrence frequency increases.
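A minimal sketch of the accuracy and coverage calculation follows, assuming Acc is the fraction of predicted contacts that are native and Cov is the number of predicted contacts per residue of the target; this reading is inferred from the three quantities defined in the text (Ncorr, Npred, L) and is consistent with coverage values above 1 in Table 2. The function names and the set-based data layout are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def accuracy_coverage(predicted: Set[Pair],
                      native: Set[Pair],
                      seq_len: int) -> Tuple[float, float]:
    """Return (Acc, Cov) for one target.

    predicted: predicted contact pairs (i, j) with i < j
    native:    contact pairs observed in the native structure
    seq_len:   length L of the target sequence
    """
    n_pred = len(predicted)
    n_corr = len(predicted & native)           # predictions that are true contacts
    acc = n_corr / n_pred if n_pred else 0.0   # Acc = Ncorr / Npred
    cov = n_pred / seq_len                     # Cov = Npred / L (assumed form)
    return acc, cov


def filter_by_frequency(freq: Dict[Pair, float], min_freq: float) -> Set[Pair]:
    """Keep contacts whose relative occurrence frequency N0/N is at least min_freq."""
    return {pair for pair, f in freq.items() if f >= min_freq}
```

Raising `min_freq` before calling `accuracy_coverage` trades coverage for accuracy, which is the tradeoff plotted in Figure 2.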

Figure 2.

(a) Average accuracy of predicted Cα and side-chain contacts versus the relative occurrence frequency of the contacts in the LOMETS threading templates. (b) Coverage of the predicted contacts versus the relative occurrence frequency. For each frequency value (f), the data are calculated as an average within the bin [f − 0.05, f + 0.05].

As demonstrated in our previous study (35), an accuracy of side-chain contact constraints of >22% has a positive effect on ab initio protein structure modeling. This accuracy value corresponds to an occurrence frequency of ∼0.18 in Figure 2a. In Table 2, we list a summary of the Cα and side-chain contact predictions by LOMETS and its component threading programs with a confidence score ⩾0.18 (Columns 2–5). Here the constraints of a single threading program are collected from its top ten templates. Obviously, the spatial constraints from the consensus meta-server have much higher accuracy than those from the individual threading programs.

Table 2.

Summary of constraint predictions by LOMETS and the threading programs (with a relative occurrence frequency ⩾0.18 for contact predictions)

Threading servers or meta-servers	Acc_sc^a	Cov_sc^b	Acc_Cα^c	Cov_Cα^d	Dif_short^e (Å)	No_short^f	Dif_long^g (Å)	No_long^h
PPA-I	0.249	1.655	0.431	0.696	1.178	600.5	3.732	1159.5
SP3	0.239	1.713	0.405	0.712	1.220	612.3	3.817	1196.6
PPA-II	0.253	1.527	0.410	0.661	1.216	591.3	3.894	1173.0
SPARKS2	0.223	1.654	0.375	0.659	1.356	629.4	3.804	1203.8
PROSPECT2	0.236	1.599	0.411	0.653	1.219	631.7	3.591	1198.0
FUGUE	0.221	1.185	0.379	0.438	1.586	625.3	3.649	1175.7
HHSEARCH	0.359	0.842	0.528	0.404	1.024	357.1	4.111	743.7
PAINT	0.248	1.174	0.372	0.529	1.267	527.3	4.138	1103.2
SAM-T02	0.227	1.164	0.350	0.534	1.597	520.0	3.923	1019.4
LOMETS	0.421	0.910	0.607	0.405	1.186	632.7	3.455	1193.0

aACCsc: Average accuracy for side-chain center of mass contact predictions.

bCovsc: Average coverage for side-chain center of mass contact predictions.

cAccCα: Average accuracy for Cα atom contact predictions.

dCovCα: Average coverage for Cα atom contact predictions.

eDifshort: Average difference (Å) between native and predicted short-range Cα-distances.

fNoshort: Average number of predicted short-range Cα-distances.

gDiflong: Average difference (Å) between native and the best predicted long-range Cα-distances.

hNolong: Average number of the best predicted long-range Cα-distances.

In Column 6, we present the average differences between the native and the predicted distances for the short-range Cα distance maps of |i − j| < 7. Column 7 gives the average number of predicted short-range distance constraints. For the long-range Cα distance map of |i − j| ⩾ 10, we generate up to four predictions for each pair of residues. The eighth column shows the average error of the best predicted long-range Cα distance pairs and the ninth column gives the average number of long-range distance constraints. Because of the differences in accuracy and coverage of the threading alignments, the accuracy and number of distance constraints differ among the threading programs. For example, HHSEARCH has the highest accuracy of short-range distance constraints but the lowest number of them, because it produces no alignment in many uncertain regions. For a balance of the accuracy and number of distance constraints, the consensus LOMETS has obviously the highest accuracy with a reasonable number of distance constraints on the distance maps. The accuracy of the spatial constraints relies on the quality of the threading templates.
In Figure 3, we plot the histogram of the prediction accuracy of constraints for 200 ‘Easy’, 120 ‘Medium’ and 300 ‘Hard’ proteins separately. Obviously, for ‘Easy’ targets, the templates have better alignment quality and the accuracy of the constraints is higher than that of ‘Medium’ and ‘Hard’ targets (Figure 3a and c). Moreover, the number of aligned residues in ‘Easy’ targets is higher and the alignments by different servers are more consistent, which makes the prediction coverage in ‘Easy’ targets also higher than that of ‘Medium’ and ‘Hard’ targets (Figure 3b). Here, because of the fixed small Cα distance cutoff [<6 Å as used in TASSER modeling (12)], the coverage of Cα contacts is lower than that of side-chain center contacts. The average accuracy of Cα contacts is also higher than that of side-chain center contacts, which may be due to the fact that side-chain rotamers have more structural variation.

Figure 3.

The average result of spatial constraint predictions for ‘Easy’, ‘Medium’ and ‘Hard’ targets on 620 non-homologous proteins. (a) Accuracy of Cα and side-chain center contact predictions. (b) Coverage of Cα and side-chain center contact predictions. (c) Prediction error of short-range and the best long-range distance map.

SUMMARY

We have developed a quick and automated meta-server, LOMETS, for protein structure prediction. Unlike other on-line meta-servers, all nine component-threading servers are installed and run on our local computer cluster. The local installation of the servers greatly speeds up the coherent generation of initial threading alignments, and facilitates the development of a robust and well-tuned meta-server algorithm. The consensus prediction taken from the LOMETS servers is at least 7% more accurate than any of the individual servers. The difference is statistically significant at the 0.1% level by t-test. The average CPU time for a medium-size protein (∼200 residues) is less than 20 min when the programs are run in parallel on nine nodes of our cluster. In addition to the threading alignments, LOMETS also provides highly accurate contact and distance predictions for the query sequences. In our benchmark testing of 620 proteins, the average accuracy of side-chain center contacts is 0.42 with a coverage of 91%; the average accuracy of Cα contacts is 0.61 with a coverage of 41%. The average errors of the best long- and short-range distance map predictions are 3.5 and 1.2 Å, respectively. These data can easily be used as constraints to guide tertiary structure modeling procedures such as MODELLER (10), ROSETTA (11) and TASSER (12,36).

Last but not least, the template libraries of all nine servers are updated every week. We generate the template files on our local computers for SAM-T02, PROSPECT2, SPARKS2, SP3, PPA-I, PPA-II and PAINT. The template libraries for FUGUE and HHSEARCH are automatically downloaded from the authors’ websites (i.e. ftp://merlin.bioc.cam.ac.uk/pub/software/fugue/data and ftp://ftp.tuebingen.mpg.de/pub/protevo/HHsearch/databases/pdb70_*.hhm.tar.gz), which are also updated each week. LOMETS will remain open to adding new and efficient threading programs when they become available.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

35 in total

1. Protein secondary structure prediction based on position-specific scoring matrices.

Authors: D T Jones
Journal: J Mol Biol Date: 1999-09-17 Impact factor: 5.469

2. Scoring function for automated assessment of protein structure template quality.

Authors: Yang Zhang; Jeffrey Skolnick
Journal: Proteins Date: 2004-12-01

3. Protein homology detection by HMM-HMM comparison.

Authors: Johannes Söding
Journal: Bioinformatics Date: 2004-11-05 Impact factor: 6.937

4. Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments.

Authors: Hongyi Zhou; Yaoqi Zhou
Journal: Proteins Date: 2005-02-01

5. HOMSTRAD: a database of protein structure alignments for homologous families.

Authors: K Mizuguchi; C M Deane; T L Blundell; J P Overington
Journal: Protein Sci Date: 1998-11 Impact factor: 6.725

6. Hidden Markov models for detecting remote protein homologies.

Authors: K Karplus; C Barrett; R Hughey
Journal: Bioinformatics Date: 1998 Impact factor: 6.937

Review 7. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

8. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions.

Authors: K T Simons; C Kooperberg; E Huang; D Baker
Journal: J Mol Biol Date: 1997-04-25 Impact factor: 5.469

9. Comparative protein modelling by satisfaction of spatial restraints.

Authors: A Sali; T L Blundell
Journal: J Mol Biol Date: 1993-12-05 Impact factor: 5.469

10. Enlarged representative set of protein structures.

Authors: U Hobohm; C Sander
Journal: Protein Sci Date: 1994-03 Impact factor: 6.725
