Ian M Overton, Geoffrey J Barton.
Abstract
Selection of protein targets for study is central to structural biology and may be influenced by numerous factors. A key aim is to maximise returns for effort invested by identifying proteins with the balance of biophysical properties that are conducive to success at all stages (e.g. solubility, crystallisation) in the route towards a high-resolution structural model. Selected targets can be optimised through construct design (e.g. to minimise protein disorder), switching to a homologous protein, and selection of experimental methodology (e.g. choice of expression system) to prime for efficient progress through the structural proteomics pipeline. Here we discuss computational techniques in target selection and optimisation, with more detailed focus on tools developed within the Scottish Structural Proteomics Facility (SSPF); namely XANNpred, ParCrys, OB-Score (target selection) and TarO (target optimisation). TarO runs a large number of algorithms, searching for homologues and annotating the pool of possible alternative targets. This pool of putative homologues is presented in a ranked, tabulated format, and results are also visualised as an automatically generated and annotated multiple sequence alignment. The target selection algorithms each predict the propensity of a selected protein target to progress through the experimental stages leading to diffracting crystals. This single-predictor approach has advantages for target selection when compared with an approach using two or more predictors that each predict success at a single experimental stage. The tools described here helped SSPF achieve a high (21%) success rate in progressing cloned targets to diffraction-quality crystals.
Year: 2011 PMID: 21906678 PMCID: PMC3202631 DOI: 10.1016/j.ymeth.2011.08.014
Source DB: PubMed Journal: Methods ISSN: 1046-2023 Impact factor: 3.608
Estimation of protein characteristics useful for target selection and optimisation.
| Protein characteristics | Exemplar algorithms and/or databases |
|---|---|
| Homology relationships | Algorithms: BLAST; Databases: eggNOG |
| Matches to known structures/declared targets | PDB |
| Domains | Algorithms: HMMER |
| Protein interactions | PIPS |
| Disorder/low-complexity sequence | Disembl |
| Signal peptide and transmembrane regions | SignalP |
| Glycosylation sites | NetOGlyc |
| Phosphorylation sites | NetPhos |
| Secondary structure | JPred |
| Surface entropy | SERp |
| Chemical properties: isoelectric point (pI), molecular weight, charge, sequence length, extinction coefficient, #Methionines, #Cysteines, #Histidines, hydrophobicity, protease sites | Bioperl |
| Annotated function | Gene Ontology |
| Overall tractability (selected to diffraction-quality crystals) | XANNpred |
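Several of the chemical properties listed in the table can be derived directly from the amino acid sequence. The following is a minimal Python sketch, not the Bioperl code used at the SSPF: it computes sequence length, the methionine, cysteine and histidine counts, and a mean hydrophobicity. The Kyte-Doolittle scale is assumed here because it is a common choice; the source does not say which hydrophobicity scale was used, and the function name `sequence_properties` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale; the source
# does not specify which hydrophobicity measure was used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_properties(sequence):
    """Return simple sequence-derived properties for one protein.

    Hypothetical helper for illustration: reports length, Met/Cys/His
    counts and the mean Kyte-Doolittle hydropathy (often called GRAVY).
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return {
        "length": len(seq),
        "n_met": seq.count("M"),
        "n_cys": seq.count("C"),
        "n_his": seq.count("H"),
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq),
    }


if __name__ == "__main__":
    print(sequence_properties("MCHH"))
```

Properties such as isoelectric point, extinction coefficient and protease sites need additional parameter tables and are left out of this sketch.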
Fig. 1. An example target selection pipeline. This figure summarises a target selection project conducted in the SSPF, starting with the Comprehensive Microbial Resource (CMR) database [94] in order to identify tractable targets that were in novel structure space and structurally similar to human proteins. Circles on the left-hand side represent proteins, circles in the middle represent electronic analyses, and rectangles give selection thresholds. The SOFA (specificity of functional annotation) scoring [3] provided an estimate of the functional annotation available for the candidate targets. The analyses in this pipeline were run within customised scripts developed at the SSPF. Following manual inspection, targets selected from the ranked lists were analysed using TarO.
Fig. 2. Comparison of methods for predicting overall success in the structure determination pipeline. This figure shows receiver operating characteristic (ROC) curves for the methods XANNpred-PDB, PPCpred, XtalPred and OB-Score on a non-redundant set of 150 proteins that was developed as an independent blind test for XANNpred-PDB [16]. Areas under the ROC curve are given in the bottom right-hand corner. XANNpred performs significantly better than the next best algorithm, PPCpred.
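The area under a ROC curve, as reported in Fig. 2, can be computed without drawing the curve: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below illustrates that calculation in plain Python. It is not the evaluation code used for the figure, and the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predictor outputs, higher meaning more likely to succeed.
    labels: truthy for proteins that succeeded, falsy otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up scores for four proteins, two of which crystallised.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of proteins, which is unproblematic for a test set of 150; a rank-based formulation would be preferable for much larger sets.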
Fig. 3. Outline of TarO workflow. This figure outlines the major steps involved in the TarO workflow. Protein input sequences provide the starting point for homologue searching. The input and all matched homologues are then annotated in the sequence characterisation step. An initial ranking is automatically provided within the user interface, but human analysis of the presented results is an important step.
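The final step of the workflow in Fig. 3, producing an initial ranking of the input and its homologues for a person to review, can be pictured with a small sketch. The source does not state which criteria TarO uses for its ranking, so everything here is an assumption made for illustration: the `Candidate` fields, the `rank_candidates` function, and the rule of sorting by a predicted tractability score and breaking ties by lower predicted disorder.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One annotated sequence (the input or a matched homologue).

    Both numeric fields are hypothetical stand-ins for annotations
    produced in the sequence characterisation step.
    """
    name: str
    tractability: float       # assumed predictor score, higher is better
    disorder_fraction: float  # assumed fraction of residues predicted disordered


def rank_candidates(candidates):
    """Return candidates ordered for manual inspection.

    Assumed rule: highest tractability first, ties broken by the
    lowest predicted disorder. This is not TarO's documented ranking.
    """
    return sorted(
        candidates,
        key=lambda c: (-c.tractability, c.disorder_fraction),
    )


if __name__ == "__main__":
    pool = [
        Candidate("input", 0.42, 0.30),
        Candidate("homologue_A", 0.71, 0.10),
        Candidate("homologue_B", 0.71, 0.05),
    ]
    for rank, cand in enumerate(rank_candidates(pool), start=1):
        print(rank, cand.name, cand.tractability, cand.disorder_fraction)
```

Any such automatic ordering is only a starting point; as the caption notes, human analysis of the presented results remains an important step.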
Fig. 4. Key features of TarO user interface. This figure shows snapshots of several TarO user interface pages. Dashed arrows (red in online figure) indicate navigation by clicking on the relevant links. The TarO guest user ‘Home’ page is shown at the top; clicking on the circled ‘New Query’ link (red in online figure) navigates to the new query submission form, and clicking on the circled ‘Query Results’ link (red in online figure) navigates to the ‘Input Sequences’ page for the relevant query. Links on the ‘Input Sequences’ page enable navigation to the homologues page (‘H’, circled; red in online figure) and display of the multiple sequence alignment. Please note that the tables shown in this figure are truncated, and have many additional results columns.
Fig. 5. Annotated multiple sequence alignment. This figure shows a portion of an annotated multiple sequence alignment, visualised with Jalview [85]. The different shades (colours in online figure) on the aligned sequences represent different annotation types. The lightest grey (lilac) corresponds to a Pfam domain. Predicted GlobPlot [60] and Disembl [59] disorder are shown in medium greys (slate blue, light/dark orange, green). Predicted post-translational modifications (PTMs), phosphorylation [67] and N-linked glycosylation [90], are shown in dark grey (red) and medium grey (blue), respectively. Jpred [61] predicted secondary structure for the input sequence is shown on the line entitled ‘jnetpred’ that runs towards the bottom of the figure. Related annotations are grouped and may be selectively displayed in order to enable visualisation and interpretation of the information. The TarO annotation groupings are viewed inside the Jalview ‘Feature Settings’ box. For example, Disembl and GlobPlot disorder are grouped together, whilst Pfam domains and RONN disorder are in a separate group. There is also a group for protein disorder predicted by Disembl and RONN. From the ‘Feature Settings’ box, the user can change the display of the various groups in order to customise the presence or absence of annotations on the MSA. The order of annotations displayed is also specified within the ‘Feature Settings’ box. For example, the annotation layer for PTMs is best displayed on top of the other annotations in this figure. Therefore, the medium grey (slate blue) GlobPlot disorder annotation on the sequence region ‘TGGTTG’ is displayed underneath the dark grey (red) predicted phosphorylation site annotation on the second threonine residue of the ‘TGGTTG’ sequence. The row at the bottom of the figure shows the alignment conservation and is automatically calculated by Jalview.