TarO: a target optimisation system for structural biology.

Ian M Overton, C A Johannes van Niekerk, Lester G Carter, Alice Dawson, David M A Martin, Scott Cameron, Stephen A McMahon, Malcolm F White, William N Hunter, James H Naismith, Geoffrey J Barton.

Abstract

TarO (http://www.compbio.dundee.ac.uk/taro) offers a single point of reference for key bioinformatics analyses relevant to selecting proteins or domains for study by structural biology techniques. The protein sequence is analysed by 17 algorithms and compared to 8 databases. TarO gathers putative homologues, including orthologues, and then obtains predictions of properties for these sequences including crystallisation propensity, protein disorder and post-translational modifications. Analyses are run on a high-performance computing cluster, the results integrated, stored in a database and accessed through a web-based user interface. Output is in tabulated format and in the form of an annotated multiple sequence alignment (MSA) that may be edited interactively in the program Jalview. TarO also simplifies the gathering of additional annotations via the Distributed Annotation System, both from the MSA in Jalview and through links to Dasty2. Routes to other information gateways are included, for example to relevant pages from UniProt, COG and the Conserved Domains Database. Open access to TarO is available from a guest account with private accounts for academic use available on request. Future development of TarO will include further analysis steps and integration with the Protein Information Management System (PIMS), a sister project in the BBSRC 'Structural Proteomics of Rational Targets' initiative.

Year:  2008        PMID: 18385152      PMCID: PMC2447720          DOI: 10.1093/nar/gkn141

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Target selection for structural biology encompasses a variety of analyses, and may include optimisation of the protein target for successful progress in the structure determination pipeline. The evaluation of putative homologues and/or alternative constructs is a key aspect of the optimisation process (1,2). One useful metric that may be applied to this end is estimated crystallisation propensity (3,4). This approach aims to increase the odds of success in the face of attrition rates that typically exceed 90% in structural genomics consortia (5–7). However, target optimisation is also commonplace as a salvage strategy following difficulties with the originally selected protein. Numerous bioinformatics analyses can be applied during target optimisation, including searching various databases (8–11) and sequence-based prediction of protein properties, such as protein disorder (1). However, the generation, integration and management of results from these analyses are not trivial (1,12). There are many publicly available servers that run individual bioinformatics analysis steps. Websites are also available to provide a single point of access to individual analysis tools, for example Expasy (13), Entrez (14) and OPAL (12). However, target optimisation using these sites is laborious and there is little facility to integrate the results of numerous analyses across many sequences. A greater level of integration over a user-supplied multiple sequence alignment (MSA) is provided by MACSIMS (15), which also propagates annotations by homology inference. However, MACSIMS is not focused on target optimisation and does not generate any ranking of sequences. Also, MACSIMS returns a limited set of annotation types and only annotation that is amenable to display on a MSA is given in a user-friendly format. Servers that focus on target selection are available, such as SGTarget (16), and the more recent XtalPred (17). 
These provide some integration of data for the user, but are limited in terms of the number of annotation types and the server features. Neither SGTarget nor XtalPred provides an annotated MSA. We have developed a system (TarO) that offers a single point of reference for key target optimisation analyses. TarO features include gathering and annotation of putative orthologues and homologues, searching the protein input against the Protein Data Bank (18) with PSIBLAST (19), generation of an annotated MSA, and presentation of integrated results to the user. TarO was originally developed for the Scottish Structural Proteomics Facility (SSPF) (www.sspf.ac.uk), and plays a key role in the SSPF bioinformatics platform. To date, TarO has processed more than 720 queries and is used by several different research groups outside the SSPF.

METHODS

Overview of TarO

TarO takes a protein sequence as input, which is used to search for putative orthologues and homologues. The input and associated sequences are analysed in a number of annotation steps, and the results stored in a database. The TarO website (www.compbio.dundee.ac.uk/taro) provides access to results, and integrates the Jalview (20,21) program to visualise complex annotation over a MSA. All analyses are run on a local computer cluster. Figure 1 gives a summary of the processes involved in TarO.
Figure 1.

Overview of TarO Processes. Given a protein sequence, TarO searches the COG database (11) to identify putative orthologues. Any matched COG sequences and the input sequence are then searched against UniRef100 (8) to identify putative homologues. The input, COG and UniRef100 sequences are then subject to a number of annotation steps (detailed in Table 1). The annotated sequences are electronically ranked according to crystallisation propensity score (4) and BLAST (19) expectation value. The TarO website, which incorporates the Jalview (20,21) program, facilitates human interpretation of the data. The final ranking is therefore semi-automated, combining electronic and human interpretation of the data.

Table 1.

Summary of algorithms and databases included in TarO

| Brief description | Algorithm(s) | Database(s) searched (as applicable) |
| --- | --- | --- |
| Search for orthologues | BLASTP (19) | COG, KOG (11) |
| Search for homologues | PSIBLAST (19) | UniRef100 (8) |
| Search structural genomics targets | BLASTP | TargetDB (25) |
| Search known structures | PSIBLAST, BLASTP | PDB (18) |
| Search domain profiles | RPSBLAST (19) | Pfam, CDD, COG, KOG, SMART (9–11,26–29) |
| Multiple sequence alignment | MUSCLE (41) | |
| Protein disorder/order prediction | Disembl, RONN, GlobPlot (36–38) | |
| Signal peptide prediction | SignalP (33) | |
| Transmembrane region prediction | TMHMM2 (45) | |
| Glycosylation site prediction | NetOGlyc, NetNGlyc (34, http://www.cbs.dtu.dk/services/NetNGlyc/) | |
| Phosphorylation site prediction | NetPhos (35) | |
| Secondary structure prediction | JPred (39,40) | |
| Isoelectric point (pI), Molecular weight | Bioperl-based code (31) | |
| Sequence length, #Met/Cys/His, Hydrophobicity, pI/Hydrophobicity cluster | Custom perl code | |
| Extinction coefficient | PEPSTATS (32) | |
| Crystallisation propensity prediction | ParCrys, OB-Score (3,4) | |

Detection and annotation of functionally related sequences

Detection of functionally and structurally similar proteins helps in the selection of sequences that are more amenable to structural studies. Orthologues frequently share substantial functional similarity, and this assumption may be cautiously extended to all homologues (22,23). Part of the assessment of functional relationships involves examination of the patterns of annotation and conserved residues, or ‘functional signatures’, on the sequences. This process is assisted by an annotated MSA constructed from the input sequence and the putative orthologues/homologues. The annotated MSA is displayed in Jalview (20,21). Scores from BLAST (19) sequence alignments also provide a rough metric for estimating functional similarity in TarO.
TarO detects putative orthologues by searching the input sequence against COG/KOG (11) with BLASTP (19). Matches for both the orthologue and homologue searches are defined from thresholds selected to infer protein structural similarity (24). In addition, all matches must have BLAST expectation values of 10⁻³ or better. The top-scoring COG/KOG match is used to infer a COG/KOG orthologue cluster; all sequences in the relevant orthologue cluster are thus assigned as putative orthologues of the input protein. Subsequently, the input sequence as well as any putative orthologues are searched against the UniRef100 (8) database with PSIBLAST (three iterations, default values) (19). The input sequence and any putative orthologues/homologues found are searched against the Protein Data Bank (PDB) (18) with PSIBLAST and BLASTP, respectively. The input and associated sequences are also searched against TargetDB (25) with BLASTP, thereby highlighting any similar targets that have been registered by Structural Genomics consortia. The searches of TargetDB and the PDB both use the thresholds for structural similarity (24) and expectation value as described above. 
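The orthologue assignment step described above can be illustrated with a minimal Python sketch. This is not TarO's own (Perl) implementation: the `Hit` record, the `cog_members` mapping and the function name are hypothetical, "top-scoring" is assumed to mean the highest bit score, and the structural-similarity thresholds of reference (24) are left out because the text does not state them. Only the expectation value cut-off of 1e-3 is taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

E_VALUE_CUTOFF = 1e-3  # from the text: matches need an expectation value of 1e-3 or better


@dataclass
class Hit:
    """One BLASTP match of the input sequence against COG/KOG (hypothetical record)."""
    subject_id: str
    cluster_id: str   # COG/KOG cluster that the matched sequence belongs to
    bit_score: float
    e_value: float


def putative_orthologues(hits: List[Hit], cog_members: Dict[str, List[str]]) -> List[str]:
    """Return every member of the cluster that contains the top-scoring accepted hit.

    The structural-similarity thresholds of reference (24) are NOT applied here.
    """
    accepted = [h for h in hits if h.e_value <= E_VALUE_CUTOFF]
    if not accepted:
        return []
    # Assumption: "top-scoring" is interpreted as the highest bit score.
    top = max(accepted, key=lambda h: h.bit_score)
    return list(cog_members.get(top.cluster_id, []))


# Illustrative data only; identifiers are made up.
example_hits = [
    Hit("seqA", "COG0001", 210.0, 1e-50),
    Hit("seqB", "COG0002", 45.0, 0.5),   # rejected: E-value worse than 1e-3
]
example_clusters = {"COG0001": ["seqA", "seqC"], "COG0002": ["seqB"]}
print(putative_orthologues(example_hits, example_clusters))  # ['seqA', 'seqC']
```

The whole cluster is returned, rather than only the matched sequence, because the text assigns all sequences in the inferred cluster as putative orthologues.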
RPSBLAST (19) is also used to search all query-associated sequences against the Conserved Domains Database (CDD) (26,27), which includes profiles from Pfam (9,10), SMART (28,29) and COG/KOG (11). RPSBLAST matches to domain profiles are defined by an expectation value threshold of 10⁻³. Elementary chemical properties [e.g. average GES hydrophobicity (30)] are calculated with custom perl code, Bioperl (31) and PEPSTATS (32). Sequences are assigned to phylogenetic classifications in order to allow for SignalP (33) prediction of signal peptides (default parameters). This classification is based on the data provided by COG/KOG and UniRef100. Where phylogenetic classification is not available, SignalP is run using all of the possible classifications. Only the first 70 amino acids of each sequence are taken as input to SignalP in order to reduce false positives. Additionally, predictions for the input and all associated sequences are obtained from NetOGlyc (34), NetPhos (35), RONN (36), Disembl (37), GlobPlot (38), JPred (39,40) and NetNGlyc (http://www.cbs.dtu.dk/services/NetNGlyc/), with the default settings for each algorithm. It is important to note that NetNGlyc and NetOGlyc glycosylation predictions should be treated with caution when a signal peptide is not also predicted (34, http://www.cbs.dtu.dk/services/NetNGlyc/). TarO gives a warning when displaying the list of predicted glycosylation sites for a sequence without a predicted signal peptide. The MSA is generated from the input and associated sequences by running MUSCLE (41). Reliably generating a MSA from automatically obtained search results can be difficult, so sequences are only included in the MSA if their BLAST alignment to the input sequence has an expectation value ≤10⁻²⁰, and if their sequence length is no more than 125% of the input sequence length. Also, sequences are chosen for inclusion into the MSA according to the order of priority: input > putative orthologues > putative homologues. 
This order is followed until the user-specified maximum number of sequences is reached (default 100), or until all of the query-associated sequences have been examined. We plan further development of the strategy for generating the MSA, which will be incorporated into later releases of TarO. TarO also annotates the input and associated sequences with information that is useful during the ‘wet-lab’ stages of the structure determination pipeline. The predicted extinction coefficient at 280 nm is calculated by PEPSTATS (32), to assist with protein purification. Counts of the amino acids histidine, cysteine and methionine are given, which may be relevant for protein purification and deriving phases by anomalous scattering approaches. Other information in this category includes molecular weight, sequence length, hydrophobicity and isoelectric point. Table 1 summarises the various algorithms and databases currently employed in TarO.
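The rule for choosing which sequences enter the MSA (expectation value no worse than 1e-20, length at most 125% of the input, priority input > orthologues > homologues, up to a user-set maximum) can be sketched as below. This is an illustrative Python sketch rather than TarO's code: the `Candidate` record and function name are hypothetical, and the order of sequences within each priority class is assumed to follow the order in which they are supplied, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import List

MAX_E_VALUE = 1e-20       # BLAST alignment to the input must be at least this good
MAX_LENGTH_RATIO = 1.25   # candidate may be at most 125% of the input length
PRIORITY = {"orthologue": 0, "homologue": 1}  # orthologues are considered first


@dataclass
class Candidate:
    """A query-associated sequence (hypothetical record)."""
    seq_id: str
    kind: str        # "orthologue" or "homologue"
    length: int
    e_value: float   # E-value of the BLAST alignment to the input sequence


def select_for_msa(input_id: str, input_length: int,
                   candidates: List[Candidate], max_sequences: int = 100) -> List[str]:
    """Return sequence identifiers for the MSA, with the input sequence always first."""
    selected = [input_id]
    # sorted() is stable, so the supplied order is kept within each priority class.
    for cand in sorted(candidates, key=lambda c: PRIORITY[c.kind]):
        if len(selected) >= max_sequences:
            break
        if cand.e_value <= MAX_E_VALUE and cand.length <= MAX_LENGTH_RATIO * input_length:
            selected.append(cand.seq_id)
    return selected


# Illustrative data only; identifiers are made up.
example_candidates = [
    Candidate("h1", "homologue", 180, 1e-30),
    Candidate("o1", "orthologue", 260, 1e-40),  # too long: 260 > 1.25 * 200
    Candidate("o2", "orthologue", 240, 1e-25),
    Candidate("h2", "homologue", 200, 1e-10),   # E-value not good enough
]
print(select_for_msa("query", 200, example_candidates))  # ['query', 'o2', 'h1']
```

The length cap is compared against the input sequence only, which keeps much longer multi-domain relatives from stretching the alignment.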

The TarO database and external database update management

The results of the various analyses run by TarO, including searches of external databases, are parsed with custom perl code and stored in a relational database. The TarO web server queries this database when presenting results to the user. External databases (Table 1) are stored as flat files and searched locally on a high-performance compute cluster as part of the process of running a TarO query. These external databases are updated on a weekly basis with custom scripts based around the ‘wget’ Unix command. As a consequence, the information gathered by TarO is no more than one week old at the time of running a given query. Results associated with a TarO query reflect the information available at the time that the search was performed. The TargetDB database ‘target status’ information is a special case in this regard, because it is regularly updated into the TarO database. Therefore, the TargetDB ‘target status’ displayed in TarO is updated every week for any matched TargetDB sequence, regardless of the date and time at which the TarO query was run. However, all matches between TarO and TargetDB sequences are identified from a search of the TargetDB database available at the time that the TarO query is run. Regular searches of completed TarO queries are not run against any database, partly because a TarO query is not necessarily an active target. However, the option of periodically searching certain databases (e.g. TargetDB, PDB) may be incorporated in a future release.
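The distinction drawn above, between matches that are fixed at query time and the TargetDB ‘target status’ that is refreshed weekly, can be pictured with a small Python sketch. The record layout, function name and status strings below are hypothetical and serve only to show the idea; TarO itself stores this information in a relational database accessed through Perl code.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class QueryResult:
    """Snapshot of a completed query (hypothetical layout).

    The list of matched TargetDB identifiers is frozen when the query runs
    and is never re-searched afterwards.
    """
    query_id: int
    targetdb_matches: Tuple[str, ...]


def display_statuses(result: QueryResult, current_status: Dict[str, str]) -> Dict[str, str]:
    """Pair the frozen match list with the most recently loaded status table."""
    return {tid: current_status.get(tid, "unknown") for tid in result.targetdb_matches}


# Illustrative data only; identifiers and status values are made up.
stored = QueryResult(query_id=1, targetdb_matches=("T1", "T2"))
this_week = {"T1": "Crystallized", "T3": "Selected"}
print(display_statuses(stored, this_week))  # {'T1': 'Crystallized', 'T2': 'unknown'}
```

Note that `T3` never appears in the output: a target registered after the query ran is not picked up, because the match list itself is not refreshed.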

Usage

Submitting a TarO query

Open access to TarO is available for any user, via a ‘Guest’ area that can be easily accessed from a link on the TarO home page. The ‘New Query’ link in the ‘Guest’ area navigates to a form that will accept TarO queries in ‘FASTA’ format. Queries can be uploaded to the server as a file or pasted into a textbox. There is an input option to specify the maximum number of sequences to include in the MSA (default value is 100). There is also a ‘functional description’ textbox which allows users to more easily identify their submitted queries. Some algorithms do not accept non-standard amino acid characters, and so these are removed from the query sequence input when appropriate. Queries submitted by the ‘Guest’ user are visible to anyone and are deleted from the server after a minimum of 8 days. However, free private accounts are available for academic use; see the TarO website (www.compbio.dundee.ac.uk/taro) for further details. We ask that users wait for the results of a submitted query before making a further submission to the server. We estimate that an ‘average’ query will require approximately 100 CPU hours, though these are spread over a compute cluster. Given a typical load on the cluster, throughput is in excess of 70 queries per week and a typical query is completed within 4–12 h.
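As an illustration of the input handling described above, the following Python sketch reads a single FASTA record and removes characters outside the 20 standard amino acids. It is a sketch under assumptions, not TarO's code: the function names are hypothetical, only the first record is read, and the exact set of characters that TarO treats as non-standard is not given in the text.

```python
from typing import Tuple

# Assumption: "standard" means the 20 canonical one-letter amino acid codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence) for the first FASTA record in the text."""
    header = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at the second record; only the first is used
            header = line[1:]
        elif header is not None:
            parts.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(parts).upper()


def strip_non_standard(sequence: str) -> str:
    """Drop any character that is not one of the 20 standard amino acids."""
    return "".join(residue for residue in sequence if residue in STANDARD_AMINO_ACIDS)


example_header, example_sequence = parse_fasta(">test protein\nMKTX\nAB*LL\n")
print(example_header)                        # test protein
print(strip_non_standard(example_sequence))  # MKTALL
```

Removing residues shifts the numbering of everything downstream, so a real pipeline would need to record where characters were dropped in order to map predictions back onto the submitted sequence.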

Tracking query progress and access to results

Figure 2 shows an example of the query sequence information page, which serves as a hub for each TarO query. Tabulated annotation details for the input sequence are available from this page. Several links are also provided, to allow display of the annotated MSA, access to pages describing putative orthologues/homologues, access to more details for matches to external databases [e.g. TargetDB (25)], and access to gateways such as UniProt (8), Dasty2 (42), COG (11) and CDD (26,27). The query status table on this page summarises the various steps in the annotation process and provides progress information for each annotation step. Each row in the query status table changes colour according to a ‘traffic lights’ system, to reflect progress of the corresponding annotation step. The pages for putative orthologues and homologues provide tabulated annotation details and related links, ranked according to ParCrys (4) crystallisation propensity scores. The ranking scheme also incorporates the estimated similarity of the orthologue/homologue to the input protein sequence, currently based on BLAST expectation values. All TarO pages provide user guidance as context-sensitive help upon mouse over, and further information is provided via links to a help page. The help page also provides an introduction to the TarO system and is accessed from http://www.compbio.dundee.ac.uk/taro/TarO_help.html.
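One way to picture the ranking of putative orthologues and homologues is the Python sketch below. The text states only that ranking uses ParCrys scores and also incorporates BLAST expectation values; how the two are combined is not described. The sketch therefore makes two loudly labelled assumptions: a higher ParCrys score is treated as more favourable, and the expectation value is used purely as a tie-breaker. The record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedSequence:
    """A putative orthologue/homologue with the two ranking inputs (hypothetical record)."""
    seq_id: str
    parcrys_score: float  # assumption: higher means more likely to crystallise
    e_value: float        # BLAST expectation value against the input sequence


def rank_sequences(sequences: List[AnnotatedSequence]) -> List[AnnotatedSequence]:
    """Sort by ParCrys score (best first), then by E-value (most similar first).

    Assumption: E-value acts only as a tie-breaker; the actual weighting used
    by TarO is not specified in the text.
    """
    return sorted(sequences, key=lambda s: (-s.parcrys_score, s.e_value))


# Illustrative data only; identifiers and scores are made up.
example_sequences = [
    AnnotatedSequence("a", 0.8, 1e-30),
    AnnotatedSequence("b", 0.9, 1e-5),
    AnnotatedSequence("c", 0.8, 1e-50),
]
print([s.seq_id for s in rank_sequences(example_sequences)])  # ['b', 'c', 'a']
```

Because the electronic ranking is only a starting point, the ordered list is meant to be reviewed alongside the annotated MSA rather than used as a final decision.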
Figure 2.

Query sequence information page. This page serves as a hub for each TarO query. The table at the top has 47 columns and so extends to about three times the width of the figure. This table includes basic sequence statistics, as well as details of the top-scoring match from COG/KOG (11), the PDB (18), TargetDB (25) and UniRef100 (8). Several links are available within this table, notably to display ranked annotations for putative orthologues and homologues of the input sequence, respectively displayed as the characters ‘O’ and ‘H’. There are also links to relevant pages of the COG/KOG, Dasty2 (42), CDD (26,27) and UniProt (8) websites, as well as links to results of RPSBLAST (19) searches of domain profiles. Clicking on the grey rectangle below this table displays the annotated MSA in the Jalview (20,21) applet (Figure 3). The ‘Query Status’ table allows tracking of the query progress through the various annotation stages, according to a ‘traffic lights’ system. Stages that have started are shown in Amber, Red is used to indicate a failed step, and completed analyses are shown in Green. Inset shows an example ‘Query Status’ table for a query that is in progress. There is extensive context-sensitive help throughout TarO, and the table headings also provide links to the relevant section of the help document. An example query sequence information page is given at http://www.compbio.dundee.ac.uk/taro/cgi-taro/targpipe_display_query_seqs.pl?query=657&funcdesc=Test_Guest1.

Figure 3.

Visualisation of complex annotation. An annotated MSA is shown, viewed in Jalview (20,21). Sequence identifiers are listed along the left-hand side of the alignment. The different colours on the aligned sequences correspond to different annotation types; for example, lilac corresponds to the overlap of matched Pfam (9,10) and CDD (26,27) domains. Predicted GlobPlot (38) disorder is shown in slate blue; light and dark orange show DISEMBL (37) ‘Hotloops’ and the overlap of DISEMBL ‘Hotloops’/‘REM465’ disorder, respectively. Green shows the overlap of GlobPlot and DISEMBL ‘Hotloops’ disorder. The predicted post-translational modifications (PTMs), phosphorylation [NetPhos (35)] and N-linked glycosylation (NetNGlyc, http://www.cbs.dtu.dk/services/NetNGlyc/), are shown in red and blue, respectively. JPred (39,40) predicted secondary structure for the input sequence is shown on the line entitled ‘jnetpred’ that runs towards the bottom of the figure. Related annotations are grouped and may be selectively displayed in order to enable visualisation and interpretation of the information. The TarO annotation groupings are viewed inside the Jalview ‘Sequence Features’ box. For example, DISEMBL and GlobPlot disorder are grouped together, whilst the Pfam/CDD domains and RONN (36) disorder are in a separate group. There is also a group for protein disorder predicted by DISEMBL and RONN. From the ‘Sequence Features’ box, the user can change the display of the various groups in order to customise the presence or absence of annotations on the MSA. The order of annotations displayed is also specified within the ‘Sequence Features’ box. For example, the annotation layer for PTMs is displayed over the other annotations in this figure. Therefore the slate blue GlobPlot disorder annotation on the sequence region ‘TGGTTG’ is displayed underneath the red predicted phosphorylation site annotation on the second threonine residue of the ‘TGGTTG’ sequence. 
The row at the bottom of the figure shows the alignment conservation and is automatically calculated by Jalview.


DISCUSSION

Structural biology projects are highly variable and so there is no universally applicable target optimisation strategy. However, certain criteria are generally useful. Target optimisation frequently draws upon overlapping information for the evaluation of both alternative constructs and putative homologues. Although NMR is an important technique for structure determination, as of January 2008, 85% of all structures in the PDB (18) had been solved by X-ray crystallography. As a consequence, obtaining crystals is a key stage in most structural biology pipelines. Modifying the construct sequence may influence crystallisation propensity, and alternative homologues may be examined since protein families commonly have members with a wide range of estimated crystallisation propensity (3). The OB-Score (3), ParCrys (4) and Hydrophobicity/pI clustering (43) are all harnessed by TarO to estimate crystallisation propensity, and so guide the evaluation of homologues. Proteins with transmembrane regions or significant disordered sequence are frequently problematic (1,17). Also, post-translational modifications (PTMs) are commonly associated with protein disorder (44). TarO assists with identification of sequences that are likely to contain these potentially troublesome, but biologically interesting, features. Transmembrane regions are predicted by TMHMM2 (45), whilst protein disorder predictions are obtained from Disembl, GlobPlot and RONN (36–38). Phosphorylation sites, O-linked glycosylation and N-linked glycosylation are predicted by the programs NetPhos (35), NetOGlyc (34) and NetNGlyc (http://www.cbs.dtu.dk/services/NetNGlyc/), respectively. TarO also assists with the identification of protein domain boundaries, facilitated by an annotated MSA that is viewed in Jalview (20,21). The MSA annotations include matched domains from Pfam (9,10) and the Conserved Domains Database (CDD) (26,27), combined with predicted protein disorder. 
Predicted transmembrane regions, signal peptide [SignalP (33)], PTMs and secondary structure [JPred (39,40)] are also annotated on the MSA. Other useful information associated with the MSA is provided by the Jalview program. For example, Jalview automatically provides a display of residue conservation at each position of the alignment. In addition, Jalview provides the facility to query numerous Distributed Annotation System (46) servers, and to display any returned annotation on the MSA. The various annotations associated with the MSA are useful to assist with the design of optimised constructs and identification of functionally important residues. Building upon this, a likely future development in TarO is the automated design and ranking of optimised construct sequences. Of course, the design of optimised construct sequences may also benefit from information provided by experimental methods such as limited proteolysis (47). Retaining the functional features that originally stimulated interest in the target is an important consideration during target optimisation. For example, removing part of an enzyme's active site might make crystals easier to obtain; although the resultant protein structure would be relatively ineffective for studies of the molecular mechanism of catalysis! The range of functional information provided by TarO aims to assist with identification and comparison of functional regions in protein sequences. A possible future direction is the automated evaluation of sequence features to provide more sophisticated prediction and analysis of the functional conservation for a given protein pair. These predictions could be useful in the context of target optimisation, for example by enabling more advanced protein ranking systems. Different projects have different sets of functional properties that are required to be retained in the optimised target sequence. 
However, all putative orthologues and homologues currently identified in TarO pass thresholds that aim to preserve a reasonable level of structural similarity (24). As a screening mechanism to avoid duplication of effort, the protein input and associated sequences are searched against the PDB (18) and TargetDB (25). The discovery of a similar structure in the PDB or TargetDB may be sufficient grounds to eliminate a potential target. On the other hand, identification of a known and related structure could be important; this may provide a model for molecular replacement calculations, or inform on components of multi-domain or multi-subunit systems. In summary, TarO enables selection of sequences that are likely to be more amenable to structural studies and share functional similarity with the input sequence. Additionally, TarO provides information relevant for many of the structure determination pipeline stages, including design of optimised constructs. The use of TarO accelerates progress in structural proteomics by efficiently providing bioinformatics data to inform decision-making on the prioritisation and optimisation of potential targets. TarO simplifies the gathering, storage and retrieval of data and so frees up research time to make use of the information and to think creatively. Please cite TarO as well as the underlying algorithms and databases, as appropriate. Active development of TarO continues and will include further analysis steps, improvements to the user interface, and integration with the Protein Information Management System (PIMS), a sister project in the BBSRC Structural Proteomics of Rational Targets (SPoRT) initiative. We also plan to make available a distribution of the TarO source code. We feel that community interactions with the TarO project can lead to further advancement and dissemination of best practices for target optimisation. TarO is available at www.compbio.dundee.ac.uk/taro and we welcome feedback from users.
REFERENCES (10 of 46 shown)

1.  Evolution of protein sequences and structures.

Authors:  T C Wood; W R Pearson
Journal:  J Mol Biol       Date:  1999-08-27       Impact factor: 5.469

2.  Twilight zone of protein sequence alignments.

Authors:  B Rost
Journal:  Protein Eng       Date:  1999-02

3.  Evaluation and improvement of multiple sequence methods for protein secondary structure prediction.

Authors:  J A Cuff; G J Barton
Journal:  Proteins       Date:  1999-03-01

4.  TargetDB: a target registration database for structural genomics projects.

Authors:  Li Chen; Rose Oughtred; Helen M Berman; John Westbrook
Journal:  Bioinformatics       Date:  2004-05-06       Impact factor: 6.937

5.  MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors:  Robert C Edgar
Journal:  Nucleic Acids Res       Date:  2004-03-19       Impact factor: 16.971

6.  ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction.

Authors:  Ian M Overton; Gianandrea Padovani; Mark A Girolami; Geoffrey J Barton
Journal:  Bioinformatics       Date:  2008-02-19       Impact factor: 6.937

7.  Pfam: multiple sequence alignments and HMM-profiles of protein domains.

Authors:  E L Sonnhammer; S R Eddy; E Birney; A Bateman; R Durbin
Journal:  Nucleic Acids Res       Date:  1998-01-01       Impact factor: 16.971

8.  SMART, a simple modular architecture research tool: identification of signaling domains.

Authors:  J Schultz; F Milpetz; P Bork; C P Ponting
Journal:  Proc Natl Acad Sci U S A       Date:  1998-05-26       Impact factor: 11.205

9.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

10.  Protein biophysical properties that correlate with crystallization success in Thermotoga maritima: maximum clustering strategy for structural genomics.

Authors:  Jaume M Canaves; Rebecca Page; Ian A Wilson; Raymond C Stevens
Journal:  J Mol Biol       Date:  2004-12-03       Impact factor: 5.469
