Wei-Cheng Lo, Che-Yu Lee, Chi-Ching Lee, Ping-Chiang Lyu.
Abstract
iSARST is a web server for efficient protein structural similarity searches. It is a multi-processor, batch-processing, integrated implementation of several structural comparison tools and two database searching methods: SARST for common structural homologs and CPSARST for homologs with circular permutations. iSARST allows users to submit multiple PDB/SCOP entry IDs or an archive file containing many structures. After the target database is scanned using SARST/CPSARST, the ordering of hits is refined with conventional structure alignment tools such as FAST, TM-align and SAMO, which run on a PC cluster. In this way, iSARST achieves a high running speed while preserving the high precision of the refinement engines. The final outputs include tables listing co-linear or circularly permuted homologs of the query proteins and a functional summary of the best hits. Superimposed structures can be examined through an interactive and informative visualization tool. iSARST provides the first batch-mode structural comparison web service for both co-linear homologs and circular permutants. It can serve as a rapid annotation system for functionally unknown or hypothetical proteins, whose numbers are increasing rapidly in this post-genomics era. The server can be accessed at http://sarst.life.nthu.edu.tw/iSARST/.
Year: 2009 PMID: 19420060 PMCID: PMC2703971 DOI: 10.1093/nar/gkp291
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flowchart of iSARST. The query structure is first transformed into a structurally meaningful Ramachandran string, which is then used to screen the target database by SARST or CPSARST. In the refinement stage, the raw hit list is re-ordered according to the structural similarity scores calculated by an accurate structure comparison method such as FAST (28), TM-align (29) or SAMO (27). The final outputs of iSARST are tables listing co-linear homologs or circular permutants of the query protein. Structure superimpositions and related inspection tools are also provided.
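The two-stage flow in Figure 1 (a fast string-based screen followed by a slower, more accurate re-ranking) can be sketched as below. This is only an illustration under stated assumptions, not the iSARST implementation: the three-letter encoding in `rama_letter`, the use of `difflib.SequenceMatcher` as the screening score, and all function names are hypothetical, and `refine` stands in for an external engine such as FAST, TM-align or SAMO.

```python
from difflib import SequenceMatcher


def rama_letter(phi, psi):
    """Hypothetical 3-letter encoding of one (phi, psi) pair, in degrees."""
    if phi < 0 and -120 <= psi <= 50:
        return "H"  # roughly the helical region
    if phi < 0:
        return "E"  # roughly the extended (beta) region
    return "L"      # everything with phi >= 0


def rama_string(angles):
    """Turn a list of (phi, psi) pairs into a one-dimensional string."""
    return "".join(rama_letter(phi, psi) for phi, psi in angles)


def screen_score(a, b):
    """Cheap string similarity, standing in for the database screen."""
    return SequenceMatcher(None, a, b).ratio()


def filter_and_refine(query, targets, refine, hit_list_size):
    """Screen all targets cheaply, then re-rank only the top hits.

    targets: dict mapping a target name to its encoded string.
    refine:  callable(query, target) -> score from the slow, accurate method.
    """
    ranked = sorted(targets,
                    key=lambda name: screen_score(query, targets[name]),
                    reverse=True)
    hits = ranked[:hit_list_size]  # the raw hit list
    return sorted(hits,
                  key=lambda name: refine(query, targets[name]),
                  reverse=True)
```

The design point is that the expensive `refine` call runs only on `hit_list_size` candidates rather than on the whole database, so the hit list size trades recall against running time.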
Average recall and running time of iSARST over various sizes of hit list
| Hit list size | Avg. recall (%) | Avg. running time with FAST (s) | Avg. running time with TM-align (s) | Avg. running time with SAMO (s) |
|---|---|---|---|---|
| 100 | 75.4 | 3.11 | 4.03 | 19.94 |
| 250 | 82.9 | 4.88 | 6.07 | 30.45 |
| 500 | 85.1 | 7.78 | 9.41 | 47.43 |
| 1000 | 87.3 | 13.38 | 15.46 | 77.89 |
| 2500 | 91.0 | 29.69 | 32.47 | 167.15 |
| 5000 | 93.9 | 61.31 | 66.47 | 295.33 |
| 10 000 | 96.8 | 102.46 | 130.45 | 506.21 |
| 25 000 | 99.6 | 242.38 | 273.15 | 1184.95 |
| 34 055 | 100.0 | 320.89 | 364.91 | 1574.95 |
The query and target databases used in these information retrieval experiments are the same as those in (31) and (9). The target database contains 34 055 protein domains collected from SCOP. Eighty processors were recruited to share the calculations. Without this multi-processor system, the running time on a single machine can be approximately 60 times longer. For instance, at the 100% recall level, when FAST was applied to align one query to all target proteins, it took 19 003 s on average.
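The "approximately 60 times" figure can be checked against the numbers reported here. The short calculation below uses the single-machine FAST time from the note and the FAST time at 100% recall from the table; the parallel efficiency (speedup divided by processor count) is a derived quantity that is not stated in the source.

```python
single_machine_s = 19003.0  # FAST, one query vs. all targets, single machine
cluster_s = 320.89          # FAST at 100% recall (hit list size 34 055), cluster
processors = 80

speedup = single_machine_s / cluster_s
efficiency = speedup / processors  # derived here, not reported in the source

print(f"speedup = {speedup:.1f}x, parallel efficiency = {efficiency:.0%}")
# prints: speedup = 59.2x, parallel efficiency = 74%
```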
Figure 2. Final output of iSARST. (a) Hit list. This list can be re-ordered according to various indexes and protein functions by clicking the column titles. Functions of the top 5 hits are summarized and highlighted in red. Any protein listed here can be re-submitted for a new round of searching simply by clicking the search icon. Several filtering and operational parameters are adjustable on this page. (b) Structure inspection tools and a circularly permuted structural alignment. PDB entries 1dglA (the fifth letter is the chain ID) and 1gv9A are a lectin from Dioclea grandiflora (40) and the protein ERGIC-53 from Rattus norvegicus (41), respectively; both are carbohydrate-binding proteins, a large family in which many CP cases have been identified. The natural CP relationship between these two proteins can be detected by iSARST, even though their sequence identity is merely ∼10%. Aligned residue pairs are listed in the right frame. The original structure-based sequence alignment made by the refinement engine, TM-align (29) in this case, and the alignment improved by SE (30) are shown in the lower region. The circularized sequence alignment graph in the center is useful for identifying CP. In this example, the two proteins can be well aligned only when the 127 amino-terminal residues of 1dglA are permuted to its carboxyl terminus. The dot matrix plot is drawn so that the darkness of a residue pair is proportional to its BLOSUM62 score (36). In addition, residues aligned by the refinement engine are colored green. When there is a CP relationship, two parallel green lines can be observed. (c) Results of a co-linear structural alignment. To confirm the existence of a CP, one can compare the results of the co-linear and circularly permuted alignments. As shown in this case, these two circular permutants can only be partially aligned in the co-linear mode. The alignment size is much smaller than that in (b). In addition, there are more unaligned buds in the circularized graph and only one green line can be seen in the dot matrix plot.
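The permutation described in (b), in which an amino-terminal segment is moved to the carboxyl terminus, can be illustrated on plain strings. The sketch below is a toy under strong assumptions: it requires an exact match, whereas iSARST compares structures of proteins with only about 10% sequence identity, and the function names are hypothetical. Searching in a doubled string is a standard trick for cyclic rotations and is used here purely for illustration.

```python
def permute(seq, n):
    """Move the first n residues of seq to its C-terminal end."""
    return seq[n:] + seq[:n]


def find_cp_offset(a, b):
    """Return n such that permute(a, n) == b, or None if there is none.

    Exact-match toy: every cyclic rotation of a occurs as a substring of a + a.
    An offset of 0 means the two strings are already co-linear.
    """
    if len(a) != len(b):
        return None
    idx = (a + a).find(b)
    return idx if idx != -1 else None
```

For example, `find_cp_offset("ABCDEFG", "DEFGABC")` returns 3, meaning the first three residues of the first string were moved to its end.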