Jiangming Sun, Shengnan Tang, Wenwei Xiong, Peisheng Cong, Tonghua Li.
Abstract
Many studies have demonstrated that the shape string is an important structural representation because it is more complete than classical secondary structure. It also provides detailed information in regions classified as random coil. However, few services exist for the systematic analysis of protein shape strings. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. Performance on blind test data demonstrates that the proposed method can accurately predict protein shape strings. The DSP server provides both the predicted shape string and the sequence shape string profile for each query sequence. Using this information, users can compare protein structures or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/.
Year: 2012 PMID: 22553364 PMCID: PMC3394270 DOI: 10.1093/nar/gks361
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) The flowchart of shape string prediction and (B) sequence alignment with hallmark patterns as seeds. An example of (C) the predicted shape string and (D) the output sequence shape string profile. AA, amino acid; MT, match times; PredSS, predicted shape string; Prob, output probability.
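To make the profile columns named in the caption concrete, the following is a minimal Python sketch of how a per-residue profile could be tallied from shape string fragments aligned to a query. It is an illustration under stated assumptions, not the DSP server's implementation: the function `build_profile`, the `(start, fragment)` hit format, the reading of MT as the number of fragments covering a residue, and the reading of Prob as the fraction of those fragments voting for the predicted shape are all hypothetical.

```python
from collections import Counter


def build_profile(query, hits):
    """Tally shape votes per residue (hypothetical sketch, not the DSP method).

    query: amino acid sequence as a string.
    hits:  iterable of (start, shape_fragment) pairs, where shape_fragment is
           the shape string of a matched fragment placed at index `start`.
    Returns one row per residue: (AA, MT, PredSS, Prob).
    """
    counts = [Counter() for _ in query]
    for start, fragment in hits:
        for offset, shape in enumerate(fragment):
            pos = start + offset
            if 0 <= pos < len(query):  # ignore overhang beyond the query
                counts[pos][shape] += 1

    rows = []
    for aa, votes in zip(query, counts):
        total = sum(votes.values())
        if total == 0:
            # No fragment covers this residue, so no shape can be proposed.
            rows.append((aa, 0, "-", 0.0))
            continue
        shape, n = votes.most_common(1)[0]
        rows.append((aa, total, shape, n / total))
    return rows


# Toy input: three overlapping fragments on a five-residue query.
query = "MKVLA"
hits = [(0, "SSA"), (1, "SAA"), (2, "SAA")]
profile = build_profile(query, hits)
for row in profile:
    print(row)
```

In this toy case the third residue is covered by three fragments, two of which vote for shape A, so its row carries MT = 3 and Prob = 2/3.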
Figure 2. An illustration of consecutive sequence pattern mining.
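As a rough illustration of what mining consecutive sequence patterns can look like, the sketch below counts contiguous substrings of a fixed length and keeps those found in at least a minimum number of sequences. The function name `mine_consecutive_patterns`, the fixed pattern length, and the support threshold are assumptions made for this example; the mining procedure actually used by the authors may differ.

```python
from collections import defaultdict


def mine_consecutive_patterns(sequences, length, min_support):
    """Return {pattern: support} for contiguous substrings of `length`
    that occur in at least `min_support` distinct sequences."""
    support = defaultdict(int)
    for seq in sequences:
        # A set ensures each sequence contributes at most once per pattern.
        seen = {seq[i:i + length] for i in range(len(seq) - length + 1)}
        for pattern in seen:
            support[pattern] += 1
    return {p: n for p, n in support.items() if n >= min_support}


# Toy sequences, not real protein data.
toy_sequences = ["ACDEFG", "XCDEFY", "CDEAAA"]
patterns = mine_consecutive_patterns(toy_sequences, length=3, min_support=2)
print(patterns)
```

Counting support per sequence rather than per occurrence keeps a single repetitive sequence from dominating the pattern set.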
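The shape string composition for 20 amino acids (each row gives the fraction of that amino acid's residues observed in each of the eight shape string states; rows sum to 1)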
| A | 0.156 | 0.157 | 0.012 | 0.021 | 0.048 | 0.590 | 0.013 | 0.003 |
| R | 0.220 | 0.134 | 0.017 | 0.013 | 0.060 | 0.526 | 0.026 | 0.003 |
| N | 0.223 | 0.115 | 0.039 | 0.023 | 0.159 | 0.324 | 0.112 | 0.004 |
| D | 0.211 | 0.156 | 0.026 | 0.016 | 0.122 | 0.413 | 0.052 | 0.004 |
| C | 0.308 | 0.192 | 0.039 | 0.010 | 0.064 | 0.361 | 0.023 | 0.003 |
| Q | 0.172 | 0.146 | 0.017 | 0.013 | 0.074 | 0.547 | 0.029 | 0.002 |
| E | 0.164 | 0.132 | 0.011 | 0.013 | 0.057 | 0.601 | 0.020 | 0.002 |
| G | 0.158 | 0.122 | 0.005 | 0.006 | 0.042 | 0.199 | 0.334 | 0.134 |
| H | 0.275 | 0.144 | 0.030 | 0.020 | 0.110 | 0.375 | 0.043 | 0.003 |
| I | 0.416 | 0.113 | 0.003 | 0.005 | 0.039 | 0.422 | 0.002 | 0.001 |
| L | 0.250 | 0.125 | 0.006 | 0.011 | 0.042 | 0.557 | 0.008 | 0.001 |
| K | 0.206 | 0.142 | 0.010 | 0.013 | 0.052 | 0.544 | 0.031 | 0.003 |
| M | 0.242 | 0.146 | 0.012 | 0.013 | 0.049 | 0.521 | 0.016 | 0.002 |
| F | 0.332 | 0.139 | 0.023 | 0.011 | 0.073 | 0.404 | 0.017 | 0.001 |
| P | 0.005 | 0.559 | 0.006 | 0.018 | 0.026 | 0.386 | 0.001 | 0.000 |
| S | 0.261 | 0.203 | 0.013 | 0.008 | 0.052 | 0.439 | 0.020 | 0.004 |
| T | 0.338 | 0.177 | 0.009 | 0.004 | 0.073 | 0.391 | 0.006 | 0.002 |
| W | 0.266 | 0.183 | 0.016 | 0.010 | 0.049 | 0.462 | 0.012 | 0.002 |
| Y | 0.332 | 0.137 | 0.025 | 0.010 | 0.095 | 0.383 | 0.017 | 0.002 |
| V | 0.455 | 0.130 | 0.005 | 0.003 | 0.041 | 0.362 | 0.003 | 0.001 |
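A composition table of this kind can be derived by counting, for each amino acid, how often each shape symbol is assigned to it and normalizing by row. The sketch below shows that calculation on toy data; the function `shape_composition` and the input format (pairs of equal-length sequence and shape string) are assumptions for illustration, and the toy pairs are not taken from the paper's data set.

```python
from collections import Counter, defaultdict


def shape_composition(pairs):
    """Compute per-amino-acid shape fractions.

    pairs: iterable of (sequence, shape_string) with equal lengths.
    Returns {amino_acid: {shape: fraction}}; each inner dict sums to 1.
    """
    counts = defaultdict(Counter)
    for sequence, shapes in pairs:
        if len(sequence) != len(shapes):
            raise ValueError("sequence and shape string must have equal length")
        for aa, shape in zip(sequence, shapes):
            counts[aa][shape] += 1

    table = {}
    for aa, votes in counts.items():
        total = sum(votes.values())
        table[aa] = {shape: n / total for shape, n in votes.items()}
    return table


# Toy input: two short sequences with made-up shape strings.
toy_pairs = [("AAGP", "ASSR"), ("AGGA", "ATGA")]
composition = shape_composition(toy_pairs)
for aa in sorted(composition):
    print(aa, composition[aa])
```

Only shapes actually observed for an amino acid appear in its row, so a shape with zero count is simply absent rather than listed as 0.000.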
Figure 3. Performance comparison on the EVA benchmark set.
Figure 4. A sequence logo created from the sequence shape string profile. (Note that the letter C denotes shape U.)