Daron M Standley, Hiroyuki Toh, Haruki Nakamura.
Abstract
BACKGROUND: Structure alignment methods offer the possibility of measuring distant evolutionary relationships between proteins that are not visible by sequence-based analysis. However, the question of how structural differences and similarities ought to be quantified in this regard remains open. In this study we construct a training set of sequence-unique CATH and SCOP domains, from which we develop a scoring function that can reliably identify domains with the same CATH topology and SCOP fold classification. The score is implemented in the ASH structure alignment package, for which the source code and a web service are freely available from the PDBj website http://www.pdbj.org/ASH/.
Year: 2007 PMID: 17407606 PMCID: PMC1955748 DOI: 10.1186/1471-2105-8-116
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Structural descriptors
| Term | Description | Weight (Full TS) | Weight (Red TS) |
| --- | --- | --- | --- |
| Δ | Relative difference in the sequence lengths Nq and Nt of query and template, respectively | 51.94 | 5.92 |
| Δ | Difference in the radii of gyration of query and template | -0.33 | -0.54 |
| Δ | Difference in the relative contact orders of query and template, modified so that only Cα atoms are used, with a 10 Å contact cutoff | 0.96 | 0.42 |
| Δ | Difference in the relative number of helical residues | 3.21 | 0.68 |
| Δ | Difference in the relative number of strand residues | 1.71 | 0.76 |
The structural descriptor terms used in the ASH score are listed. Each descriptor is independent of alignment. The optimized weights of each term for the full training set (Full TS) and reduced training set (Red TS) are listed in the last two columns.
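The alignment-independent descriptors in the table above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ASH implementation: the exact formulas did not survive in this table, so the length normalization (dividing by the longer chain), the use of absolute differences, the contact-order form (the usual relative contact order, computed here on Cα pairs closer than 10 Å), the hypothetical `min_sep` parameter, and the plain weighted sum at the end are all assumptions. Secondary structure is assumed to be supplied as a string using 'H' for helix and 'E' for strand.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]

def radius_of_gyration(ca: Sequence[Coord]) -> float:
    """Root-mean-square distance of the C-alpha atoms from their centroid."""
    n = len(ca)
    cx, cy, cz = (sum(p[k] for p in ca) / n for k in range(3))
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                         for x, y, z in ca) / n)

def relative_contact_order(ca: Sequence[Coord], cutoff: float = 10.0,
                           min_sep: int = 1) -> float:
    """Relative contact order on C-alpha atoms: summed sequence separation of
    pairs closer than `cutoff` (Angstrom), divided by (chain length * number
    of contacts). `min_sep` (smallest |i - j| counted) is a hypothetical
    parameter; the paper's choice is not given here."""
    n = len(ca)
    total_sep, n_contacts = 0, 0
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(ca[i], ca[j]) < cutoff:
                total_sep += j - i
                n_contacts += 1
    return total_sep / (n * n_contacts) if n_contacts else 0.0

def ss_fraction(ss: str, code: str) -> float:
    """Fraction of residues carrying secondary-structure code `code`."""
    return ss.count(code) / len(ss)

def descriptor_deltas(ca_q: Sequence[Coord], ss_q: str,
                      ca_t: Sequence[Coord], ss_t: str) -> Dict[str, float]:
    """Absolute query/template differences (assumed, not confirmed)."""
    nq, nt = len(ca_q), len(ca_t)
    return {
        # Assumption: relative length difference normalized by the longer chain.
        "length": abs(nq - nt) / max(nq, nt),
        "rg": abs(radius_of_gyration(ca_q) - radius_of_gyration(ca_t)),
        "contact_order": abs(relative_contact_order(ca_q)
                             - relative_contact_order(ca_t)),
        "helix": abs(ss_fraction(ss_q, "H") - ss_fraction(ss_t, "H")),
        "strand": abs(ss_fraction(ss_q, "E") - ss_fraction(ss_t, "E")),
    }

# Full TS weights copied from the table; how ASH combines them is not shown.
FULL_TS_WEIGHTS = {"length": 51.94, "rg": -0.33, "contact_order": 0.96,
                   "helix": 3.21, "strand": 1.71}

def weighted_descriptor_sum(deltas: Dict[str, float],
                            weights: Dict[str, float] = FULL_TS_WEIGHTS) -> float:
    """Hypothetical linear combination; not the published ASH score."""
    return sum(weights[k] * deltas[k] for k in weights)
```

The negative weight on the radius-of-gyration term suggests that sign conventions matter in the real score, so the weighted sum above should be read only as a way to see how the tabulated weights scale each descriptor.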
Figure 1 ROC curves. ROC curves are shown for the NER score (black), the new ASH score, and the sequence similarity term (blue). The new ASH score was evaluated on the full training set (red), on the reduced training set (orange), and on the test set using parameters derived from the reduced training set (green). The area under each curve is indicated in the legend.
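Figure 1 compares scores by their ROC curves and the areas under them. As a minimal sketch of how such a curve and its area could be computed for query-template pairs labeled as same fold (positive) or not, assuming that a higher score means greater similarity (negate the scores otherwise), the Python below sweeps a threshold down the sorted scores and integrates with the trapezoid rule. The function names are hypothetical, and this is not claimed to be the procedure used in the paper.

```python
from typing import List, Sequence, Tuple

def roc_curve(scores: Sequence[float],
              labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) points, lowering the
    threshold from the highest score. Needs at least one positive and one
    negative label."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Pairs with tied scores cross the threshold together.
        s = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under an ROC curve given as ordered points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, scores `[0.9, 0.8, 0.7, 0.6]` with labels `[True, False, True, False]` give an area of 0.75, which equals the fraction of positive/negative pairs in which the positive scores higher.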
CPU usage for the geometric test set.
| Program | Total CPU (s) | CPU per alignment (s) | CPU relative to FAST | Missing alignments |
| --- | --- | --- | --- | --- |
| Dali | 12895 | 6.2 | 20 | 121 |
| STRUCTAL | 3703 | 1.8 | 5 | 0 |
| GASH | 3197 | 1.5 | 5 | 4 |
| RASH | 1424 | 0.69 | 2 | 0 |
| FAST | 645 | 0.31 | 1 | 3 |
The total CPU usage of the five programs was computed over the geometric test set of 2,071 query-template pairs. The last column lists the number of query-template pairs for which each method produced no alignment.
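The per-alignment column of the CPU table is simple arithmetic on the totals. The short Python check below divides each total by the 2,071 pairs of the geometric test set; this reproduces the CPU-per-alignment column to the precision shown. It assumes the denominator includes pairs for which a program returned no alignment. The agreement is consistent with that, but the source does not state it. `cpu_per_alignment` is a hypothetical helper name.

```python
# Total CPU seconds per program, copied from the CPU usage table.
TOTAL_CPU_S = {"Dali": 12895, "STRUCTAL": 3703, "GASH": 3197,
               "RASH": 1424, "FAST": 645}
N_PAIRS = 2071  # query-template pairs in the geometric test set

def cpu_per_alignment(total_s: float, n_pairs: int = N_PAIRS) -> float:
    """Average CPU seconds per query-template pair (all pairs counted)."""
    return total_s / n_pairs

if __name__ == "__main__":
    for name, total in TOTAL_CPU_S.items():
        print(f"{name:9s} {cpu_per_alignment(total):.2f} s per alignment")
```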
Figure 2 Geometric analysis. Relationship between the average NER score and the average SAS score, with averages computed over the 2,071 query-template pairs of the geometric test set.
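The SAS score is not defined in this excerpt. The commonly used definition (Subbiah, Laurents and Levitt) is SAS = 100 × RMSD / N_aligned, in Å, where lower values are better. The minimal Python sketch below assumes that definition and averages it over a set of pairs. Pairs for which a program returned no alignment (the last column of the CPU table) are represented here as `None` and skipped. That handling is a hypothetical choice, since the source does not say how such pairs entered the averages. The NER score is not sketched because its definition is not given here.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

def sas_score(rmsd: float, n_aligned: int) -> float:
    """SAS = 100 * RMSD / number of aligned residues (assumed definition)."""
    return 100.0 * rmsd / n_aligned

def average_sas(pairs: Sequence[Optional[Tuple[float, int]]]) -> float:
    """Mean SAS over (rmsd, n_aligned) pairs; None marks a missing alignment
    and is skipped (hypothetical handling)."""
    return mean(sas_score(r, n) for p in pairs if p is not None for r, n in [p])
```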