James Casbon, Mansoor A S Saqi.
Abstract
S4 is an automatically generated database of multiple structure-based sequence alignments of protein superfamilies in the SCOP database. All structural domains that do not share more than 40% sequence identity as defined by the ASTRAL compendium of protein structures are included. The alignments are constructed using pairwise structural alignments to generate residue equivalences that are then integrated into multiple alignments using sequence alignment tools. We describe the database and give examples showing how the automatically generated S4 alignments compare favourably to hand-crafted alignments. Available at: http://compbio.mds.qmw.ac.uk/S4.html.
Year: 2005 PMID: 15608181 PMCID: PMC539997 DOI: 10.1093/nar/gki043
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Part of the multiple alignment of the four-helical cytokines showing those domains also aligned (16). The grey shading shows the cores of the families: light grey shows the long-chain core, medium grey the short-chain core and dark grey the common core. The dark grey shading should be aligned in four clear blocks. The boxes show parts of the core that are marked as gaps in the reference alignment; this makes visualization easier, as all corresponding blocks are of the same length across each sequence. The figure was produced using ALSCRIPT (19).
Comparison of accuracies for PASS2 and S4 on the reference alignments
| Alignment | PASS2 AC (pairwise) | PASS2 AC (whole) | S4 AC (pairwise) | S4 AC (whole) |
|---|---|---|---|---|
| Long-chain cytokines | 0.23 | 0 | 0.95 | 0.86 |
| Short-chain cytokines | 0.02 | 0 | 0.38 | 0 |
| All cytokines | 0.15 | 0 | 0.79 | 0 |
| Four-helical cytochromes | 0.95 | 0.9 | 1 | 1 |
Accuracy measures are as described previously (18). The whole-alignment AC is the number of correct positions divided by the length of the alignment; the pairwise AC is the average alignment accuracy over all possible pairs of sequences in the alignment. Both scores are calculated only over regions marked as core in the reference alignments. Note that the whole-alignment AC is quite ‘brittle’ and decays quickly because, for a position to be correct, all sequences must be aligned correctly in that position.
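The two accuracy measures can be illustrated with a minimal Python sketch. It is not the implementation from reference (18): the function names, the dictionary-of-gapped-strings input format, the choice to skip core columns with fewer than two residues, and the use of scored core columns as the denominator are all assumptions made for illustration.

```python
from itertools import combinations

GAP = "-"

def column_residues(row):
    """For each column of a gapped row, the ungapped residue index or None."""
    out, i = [], 0
    for ch in row:
        if ch == GAP:
            out.append(None)
        else:
            out.append(i)
            i += 1
    return out

def residue_columns(row):
    """For each ungapped residue index, the column it occupies in the row."""
    return [col for col, ch in enumerate(row) if ch != GAP]

def subset_accuracy(reference, test, core_columns, names):
    """Fraction of core reference columns reproduced for the given sequences.

    A column counts as correct only if every residue that the reference
    places in it ends up in one single column of the test alignment.
    """
    ref = {n: column_residues(reference[n]) for n in names}
    tst = {n: residue_columns(test[n]) for n in names}
    scored = correct = 0
    for col in core_columns:
        residues = {n: ref[n][col] for n in names if ref[n][col] is not None}
        if len(residues) < 2:
            continue  # assumption: nothing to align in this column, skip it
        scored += 1
        test_cols = {tst[n][i] for n, i in residues.items()}
        correct += len(test_cols) == 1
    return correct / scored if scored else 0.0

def whole_accuracy(reference, test, core_columns):
    """Whole-alignment accuracy: all sequences must agree in a column."""
    return subset_accuracy(reference, test, core_columns, sorted(reference))

def pairwise_accuracy(reference, test, core_columns):
    """Average accuracy over all possible pairs of sequences."""
    pairs = list(combinations(sorted(reference), 2))
    scores = [subset_accuracy(reference, test, core_columns, p) for p in pairs]
    return sum(scores) / len(scores)

# Toy example: sequence "c" is shifted by one column in the test alignment.
reference = {"a": "ACDE", "b": "ACDE", "c": "ACDE"}
test = {"a": "ACDE-", "b": "ACDE-", "c": "-ACDE"}
core = [0, 1, 2, 3]
print(whole_accuracy(reference, test, core))     # whole-alignment AC
print(pairwise_accuracy(reference, test, core))  # pairwise AC
```

In this toy case a single misaligned sequence makes every column wrong for the whole-alignment measure, while the pair of sequences that still agree keeps the pairwise average above zero. The same pattern appears in the cytokine rows of the table, where the whole-alignment score is 0 and the pairwise score is not.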