Naomi K Fox, Steven E Brenner, John-Marc Chandonia.
Abstract
Structural Classification of Proteins-extended (SCOPe, http://scop.berkeley.edu) is a database of protein structural relationships that extends the SCOP database. SCOP is a manually curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. Development of the SCOP 1.x series concluded with SCOP 1.75. The ASTRAL compendium provides several databases and tools to aid in the analysis of the protein structures classified in SCOP, particularly through the use of their sequences. SCOPe extends version 1.75 of the SCOP database, using automated curation methods to classify many structures released since SCOP 1.75. We have rigorously benchmarked our automated methods to ensure that they are as accurate as manual curation, though there are many proteins to which our methods cannot be applied. SCOPe is also partially manually curated to correct some errors in SCOP. SCOPe aims to be backward compatible with SCOP, providing the same parseable files and a history of changes between all stable SCOP and SCOPe releases. SCOPe also incorporates and updates the ASTRAL database. The latest release of SCOPe, 2.03, contains 59 514 Protein Data Bank (PDB) entries, increasing the number of structures classified in SCOP by 55% and including more than 65% of the protein structures in the PDB.
Year: 2013 PMID: 24304899 PMCID: PMC3965108 DOI: 10.1093/nar/gkt1240
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
SCOP and SCOPe growth and benchmarking
| Release | Freeze date | Release date | Months to release | Total PDB entries | Total PDB entries classified | New PDB entries used in benchmark | PDB deposition rate per month | Percent of new entries classifiable by current automated method |
|---|---|---|---|---|---|---|---|---|
| SCOP 1.55 | 2001–03 | 2001–07 | 4 | 13 300 | 13 228 | n/a | 258 | n/a |
| SCOP 1.57 | 2001–10 | 2002–01 | 3 | 14 825 | 14 736 | 1508 | 275 | 49 |
| SCOP 1.59 | 2002–03 | 2002–05 | 2 | 16 057 | 15 985 | 1249 | 270 | 47 |
| SCOP 1.61 | 2002–09 | 2002–11 | 2 | 17 498 | 17 411 | 1426 | 304 | 51 |
| SCOP 1.63 | 2003–03 | 2003–06 | 3 | 19 036 | 18 951 | 1540 | 351 | 50 |
| SCOP 1.65 | 2003–08 | 2003–12 | 4 | 20 699 | 20 619 | 1668 | 374 | 51 |
| SCOP 1.67 | 2004–05 | 2005–02 | 9 | 24 131 | 24 036 | 3417 | 436 | 52 |
| SCOP 1.69 | 2004–10 | 2005–07 | 9 | 26 101 | 25 972 | 1936 | 454 | 46 |
| SCOP 1.71 | 2005–01 | 2006–10 | 21 | 27 821 | 27 599 | 1627 | 474 | 45 |
| SCOP 1.73 | 2007–09 | 2007–11 | 2 | 44 169 | 34 494 | 6895 | 593 | 58 |
| SCOP 1.75 | 2009–02 | 2009–06 | 4 | 53 830 | 38 221 | 3727 | 632 | 48 |
| SCOPe 2.01 (formerly 1.75A) | 2012–02 | 2012–03 | 1 | 76 528 | 49 219 | n/a | 775 | n/a |
| SCOPe 2.02 (formerly 1.75B) | 2012–11 | 2013–01 | 2 | 83 643 | 49 674 | n/a | 816 | n/a |
| SCOPe 2.03 (formerly 1.75C) | 2013–08 | 2013–10 | 2 | 90 812 | 59 514 | n/a | n/a | n/a |
Number of new entries added in each release of SCOP and SCOPe that used stable identifiers. For each release, the ‘freeze date’ (the date after which no new PDB entries were to be classified in the release) is given. In practice, some entries released just after the freeze date were sometimes included. The total number of PDB entries is the number that contained protein structures and either were not obsolete as of the freeze date or were included in the release; the number of PDB entries classified in each release is also given. Release 1.71 was the most recent comprehensive SCOP release (i.e. one in which nearly all PDB entries available prior to the freeze date were classified). The average rate at which PDB entries were deposited each month is also given, measured over the 6 months before and, where applicable, after the freeze date.
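The derived figures in the table and abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the SCOPe tooling: the function names are illustrative, the counts are copied from the SCOP 1.75 and SCOPe 2.03 rows, and dates are written with ASCII hyphens rather than the en dashes used in the table.

```python
# Counts copied from the table: (total PDB entries, PDB entries classified).
RELEASES = {
    "SCOP 1.75": (53830, 38221),
    "SCOPe 2.03": (90812, 59514),
}


def months_between(freeze: str, release: str) -> int:
    """Whole months from a 'YYYY-MM' freeze date to a 'YYYY-MM' release date."""
    freeze_year, freeze_month = (int(part) for part in freeze.split("-"))
    release_year, release_month = (int(part) for part in release.split("-"))
    return (release_year - freeze_year) * 12 + (release_month - freeze_month)


def percent_increase(old: int, new: int) -> float:
    """Relative growth from old to new, as a percentage."""
    return 100.0 * (new - old) / old


def percent_coverage(classified: int, total: int) -> float:
    """Share of all PDB entries that are classified, as a percentage."""
    return 100.0 * classified / total


if __name__ == "__main__":
    _, classified_175 = RELEASES["SCOP 1.75"]
    total_203, classified_203 = RELEASES["SCOPe 2.03"]
    print("Months to release, SCOP 1.71:", months_between("2005-01", "2006-10"))
    print("Growth in classified entries since SCOP 1.75 (%):",
          round(percent_increase(classified_175, classified_203), 1))
    print("Coverage of the PDB in SCOPe 2.03 (%):",
          round(percent_coverage(classified_203, total_203), 1))
```

With the table's values, the growth comes out just under 56% and the coverage just over 65%, consistent with the "55%" and "more than 65%" quoted in the abstract.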
Figure 1. Errors identified during benchmarking. We detected errors in 70 manually curated domains by running benchmarking and manually inspecting predicted domains that did not sufficiently match the manually annotated domains. These errors in domain boundaries in multi-domain chains were manually fixed in SCOPe 2.03. We also detected and fixed inconsistencies in 5054 domains that had been predicted and classified with the SCOP 1.73 automated method. We review some of the types of errors detected. (a) The SCOP 1.73 automated method used to predict domain d2p8qa1 had included approximately half the residues in the chain. This was inconsistent with all other manually curated entries in its species-level clade that included the entire chain. (b) A strand of beta sheet was included in the d1tqya2 domain by manual curation. (c) All of chain I from 1oyv had been placed into a single domain. (d) The manually curated domain d1seja2 excluded the first helix in the chain.
Figure 2. Automated curation example. This figure depicts an example of applying the automated method for domain prediction and classification to 1vj5, chain A, released on 2004-04-27. We attempted to automatically classify it into SCOP 1.67, based only on domains defined in SCOP 1.65. 1vj5A has 554 residues, of which residues 2-547 are observed (found in the ATOM records in PDB data). Two significant BLAST hits were found to the classified chain 1ek1A, which has a distinct sequence from 1vj5A but also has 554 residues, of which residues 4-19, 48-66 and 90-544 are observed. The two BLAST hits include residues 2-224 and 226-544 in 1vj5A. The final predicted domains in 1vj5A are 2-225 and 226-547. The manually annotated domains for 1vj5A are 2-223 and 224-547. Since the end of each predicted domain differs from the manually annotated domain by at most 10 residues, this domain prediction is deemed to fall within the error tolerance for validation.
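The validation criterion in the Figure 2 example can be illustrated with a short check. This is a hedged sketch rather than the authors' benchmarking code: it assumes each domain is a single contiguous residue range, that predicted and manual domains are listed in the same order, and that both the start and the end of each range must lie within the 10-residue tolerance (the caption does not spell out the exact comparison). The function name `within_tolerance` is hypothetical.

```python
from typing import List, Tuple

Domain = Tuple[int, int]  # (first residue, last residue), inclusive

TOLERANCE = 10  # residues, as stated in the Figure 2 caption


def within_tolerance(predicted: List[Domain],
                     manual: List[Domain],
                     tolerance: int = TOLERANCE) -> bool:
    """Return True if every predicted boundary is close to the manual one.

    Assumption: domains are single-segment and paired by position.
    """
    if len(predicted) != len(manual):
        return False  # a different number of domains cannot match
    for (pred_start, pred_end), (man_start, man_end) in zip(predicted, manual):
        if abs(pred_start - man_start) > tolerance:
            return False
        if abs(pred_end - man_end) > tolerance:
            return False
    return True


if __name__ == "__main__":
    # Residue ranges for 1vj5A taken from the caption.
    predicted_1vj5a = [(2, 225), (226, 547)]
    manual_1vj5a = [(2, 223), (224, 547)]
    print(within_tolerance(predicted_1vj5a, manual_1vj5a))
```

For 1vj5A the largest boundary difference is 2 residues (225 versus 223, and 226 versus 224), so the prediction passes under this reading of the criterion.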