Ian Sillitoe, Natalie Dawson, Janet Thornton, Christine Orengo.
Abstract
This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up to date with the more than 100,000 protein structures now in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive function annotation resources has also meant that domain families can be functionally annotated, allowing insights into functional divergence and evolution within protein families.
Keywords: Protein structure; Structure classification
Year: 2015 PMID: 26253692 PMCID: PMC4678953 DOI: 10.1016/j.biochi.2015.08.004
Source DB: PubMed Journal: Biochimie ISSN: 0300-9084 Impact factor: 4.079
Fig. 1. The smallest (left) and largest (middle) domain structures from the “Nitrogenase molybdenum iron protein domain” CATH superfamily (ID: 3.40.50.1980), and a superposition of all structural relatives from that superfamily that are non-redundant at 35% sequence identity (right). The superposition shows that the structural ‘core’ of related protein structures can remain highly conserved even after the amino acid sequence has changed beyond recognition.
Fig. 2. Selected relatives from the HUP superfamily (CATH ID: 3.40.50.620), illustrating the diverse structural embellishments (shown in blue) that have evolved around the conserved structural core (shown in pink).
A summary of the terms used by the CATH and SCOP structure classification databases and how they correspond.
| CATH | SCOP | Description |
|---|---|---|
| Class | Class | Hierarchy separated by gross structural differences (e.g. secondary structure content) |
| Architecture | – | Similar general organization of secondary structures within 3D space |
| Topology (fold) | Fold | Structural similarity without clear evidence of evolutionary similarity |
| Homologous superfamily | Superfamily | Structural and functional features suggest a common evolutionary origin (often despite low sequence similarity) |
| – | Family | Clusters domains with clear evolutionary relationship (usually including significant sequence similarity) |
| FunFam | – | Clusters domains with functional similarity |
Fig. 3. The first three levels of the CATH structure classification hierarchy: Class (based on secondary structure content), Architecture (based on the gross spatial arrangement of secondary structures) and Topology or Fold (based on a similar folding arrangement of secondary structures).
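The CATH IDs quoted in the figure captions (3.40.50.1980 and 3.40.50.620) encode this hierarchy as four dot-separated numbers, one per level down to the homologous superfamily. The following is a minimal sketch of how such an ID can be split into its levels; the function names `cath_lineage` and `deepest_shared_level` are hypothetical helpers written for illustration and are not part of any CATH software.

```python
from typing import Dict, Optional

# Level names follow the CATH column of the terms table, most general first.
CATH_LEVELS = ("Class", "Architecture", "Topology (fold)", "Homologous superfamily")


def cath_lineage(cath_id: str) -> Dict[str, str]:
    """Map each hierarchy level to the ID prefix that identifies it.

    Hypothetical helper: assumes a four-part superfamily ID such as "3.40.50.620".
    """
    parts = cath_id.strip().split(".")
    if len(parts) != len(CATH_LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {cath_id!r}")
    return {
        level: ".".join(parts[:depth])
        for depth, level in enumerate(CATH_LEVELS, start=1)
    }


def deepest_shared_level(id_a: str, id_b: str) -> Optional[str]:
    """Return the most specific level at which two superfamily IDs agree, if any."""
    lineage_a, lineage_b = cath_lineage(id_a), cath_lineage(id_b)
    shared = None
    for level in CATH_LEVELS:
        if lineage_a[level] != lineage_b[level]:
            break
        shared = level
    return shared


if __name__ == "__main__":
    print(cath_lineage("3.40.50.1980"))
    print(deepest_shared_level("3.40.50.1980", "3.40.50.620"))
```

Applied to the two superfamilies named in Figs. 1 and 2, the sketch reports that they agree down to the Topology (fold) level, 3.40.50, and differ only at the homologous superfamily level.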
Fig. 4. Plot showing the population of sequences in CATH domain superfamilies. More than half of all known protein domains in the genome sequences come from a small number (<5%) of highly populated superfamilies.
Fig. 5. Functional Families (FunFams) in CATH aim to cluster protein domains that all share a specific function. Relationships between FunFams within a superfamily can be visualised in a number of ways, including (a) clustering according to structural similarity and (b) networks according to global sequence similarity.