Jaume Bonet, Joan Planas-Iglesias, Javier Garcia-Garcia, Manuel A Marín-López, Narcis Fernandez-Fuentes, Baldo Oliva.
Abstract
The function of a protein is determined by its three-dimensional structure, which is formed by regular elements (i.e. β-strands and α-helices) and non-periodic structural units such as loops. Compared to regular structural elements, non-periodic, non-repetitive conformational units show a much higher degree of variability, which makes regularities harder to identify, and yet they represent an important part of the structure of a protein. Indeed, loops often play a pivotal role in the function of a protein and in different aspects of protein folding and dynamics. Therefore, the structural classification of protein loops is an important subject with clear applications in homology modelling, protein structure prediction, protein design (e.g. enzyme design and catalytic loops) and function prediction. ArchDB, the database presented here (freely available at http://sbi.imim.es/archdb), provides such a classification and has been an important asset for the scientific community throughout the years. In this article, we present a completely reworked and updated version of ArchDB. The new version features a novel, fast and user-friendly web-based interface, and a novel graph-based, computationally efficient clustering algorithm. The current version of ArchDB classifies 149,134 loops in 5739 classes and 9608 subclasses.
Year: 2013 PMID: 24265221 PMCID: PMC3964960 DOI: 10.1093/nar/gkt1189
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Classification pipeline. Two different methods are applied to build the loop clusters (DS and MCL, see Clustering section and Supplementary Material). Shown within brackets in each subclass is the consensus geometry of the clustered loops, i.e. distance, hoist angle, packing angle and meridian angle [see definitions for loop geometry in the Supplementary Material, FAQs and in (23)].
The different loop types according to their flanking secondary structure
| Type | Type description | All | DS (%) | MCL (%) |
|---|---|---|---|---|
| BK | β-link | 28 418 | 11 777 (41.4) | 6054 (21.3) |
| BN | β-hairpin | 35 616 | 27 995 (78.6) | 22 536 (63.3) |
| EG | β–helix310 | 18 349 | 6950 (37.8) | 8531 (46.5) |
| EH | β–α-helix | 42 442 | 23 364 (55.0) | 19 661 (46.3) |
| GE | helix310–β | 16 478 | 6829 (41.4) | 7731 (46.9) |
| GG | helix310–helix310 | 3498 | 704 (20.1) | 23 (0.6) |
| GH | helix310–α-helix | 16 249 | 7537 (46.9) | 10 141 (62.4) |
| HE | α-helix–β | 42 079 | 24 870 (59.1) | 23 327 (55.4) |
| HG | α-helix–helix310 | 14 472 | 5689 (39.3) | 9133 (63.1) |
| HH | α-helix–α-helix | 35 294 | 18 200 (51.5) | 19 503 (55.2) |
For each type, the total number of loops and the number (with percentage) classified by each clustering method are shown.
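
The percentage columns in the table can be read as the classified count divided by the total count for that loop type. The following is a minimal sketch of that arithmetic using the BK and BN rows; the dictionary layout and the function name `classified_percent` are illustrative assumptions, not part of ArchDB.

```python
# Counts copied from the BK (β-link) and BN (β-hairpin) rows of the table.
# The mapping layout and function name are hypothetical, for illustration only.
LOOP_COUNTS = {
    "BK": {"all": 28418, "DS": 11777, "MCL": 6054},
    "BN": {"all": 35616, "DS": 27995, "MCL": 22536},
}


def classified_percent(loop_type: str, method: str) -> float:
    """Return the percentage of loops of one type classified by one method."""
    row = LOOP_COUNTS[loop_type]
    return round(100.0 * row[method] / row["all"], 1)


if __name__ == "__main__":
    for loop_type in LOOP_COUNTS:
        for method in ("DS", "MCL"):
            print(loop_type, method, classified_percent(loop_type, method))
```

For these two rows the computed values match the percentages reported in the table (41.4 and 21.3 for BK, 78.6 and 63.3 for BN).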
Figure 2. Distribution of classified loops for each clustering method as a function of loop length.
Figure 3. RMSD distribution of the five most populated loop lengths (from 0 to 4) for all loop types. Distribution using DS clustering (top). Distribution using MCL clustering (bottom; this includes two types of subclasses 4S and 4M at length 4). See Supplementary Figures S1 and S2 for a detailed analysis of the RMSD distribution by type-length.
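
For readers unfamiliar with the measure plotted in Figure 3, the root-mean-square deviation (RMSD) between two equally sized sets of atomic coordinates can be sketched as below. This is a generic illustration, not the procedure used by ArchDB: it assumes the two loops have already been superposed and that atoms are paired in order, and the function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumption: the structures are already superposed and atoms are
    paired by position in the lists; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    loop_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    loop_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(loop_a, loop_b))
```

A lower RMSD within a subclass indicates that its member loops share a more similar conformation.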