Renzo Angles, Mauricio Arenas-Salinas, Roberto García, Jose Antonio Reyes-Suarez, Ehmke Pohl.
Abstract
BACKGROUND: In the field of protein engineering and biotechnology, the discovery and characterization of structural patterns is highly relevant, as these patterns can give fundamental insights into protein-ligand interaction and protein function. This paper presents GSP4PDB, a bioinformatics web tool that enables the user to visualize, search and explore protein-ligand structural patterns within the entire Protein Data Bank.
Keywords: Big data; PDB; Protein-ligand interaction; Structural patterns
Year: 2020 PMID: 32164553 PMCID: PMC7068854 DOI: 10.1186/s12859-020-3352-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Three-dimensional representation of the Zinc finger pattern characteristic of the Cys2His2 type. Four residues (Cys107, Cys112, His125 and His129) coordinate to the zinc ion (cyan ball)
Fig. 2 A schematic representation of a Zinc Finger found in PROSITE
Fig. 3 Graph-based structural pattern for a GATA-type zinc finger
Fig. 4 Structure of the relational database used by GSP4PDB. For each table we show the table's name, its number of rows, its attributes and a sample data row. Primary keys and foreign keys are marked with [↓] and [↑] respectively. Indexed attributes are marked with [Δ]
Fig. 5 Components of the GSP4PDB web interface: (Top) Navigation bar and Design area; (Middle) Output area in Tabular view mode; (Bottom) Output area in Gallery view mode
Fig. 6 Test patterns related to the Cys2His2 zinc finger. Search results: (a) 55,740 hits in 2,407 proteins; (b) 2,354 hits in 1,006 proteins; (c) and (d) 630 hits in 343 proteins; (e) 4 hits in 4 proteins
Example of summarized data calculated over the results of the Cys2His2 pattern shown in Fig. 6d
| Keywords | Hits |
|---|---|
| TRANSCRIPTION (188) | 409 |
| TRANSCRIPTION/DNA (185) | |
| TRANSCRIPTION REGULATOR/DNA (14) | |
| TRANSCRIPTION, METAL BINDING PROTEIN (2) | |
| TRANSCRIPTION FACTOR/DNA (9) | |
| TRANSCRIPTION REGULATION (7) | |
| TRANSCRIPTION REGULATOR (2) | |
| TRANSCRIPTION/RNA (2) | |
| DNA BINDING PROTEIN (23) | 109 |
| DNA-BINDING PROTEIN (4) | |
| DNA BINDING PROTEIN/DNA (79) | |
| DNA BINDING PROTEIN/RNA/DNA (3) | |
| METAL BINDING PROTEIN (21) | 39 |
| METAL BINDING PROTEIN/DNA (10) | |
| DNA/METAL BINDING PROTEIN (3) | |
| TRANSCRIPTION, METAL BINDING PROTEIN (2) | |
| NUCLEAR PROTEIN/METAL BINDING PROTEIN (3) | |
| TRANSFERASE (2) | 21 |
| TRANSFERASE/DNA (19) | |
| GENE REGULATION (10) | 15 |
| GENE REGULATION/DNA (5) | |
| ZINC FINGER (3) | 9 |
| ZINC FINGER DNA BINDING DOMAIN (6) | |
| RNA BINDING PROTEIN (6) | 9 |
| RNA-BINDING PROTEIN/RNA (2) | |
| RNA BINDING PROTEIN/RNA (1) | |
| HYDROLASE/DNA (1) | 6 |
| HYDROLASE/DNA/RNA (5) | |
| UNKNOWN FUNCTION | 5 |
| PROTEIN BINDING (2) | 2 |
| CELL CYCLE (2) | 2 |
| TRANSLATION REGULATOR (1) | 2 |
| TRANSLATION (1) | |
| SPLICING (2) | 2 |
| LIGASE (1) | 1 |
| VIRUS (1) | 1 |
CATH information about the solutions of the Cys2His2 pattern shown in Fig. 6d
| Class | Architecture | Topology/fold | Homologous superfamily | Hits | CATH code description |
|---|---|---|---|---|---|
| 3 | - | - | - | 300 | Alpha Beta |
| 3 | 30 | - | - | 300 | 2-Layer Sandwich |
| 3 | 30 | 160 | - | 293 | Double Stranded RNA Binding Domain |
| 3 | 30 | 160 | 60 | 293 | Classic Zinc Finger |
| 3 | 30 | 428 | - | 4 | HIT family, subunit A |
| 3 | 30 | 428 | 10 | 4 | HIT-like |
| 3 | 30 | 40 | - | 3 | Herpes Virus-1 |
| 3 | 30 | 40 | 130 | 2 | Herpes Virus-1 |
| 3 | 30 | 40 | 200 | 1 | Herpes Virus-1 |
| 2 | - | - | - | 3 | Mainly Beta |
| 2 | 170 | - | - | 1 | Beta Complex |
| 2 | 170 | 270 | - | 1 | Beta-clip-like |
| 2 | 170 | 270 | 10 | 1 | SET domain |
| 2 | 60 | - | - | 1 | Sandwich |
| 2 | 60 | 40 | - | 1 | Immunoglobulin-like |
| 2 | 60 | 40 | 10 | 1 | Immunoglobulins |
| 2 | 30 | - | - | 1 | Roll |
| 2 | 30 | 170 | - | 1 | Ribosomal Protein L24e; Chain: T; |
| 2 | 30 | 170 | 10 | 1 | Ribosomal Protein L24e; Chain: T; |
| 1 | - | - | - | 1 | Mainly Alpha |
| 1 | 10 | - | - | 1 | Orthogonal bundle |
| 1 | 10 | 10 | - | 1 | Arc Repressor Mutant, subunit A |
| 1 | 10 | 10 | 790 | 1 | Arc Repressor Mutant, subunit A |
| No value | No value | No value | No value | 326 | |
Average distances for the solutions of the Cys2His2 pattern shown in Fig. 6d
| CATH code | Average distance Cys1-Zn | Average distance Cys2-Zn | Average distance His1-Zn | Average distance His2-Zn | Hits |
|---|---|---|---|---|---|
| 3.30.160.60 | 2.32 | 2.28 | 2.09 | 2.12 | 293 |
| 3.30.428.10 | 2.35 | 2.23 | 2.02 | 2.04 | 4 |
| 3.30.40.130 | 2.13 | 2.13 | 1.99 | 6.96 | 2 |
| 3.30.40.200 | 2.20 | 2.38 | 2.10 | 2.18 | 1 |
| 2.170.270.10 | 2.29 | 2.28 | 2.12 | 2.12 | 1 |
| 2.60.40.10 | 2.61 | 2.51 | 2.39 | 2.41 | 1 |
| 2.30.170.10 | 4.39 | 4.01 | 5.11 | 6.49 | 1 |
| 1.10.10.790 | 2.52 | 2.66 | 2.36 | 2.38 | 1 |
| No value | 2.33 | 2.25 | 2.07 | 2.08 | 326 |
This table shows the average distances for interactions between ligand Zn and the amino acids (Cys 1, Cys 2, His 1 and His 2), grouped by CATH code
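Because each row of this table is already an average over its own hits, pooling the rows requires weighting by the hit count. The following is a minimal sketch of that arithmetic, not part of GSP4PDB; the names `ROWS` and `weighted_means` are hypothetical, and the numbers are copied from the table above.

```python
# Rows copied from the table above:
# (CATH code, Cys1-Zn, Cys2-Zn, His1-Zn, His2-Zn, hits).
ROWS = [
    ("3.30.160.60", 2.32, 2.28, 2.09, 2.12, 293),
    ("3.30.428.10", 2.35, 2.23, 2.02, 2.04, 4),
    ("3.30.40.130", 2.13, 2.13, 1.99, 6.96, 2),
    ("3.30.40.200", 2.20, 2.38, 2.10, 2.18, 1),
    ("2.170.270.10", 2.29, 2.28, 2.12, 2.12, 1),
    ("2.60.40.10", 2.61, 2.51, 2.39, 2.41, 1),
    ("2.30.170.10", 4.39, 4.01, 5.11, 6.49, 1),
    ("1.10.10.790", 2.52, 2.66, 2.36, 2.38, 1),
    ("No value", 2.33, 2.25, 2.07, 2.08, 326),
]


def weighted_means(rows):
    """Return the hit-weighted mean of each distance column.

    Hypothetical helper: each group's average is weighted by the
    number of hits in that group before pooling.
    """
    total_hits = sum(row[-1] for row in rows)
    n_columns = len(rows[0]) - 2  # drop the CATH code and the hit count
    return [
        sum(row[1 + i] * row[-1] for row in rows) / total_hits
        for i in range(n_columns)
    ]


if __name__ == "__main__":
    labels = ["Cys1-Zn", "Cys2-Zn", "His1-Zn", "His2-Zn"]
    for label, value in zip(labels, weighted_means(ROWS)):
        print(f"{label}: {value:.2f}")
```

The hit counts sum to the 630 hits reported for the pattern in Fig. 6d. Since the inputs are rounded to two decimals, a pooled value recomputed this way can differ in the last digit from an average computed over the raw distances.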
Fig. 7 Graph-based structural patterns for six classes of zinc fingers
Six classes of zinc fingers used in case study 2 (C2H2 classical, C2H2 variation, THAP, C2HC, Fungal, CCHHC)
| Zinc Class | Textual Pattern (PROSITE convention) | Hits | Time (s) |
|---|---|---|---|
| C2H2-c | C-x(2,4)-C-x(12)-H-x(2,6)-H | 630 | 1.72 |
| C2H2-v | C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H | 554 | 2.89 |
| THAP | C-x(2,4)-C-x(35,50)-C-x(2)-H | 36 | 1.64 |
| C2HC | C-x(5)-C-x(n)-H-x(6)-C | 6 | 0.83 |
| Fungal | C-x(2)-C-x(6)-C-x(5,12)-C-x(2)-C-x(6,8)-C | 28 | 0.87 |
| CCHHC | C-P-x(1)-P-G-C-x(1)-G-x(1)-G-H-x(7)-H-R-x(4)-C | 1 | 1.14 |
For each pattern we present the number of results (hits) and the computation time (in seconds)
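To make the PROSITE convention in the table concrete, the sketch below translates a textual pattern into a Python regular expression and runs it on a sequence. This only illustrates the sequence-level notation. GSP4PDB itself searches graph-based structural patterns, and nothing here reflects its implementation. The function name `prosite_to_regex` and the toy sequence are hypothetical, and the translator deliberately rejects elements it does not understand, such as the unspecified gap `x(n)` in the C2HC row.

```python
import re

# One pattern element: a wildcard x, a single residue, or a residue set,
# optionally followed by a repeat count (n) or a range (n,m).
_ELEMENT = re.compile(r"(x|X|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a regular expression string.

    Hypothetical helper covering only the syntax used in the table:
    residues, [sets], x wildcards, and (n) / (n,m) repeats.
    """
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        token = "." if residue in ("x", "X") else residue
        if low is not None:
            token += f"{{{low},{high}}}" if high is not None else f"{{{low}}}"
        parts.append(token)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("C-x(2,4)-C-x(12)-H-x(2,6)-H")
    # Toy sequence invented for illustration; it is not taken from the PDB.
    toy_sequence = "MKCPECGKAFSRSDELTRHIRIHTG"
    hit = re.search(regex, toy_sequence)
    print(regex, hit.span() if hit else None)
```

A sequence match of this kind says nothing about whether the four residues actually coordinate a zinc ion, which is the information the structural patterns add.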
Summary of the results of case study 3: Sub-patterns of the Cys2His2 zinc finger
| Gap G1 | Gap G2 | Gap G3 | PDBs | Classification | Organism | CATH C (0 = no value) | CATH A | CATH T | CATH H | Avg. distance ZN-Cys1 | Avg. distance ZN-Cys2 | Avg. distance ZN-His1 | Avg. distance ZN-His2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 12 | 2 | 4 | TRANSCRIPTION (1), | HOMO SAPIENS (4) | 0(2) | 0(2) | 0(2) | 0(2) | 2.25 | 2.31 | 2.07 | 5.49 |
| | | | | METAL BINDING PROTEIN/DNA (1), | | 3(1) | 30(1) | 40(1) | 200(1) | | | | |
| | | | | TRANSCRIPTION/DNA (1), | | 3(1) | 30(1) | 160(1) | 60(1) | | | | |
| | | | | ... | | | | | | | | | |
| 2 | 12 | 3 | 4 | TRANSCRIPTION (139), | HOMO SAPIENS (261), | 0(198) | 0(198) | 0(198) | 0(198) | 2.33 | 2.26 | 2.08 | 2.07 |
| | | | | TRANSCRIPTION/DNA (91), | UNDEFINED (50), | 2(1) | 60(1) | 40(1) | 10(1) | | | | |
| | | | | GENE REGULATION/DNA (10), | MUS MUSCULUS (40), | 3(172) | 30(172) | 160(172) | 60(172) | | | | |
| | | | | UNKNOWN FUNCTION (5), | MUS (10), | 3(4) | 30(4) | 428(4) | 10(4) | | | | |
| | | | | ... | ... | | | | | | | | |
| 2 | 12 | 4 | 97 | TRANSCRIPTION (29), | MUS MUSCULUS (6), | 0(66) | 0(66) | 0(66) | 0(66) | 2.30 | 2.24 | 2.05 | 2.05 |
| | | | | TRANSCRIPTION/DNA (22), | HOMO SAPIENS (70), | 2(1) | 170(1) | 270(1) | 10(1) | | | | |
| | | | | METAL BINDING PROTEIN (7), | UNDEFINED (11), | 3(30) | 30(30) | 160(30) | 60(30) | | | | |
| | | | | ... | ... | | | | | | | | |
| 2 | 12 | 5 | 15 | TRANSCRIPTION (15), | HOMO SAPIENS (9), | 0(4) | 0(4) | 0(4) | 0(4) | 2.32 | 2.29 | 2.37 | 2.60 |
| | | | | RNA BINDING PROTEIN RNA (1), | MUS MUSCULUS (1), | 1(1) | 10(1) | 10(1) | 790(1) | | | | |
| | | | | ... | XENOPUS LAEVIS (2), | 3(10) | 30(10) | 160(10) | 60(10) | | | | |
| | | | | | ... | | | | | | | | |
| 3 | 12 | 4 | 1 | METAL BINDING PROTEIN (1) | HOMO SAPIENS (1) | 0(1) | 0(1) | 0(1) | 0(1) | 6.00 | 1.72 | 1.94 | 2.32 |
| 3 | 12 | 5 | 2 | RNA BINDING PROTEIN (69), | XENOPUS LAEVIS (1), | 2(1) | 30(1) | 170(1) | 10(1) | 5.60 | 3.15 | 3.60 | 4.29 |
| | | | | METAL BINDING PROTEIN (1) | SYNECHOCOCCUS ELONGATUS (1) | 3(1) | 30(1) | 160(1) | 60(1) | | | | |
| 4 | 12 | 3 | 117 | TRANSCRIPTION DNA (69), | HOMO SAPIENS (63), | 0(42) | 0(42) | 0(42) | 0(42) | 2.28 | 2.30 | 2.07 | 2.04 |
| | | | | TRANSCRIPTION FACTOR/DNA (6), | MUS MUSCULUS (19), | 3(75) | 30(75) | 160(75) | 60(75) | | | | |
| | | | | ... | ESCHERICHIA COLI (2), | | | | | | | | |
| | | | | | ... | | | | | | | | |
| 4 | 12 | 4 | 8 | METAL BINDING PROTEIN (1), | HOMO SAPIENS (3), | 0(4) | 0(4) | 0(4) | 0(4) | 2.26 | 2.26 | 2.07 | 2.14 |
| | | | | PROTEIN BINDING (1), | ARABIDOPSIS THALIANA (1), | 3(4) | 30(4) | 160(4) | 60(4) | | | | |
| | | | | DNA BINDING PROTEIN (2), | ... | | | | | | | | |
| | | | | ... | | | | | | | | | |
| 4 | 12 | 6 | 2 | TRANSLATION REGULATOR (1), | HOMO SAPIENS (2) | 3(2) | 30(2) | 40(2) | 130(2) | 2.13 | 2.13 | 1.99 | 6.95 |
| | | | | METAL BINDING PROTEIN (1) | | | | | | | | | |
| | | | 630 | | | | | | | 2.33 | 2.26 | 2.08 | 2.12 |
Each row describes one sub-pattern, where G1, G2 and G3 give the specific sizes of the gaps in the pattern shown in Fig. 7a. For instance, the textual representation of the first sub-pattern is C-X(2)-C-X(12)-H-X(2)-H
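The mapping from gap sizes to a sub-pattern string can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the default gap ranges assume that the pattern of Fig. 7a is the classical C2H2 pattern C-x(2,4)-C-x(12)-H-x(2,6)-H listed for case study 2.

```python
from itertools import product


def subpattern(g1: int, g2: int, g3: int) -> str:
    """Textual representation of one sub-pattern with fixed gap sizes."""
    return f"C-X({g1})-C-X({g2})-H-X({g3})-H"


def enumerate_subpatterns(g1_range=(2, 4), g2_range=(12, 12), g3_range=(2, 6)):
    """List every fixed-gap sub-pattern inside the given inclusive ranges.

    The default ranges are an assumption taken from the C2H2-c row of the
    case study 2 table, not something stated for Fig. 7a itself.
    """
    ranges = [range(low, high + 1) for low, high in (g1_range, g2_range, g3_range)]
    return [subpattern(*gaps) for gaps in product(*ranges)]


if __name__ == "__main__":
    for text in enumerate_subpatterns():
        print(text)
```

Under these assumed ranges the enumeration yields more gap combinations than the table has rows; the table lists nine combinations.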
Fig. 8 Types of charts used in protein structure visualization: (a) Molecular structure represented in JSME; (b) Interaction diagram used by LigPlot+; (c) Ball & Stick visualization provided by NGL (WebGL); (d) Surface representation supported by iview; (e) Cartoon visualization provided by JSmol