Rafael Ordög, Zoltán Szabadka, Vince Grolmusz.
Abstract
BACKGROUND: The fast-growing Protein Data Bank (PDB) today contains three-dimensional descriptions of more than 45,000 protein and nucleic acid structures. The large majority of the data in the PDB were measured by X-ray crystallography, by thousands of researchers over millions of work-hours. Unfortunately, numerous structural errors, bad labels, missing atoms, and falsely identified chains and groups make the automated processing of this treasury of structural biological data difficult.
Year: 2008 PMID: 18315842 PMCID: PMC2259412 DOI: 10.1186/1471-2105-9-S1-S11
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The Delaunay decomposition of the PDB entry 10gs.
Figure 2. The triple logarithmic plot of the density of Delaunay regions. A point with coordinates (x, y) on the plot corresponds to all Delaunay regions whose volume is $10^{x \pm 0.01}$ and whose tetrahedrality is $10^{y \pm 0.01}$; the color of the point corresponds to log(z + 1), where z is the number of such regions. The white bar plot at the bottom of the image shows the same density for volume only.
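As a rough illustration of how a density map like the one in Figure 2 could be assembled, the Python sketch below computes a volume and a tetrahedrality value for each tetrahedron, assigns both to logarithmic bins of half-width 0.01, and stores log(z + 1) per bin. The tetrahedrality formula used here (a normalized sum of squared edge-length differences) is an assumption, since the caption does not define it. The function names and the input format are hypothetical too.

```python
import math
from collections import Counter
from itertools import combinations


def volume(a, b, c, d):
    """Volume of tetrahedron abcd: |det[b - a, c - a, d - a]| / 6."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0


def tetrahedrality(a, b, c, d):
    """ASSUMED definition: sum over edge pairs of (li - lj)^2 / (15 * mean^2).

    The value is 0 for a regular tetrahedron and grows with distortion.
    """
    edges = [math.dist(p, q) for p, q in combinations((a, b, c, d), 2)]
    mean = sum(edges) / 6.0
    spread = sum((li - lj) ** 2 for li, lj in combinations(edges, 2))
    return spread / (15.0 * mean ** 2)


def log_bin(value, half_width=0.01):
    """Index k of the log10 bin centered at x = 2 * half_width * k."""
    return round(math.log10(value) / (2 * half_width))


def density(tetrahedra):
    """Map (volume bin, tetrahedrality bin) -> log(z + 1), z = region count."""
    counts = Counter()
    for a, b, c, d in tetrahedra:
        vol = volume(a, b, c, d)
        tet = tetrahedrality(a, b, c, d)
        # log10 is undefined at zero (degenerate or perfectly regular cells),
        # so such cells are skipped in this sketch.
        if vol > 0 and tet > 0:
            counts[(log_bin(vol), log_bin(tet))] += 1
    return {cell: math.log(z + 1) for cell, z in counts.items()}
```

Each tetrahedron is passed as four (x, y, z) coordinate tuples. Restricting the counter key to the volume bin alone would give a one-dimensional histogram comparable to the bar plot mentioned in the caption.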
The counts of different types of Delaunay tetrahedra in the test set of 5,757 PDB entries. Tetrahedron C_C_N_O (containing the peptide bond of amino acids) turns out to be the most frequent, with 19,463,268 occurrences in our test set. The frequencies of the other labels decrease exponentially.
| Pattern | Count | Pattern | Count | Pattern | Count |
| C C N O | 19,463,268 | C C C O | 13,979,006 | C C C N | 9,228,670 |
| C C C C | 8,549,030 | C C O O | 8,302,189 | C N O O | 7,148,317 |
| C N N O | 4,811,063 | C C N N | 4,137,294 | C O O O | 1,774,801 |
| N N O O | 983,656 | N O O O | 696,899 | C C C S | 575,423 |
| C C O S | 453,511 | C N N N | 320,021 | C C N S | 305,453 |
| C N O S | 255,407 | O O O O | 220,453 | N N N O | 184,983 |
| C O O S | 99,173 | C C S S | 56,480 | C N N S | 42,572 |
| N O O S | 30,644 | C O S S | 23,276 | N N N N | 21,076 |
| C N S S | 19,843 | N N O S | 16,119 | O O O S | 8,380 |
| C C C SE | 7,624 | N O S S | 4,995 | C C O SE | 4,582 |
| C C N SE | 2,822 | C N O SE | 2,289 | N N N S | 1,982 |
| N N S S | 1,872 | C S S S | 1,848 | O O S S | 1,565 |
| N S S S | 793 | C O O SE | 764 | C N N SE | 433 |
| S S S S | 420 | O S S S | 335 | C C C F | 256 |
| N O O SE | 230 | C C F O | 224 | N N O SE | 149 |
| C O O P | 145 | C C F N | 123 | O O O P | 101 |
| C C SE SE | 99 | C F N O | 96 | N O O P | 91 |
| C C S SE | 72 | O O O SE | 70 | C C C I | 65 |
| C C I O | 51 | C F O O | 47 | C CL N O | 40 |
| C N O P | 38 | N N N SE | 31 | C I O O | 28 |
| C C CL N | 27 | C C CL O | 26 | C O S SE | 25 |
| AS C C S | 21 | AS C C O | 20 | C I N O | 20 |
| C N SE SE | 19 | AS C C C | 17 | C F N N | 16 |
| C C O P | 15 | AS C O S | 15 | C C C CL | 15 |
| F N O O | 14 | C O O V | 12 | C C I N | 12 |
| AS C N O | 11 | B C O O | 10 | C CL O O | 10 |
| AS C C N | 10 | C O SE SE | 9 | C C F S | 9 |
| O O O V | 8 | F N N O | 6 | C C I S | 6 |
| N N O P | 6 | AS C O O | 6 | AS N O O | 5 |
| C N S SE | 5 | B C N O | 4 | N N SE SE | 4 |
| B C C O | 4 | CL N O O | 4 | I O O O | 4 |
| I N O O | 4 | CL N N O | 3 | N O SE SE | 3 |
| F O O O | 3 | AS N N O | 3 | AS C N N | 3 |
| N O S SE | 3 | C I O S | 2 | C C F F | 2 |
| B N O O | 2 | C O P S | 2 | C F O S | 2 |
| AS C N S | 1 | O O S SE | 1 | C F F N | 1 |
| C C C P | 1 | O O P S | 1 | N N S SE | 1 |
| AS O O S | 1 | AS N O S | 1 | C CL N N | 1 |
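The labels in the table above read as the element symbols of the four vertices in alphabetical order. A minimal Python sketch of such a labeling and counting step follows, assuming each tetrahedron is already available as a sequence of four element symbols. The helper names `label` and `count_labels` are hypothetical and do not come from the original work.

```python
from collections import Counter


def label(elements):
    """Canonical label: upper-cased element symbols, sorted alphabetically."""
    return " ".join(sorted(e.strip().upper() for e in elements))


def count_labels(tetrahedra):
    """Count how often each canonical label occurs among the tetrahedra."""
    return Counter(label(t) for t in tetrahedra)


# Toy input, not real PDB data: three tetrahedra given by vertex elements.
example = [("O", "C", "N", "C"), ("C", "N", "C", "O"), ("Se", "C", "C", "C")]
for pattern, count in count_labels(example).most_common():
    print(pattern, count)
```

Sorting the symbols makes the label independent of the vertex order reported by the decomposition, so permutations of the same four elements fall into one row.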
Figure 3. Separate drawing for different tetrahedra. We give here density maps similar to that in Figure 2, but now drawn separately for tetrahedra with vertices C_C_N_O (inset A), C_C_O_S (inset B), C_N_O_S (inset C) and N_N_O_O (inset D). It is clear that different vertex compositions imply different shape/volume distributions.
Figure 4. Ligand in a Delaunay decomposition. The Delaunay decomposition of the PDB entry 1n9c. The ligand is pictured with solid lines.
The classifications of the tetrahedra around metal ligand atoms. The tetrahedra not present contain no metal atoms.
| 5 | 7 | 3 | 2 | 0 | 14 | 14 | 0 | |
| 1 | 11 | 4 | 1 | 0 | 15 | 10 | 6 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | |
| 0 | 0 | 0 | 3 | 1 | 3 | 4 | 0 | |
| 2 | 3 | 0 | 1 | 1 | 0 | 0 | 2 | |
| 38 | 2 | 0 | 0 | 4 | 0 | 32 | ||
| 5 | 6 | 0 | 5 | 0 | 0 | 0 | ||
| 2 | 48 | 0 | 0 | 0 | 0 | 0 | ||
| 4 | 5 | 0 | 0 | 0 | 0 | 0 | ||
| 0 | 0 | 2 | 1 | 0 | 5 | 0 | ||
The classifications of the tetrahedra around frequent non-metallic ligand atoms. An atom is called frequent if it appears in at least 100 entries in our data set.
| | C C N O | C C C O | C C O O | C N O O | C C C C | C N N O | C O O O | C C C N | C C N N | N N O O | N O O O |
| H | 5590 | 5385 | 5461 | 4678 | 3091 | 2651 | 2899 | 2360 | 1304 | 1328 | 1334 |
| C | 4218 | 4289 | 3757 | 3295 | 2628 | 1806 | 1777 | 2091 | 1125 | 839 | 886 |
| O | 1673 | 823 | 1097 | 1470 | 345 | 1373 | 621 | 601 | 623 | 731 | 519 |
| N | 585 | 554 | 589 | 605 | 195 | 220 | 447 | 307 | 97 | 150 | 187 |
| P | 41 | 10 | 17 | 30 | 6 | 110 | 17 | 18 | 38 | 64 | 28 |
| S | 77 | 42 | 43 | 49 | 27 | 31 | 16 | 28 | 21 | 9 | 9 |
| F | 27 | 40 | 42 | 22 | 31 | 14 | 6 | 18 | 5 | 5 | 2 |
| | C N N N | N N N O | O O O O | C C O S | C C C S | N N N N | C N O S | C O O S | C C N S | N N O S | N O O S |
| H | 663 | 583 | 665 | 325 | 298 | 139 | 204 | 187 | 149 | 88 | 66 |
| C | 422 | 317 | 276 | 226 | 267 | 70 | 132 | 107 | 133 | 50 | 32 |
| O | 524 | 521 | 133 | 47 | 41 | 214 | 92 | 37 | 45 | 31 | 20 |
| N | 36 | 40 | 170 | 56 | 29 | 6 | 39 | 33 | 19 | 7 | 7 |
| P | 70 | 75 | 4 | 1 | 0 | 69 | 0 | 0 | 0 | 0 | 1 |
| S | 6 | 5 | 3 | 9 | 5 | 1 | 4 | 4 | 1 | 1 | 0 |
| F | 0 | 2 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 |
| | C N N S | C C S S | N O S S | O O O S | C N S S | N N N S | O O S S | N N S S | C O S S | C N O SE | C N N SE |
| H | 32 | 16 | 2 | 18 | 19 | 5 | 7 | 9 | 7 | 1 | 1 |
| C | 30 | 59 | 3 | 11 | 11 | 9 | 7 | 2 | 2 | 0 | 0 |
| O | 29 | 8 | 8 | 3 | 1 | 7 | 0 | 4 | 2 | 0 | 0 |
| N | 4 | 0 | 2 | 2 | 0 | 0 | 7 | 0 | 3 | 0 | 0 |
| P | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| S | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 |
| F | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
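A cross-tabulation of this kind could be produced along the following lines. This Python sketch assumes that each ligand atom has already been paired with the label of its surrounding tetrahedron, a step the caption does not describe. The record formats and function names are hypothetical, and "frequent" is read here as occurring in at least 100 distinct entries.

```python
from collections import Counter, defaultdict


def frequent_elements(ligand_atoms, min_entries=100):
    """Elements found in at least `min_entries` distinct entries.

    ligand_atoms: iterable of (entry_id, element) pairs (assumed format).
    """
    entries = defaultdict(set)
    for entry_id, element in ligand_atoms:
        entries[element].add(entry_id)
    return {el for el, ids in entries.items() if len(ids) >= min_entries}


def cross_tabulate(records, keep):
    """Count tetrahedron labels per ligand element, for elements in `keep`.

    records: iterable of (ligand_element, tetrahedron_label) pairs
    (assumed format; the pairing itself is not computed here).
    """
    table = defaultdict(Counter)
    for element, tetra_label in records:
        if element in keep:
            table[element][tetra_label] += 1
    return table
```

Each key of the returned mapping corresponds to one row of the table, and each counter entry to one cell.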