Sanchayita Sen1, Jasmine Young2, John M Berrisford2, Minyu Chen2, Matthew J Conroy2, Shuchismita Dutta2, Luigi Di Costanzo2, Guanghua Gao2, Sutapa Ghosh2, Brian P Hudson2, Reiko Igarashi2, Yumiko Kengaku2, Yuhe Liang2, Ezra Peisach2, Irina Persikova2, Abhik Mukhopadhyay2, Buvaneswari Coimbatore Narayanan2, Gaurav Sahni2, Junko Sato2, Monica Sekharan2, Chenghua Shao2, Lihua Tan2, Marina A Zhuravleva2.
Abstract
The Protein Data Bank (PDB) is the single global repository for three-dimensional structures of biological macromolecules and their complexes, and its more than 100,000 structures contain more than 20,000 distinct ligands or small molecules bound to proteins and nucleic acids. Information about these small molecules and their interactions with proteins and nucleic acids is crucial for our understanding of biochemical processes and vital for structure-based drug design. Small molecules present in a deposited structure may be attached to a polymer or may occur as a separate, non-covalently linked ligand. During curation of a newly deposited structure by wwPDB annotation staff, each molecule is cross-referenced to the PDB Chemical Component Dictionary (CCD). If the molecule is new to the PDB, a dictionary description is created for it. The information about all small molecule components found in the PDB is distributed via the FTP archive as an external reference file. Small molecule annotation in the PDB also includes information about ligand-binding sites and about covalent and other linkages between ligands and macromolecules. During the remediation of the peptide-like antibiotics and inhibitors present in the PDB archive in 2011, it became clear that additional annotation was required for consistent representation of these molecules, which are quite often composed of several sequential subcomponents including modified amino acids and other chemical groups. The connectivity information of the modified amino acids is necessary for correct representation of these biologically interesting molecules. The combined information is made available via a new resource called the Biologically Interesting molecules Reference Dictionary, which is complementary to the CCD and is now routinely used for annotation of peptide-like antibiotics and inhibitors.
Year: 2014 PMID: 25425036 PMCID: PMC4243272 DOI: 10.1093/database/bau116
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Number of new PDB chemical entity definitions created annually between 2000 and 2013.
Figure 2. Abbreviated category relationship diagram for the key CIF categories that are used in the CCD. Three major categories, _chem_comp, _chem_comp_bond and _chem_comp_atom, are joined together to generate the machine-readable dictionary description of the chemical entity. The unique three-character code assigned to every new chemical entity acts as the primary key of the _chem_comp category (_chem_comp.id, coloured in purple) and is used to connect the other categories (_chem_comp_bond.comp_id and _chem_comp_atom.comp_id).
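The join described in the Figure 2 caption can be illustrated with a minimal Python sketch. The rows below are a toy subset written by hand for illustration, not data read from the CCD, and the function name `build_descriptions` is hypothetical; only the linking items (`_chem_comp.id`, `_chem_comp_atom.comp_id`, `_chem_comp_bond.comp_id`) come from the caption.

```python
from collections import defaultdict

# Toy rows standing in for the three CIF categories (illustrative only).
chem_comp = [{"id": "GLC", "name": "alpha-D-glucose"}]
chem_comp_atom = [
    {"comp_id": "GLC", "atom_id": "C1"},
    {"comp_id": "GLC", "atom_id": "O1"},
]
chem_comp_bond = [
    {"comp_id": "GLC", "atom_id_1": "C1", "atom_id_2": "O1"},
]


def build_descriptions(comps, atoms, bonds):
    """Group atom and bond rows under the component code they reference."""
    atoms_by_comp = defaultdict(list)
    for row in atoms:
        atoms_by_comp[row["comp_id"]].append(row["atom_id"])

    bonds_by_comp = defaultdict(list)
    for row in bonds:
        bonds_by_comp[row["comp_id"]].append((row["atom_id_1"], row["atom_id_2"]))

    # The component code is the primary key that ties the categories together.
    return {
        comp["id"]: {
            "name": comp["name"],
            "atoms": atoms_by_comp[comp["id"]],
            "bonds": bonds_by_comp[comp["id"]],
        }
        for comp in comps
    }


descriptions = build_descriptions(chem_comp, chem_comp_atom, chem_comp_bond)
```

Keying everything on the component code mirrors the relationship in the diagram: atoms and bonds carry no identity of their own beyond the component they belong to.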
Figure 3. α-D-Glucose can form α(1–4) glycosidic linkages with other carbohydrate molecules. During oligomerization, the O1 oxygen of the glucose (highlighted in the figure) is eliminated and replaced by the O4 oxygen of the other carbohydrate. To account for this condensation reaction, the O1 oxygen of α-D-glucose (GLC) is annotated in the CCD as a leaving atom. The two-dimensional diagram in this figure is a copy of the image from the RCSB PDB website (http://rcsb.org/pdb/ligand/ligandsummary.do?hetId=GLC). It was generated using the ChemAxon software (http://www.chemaxon.com).
Figure 4. Seven α-D-glucose (GLC) molecules undergo a condensation reaction to form the cyclic oligosaccharide β-cyclodextrin [from PDB entry 2v8l (30)].
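The atom bookkeeping behind this condensation can be checked with a short Python sketch. It assumes the standard formulas for glucose (C6H12O6) and water (H2O) and that a ring of n units contains n glycosidic linkages, each releasing one water; the function name `cyclic_oligomer_formula` is hypothetical.

```python
from collections import Counter

GLUCOSE = Counter({"C": 6, "H": 12, "O": 6})  # C6H12O6
WATER = Counter({"H": 2, "O": 1})             # lost once per glycosidic linkage


def cyclic_oligomer_formula(monomer, n_units, leaving=WATER):
    """Elemental composition of a ring of n_units monomers.

    A cyclic oligomer has as many linkages as units, so the leaving
    group is subtracted once per unit.
    """
    total = Counter()
    for _ in range(n_units):
        total.update(monomer)    # add one monomer
        total.subtract(leaving)  # remove the atoms lost on condensation
    return dict(total)


beta_cyclodextrin = cyclic_oligomer_formula(GLUCOSE, 7)
```

For seven units this gives C42H70O35, which is why marking O1 as a leaving atom in the GLC definition is enough to reconcile the monomer description with the atoms actually present in the ring.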
Figure 5. Binding site for the Plk-2 inhibitor (7R)-8-cyclopentyl-7-ethyl-5-methyl-7,8-dihydropteridin-6(5H)-one (three-letter code 11G) in PDB entry 4i6b (31). The figure depicts the neighbouring residues that are within 3.7 Å of the ligand 11G.
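A distance cutoff of this kind can be sketched in a few lines of Python. This is a simplified illustration, not the annotation software used by the wwPDB: the coordinates and residue labels are invented placeholders, the function name `residues_within` is hypothetical, and only the 3.7 Å threshold is taken from the caption.

```python
import math


def residues_within(ligand_atoms, polymer_atoms, cutoff=3.7):
    """Return sorted labels of residues with any atom within `cutoff` Å of the ligand.

    ligand_atoms:  list of (x, y, z) tuples
    polymer_atoms: list of (residue_label, (x, y, z)) tuples
    """
    near = set()
    for residue, xyz in polymer_atoms:
        # One close contact is enough to count the whole residue.
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand_atoms):
            near.add(residue)
    return sorted(near)


# Placeholder coordinates in Å, chosen only to exercise the cutoff.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
polymer = [
    ("RES1", (3.0, 0.0, 0.0)),
    ("RES2", (0.0, 3.6, 0.0)),
    ("RES3", (9.0, 9.0, 9.0)),
    ("RES4", (0.0, 3.8, 0.0)),
]
site = residues_within(ligand, polymer)
```

The all-pairs loop is adequate for a single ligand; a production tool would more likely use a spatial index for whole-entry searches.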
Figure 6. The environment of the oligosaccharide poly-N-acetylglucosamine (PNAG) is annotated as a whole rather than listing the environment of each individual sugar. This avoids repeating the same sugar molecule in multiple binding sites.
Figure 7. Diagram showing the relationship between the _struct_site and _struct_site_gen categories used for annotation of ligand-binding sites. The _struct_site category holds information about the ligands that are present in the PDB entry, and every ligand in this category is assigned an alphanumeric binding site identifier. The _struct_site_gen category contains information about the residues that lie in the vicinity of the ligands described in the _struct_site category. The two categories are joined by the binding site identifier.
Figure 8. Tetrahedrally coordinated Zn ion in entry 2VW4 (32) along with the annotation of the bond angles. The REMARK 620 annotation lists the software-calculated bond angles between Zn A 503 and its surrounding residues. The surrounding residues in anticlockwise direction are Glu A 195, His B 165 and Asp B 167. The sidechain carboxylate group of the Glu residue exists in two alternate conformations (A and B conformers). The angle GluA195B-Zn-HisB165 is 117.8°, GluA195B-Zn-Asp(OD1)B167 is 86.6°, HisB165-Zn-Asp(OD1)B167 is 91.6°, GluA195B-Zn-Asp(OD2)B167 is 105.2°, HisB165-Zn-Asp(OD2)B167 is 124.9°, Asp(OD1)B167-Zn-Asp(OD2)B167 is 57.1°, Glu(OE1)A195B-Zn-Glu(OE2)A195A is 24.1°, HisB165-Zn-Glu(OE2)A195A is 122.8°, Asp(OD1)B167-Zn-Glu(OE2)A195A is 109.3° and Asp(OD2)B167-Zn-Glu(OE2)A195A is 110.7°.
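Angles of this kind follow from the coordinates of the metal and two coordinating atoms. The Python sketch below shows the standard dot-product calculation; it is not the software referred to in the caption, the function name `bond_angle` is hypothetical, and the coordinates are idealized test geometry rather than values from entry 2VW4.

```python
import math


def bond_angle(a, center, b):
    """Angle a-center-b in degrees, from three (x, y, z) positions."""
    v1 = [p - c for p, c in zip(a, center)]
    v2 = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)  # multi-argument hypot needs Python 3.8+
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Idealized geometry: metal at the origin, ligand atoms on two tetrahedral vertices.
metal = (0.0, 0.0, 0.0)
ideal_tetrahedral = bond_angle((1.0, 1.0, 1.0), metal, (1.0, -1.0, -1.0))
```

An ideal tetrahedron gives about 109.5° for every pair of ligand atoms, which provides a reference point for judging how far the angles listed in the caption depart from regular geometry.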
Figure 9. Annotation of the glycopeptide antibiotic teicoplanin involves ‘chopping up’ the molecule into its component chemical entities, which are validated against the CCD. The bonds highlighted in yellow demarcate the individual entities.
Figure 10. σA-weighted 2Fo-Fc map of a carbohydrate binding protein shown at a contour level of 0.35 e/Å³. Very little electron density is observed for the oligosaccharide molecule. This is reflected in the high LLDF values (shown in parentheses) for each of the component carbohydrate moieties.