Wanted: unique names for unique atom positions. PDB-wide analysis of diastereotopic atom names of small molecules containing diphosphate.

Christopher A Bottoms, Dong Xu.

Abstract

BACKGROUND: Biological chemistry is very stereospecific. Nonetheless, the diastereotopic oxygen atoms of diphosphate-containing molecules in the Protein Data Bank (PDB) are often given names that do not uniquely distinguish them from each other due to the lack of standardization. This issue has largely not been addressed by the protein structure community.
RESULTS: Of 472 diastereotopic atom pairs studied from the PDB, 118 were found to have names that are not uniquely assigned. Among the molecules identified with these inconsistencies were many cofactors of enzymatic processes such as mononucleotides (e.g. ADP, ATP, GTP), dinucleotide cofactors (e.g. FAD, NAD), and coenzyme A. There were no overall trends in naming conventions, though ligand-specific trends were prominent.
CONCLUSION: The lack of standardized naming conventions for diastereotopic atoms of small molecules has left the ad hoc names assigned to many of these atoms non-unique, which may create problems in data-mining of the PDB. We suggest a naming convention to resolve this issue. The in-house software used in this study is available upon request. A version of the software used for the analyses described in this paper is available at our web site: http://digbio.missouri.edu/ddan/DDAN.htm.

Year:  2008        PMID: 18793461      PMCID: PMC2537567          DOI: 10.1186/1471-2105-9-S9-S16

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

Often accompanying the macromolecules deposited in the Protein Data Bank (PDB) [1] are smaller molecules of biological importance. Some of these are energy-carrying cofactors, such as ATP, coenzyme A, and nicotinamide-adenine dinucleotide (NAD). Some analogs of these molecules are either drugs or can be used in drug design [2,3]. Like other biologically relevant molecules, many of these small molecules contain chiral or prochiral centers. An atom is a chiral center if four different chemical groups are attached to it. A chiral configuration can be designated R or S, depending on the arrangement of the attached groups (Figure 1). If, however, two of these groups are identical, then the center atom is prochiral, meaning that it would become chiral if either of the identical groups were replaced by a unique group. These two groups are called diastereotopic, i.e., if either were replaced with a unique group, the molecule would become one or the other diastereomer. Within a pair of diastereotopic atoms, one is designated pro-R and the other pro-S, indicating the configuration that the center would adopt if that atom were replaced with a group of higher priority than the other groups. Many ligands contain diphosphate groups that include at least one prochiral phosphorus atom (Figure 2).
Figure 1

S and R configurations for chiral centers. (a) S configuration and (b) R configuration, for atoms A, B, C, and D when they have the highest, second, third, and lowest priorities, respectively. Notice that when the three highest priority groups (A, B, and C) are facing the viewer, they have a counter-clockwise arrangement in the S configuration and a clockwise arrangement in the R configuration.

Figure 2

NAD molecules from X-ray crystal structure 2OHX. For comparison purposes, one molecule is superimposed on the other and then offset slightly. The atom names are similarly offset for the diastereotopic oxygen atoms of the adenine-side phosphate group. Note the differences in names for the pro-R and pro-S atoms. Both molecules of NAD shown are from an alcohol dehydrogenase structure [PDB:2OHX][16]. Following the CIP-algorithm, since all four oxygen atoms have the same atomic number, their priority is determined by subsequent bonded groups. The O3 oxygen atom is bonded to the next phosphorus atom and the O5' oxygen atom is bonded to the preceding C5' carbon atom, while the remaining two oxygen atoms are unbonded except to the original phosphorus atom. Therefore, the O3 oxygen atom has the highest priority, the O5' oxygen atom has the second highest priority, and the remaining two oxygen atoms tie for the lowest priority. The pro-S atom is the one that, if it were replaced with an atom of highest priority, would make the phosphorus atom chiral with an S configuration. Both molecules are drawn using red for oxygen, blue for nitrogen, and orange for phosphorus. One is drawn using light blue for carbon and the other is drawn using white for carbon.

The pro-S and pro-R oxygen atoms of nucleic acid strands are named "OP1" and "OP2", respectively [4]. Many enzymes treat the pro-R and pro-S oxygen atoms of DNA and RNA differently [5]. These diastereotopic oxygen atoms are also treated differently in RNA-intron splicing [6,7].
Small diphosphate-containing molecules also participate in enzymatic reactions in which the distinction between diastereotopic atoms or groups is important [5,8,9]. Unfortunately, many of these diastereotopic atoms do not have standardized names, an issue that, to our knowledge, has not been investigated. Consistent naming of diastereotopic atoms is necessary when performing all-atom superpositioning or all-atom root mean square deviation (RMSD) calculations [10]. It is also necessary for data mining in the PDB, e.g., structure-based virtual screening for drug candidates [11,12]. In this paper, we conduct a systematic PDB-wide analysis of the diastereotopic atom names of small molecules containing diphosphate.

Results

Inconsistencies in PDB files

There were 4167 PDB files containing a total of 295 distinct ligands having prochiral centers that met our strict criteria. Over half of these ligands (175) had two prochiral phosphate centers that were adjacent to carbon, and one had three (OXT from [PDB:2JI7] [13]), for a total of 472 distinct prochiral centers adjacent to carbon. For example, NAD contains two because it has a diphosphate sandwiched between two ribose moieties. Each distinct prochiral center contains a pair of diastereotopic atoms. We analyzed the names of the atoms at each prochiral center. Of these distinct centers, 354 had a single naming convention, although 241 of these occurred in only one PDB file. The other 118 distinct prochiral centers had more than one naming convention.

We defined a case of swapped names to occur when all of the following were true between two molecules with the same type of prochiral center: (1) the highest and second highest priority names were consistent, (2) the pro-R atom of one prochiral center had the same name as the pro-S atom of a second center, and (3) the pro-S atom of the first center had the same name as the pro-R atom of the second center (Figure 2). Of the 118 centers, 117 had swapped naming conventions as defined above. The remaining center, which had two naming conventions, actually had a naming error. Nine of the 117 centers with swapped names had additional naming conventions. In every case, we found that the extra naming conventions were caused by errors rather than mere inconsistencies. For example, in a structure of a surfactin synthetase-activating enzyme [PDB:1QR0] [14], the diastereotopic atoms attached to phosphorus atom P1A are labeled "O5A" and "O4A" instead of the names "O2A" and "O1A" defined in the Chemical Component Dictionary from the PDB. In a similar manner, the diastereotopic atoms attached to P2A are named "O2A" and "O1A", instead of the names "O5A" and "O4A" defined in the dictionary file.

In another example, in a structure of E. coli carbamoyl phosphate synthetase [PDB:1CE8] [15], the O5' oxygen atom is mislabeled as O4' for 8 different ADP molecules. Interestingly, in four of these molecules, the pro-S and pro-R atoms are labeled "O1A" and "O2A", respectively, while in the other four molecules they are labeled "O2A" and "O1A", respectively. In Table 1, we present statistics for sample cases in which there were at least two nonredundant examples of each naming convention. For additional selected examples, see Supplemental Table 1 in Additional file 1. For our full results, including cases that had no inconsistencies, see Supplemental Table 2 in Additional file 2 (explanation in Additional file 3). All results, including those resulting from errors, are included in Supplemental Table 2. However, we emphasize that the bulk of the results are due to inconsistencies, not errors.
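The three-part definition of swapped names can be expressed as a small predicate. The following Python sketch is illustrative only and is not the authors' in-house software; the `CenterNames` record and the `is_swapped` function are hypothetical names, and the example atom names follow the NAD adenine-side phosphate described in the Figure 2 caption.

```python
from typing import NamedTuple


class CenterNames(NamedTuple):
    """Atom names at one prochiral phosphorus center, ordered by role (hypothetical record)."""
    highest: str  # oxygen with the highest CIP priority (bridging oxygen)
    second: str   # oxygen with the second highest priority (bonded to carbon)
    pro_s: str    # name given to the pro-S oxygen
    pro_r: str    # name given to the pro-R oxygen


def is_swapped(first: CenterNames, other: CenterNames) -> bool:
    """Return True when the two centers meet all three criteria for swapped names."""
    return (
        first.highest == other.highest      # (1) higher-priority names agree
        and first.second == other.second
        and first.pro_r == other.pro_s      # (2) pro-R of one is pro-S of the other
        and first.pro_s == other.pro_r      # (3) pro-S of one is pro-R of the other
    )


# Illustrative names for the PA center of two NAD molecules.
mol_1 = CenterNames("O3", "O5'", "O1A", "O2A")
mol_2 = CenterNames("O3", "O5'", "O2A", "O1A")
print(is_swapped(mol_1, mol_2))
```

A pair that differs in the highest or second highest priority name fails criterion (1) and would be treated as a naming error rather than a swap.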
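Table 1

Naming convention statistics for selected ligands

| Ligand code | Ligand name | Center atom | pro-S | pro-R | # | Bias (%) | Example PDB |
|---|---|---|---|---|---|---|---|
| ACO | acetyl-coenzyme A | P1A | O1A | O2A | 22 | 42 | 1DM3 |
| | | | O2A | O1A | 30 | 58 | 1B87 |
| | | P2A | O4A | O5A | 25 | 48 | 1B87 |
| | | | O5A | O4A | 27 | 52 | 1DM3 |
| ADP | adenosine-5'-diphosphate | PA | O1A | O2A | 211 | 33 | 1A6E |
| | | | O2A | O1A | 419 | 67 | 13PK |
| ATP | adenosine-5'-triphosphate | PA | O1A | O2A | 103 | 30 | 1B0U |
| | | | O2A | O1A | 240 | 70 | 1A0I |
| COA | coenzyme A | P1A | O1A | O2A | 67 | 45 | 1ACA |
| | | | O2A | O1A | 81 | 55 | 1CM0 |
| | | P2A | O4A | O5A | 67 | 46 | 1ESM |
| | | | O5A | O4A | 78 | 54 | 1ACA |
| CTP | cytidine-5'-triphosphate | PA | O1A | O2A | 20 | 49 | 1GQ9 |
| | | | O2A | O1A | 21 | 51 | 1COZ |
| FAD | flavin-adenine dinucleotide | P | O1P | O2P | 554 | 87 | 1A8P |
| | | | O2P | O1P | 81 | 13 | 1B2R |
| | | PA | O1A | O2A | 290 | 46 | 1AHV |
| | | | O2A | O1A | 345 | 54 | 1A8P |
| GTP | guanosine-5'-triphosphate | PA | O1A | O2A | 35 | 36 | 1CKM |
| | | | O2A | O1A | 62 | 64 | 1A8R |
| NAD | nicotinamide-adenine-dinucleotide | PA | O1A | O2A | 144 | 27 | 1A5Z |
| | | | O2A | O1A | 388 | 73 | 1A4Z |
| | | PN | O1N | O2N | 394 | 74 | 1A4Z |
| | | | O2N | O1N | 135 | 26 | 1A7A |
| NAP | nadp nicotinamide-adenine-dinucleotide phosphate | PA | O1A | O2A | 87 | 26 | 1CIV |
| | | | O2A | O1A | 247 | 74 | 1A27 |
| | | PN | O1N | O2N | 280 | 83 | 1A27 |
| | | | O2N | O1N | 58 | 17 | 1A80 |
| TPP | thiamine diphosphate (i.e. vitamin B1) | PA | O1A | O2A | 25 | 56 | 1AY0 |
| | | | O2A | O1A | 20 | 44 | 1B0P |
| UDP | uridine-5'-diphosphate | PA | O1A | O2A | 80 | 79 | 1BGU |
| | | | O2A | O1A | 21 | 21 | 1C3J |

# = number of PDB files in which the given naming convention was observed.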
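The bias column is not defined explicitly in the text. The tabulated values are consistent with each convention's share of the PDB files observed for that center, rounded to a whole percent. The short Python sketch below works under that assumption, using the ADP and FAD counts from Table 1; `bias_percentages` is a hypothetical helper name.

```python
from typing import Dict, Tuple

Convention = Tuple[str, str]  # (pro-S name, pro-R name)


def bias_percentages(counts: Dict[Convention, int]) -> Dict[Convention, int]:
    """Share of files using each convention, as a rounded percentage (assumed definition)."""
    total = sum(counts.values())
    return {conv: round(100 * n / total) for conv, n in counts.items()}


# File counts for the PA center of ADP, taken from Table 1.
adp_pa = {("O1A", "O2A"): 211, ("O2A", "O1A"): 419}
print(bias_percentages(adp_pa))

# File counts for the P center of FAD, taken from Table 1.
fad_p = {("O1P", "O2P"): 554, ("O2P", "O1P"): 81}
print(bias_percentages(fad_p))
```

Under this reading, a center whose two conventions are used about equally often has a bias near 50%, while the P center of FAD is strongly skewed toward one convention.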

Examples of naming inconsistencies

Most of the atom naming inconsistencies mentioned in this paper relate to differences found between different files. However, there are a few cases in which naming inconsistencies can be found within a single file. One example is an X-ray crystal structure of alcohol dehydrogenase [PDB:2OHX] [16]. This structure contains two NAD molecules (see Figure 2). The prochiral center around phosphorus atom PN has consistent naming between the two molecules; the prochiral center around phosphorus atom PA does not. In one molecule the pro-S and pro-R atoms are named "O1A" and "O2A", respectively, and in the other the names are "O2A" and "O1A", respectively. Another example is an NMR structure of bovine acyl-coenzyme A binding protein (Figure 3) [PDB:1NVL]. This structure contains 20 NMR models, in which one phosphorus prochiral center is consistently named and the other is not. For the P1A center, models 1, 2, 5 and 18 have pro-S and pro-R atoms named "O1A" and "O2A", while the remaining 15 models have them named "O2A" and "O1A", respectively. Meanwhile, the pro-S and pro-R atoms at the P2A center are consistently named "O5A" and "O4A", respectively.
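A within-file check of this kind amounts to tallying the (pro-S, pro-R) name pairs observed for one center across all molecules or models in a file. The Python sketch below is a minimal illustration under the assumption that the pro-S and pro-R names have already been determined for each model. The function names and the model data are hypothetical and are not taken from 1NVL.

```python
from collections import Counter
from typing import Dict, Tuple

Convention = Tuple[str, str]  # (pro-S name, pro-R name)


def tally_conventions(by_model: Dict[int, Convention]) -> Counter:
    """Count how many models use each (pro-S, pro-R) naming convention."""
    return Counter(by_model.values())


def is_consistent(by_model: Dict[int, Convention]) -> bool:
    """A center is consistently named when every model uses the same convention."""
    return len(set(by_model.values())) <= 1


# Hypothetical assignments for one prochiral center in a five-model file.
center = {
    1: ("O1A", "O2A"),
    2: ("O1A", "O2A"),
    3: ("O2A", "O1A"),
    4: ("O2A", "O1A"),
    5: ("O2A", "O1A"),
}
print(tally_conventions(center))
print(is_consistent(center))
```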
Figure 3

Diphosphate of Coenzyme A from NMR structure 1NVL. The diphosphate region of coenzyme A of two models from 1NVL is shown. The diphosphate region of model 4 (light coloring) is superimposed on the same diphosphate region of model 2 (standard coloring). The diastereotopic atoms of prochiral center P2A have consistent names (O4A and O5A), but the pro-S and pro-R atoms of prochiral center P1A do not (O1A and O2A, respectively, for model 2, and O2A and O1A, respectively, for model 4).


Discussion

The inconsistent naming of atoms discussed in our paper is due largely to a lack of standardized names, not to errors on the part of crystallographers or NMR researchers. There can be no errors where there are no rules. A study of NAD(P) molecules by Carugo and Argos ignored the diastereotopic oxygen atoms for purposes of superimposing molecules because of naming inconsistencies [17]. Despite their use of atom-specific names for other atoms in the molecules, they referred to diastereotopic oxygen atoms only generally, as "terminal oxygen atoms". That study was published eleven years ago and involved only 32 protein structures, long before the recent remediation project of the PDB [18]. This project has done well to bring molecular and atomic naming conventions for PDB files into conformity with standards established by the International Union of Pure and Applied Chemistry (IUPAC) and the International Union of Biochemistry and Molecular Biology (IUBMB). However, IUPAC and IUBMB do not have standards for most diastereotopic atoms of small molecules.

There were no obvious overall trends in naming conventions with respect to the pro-R and pro-S atoms. This is likely due to the lack of naming standardization. However, trends are commonly seen among specific ligands (Table 1). One interesting observation is that the P prochiral center of FAD is highly biased in its naming convention (87% for one convention), whereas the second center, PA, has little bias (54% for one convention). Another observation is that NAD-like ligands tend to have naming conventions such that similar names (e.g. O1A and O1N) are seen on the same "side" of the molecule.

We suggest a general rule that names for pro-S atoms come alphanumerically before names for pro-R atoms. This is similar to the standard of using "OP1" for pro-S and "OP2" for pro-R in nucleic acids. The data indicate that there is no strong bias for this convention or for its opposite among diphosphate-containing ligands.
Regardless of what rules are eventually adopted, it is important to know to which atom a particular name refers. Establishing standard names and topologies that take prochirality into consideration will result in less confusion and more accuracy in studies involving small molecules. Standard naming can be enforced upfront, prior to the official release of data, or it can be enforced by individuals mining the data. Until standards are adopted, individuals mining the data need to do their own standardization of the names.
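For readers who need to standardize names themselves, the suggested rule (the pro-S name sorts before the pro-R name) can be applied as a simple renaming step once the pro-S and pro-R atoms have been identified geometrically. The Python sketch below is a minimal illustration; `standardize_names` is a hypothetical helper, and it assumes that plain string ordering is an acceptable stand-in for "alphanumerically before".

```python
from typing import Dict


def standardize_names(pro_s_name: str, pro_r_name: str) -> Dict[str, str]:
    """Map each current atom name to its standardized name.

    After renaming, the pro-S atom carries the name that sorts first and the
    pro-R atom carries the name that sorts second. Plain string comparison is
    an assumption: it orders "O10" before "O2", so names with multi-digit
    numbers would need a natural-sort key instead.
    """
    first, second = sorted((pro_s_name, pro_r_name))
    return {pro_s_name: first, pro_r_name: second}


# A center where the pro-S atom is currently named "O2A": the names are swapped.
print(standardize_names("O2A", "O1A"))

# A center that already follows the rule: the mapping leaves names unchanged.
print(standardize_names("O1A", "O2A"))
```

The mapping is keyed by the existing names, so it can be applied to the atom records of a single molecule without touching the other atoms.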

Conclusion

Current naming conventions do not completely map unique names to unique diastereotopic atoms, resulting in possible confusion or error, or at least the need for researchers to impose their own naming standardization. We herein describe many cases of naming inconsistencies for small molecules containing diphosphate moieties. A future study will assess naming conventions of all atoms in the PDB, addressing more general issues of chirality and prochirality. The in-house software used in this study is available upon request.

Methods

Selection of small molecules for analysis

PDB files were selected from the January 7, 2008 "snapshot" of the Protein Data Bank. The search feature of the Protein Data Bank website was used to select PDB codes for files containing ligands that had substructures matching the SMILES pattern "C~O~P(~O)(~O)~O~P(~O)(~O)~O". Here, "C" represents a carbon atom, "~" represents any bond, "O" represents oxygen, "P" represents phosphorus, and the parentheses indicate that the oxygen atoms inside them are bonded to the preceding phosphorus atom in the list, not to subsequent atoms in the list. This matches any ligand containing a (PO4)2 moiety, such as NAD, ATP, and Coenzyme A, resulting in a list of 4435 PDB codes.

Since the PDB files corresponding to these codes also included other ligands not meeting our criteria, we analyzed each of the small molecules within each PDB file and selected each one that met the following criteria: (1) It did not have the same residue name as an amino acid or nucleic acid, including names mapped to standard residue names via the "MODRES" record. (2) It had an entry in the Chemical Component Dictionary from the PDB. (3) It had complete coordinates for the non-hydrogen atoms specified in the Chemical Component Dictionary. (4) It had a diphosphate group attached to carbon, with the diphosphate group consisting of two phosphorus atoms, each covalently bonded to four oxygen atoms. We chose to analyze the prochiral phosphate centers adjacent to carbon atoms because of their abundance and because it allowed a simple and direct application of the CIP algorithm. Atoms were considered to be covalently bonded if the distance between their centers was less than the sum of their covalent radii plus a cushion of 0.4 Å, following the custom of the Cambridge Structural Database (CSD) [19]. Covalent radii were obtained from the CSD website. Also excluded were molecules that had alternate conformations that shared the same residue number. This guaranteed that any modeled alternate conformations would contain complete molecules.

Those files containing diphosphates were further checked for phosphorus atoms having a prochiral configuration (see Determination of prochiral centers below). For those that did, the names of all four atoms attached to the prochiral center were recorded along with their relative stereochemical positions. Of the 4435 files originally selected, 4184 were found to have at least one ligand with a prochiral phosphate atom.
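The distance-based bonding criterion described above (sum of covalent radii plus a 0.4 Å cushion) can be sketched in a few lines of Python. This is an illustration rather than the authors' software: the function names are hypothetical, and the radii in the table are assumed values for C, O, and P that should be checked against the CSD list before use.

```python
import math
from typing import Sequence

# Covalent radii in angstroms. ASSUMED values for illustration; verify against
# the CSD table that the paper cites before relying on them.
COVALENT_RADII = {"C": 0.68, "O": 0.68, "P": 1.05}
TOLERANCE = 0.4  # cushion in angstroms, as stated in the text


def is_bonded(elem1: str, xyz1: Sequence[float],
              elem2: str, xyz2: Sequence[float]) -> bool:
    """Two atoms count as bonded when their separation is below r1 + r2 + 0.4."""
    cutoff = COVALENT_RADII[elem1] + COVALENT_RADII[elem2] + TOLERANCE
    return math.dist(xyz1, xyz2) < cutoff


def count_bonded_oxygens(p_xyz: Sequence[float],
                         oxygens: Sequence[Sequence[float]]) -> int:
    """Number of oxygen atoms within bonding distance of a phosphorus atom."""
    return sum(is_bonded("P", p_xyz, "O", o) for o in oxygens)


# Hypothetical coordinates: four oxygens about 1.5 angstroms from P, one far away.
p = (0.0, 0.0, 0.0)
oxygens = [(1.5, 0.0, 0.0), (-0.5, 1.4, 0.0), (-0.5, -0.7, 1.2),
           (-0.5, -0.7, -1.2), (3.0, 0.0, 0.0)]
print(count_bonded_oxygens(p, oxygens))
```

A phosphorus atom would satisfy criterion (4) only if this count is exactly four for each of the two phosphorus atoms in the diphosphate group.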

Determination of prochiral centers

The CIP algorithm [20,21] for assigning priorities to atoms within a molecule was implemented using in-house software. CIP priorities were calculated for all four atoms connected to a phosphorus atom. Following the CIP algorithm, the oxygen atom attached to two phosphorus atoms always had the highest priority and the oxygen atom attached to carbon always had the second highest priority. The two remaining oxygen atoms were not bonded to any other atom besides the phosphorus atom. If each atom has a distinct priority, then the phosphorus is chiral and the determinant algorithm of Cieplak and Wisniewski [22] can be used to calculate whether the configuration is R or S, as shown in Equation (1):

\[
m =
\begin{vmatrix}
X_A & Y_A & Z_A & 1 \\
X_B & Y_B & Z_B & 1 \\
X_C & Y_C & Z_C & 1 \\
X_D & Y_D & Z_D & 1
\end{vmatrix}
\tag{1}
\]

$X_N$, $Y_N$, and $Z_N$ are the x, y, and z components of the coordinates for group N. The subscripted letters A, B, C, and D represent the highest, second highest, third highest, and lowest priority atoms, respectively (see Figure 1). m is the result of calculating the determinant. It is negative for the R configuration and positive for the S configuration. If it evaluates to zero, then the atoms are all in the same plane [22], which should never be the case for tetrahedrally arranged molecules such as phosphates. For the mathematics behind this equation and how it captures the handedness of four three-dimensional coordinates, we refer the reader to the work of Cieplak and Wisniewski [22].

If two of the atoms attached to the phosphorus atom have identical priorities, then they are diastereotopic and the phosphorus is prochiral. In the case of diphosphate-containing molecules, the diastereotopic atoms are only bonded to phosphorus and therefore have the lowest priority (see Figure 2). We will call the atoms attached to the phosphorus atom A, B, C, and C', where A and B have the highest and second highest priority, respectively, while C and C' tie for the lowest priority. In this case, Equation (1) can be adapted to determine whether C is the pro-S or pro-R atom and, concomitantly, whether C' is the pro-R or pro-S atom. By definition, a diastereotopic atom being pro-S (or pro-R) means that, if it were replaced by a group with higher priority than any other substituent, then the prochiral center would become chiral with an S (or R) configuration. Therefore, we treat C as if it had the highest priority and then calculate the resulting configuration. If the calculated configuration is S, then C is pro-S; if it is R, then C is pro-R. To do this computationally, we artificially raise the priority of C to be the highest (i.e. higher than A), changing Equation (1) to the following:

\[
m =
\begin{vmatrix}
X_C & Y_C & Z_C & 1 \\
X_A & Y_A & Z_A & 1 \\
X_B & Y_B & Z_B & 1 \\
X_{C'} & Y_{C'} & Z_{C'} & 1
\end{vmatrix}
\tag{2}
\]

If m is positive, then C is the pro-S atom and, concomitantly, C' is the pro-R atom (Figure 2). If m is negative, then C is the pro-R atom and C' is the pro-S atom.
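The determinant test can be sketched in pure Python. This is not the authors' in-house software. It assumes that the determinant has rows of the form (X, Y, Z, 1), listed in priority order, a layout chosen here because it reproduces the stated sign convention (negative for R, positive for S). Subtracting the last row from the other three reduces that 4×4 determinant to a 3×3 determinant of coordinate differences, which is what `handedness` computes. All function names and coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Point = Sequence[float]


def handedness(a: Point, b: Point, c: Point, d: Point) -> float:
    """Determinant of the rows (x, y, z, 1) for a, b, c, d (assumed layout).

    Computed as the 3x3 determinant of (a - d, b - d, c - d), which has the
    same value as the 4x4 determinant with a final column of ones.
    """
    u = [a[i] - d[i] for i in range(3)]
    v = [b[i] - d[i] for i in range(3)]
    w = [c[i] - d[i] for i in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def configuration(a: Point, b: Point, c: Point, d: Point) -> str:
    """R or S for four substituents given in decreasing priority order."""
    m = handedness(a, b, c, d)
    if m == 0:
        raise ValueError("substituents are coplanar")
    return "S" if m > 0 else "R"


def label_diastereotopic(a: Point, b: Point, c: Point,
                         c_prime: Point) -> Tuple[str, str]:
    """Labels for (c, c_prime): promote c above a, then read the configuration."""
    if configuration(c, a, b, c_prime) == "S":
        return ("pro-S", "pro-R")
    return ("pro-R", "pro-S")


# Hypothetical tetrahedral arrangement of four oxygens around a phosphorus at the origin.
A, B = (1.0, 1.0, 1.0), (1.0, -1.0, -1.0)
C, C_PRIME = (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)
print(label_diastereotopic(A, B, C, C_PRIME))
```

Swapping the roles of the two tied atoms flips the sign of the determinant, so the two labels are always complementary.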

Third-party software used

COOT [23] was used for visualizing PDB files, which was especially useful during the development of our software. As needed, the SSM [24] module of COOT was also used for superposition of molecules. PyMOL was used for viewing NMR models as well as generating depictions of molecular structures for figures [25].

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

CAB participated in the design of the study, developed the in-house software, carried out the atom name analysis, and drafted the manuscript. DX participated in the design and coordination of the study, and helped draft the manuscript. Both authors read and approved the final manuscript.

Additional file 1

Supplemental Table 1. Contains Table 1 from this document with about four additional pages of examples of naming convention statistics for selected ligands.

Additional file 2

Supplemental Table 2. Contains all of the calculated results, including those for prochiral centers that appear only once in the PDB.

Additional file 3

Explanation of Supplemental Table 2. Contains an explanation of the columns in Supplemental Table 2.

References

H M Berman, J Westbrook, Z Feng, G Gilliland, T N Bhat, H Weissig, I N Shindyalov, P E Bourne. The Protein Data Bank. Nucleic Acids Res. 2000-01-01.

A Good. Structure-based virtual screening protocols. Curr Opin Drug Discov Devel. 2001-05.

T R Cech, D Herschlag, J A Piccirilli, A M Pyle. RNA catalysis by a group I ribozyme. Developing a model for transition state stabilization. J Biol Chem. 1992-09-05.

E Krissinel, K Henrick. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D Biol Crystallogr. 2004-11-26.

F Eckstein. Nucleoside phosphorothioates. Annu Rev Biochem. 1985.

R A Padgett, M Podar, S C Boulanger, P S Perlman. The stereochemical course of group II intron self-splicing. Science. 1994-12-09.

P L Domanico, J F Rahil, S J Benkovic. Unambiguous stereochemical course of rabbit liver fructose bisphosphatase hydrolysis. Biochemistry. 1985-03-26.

J C Bressi, C L Verlinde, A M Aronov, M L Shaw, S S Shin, L N Nguyen, S Suresh, F S Buckner, W C Van Voorhis, I D Kuntz, W G Hol, M H Gelb. Adenosine analogues as selective inhibitors of glyceraldehyde-3-phosphate dehydrogenase of Trypanosomatidae via structure-based drug design. J Med Chem. 2001-06-21.

J B Thoden, F M Raushel, G Wesenberg, H M Holden. The binding of inosine monophosphate to Escherichia coli carbamoyl phosphate synthetase. J Biol Chem. 1999-08-06.

Catrine L Berthold, Cory G Toyota, Patricia Moussatche, Martin D Wood, Finian Leeper, Nigel G J Richards, Ylva Lindqvist. Crystallographic snapshots of oxalyl-CoA decarboxylase give insights into catalysis by nonoxidative ThDP-dependent decarboxylases. Structure. 2007-07.