Yun Hao Hua, Chih Yuan Wu, Karen Sargsyan, Carmay Lim.
Abstract
Many enzymes use nicotinamide adenine dinucleotide or nicotinamide adenine dinucleotide phosphate (NAD(P)) as essential coenzymes. These enzymes often do not share significant sequence identity and cannot be easily detected by sequence homology. Previously, we determined all distinct locally conserved pyrophosphate-binding structures (3d motifs) from NAD(P)-bound protein structures, from which 1d sequence motifs were derived. Here, we aim to establish the precision of these 3d and 1d motifs in annotating NAD(P)-binding proteins. We show that the pyrophosphate-binding 3d motifs are characteristic of NAD(P)-binding proteins, as they are rarely found in nonNAD(P)-binding proteins. Furthermore, several 1d motifs could distinguish proteins that bind only NAD from those that bind only NADP. They could also distinguish NAD(P)-binding proteins from nonNAD(P)-binding ones. Interestingly, one of the pyrophosphate-binding 3d motifs and its corresponding 1d motif were found only in enoyl-acyl carrier protein reductases, which are enzymes essential for bacterial fatty acid biosynthesis. This unique 3d motif serves as an attractive novel drug target, as it is conserved across many bacterial species and is not found in human proteins.
Year: 2014 PMID: 25253464 PMCID: PMC4174568 DOI: 10.1038/srep06471
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Derivation of 1d motifs from distinct 3d motifs.
(Left) The distinct locally conserved pyrophosphate-binding βα structures derived from NAD(P)-binding domains where the total number of βα structures exceeds 25. The βα structure is in green with the regions containing conserved glycines highlighted in yellow, while NAD(P) is shown in stick format. The class III and IV structures share a common backbone conformation but exhibit different side chain orientations: in the class IV structure (1zk4-A), the Leu side chain is shown in stick format, whereas the corresponding side chain in the class III structure (1sby-A), indicated by the black arrow, points in the opposite direction. (Right) Sequence logos derived from aligning the same-length sequences comprising the distinct pyrophosphate-binding βα structures, and the corresponding 1d motifs. Glycine is shown in green, polar (S, T, Y, N, Q, H, K, R, D, E) residues in blue, and nonpolar (A, V, L, I, P, W, F, C, M) residues in black.
Description of data sets employed
| Redundant Dataset | # of Proteins | Description of dataset |
|---|---|---|
| 3d-NAD(P) | 1,096 | Protein structures with NAD(P) bound |
| 3d-FAD | 348 | Protein structures with FAD bound |
| 3d-PO4 | 10,292 | Protein structures with ≥1 phosphate group bound, excluding NAD(P) but including FAD |
| 3d-nonPO4 | 33,514 | Protein structures with no bound NAD(P) or phosphate |
| 1d-NAD(P) | 24,516 | Sequences of NAD(P)-binding proteins |
| 1d-NAD | 15,340 | Sequences of proteins that bind only NAD |
| 1d-NADP | 6,722 | Sequences of proteins that bind only NADP |
| 1d-nonNAD(P) | 402,353 | Sequences of nonNAD(P)-binding proteins |
| 1d-FAD | 949 | Sequences of FAD-binding proteins |
| 1d-PO4 | 131,165 | Sequences in 1d-nonNAD(P) that bind ≥1 phosphate, including FAD-binding protein sequences |
| 1d-nonPO4 | 271,188 | Sequences in 1d-nonNAD(P) that do not bind phosphate |
Frequency distribution of the NAD(P) pyrophosphate-binding 3d motifs in the PDB
| Class | % frequencyᵃ in 3d-NAD(P) | % frequencyᵃ in 3d-FAD | % frequencyᵃ in 3d-PO4 | % frequencyᵃ in 3d-nonPO4 | % PPV of 3d-NAD(P) vs. 3d-FAD | % PPV of 3d-NAD(P) vs. 3d-PO4 | % PPV of 3d-NAD(P) vs. 3d-nonPO4 |
|---|---|---|---|---|---|---|---|
| I | 24.6 | 36.2 | 2.6 | 0.2 | 68 | 51 | 80 |
| II | 2.2 | 0.3 | 0.05 | 0.01 | 96 | 83 | 89 |
| III | 12.9 | 0 | 0.1 | 0.04 | 100 | 92 | 92 |
| IV | 11.3 | 0 | 0 | 0 | 100 | 100 | 100 |
| XII | 1.6 | 0 | 0 | 0 | 100 | 100 | 100 |
ᵃ The number of structures in the given dataset containing the 3d motif belonging to class j divided by the total number of structures/proteins in the given dataset, multiplied by 100.
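The PPV columns appear to follow from the frequency columns combined with the dataset sizes listed in the dataset description table. The following is a minimal sketch of that arithmetic, assuming the standard definition PPV = TP / (TP + FP), with TP taken as the motif-containing structures in 3d-NAD(P) and FP as those in the comparison dataset. The function and variable names are illustrative, not taken from the paper, and because the published frequencies are rounded, recomputed values can differ from the table by about one percentage point.

```python
# Dataset sizes copied from the dataset description table.
DATASET_SIZES = {
    "3d-NAD(P)": 1096,
    "3d-FAD": 348,
    "3d-PO4": 10292,
    "3d-nonPO4": 33514,
}


def estimated_hits(percent_frequency, dataset):
    """Approximate count of structures containing the motif in a dataset."""
    return percent_frequency / 100.0 * DATASET_SIZES[dataset]


def ppv_percent(freq_positive, freq_negative, negative_dataset,
                positive_dataset="3d-NAD(P)"):
    """PPV (in %) of a motif for the positive dataset vs. a negative dataset.

    Assumption: true positives are hits in the positive dataset and
    false positives are hits in the negative dataset.
    """
    tp = estimated_hits(freq_positive, positive_dataset)
    fp = estimated_hits(freq_negative, negative_dataset)
    if tp + fp == 0:
        raise ValueError("motif has no hits in either dataset")
    return 100.0 * (tp / (tp + fp))


# Class I: 24.6% of 3d-NAD(P) vs. 36.2% of 3d-FAD and 0.2% of 3d-nonPO4.
print(round(ppv_percent(24.6, 36.2, "3d-FAD")))     # 68
print(round(ppv_percent(24.6, 0.2, "3d-nonPO4")))   # 80
```

For class I this recovers the tabulated 68% and 80%. It also shows why a motif that is more frequent in 3d-FAD than in 3d-NAD(P) can still have a PPV above 50%: the 3d-NAD(P) dataset is about three times larger.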
Precision of the 1d motifs to distinguish between NAD- and NADP-binding proteins
| 1d motif | Consensus sequence | % frequencyᵃ in 1d-NAD | % frequencyᵃ in 1d-NADP | % PPVᵇ,ᶜ |
|---|---|---|---|---|
| I_NAD | [VILCAF]-X3- | 18.5 | 6.5 | 82 |
| I_NADP | [VILF]-X- | 9.3 | 22.6 | 61 |
| II_NADP | [VICL]-X-[IVC]-X- | 0 | 0.4 | 100 |
| III_NAD | [VILFW]-X-[VIL]-X- | 2.4 | 7.5 | 34 |
| III_NADP | [VILFA]-X-[VILF]-X-[GA]-X2- | 2.8 | 3.9 | 47 |
| IV_NAD | [AVI]-[LVIFA]-[IV]- | 0.5 | 2.3 | 26 |
| IV_NADP | [AVIC]-[LIV]-[VIL]- | 0.4 | 1.8 | 76 |
| XII_NAD | [LFV]-[VI]-X- | 0.06 | 0 | 100 |
ᵃ The number of protein sequences in the given dataset matching the 1d motif divided by the total number of sequences in the given dataset, multiplied by 100.
ᵇ The number of true positives is the number of NAD-binding sequences matching a 1d motif derived from NAD-bound structures, whereas the number of false positives is the number of NADP-binding sequences matching the same 1d motif.
ᶜ The number of true positives is the number of NADP-binding sequences matching a 1d motif derived from NADP-bound structures, whereas the number of false positives is the number of NAD-binding sequences matching the same 1d motif.
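The footnotes above describe counting the sequences in each dataset that match a 1d motif. A minimal sketch of that counting follows, assuming the consensus notation uses hyphen-separated elements where `[...]` is a set of allowed residues, `X` is any residue, `Xn` is n arbitrary residues, and a single letter is a fixed residue. The motif string and sequences in the example are hypothetical toy data, not taken from the paper, and the sketch ignores the additional requirement that the matched segment adopt a βα structure.

```python
import re


def motif_to_regex(motif):
    """Translate a hyphen-separated consensus motif into a compiled regex.

    Assumed notation: [ABC] = residue set, X = any residue,
    Xn = n arbitrary residues, single capital letter = fixed residue.
    """
    parts = []
    for token in motif.strip("-").split("-"):
        if re.fullmatch(r"\[[A-Z]+\]", token):
            parts.append(token)
        elif re.fullmatch(r"X\d*", token):
            count = token[1:] or "1"
            parts.append("." if count == "1" else ".{%s}" % count)
        elif re.fullmatch(r"[A-Z]", token):
            parts.append(token)
        else:
            raise ValueError("unrecognised motif element: %r" % token)
    return re.compile("".join(parts))


def percent_frequency(pattern, sequences):
    """Percentage of sequences containing at least one motif match."""
    matches = sum(1 for seq in sequences if pattern.search(seq))
    return 100.0 * matches / len(sequences)


def count_matches(pattern, sequences):
    """Number of sequences containing at least one motif match."""
    return sum(1 for seq in sequences if pattern.search(seq))


# Hypothetical motif and toy sequences, for illustration only.
toy_motif = motif_to_regex("[VI]-X-G-X2-G")
toy_nad = ["AAVTGSSGAA", "MKLLPPWWA"]
toy_nadp = ["MKIAGLLGK", "PPPPPP", "WWWWWW", "AAAAAA"]

tp = count_matches(toy_motif, toy_nad)   # treated as true positives
fp = count_matches(toy_motif, toy_nadp)  # treated as false positives
print(toy_motif.pattern, percent_frequency(toy_motif, toy_nad),
      percent_frequency(toy_motif, toy_nadp), 100.0 * tp / (tp + fp))
```

Here the toy motif is treated as NAD-derived, so matches among the NAD-binding sequences count as true positives and matches among the NADP-binding sequences as false positives; for an NADP-derived motif the roles are swapped.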
Precision of the 1d motifs to distinguish between NAD(P)-binding and nonNAD(P)-binding proteins
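| 1d motif | % frequencyᵃ in 1d-NAD(P) | % frequencyᵃ in 1d-FAD | % frequencyᵃ in 1d-PO4 | % frequencyᵃ in 1d-nonPO4 | % frequencyᵃ in 1d-nonNAD(P) | % PPV of 1d-NAD(P) vs. 1d-FAD | % PPV of 1d-NAD(P) vs. 1d-PO4 | % PPV of 1d-NAD(P) vs. 1d-nonPO4 | % PPV of 1d-NAD(P) vs. 1d-nonNAD(P) |
|---|---|---|---|---|---|---|---|---|---|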
| I_NAD | 13.2 | 13.8 | 0.7 | 0.2 | 0.4 | 96 | 78 | 84 | 68 |
| I_NADP | 12.4 | 61.0 | 3.5 | 1.8 | 2.4 | 84 | 40 | 38 | 24 |
| II_NADP | 0.1 | 0 | 0.005 | 0 | 0.002 | 100 | 79 | 100 | 79 |
| III_NAD | 4.8 | 1.7 | 0.4 | 0.1 | 0.2 | 99 | 68 | 78 | 57 |
| III_NADP | 3.1 | 2.2 | 0.8 | 0.3 | 0.5 | 97 | 41 | 47 | 28 |
| IV_NAD | 1.2 | 0 | 0 | 0 | 0 | 100 | 100 | 100 | 100 |
| IV_NADP | 1.0 | 0 | 0 | 0 | 0 | 100 | 100 | 100 | 100 |
| XII_NAD | 0.07 | 0 | 0 | 0 | 0 | 100 | 100 | 100 | 100 |
ᵃ The number of protein sequences in the given dataset matching the 1d motif divided by the total number of sequences in the given dataset, multiplied by 100.
Figure 2. Flowchart of the protocol for generating the 3d and 1d datasets.
See text in Methods for a description of the process used to generate the four 3d datasets (left), and seven 1d datasets (right). SI denotes sequence identity.
Figure 3. Flowchart of the process for determining hits.
A hit was recorded if (left) the 3d structure had an RMSDa ≤ 30° and a pairwise Cα RMSD ≤ 1.0 Å relative to one of the 3d motifs in Fig. 1, or (right) the 1d sequence matched one of the 1d motifs in Fig. 1 and the matched segment had a βα structure.
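The two structural thresholds in the caption can be sketched as a simple test. The code below is only an illustration under stated assumptions: it assumes RMSDa denotes an angular RMSD over corresponding backbone dihedral angles (the caption gives only the symbol and a threshold in degrees), and it assumes the Cα coordinates have already been optimally superposed. All function names are hypothetical.

```python
import math


def angular_rmsd(angles_a, angles_b):
    """RMSD (degrees) between paired angles, respecting 360° periodicity."""
    if len(angles_a) != len(angles_b) or not angles_a:
        raise ValueError("angle lists must be non-empty and equal in length")
    total = 0.0
    for a, b in zip(angles_a, angles_b):
        # Wrap the difference into [-180, 180) so 170° vs. -170° is 20°.
        diff = (a - b + 180.0) % 360.0 - 180.0
        total += diff * diff
    return math.sqrt(total / len(angles_a))


def coordinate_rmsd(coords_a, coords_b):
    """RMSD (Å) between paired Cα coordinates.

    Assumption: the two coordinate sets are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for p, q in zip(coords_a, coords_b):
        total += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total / len(coords_a))


def is_3d_hit(angles_query, angles_motif, ca_query, ca_motif,
              max_angle_rmsd=30.0, max_ca_rmsd=1.0):
    """True if both thresholds from the caption are satisfied."""
    return (angular_rmsd(angles_query, angles_motif) <= max_angle_rmsd
            and coordinate_rmsd(ca_query, ca_motif) <= max_ca_rmsd)


# Toy values for illustration only.
print(is_3d_hit([170.0, -60.0], [-170.0, -50.0],
                [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]))
```

Requiring both criteria means a candidate must agree with the motif in local backbone geometry as well as in overall Cα placement; a full implementation would also need a superposition step before the coordinate comparison.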