Literature DB >> 35721994

Computationally Decoding NudF Residues To Enhance the Yield of the DXP Pathway.

Abstract

Terpenoids form a large pool of highly diverse organic compounds possessing several economically important properties, including nutritional, aromatic, and pharmacological properties. The 1-deoxy-d-xylulose 5-phosphate (DXP) pathway's end enzyme, nuclear distribution protein (NudF), interacting with isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP), is critical for the synthesis of isoprenol/prenol/downstream compounds. The enzyme is yet to be thoroughly investigated to increase the overall yield of terpenoids in the Bacillus subtilis, which is widely used in industry and is generally regarded as a safe (GRAS) bacterium. The study aims to analyze the evolutionary conservation across the active site for mapping the key residues for mutagenesis studies. The 37-sequence data set, extracted from 103 Bacillus subtilis entries, shows a high phylogenetic divergence, and only six one-motif sequences ASB92783.1, ASB69297.1, ASB56714.1, AOR97677.1, AOL97023.1, and OAZ71765.1 show a monophyly relationship, unlike a complete polyphyly relationship between the other 31 three-motif sequences. Furthermore, only 47 of 179 residues of the representative sequence CUB50584.1 are observed to be significantly conserved. Docking analysis suggests a preferential bias of adenosine diphosphate (ADP)-ribose pyrophosphatase toward IPP, and a nearly threefold energetic difference is observed between IPP and DMAPP. The loops are hereby shown to play a regulatory role in guiding the promiscuity of NudF toward a specific ligand. Computational saturation mutagenesis of the seven hotspot residues identifies two key positions LYS78 and PHE116, orderly encoded within loop1 and loop7, majorly interacting with the ligands DMAPP and IPP, and their mutants K78I/K78L and PHE116D/PHE116E are found to stabilize the overall conformation. Molecular dynamics analysis shows that the IPP complex is significantly more stable than the DMAPP complex, and the NudF structure is very unstable. Besides showing a promiscuous binding of NudF with ligands, the analysis suggests its rate-limiting nature. The study would allow us to customize the metabolic load toward the synthesis of any of the downstream molecules. The findings would pave the way for the development of catalytically improved NudF mutants for the large-scale production of specific terpenoids with significant nutraceutical or commercial value.

Entities: Chemical

Year: 2022 PMID： 35721994 PMCID： PMC9202048 DOI： 10.1021/acsomega.2c01677

Source DB: PubMed Journal: ACS Omega ISSN： 2470-1343

Introduction

Isoprenoids are the most functionally and structurally varied class of secondary metabolites, with over 55,000 identified molecules,[1,2] and have been used to make aromatic, flavoring, and medicinal molecules.[3−6] As the natural extraction of isoprenoid-based bioactives has led to an overexploitation of plants, the global research interest has now shifted to utilize the generally regarded as safe (GRAS) status microbes like Bacillus subtilis as the biocatalytic machinery.[7] A key protein controlling the yield of an end-product molecule is adenosine diphosphate (ADP)-ribose pyrophosphatase or NudF (EC 3.6.1.13) that orderly hydrolyzes dimethylallyl pyrophosphate (DMAPP) and isopentenyl pyrophosphate (IPP) to prenol and isoprenol (Figure ), collectively termed as isopentenol (C5 alcohol), a building block of all the higher-order downstream terpenoid molecules.[8] It belongs to the Nudix superfamily (Pfam PF00293; InterPro IPR000086)[9] and is involved in the production of adenosine monophosphate (AMP) and ribose-5-phosphate from ADP-ribose by actively dephosphorylating the phosphate moieties of a variety of substrates. Its close homolog is the NudB protein (EC: 3.6.1.67) from E. coli (EcNudB), having a catalytic activity toward geranyl pyrophosphate (GPP) and farnesyl pyrophosphate (FPP).[10] It has been suggested that an additional phosphatase activity (AphA) is needed to hydrolyze IPP to isoprenol because NudB can only catalyze the hydrolysis of IPP to isopentenol.[11] While EcNudB exhibits a strong affinity for DMAPP, the NudF protein of Bacillus subtilis (BsNudF) shows no such preference and equally interacts with both DMAPP and IPP to orderly produce an equivalent amount of prenol and isoprenol.[12] A varying affinity for various substrates may substantially increase pressure on the stoichiometric flux, and until the connected regulatory network of proteins does not drain out the added molecules at an equivalent rate, it becomes toxic to the cell and inhibits cell growth. Hence, to escape this major bottleneck, a fusion protein of isopentenyl-diphosphate delta isomerase (IDI), also known as isopentenyl pyrophosphate isomerase (IPP isomerase), and EcNudB has been utilized and with an increased expression, the metabolic flux increases the bioproduction rate of only prenol and reduces the isoprenol production. Endogenous overexpression of IspG and 1-deoxy-d-xylulose-5-phosphate synthase (DXP) as well as exogenous expression of YhfR and BsNudF have been shown to increase the production rate of downstream molecules in E. coli,[13] and the overexpression of NudF in B. subtilis has been shown to improve the isopentenol yield.[12]

Figure 1

MEP pathway, representing the biocatalysts at all intermediary stages. From the glyceradehyde-3-phosphate (G3P) and pyruvate, it synthesizes the 5-carbon building blocks DMAPP and IPP for producing terpenoids. It includes an ordered set of seven enzymes viz. DXP, DXP reductoisomerase (DXR), 2-C-methyl-d-erythritol 4-phosphate cytidylyltransferase (IspD), 4-diphosphocytidyl-2-C-methyl-d-erythritol kinase (IspE), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), HMB-PP synthase (IspG), and HMB-PP reductase (IspH), followed by the interplay of IDI and NudF. Although sufficient IPP and DMAPP concentrations are required for terpenoid synthesis,[8] an excess of these molecules can inhibit cell growth,[12] and thus reduce terpenoid production,[14] and therefore, the NudF enzyme continuously uses these potentially harmful debris molecules. To increase the yield of the DXP pathway, researchers have specifically targeted the first rate-limiting enzyme, DXP[15,16] because its Kcat/Km score is much smaller than that of all the other enzymes.[17] Improving the DXP activity has been considered as the most effective measure to improve the overall yield for several species, including Streptomyces,[18]Lycopersicon esculentum,[19] and Synechococcus leopoliensis.[20,21] Heterologous expression of Bacillus subtilis DXP in E. coli has also been shown to increase the overall yield.[22] Yet another strategy, using the functionally mutated recombinant poplar DXP,[23] has shown a negligible feedback inhibition for IPP/DMAPP and has been fruitful. To mitigate the toxicity of IPP, the farnesyl diphosphate synthase and IDI synthase have been overexpressed in E. coli, and it has led to an 800-fold production of sesquiterpene β-farnesene.[24] Likewise, the balanced metabolic concentration of IPP and DMAPP,[25,26] heterologous expression of Haematococcus pluvialis IDI,[27] or overexpression of Lycium chinense IDI in E. coli(28) have been shown to increase the overall yield. Recently, the modulation of the culture medium has also been demonstrated to increase the isopentenol production by 2.5-folds in B. subtilis.[29] Although the majority of these methods have overexpressed the rate-limiting enzymes or have changed the culture conditions, it may lead to a complete metabolic imbalance and significantly minimize the yield of downstream terpenoids. In this regard, directed co-evolution of DXP, DXR, and IDI has been shown to increase isoprene production.[28] However, these strategies have also not efficiently focused the major enzyme BsNudF for increasing the overall yield, and directed evolution of NudF’s promiscuous active site would therefore be significantly useful to increase the biosynthetic rate of a downstream terpenoid. For understanding the preferential binding of IPP/DMAPP, and catalytic and behavioral switching of BsNudF, this article deciphers the key functional details across the active site for decoding the mutations that could improve its activity. As it comprehensively maps the crucial residues at the functionally important positions, the study will be fruitful in designing a custom set of key interacting residues against a required ligand to attain a theoretically impossible overall yield. Although DXP has been shown to be the rate-limiting enzyme of the nonmevalonate pathway,[17] NudF, as the last biocatalyst to channelize the metabolic flux, should also play a significant role in regulating the overall yield of terpenoids, and its limited molecular engineering study should thus be of prime interest for biological and industrial research to optimally increase the productivity.

Theoretical Calculations

Construction of the Sequence Data set and Functional Affirmation

Screening the NudF protein sequences in the NCBI protein database, a set of 105,910 protein sequences is identified, among which only 103 Bacillus subtilis entries are found and are used to build the primary data set. Purging the completely redundant entries with at least 80% alignment coverage through MMSeqs2,[30] a reduced data set of 40 entries is derived, and their length pattern is noted. Three entries WP_139026580.1, WP_139026569.1, and CUB36584.1 orderly encoding only 118, 86, and 55 residues, are found to be the partial sequences and are purged. A final data set of 37 entries, including 1, 6, and 30 sequences orderly encoding 179, 205, and 185 residues, is thus created. As it is necessary to functionally confirm the defined data set for further analysis, the sequences are fed to MEME[31] for screening the conservation of the signature motif and affirming the functions of the sequences to use the functionally correct entries. In the absence of any functionally similar experimentally resolved B. subtilis protein, the homologous Escherichia coli structure (PDB ID: 5U7E) is used as the representative entry to reliably localize the conserved motifs. Furthermore, to deploy the Bacillus subtilis representative sequence for the downstream analysis, the smallest sequence (CUB50584.1) is used from the constructed data set. It is because evolution tends to decrease a protein sequence length to pack the function more optimally.[32]

Modeling the Representative Sequence

As the B. subtilis representative sequence CUB50584.1 is still not experimentally determined, its tertiary structure is modeled through template-based modeling methodology using MODELER, as per the recently published strategy.[33,34] For reliably predicting the model, the wild-type nucleoside diphosphate sugar hydrolase from Bdellovibrio bacteriovorus (PDB ID: 5C7Q) is used as the template, sharing a sequence identity of 36%. As the first predicted protein model usually has several nonphysical local clashes, the constructed model is energetically relaxed through the conservative refinement strategy of Galaxyrefine2[35] to maximally retain the topology extracted from the template. The refined model is subsequently evaluated through Swissmodel scoring measures[36] and is also topologically assessed against the one, predicted using the Alphafold algorithm,[37] to assess its credibility.

Phylogenetic and Conservation Analysis

Statistically significant evolutionary relationships are typically considered for drawing the meaningful phylogenetic connections within the selected set of sequences/species. As a correct sequence alignment is certainly required for extracting the accurate evolutionary relationship, the 37 sequence B. subtilis data set is aligned through the hidden Markov model based clustal-omega module of HHPred.[38] The constructed profile is fed to the IQTree server[39] for deriving the evolutionary tree on the basis of the default parameters, using the default ultrafast methodology over 10,000 bootstrap alignments at the minimum correlation coefficient or the convergence threshold of 0.99. The resultant consensus tree is visualized and analyzed through ITOL.[40] Furthermore, the sequence alignment is fed to Consurf[41] for plotting the average sequence conservation scores over the predicted CUB50584.1 model, using the default conditions, and analyzing the sequence variations across the functionally crucial sites. Lastly, on basis of the functionally well-annotated B. subtilis (strain 168) homolog P54570, the constructed HMM profile is visualized using Espript[42] to map the encoded domains.

Active Site Prediction and Docking Analysis with IPP and DMAPP

To reliably channelize the interaction of substrates only at the biologically credible site(s) and to exclude any fake docking solution, the CastP version3 server is used to predict the active site(s).[43] For screening the most promising energetically feasible interaction site(s) of NudF against the substrates IPP and DMAPP, the Dockthor server is used,[44] using the default parameters. Through a set of 24 docking runs, it efficiently docks a ligand using the random seeds for all the rotational, translational, and conformational degrees of freedom of the ligand. Using a multiple solution genetic algorithm and the MMFF94S molecular force field scoring function, it constructs the docking solutions and assesses them on the basis of the docking energy, ligand entropy, and desolvation score to select their top-ranked cluster representatives. It is shown to be more accurate than seven state-of-the-art docking algorithms viz. AutoDock Vina, AutoDock, GOLD, Surflex-Dock, rDock, HPepDock, and Glide.[44] In the absence of any experimentally resolved IPP-/DMAPP-NudF complex structure, the docking result constructed through this highly accurate Dockthor algorithm would certainly be the reliable data to excavate the functional details. Hence, to attain biologically correct predictions, a complete degree of rotational freedom is orderly allowed for the six and five rotatable bonds of the selected ligands. The resultant solutions are subsequently analyzed using UCSF Chimera to extract the key interacting residues.

Crucial Residues for Functional Mutagenesis

The accuracy of a rationally evolved enzyme molecule is dependent on the identification of the hotspot residues, whose mutations could enhance its catalytic activity.[45] To analyze the hotspot regions, proximal to the active site and tunnel, and appropriately map the mutational landscape of the predicted protein molecule, the hotspot server 3.1 is used.[46] Although this server is capable of modeling an input protein sequence, the constructed CUB50584.1 model is considered to drive the analysis for the topologically reasonable model. As this server robustly estimates the thermodynamic stability of a mutation through FoldX and Rosetta, it has been demonstrated to successfully exclude disruptive mutations.[47,48] To extract its fullest potential and reliably select the best mutations from the preselected resultant ones, the resulting data are analyzed along with the other residues marked in the preceding steps. To understand the most promising mutations across the active site, the correlated mutations are mapped for the 179 residue sequence CUB50584.1 sequence through the GREMLIN server.[49] As these coevolution-based contacts are crucial to reliably discriminate the true hotspot residues from the spurious ones, the contact map network is analyzed to mark the true hotspots. Purging the unreliable positions, the true hotspots are analyzed through the Dynamut2 server,[50] trained, and tested on the Protherm database.[51] Its high accuracy is evident from the fact that it outperforms the other measures with a Pearson’s correlation of up to 0.72 and 0.64 for the single and multiple point mutations with a root-mean-square error (RMSE) (kcal/mol) of 1.02 and 1.8, respectively, making it a trustworthy strategy for prioritizing the stabilizing/destabilizing mutations.[50]

Computational Mutagenesis and Flexibility Analysis

On the basis of docking and mutagenesis results, the most favorable binding and substrate preferences are excavated. To robustly quantify the strength of interaction for each of the ligands, the Dynamut2 results are analyzed for the five prioritized residues (ARG18, ALA117, ASP139, GLU140, and ASP141), and their five top-ranking ΔΔG scores are analyzed. To examine these data, NetsurfP2.0,[52] with an 80% correlation with experimentally confirmed data, is preferred to estimate the relative solvent accessible surface area for the five key positions for the native protein. To further examine the topological features of the NudF structure, the flexibility of its Cα-backbone is predicted using CABS-flex 2.0, which computes an average topological fluctuation diversity of 10 medoids of the 10 cluster sets, derived from a 10 ns simulated set of 1000 decoys.[53] To further obtain more insights of the binding affinity of the two ligands, a small 100 ns molecular dynamics (MD) simulation is finally done to analyze the physical movements of ligands and protein atoms. The IPP and DMAPP topologies are constructed through PRODRG2,[54] and the CUB50584.1 model and its two complexes are simulated through the WebGro server.[55] To perform the simulation, the native protein and its complexes are prepared using the GROMOS96 43a1 forcefield.[56] Deploying the SPC water model as a solvent over the triclinic simulation box, the structures are neutralized by adding 0.15 M sodium chloride. To energetically relax the system before MD, the steepest descent algorithm (5000 steps) is applied. Considering 1000 frames per simulation, the structures are simulated for 100 ns using the constant temperature of 300 K and a pressure of 1.0 bar. The trajectory is lastly integrated through the leap-frog algorithm for assessing the resultant trajectory through all of its encoded parameters, viz. time-dependent root-mean-square deviation (RMSD) for the overall structure and RMS fluctuations (RMSF) across its residues, radius of gyration (Rg), number of hydrogen bonds, overall and per-residue solvent accessible surface area (SASA), and ligand RMSD.

Results and Discussions

Construction of the Sequence Data set and Its Functional Affirmation

To affirm the functional attribute of all the derived 37 sequences, the relative location of their three top-ranked conserved motifs is screened against the functionally deciphered structure (5U7E) of E. coli through MEME (Figure A). The members of the Nudix superfamily encode the signature sequence GX5EX7REUXEEXG/TU, where U is either L/V or I,[57] as represented in green in this figure. This conserved sequence is responsible for metal-binding and forms the catalytic site in more than 4000 enzymes in numerous species including eukaryotes, prokaryotes, and viruses.[58] The NudF length range is shown to be 179–205, with 182-residue sequence being the most common. Here, the six 205-residue sequence subset shows a stark feature, that is, ASB92783.1, ASB69297.1, ASB56714.1, AOR97677.1, AOL97023.1, and OAZ71765.1encode only one characteristic motif, as observed for 5U7E, unlike the other B.subtilis sequences having all the three conserved motifs.

Figure 2

(A) Top-3 motifs, screened against the functionally deciphered structure 5U7E of E. coli, (B) structural overlap of the constructed model over the 5U7E, (C) Ramachandran map of the predicted model, and (D) topological superimposition of this model (red) over the Alphafold model (cyan). A complete overlap implies the topological accuracy of the predicted NudF structure. The representative B. subtilis sequence CUB50584.1 is modeled using 5C7Q through the template-based modeling methodology using MODELLER9.25. The predicted structure is refined using GalaxyRefine2[35] and is subsequently evaluated through Swissmodel.[36] The model shows a Molprobity score of 1.28, indicating a topological accuracy of a reasonably high accuracy X-ray structure, and its RMSD score is found to be 0.887 against the deployed template (Figure B). Furthermore, 98.87% model residues are found localized within the Ramachandran favored regions (Figure C). The model also shows a low RMSD of 1.116 against the structure, predicted using Alphafold algorithm,[37] and these two decoys are found to be structurally superimposed, with a marginal deviation of a few loop and terminal residues (Figure D). It is thus confirmed that Alphafold specifically works well especially for proteins, sharing a sequence identity lesser than 30% against the known templates, as reported recently.[37] The predicted structure is then reliably deployed for the subsequent analysis.

Phylogenetic and Conservation Analysis

Feeding the sequence alignment of HHPred-clustal omega to IQTree and building the evolutionary tree using 10,000 bootstrap alignments at the convergence threshold of 0.99, the resultant consensus solution is visualized and analyzed through ITOL (Figure ). Although the single motif-six sequence subset (AOR97677.1, OAZ71765.1, ASB69297.1, ASB92783.1, ASB56714.1, and AOL97023.1) appears to be clustered with 5U7EA and the representative sequence CUB50584.1, encoding all three motifs, their average sequence identity is 30.3640.339, compared to 14.73 and 21.3630.3304 against 5U7EA and CUB50584.1, respectively. Moreover, all the other sequences are not found to share substantial sequence similarity, as also observed with their scrutiny through HMM-based Clustal Omega alignment.[38] All the other sequences are found uncluttered with any other entry, and as recently shown in the phylogenetic study of 80,000 Nudix homologs, a general monophyly is prominent, besides a few occasional incidences of homoplasy.[59] The constructed profile is further fed to Consurf for mapping the average sequence conservation scores over the predicted structure, using the default conditions,[41] and the sequence variation across the functionally crucial sites is analyzed (Figure ). The HMM alignment of the 37 sequence data set and the functionally decoded protein P54570 is mapped using Espript[39] to designate the essential sequence features of the CUB50584.1 sequence. It is observed that the Nudix domains of CUB50584.1 and P54570 fully overlap (Figure C).

Figure 3

Unrooted circular tree of the 37 sequence NudF data set. The scale bar represents the average number of substitutions per site.

Figure 4

(A) Consurf-derived sequence conservation for the 37 sequence NudF data set, projected onto the representative sequence model CUB50584.1. (B) Color range varying between 1 and 9, maroon referring to the completely conserved positions. (C) Complete overlap of the IPP- and DMAPP-interacting positions and Nudix domains of P54570 (residues 40–171) and CUB50584.1 (residues 42–177).

Unrooted circular tree of the 37 sequence NudF data set. The scale bar represents the average number of substitutions per site. (A) Consurf-derived sequence conservation for the 37 sequence NudF data set, projected onto the representative sequence model CUB50584.1. (B) Color range varying between 1 and 9, maroon referring to the completely conserved positions. (C) Complete overlap of the IPP- and DMAPP-interacting positions and Nudix domains of P54570 (residues 40–171) and CUB50584.1 (residues 42–177). A set of 14 residues, viz. ILE20, ALA45, GLN62, GLY77, GLU80, GLY82, ALA89, GLU92, GLU95, GLU96, THR112, ALA126, THR130, and GLU148 is found to be completely conserved, with a conservation score of 9. Moreover, 33 residues LEU4, THR8, PHE15, PRO30, ASN31, LYS36, ILE39, HIS42, PRO43, ALA50, LYS56, VAL60, LYS65, ILE71, PRO85, THR88, ARG91, LEU93, GLU94, THR97, LEU106, ILE107, GLU119, TYR124, ASP141, VAL144, ALA154, LEU157, ASP166, LYS167, THR168, PHE170, and GLN173 show a conservation score of 7 and 8, indicating significantly higher conservation. Furthermore, 11 residues, viz. PRO13, ARG23, VAL24, ALA33, MET34, LYS69, LYS131, SER150, GLU153, ASP160, and HIS164 are found to be completely variable across the defined data set. It is astonishing that the representative B. subtilis sequence encodes mere 11 (6.145%) variant residue loci in contrast to the statistically higher sequence conservation at 47 (26.259%) positions (Figure ), and still, the enzyme is able to show a highly promiscuous nature, and it indicates that a few key residues should play a major role within the active site of this enzyme.

Active Site Prediction and Docking Analysis with IPP and DMAPP

Excluding the shallow openings and using a probe radius of 1.4 Å, CastP[43] delineates the empty concavities on a protein surface to map the volume spectrum of cavities and pockets. It robustly deciphers the surface properties and localizes the functionally important zone and shows only two biologically meaningful active pockets in the predicted NudF model, with a molecular surface area (Å2) and molecular volume (Å3) of 501 and 946.6 (Pocket1) and 208.9 and 661.3 (Pocket2), respectively, and this is in accordance with the known structural details of the NudF’s E. coli homolog, wherein two active sites have been observed.[60] Furthermore, these pockets are orderly found to span a set 12 (MET1, LEU4-A-GLU6, ARG37, ILE39, TYR111, PRO114, GLY115, ALA117, ASP118, and ILE120) and 30 (ARG18-VI-LYS21, VAL40-NH-PRO43, ALA45, VAL60, GLN62-YRK-ALA66, GLU73-IPAG-LYS78, GLU96, THR112, GLU119, LEU121, ASP141, GLU142, ASP165, ALA166, and LYS167) residues. Here, it is worth noting that Dockthor uses only one most-voluminous docking site for both ligands (Pocket1, Figure A), and it indicates a preferential binding of ligands over the second superficial site (Pocket2, Figure B).

Figure 5

Dockthor results of both IPP and DMAPP. (A) Pocket1 (B) Pocket2 represented in correlation with the two ligands IPP (yellow) and DMAPP (blue), along with the close analytical view of Pocket1, in terms of (C) molecular surface. (D) Active site topology at the vent of the tunnel. (E) Two-dimensional and (F) three-dimensional interaction maps of protein with DMAPP. (G) Two-dimensional and (H) three-dimensional interaction maps of protein with IPP show the key residues interacting within the active site. As a modest level of NudF is enough to overcome the IPP/DMAPP toxicity,[12] this enzyme must interact with both of these substrates, and this responsible molecular NudF surface (Figure C) closely interacts with the two ligands (Figure D). The ligands IPP and DMAPP orderly show an interaction energy (kcal/mol) and affinity score (kcal/mol) of −115.388 and −6.431, and −41.402 and −5.271. Through random forest, Dockthor has tested the predicted binding affinity over the PDBbind v2013 data set, and it has shown the RMS error of 2.256 kcal/mol at a correlation coefficient of 0.705, indicating a reasonable credibility of the predicted docking solution.[61] Examining the docked complexes using the discovery studio, it is observed that DMAPP has two interactions with NudF. While the DMAPP-O atom forms a hydrogen bond with the Ser113-HG atom of NudF with a bond length of 1.355 Å, the DMAPP-C atom shows hydrophobic π-interactions with a DMAPP-C atom with a bond length of 5.085 Å. IPP, on the other hand, it has favorable interactions with NudF, and its oxygen atom forms conventional hydrogen bonds with NudF-ARG64-HH21 and -HH22 atoms, with bond lengths of 2.912 and 2.592, respectively, as well as two salt bridges with ARG64-HH12 and LYS167-HZ2 atoms, with bond lengths of 2.561 and 2.753, respectively. Moreover, the IPP-O atom shows three electrostatic interactions with ARG64-NH1, −NH2, and LYS167-NZ atoms with bond lengths of 5.255, 3.963, and 3.212 Å respectively. Screening the active site residues within 5 Å of these two ligands, it is observed that 14 residues viz. VAL19, ILE20, HIS42, ALA45, GLN62, ARG64, GLU73, ALA76, LYS78, THR112, GLU119, LEU121, ASP165, and LYS167, and 8 residues viz. VAL22, GLU38, VAL40, HIS42, SER113, PHE116, ALA117, and GLU119 orderly interact with the two ligands IPP (yellow) and DMAPP (blue), as shown in Figure C. To analyze it further, the two-dimensional and three-dimensional interaction maps are drawn for DMAPP (Figure E,F) and IPP (Figure G,H). It indicates that two key hotspot residues LYS78 and PHE116, orderly responsible for interacting with these ligands through one and two residues in the active site, could be the key to specifically alter the active site to stabilize its affinity for the conditionally required ligand.

Crucial Residues for Functional Mutagenesis

Through GREMLIN,[49] the contact map network of the modeled protein is constructed. For all the predicted contacts, it yields the two-dimensional distance matrix along with their probability scores and is shown to be accurately predicting both direct and indirect residue couplings. The coevolution-based network for CUB50584.1 is analyzed (Figure A), and the statistically top-ranked contacts are extracted. A set of 22 functional hotspot residues, viz. ARG18, ILE20, LYS21, ASN41, PRO43, LEU67, ILE71, LYS78, LEU79, PRO81, GLY82, PHE110, TYR111, THR112, PHE116, ALA117, LEU121, ASP139, GLU140, ASP141, ALA166, and PHE170, with a theoretically credible probability score higher than 0.5, are found in NudF. However, LEU79 is found to be completely buried in the structural core of NudF. Moreover, ILE20 and THR112 are found to be completely conserved and are thus excluded. The 19 key hotspot residues are found to define a structural network of 0–4 contacts. As previously stated,[62,63] the loop proline residues usually stabilize a structural bend through internal hydrogen bonding, and PRO43 and PRO81 are also not considered for subsequent computational mutagenesis study. Plotting the number of contacts for the hotspot residues, seven positions, viz. ARG18, PRO43, LYS81, ALA117, ASP139, GLU140, and ASP141 are not found to have any contacts (Figure B), and as the delicate balance between the flexibility and stiffness is sustained by the key contacts, majorly regulating the functional role(s) of a protein, these positions (Figure ) are selected for further analysis.

Figure 6

GREMLIN resultant (A) contact map network and (B) number of contacts for the 19 key hotspot residues, placed proximal to the active site.

Figure 7

Seven key hotspot residues excavated for designing the functionally improved mutants.

GREMLIN resultant (A) contact map network and (B) number of contacts for the 19 key hotspot residues, placed proximal to the active site. Seven key hotspot residues excavated for designing the functionally improved mutants. Excavating the key residues, the Dynamut2 server is used to estimate the stability changes upon a point mutation on the CUB50584.1 model. The method derives the scores through the topological environment property and dynamic behavior of a residue and is recently shown to outperform the prediction measures including FoldX and MAESTRO.[50] Restricting the search to the prioritized residues, the most stabilizing top five mutations are selected. The ΔΔG scores (KJ/mol) of these mutations ranges from −1.67 to 1.06 (Figure , Supplementary Table 1). To correctly drive the analysis, the relative surface accessibility (RSA) for the selected residues is computed using NetSurfP2.0.[52] It reveals how easily a residue can be accessed from the protein surface, and as the mutations at this position are less likely to disrupt the overall fold of a protein, the chance of developing a functionally fruitful mutation should be substantially higher. The prioritized residues orderly show an RSA score of 0.393, 0.232, 0.536, 0.696, 0.622, 0.449, and 0.397 (Figure A), categorized as exposed or buried at the threshold of 0.25.

Figure 8

Structural scrutiny. (A) DDG–RSA plot for the five top-ranked mutations of the seven prioritized residues, (B) predicted NudF flexibility showing a high score for the eight loop regions, and (C) structural position of the eight flexible loops. A lower DDG score implies a more energetically stable conformation. To excavate the active site and its key catalytic residues, the average structural fluctuations across NudF are predicted using CABS-flex 2,[53] and a set of 8 loops: loop1–loop8 (Figure B), spanning from ILE14-ARG23, ASP26-ARG37, ILE39-VAL46, ALA50-VAL58, GL62-ILE72, GLY77-PRO85, TYR111-LEU121, and THR130-VAL144, marked in red in Figure C, are found to be flexible. These loops, encompassing a 48.6% structure, majorly define the activity site topology and are envisaged to substantially regulate the functional nature of NudF. It is interesting to observe that the selected five residues ARG18, ALA117, ASP139, GLU140, and ASP141 are encoded by the loop1, loop7, and loop8, and it could be there that the loop1 acts as a capping loop and loop7 and loop8 make the active site sufficiently voluminous to interact with IPP and DMAPP. It is similar to E. coli NudF loop9, which is stabilized by its closed conformation.[60] To further analyze the dynamic atomic interactions of IPP/DMAPP within the active site, 100 ns NudF and its complexes are simulated using WebGro, as shown recently.[64,65] MD simulation is a robustly accurate strategy to analyze the conformational changes that occur when a ligand is induced to fit.[66] Using the GROMACS-based Webgro[55] server, the protein/complex system is computationally evolved using the classical mechanics algorithm for a short 100 ns timespan, and the conformational stability or binding affinity of a ligand is assessed across the simulation trajectory. To analyze the simulation results, Rg or the average distance between the center of mass and the rotational axis is usually used to estimate the conformational stability of a system against any physicochemical strain,[67] and its lower score implies a higher stability. Here, the apoprotein shows the Rg scoring variations between ∼1.5 and 2.75 nm (Figure ), and its complex with DMAPP and IPP orderly shows the respective scores of ∼1.8–2.05 and ∼1.875–2.05 nm, indicating that the apoprotein is relatively more unstable than its complex structure, exactly like E. coli NudF.[60] Moreover, the IPP complex is more stable than the DMAPP complex because its trajectory mostly crawls at substantially lower scores across the simulation. The RMSD is another helpful measure for estimating the structural stability and the overall deviation from the backbone topology during the complex formation at a specified temperature, as used earlier by GROMACS.[68,69] The backbone-based RMSD trajectory score should be the lowest because it directly gives the deviation of mean atomic coordinates and reflects the macromolecule’s conformational stability. The scrutiny of the RMSD across the trajectory shows that the RMSD variations of the apoprotein range between the acceptable range of ∼0.5–2.5 nm in contrast to the respective ranges of 0.3–0.9 and 0.3–0.75 for its DMAPP and IPP complexes (Figure ). Unlike the DMAPP complex trajectory, showing substantially higher undulations, the IPP complex gets stabilized after ∼60 ns.

Figure 9

MD analysis for the apoprotein NudF and its DMAPP and IPP complexes for the RMSD, RMSF, radius of gyration, solvent accessible surface area, area/residues, hydrogen bonds, and ligand RMSD scores. It clearly indicates a prioritized binding for IPP within the promiscuous active site of NudF. Furthermore, the RMSF, or residue fluctuations across the simulation, reveals a macromolecular system’s conformational stability, and its scoring undulations describe structural complexity, with a lower value signifying higher overall stability.[56,63] For NudF, RMSF is found to be within ∼0.5–2 nm (Figure ), although its three C-term loop residues and two preceding terminal 11-residue α-helices, connected through a five-residue loop, shows a stark increment crossing ∼3 nm. In contrast, the DMAPP and IPP complexes orderly show the RMSF divergence between ∼0.1–0.8 and ∼0.06–0.525 nm, and here, the terminal double-helix does not show any significant displacement. It is interesting to note that the RMSF score across the chain for NudF-IPP is lesser than the NudF-DMAPP scores, indicating that the former is a more stable complex structure. As the ligand binding causes conformational changes in the receptor, its respective substructural alterations are likely to affect SASA, and thus, it has been utilized multiple times to assess the level of receptor exposure to the surrounding solvent molecules.[67,68] Although a significant deviation is not observed in the per-residue SASA score, usually showing a comparatively large surface area or a lower compactness of this substructure, the SASA scores are found to be highly variant across the simulation trajectory. The SASA for the NudF structure is observed to consistently drop along the trajectory from 112.5 to 97.5 nm2 (Figure ), although the respective score of the DMAPP complex declines from 117.5 to 97.5 nm2, with a sharp drop at 20 ns. In contrast, the corresponding score for the IPP complex declines from 112.5 to 100 nm2, and except for a few time intervals, minute variations were observed throughout the simulation period. SASA for the IPP complex drops dramatically from 110 to 103 nm2 after 5 ns. The equivalent declination for the DMAPP complex, on the other hand, is noticed after 17.5 ns, and its trajectory has significantly more unequal undulations. NudF also exhibits this distinctive declination after 10 ns, and here, it could indicate a significant functional transition, leading to the substantial increase in structural compactness. Moreover, after the initial decline, the SASA graph moves likewise for both DMAPP and IPP complexes. For estimating the binding affinity of IPP/DMAPP with NudF (Figure ), the MD trajectories are further investigated to analyze the extent of hydrogen bonds made along the simulation, as usually done.[67,68] Along the trajectory, NudF shows a nearly 100–120 hydrogen bonds. However, the DMAPP and IPP complexes orderly show 100–130 and 110–130 hydrogen bonds, and an almost similar variation throughout the trajectory. It indicates their nearly similar scale of atomic interaction within the active site, as also shown by their substantially similar SASA undulations. Although, observing the earlier results, the affinity of DMAPP should not be comparable to that of IPP, consistency of hydrogen bonds is maintained for both the ligand complexes throughout the trajectory, and it indicates the comparable stability of these complexes. To excavate it further, the protein–ligand hydrogen bonding variations are analyzed through the trajectory, and for the DMAPP complex, the number of hydrogen bonds is found to slowly increase to two with significantly variant undulations. However, for the IPP complex, the number of bonds consistently increases, and after ∼60 ns, nearly one bond is maintained throughout the simulation, showing its more stable interaction. As hydrogen bonds are the major interactions to drive the proper anchoring of ligands within the active site, the higher number of such bonds should be responsible for a stronger interaction.[67,68] Figure deciphers three key features. First, the ddG score is found to be the highest for PHE116 and LYS78 residues, and it confirms that the two residues certainly hold the key to actively evolve the enzyme against the substrates by increasing the structural stability. Second, ASP139, GLU140, and ASP141 show a remarkably insignificant ddG score, and it implies that these positions are highly crucial for the NudF function and their top-ranked mutations also failed to stabilize the protein, as earlier discussed by Nobel Laureate Frances Arnold.[70] However, these three residues, along with the substructure T130-L138, show a high RSA score, and it indicates that the flexibility of this superficial loop region could impart a significant functional attribute to NudF. Restricting the search to LYS78 and PHE116 indicates that these positions should be significantly crucial for the stability and interaction affinity of the active site. Third, as LYS78 is superficially more exposed in comparison to ARG116 and is only in contact with one other residue, it should be first mutated to study its effect on the overall product yield. It opens venues for their experimental verifications as the top-ranked mutants K78I/K78L and PHE116D/PHE116E could selectively stabilize the conformation and could be responsible for the ligand specificity, urgently needed to design the novel industrially useful NudF enzyme. The study extends our understanding about ADP-ribose pyrophosphatase and shows that it has a preferential bias for IPP over DMAPP, with −115.388 (kcal/mol) versus −41.402 (kcal/mol), respectively, although if the former is missing, the protein interacts with DMAPP at a much slower rate, and probably, this could be a key signal to IDI to initiate the conversion of DMAPP to IPP. Recently, the promiscuous activity of EcNudB toward geranyl diphosphate and farnesyl diphosphate has been demonstrated to generate several isoprenoid alcohols, including isopentenol, geraniol, and farnesol, as well as their derivatives.[71] Thus, the promiscuous dephosphorylation of NudF should be studied further through various other substrates, and with that, it would open venues to industrially engineer the cells, wherein the downstream reactions could be channeled more actively with minimal cellular regulation.

Conclusions

The research thoroughly examines the Bacillus subtilis ADP-ribose pyrophosphatase and its 37 functionally confirmed homologs with the projected tertiary structure of the 179-residue representative sequence CUB50584.1. Besides analyzing the phylogenetic relationship, it maps the highly conserved and variant loci to build the knowledge base for its directed evolution experiments. Although the sequence data set shows a significantly high phylogenetic divergence, 26.259% residues are found to have a statistically higher evolutionary conservation. ADP-ribose pyrophosphatase shows a prioritized interaction with IPP than DMAPP, according to the docking energy data, with −115.388 (kcal/mol) versus −41.402 (kcal/mol). The topological variations are restricted to eight loop regions, maximally encompassing the active site, demonstrating their importance for the ligand binding. Seven residues (ARG18, PRO43, PRO81, ALA117, ASP139, GLU140, and ASP141) are not found to have even one contact in the contact map network of 22 hotspot sites, and mutational analysis for these seven positions shows the highest ΔΔG scores for LYS78 and PHE116, orderly encoded within loop1 and loop7, encapsulating the active site. Quite similar to NudF, the four top-ranked mutants F116E, F116D, K78I, and K78L show the highest ΔΔG scores of 1.06, 0.97, 0.88, and 0.84, and it indicates that these positions should be the key to maximally direct the synthesis of required terpenoids in Bacillus subtilis. MD analysis reveals that the NudF structure is unstable, just like E. coli NudB, and that its IPP complex is more stable than the DMAPP complex. Thus, the present study must be industrially useful to channelize the entire DXP pathway toward the increased production of prenol or isoprenol or their downstream molecules without generating any metabolic burden.

Authors Contributions

A.R. conceived the study. D.P. performed the experiments. Both the authors have written the manuscript and have carefully read and approved the manuscript.

64 in total

1. NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning.

Authors: Michael Schantz Klausen; Martin Closter Jespersen; Henrik Nielsen; Kamilla Kjaergaard Jensen; Vanessa Isabell Jurtz; Casper Kaae Sønderby; Morten Otto Alexander Sommer; Ole Winther; Morten Nielsen; Bent Petersen; Paolo Marcatili
Journal: Proteins Date: 2019-03-09

2. Probabilistic divergence of a template-based modelling methodology from the ideal protocol.

Authors: Ashish Runthala
Journal: J Mol Model Date: 2021-01-07 Impact factor: 1.810

3. Effect of introducing proline residues on the stability of Aspergillus awamori.

Authors: Y Li; P J Reilly; C Ford
Journal: Protein Eng Date: 1997-10

4. Quantification of biases in predictions of protein stability changes upon mutations.

Authors: Fabrizio Pucci; Katrien V Bernaerts; Jean Marc Kwasigroch; Marianne Rooman
Journal: Bioinformatics Date: 2018-11-01 Impact factor: 6.937

5. Escherichia coli open reading frame 696 is idi, a nonessential gene encoding isopentenyl diphosphate isomerase.

Authors: F M Hahn; A P Hurlburt; C D Poulter
Journal: J Bacteriol Date: 1999-08 Impact factor: 3.490

6. Identification of isopentenol biosynthetic genes from Bacillus subtilis by a screening method based on isoprenoid precursor toxicity.

Authors: Sydnor T Withers; Shayin S Gottlieb; Bonny Lieu; Jack D Newman; Jay D Keasling
Journal: Appl Environ Microbiol Date: 2007-08-10 Impact factor: 4.792

7. Isopentenyl diphosphate (IPP)-bypass mevalonate pathways for isopentenol production.

Authors: Aram Kang; Kevin W George; George Wang; Edward Baidoo; Jay D Keasling; Taek Soon Lee
Journal: Metab Eng Date: 2015-12-17 Impact factor: 9.783

8. Metabolic engineering of Escherichia coli for production of mixed isoprenoid alcohols and their derivatives.

Authors: Bakht Zada; Chonglong Wang; Ji-Bin Park; Seong-Hee Jeong; Ju-Eon Park; Hawaibam Birla Singh; Seon-Won Kim
Journal: Biotechnol Biofuels Date: 2018-07-24 Impact factor: 6.040

9. DynaMut2: Assessing changes in stability and flexibility upon single and multiple point missense mutations.

Authors: Carlos H M Rodrigues; Douglas E V Pires; David B Ascher
Journal: Protein Sci Date: 2020-09-11 Impact factor: 6.725

10. Highly accurate protein structure prediction with AlphaFold.

Authors: John Jumper; Richard Evans; Alexander Pritzel; Tim Green; Michael Figurnov; Olaf Ronneberger; Kathryn Tunyasuvunakool; Russ Bates; Augustin Žídek; Anna Potapenko; Alex Bridgland; Clemens Meyer; Simon A A Kohl; Andrew J Ballard; Andrew Cowie; Bernardino Romera-Paredes; Stanislav Nikolov; Rishub Jain; Demis Hassabis; Jonas Adler; Trevor Back; Stig Petersen; David Reiman; Ellen Clancy; Michal Zielinski; Martin Steinegger; Michalina Pacholska; Tamas Berghammer; Sebastian Bodenstein; David Silver; Oriol Vinyals; Andrew W Senior; Koray Kavukcuoglu; Pushmeet Kohli
Journal: Nature Date: 2021-07-15 Impact factor: 49.962