Devi Prasanna1, Ashish Runthala1. 1. Department of Bio-Technology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP 520002, India.
Abstract
Terpenoids form a large pool of highly diverse organic compounds possessing several economically important properties, including nutritional, aromatic, and pharmacological properties. The 1-deoxy-d-xylulose 5-phosphate (DXP) pathway's end enzyme, nuclear distribution protein (NudF), interacting with isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP), is critical for the synthesis of isoprenol/prenol/downstream compounds. The enzyme is yet to be thoroughly investigated to increase the overall yield of terpenoids in the Bacillus subtilis, which is widely used in industry and is generally regarded as a safe (GRAS) bacterium. The study aims to analyze the evolutionary conservation across the active site for mapping the key residues for mutagenesis studies. The 37-sequence data set, extracted from 103 Bacillus subtilis entries, shows a high phylogenetic divergence, and only six one-motif sequences ASB92783.1, ASB69297.1, ASB56714.1, AOR97677.1, AOL97023.1, and OAZ71765.1 show a monophyly relationship, unlike a complete polyphyly relationship between the other 31 three-motif sequences. Furthermore, only 47 of 179 residues of the representative sequence CUB50584.1 are observed to be significantly conserved. Docking analysis suggests a preferential bias of adenosine diphosphate (ADP)-ribose pyrophosphatase toward IPP, and a nearly threefold energetic difference is observed between IPP and DMAPP. The loops are hereby shown to play a regulatory role in guiding the promiscuity of NudF toward a specific ligand. Computational saturation mutagenesis of the seven hotspot residues identifies two key positions LYS78 and PHE116, orderly encoded within loop1 and loop7, majorly interacting with the ligands DMAPP and IPP, and their mutants K78I/K78L and PHE116D/PHE116E are found to stabilize the overall conformation. Molecular dynamics analysis shows that the IPP complex is significantly more stable than the DMAPP complex, and the NudF structure is very unstable. Besides showing a promiscuous binding of NudF with ligands, the analysis suggests its rate-limiting nature. The study would allow us to customize the metabolic load toward the synthesis of any of the downstream molecules. The findings would pave the way for the development of catalytically improved NudF mutants for the large-scale production of specific terpenoids with significant nutraceutical or commercial value.
Terpenoids form a large pool of highly diverse organic compounds possessing several economically important properties, including nutritional, aromatic, and pharmacological properties. The 1-deoxy-d-xylulose 5-phosphate (DXP) pathway's end enzyme, nuclear distribution protein (NudF), interacting with isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP), is critical for the synthesis of isoprenol/prenol/downstream compounds. The enzyme is yet to be thoroughly investigated to increase the overall yield of terpenoids in the Bacillus subtilis, which is widely used in industry and is generally regarded as a safe (GRAS) bacterium. The study aims to analyze the evolutionary conservation across the active site for mapping the key residues for mutagenesis studies. The 37-sequence data set, extracted from 103 Bacillus subtilis entries, shows a high phylogenetic divergence, and only six one-motif sequences ASB92783.1, ASB69297.1, ASB56714.1, AOR97677.1, AOL97023.1, and OAZ71765.1 show a monophyly relationship, unlike a complete polyphyly relationship between the other 31 three-motif sequences. Furthermore, only 47 of 179 residues of the representative sequence CUB50584.1 are observed to be significantly conserved. Docking analysis suggests a preferential bias of adenosine diphosphate (ADP)-ribose pyrophosphatase toward IPP, and a nearly threefold energetic difference is observed between IPP and DMAPP. The loops are hereby shown to play a regulatory role in guiding the promiscuity of NudF toward a specific ligand. Computational saturation mutagenesis of the seven hotspot residues identifies two key positions LYS78 and PHE116, orderly encoded within loop1 and loop7, majorly interacting with the ligands DMAPP and IPP, and their mutants K78I/K78L and PHE116D/PHE116E are found to stabilize the overall conformation. Molecular dynamics analysis shows that the IPP complex is significantly more stable than the DMAPP complex, and the NudF structure is very unstable. Besides showing a promiscuous binding of NudF with ligands, the analysis suggests its rate-limiting nature. The study would allow us to customize the metabolic load toward the synthesis of any of the downstream molecules. The findings would pave the way for the development of catalytically improved NudF mutants for the large-scale production of specific terpenoids with significant nutraceutical or commercial value.
Isoprenoids are the most functionally and structurally varied class
of secondary metabolites, with over 55,000 identified molecules,[1,2] and have been used to make aromatic, flavoring, and medicinal molecules.[3−6] As the natural extraction of isoprenoid-based bioactives has led
to an overexploitation of plants, the global research interest has
now shifted to utilize the generally regarded as safe (GRAS) status
microbes like Bacillus subtilis as
the biocatalytic machinery.[7] A key protein
controlling the yield of an end-product molecule is adenosine diphosphate
(ADP)-ribose pyrophosphatase or NudF (EC 3.6.1.13) that orderly hydrolyzes
dimethylallyl pyrophosphate (DMAPP) and isopentenyl pyrophosphate
(IPP) to prenol and isoprenol (Figure ), collectively termed as isopentenol (C5 alcohol), a building block of all the higher-order downstream terpenoid
molecules.[8] It belongs to the Nudix superfamily
(Pfam PF00293; InterPro IPR000086)[9] and
is involved in the production of adenosine monophosphate (AMP) and
ribose-5-phosphate from ADP-ribose by actively dephosphorylating the
phosphate moieties of a variety of substrates. Its close homolog is
the NudB protein (EC: 3.6.1.67) from E. coli (EcNudB), having a catalytic activity toward geranyl pyrophosphate
(GPP) and farnesyl pyrophosphate (FPP).[10] It has been suggested that an additional phosphatase activity (AphA)
is needed to hydrolyze IPP to isoprenol because NudB can only catalyze
the hydrolysis of IPP to isopentenol.[11] While EcNudB exhibits a strong affinity for DMAPP, the NudF protein
of Bacillus subtilis (BsNudF) shows
no such preference and equally interacts with both DMAPP and IPP to
orderly produce an equivalent amount of prenol and isoprenol.[12] A varying affinity for various substrates may
substantially increase pressure on the stoichiometric flux, and until
the connected regulatory network of proteins does not drain out the
added molecules at an equivalent rate, it becomes toxic to the cell
and inhibits cell growth. Hence, to escape this major bottleneck,
a fusion protein of isopentenyl-diphosphate delta isomerase (IDI),
also known as isopentenyl pyrophosphate isomerase (IPP isomerase),
and EcNudB has been utilized and with an increased expression, the
metabolic flux increases the bioproduction rate of only prenol and
reduces the isoprenol production. Endogenous overexpression of IspG
and 1-deoxy-d-xylulose-5-phosphate synthase (DXP) as well
as exogenous expression of YhfR and BsNudF have been shown to increase
the production rate of downstream molecules in E. coli,[13] and the overexpression of NudF in B. subtilis has been shown to improve the isopentenol
yield.[12]
Figure 1
MEP pathway, representing the biocatalysts
at all intermediary
stages. From the glyceradehyde-3-phosphate (G3P) and pyruvate, it
synthesizes the 5-carbon building blocks DMAPP and IPP for producing
terpenoids. It includes an ordered set of seven enzymes viz. DXP,
DXP reductoisomerase (DXR), 2-C-methyl-d-erythritol 4-phosphate cytidylyltransferase (IspD), 4-diphosphocytidyl-2-C-methyl-d-erythritol kinase (IspE), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase
(IspF), HMB-PP synthase (IspG), and HMB-PP reductase (IspH), followed
by the interplay of IDI and NudF.
MEP pathway, representing the biocatalysts
at all intermediary
stages. From the glyceradehyde-3-phosphate (G3P) and pyruvate, it
synthesizes the 5-carbon building blocks DMAPP and IPP for producing
terpenoids. It includes an ordered set of seven enzymes viz. DXP,
DXP reductoisomerase (DXR), 2-C-methyl-d-erythritol 4-phosphate cytidylyltransferase (IspD), 4-diphosphocytidyl-2-C-methyl-d-erythritol kinase (IspE), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase
(IspF), HMB-PP synthase (IspG), and HMB-PP reductase (IspH), followed
by the interplay of IDI and NudF.Although sufficient IPP and DMAPP concentrations are required for
terpenoid synthesis,[8] an excess of these
molecules can inhibit cell growth,[12] and
thus reduce terpenoid production,[14] and
therefore, the NudF enzyme continuously uses these potentially harmful
debris molecules. To increase the yield of the DXP pathway, researchers
have specifically targeted the first rate-limiting enzyme, DXP[15,16] because its Kcat/Km score is much smaller than that of all the other enzymes.[17] Improving the DXP activity has been considered
as the most effective measure to improve the overall yield for several
species, including Streptomyces,[18]Lycopersicon esculentum,[19] and Synechococcus leopoliensis.[20,21] Heterologous expression of Bacillus subtilis DXP in E. coli has also been shown to increase the overall yield.[22] Yet another strategy, using the functionally mutated recombinant
poplar DXP,[23] has shown a negligible feedback
inhibition for IPP/DMAPP and has been fruitful. To mitigate the toxicity
of IPP, the farnesyl diphosphate synthase and IDI synthase have been
overexpressed in E. coli, and it has
led to an 800-fold production of sesquiterpene β-farnesene.[24] Likewise, the balanced metabolic concentration
of IPP and DMAPP,[25,26] heterologous expression of Haematococcus pluvialis IDI,[27] or overexpression of Lycium chinense IDI in E. coli(28) have been shown to increase the overall yield. Recently,
the modulation of the culture medium has also been demonstrated to
increase the isopentenol production by 2.5-folds in B. subtilis.[29] Although
the majority of these methods have overexpressed the rate-limiting
enzymes or have changed the culture conditions, it may lead to a complete
metabolic imbalance and significantly minimize the yield of downstream
terpenoids. In this regard, directed co-evolution of DXP, DXR, and
IDI has been shown to increase isoprene production.[28] However, these strategies have also not efficiently focused
the major enzyme BsNudF for increasing the overall yield, and directed
evolution of NudF’s promiscuous active site would therefore
be significantly useful to increase the biosynthetic rate of a downstream
terpenoid. For understanding the preferential binding of IPP/DMAPP,
and catalytic and behavioral switching of BsNudF, this article deciphers
the key functional details across the active site for decoding the
mutations that could improve its activity. As it comprehensively maps
the crucial residues at the functionally important positions, the
study will be fruitful in designing a custom set of key interacting
residues against a required ligand to attain a theoretically impossible
overall yield. Although DXP has been shown to be the rate-limiting
enzyme of the nonmevalonate pathway,[17] NudF,
as the last biocatalyst to channelize the metabolic flux, should also
play a significant role in regulating the overall yield of terpenoids,
and its limited molecular engineering study should thus be of prime
interest for biological and industrial research to optimally increase
the productivity.
Theoretical Calculations
Construction of the Sequence Data set and
Functional Affirmation
Screening the NudF protein sequences
in the NCBI protein database, a set of 105,910 protein sequences is
identified, among which only 103 Bacillus subtilis entries are found and are used to build the primary data set. Purging
the completely redundant entries with at least 80% alignment coverage
through MMSeqs2,[30] a reduced data set of
40 entries is derived, and their length pattern is noted. Three entries
WP_139026580.1, WP_139026569.1, and CUB36584.1 orderly encoding only
118, 86, and 55 residues, are found to be the partial sequences and
are purged. A final data set of 37 entries, including 1, 6, and 30
sequences orderly encoding 179, 205, and 185 residues, is thus created.
As it is necessary to functionally confirm the defined data set for
further analysis, the sequences are fed to MEME[31] for screening the conservation of the signature motif and
affirming the functions of the sequences to use the functionally correct
entries. In the absence of any functionally similar experimentally
resolved B. subtilis protein, the homologous Escherichia coli structure (PDB ID: 5U7E) is used
as the representative entry to reliably localize the conserved motifs.
Furthermore, to deploy the Bacillus subtilis representative sequence for the downstream analysis, the smallest
sequence (CUB50584.1) is used from the constructed data set. It is
because evolution tends to decrease a protein sequence length to pack
the function more optimally.[32]
Modeling the Representative Sequence
As the B. subtilis representative
sequence CUB50584.1 is still not experimentally determined, its tertiary
structure is modeled through template-based modeling methodology using
MODELER, as per the recently published strategy.[33,34] For reliably predicting the model, the wild-type nucleoside diphosphate
sugar hydrolase from Bdellovibrio bacteriovorus (PDB ID: 5C7Q) is used as the template, sharing a sequence identity
of 36%. As the first predicted protein model usually has several nonphysical
local clashes, the constructed model is energetically relaxed through
the conservative refinement strategy of Galaxyrefine2[35] to maximally retain the topology extracted from the template.
The refined model is subsequently evaluated through Swissmodel scoring
measures[36] and is also topologically assessed
against the one, predicted using the Alphafold algorithm,[37] to assess its credibility.
Phylogenetic and Conservation Analysis
Statistically
significant evolutionary relationships are typically
considered for drawing the meaningful phylogenetic connections within
the selected set of sequences/species. As a correct sequence alignment
is certainly required for extracting the accurate evolutionary relationship,
the 37 sequence B. subtilis data set
is aligned through the hidden Markov model based clustal-omega module
of HHPred.[38] The constructed profile is
fed to the IQTree server[39] for deriving
the evolutionary tree on the basis of the default parameters, using
the default ultrafast methodology over 10,000 bootstrap alignments
at the minimum correlation coefficient or the convergence threshold
of 0.99. The resultant consensus tree is visualized and analyzed through
ITOL.[40] Furthermore, the sequence alignment
is fed to Consurf[41] for plotting the average
sequence conservation scores over the predicted CUB50584.1 model,
using the default conditions, and analyzing the sequence variations
across the functionally crucial sites. Lastly, on basis of the functionally
well-annotated B. subtilis (strain
168) homolog P54570, the constructed HMM profile is visualized using
Espript[42] to map the encoded domains.
Active Site Prediction and Docking Analysis
with IPP and DMAPP
To reliably channelize the interaction
of substrates only at the biologically credible site(s) and to exclude
any fake docking solution, the CastP version3 server is used to predict
the active site(s).[43] For screening the
most promising energetically feasible interaction site(s) of NudF
against the substrates IPP and DMAPP, the Dockthor server is used,[44] using the default parameters. Through a set
of 24 docking runs, it efficiently docks a ligand using the random
seeds for all the rotational, translational, and conformational degrees
of freedom of the ligand. Using a multiple solution genetic algorithm
and the MMFF94S molecular force field scoring function, it constructs
the docking solutions and assesses them on the basis of the docking
energy, ligand entropy, and desolvation score to select their top-ranked
cluster representatives. It is shown to be more accurate than seven
state-of-the-art docking algorithms viz. AutoDock Vina, AutoDock,
GOLD, Surflex-Dock, rDock, HPepDock, and Glide.[44]In the absence of any experimentally resolved IPP-/DMAPP-NudF
complex structure, the docking result constructed through this highly
accurate Dockthor algorithm would certainly be the reliable data to
excavate the functional details. Hence, to attain biologically correct
predictions, a complete degree of rotational freedom is orderly allowed
for the six and five rotatable bonds of the selected ligands. The
resultant solutions are subsequently analyzed using UCSF Chimera to
extract the key interacting residues.
Crucial
Residues for Functional Mutagenesis
The accuracy of a rationally
evolved enzyme molecule is dependent
on the identification of the hotspot residues, whose mutations could
enhance its catalytic activity.[45] To analyze
the hotspot regions, proximal to the active site and tunnel, and appropriately
map the mutational landscape of the predicted protein molecule, the
hotspot server 3.1 is used.[46] Although
this server is capable of modeling an input protein sequence, the
constructed CUB50584.1 model is considered to drive the analysis for
the topologically reasonable model. As this server robustly estimates
the thermodynamic stability of a mutation through FoldX and Rosetta,
it has been demonstrated to successfully exclude disruptive mutations.[47,48] To extract its fullest potential and reliably select the best mutations
from the preselected resultant ones, the resulting data are analyzed
along with the other residues marked in the preceding steps.To understand the most promising mutations across the active site,
the correlated mutations are mapped for the 179 residue sequence CUB50584.1
sequence through the GREMLIN server.[49] As
these coevolution-based contacts are crucial to reliably discriminate
the true hotspot residues from the spurious ones, the contact map
network is analyzed to mark the true hotspots. Purging the unreliable
positions, the true hotspots are analyzed through the Dynamut2 server,[50] trained, and tested on the Protherm database.[51] Its high accuracy is evident from the fact that
it outperforms the other measures with a Pearson’s correlation
of up to 0.72 and 0.64 for the single and multiple point mutations
with a root-mean-square error (RMSE) (kcal/mol) of 1.02 and 1.8, respectively,
making it a trustworthy strategy for prioritizing the stabilizing/destabilizing
mutations.[50]
Computational
Mutagenesis and Flexibility
Analysis
On the basis of docking and mutagenesis results,
the most favorable binding and substrate preferences are excavated.
To robustly quantify the strength of interaction for each of the ligands,
the Dynamut2 results are analyzed for the five prioritized residues
(ARG18, ALA117, ASP139, GLU140, and ASP141), and their five top-ranking
ΔΔG scores are analyzed. To examine these
data, NetsurfP2.0,[52] with an 80% correlation
with experimentally confirmed data, is preferred to estimate the relative
solvent accessible surface area for the five key positions for the
native protein. To further examine the topological features of the
NudF structure, the flexibility of its Cα-backbone is predicted
using CABS-flex 2.0, which computes an average topological fluctuation
diversity of 10 medoids of the 10 cluster sets, derived from a 10
ns simulated set of 1000 decoys.[53] To further
obtain more insights of the binding affinity of the two ligands, a
small 100 ns molecular dynamics (MD) simulation is finally done to
analyze the physical movements of ligands and protein atoms. The IPP
and DMAPP topologies are constructed through PRODRG2,[54] and the CUB50584.1 model and its two complexes are simulated
through the WebGro server.[55]To perform
the simulation, the native protein and its complexes are prepared
using the GROMOS96 43a1 forcefield.[56] Deploying
the SPC water model as a solvent over the triclinic simulation box,
the structures are neutralized by adding 0.15 M sodium chloride. To
energetically relax the system before MD, the steepest descent algorithm
(5000 steps) is applied. Considering 1000 frames per simulation, the
structures are simulated for 100 ns using the constant temperature
of 300 K and a pressure of 1.0 bar. The trajectory is lastly integrated
through the leap-frog algorithm for assessing the resultant trajectory
through all of its encoded parameters, viz. time-dependent root-mean-square
deviation (RMSD) for the overall structure and RMS fluctuations (RMSF)
across its residues, radius of gyration (Rg), number of hydrogen bonds,
overall and per-residue solvent accessible surface area (SASA), and
ligand RMSD.
Results and Discussions
Construction of the Sequence Data set and
Its Functional Affirmation
To affirm the functional attribute
of all the derived 37 sequences, the relative location of their three
top-ranked conserved motifs is screened against the functionally deciphered
structure (5U7E) of E. coli through
MEME (Figure A). The
members of the Nudix superfamily encode the signature sequence GX5EX7REUXEEXG/TU, where U is either L/V or I,[57] as represented in green in this figure. This
conserved sequence is responsible for metal-binding and forms the
catalytic site in more than 4000 enzymes in numerous species including
eukaryotes, prokaryotes, and viruses.[58] The NudF length range is shown to be 179–205, with 182-residue
sequence being the most common. Here, the six 205-residue sequence
subset shows a stark feature, that is, ASB92783.1, ASB69297.1, ASB56714.1,
AOR97677.1, AOL97023.1, and OAZ71765.1encode only one characteristic
motif, as observed for 5U7E, unlike the other B.subtilis sequences having all the three conserved motifs.
Figure 2
(A) Top-3 motifs, screened
against the functionally deciphered
structure 5U7E of E. coli, (B) structural
overlap of the constructed model over the 5U7E, (C) Ramachandran map
of the predicted model, and (D) topological superimposition of this
model (red) over the Alphafold model (cyan). A complete overlap implies
the topological accuracy of the predicted NudF structure.
(A) Top-3 motifs, screened
against the functionally deciphered
structure 5U7E of E. coli, (B) structural
overlap of the constructed model over the 5U7E, (C) Ramachandran map
of the predicted model, and (D) topological superimposition of this
model (red) over the Alphafold model (cyan). A complete overlap implies
the topological accuracy of the predicted NudF structure.The representative B. subtilis sequence
CUB50584.1 is modeled using 5C7Q through the template-based modeling
methodology using MODELLER9.25. The predicted structure is refined
using GalaxyRefine2[35] and is subsequently
evaluated through Swissmodel.[36] The model
shows a Molprobity score of 1.28, indicating a topological accuracy
of a reasonably high accuracy X-ray structure, and its RMSD score
is found to be 0.887 against the deployed template (Figure B). Furthermore, 98.87% model
residues are found localized within the Ramachandran favored regions
(Figure C). The model
also shows a low RMSD of 1.116 against the structure, predicted using
Alphafold algorithm,[37] and these two decoys
are found to be structurally superimposed, with a marginal deviation
of a few loop and terminal residues (Figure D). It is thus confirmed that Alphafold specifically
works well especially for proteins, sharing a sequence identity lesser
than 30% against the known templates, as reported recently.[37] The predicted structure is then reliably deployed
for the subsequent analysis.
Phylogenetic and Conservation
Analysis
Feeding the sequence alignment of HHPred-clustal
omega to IQTree
and building the evolutionary tree using 10,000 bootstrap alignments
at the convergence threshold of 0.99, the resultant consensus solution
is visualized and analyzed through ITOL (Figure ). Although the single motif-six sequence
subset (AOR97677.1, OAZ71765.1, ASB69297.1, ASB92783.1, ASB56714.1,
and AOL97023.1) appears to be clustered with 5U7EA and the representative
sequence CUB50584.1, encoding all three motifs, their average sequence
identity is 30.3640.339, compared to 14.73 and 21.3630.3304 against
5U7EA and CUB50584.1, respectively. Moreover, all the other sequences
are not found to share substantial sequence similarity, as also observed
with their scrutiny through HMM-based Clustal Omega alignment.[38] All the other sequences are found uncluttered
with any other entry, and as recently shown in the phylogenetic study
of 80,000 Nudix homologs, a general monophyly is prominent, besides
a few occasional incidences of homoplasy.[59] The constructed profile is further fed to Consurf for mapping the
average sequence conservation scores over the predicted structure,
using the default conditions,[41] and the
sequence variation across the functionally crucial sites is analyzed
(Figure ). The HMM
alignment of the 37 sequence data set and the functionally decoded
protein P54570 is mapped using Espript[39] to designate the essential sequence features of the CUB50584.1 sequence.
It is observed that the Nudix domains of CUB50584.1 and P54570 fully
overlap (Figure C).
Figure 3
Unrooted
circular tree of the 37 sequence NudF data set. The scale
bar represents the average number of substitutions per site.
Figure 4
(A) Consurf-derived sequence conservation for the 37 sequence
NudF
data set, projected onto the representative sequence model CUB50584.1.
(B) Color range varying between 1 and 9, maroon referring to the completely
conserved positions. (C) Complete overlap of the IPP- and DMAPP-interacting
positions and Nudix domains of P54570 (residues 40–171) and
CUB50584.1 (residues 42–177).
Unrooted
circular tree of the 37 sequence NudF data set. The scale
bar represents the average number of substitutions per site.(A) Consurf-derived sequence conservation for the 37 sequence
NudF
data set, projected onto the representative sequence model CUB50584.1.
(B) Color range varying between 1 and 9, maroon referring to the completely
conserved positions. (C) Complete overlap of the IPP- and DMAPP-interacting
positions and Nudix domains of P54570 (residues 40–171) and
CUB50584.1 (residues 42–177).A set of 14 residues, viz. ILE20, ALA45, GLN62, GLY77, GLU80, GLY82,
ALA89, GLU92, GLU95, GLU96, THR112, ALA126, THR130, and GLU148 is
found to be completely conserved, with a conservation score of 9.
Moreover, 33 residues LEU4, THR8, PHE15, PRO30, ASN31, LYS36, ILE39,
HIS42, PRO43, ALA50, LYS56, VAL60, LYS65, ILE71, PRO85, THR88, ARG91,
LEU93, GLU94, THR97, LEU106, ILE107, GLU119, TYR124, ASP141, VAL144,
ALA154, LEU157, ASP166, LYS167, THR168, PHE170, and GLN173 show a
conservation score of 7 and 8, indicating significantly higher conservation.
Furthermore, 11 residues, viz. PRO13, ARG23, VAL24, ALA33, MET34,
LYS69, LYS131, SER150, GLU153, ASP160, and HIS164 are found to be
completely variable across the defined data set. It is astonishing
that the representative B. subtilis sequence encodes mere 11 (6.145%) variant residue loci in contrast
to the statistically higher sequence conservation at 47 (26.259%)
positions (Figure ), and still, the enzyme is able to show a highly promiscuous nature,
and it indicates that a few key residues should play a major role
within the active site of this enzyme.
Active
Site Prediction and Docking Analysis
with IPP and DMAPP
Excluding the shallow openings and using
a probe radius of 1.4 Å, CastP[43] delineates
the empty concavities on a protein surface to map the volume spectrum
of cavities and pockets. It robustly deciphers the surface properties
and localizes the functionally important zone and shows only two biologically
meaningful active pockets in the predicted NudF model, with a molecular
surface area (Å2) and molecular volume (Å3) of 501 and 946.6 (Pocket1) and 208.9 and 661.3 (Pocket2),
respectively, and this is in accordance with the known structural
details of the NudF’s E. coli homolog, wherein two active sites have been observed.[60] Furthermore, these pockets are orderly found
to span a set 12 (MET1, LEU4-A-GLU6, ARG37, ILE39, TYR111, PRO114,
GLY115, ALA117, ASP118, and ILE120) and 30 (ARG18-VI-LYS21, VAL40-NH-PRO43,
ALA45, VAL60, GLN62-YRK-ALA66, GLU73-IPAG-LYS78, GLU96, THR112, GLU119,
LEU121, ASP141, GLU142, ASP165, ALA166, and LYS167) residues. Here,
it is worth noting that Dockthor uses only one most-voluminous docking
site for both ligands (Pocket1, Figure A), and it indicates a preferential binding of ligands
over the second superficial site (Pocket2, Figure B).
Figure 5
Dockthor results of both IPP and DMAPP. (A)
Pocket1 (B) Pocket2
represented in correlation with the two ligands IPP (yellow) and DMAPP
(blue), along with the close analytical view of Pocket1, in terms
of (C) molecular surface. (D) Active site topology at the vent of
the tunnel. (E) Two-dimensional and (F) three-dimensional interaction
maps of protein with DMAPP. (G) Two-dimensional and (H) three-dimensional
interaction maps of protein with IPP show the key residues interacting
within the active site.
Dockthor results of both IPP and DMAPP. (A)
Pocket1 (B) Pocket2
represented in correlation with the two ligands IPP (yellow) and DMAPP
(blue), along with the close analytical view of Pocket1, in terms
of (C) molecular surface. (D) Active site topology at the vent of
the tunnel. (E) Two-dimensional and (F) three-dimensional interaction
maps of protein with DMAPP. (G) Two-dimensional and (H) three-dimensional
interaction maps of protein with IPP show the key residues interacting
within the active site.As a modest level of
NudF is enough to overcome the IPP/DMAPP toxicity,[12] this enzyme must interact with both of these
substrates, and this responsible molecular NudF surface (Figure C) closely interacts
with the two ligands (Figure D). The ligands IPP and DMAPP orderly show an interaction
energy (kcal/mol) and affinity score (kcal/mol) of −115.388
and −6.431, and −41.402 and −5.271. Through random
forest, Dockthor has tested the predicted binding affinity over the
PDBbind v2013 data set, and it has shown the RMS error of 2.256 kcal/mol
at a correlation coefficient of 0.705, indicating a reasonable credibility
of the predicted docking solution.[61] Examining
the docked complexes using the discovery studio, it is observed that
DMAPP has two interactions with NudF. While the DMAPP-O atom forms
a hydrogen bond with the Ser113-HG atom of NudF with a bond length
of 1.355 Å, the DMAPP-C atom shows hydrophobic π-interactions
with a DMAPP-C atom with a bond length of 5.085 Å. IPP, on the
other hand, it has favorable interactions with NudF, and its oxygen
atom forms conventional hydrogen bonds with NudF-ARG64-HH21 and -HH22
atoms, with bond lengths of 2.912 and 2.592, respectively, as well
as two salt bridges with ARG64-HH12 and LYS167-HZ2 atoms, with bond
lengths of 2.561 and 2.753, respectively. Moreover, the IPP-O atom
shows three electrostatic interactions with ARG64-NH1, −NH2,
and LYS167-NZ atoms with bond lengths of 5.255, 3.963, and 3.212 Å
respectively.Screening the active site residues within 5 Å
of these two
ligands, it is observed that 14 residues viz. VAL19, ILE20, HIS42,
ALA45, GLN62, ARG64, GLU73, ALA76, LYS78, THR112, GLU119, LEU121,
ASP165, and LYS167, and 8 residues viz. VAL22, GLU38, VAL40, HIS42,
SER113, PHE116, ALA117, and GLU119 orderly interact with the two ligands
IPP (yellow) and DMAPP (blue), as shown in Figure C. To analyze it further, the two-dimensional
and three-dimensional interaction maps are drawn for DMAPP (Figure E,F) and IPP (Figure G,H). It indicates
that two key hotspot residues LYS78 and PHE116, orderly responsible
for interacting with these ligands through one and two residues in
the active site, could be the key to specifically alter the active
site to stabilize its affinity for the conditionally required ligand.
Crucial Residues for Functional Mutagenesis
Through GREMLIN,[49] the contact map network
of the modeled protein is constructed. For all the predicted contacts,
it yields the two-dimensional distance matrix along with their probability
scores and is shown to be accurately predicting both direct and indirect
residue couplings. The coevolution-based network for CUB50584.1 is
analyzed (Figure A),
and the statistically top-ranked contacts are extracted. A set of
22 functional hotspot residues, viz. ARG18, ILE20, LYS21, ASN41, PRO43,
LEU67, ILE71, LYS78, LEU79, PRO81, GLY82, PHE110, TYR111, THR112,
PHE116, ALA117, LEU121, ASP139, GLU140, ASP141, ALA166, and PHE170,
with a theoretically credible probability score higher than 0.5, are
found in NudF. However, LEU79 is found to be completely buried in
the structural core of NudF. Moreover, ILE20 and THR112 are found
to be completely conserved and are thus excluded. The 19 key hotspot
residues are found to define a structural network of 0–4 contacts.
As previously stated,[62,63] the loop proline residues usually
stabilize a structural bend through internal hydrogen bonding, and
PRO43 and PRO81 are also not considered for subsequent computational
mutagenesis study. Plotting the number of contacts for the hotspot
residues, seven positions, viz. ARG18, PRO43, LYS81, ALA117, ASP139,
GLU140, and ASP141 are not found to have any contacts (Figure B), and as the delicate balance
between the flexibility and stiffness is sustained by the key contacts,
majorly regulating the functional role(s) of a protein, these positions
(Figure ) are selected
for further analysis.
Figure 6
GREMLIN resultant (A) contact map network and (B) number
of contacts
for the 19 key hotspot residues, placed proximal to the active site.
Figure 7
Seven key hotspot residues excavated for designing the
functionally
improved mutants.
GREMLIN resultant (A) contact map network and (B) number
of contacts
for the 19 key hotspot residues, placed proximal to the active site.Seven key hotspot residues excavated for designing the
functionally
improved mutants.Excavating the key residues, the Dynamut2 server
is used to estimate the stability changes upon a point mutation on
the CUB50584.1 model. The method derives the scores through the topological
environment property and dynamic behavior of a residue and is recently
shown to outperform the prediction measures including FoldX and MAESTRO.[50] Restricting the search to the prioritized residues,
the most stabilizing top five mutations are selected. The ΔΔG scores (KJ/mol) of these mutations ranges from −1.67
to 1.06 (Figure , Supplementary Table 1). To correctly drive the
analysis, the relative surface accessibility (RSA) for the selected
residues is computed using NetSurfP2.0.[52] It reveals how easily a residue can be accessed from the protein
surface, and as the mutations at this position are less likely to
disrupt the overall fold of a protein, the chance of developing a
functionally fruitful mutation should be substantially higher. The
prioritized residues orderly show an RSA score of 0.393, 0.232, 0.536,
0.696, 0.622, 0.449, and 0.397 (Figure A), categorized as exposed or buried at the threshold
of 0.25.
Figure 8
Structural scrutiny. (A) DDG–RSA plot for the five top-ranked
mutations of the seven prioritized residues, (B) predicted NudF flexibility
showing a high score for the eight loop regions, and (C) structural
position of the eight flexible loops. A lower DDG score implies a
more energetically stable conformation.
Structural scrutiny. (A) DDG–RSA plot for the five top-ranked
mutations of the seven prioritized residues, (B) predicted NudF flexibility
showing a high score for the eight loop regions, and (C) structural
position of the eight flexible loops. A lower DDG score implies a
more energetically stable conformation.To excavate the active site and its key catalytic residues, the
average structural fluctuations across NudF are predicted using CABS-flex
2,[53] and a set of 8 loops: loop1–loop8
(Figure B), spanning
from ILE14-ARG23, ASP26-ARG37, ILE39-VAL46, ALA50-VAL58, GL62-ILE72,
GLY77-PRO85, TYR111-LEU121, and THR130-VAL144, marked in red in Figure C, are found to be
flexible. These loops, encompassing a 48.6% structure, majorly define
the activity site topology and are envisaged to substantially regulate
the functional nature of NudF. It is interesting to observe that the
selected five residues ARG18, ALA117, ASP139, GLU140, and ASP141 are
encoded by the loop1, loop7, and loop8, and it could be there that
the loop1 acts as a capping loop and loop7 and loop8 make the active
site sufficiently voluminous to interact with IPP and DMAPP. It is
similar to E. coli NudF loop9, which
is stabilized by its closed conformation.[60]To further analyze the dynamic atomic interactions of IPP/DMAPP
within the active site, 100 ns NudF and its complexes are simulated
using WebGro, as shown recently.[64,65] MD simulation
is a robustly accurate strategy to analyze the conformational changes
that occur when a ligand is induced to fit.[66] Using the GROMACS-based Webgro[55] server,
the protein/complex system is computationally evolved using the classical
mechanics algorithm for a short 100 ns timespan, and the conformational
stability or binding affinity of a ligand is assessed across the simulation
trajectory. To analyze the simulation results, Rg or the average distance
between the center of mass and the rotational axis is usually used
to estimate the conformational stability of a system against any physicochemical
strain,[67] and its lower score implies a
higher stability. Here, the apoprotein shows the Rg scoring variations
between ∼1.5 and 2.75 nm (Figure ), and its complex with DMAPP and IPP orderly
shows the respective scores of ∼1.8–2.05 and ∼1.875–2.05
nm, indicating that the apoprotein is relatively more unstable than
its complex structure, exactly like E. coli NudF.[60] Moreover, the IPP complex is
more stable than the DMAPP complex because its trajectory mostly crawls
at substantially lower scores across the simulation. The RMSD is another
helpful measure for estimating the structural stability and the overall
deviation from the backbone topology during the complex formation
at a specified temperature, as used earlier by GROMACS.[68,69] The backbone-based RMSD trajectory score should be the lowest because
it directly gives the deviation of mean atomic coordinates and reflects
the macromolecule’s conformational stability. The scrutiny
of the RMSD across the trajectory shows that the RMSD variations of
the apoprotein range between the acceptable range of ∼0.5–2.5
nm in contrast to the respective ranges of 0.3–0.9 and 0.3–0.75
for its DMAPP and IPP complexes (Figure ). Unlike the DMAPP complex trajectory, showing
substantially higher undulations, the IPP complex gets stabilized
after ∼60 ns.
Figure 9
MD analysis for the apoprotein NudF and its DMAPP and
IPP complexes
for the RMSD, RMSF, radius of gyration, solvent accessible surface
area, area/residues, hydrogen bonds, and ligand RMSD scores. It clearly
indicates a prioritized binding for IPP within the promiscuous active
site of NudF.
MD analysis for the apoprotein NudF and its DMAPP and
IPP complexes
for the RMSD, RMSF, radius of gyration, solvent accessible surface
area, area/residues, hydrogen bonds, and ligand RMSD scores. It clearly
indicates a prioritized binding for IPP within the promiscuous active
site of NudF.Furthermore, the RMSF, or residue
fluctuations across the simulation,
reveals a macromolecular system’s conformational stability,
and its scoring undulations describe structural complexity, with a
lower value signifying higher overall stability.[56,63] For NudF, RMSF is found to be within ∼0.5–2 nm (Figure ), although its three
C-term loop residues and two preceding terminal 11-residue α-helices,
connected through a five-residue loop, shows a stark increment crossing
∼3 nm. In contrast, the DMAPP and IPP complexes orderly show
the RMSF divergence between ∼0.1–0.8 and ∼0.06–0.525
nm, and here, the terminal double-helix does not show any significant
displacement. It is interesting to note that the RMSF score across
the chain for NudF-IPP is lesser than the NudF-DMAPP scores, indicating
that the former is a more stable complex structure.As the ligand
binding causes conformational changes in the receptor,
its respective substructural alterations are likely to affect SASA,
and thus, it has been utilized multiple times to assess the level
of receptor exposure to the surrounding solvent molecules.[67,68] Although a significant deviation is not observed in the per-residue
SASA score, usually showing a comparatively large surface area or
a lower compactness of this substructure, the SASA scores are found
to be highly variant across the simulation trajectory. The SASA for
the NudF structure is observed to consistently drop along the trajectory
from 112.5 to 97.5 nm2 (Figure ), although the respective score of the DMAPP
complex declines from 117.5 to 97.5 nm2, with a sharp drop
at 20 ns. In contrast, the corresponding score for the IPP complex
declines from 112.5 to 100 nm2, and except for a few time
intervals, minute variations were observed throughout the simulation
period. SASA for the IPP complex drops dramatically from 110 to 103
nm2 after 5 ns. The equivalent declination for the DMAPP
complex, on the other hand, is noticed after 17.5 ns, and its trajectory
has significantly more unequal undulations. NudF also exhibits this
distinctive declination after 10 ns, and here, it could indicate a
significant functional transition, leading to the substantial increase
in structural compactness. Moreover, after the initial decline, the
SASA graph moves likewise for both DMAPP and IPP complexes.For estimating the binding affinity of IPP/DMAPP with NudF (Figure ), the MD trajectories
are further investigated to analyze the extent of hydrogen bonds made
along the simulation, as usually done.[67,68] Along the
trajectory, NudF shows a nearly 100–120 hydrogen bonds. However,
the DMAPP and IPP complexes orderly show 100–130 and 110–130
hydrogen bonds, and an almost similar variation throughout the trajectory.
It indicates their nearly similar scale of atomic interaction within
the active site, as also shown by their substantially similar SASA
undulations. Although, observing the earlier results, the affinity
of DMAPP should not be comparable to that of IPP, consistency of hydrogen
bonds is maintained for both the ligand complexes throughout the trajectory,
and it indicates the comparable stability of these complexes. To excavate
it further, the protein–ligand hydrogen bonding variations
are analyzed through the trajectory, and for the DMAPP complex, the
number of hydrogen bonds is found to slowly increase to two with significantly
variant undulations. However, for the IPP complex, the number of bonds
consistently increases, and after ∼60 ns, nearly one bond is
maintained throughout the simulation, showing its more stable interaction.
As hydrogen bonds are the major interactions to drive the proper anchoring
of ligands within the active site, the higher number of such bonds
should be responsible for a stronger interaction.[67,68]Figure deciphers
three key features. First, the ddG score is found to be the highest
for PHE116 and LYS78 residues, and it confirms that the two residues
certainly hold the key to actively evolve the enzyme against the substrates
by increasing the structural stability. Second, ASP139, GLU140, and
ASP141 show a remarkably insignificant ddG score, and it implies that
these positions are highly crucial for the NudF function and their
top-ranked mutations also failed to stabilize the protein, as earlier
discussed by Nobel Laureate Frances Arnold.[70] However, these three residues, along with the substructure T130-L138,
show a high RSA score, and it indicates that the flexibility of this
superficial loop region could impart a significant functional attribute
to NudF. Restricting the search to LYS78 and PHE116 indicates that
these positions should be significantly crucial for the stability
and interaction affinity of the active site. Third, as LYS78 is superficially
more exposed in comparison to ARG116 and is only in contact with one
other residue, it should be first mutated to study its effect on the
overall product yield. It opens venues for their experimental verifications
as the top-ranked mutants K78I/K78L and PHE116D/PHE116E could selectively
stabilize the conformation and could be responsible for the ligand
specificity, urgently needed to design the novel industrially useful
NudF enzyme.The study extends our understanding about ADP-ribose
pyrophosphatase
and shows that it has a preferential bias for IPP over DMAPP, with
−115.388 (kcal/mol) versus −41.402 (kcal/mol), respectively,
although if the former is missing, the protein interacts with DMAPP
at a much slower rate, and probably, this could be a key signal to
IDI to initiate the conversion of DMAPP to IPP. Recently, the promiscuous
activity of EcNudB toward geranyl diphosphate and farnesyl diphosphate
has been demonstrated to generate several isoprenoid alcohols, including
isopentenol, geraniol, and farnesol, as well as their derivatives.[71] Thus, the promiscuous dephosphorylation of NudF
should be studied further through various other substrates, and with
that, it would open venues to industrially engineer the cells, wherein
the downstream reactions could be channeled more actively with minimal
cellular regulation.
Conclusions
The
research thoroughly examines the Bacillus subtilis ADP-ribose pyrophosphatase and its 37 functionally confirmed homologs
with the projected tertiary structure of the 179-residue representative
sequence CUB50584.1. Besides analyzing the phylogenetic relationship,
it maps the highly conserved and variant loci to build the knowledge
base for its directed evolution experiments. Although the sequence
data set shows a significantly high phylogenetic divergence, 26.259%
residues are found to have a statistically higher evolutionary conservation.
ADP-ribose pyrophosphatase shows a prioritized interaction with IPP
than DMAPP, according to the docking energy data, with −115.388
(kcal/mol) versus −41.402 (kcal/mol). The topological variations
are restricted to eight loop regions, maximally encompassing the active
site, demonstrating their importance for the ligand binding. Seven
residues (ARG18, PRO43, PRO81, ALA117, ASP139, GLU140, and ASP141)
are not found to have even one contact in the contact map network
of 22 hotspot sites, and mutational analysis for these seven positions
shows the highest ΔΔG scores for LYS78
and PHE116, orderly encoded within loop1 and loop7, encapsulating
the active site. Quite similar to NudF, the four top-ranked mutants
F116E, F116D, K78I, and K78L show the highest ΔΔG scores of 1.06, 0.97, 0.88, and 0.84, and it indicates
that these positions should be the key to maximally direct the synthesis
of required terpenoids in Bacillus subtilis. MD analysis reveals that the NudF structure is unstable, just like E. coli NudB, and that its IPP complex is more stable
than the DMAPP complex. Thus, the present study must be industrially
useful to channelize the entire DXP pathway toward the increased production
of prenol or isoprenol or their downstream molecules without generating
any metabolic burden.
Authors Contributions
A.R. conceived
the study. D.P. performed the experiments. Both
the authors have written the manuscript and have carefully read and
approved the manuscript.
Authors: Michael Schantz Klausen; Martin Closter Jespersen; Henrik Nielsen; Kamilla Kjaergaard Jensen; Vanessa Isabell Jurtz; Casper Kaae Sønderby; Morten Otto Alexander Sommer; Ole Winther; Morten Nielsen; Bent Petersen; Paolo Marcatili Journal: Proteins Date: 2019-03-09
Authors: Sydnor T Withers; Shayin S Gottlieb; Bonny Lieu; Jack D Newman; Jay D Keasling Journal: Appl Environ Microbiol Date: 2007-08-10 Impact factor: 4.792
Authors: John Jumper; Richard Evans; Alexander Pritzel; Tim Green; Michael Figurnov; Olaf Ronneberger; Kathryn Tunyasuvunakool; Russ Bates; Augustin Žídek; Anna Potapenko; Alex Bridgland; Clemens Meyer; Simon A A Kohl; Andrew J Ballard; Andrew Cowie; Bernardino Romera-Paredes; Stanislav Nikolov; Rishub Jain; Demis Hassabis; Jonas Adler; Trevor Back; Stig Petersen; David Reiman; Ellen Clancy; Michal Zielinski; Martin Steinegger; Michalina Pacholska; Tamas Berghammer; Sebastian Bodenstein; David Silver; Oriol Vinyals; Andrew W Senior; Koray Kavukcuoglu; Pushmeet Kohli Journal: Nature Date: 2021-07-15 Impact factor: 49.962