
Identification of GCC-box and TCC-box motifs in the promoters of differentially expressed genes in rice (Oryza sativa L.): Experimental and computational approaches.

Gopal Kumar Prajapati1, Bharati Pandey1, Awdhesh Kumar Mishra2, Kwang-Hyun Baek2, Dev Mani Pandey1.   

Abstract

Transcription factors bind selectively to the cis-regulatory elements of promoters and regulate the differential expression of genes. In this study, we aimed to identify and validate the presence of GCC-box and TCC-box motifs in the promoters of upregulated differentially expressed genes (UR-DEGs) and downregulated differentially expressed genes (DR-DEGs) under anoxia using molecular beacon probe (MBP) based real-time PCR. The GCC-box motif was detected in UR-DEGs (DnaJ and 60S ribosomal protein L7 genes), whereas the TCC-box was detected in DR-DEGs (DnaK and CPuORF11 genes). In addition, the mechanism of interaction of the AP2/EREBP family transcription factor (LOC_Os03g22170) with the GCC-box promoter motif present in the DnaJ gene (LOC_Os06g09560) and the 60S ribosomal protein L7 gene (LOC_Os08g42920), and with the TCC-box promoter motif of the DnaK gene (LOC_Os02g48110) and the CPuORF11 gene (LOC_Os02g01240), was explored using molecular dynamics (MD) simulation analyses, including binding free energy calculations, principal component analyses, and free energy landscapes. The binding free energy analysis revealed that AP2/EREBP model residues such as Arg68, Arg72, Arg83, Lys87, and Arg90 were commonly involved in the formation of hydrogen bonds with the GCC-box and TCC-box promoter motifs, suggesting that these residues are critical for strong interaction. The movement of the entire protein bound to DNA was restricted, confirming the stability of the complex. This study provides comprehensive binding information and a more detailed view of the dynamic interaction between proteins and DNA.

Year:  2019        PMID: 31026257      PMCID: PMC6485614          DOI: 10.1371/journal.pone.0214964

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Standing crops face various stresses during their life cycle, which result in a drastic reduction in yield [1]. Although some crops withstand environmental stresses by developing new features, others are unable to develop adaptive mechanisms and consequently die. Importantly, rice has a lower tolerance and higher susceptibility to abiotic stresses than other crops [2,3]. In plants, low oxygen stress stimulates composite metabolic pathways and genetic programs, including the differential expression of several genes [4]. Gene expression studies have revealed the upregulation of genes encoding transcription factors, as well as signal transduction components [5]. For example, a wide range of differentially expressed genes (DEGs) have been studied using microarray analyses [6], and the expression pattern of 23 proteins and their respective mRNAs has been analyzed in anoxic rice coleoptile [7]. In Arabidopsis, ETHYLENE RESPONSE FACTOR (AtERF) responds positively or negatively by binding specifically to the AGCCGCC sequence, known as the GCC-box, or to its substituted form, the TCC-box, and modulates gene expression in response to biotic and abiotic factors [8,9]. The GCC-box is also found in the promoters of many pathogen-responsive genes, such as PDF1.2 and PR, where it regulates specific defense phenomena [9,10]. APETALA2/ethylene response factor (AP2/ERF) plant transcription factor genes regulate developmental processes and are involved in the responses to various biotic and abiotic stresses [11]. Furthermore, the AP2/ERF family of transcriptional regulators, together with the Sub1A-1-mediated response, plays an important role in submergence tolerance [12]. The differential expression of 163 AP2/EREBP (APETALA 2/ethylene responsive element-binding protein) genes in rice under abiotic stress conditions has been studied [13]. Kumar et al. [14] reported the presence of a consensus promoter motif with a conserved GCC-box (GCCGCC) in the upregulated differentially expressed genes (UR-DEGs) using publicly available microarray data for anoxic rice coleoptile [6]. Likewise, another study reported the presence of a TCC-box (TCCTCC) in the promoters of downregulated DEGs (DR-DEGs) in anoxic rice coleoptiles [14,15]. Probe-based techniques for the detection of specific nucleic acid sequences, such as molecular beacon probes (MBPs), TaqMan probes, and minor groove binding (MGB) probes, are used by various researchers [16-18]. MBP-based detection is more sensitive and precise than conventional PCR and requires no post-reaction analysis [19]. More importantly, MBPs discriminate single-nucleotide differences, which gives them higher specificity than TaqMan probes [20]. Also, unlike TaqMan probes, MBPs are designed so that they remain intact during the amplification reaction and can rebind to the target in every cycle for signal measurement [21]. Promoter motifs (cis-regulatory elements) are involved in the regulation of differentially expressed genes and regulate cellular mechanisms in response to abiotic and biotic stresses. Thus, the identification of differentially expressed genes and the mechanisms underlying differential expression is of great interest. The presence of consensus motifs, such as a GCC-box in UR-DEGs and a TCC-box in DR-DEGs, needs to be validated using a sequence-based technique by designing motif sequence-specific MBPs and performing MBP based real-time PCR analyses. Real-time PCR data can be analyzed using the Ct value, which is the number of cycles required for the fluorescent signal to cross a threshold [16-18]. The GCC-box and TCC-box of DEGs have an important role in the transcriptional regulation of genes during various stresses [8,9,11,15].
Therefore, in this study, we aimed to use MBP based real-time PCR assays to accurately detect GCC-boxes in UR-DEGs such as DnaJ (LOC_Os06g09560) and 60S ribosomal protein L7 (LOC_Os08g42920), and TCC-boxes in DR-DEGs such as DnaK (LOC_Os02g48110) and CPuORF11 (LOC_Os02g01240). Molecular dynamics (MD) simulation has recently proven to be a powerful atomistic approach for predicting the interaction strength between two macromolecules [22]. MD simulations have been extensively applied to elucidate the residues responsible for interactions between transcription factors and DNA motifs. The interaction of a WRKY transcription factor-DNA complex in A. thaliana has been studied using 10 ns MD simulations [23]. In a similar study, important structural features stabilizing DOF zinc finger-DNA complexes were identified using in silico approaches [24]. In addition, Pandey et al. [25] studied the AP2-DNA interaction in barley and found that residues in the beta-strand were crucial for stabilizing the AP2-DNA complex. Therefore, in the present study, we examined the key interactions occurring between the AP2/EREBP family transcription factor (LOC_Os03g22170) and the GCC-box and TCC-box DNA motifs using molecular and essential dynamics based binding mechanics analysis.

Material and methods

Selection of DEGs and MBP design

Microarray data of DEGs in anoxic rice coleoptiles [6] and the dataset of Kumar et al. [14] were used to shortlist UR-DEGs and DR-DEGs for analysis in this study. The UR-DEGs and DR-DEGs were ranked based on an expression score of ≥2 fold (≥2X) and ≤-2 fold (≤-2X), respectively. The promoter sequences (-499 to +100 bp) of the selected UR-DEGs and DR-DEGs were retrieved from the Eukaryotic Promoter Database as described previously [14]. The retrieved promoter regions were analyzed using the MEME (Multiple Em for Motif Elicitation) web server (http://meme-suite.org/tools/meme). Furthermore, the consensus promoter motifs of the UR-DEGs and DR-DEGs were used to design MBPs using Beacon Designer 7 (BD7, PREMIER Biosoft, USA). Custom-made MBPs and primers were procured from Gene Link (New York, USA). The methodology used for rice genomic DNA isolation and the validation of the consensus promoter motif is described in our previous work. It is well established that the AP2/EREBP transcription factor (TF) DNA-binding domain (DBD) binds to the GCC-box [12,13,15,26]. The AP2/EREBP TF model from rice was generated using the SWISS-MODEL web server [27], and the structure quality was assessed using PROCHECK [28] based on the Ramachandran plot. A three-dimensional (3D) structural model of each DNA motif was generated using 3D-DART (3DNA-Driven DNA Analysis and Rebuilding Tool) [29]. Five 3D DNA models of the GCC-box (CGCCGCCGCCG) and TCC-box (CTCCTCCTCCTCCTC) motifs were generated with bend angles of 0–40°. 3D-DART enables the generation of DNA models based on customized local and global conformations, such as the bend angle range and bend angle orientation range.
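The motif lookup described above can be illustrated with a minimal sketch. It is not the MEME procedure used in the study; it only shows how an exact motif occurrence in a 600-nt promoter window can be reported in TSS-relative coordinates. The function names and the coordinate convention (string index 0 corresponds to position -499, so index 599 corresponds to +100) are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_motif(promoter, motif, upstream=499):
    """List exact motif hits as (start, end, strand) relative to the TSS.

    Assumption: index 0 of `promoter` is position -upstream, so a hit at
    string index i starts at position i - upstream.
    """
    promoter = promoter.upper()
    hits = []
    # Search the motif itself (+ strand) and its reverse complement (- strand).
    for strand, pattern in (("+", motif.upper()), ("-", reverse_complement(motif))):
        index = promoter.find(pattern)
        while index != -1:
            start = index - upstream
            hits.append((start, start + len(pattern) - 1, strand))
            index = promoter.find(pattern, index + 1)
    return sorted(hits)


# Toy promoter (not a real rice sequence) with the GCC-box model motif placed
# so that it spans -62 to -52 on the + strand.
toy_promoter = "A" * 437 + "CGCCGCCGCCG" + "A" * 152
print(find_motif(toy_promoter, "CGCCGCCGCCG"))
```

An exact string search is deliberately simpler than a position weight matrix scan; it is only meant to make the coordinate bookkeeping explicit.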

High ambiguity driven protein-DNA docking

For the protein-DNA interaction study, DNA models of the gene promoter motifs (GCC-box and TCC-box) were docked onto the specific site of the AP2/EREBP TF using the HADDOCK (High Ambiguity Driven protein-protein Docking) web server (version 2.2) [15,26,30]. Residues 68, 69, 71, 73, 75, 77, 82, 83, 90, 92, 94, 95, 108, 109, and 110 were considered as active site residues for the protein, and 1-50 base pair (bp) nucleotides from both DNA strands were selected as active residues for the DNA motif. Passive residues were defined automatically around the active residues. Ambiguous Interaction Restraints (AIRs) were generated from the active and passive residues. The final docked complexes were illustrated and visualized using UCSF Chimera [31].

Molecular dynamics simulations for the protein and docked complexes

To study the dynamics and recognition mechanism between the AP2/EREBP TF and the DNA motifs, the generated complexes were subjected to MD simulations using the GROMACS 5.0 software package [32,33]. The OPLS-AA/L all-atom force field and the AMBER99SB-ILDN force field were applied to the AP2/EREBP TF and the protein-DNA complex simulations, respectively [34]. The systems were solvated in a minimal cubic water box using the Simple Point Charge (SPC) water model [35]. Because the solvated systems carried a net charge, they were neutralized by replacing water molecules with ions. The systems were energy minimized (50,000 steps of steepest descent) to remove steric clashes and inappropriate geometry. The minimized systems were then equilibrated, so that the solvent and ions relaxed around the protein, in NVT (constant Number of particles, Volume, and Temperature) and NPT (constant Number of particles, Pressure, and Temperature) phases for 1000 ps [25,36–38]. The well-equilibrated systems were subjected to a production run at 300 K and 100,000 Pa (1 bar) for 50,000 ps (50 ns). The analyses of the 50 ns MD trajectories were carried out using GROMACS built-in tools. The various interactions in the pre- and post-MD protein-DNA complexes were deduced using Nucplot [39]. The stability of each complex was assessed by measuring the RMSD (root mean square deviation) of the protein backbone atom positions with respect to the start or reference structure using the following equation: $$\mathrm{RMSD}(t) = \left[ \frac{1}{M} \sum_{i=1}^{N} m_i \left\| \mathbf{r}_i(t) - \mathbf{r}_i^{\mathrm{ref}} \right\|^2 \right]^{1/2}$$ where $M = \sum_i m_i$, $m_i$ is the mass of atom $i$, and $\mathbf{r}_i(t)$ is the position of atom $i$ at time $t$ after least-squares fitting the structure to the reference structure. The RMSF (root mean square fluctuation) was calculated using the following equation: $$\mathrm{RMSF}_i = \left[ \frac{1}{T} \sum_{t_j=1}^{T} \left\| \mathbf{r}_i(t_j) - \mathbf{r}_i^{\mathrm{ref}} \right\|^2 \right]^{1/2}$$ where $T$ is the time over which one wants to average and $\mathbf{r}_i^{\mathrm{ref}}$ is the reference position of particle $i$.
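As a rough illustration of the two stability measures defined above, the following sketch computes a mass-weighted RMSD for one frame and a per-atom RMSF over a short trajectory. It is a plain-Python stand-in for the GROMACS built-in tools used in the study, and it assumes the frames have already been least-squares fitted to the reference; the function names and the toy coordinates are hypothetical.

```python
import math


def rmsd(frame, reference, masses):
    """Mass-weighted RMSD between one fitted frame and the reference.

    `frame` and `reference` are lists of (x, y, z) positions, one per atom.
    """
    total_mass = sum(masses)
    weighted = sum(
        m * sum((a - b) ** 2 for a, b in zip(pos, ref))
        for m, pos, ref in zip(masses, frame, reference)
    )
    return math.sqrt(weighted / total_mass)


def rmsf(trajectory, reference):
    """Per-atom RMSF: root mean square displacement from the reference
    position, averaged over all frames of the (already fitted) trajectory."""
    n_frames = len(trajectory)
    values = []
    for i, ref in enumerate(reference):
        squared = sum(
            sum((a - b) ** 2 for a, b in zip(frame[i], ref))
            for frame in trajectory
        )
        values.append(math.sqrt(squared / n_frames))
    return values


# Toy data: two atoms, two frames; atom 0 oscillates by 1 nm along x.
reference = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
trajectory = [
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
print(rmsd(trajectory[0], reference, masses=[1.0, 1.0]))
print(rmsf(trajectory, reference))
```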

Binding free energy and free energy decomposition analysis

The package g_mmpbsa calculates the binding energy of bimolecular associations such as protein-protein, protein-ligand, and protein-DNA associations using the Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) protocol [40]. It provides the different components of energy terms such as polar solvation, non-polar solvation, and electrostatic energy. The MmPbSaDecomp.py python script was used to determine the residue-wise contribution to the total binding energy, which provides information about important residues contributing to the molecular association.

Principal component analysis (PCA) and free energy analysis

Principal component analysis (PCA) is widely used to gain insights into the structure and dynamics of protein and complex trajectories [41]. PCA is a multivariate statistical analysis used to extract covariant motions on a number of different length and time scales from a protein structure. The covariance matrix of the atomic fluctuations was calculated using the gmx covar module of the GROMACS software according to the following equation: $$C = \left\langle M^{1/2} \left( \mathbf{x} - \langle \mathbf{x} \rangle \right) \left( M^{1/2} \left( \mathbf{x} - \langle \mathbf{x} \rangle \right) \right)^{T} \right\rangle$$ in which $C$ is a $3n \times 3n$ symmetric matrix, $n$ is the number of residues, $\mathbf{x}$ is the vector of atomic coordinates, and $M$ is a diagonal matrix [42]. Diagonalization of this matrix yields a set of eigenvectors and eigenvalues that describe the collective modes of fluctuation of the protein. The eigenvectors corresponding to the largest eigenvalues are called "principal components", as they represent the largest-amplitude collective motions. The eigenvectors were analyzed using the GROMACS built-in gmx anaeig command. The gmx sham tool was used to generate the input for the free energy landscapes using the principal components as axes.
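The covariance analysis described above can be sketched in a few lines. This is only an illustration, not the gmx covar or gmx anaeig implementation: it assumes unit masses (M taken as the identity matrix), averages over frames, and uses simple power iteration to recover only the largest eigenvalue and its eigenvector. All function names and the toy data are hypothetical.

```python
import math


def covariance_matrix(frames):
    """Covariance of flattened coordinate vectors (one vector per frame).

    Assumption: unit masses, so no mass weighting is applied.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n_frames for k in range(dim)]
    dev = [[f[k] - mean[k] for k in range(dim)] for f in frames]
    return [
        [sum(d[i] * d[j] for d in dev) / n_frames for j in range(dim)]
        for i in range(dim)
    ]


def trace(matrix):
    """Sum of the diagonal elements, i.e. the total variance."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric positive
    semi-definite matrix, estimated by power iteration."""
    dim = len(matrix)
    vector = [1.0 / math.sqrt(dim)] * dim
    value = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vector = [x / norm for x in product]
        value = norm
    return value, vector


# Toy "trajectory": four frames of a two-coordinate system.
frames = [[3.0, 1.0], [-3.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
cov = covariance_matrix(frames)
top_value, top_vector = leading_eigenpair(cov)
print("trace:", trace(cov))
print("fraction of variance on PC1:", top_value / trace(cov))
```

Because the trace equals the sum of all eigenvalues, the ratio printed at the end is the share of the total fluctuation captured by the first principal component.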

Results and discussion

GCC-box and TCC-box detection and validation

UR-DEGs with expression equal to or higher than two-fold (≥2X) and DR-DEGs with expression equal to or lower than -2 fold (≤-2X) under anoxia were selected from the microarray results [6] and the aforementioned datasets [14]. The selected UR-DEGs and DR-DEGs were analyzed using MEME (v 4.5.0) to identify consensus promoter motifs (GCC-box and TCC-box). We identified the presence of GCC-box and TCC-box motifs in the promoter regions of UR-DEGs (DnaJ and 60S ribosomal protein L7) and DR-DEGs (DnaK and CPuORF11), respectively. The GCC-box motif was identified in the DnaJ (EP01201) and 60S ribosomal protein L7 (EP02799) genes with the lowest p-value of 6.28e-07, indicating the most significant match score for the given motifs (Fig 1A). Similarly, TCC-box motifs were identified in the DnaK (EP03077) and CPuORF11 (EP01079) genes with the lowest p-value of 7.37e-10 (Fig 1B).
Fig 1

The position of promoter motifs in various genes.

(a) GCC-boxes in the promoters of DnaJ (EP01201 or LOC_Os06g09560) and 60S ribosomal protein L7 genes (EP02799 or LOC_ Os08g42920); (b) TCC-box in DnaK (EP03077 or LOC_Os02g48110) and CPuORF11 (EP01079 or LOC_Os02g01240) genes.

Gene expression studies revealed the upregulation of genes encoding transcription factors under hypoxic response in Arabidopsis [5]. However, the regulation of gene expression occurs through the core promoter motif sequence [43]. Promoter motifs contain specific nucleotide sequences that are responsible for gene regulation and function under different biotic and abiotic conditions. Hence, the identification and validation of these regulatory elements are essential. Expression analysis of the 60S ribosomal protein L7 has been used as an internal control for gene expression studies in Coffea arabica under different experimental conditions [44]. DnaJ, which contains a J domain with a consensus sequence of 70 amino acids, is a co-chaperone of Hsp70 (DnaK) and facilitates Hsp70's ATPase activity, substrate delivery, and specific cellular localization [45]. In Arabidopsis and rice, J proteins have been implicated in the protection against environmental stresses [46]. DnaK family proteins also include heat shock proteins that are involved in protecting plants against abiotic stresses [47]. CPuORF11, which has an ORF found in the 5' UTR of a mature mRNA, mediates translational regulation in response to sucrose concentration, amino acid production, starvation, and polyamine concentration. However, its mechanism of action is not clearly understood in Arabidopsis and rice [48-50]. A sequence of GCC-box and TCC-box repeats was used to design the molecular beacon probes. Forward and reverse primers were designed (Table 1) using the parameters and compatibility checks in Beacon Designer 7. The MBPs designed for UR-DEGs and DR-DEGs were 5'-[6-FAM] CGCGATCGCCGCCGCCGGATCGCG [BHQ-1]-3' and 5'-[6-FAM] CGCGATCCTCCTCCTCCTCCTCGATCGCG [BHQ-1]-3', respectively.
The MBPs included the reporter dye 6-FAM (6-Carboxyfluorescein) at the 5' end and the quencher BHQ1 (Black Hole Quencher-1) at the 3' end [15,26]. In the present study, two UR-DEGs (DnaJ and 60S ribosomal protein L7) and two DR-DEGs (DnaK and CPuORF11) were validated by experimental and computational studies. The presence of GCC-boxes and TCC-boxes in the selected genes was verified by real-time PCR assays. For each selected gene, we took the promoter region around the transcription start site (TSS), from -499 to +100 (i.e., 600 nt), and used the same region for motif detection by MBP. In the DnaJ gene promoter, the GCC-box lies upstream of the TSS (-62 to -52), whereas in the 60S ribosomal protein L7 gene promoter it lies downstream (+30 to +40) (Table 2). Similarly, the TCC-box lies upstream in both the DnaK gene promoter (-18 to -4) and the CPuORF11 gene promoter (-58 to -44) (Table 2). Amplification of GCC-box sequences was confirmed by MBP, with average Ct values of 34.21 and 31.65 for DnaJ and 60S ribosomal protein L7, respectively (Table 2). Similarly, TCC-box containing genes were amplified by MBP, with average Ct values of 27.79 and 28.5 for DnaK and CPuORF11, respectively (Table 2).
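A molecular beacon only forms its hairpin if the two arms flanking the loop are reverse complements of each other. The short sketch below checks this for the two probe sequences quoted in the text. The 7-nt arm length is inferred here from the sequences themselves and is not stated in the source; the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def split_beacon(probe, stem_len):
    """Split a beacon into its 5' arm, loop, and 3' arm."""
    return probe[:stem_len], probe[stem_len:-stem_len], probe[-stem_len:]


def stem_is_complementary(probe, stem_len):
    """True if the 5' arm can base-pair with the 3' arm to close the hairpin."""
    five_arm, _, three_arm = split_beacon(probe, stem_len)
    return reverse_complement(five_arm) == three_arm


# Probe sequences as given in the text (dye and quencher labels omitted).
GCC_PROBE = "CGCGATCGCCGCCGCCGGATCGCG"
TCC_PROBE = "CGCGATCCTCCTCCTCCTCCTCGATCGCG"
STEM_LEN = 7  # assumption: arm length read off the sequences

for name, probe in (("GCC-box MBP", GCC_PROBE), ("TCC-box MBP", TCC_PROBE)):
    _, loop, _ = split_beacon(probe, STEM_LEN)
    print(name, "loop:", loop, "stem pairs:", stem_is_complementary(probe, STEM_LEN))
```

Under this reading, both probes share the arms CGCGATC and GATCGCG, and the loops carry the GCC and TCC repeats, respectively.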
Table 1

List of primers designed for UR-DEGs and DR-DEGs.

DEGs | Forward | Reverse | Amplicon size (bp)
DnaJ (EP01201) | 5′-CGTGAGTGAGTCTTCCGTGTCTTC-3′ | 5′-GCCACCGAGCACCTGTCC-3′ | 137
60S ribosomal protein L7 (EP02799) | 5′-GCCATAATAAGACGGTGAGA-3′ | 5′-CCGCTATCTCTACGCAAG-3′ | 112
DnaK (EP03077) | 5′-TTCAGCAGCAACGCACAA-3′ | 5′-GGAGAGAGCAGCGAAGGA-3′ | 173
CPuORF11 (EP01079) | 5′-GAGTGATCCGTTATATCTGTT-3′ | 5′-CTCTCCTTCCTTCCTTCT-3′ | 200
Table 2

Promoter motif position, strand position and Ct values of UR-DEGs (DnaJ, 60S ribosomal protein L7) and DR-DEGs (DnaK, CPuORF11) amplified using MBPs specific to the GCC-box and TCC-box motifs.

DEGs | MBP | Motif position | Strand position | Ct value (R1) | Ct value (R2) | Average Ct value
DnaJ | GCC box | -62 to -52 | + strand | 34.07 | 34.34 | 34.21
60S ribosomal protein L7 | GCC box | +30 to +40 | + strand | 32.17 | 31.12 | 31.65
DnaK | TCC box | -18 to -4 | + strand | 28.04 | 27.54 | 27.79
CPuORF11 | TCC box | -58 to -44 | - strand | 28.28 | 28.71 | 28.5
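The average Ct values in Table 2 are the means of the two replicates (R1 and R2). A minimal sketch of that arithmetic, using the replicate values reported in the table (the variable and function names are hypothetical):

```python
# Replicate Ct values (R1, R2) as reported in Table 2.
ct_replicates = {
    "DnaJ": [34.07, 34.34],
    "60S ribosomal protein L7": [32.17, 31.12],
    "DnaK": [28.04, 27.54],
    "CPuORF11": [28.28, 28.71],
}


def mean_ct(values):
    """Arithmetic mean of replicate Ct values."""
    return sum(values) / len(values)


for gene, values in ct_replicates.items():
    print(f"{gene}: mean Ct = {mean_ct(values):.3f}")
```

A lower Ct means the fluorescence threshold was crossed in fewer cycles, so in this assay the TCC-box targets were detected earlier than the GCC-box targets.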
In rice, the Submergence1 (Sub1) locus encodes three ethylene-responsive factor (ERF) transcriptional regulators. A large number of ERF family members have been described to interact specifically with AGCCGCC through their conserved domain [51]. Direct interaction of GCC-boxes and non-GCC-boxes with the tomato transcription factor Pti4 (an ERF) revealed the involvement of ERFs in gene regulation and expression [52]. The binding of maltose binding protein (AtERF) to the GCC sequence (AGCCGCC) in Arabidopsis was hampered when both G residues within the GCC-box were replaced by T (ATCCTCC) [8,53]. Several reports based on the gene ontology classification and differential expression of the DnaJ, 60S ribosomal protein L7, DnaK, and CPuORF11 genes in diverse species suggest that these genes are involved in cellular, biological, and molecular functions in the plant. In our previous work, MBP based real-time PCR analysis indicated that UR-DEGs and DR-DEGs under anoxic conditions that contained a GCC-box and TCC-box in their promoter region bound the AP2/EREBP TF in rice [15]. Hence, validation of the in silico findings of GCC-box and TCC-box promoter motifs in the UR-DEGs (DnaJ and 60S ribosomal protein L7) and DR-DEGs (DnaK and CPuORF11) in O. sativa is essential.

Protein and DNA motif modeling

BLASTP was performed for the AP2/EREBP TF sequence (LOC_Os03g22170) against the PDB database. The BLAST hits showed 71% sequence identity, with an E value of 2e-21, to the recently solved crystal structure of AtERF96 bound to a GCC-box (resolution: 1.76 Å) from Arabidopsis thaliana (PDB ID: 5wx9; chain A), which was selected as the template for the construction of the AP2/EREBP TF model (Fig 2A).
Fig 2

Three-dimensional structure of the rice AP2/EREBP TF and MD simulation.

(a) Superimposition of pre- and post-MD simulation AP2/EREBP TF; (b) RMSD analysis; (c) RMSF analysis; (d) radius of gyration for MD simulations with a 50 ns time period.

The stereochemical quality of individual residues in the protein was analyzed using a Ramachandran plot. In the generated model, the percentage of residues in the most favored regions and additional allowed regions was 89.7% and 6.9%, respectively. According to the plot, 3.4% of the residues were located in the disallowed region. Analysis of the secondary structure of the AP2/EREBP TF revealed that it consists of one β-sheet, three β-strands, one α-helix, five β-turns, and one gamma (γ) turn (S1A Fig).

Analysis of molecular dynamics (MD) simulations of the AP2/EREBP TF

Structural refinement was carried out using molecular dynamics (MD) simulations with solvent and ions. Superimposition of the pre- and post-MD simulated AP2/EREBP TF revealed a backbone RMSD of 1.17 Å (Fig 2A). The AP2/EREBP TF attained equilibrium after 10 ns and remained stable until the end of the simulation time period, with an average RMSD of 0.59 nm (Fig 2B). The per-residue RMSF showed that two regions of the protein, residues 83-90 and 110-120, had the highest fluctuation, whereas the rest of the structure remained stable with an average RMSF value of approximately 0.17 nm (Fig 2C). The radius of gyration of the protein backbone atoms was 1.26 nm, which indicated the compactness of the protein. The representative structure was extracted from the stable time frame and used for the protein-DNA docking analysis. The simulated structure was analyzed using a Ramachandran plot, which revealed that the residues found in the additional allowed regions had increased to 15.5%, whereas the residues found in the disallowed region had reduced to 1.7%, suggesting that the MD simulations increased the stability of the protein structure [54]. No difference in the secondary structure elements was observed between the pre- and post-MD simulated AP2/EREBP TF structures (S1B Fig).

Protein-DNA interaction and stability analysis

To predict which amino acids interact with DNA, the representative structure of the AP2/EREBP TF was docked with the GCC-box and TCC-box using HADDOCK. The protein-GCC-box complexes were named IHSAPDTM-BS and IRPAPDTM-BS, and the protein-TCC-box complexes were named IDNAPDTM-BS and IOFAPBTM-BS. Both GCC-box and TCC-box motif DNA models were generated with 0° to 40° DNA bend angles (S2A and S2B Fig) and docked individually with the AP2/EREBP TF (S3 and S4). Cluster 1 had the maximum cluster size of 98 and the best HADDOCK scores of -134.2 ± 2.3 and -142.2 ± 3.3 for IHSAPDTM-BS and IRPAPDTM-BS, respectively (Table 3). The IHSAPDTM-BS complex was stabilized by the formation of five hydrogen bonds (H-bonds) (Arg68, Arg73, Lys77, Lys87, and Thr95) and six hydrophobic interactions (Table 4 and Fig 3A). Similarly, four hydrogen bonds (Arg64, Arg73, Lys77, and Arg83) and an extensive network of seven hydrophobic interactions reinforced the stability of the IRPAPDTM-BS complex (Table 4 and Fig 3B). It was evident from the HADDOCK results that DNA bent at 40° in both the IHSAPDTM-BS and IRPAPDTM-BS complexes (GCC-box) had a strong affinity for the AP2/EREBP TF.
Table 3

Characteristics of HADDOCK interaction analysis of the AP2/EREBP TF with GCC and TCC-box motifs.

Motif | Interaction | HADDOCK score | Cluster size | RMSD | Van der Waals energy | Electrostatic energy | Desolvation energy | Restraints violation energy | Buried Surface Area | Z-Score
GCC-box | IHSAPDTM-BS | -134.2 ± 2.3 | 98 | 1.1 ± 0.8 | -67.1 ± 6.0 | -442.0 ± 37.5 | 20.2 ± 1.9 | 10.7 ± 12.83 | 1648.8 ± 99.0 | -2
GCC-box | IRPAPDTM-BS | -142.2 ± 3.3 | 98 | 1.9 ± 1.5 | -67.9 ± 3.5 | -495.2 ± 32.3 | 24.4 ± 1.9 | 4.1 ± 1.73 | 1723.4 ± 72.9 | -2
TCC-box | IDNAPDTM-BS | -144.2 ± 2.8 | 33 | 1.6 ± 1.5 | -65.5 ± 6.2 | -631.8 ± 36.9 | 41.1 ± 5.5 | 65.4 ± 29.74 | 2000.4 ± 113.3 | -1.6
TCC-box | IOFAPBTM-BS | -147.5 ± 7.3 | 39 | 2.8 ± 1.6 | -55.4 ± 6.2 | -694.5 ± 49.5 | 43.0 ± 4.7 | 38.1 ± 21.90 | 1769.1 ± 124.5 | -2

Keys: I-Interaction; AP-AP2/EREBP (LOC_Os03g22170) TF; HS-Heat Shock protein DnaJ gene promoter DNA segment (LOC_Os06g09560); RP-60S ribosomal protein L7 gene promoter DNA segment (LOC_Os08g42920); DN-DnaK gene promoter DNA segment (LOC_Os02g48110); OF-CPuORF11-conserved peptide uORF transcript gene promoter DNA segment (LOC_Os02g01240); (A/B/C/D)/T-10–40° bend angle; M-Model; BS-binding site.
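The energy terms in Table 3 can be cross-checked against the reported HADDOCK scores. The sketch below assumes the commonly documented HADDOCK 2.2 default weighting for the final water-refinement stage (1.0 × van der Waals + 0.2 × electrostatic + 1.0 × desolvation + 0.1 × restraint violation energy); the source does not state which weights were used, so this is an assumption, and the function and variable names are hypothetical.

```python
def haddock_score(vdw, elec, desolv, air):
    """Weighted sum of energy terms.

    Assumption: default HADDOCK 2.2 water-refinement weights, which are not
    stated in the source text.
    """
    return 1.0 * vdw + 0.2 * elec + 1.0 * desolv + 0.1 * air


# Mean energy terms from Table 3: (vdW, electrostatic, desolvation, restraint violation).
table3 = {
    "IHSAPDTM-BS": (-67.1, -442.0, 20.2, 10.7),
    "IRPAPDTM-BS": (-67.9, -495.2, 24.4, 4.1),
    "IDNAPDTM-BS": (-65.5, -631.8, 41.1, 65.4),
    "IOFAPBTM-BS": (-55.4, -694.5, 43.0, 38.1),
}

# HADDOCK scores reported in Table 3, for comparison.
reported = {
    "IHSAPDTM-BS": -134.2,
    "IRPAPDTM-BS": -142.2,
    "IDNAPDTM-BS": -144.2,
    "IOFAPBTM-BS": -147.5,
}

for name, terms in table3.items():
    print(f"{name}: recomputed {haddock_score(*terms):.2f}, reported {reported[name]}")
```

Under that assumption the recomputed values agree with the reported scores to within rounding, and the electrostatic term is visibly the dominant favorable contribution for the TCC-box complexes.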

Table 4

List of residues involved in the formation of hydrogen bonds and hydrophobic interactions in AP2/EREBP TF-DNA complexes.

Protein-DNA complex | Stage | Residues involved in hydrogen bonding | Residues involved in hydrophobic interactions
IHSAPDTM-BS | Pre-MD | Arg68, Arg73, Lys77, Lys87, Thr95 | Arg71, Arg72, Trp75, Arg83, Arg90, Trp92
IHSAPDTM-BS | Post-MD | Arg68, Arg73, Lys87, Arg90, Thr95 | Arg73, Trp92
IRPAPDTM-BS | Pre-MD | Arg64, Arg73, Lys77, Arg83 | Gly69, Arg71, Pro74, Trp75, Lys87, Arg90, Trp92
IRPAPDTM-BS | Post-MD | Arg71, Arg72, Arg73, Trp75, Arg83, Lys87, Thr95 | Arg90, Trp92
IOFAPBTM-BS | Pre-MD | Arg64, Arg68, Gly69, Arg71, Arg72, Arg83, Arg90, Lys117, Lys119 | Glu62, Arg63, Arg73
IOFAPBTM-BS | Post-MD | Glu62, Arg63, Arg64, Arg72, Arg83, Lys87, Arg90 | Gly69, Arg73
IDNAPDTM-BS | Pre-MD | Glu62, Arg63, Arg64, Arg68, Arg71, Arg73, Thr106, Lys119 | Leu66, Gly69, Pro74, Lys117, Pro123
IDNAPDTM-BS | Post-MD | Glu62, Arg63, Arg64, Thr65, Arg68, Arg71, Thr106, Lys117 | Gly69, Arg83, Arg114, Lys119, Pro123
Fig 3

Superimposition of pre- and post-MD simulation complexes.

Interactions of the pre-MD and post-MD simulated complexes for (a) IHSAPDTM-BS; (b) IRPAPDTM-BS; (c) IDNAPDTM-BS; and (d) IOFAPBTM-BS. DNA is represented in green (pre-MD) and purple (post-MD), and protein is represented in pink (pre-MD) and gold (post-MD).

The best HADDOCK scores for the IDNAPDTM-BS and IOFAPBTM-BS (TCC-box) complexes were -144.2 ± 2.8 and -147.5 ± 7.3, respectively (Table 3). The IOFAPBTM-BS complex formed nine hydrogen bonds (Arg64, Arg68, Gly69, Arg71, Arg72, Arg83, Arg90, Lys117, and Lys119) and three hydrophobic interactions, whereas the IDNAPDTM-BS complex formed eight hydrogen bonds (Glu62, Arg63, Arg64, Arg68, Arg71, Arg73, Thr106, and Lys119) and five hydrophobic interactions (Table 4 and Fig 3C and 3D). The cluster size and Z-score for the selected clusters were 33 and -1.6 for IDNAPDTM-BS, and 39 and -2.0 for IOFAPBTM-BS, respectively. DNA bent at 40° and 20° in the IDNAPDTM-BS and IOFAPBTM-BS complexes, respectively, had strong binding affinities. These HADDOCK results were selected for further MD simulations. The conformation adopted by the DNA therefore plays a very significant role in the specific interaction between the AP2/EREBP TF and DNA [55].

Conformational and interaction analysis of the docked complexes after MD simulations

To examine the dynamics and to gain specific interaction information, the protein-DNA complexes were subjected to 50 ns MD simulations. IHSAPDTM-BS and IRPAPDTM-BS attained a final conformation with a backbone RMSD of approximately 0.53 nm and 0.37 nm, respectively (Fig 4A). In addition, IDNAPDTM-BS and IOFAPBTM-BS showed an average deviation from the initial structure of 0.36 nm and 0.43 nm, respectively (Fig 4A). Backbone RMSD values below 1.0 nm suggested that the complex structures were stable [56]. Furthermore, the structural deviations of the DNA-bound complexes were analysed at regular time intervals across the simulation trajectory (S1 Table).
Fig 4

MD simulation trajectory analysis of the AP2/EREBP TF bound to GCC-box and TCC-box motifs.

(a) RMSD analysis; (b) RMSF analysis; (c) radius of gyration; and (d) number of hydrogen bonds during the 50 ns MD simulation time period.

The RMSF values of the key residues stabilizing the IHSAPDTM-BS (Arg68, Arg73, Lys87, Arg90, and Thr95) and IRPAPDTM-BS (Arg71, Arg72, Arg73, Trp75, Arg83, Lys87, and Thr95) complexes varied from 0.08 to 0.25 nm (Fig 4B). The RMSF values for the interacting residues in IOFAPBTM-BS (Glu62, Arg63, Arg64, Arg72, Arg83, Lys87, and Arg90) and IDNAPDTM-BS (Glu62, Arg63, Arg64, Thr65, Arg68, Arg71, Thr106, and Lys117) ranged from 0.07 to 0.34 nm (Fig 4B). Moreover, the radius of gyration and the hydrogen bond analysis for all four complexes indicated the compactness and stability of the complexes (Fig 4C and 4D). The MD analysis indicated that all four complexes underwent minor conformational changes during the simulation time period. The representative docked complexes were extracted from the stable time frame for the identification of key interacting residues. A comparative interaction analysis was carried out for all protein-DNA complexes. The total number of hydrogen bonds remained unchanged between the pre- and post-MD simulated IHSAPDTM-BS complexes and increased from four to seven in the IRPAPDTM-BS complex (Table 4). In the IOFAPBTM-BS complex, however, the number of hydrogen bonds decreased from nine to seven, whereas it remained constant for the IDNAPDTM-BS complex (Table 4). After the MD simulations, the number of hydrophobic interactions was reduced drastically in all complexes (IHSAPDTM-BS, IRPAPDTM-BS, and IOFAPBTM-BS) except IDNAPDTM-BS (Table 4). Most of the interacting residues in the pre-simulated complexes were conserved in the post-simulated structures, suggesting that they play a crucial role in the formation of the AP2/EREBP TF-DNA complex.

Conformation analysis of the complexes

To study the conformational variation during the MD simulations, we extracted snapshots of each complex at 10 ns intervals (0 ns, 10 ns, 20 ns, 30 ns, 40 ns, and 50 ns) and analyzed these for the IHSAPDTM-BS, IRPAPDTM-BS, IDNAPDTM-BS, and IOFAPBTM-BS complexes (S3 and S4 Figs). The analysis revealed that the amino acid residues involved in the formation of hydrogen bonds (H-bonds) with the DNA remained stable and consistent after 10 ns (S2 Table). Thus, the overall MD simulation trajectory analysis, along with the comparative interaction analysis at regular time intervals, indicated that there was a fairly stable interaction between the AP2/EREBP TF and the DNA motifs through H-bonding and hydrophobic interactions [57].

Binding free energy analysis

The calculation of protein-DNA binding free energies is an active area of computational research. The MM-PBSA method was applied to the last 5 ns (45–50 ns) of the MD simulation trajectories to calculate the binding free energy components, including the van der Waals energy, electrostatic energy, and polar and non-polar solvation energies, and their contributions to the stability of the protein-DNA complexes. The total binding free energies for the IHSAPDTM-BS, IRPAPDTM-BS, IDNAPDTM-BS, and IOFAPBTM-BS complexes were computed to be -27488.958 ± 372.317 kJ/mol, -31225.294 ± 467.742 kJ/mol, -28791.293 ± 438.664 kJ/mol, and -31168.009 ± 438.691 kJ/mol, respectively. These highly negative binding free energy values suggested a strong binding affinity between the AP2/EREBP TF and the DNA motifs (Table 5).
Table 5

Binding free energy calculation for the AP2/EREBP TF complex with GCC-box and TCC-box motifs.

| Protein-DNA complex | Van der Waals, ΔGvdW (kJ/mol) | Electrostatic, ΔGcoul (kJ/mol) | Polar contribution, ΔGpolar (kJ/mol) | Non-polar contribution, ΔGnonpolar (kJ/mol) | Binding energy, ΔG (kJ/mol) |
|---|---|---|---|---|---|
| IHSAPDTM-BS | -234.378 ± 18.944 | -28944.093 ± 412.867 | 1727.964 ± 115.540 | -38.452 ± 2.856 | -27488.958 ± 372.317 |
| IRPAPDTM-BS | -305.842 ± 22.860 | -33044.70 ± 519.260 | 2165.584 ± 159.310 | -40.328 ± 2.464 | -31225.294 ± 467.742 |
| IDNAPDTM-BS | -333.626 ± 24.718 | -30568.986 ± 465.522 | 2157.740 ± 161.546 | -46.421 ± 2.424 | -28791.293 ± 438.664 |
| IOFAPBTM-BS | -213.779 ± 23.833 | -33277.075 ± 568.813 | 2354.455 ± 207.294 | -31.610 ± 2.974 | -31168.009 ± 438.691 |
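In the usual MM-PBSA decomposition, the binding energy is the sum of the van der Waals, electrostatic, polar, and non-polar terms. The text does not state this relation explicitly, so the short check below treats it as an assumption and simply tests whether the mean values in Table 5 are consistent with it. The dictionary and function names are illustrative, and the reported uncertainties are ignored.

```python
# Mean values transcribed from Table 5 (kJ/mol):
# (van der Waals, electrostatic, polar, non-polar, reported binding energy)
table5 = {
    "IHSAPDTM-BS": (-234.378, -28944.093, 1727.964, -38.452, -27488.958),
    "IRPAPDTM-BS": (-305.842, -33044.70, 2165.584, -40.328, -31225.294),
    "IDNAPDTM-BS": (-333.626, -30568.986, 2157.740, -46.421, -28791.293),
    "IOFAPBTM-BS": (-213.779, -33277.075, 2354.455, -31.610, -31168.009),
}

def component_sum(vdw, coul, polar, nonpolar):
    """Assumed additive MM-PBSA form: sum of the four energy terms."""
    return vdw + coul + polar + nonpolar

residuals = {}
for name, (*components, reported) in table5.items():
    total = component_sum(*components)
    residuals[name] = total - reported
    print(f"{name}: sum = {total:.3f}, reported = {reported:.3f}, "
          f"difference = {residuals[name]:+.3f} kJ/mol")
```

The component sums agree with the reported binding energies to within rounding of the tabulated values. The check also makes the balance of terms visible: the electrostatic term dominates, and the polar term is positive and so partly offsets it.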
The contribution of each residue to the binding energy was computed. The contributions of most of the common interacting residues (Arg68, Arg72, Arg83, Lys87, and Arg90) were very similar across the DNA-bound complexes, suggesting a significant role for these residues in complex stabilization (Fig 5A–5D). In magnitude, the largest contribution came from the electrostatic energy, followed by the polar energy. The high binding energy profile was in agreement with the interaction profile of each DNA-bound complex.
Fig 5

Decomposition of binding free energy per amino acid residue.

(a) IHSAPDTM-BS; (b) IRPAPDTM-BS; (c) IDNAPDTM-BS; and (d) IOFAPBTM-BS complexes.

Analysis of conformational fluctuation in AP2/EREBP TF and DNA-bound complexes

The development of multivariate methods, such as PCA, promises to enrich the analysis of MD data and to reveal quantitative insights into the relationships between structure, dynamics, and function. Covariance provides information about the cooperativity of motion and can be positive or negative; the trace of the covariance matrix, being the sum of its leading diagonal, is the sum of the individual variances [58]. The trace values for the AP2/EREBP TF, IHSAPDTM-BS, IRPAPDTM-BS, IDNAPDTM-BS, and IOFAPBTM-BS were 7.6 nm², 8.2 nm², 4.5 nm², 6.3 nm², and 6.2 nm², respectively. The small trace values corresponded to positive covariance and confirmed the decrease in flexibility in the collective motion of the protein, thus revealing a higher stability (Fig 6). The covariance matrix was used to generate the eigenvectors and their corresponding eigenvalues for the AP2/EREBP TF and the DNA-bound complexes (S5 Fig). The Gibbs free energy (∆G) values ranged from 12.6 to 14.7 kJ/mol for the DNA-bound complexes. The overall results indicated the stability of the AP2/EREBP TF and its DNA-bound complexes (Fig 7).
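To make the relationship between the covariance matrix, its trace, and its eigenvalues concrete, here is a minimal NumPy sketch. It assumes an already aligned trajectory and uses synthetic coordinates. The function name is illustrative, and no mass-weighting is applied, which may differ from the settings used in the study.

```python
import numpy as np

def covariance_analysis(traj):
    """Covariance matrix, eigenvalues (descending) and eigenvectors
    for an aligned trajectory of shape (n_frames, n_atoms, 3)."""
    n_frames = traj.shape[0]
    x = traj.reshape(n_frames, -1)           # flatten to (n_frames, 3 * n_atoms)
    x = x - x.mean(axis=0)                   # remove the average structure
    cov = x.T @ x / n_frames                 # positional covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: the matrix is symmetric
    order = np.argsort(eigvals)[::-1]        # largest-variance modes first
    return cov, eigvals[order], eigvecs[:, order]

# Synthetic stand-in for a trajectory (hypothetical data, coordinates in nm).
rng = np.random.default_rng(1)
traj = rng.normal(scale=0.1, size=(200, 10, 3))

cov, eigvals, eigvecs = covariance_analysis(traj)
trace = np.trace(cov)                        # sum of per-coordinate variances, in nm^2

# Projection of every frame onto the first two principal components.
flat = traj.reshape(traj.shape[0], -1)
projections = (flat - flat.mean(axis=0)) @ eigvecs[:, :2]

print(f"trace = {trace:.4f} nm^2, sum of eigenvalues = {eigvals.sum():.4f} nm^2")
```

Because the trace equals the sum of the eigenvalues, it measures the total positional variance, so a smaller trace means less overall motion. This is the sense in which the trace values of the bound and unbound structures are compared.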
Fig 6

Principal component analysis for the unbound and bound structures.

(a) AP2/EREBP TF; (b) IHSAPDTM-BS; (c) IRPAPDTM-BS; (d) IDNAPDTM-BS; and (e) IOFAPBTM-BS.

Fig 7

Gibbs free energy landscape for the unbound and bound structures.

(a) AP2/EREBP TF; (b) IHSAPDTM-BS; (c) IRPAPDTM-BS; (d) IOFAPBTM-BS; and (e) IDNAPDTM-BS.
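A Gibbs free energy landscape such as the one in Fig 7 is commonly obtained by histogramming the projections onto the first two principal components and converting the populations with ∆G = -kB·T·ln(P/Pmax). The sketch below illustrates that general recipe only. The temperature of 300 K, the bin count, the function name, and the synthetic projections are assumptions, not details taken from the text.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def free_energy_landscape(pc1, pc2, temperature=300.0, bins=30):
    """Free energy (kJ/mol) on a 2D grid from PC1/PC2 projections.
    Empty bins are returned as NaN; the most populated bin is set to 0."""
    hist, xedges, yedges = np.histogram2d(pc1, pc2, bins=bins)
    prob = hist / hist.sum()
    g = np.full(prob.shape, np.nan)
    populated = prob > 0
    g[populated] = -K_B * temperature * np.log(prob[populated] / prob.max())
    return g, xedges, yedges

# Synthetic projections standing in for PC1 and PC2 (hypothetical data).
rng = np.random.default_rng(2)
pc1 = rng.normal(scale=1.0, size=5000)
pc2 = rng.normal(scale=0.5, size=5000)

g, xedges, yedges = free_energy_landscape(pc1, pc2)
print(f"free energy range: 0 to {np.nanmax(g):.1f} kJ/mol")
```

On such a map, the span between the global minimum and the least populated sampled bin gives the free energy range of the landscape.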

Conclusion

We successfully designed MBPs and specific primers for UR-DEGs (DnaJ and 60S ribosomal protein L7) and DR-DEGs (DnaK and CPuORF11) and validated the presence of GCC-box and TCC-box promoter motifs. The molecular dynamics study of the protein-DNA complexes revealed a high binding affinity of the AP2/EREBP TF for the GCC- and TCC-box motifs in the selected genes. The residues Arg68, Arg71, Arg72, Arg73, Trp75, Arg83, Lys87, Arg90, and Thr95 interacted directly with the GCC-box, and the residues Glu62, Arg63, Arg64, Thr65, Arg68, Arg71, Arg72, Arg83, Lys87, Arg90, Thr106, and Lys117 interacted directly with the TCC-box. Consequently, these residues play an important role in the stabilization of the complex and the regulation of the differential expression of these genes in rice. Our results therefore shed light on the underlying mechanism of GCC-box and TCC-box recognition by proteins.

Supporting information

S1 Fig. Secondary structure analysis of the AP2/EREBP TF before and after MD simulations. (TIF)

S2 Fig. DNA motif bend angle of 0°, 10°, 20°, 30°, and 40° for (a) a GCC-box; and (b) a TCC-box. (TIF)

S3 Fig. Extracted snapshots of (a) IHSAPDTM-BS and (b) IRPAPDTM-BS complexes at regular intervals during the 50 ns simulation time period. (TIF)

S4 Fig. Extracted snapshots of (a) IDNAPDTM-BS and (b) IOFAPBTM-BS complexes at regular intervals during the 50 ns simulation time period. (TIF)

S5 Fig. Covariance analysis of the (a) AP2/EREBP TF; (b) IHSAPDTM-BS; (c) IRPAPDTM-BS; (d) IOFAPBTM-BS; and (e) IDNAPDTM-BS. (TIF)

S1 Table. RMSD of AP2/EREBP TF-DNA complexes at different time intervals. (DOCX)

S2 Table. List of residues involved in the formation of hydrogen bonds in AP2/EREBP TF-DNA complexes at different time intervals. (DOC)