Literature DB >> 30733795

In silico analysis of squalene synthase in Fabaceae family using bioinformatics tools.

Abstract

Triterpenoid saponins are a diverse group of bioactive compounds, which are used for possessing of many biomedical and pharmaceutical products. Generally, squalene synthase (SQS) is defined as an emerging and essential branch point enzyme far from the major pathway of isoprenoids biosynthetic and a latent adjusting point, which manages carbon flux into triterpenes biosynthesis and sterols. The present study deals with the detailed characterization of SQS by bioinformatics approaches to evaluate physicochemical properties, structural characteristics including secondary and 3D structure prediction and functional analysis from eight plants related to Fabaceae family and Arabidopsis thaliana. Bioinformatics analysis revealed that SQS proteins have two transmembrane regions in the C-terminal. The predicted motifs were used to design universal degenerate primers for PCR analysis and other molecular applications. Phylogenetic analysis showed conserved regions at different stretches with maximum homology in amino acid residues within all SQSs. The secondary structure prediction results showed that the amino acid sequence of all squalene synthases had α helix and random coil as the main components. The reliability of the received model was confirmed using the ProSA and RAMPAGE programs. Determining of active site by CASTp proposes the possibility of using this protein as probable medication target. The findings of the present study may be useful for further assessments on characterization and cloning of squalene synthase.

Entities: CellLine Chemical Disease Gene Species

Keywords: Bioinformatics analysis; Fabaceae; Squalene synthase; Triterpenoid biosynthesis

Year: 2018 PMID： 30733795 PMCID： PMC6353760 DOI： 10.1016/j.jgeb.2018.06.001

Source DB: PubMed Journal: J Genet Eng Biotechnol ISSN： 1687-157X

Introduction

The Fabaceae family, commonly known as legumes, are one of the most important plant families in both economical and medicine terms. Being the third largest family of angiosperms after Orchidaceae and Asteraceae with 630 genera and over 18,860 species, this family can be found in many climates [1]. Members of the Fabaceae such as Glycine max (soybean), Phaseolus (beans), Pisum sativum (pea), Cicer arietinum (chickpeas) and Medicago sativa (alfalfa) are among the most important crops in the world, by which a widespread range of natural products (e.g., drugs, dyes, flavors and poisons) could be synthesized [2]. Triterpenes are a huge variety of natural products which produced by Legumes [3]. For example, it was possible to retrieve soyasaponins (β-amyrin-derived oleanane-type triterpenoid saponins) from different legumes, including Pisum sativum [4], Glycine max [5] and various types of the genus Medicago [6]. However, this isolation demonstrated the normal presence of soyasaponins in leguminous vegetables. Since ancient times, licorice (stolon and root of glycyrrhzia) has been recognized as one of the legumes with the most basic therapeutic features. Medicinal importance of Licorice is due to the presence of a main active component called Glycyrrhizin , an oleanane-type saponin, in a large amount (2–8% of the dry weight) in the underground parts of plant [7]. Moreover, licorice has been recognized as the most conventional Asiatic folk drugs, which is applied as an agent that reduces inflammation on neutrophil functions, such as generation of reactive oxygen species (ROS) (Rahman and Sultana, 2007). Recognized as blocking agent of lipid peroxidation chain reactions, Glycyrrhizin is a quenching agent of free radicals. Other pharmacological activities of Glycyrrhizin are antiviral against various DNA and RNA viruses including HIV, antioxidative activities, hepatoprotective effect (probably by preventing changes in cell membrane permeability) and immunomodulatory activities [8]. It is also used worldwide as a natural sweetener [9]. Furthermore, Glycyrrhizin normally transpires as potassium and calcium salt in licorice [10]. All these triterpenic compounds are synthesized via the isoprenoid pathway. Isoprenoids are derived from common C5 building blocks namely isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP). Two various pathways, including MEP (2-C-methyl-D-erythritol 4-phosphate), which happens in the plastids, and MVA (mevalonate), which functions in the cytoplasm, are applied to create these practical entities, which work as precursors to all terpenoid structures [11], [12], [13]. Squalene synthase (SQS) is the key enzyme of triterpenoid synthesis pathway. This enzyme is generally defined as an emerging and essential branch point enzyme far from the major pathway of isoprenoids biosynthetic and a latent adjusting point, which manages carbon flux into triterpenes biosynthesis and sterols [14]. The biosynthetic pathway and two-step reaction of reductive dimerization of farnesyl pyrophosphate (FPP) by SQS is shown in Fig. 1.

Fig. 1

Squalene synthase (SQS) in the triterpenoid biosynthesis pathway. Squalene synthase participates in a two-step reaction in which two identical molecules of FPP are converted into one molecule of squalene with the consumption of NADPH. In this study, we focused on squalene synthase as an important rate-limiting enzyme at the branch point of metabolic pathway FPP and other metabolic products [15]. Due to the positive relationship between the levels of triterpenes and SQS expression rate [16], [17], [18], [19], we used the bioinformatics methods to analyze the SQS protein in eight plants of Fabaceae family and A. thaliana and designed universal degenerate primers of SQS. These information can be used for gene expression analysis and other molecular applications.

Materials and methods

Retrieval of the sequences and plant materials

The amino acids sequences of SQSs (FASTA format) related to eight plants of Fabaceae family and Arabidopsis thaliana were retrieved from UniProtKB database (http://www.uniprot.org/) and documented in Table 1. As Table 1 shows, two copies of the SQS gene were reported in some studying plants. Plant materials including leaves samples of studying plants were collected from the field of Medicinal Plants and Drugs Research Institute (MPH), Shahid Beheshti University, Tehran, Iran and frozen immediately using liquid nitrogen until RNA extraction.

Table 1

Squalene synthase protein and cDNA sequences in eight plants of Fabaceae family and A. thaliana.

Plant	Copy number of the SQS	Protein registry number (UniProtKB)	Encoding amino acids length/aa
Arabidopsis thaliana	SQS1	P53799	410
Arabidopsis thaliana	SQS 2	O65688	413
Glycyrrhiza eurycarpa	SQS 1	A4F3P0	413
Glycyrrhiza eurycarpa	SQS 2	A4F3P1	412
Glycyrrhiza glabra	SQS 1	Q42760	413
Glycyrrhiza glabra	SQS 2	Q42761	412
Glycyrrhiza uralensis	SQS 1	A4F3N8	413
Glycyrrhiza uralensis	SQS 2	A4F3N9	412
Astragalus membranaceus	SQS	A0A0K2DSF0	413
Glycin max	SQS	O22107	413
Lotus japonicus	SQS	Q84LE3	413
Medicago truncatula	SQS	Q8GSL6	413
Phaseolus vulgaris	SQS	V5N8M3	413

Squalene synthase protein and cDNA sequences in eight plants of Fabaceae family and A. thaliana.

Sequence analysis

The composition and various physical and chemical parameters of squalene synthase cDNA and amino acid sequence including isoelectric point (pI), molecular weight (Mw), instability index (II) [20], aliphatic index (AI) [21] and grand average hydropathy (GRAVY) [22] of the protein were analyzed using ExPasy tool ProtParam online analysis (http://us.expasy.org/tools/protparam.html). Hydrophobic and hydrophilic features and transmembrane structure domain were analyzed using Protscale (http://web.expasy.org/cgi-bin/protscale/protscale.pl) and TMHMM (http://www.cbs.dtu.dk/services/TMHMM/). The SignalP software version 4.1 and SMART online tool were used for the signal peptide prediction of squalene synthase. (http://www.cbs.dtu.dk/services/SignalP/, http://smart.embl-heidelberg.de/). PSORT Prediction [23] can be used in subcellular localization analysis (http://psort.hgc.jp/form.html). One of the most useful tools for analyzing post-translational modifications is PROSITE, a database on the ExPASy site. It contains a list of short sequence motifs (also some named patterns) that experiments have associated with particular biological properties. Many of these patterns are associated with post-translational modifications. The motifs were identified applying PROSITE database and MOTIF search tool (http://www.genome.jp/tools/motif). Multiple alignments were conducted using T-coffee server using ClustalW method. The universal degenerate primers were designed regarding to conserved motifs and then tested by PCR analysis. Total RNA was extracted using IHTB method [17] from the leaves of the plants, then the first strand cDNA was synthesized from the isolated RNA templates by reverse transcriptase with Oligo-(dT)18 primers according to the instructions of the RevertAidTM H M-MuLV First Stand cDNA Synthesis Kit (Thermo Fisher Scientific, Fermentas). Polymerase chain reaction (PCR) was performed using degenerate primers which designed (forward 5′- GTGGCWGGACTTGTTGG -3′ and reverse 5′- CCARAACATGCGTGACTT -3′). The 25 μl reactions contained 1 μl of the synthesized cDNA, 1 μl of each primer and 12.5 μl 2X PCR master mix (Sinaclon, Iran) and 9.5 μl distilled water. The PCR conditions include initial denaturation at 94 °C for 5 min, followed by 35 cycles [94 °C for 1 min, 50 °C for 1 min, 72 °C for 30 sec] and final extension at 72 °C for 5 min. The amplified product were separated on a 1% agarose gel and visualized by ethidium bromide staining. Phylogenetic analysis was performed using the Neighbor Joining method through molecular evolution genetics analysis (MEGA) software, version 7 [24]. The evolutionary history of the taxa analyzed was presented taking the bootstrap consensus tree, which is deduced from 5000 replicates. In addition, P-distance technique was applied to calculate the evolutionary distances. Moreover, elimination of all positions, which had missing data and gaps, from the dataset was carried out in the research (full option of omission).

Secondary and 3D structure prediction

Secondary structure was predicted via SOPMA [25] and the PSIPRED server (PSIPRED V3.3) (http://bioinf cs.ucl.ac.uk/psipred/). Three-dimensional (3D) protein structure of Glycyrrhiza glabra (Q42760) was constructed as a representative of this family using MODELLER 9.18 [26] (https://salilab.org/modeller/), a system that was applied for modeling the 3D form of a protein from its sequence of amino acid applying methods of homology modeling. The sequence homology search was performed using the algorithm of protein-BLAST (blastp) against the protein databank. After that, extraction of template proteins’ 3D structure and sequence from the database of PDB was performed. After modeling, the models were evaluated by MODELLER’s normalized DOPE (Discrete Optimized Protein Energy) function. Energy minimization or geometry optimization was carried out to determine a stable conformer with Swiss PDB Viewer [27]. Structural evaluation and stereochemical analyses were done with Ramachandran plot analysis by using RAMPAGE [28]. The model having least number of residues in disallowed region was chosen as a best model. Problematic parts of the model were identified by ProSA (Protein Structure Analysis) server (https://prosa.services.came.sbg.ac.at), a tool widely used to check 3D models of protein structures for potential errors [29]. Structure visualization and structural superimposition was performed with chimera1.11.2 [30]. Applying the CASTp (calculated Atlas of surface topography of proteins) server (http://sts.bioengr.uic.edu/castp/index.php), the visualization and anticipated binding GGQS1 sites was assessed [31].

Results and discussions

Physicochemical properties

Table 2 contains the parameters calculated applying Expasy’s ProtParam tool. From the relative molecular analysis, the Q42760 had the largest relative molecular mass, Q42761 had the smallest relative molecular mass, both related to G. glabra (SQS1 & SQS2). Containing a zero net charge of protein, isoelectric point (pI) is the pH at which the amino acid is neutral, which is interpreted as the dominance of the zwitterion form. One of the important features of a protein is its pI, where the protein has the least solubility, making it unsteady. It is noteworthy that the protein is soluble in both above and below the isoelectric point (isoelectric pH). Another benefit of calculation of pI is its use in the development of buffer system in order to carry out purification process using the technique of isoelectric focusing. In an electro-focusing system, mobility is zero at pI. While a pI value below seven (pI < 7) demonstrates the zero net electrical charge of proteins at this pH value, the pI value above seven (pI > 7) indicates the zero net electrical charge at basic value of pH [32], [33] In theoretical isoelectric point value analysis, O22107 was the minimum (6.13), A4F3P1 was the maximum (8.21). Negatively charged residues (D + E) and positively charged residues (R + K), were basically the same range (47–53). This is helpful in determining the topology of protein as stated by Krogh et al [34]. Total atoms of Q42761 (6605) and O65688 (6703) were the minimum and maximum respectively. The amount of light absorbed by protein at a specific wavelength is indicated by the extinction coefficient. Calculation of this coefficient is beneficial to follow a protein with a spectrophotometer during purification. It should be noted that the range of this parameter was 0.9–1.229.

Table 2

Physical and chemical properties of the amino acid encoding sequence of squalene synthase in eight plants of Fabaceae family and A. thaliana.

UniProtKB	A	B	C	D	E	F	G	H	I
P53799	6.18	53	49	6632	1.229	30 h	43.28	93.93	−0.129
O65688	8.13	50	53	6703	0.987	30 h	35.87	101.5	−0.055
A4F3P0	7.93	49	51	6651	0.9	30 h	42.69	96.13	−0.085
A4F3P1	8.21	47	50	6620	1.051	30 h	34.98	93.76	−0.077
Q42760	7.56	50	51	6670	1.014	30 h	41.48	96.13	−0.086
Q42761	6.89	49	48	6605	1.052	30 h	37.61	94	−0.071
A4F3N8	7.93	48	50	6652	1.016	30 h	42.26	96.13	−0.078
A4F3N9	7.57	49	50	6625	1.051	30 h	36.62	95.19	−0.081
A0A0K2DSF0	6.7	49	47	6647	1.047	30 h	42.51	95.42	−0.036
O22107	6.13	53	47	6627	1.049	30 h	42.38	93.75	−0.087
Q84LE3	7.95	50	52	6640	1.05	30 h	36.79	94.94	−0.089
Q8GSL6	7.54	51	52	6625	1.05	30 h	38.45	91.84	−0.142
V5N8M3	7.17	49	49	6653	1.014	30 h	41.43	96.56	−0.015

A: Theoretical isoelectric point B: Negatively charged residues (Asp + Glu) C: Positively charged residues (Arg + Lys) D: Total number of atoms E: Extinction coefficient F: Half-life period G: Instability index H: Aliphatic index I: Grand average of hydropathicity (GRAVY).

Physical and chemical properties of the amino acid encoding sequence of squalene synthase in eight plants of Fabaceae family and A. thaliana. A: Theoretical isoelectric point B: Negatively charged residues (Asp + Glu) C: Positively charged residues (Arg + Lys) D: Total number of atoms E: Extinction coefficient F: Half-life period G: Instability index H: Aliphatic index I: Grand average of hydropathicity (GRAVY). The half-life period was exactly the same. The exact amount of time required by half of the protein in a cell to vanish after synthesis in the cell is predicted by the half-life. A relationship between a protein’s half-life and identity of its N-terminal residue is established by ProtParam, which depends on the “N-end rule” [35]. According to stability coefficients O65688, A4F3P1, Q42761, A4F3N9, Q84LE3, Q8GSL6 were stable proteins and the remaining were non-stable proteins. This results show that SQS2 is stable-protein in all plants. While a protein with the unsteadiness index of <40 is anticipated to be constant, index higher than 40 anticipates the possibility of instability of the protein. A protein’s aliphatic indicator is described as the comparative capacity filled by aliphatic side chains, including leucine, alanine, isoleucine and valine. This indicator might be considered a positive variable for elevation of globular proteins’ thermos-stability. The formula by Atsushi is applied to compute the aliphatic indicator of protein [21]. For Q8GSL6 and Q65688, this indicator was 91.84 to 101.5, respectively. In addition, Kyte and Doolittle [22] established the scale of hydropathy, where attention is paid to the hydrophobic features of a protein’s amino acid side chains. In a protein separated by its total residue number, estimation of GRAVY score is carried out by considering it as the sum of the hydropathy values for a protein’s amino acids. While positive value demonstrated the bipolar nature of protein, negative GRAVY value indicated that the protein is non-polar. Moreover, the range of −0.142 (Q8GSL6) to -0.015 (V5N8M3) was determined for the GRAVY index. The possibility of improved collaboration with water is demonstrated by this low range.

Hydropathy plot and transmembrane domain analysis

The hydrophilic or hydrophobic properties of the side-chain of an amino acid are presented by its hydropathy index [22]. A hydrophobicity score within the range of −4.5 to 4.5 is allocated to each amino acid, where the −4.5 score is indicative of most hydrophilic character and 4.5 determines the most hydrophobic feature. The amino acid sequences of SQSs were analyzed using Kyte-Doolittle Hydropathy plot of Protscale tool at window size 19. In SQSs two transmembrane regions were observed in the C-terminal (Fig. 2a). Presence of transmembrane domains was also predicted on TMHMM server (http://www.cbs.dtu.dk/services/TMHMM/). Two domains were found in all member of Fabaceae family which only is showed for G. glabra in Fig. 2b, the first was from AA282 to AA304 for SQS1 and AA281 to AA304 for SQS2; the second was from AA388 to AA408 for SQS1 and AA385 to AA407 for SQS2, showing that the two transmembrane domains were compose of 20–23 amino acid residues in all members of Fabaceae, which is a binding site of membrane protein and membrane lipids.

Fig. 2

(a) Hydrophobicity and hydrophilicity of SQS amino acid sequence from G. glabra. (Arrows indicate highly hydrophobic regions). (b) Transmembrane domains of SQS gene from G. glabra.

Signal peptide and subcellular localization features of squalene synthase

The most significant first part of comprehension of the function of a protein is discovering its subcellular localization. Online tools SignalP and SMART were used to predict amino acid signal peptide in all studying plants. The graphical output from SignalP (Fig. 3a) showed the three different scores, C, S and Y, for each position in the sequence. These indexes of signal peptide were lower than its threshold, indicating that No signal peptides were predicted for all squalene synthase proteins of this family and protein transport regulates by other factor. Recognition and interpretation of domains, that are genetically mobile, and evaluation of their architectures are possible by SMART (a Simple Modular Architecture Research Tool). There are broad explanations for these domains according to their tertiary structures, phyletic distributions, practically significant residues and practical level. The results of SMART tool showed that there were two functional domains in SQS protein that was coincidence with the prediction of transmembrane structure. The results are briefly explained in the SMART diagram presented in Fig. 3b. It is noteworthy that obtaining scores, which were less significant, compared to the determined cut-off point, led to the elimination of some domains from the diagram.

Fig. 3

(a) Signal peptide prediction in G. glabra SQS protein by SignalP (SQS1: the left & SQS2: the right). (b) Schematic representation of transmembrane regions of G. glabra SQS protein by SMART (SQS1: the left & SQS2: the right).

Homologous alignment and phylogenetic analysis of SQS:

Multiple Sequence Alignment (MSA) analysis of SQS domain resulted in two conserved motifs. Motif 1 ‘‘YCHYVAGLVGLGLSKL’’ which was 16 residue long and motif 2 ‘‘MGLFLQKTNIIRDYLEDINEIPKSRMFWP’’ which was 29 residue long. These conserved motifs were present in all proteins considered for the study. According to previous reports, these motifs sequences are found in different plants with numerous residues replacement [36], [37], [38], [39]. Regarding the sequence of protein in all Fabaceae plants, there were two ranges of 168–183 and 201–229 for the motifs’ positions in motif 1 and 2, respectively. These positions are the same reported in W. somnifera [38]. These two conserved motifs were considered to design the universal degenerate primers. The efficiency of the designed primers in amplifying fragment of SQS gene was tested using RT-PCR. The fragment of 171 bp of SQS was successfully amplified for all plants (Fig. 4) which indicated the primers worked well and they can be used for molecular applications especially real time PCR.

Fig. 4

The results of RT-PCR for SQS gene (A), Glycyrrhiza eurycarpa (B), Glycyrrhiza glabra (C), Glycyrrhiza uralensis (D), Astragalus membranaceus (E), Glycin max (F) Lotus japonicus (G) Medicago truncatula (H) Phaseolus vulgaris. Amplification products were separated on a 1% agarose gel. Phylogenetic tree was constructed for all Fabaceae species and A. thaliana to investigate the evolutionary relations (Fig. 5). According to molecular evidence, in some plants there is only one copy of the SQS gene in the genome, while other plants have multiple copies in their genome. For example, two or three SQS genes were reported in G. glabra, N. tabacum, A. thaliana, and p. genseng [40]. The results indicated that, they can be divided into two groups: SQSs of Fabaceae family were in Group 1; SQS1 and SQS2 of A. thaliana were in Group 2. Based on phylogeny, despite phylogenetic divergences at primary sequence level between Fabaceae species and A. thaliana SQSs, the homology was high within all plants, suggesting SQSs were derived from one ancestor gene and it has similar function in squalene biosynthesis. These data also showed that SQS1s of Fabaceae family were closely related to each other as compared to SQS2s, which showed phylogenetic divergence between SQS1 and SQS2.

Fig. 5

Phylogenetic relationship of SQS eight plants of Fabaceae family and A.thaliana. Numbers above the branches are the Neighbor Joining bootstrap values. The evolutionary distance is reflected by the numbers below the branches and the branch lengths proportional to the degree of amino acid substitutions.

The advanced structures of SQS

Online tools SOPMA predicted SQS secondary structure in all studying plants, the results indicated SQS secondary structure is composed of α helix, random coil, the extended strand and Beta turn. Among them, α helix and random coil were the largest structural elements, extension chain and Beta turn scattered distribution in the entire protein. In this regard, our findings are in line with the results previously reported in SQS of ginseng species [41] and W. somnifera [38] that the distribution of alpha helix and random coils are comparatively higher. The secondary structure of SQS was also predicted by the PSIPRED, Protein Structure Prediction Server (PSIPRED V3.3) which verified SOPMA results. The modeling of the three dimensional structure of the protein was performed by MODELLER for G. glabra (Q42760). We tried to obtain the best model from multiple templates. To select the appropriate templates for our query sequence, Six templates (1EZF, 3LEE, 3VJ8, 3WEK, 3WEJ and 3WEI (chain: A)) having maximum identity and lowest e value, belonging to the crystal structure of human Squalene Synthase with 48% sequence identity were considered as the best hits and ten models were created using this program. But only 4 models were selected based on DOPE score. Before modeling the residues in regions with no homology with the templates and hydrophobic regions were deleted. Ramachandran plot, a special way for plotting protein torsion angles, was used to assess the stereochemical quality of modeled protein structure by Rampage tool. The results revealed that the modeled structures have 96.9%, 2.8% and 0.3% of the residues in favoured, allowed and outlier regions respectively (Fig. 6). Efficient stereochemical accuracy is indicated by this spread of the dihedral angles (ϕ,ψ) in the enzyme residues of the plot.

Fig. 6

(a) The 3D modelstructure of GgSQS1. Motif 1 and 2 are shown in yellow and pink colour, respectively. (b) 3D model validation of the model by using the Ramachandran plots (GgSQS1).

(a) The 3D modelstructure of GgSQS1. Motif 1 and 2 are shown in yellow and pink colour, respectively. (b) 3D model validation of the model by using the Ramachandran plots (GgSQS1). The total quality of the 3D model was assessed applying ProSA-web (z-score). The method of ProSA can accurately separate the total correct fold and those having incorrect folds [42]. The two significant outputs of this method include a plot of residue energies and Z-score. The ProSA Z-score of a model is the extent of similarity between its structure and sequence. It is preferred for the model to have the Z-score that can be compared to the template’s Z-score. The overall model quality Z-Score from ProSA-web was shown in Fig. 7a and b, where distinguishing colors are used to show the groups containing structures that are explained by NMR or X-ray. In addition, the Z-score of the model was −8.62, which is often observed in the range of scores of native proteins that have the same dimension. ProSA displayed quite similar overall model quality (Z-score) for homology model (−8.62) and template (−9.71). The plot of residue scores was showed in Fig. 7c. The local model quality can be shown by this plot of energy through maneuvering energies as a proposed to sequence position of amino acid. Generally, the inaccurate or problematical sections of the model is corresponded by positive values.

Fig. 7

(a,b) Z-score of PROSA energy indicating overall model quality and its value displaying in a plot that contained the Z-scores of all experimentally determined protein chains in current PDB (a: homology model of GgSQS1, b: template). (c) PROSA energy plot calculated for GgSQS1 model. The interaction energy for each residue of the structures was assessed by PROSA energy plot with the application of a distance-based pair potential residue with PROSA energies (negative), which confirmed the reliability of the model. After analyzing the model by PROSA, negative interaction energy of all residues was confirmed, where no residue had positive interaction energy. Finally, the model was also confirmed by superposition and the RMSD found to be 0.394 A0 (Fig. 8).

Fig. 8

Superposition of optimized homology model (Red) against the structure of selected template (1EZF) (yellow) by using Chimera.

Superposition of optimized homology model (Red) against the structure of selected template (1EZF) (yellow) by using Chimera. Usually, there are cavities and pockets created by active and bindings sites of proteins. Interior inaccessible cavities and surface accessible pockets for proteins are provided with measurements and recognition via CASTp. Predicted binding sites for GgSQS1 were M43-K46-V47-S48-R49-S50-F51-V54-I55-V66-F69-Y70-L73-R74-D77-T78-E80-D81-D82-T83-I85-A86-T87-K90-T112-K113-E114-T115-V117-L118-M147-M151-F154-I155-N157-E158-V159-T168-V172-A173-A176-G177-L180-S181-F184-S198-M201-G202-L205-Q206-T208-N209-I210-R212-D213-L215-E216-D217-E220-I221-T223-S224-R225-M226-F227-T273-F285-C286-I287-P288-Q289-M291-A292-T296-V306-G309-V310-K312-M313-R314-R315-T318 (Fig. 9).

Fig. 9

3D structure and predicted binding sites of GgSQS1, predicted by using CASTp servers. Green colors indicate the predicted active site residues of SQS.

Conclusion

In the present research, bioinformatics tools were applied to evaluate the SQS proteins from the Fabaceae family. The secondary structure prediction results showed that the amino acid sequence of all studying SQSs had α helix and random coil as the main components. Multiple sequence alignment showed that SQSs had a high identity with each other. The phylogenetic analysis of SQS demonstrated a difference between the sequences of Arabidopsis SQS and other studies Fabaceae species, which might be due to differences in the functions of SQS sequences in triterpenoid saponins or phytosterol pathways. According to the obtained information, SQS proteins were efficiently preserved in the structure and sequence both, which might be due to their biological function. The homology model of GgSQS1 was built using high homologous proteins. The generated model was subjected to various validation methods which proved that it held significant degree of authenticity. The model created with server of CASTp and template were structurally compared, which was followed by searching of the probable binding sites of SQS. In addition, the similar performance of SQS in squalene biosynthesis was proposed as a result of the high preserving of residues (included in the activity and binding for performance of SQS) in most SQSs. Valuable theoretical information can be provided by the physicochemical and structural data anticipated in the presented study in order to functionally assess this protein. Moreover, following biotechnological studies can use the results of the current research in order to manipulate the triterpenoid saponins biosynthesis pathway.

31 in total

1. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19 Impact factor: 5.469

2. The effect of net charge on the solubility, activity, and stability of ribonuclease Sa.

Authors: K L Shaw; G R Grimsley; G I Yakovlev; A A Makarov; C N Pace
Journal: Protein Sci Date: 2001-06 Impact factor: 6.725

3. Modeller: generation and refinement of homology-based protein structure models.

Authors: András Fiser; Andrej Sali
Journal: Methods Enzymol Date: 2003 Impact factor: 1.600

4. Structure validation by Calpha geometry: phi,psi and Cbeta deviation.

Authors: Simon C Lovell; Ian W Davis; W Bryan Arendall; Paul I W de Bakker; J Michael Word; Michael G Prisant; Jane S Richardson; David C Richardson
Journal: Proteins Date: 2003-02-15

5. UCSF Chimera--a visualization system for exploratory research and analysis.

Authors: Eric F Pettersen; Thomas D Goddard; Conrad C Huang; Gregory S Couch; Daniel M Greenblatt; Elaine C Meng; Thomas E Ferrin
Journal: J Comput Chem Date: 2004-10 Impact factor: 3.376

6. Overexpression of squalene synthase in Eleutherococcus senticosus increases phytosterol and triterpene accumulation.

Authors: Jin-Wook Seo; Jae-Hun Jeong; Cha-Gyun Shin; Seog-Cho Lo; Seong-Soo Han; Ki-Won Yu; Emiko Harada; Jeong-Yeon Han; Yong-Eui Choi
Journal: Phytochemistry Date: 2005-04 Impact factor: 4.072

7. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence.

Authors: O Emanuelsson; H Nielsen; S Brunak; G von Heijne
Journal: J Mol Biol Date: 2000-07-21 Impact factor: 5.469

Review 8. Biosynthetic diversity in plant triterpene cyclization.

Authors: Dereth R Phillips; Jeanne M Rasbery; Bonnie Bartel; Seiichi Pt Matsuda
Journal: Curr Opin Plant Biol Date: 2006-04-03 Impact factor: 7.834

9. Enhanced triterpene and phytosterol biosynthesis in Panax ginseng overexpressing squalene synthase gene.

Authors: Mi-Hyun Lee; Jae-Hun Jeong; Jin-Wook Seo; Cha-Gyun Shin; Young-Soon Kim; Jun-Gygo In; Deok-Chun Yang; Jae-Seon Yi; Yong-Eui Choi
Journal: Plant Cell Physiol Date: 2004-08 Impact factor: 4.927

10. CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues.

Authors: Joe Dundas; Zheng Ouyang; Jeffery Tseng; Andrew Binkowski; Yaron Turpaz; Jie Liang
Journal: Nucleic Acids Res Date: 2006-07-01 Impact factor: 16.971

4 in total

1. In silico functional and structural characterization of hepatitis B virus PreS/S-gene in Iranian patients infected with chronic hepatitis B virus genotype D.

Authors: Nastaran Khodadad; Seyed Saeed Seyedian; Afagh Moattari; Somayeh Biparva Haghighi; Roya Pirmoradi; Samaneh Abbasi; Manoochehr Makvandi
Journal: Heliyon Date: 2020-07-15

2. In silico structural modeling and analysis of physicochemical properties of curcumin synthase (CURS1, CURS2, and CURS3) proteins of Curcuma longa.

Authors: R Santhoshkumar; A Yusuf
Journal: J Genet Eng Biotechnol Date: 2020-07-02

3. Molecular Cloning and Functional Identification of a Squalene Synthase Encoding Gene from Alfalfa (Medicago sativa L.).

Authors: Junmei Kang; Qiaoyan Zhang; Xu Jiang; Tiejun Zhang; Ruicai Long; Qingchuan Yang; Zhen Wang
Journal: Int J Mol Sci Date: 2019-09-11 Impact factor: 5.923

4. Structural and Molecular Characterization of Squalene Synthase Belonging to the Marine Thraustochytrid Species Aurantiochytrium limacinum Using Bioinformatics Approach.

Authors: Sachin Vyas; Maurizio Bettiga; Ulrika Rova; Paul Christakopoulos; Leonidas Matsakas; Alok Patel
Journal: Mar Drugs Date: 2022-02-28 Impact factor: 5.118

4 in total