
Molecular cloning and in-depth bioinformatics analysis of type II ribosome-inactivating protein isolated from Sambucus ebulus.

Masoumeh Rezaei-Moshaei¹, Ali Bandehagh¹, Ali Dehestani², Ali Pakdin-Parizi², Majid Golkar³.

Abstract

Plant ribosome-inactivating proteins (RIPs) are N-glycosidases that inhibit protein synthesis by depurinating ribosomal RNA. Type II RIPs are heterodimeric proteins that can bind to cell surfaces, and their cytotoxicity varies widely. Sambucus spp. are a rich source of RIPs with different properties. In the present study, a type II RIP was isolated from S. ebulus, a plant that grows widely in the north of Iran, and different bioinformatics tools were used to evaluate its physicochemical, functional and 3D structural characteristics. The results showed significant differences between the isolated RIP and other Sambucus RIPs. Studying these differences can not only expand our insight into the functioning mechanisms of plant RIPs but also provide information about a novel RIP with potential biological applications.


Keywords: Physicochemical properties; Protein modeling; Recombinant RIP

Year: 2020 PMID: 32489302 PMCID: PMC7253926 DOI: 10.1016/j.sjbs.2020.02.009

Source DB: PubMed Journal: Saudi J Biol Sci ISSN: 2213-7106 Impact factor: 4.219

Introduction

Ribosome-inactivating proteins (RIPs) are toxins that are widely distributed in nature and are found mainly in plants, bacteria, fungi, and some algae. These proteins specifically and irreversibly inhibit protein translation by binding to the large ribosomal subunit and cleaving a specific adenine base from 28S rRNA through their N-glycosidase activity (Barbieri et al., 1993, Girbes et al., 2004, Stirpe, 2004). However, the biological role of RIPs in plants is unclear (Lee-Huang et al., 1990). Existing evidence suggests that they might be involved in protecting plants against insects (Arias et al., 1992), fungi (Munoz et al., 1991) and viruses (Merino et al., 1990).

RIPs have been classified into three structural classes. Type I RIPs, such as saporins, dianthins, and beetins, consist of a single polypeptide chain (approximately 30 kDa) with strong N-glycosidase activity. Type II RIPs, such as ricin and abrin, are heterodimeric proteins (60–65 kDa) with an enzymatic A chain similar to type I RIPs, linked by a disulfide bond to a slightly larger B chain with lectin properties (Lord et al., 1994). Because the B chain binds to sugar units of cell surface receptors, the functional A chain of type II RIPs is transferred into host cells more readily than in other RIP types (Mathews et al., 2007). Type III RIPs are composed of a polypeptide chain containing a type I RIP in the N-terminal region and a C-terminal domain of unknown function (Jimenez et al., 2014). Type III RIPs are also synthesized as inactive precursors that are converted into an active form by proteolytic processing (Peumans et al., 2001).

Sambucus ebulus (dwarf elder) is an annual plant of the Adoxaceae family. This family of flowering plants includes about 190 species in four genera and is distributed mainly in southern and central Europe, Northwest Africa, and Southwest Asia (Westwood, 1985). S. ebulus extracts show high antioxidant, anti-inflammatory, anti-arthritic, anti-nociceptive, antimicrobial, and anticancer activities (Schwaiger et al., 2011, Tasinov et al., 2012, Tasinov et al., 2013). It has been suggested that part of the biological activities of the plant extracts, e.g., anticancer properties, are possibly connected to RIPs (Benitez et al., 2005, Shokrzadeh et al., 2009). Sambucus species contain a complex mixture of different types of RIPs and related lectins. Heterodimeric type II RIPs are present in the leaves (Girbes et al., 1993), rhizome (Citores et al., 1997) and fruits (Nikolov, 2007) of S. ebulus. Ebulin l, the first RIP isolated from the dwarf elder, is non-toxic as compared with the highly toxic ricin (Stirpe et al., 1992, Girbes et al., 1993). The structure of ebulin l has been characterized by X-ray diffraction analysis, and it is very similar to type I RIPs and to other type II RIPs such as ricin (Pascal et al., 2001).

Due to the diverse biological activities of RIPs, extensive research has been done to investigate their use as antiviral and antitumor agents (Zhu et al., 2016, Sipahioglu et al., 2017). The use of RIPs in conjugates is their most promising application in medicine, especially in cancer therapy. In these conjugates, the enzymatic RIPs are attached to tumor-targeting ligands or antibodies that mediate their binding and entry into malignant cells. Since plant-derived drugs can be more efficient and more cost-effective than synthetic drugs, new drug candidates isolated from plants should be identified and designed to produce drugs with minimal side effects (Kumar et al., 2010). In this regard, a RIP II gene was isolated from native S. ebulus plants in the north of Iran, and its primary and secondary structures and its physicochemical and functional properties were evaluated with various bioinformatics tools. The results can be useful for a comprehensive understanding of the nature and mechanism of action of RIPs as well as for the accurate design of new drug candidates from plant sources.

Material and methods

Plant material and gene cloning

Sambucus ebulus plants were collected from the north of Iran (36°39.5′N 53°4.3′E) during July 2017. Fresh leaves were finely ground and used for DNA extraction according to a CTAB method (Doyle and Doyle, 1987). A forward primer (5′-ATGATAGACTATCCCTCCGTC-3′) and a degenerate reverse primer (5′-CTAAMTTKGARGGACTTGTGT-3′) were designed from the RIP gene sequences deposited in GenBank, NCBI (AJ400822.1, AF249280.1, U41299.1 and AF409135.1), using the Primer Premier software. PCR amplification was carried out with the following cycling parameters: 35 cycles of 94 °C for 1 min, 68 °C for 1 min, and 72 °C for 2 min, with a final extension at 72 °C for 10 min. PCR products were subjected to electrophoresis in a 1% (w/v) agarose gel, and DNA fragments of the expected length were extracted and purified with the High Pure PCR Product Purification Kit (Cat. No. 11732668, Roche). The purified DNA fragments were cloned into the pTZ57R/T vector (Thermo Scientific, InsTAclone PCR Cloning Kit) and sequenced with universal M13 primers.
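
The degenerate reverse primer contains IUPAC ambiguity codes (M = A/C, K = G/T, R = A/G), so it stands for a small pool of concrete oligonucleotides. As an illustration only, the following sketch enumerates that pool; the helper name `expand_degenerate` is hypothetical and is not part of the authors' workflow.

```python
from itertools import product

# IUPAC nucleotide codes needed for this primer (the four bases plus M, K, R)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC",  # amino: A or C
    "K": "GT",  # keto: G or T
    "R": "AG",  # purine: A or G
}

def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

reverse_primer = "CTAAMTTKGARGGACTTGTGT"  # reverse primer from the text
variants = expand_degenerate(reverse_primer)
print(len(variants))  # three two-fold degenerate positions
for variant in variants:
    print(variant)
```

With three two-fold degenerate positions, the pool contains 2 × 2 × 2 = 8 sequences of 21 nt each.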

Nucleotide sequence accession number

The RIP II gene sequence obtained in this study is available in GenBank under accession number MH053462. The isolated RIP gene was named pebulin, in which the initial p stands for Persian.

Gene ontology

For gene ontology and gene annotation of the ebulin protein, the keyword “ribosome-inactivating protein” was searched as a gene product in QuickGO (ebi.ac.uk/QuickGO/). The ancestor chart of the related GO term was also obtained.

Conserved domains search

Conserved domains within pebulin were identified using the NCBI Conserved Domain Database (CDD, National Institutes of Health, Maryland, USA).

Primary protein structure and physicochemical properties analysis

The nucleotide sequence of the isolated gene was translated into a protein sequence with the standard genetic code using the ExPASy Translate tool. The sequence similarity search was done with the blastp algorithm of the BLAST tool (blast.ncbi.nlm.nih.gov). Sequences of Sambucus spp. RIP II proteins were retrieved from the UniProt database (Q9AVR2, Q41358, P33183 and O22415). The primary structure analysis and physicochemical characterization of the proteins were conducted with the ExPASy ProtParam tool (http://expasy.org/tools/protparam.html) (Gasteiger et al., 2005).
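
Translation with the standard genetic code can be sketched in a few lines. The example below is a minimal illustration, not the ExPASy implementation; the function name `translate` is hypothetical, and the input is the forward primer given in the Plant material and gene cloning section, used here only because it is a short in-frame sequence from the source.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG ordering
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def translate(dna):
    """Translate a DNA sequence in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

forward_primer = "ATGATAGACTATCCCTCCGTC"
print(translate(forward_primer))  # MIDYPSV
```

After the initiator Met, the translated residues (IDYPSV) coincide with the first residues of the pebulin A chain listed in Table 1.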

Protein solubility upon overexpression in E. coli

The probability of protein solubility upon overexpression in E. coli was predicted with the SOLpro tool (http://scratch.proteomics.ics.uci.edu/) (Magnan et al., 2009).

Subcellular localization and transmembrane protein prediction

The presence of N-terminal pre-sequences in the protein sequence was investigated with TargetP 1.1 (http://www.cbs.dtu.dk/services/TargetP/) to predict the subcellular location of the protein (Emanuelsson et al., 2007). Furthermore, TMHMM Server v. 2.0 (http://www.cbs.dtu.dk/services/TMHMM/) was used to predict transmembrane helices in the protein (Krogh et al., 2001).

Disulfide bond prediction

DIpro (http://scratch.proteomics.ics.uci.edu/), a cysteine disulfide bond predictor, was used to predict the disulfide bonds of the sequence, to estimate the number of disulfide bonds, and to predict the bonding state of each cysteine and the bonded pairs (Cheng et al., 2006).

Post-translational modification prediction

Several post-translational protein modifications, including ubiquitination, phosphorylation, methylation, and glycosylation, were predicted for pebulin. The UbPred tool was used to predict potential ubiquitination sites in the pebulin protein (Radivojac et al., 2010). The NetPhos 3.1 server was used to predict phosphorylation sites at threonine, serine and tyrosine residues of the protein (Blom et al., 1999). Potential arginine methylation sites and glycosylation sites were predicted using the PRmePRed tool (bioinfo.icgeb.res.in/PRmePRed) (Kumar et al., 2017) and the NetNGlyc 1.0 server (cbs.dtu.dk/services/NetNGlyc/) (Gupta et al., 2004), respectively.

Secondary protein structure analysis

The secondary structure of the RIP II protein chains was predicted using the GOR IV tool (Garnier et al., 1998), and the percentages of alpha helices, extended strands, beta-turns, and random coils were determined.

Protein modeling

Because an S. ebulus protein structure was available as a template, the SWISS-MODEL server (https://swissmodel.expasy.org/) was used to build pebulin protein models. The predicted models were validated with the generated Ramachandran plots.

Protein-ligand interactions

The Protein-Ligand Interaction Profiler (PLIP) was used to analyze non-covalent interactions (hydrogen bonds, water bridges, salt bridges, halogen bonds, hydrophobic interactions, π-stacking, π-cation interactions, metal complexes) in protein-ligand complexes from a custom pebulin PDB file (Salentin et al., 2015).

Protein chains interaction prediction

The COCOMAPS tool (molnac.unisa.it/BioTools/cocomaps/) was applied to analyze the interactions between the protein chains (Vangone et al., 2011). The threshold distance for selecting interacting residues was left at the default value (8 Å).
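
The idea of a distance threshold for selecting interacting residues can be sketched as follows. This is a simplified illustration and not the COCOMAPS algorithm: it assumes one representative coordinate per residue, and the function name `interface_pairs`, the residue labels and the coordinates are all made up for the example.

```python
import math

def interface_pairs(chain_a, chain_b, cutoff=8.0):
    """Return residue pairs (one from each chain) closer than `cutoff` angstroms.

    chain_a, chain_b: dict mapping a residue label to an (x, y, z) tuple.
    """
    pairs = []
    for name_a, xyz_a in chain_a.items():
        for name_b, xyz_b in chain_b.items():
            if math.dist(xyz_a, xyz_b) <= cutoff:
                pairs.append((name_a, name_b))
    return pairs

# Hypothetical coordinates, for illustration only
chain_a = {"A1": (0.0, 0.0, 0.0), "A2": (30.0, 0.0, 0.0)}
chain_b = {"B1": (0.0, 0.0, 5.5), "B2": (0.0, 20.0, 0.0)}

print(interface_pairs(chain_a, chain_b))  # only A1/B1 lie within 8 angstroms
```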

Prediction of geometric and topological properties of protein structures

The Computed Atlas of Surface Topography of proteins (CASTp) tool was used to locate, delineate, and measure the geometric and topological properties of the pebulin protein structure (Wei et al., 2018).

Phylogenetic analysis

The sequence of each protein chain was compared separately against the Protein Data Bank (PDB) database using the DELTA-BLAST algorithm. Sequences producing significant alignments with an E-value better than the threshold (0.05) after several iterations were selected, and multiple sequence alignment (MSA) was performed with the Clustal Omega program. Phylogenetic analyses and tree construction were conducted with MEGA 6.

Results and discussion

Gene cloning and sequence analysis of the pebulin gene

The pebulin gene (1641 bp) was isolated from S. ebulus leaves using PCR with specific primers. The purified PCR product was cloned into the pTZ57R/T vector and transformed into E. coli DH5α competent cells. Cloning was confirmed by restriction enzyme digestion and colony PCR. The results of PCR and digestion are shown in Fig. 1. After sequencing with the M13 universal primers, the validated full-length sequence of the pebulin gene was submitted to the GenBank database (MH053462). The obtained sequence was used for subsequent bioinformatics analyses.

Fig. 1

(a) PCR product of pebulin gene. Lane 1: GeneRuler 1 kb DNA Ladder (Thermo Fisher Scientific); Lane 2: pebulin gene amplicon with predicted size, 1641 bp. (b) Gel electrophoresis analysis of recombinant plasmids by BamHI/XhoI restriction enzyme digestion on a 1% agarose gel. Lane 1: GeneRuler 1 kb DNA Ladder; Lanes 2, 3: digested pTZ-pebulin recombinant plasmid; Lanes 4, 5: undigested pTZ-pebulin recombinant plasmid.

Two GO terms were obtained by searching “ribosome-inactivating protein” as the gene product in QuickGO. The biological process of RIPs is negative regulation of translation (GO:0017148), which prevents or reduces the rate of protein synthesis. The other GO term (GO:0030598) describes the molecular function of RIPs, which catalyze the hydrolysis of an N-glycosidic bond at an adenine base in 28S rRNA. As shown in Fig. 2, GO:0017148 is related to the negative regulation of biological processes such as cellular protein metabolic process (GO:0032269), cellular macromolecule biosynthetic process (GO:2000113) and gene expression (GO:0010629).

Fig. 2

Simplified ancestor chart for pebulin (ebulin isolated from Persian S. ebulus), GO:0017148.

Protein conserved domain and chain determination

The amino acid sequences of the pebulin A and B chains were delimited by comparing the protein sequence with the S. ebulus type II RIP chains (UniProtKB: Q9AVR2) previously determined by X-ray crystallography (Fig. 3). The A and B chains were 254 and 267 aa long, respectively, with a 24 aa intermediate sequence between them (Table 1). Analysis of the conserved regions using the CDD suite showed that all three type II RIPs of Sambucus spp. shared similar conserved domains at identical locations: one domain on the A chain belonging to the RIP superfamily (cl08249) and two domains on the B chain belonging to the Ricin-B-lectin superfamily (cl26069). These results suggest that the proteins are highly conserved and likely share the same activities. A schematic representation of the pebulin conserved domains is shown in Fig. 4.

Fig. 3

Structure of ebulin (S. ebulus) RIP II protein chains (UniProtKB: Q9AVR2).

Table 1

The sequence and length of pebulin protein chains.

Chain	Sub-chain	Length (aa)	Sequence
A		254	IDYPSVSFNLTGAKWTTYRDFIKDLRQIVANGTYEVNGLPVLRRENEVQEKNRFVLVLLTNYNGDTVTLAVDVTNLYVVAFMANGTSYFFNDTTPLERNNLFRETTQHILPYTGNYEHLERAARSTRESTNLGPDPLDEAITTLWYNGSIARSLLVVIQMVSEAARFGYIEQEIRRSIRKQVCFTPSALMLSMENNWSSMSLEVQQSGDNVSPFSGTVQLQNYNHTLRLVDNFEELYQITGIAILLFRCVSPRS
B	B1	136	DGETCPVAASFTKRISGGRDGLCVDVRNGYDTDGTPIQLFPCGSEKNQQWTFYKDGTTRSMGKCMTANGLNSGSSIMTFNCDTAVENATKWALPIDGSIINPSSGRVITAPSAASRTTLLLDNNIHAASQGWTVSN
B	B2	131	DVQPIVTSIVGYNETCLQANGENNRVWMEDCEITSLQQQWVLFGDRTIRVNSDRGLCVTSNGYSSKDLIIILKCQGLASQSWLFNSDGTIVNLNATLVMDVKQSDVSLRQIIIVPPTGNPNQQWRTQVPQI
Intermediate sequence		24	SSSYCNDKALRMPLVLAGEDNKYN
Signal peptide		25	MRVVKAAMLYLHIVVLAIYSVGIQG

Fig. 4

Conserved domains of pebulin protein.


Gene and protein sequence identity

The identities of the gene and protein sequences of Sambucus spp. type II RIPs are shown in Table 2. The protein sequence identity of pebulin with the S. ebulus (CAC33178.1) and S. nigra (AAC15886.1) protein sequences was 78.1% and 79.8%, respectively. The identity of the pebulin B chain with the S. nigra B chain was 81.2%, and the B1 sub-chains were more similar (83.8%) than the B2 sub-chains (78.6%). Such sequence differences contribute to the specificity of each RIP and may explain the contrasting results reported for RIPs in previous research. The ribosome-binding specificity of each RIP and the differences between ribosomes among various species determine the efficiency of a RIP in blocking ribosome activity (Domashevskiy and Goss, 2015).

Table 2

The nucleotide and amino acid sequence identity of Sambucus spp. Type II RIPs and sequence identity comparison of protein chains.

Sequence (accession)	Pebulin, nucleotide	Pebulin, protein	S. ebulus, nucleotide	S. ebulus, protein	S. nigra, nucleotide	S. nigra, protein
Pebulin (QAU19548.1)	ID	ID	0.882	0.781	0.895	0.798
S. ebulus (CAC33178.1)	0.882	0.781	ID	ID	0.904	0.840
S. nigra (AAC15886.1)	0.895	0.798	0.904	0.840	ID	ID

Chain	Pebulin, A	Pebulin, B (B1, B2)	S. ebulus, A	S. ebulus, B (B1, B2)	S. nigra, A	S. nigra, B (B1, B2)
Pebulin	ID	ID	0.771	0.790 (0.794, 0.786)	0.783	0.812 (0.838, 0.786)
S. ebulus	0.771	0.790 (0.794, 0.786)	ID	ID	0.889	0.793 (0.851, 0.732)
S. nigra	0.783	0.812 (0.838, 0.786)	0.889	0.793 (0.851, 0.732)	ID	ID
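
Identity fractions such as those in Table 2 are obtained by counting identical positions in a pairwise alignment. The source does not state which tool or denominator was used, so the sketch below rests on an assumption: identity is taken over aligned columns that contain no gap. The function name `fraction_identity` and the two short aligned strings are invented for illustration.

```python
def fraction_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of identical residues over gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # assumption: columns with a gap are not counted
        compared += 1
        if res_a == res_b:
            matches += 1
    return matches / compared if compared else 0.0

# Toy alignment (not real RIP data): 7 gap-free columns, 6 identical
print(round(fraction_identity("IDYPSVSF", "IDYP-VTF"), 3))
```

Other conventions (for example dividing by the alignment length or by the shorter sequence length) give slightly different numbers, which is worth keeping in mind when comparing identity values across studies.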

The primary structure analysis and physicochemical properties of pebulin and the other RIP II proteins are shown in Table 3. The estimated half-life was the same for all proteins: 20 h (mammalian reticulocytes, in vitro), 30 min (yeast, in vivo) and >10 h (Escherichia coli, in vivo).

Table 3

Comparison of physicochemical properties of Sambucus spp. type II RIPs.

Protein	No. aa	MW (kDa)	Theoretical pI	Asp + Glu	Arg + Lys	Ext. coefficient* (M⁻¹ cm⁻¹), Abs 0.1%	Ext. coefficient** (M⁻¹ cm⁻¹), Abs 0.1%	Instability index	Aliphatic index	GRAVY
Pebulin	521	58.020	5.19	51	42	79465, 1.37	78840, 1.359	34.99	85.83	−0.245
S. ebulus	520	57.792	5.12	53	44	79465, 1.375	78840, 1.364	32.36	84.69	−0.242
S. nigra	520	57.649	5.69	47	42	73965, 1.283	73340, 1.272	37.76	84.69	−0.193

* Assuming all pairs of Cys residues form cystines.

** Assuming all Cys residues are reduced.
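
The two quantities reported in each extinction coefficient column of Table 3 are linked: the absorbance of a 0.1% (1 g/L) solution equals the molar extinction coefficient divided by the molecular weight in Da. The sketch below recomputes those absorbances from the table values as a consistency check. The per-cystine contribution of 125 M⁻¹ cm⁻¹ is the value used in the ProtParam formula, not a figure stated in this article, and the dictionary and function names are hypothetical.

```python
# Values copied from Table 3 (MW converted from kDa to Da)
TABLE3 = {
    "Pebulin":   {"mw_da": 58020, "ext_cystine": 79465, "ext_reduced": 78840},
    "S. ebulus": {"mw_da": 57792, "ext_cystine": 79465, "ext_reduced": 78840},
    "S. nigra":  {"mw_da": 57649, "ext_cystine": 73965, "ext_reduced": 73340},
}

CYSTINE_EXT = 125  # M^-1 cm^-1 per cystine (ProtParam convention)

def abs_01_percent(ext_coefficient, mw_da):
    """Absorbance of a 1 g/L solution in a 1 cm path."""
    return ext_coefficient / mw_da

for name, row in TABLE3.items():
    a_cystine = abs_01_percent(row["ext_cystine"], row["mw_da"])
    a_reduced = abs_01_percent(row["ext_reduced"], row["mw_da"])
    implied_cystines = (row["ext_cystine"] - row["ext_reduced"]) / CYSTINE_EXT
    print(f"{name}: {a_cystine:.3f}, {a_reduced:.3f}, cystines={implied_cystines:.0f}")
```

The recomputed absorbances agree with the table, and the 625 M⁻¹ cm⁻¹ difference between the two coefficients corresponds to five cystines for each protein under the first assumption.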

The aliphatic index is defined as the relative volume of a protein occupied by its aliphatic side chains (alanine, valine, isoleucine, and leucine) and plays an important role in determining protein thermostability (Panda and Chandra, 2012). Based on the aliphatic index, pebulin (85.83) is predicted to be slightly more thermostable than the other two Sambucus RIPs (84.69). The high aliphatic index of the studied RIPs indicates that these proteins have a high proportion of aliphatic side chains; hence they are expected to be thermally stable and to maintain their activity at high temperatures. The thermostability of pebulin can also be attributed to the greater number and different positions of disulfide bonds in pebulin compared with the other Sambucus spp. RIPs (Table 4) (Khoo and Norton, 2011).
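
For readers who want to see how an aliphatic index is obtained, the sketch below applies the formula used by ProtParam (Ikai's definition): AI = X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function name `aliphatic_index` is hypothetical and the input is a toy peptide, not a RIP sequence.

```python
def aliphatic_index(sequence, a=2.9, b=3.9):
    """Aliphatic index: X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu)), X in mole percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + a * mole_percent("V")
        + b * (mole_percent("I") + mole_percent("L"))
    )

# Toy peptide: 20% Ala, 10% Val, 10% Ile, 10% Leu, 50% Gly
print(aliphatic_index("AAVILGGGGG"))  # 20 + 2.9*10 + 3.9*20
```

Because Ile and Leu carry the largest weights, small differences in their content account for most of the spread between proteins with similar composition.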

Table 4

Predicted disulfide bonds in Sambucus spp. RIP II proteins.

Protein	Number of disulfide bonds	Positions
Pebulin	5	249–259, 277–296, 318–335, 406–421, 447–464
S. ebulus/S. nigra RIPs	4	249–259, 276–295, 405–420, 446–463
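
The two rows of Table 4 can be compared programmatically. The sketch below pairs each predicted pebulin bond with a bond in the other proteins when both cysteine positions lie within one residue of each other. That tolerance is an illustrative assumption meant to absorb the one-residue length difference between pebulin (521 aa) and the other two proteins (520 aa) in Table 3; it is not a substitute for a sequence alignment, and the function name `match_bonds` is hypothetical.

```python
# Predicted disulfide bonds copied from Table 4 (residue positions)
PEBULIN = [(249, 259), (277, 296), (318, 335), (406, 421), (447, 464)]
OTHERS = [(249, 259), (276, 295), (405, 420), (446, 463)]

def match_bonds(query, reference, tolerance=1):
    """Pair each query bond with a reference bond whose two ends both lie
    within `tolerance` residues; return (matched, unmatched)."""
    matched, unmatched = [], []
    for q in query:
        hit = next(
            (r for r in reference
             if abs(q[0] - r[0]) <= tolerance and abs(q[1] - r[1]) <= tolerance),
            None,
        )
        if hit is None:
            unmatched.append(q)
        else:
            matched.append((q, hit))
    return matched, unmatched

matched, unmatched = match_bonds(PEBULIN, OTHERS)
print("shared (allowing a 1-residue shift):", matched)
print("predicted only in pebulin:", unmatched)
```

Under this assumption, four bonds have counterparts in the other Sambucus RIPs and the 318–335 bond is the additional one predicted in pebulin.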

In one study, the thermal stability of abrin II, a heterodimeric RIP isolated from Abrus pulchellus, and of its subunits was evaluated under different pH and ligand-binding conditions using differential scanning calorimetry (DSC). Two peaks were observed in the DSC scan of abrin II, which may be attributed to the presence of two different entities that unfold at different temperatures (Krupakar et al., 1999). All Sambucus spp. RIPs included in this study showed negative GRAVY values; hence they are hydrophilic and can readily interact with water. The GRAVY value of the S. nigra RIP (−0.193), however, is less negative than those of pebulin (−0.245) and the S. ebulus RIP (−0.242). The S. nigra RIP had the highest pI (5.69), whereas the pI values of pebulin (5.19) and the S. ebulus RIP (5.12) were similar. Knowing the calculated isoelectric point (pI) is useful for developing buffering systems for the purification process. The instability index values indicated that all three RIPs are stable (instability index < 40). The total number of negatively charged residues (Asp + Glu) differed among the Sambucus spp. RIPs, with the highest number (53) in the S. ebulus RIP and the lowest (47) in the S. nigra RIP. The number of positively charged residues (Arg + Lys) was highest in the S. ebulus RIP (44) and identical in pebulin and the S. nigra RIP (42).
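
The GRAVY score discussed above is the mean Kyte-Doolittle hydropathy value over all residues of a sequence, so negative values indicate an overall hydrophilic protein. A minimal sketch follows; the function name `gravy` is hypothetical and the inputs are toy peptides rather than RIP sequences.

```python
# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathy: sum of residue hydropathies / length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)

print(round(gravy("AILK"), 3))  # mostly hydrophobic toy peptide, positive score
print(round(gravy("DEKR"), 3))  # charged toy peptide, strongly negative score
```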

Protein solubility upon overexpression in Escherichia coli

There is a high demand for pure, soluble, and functional proteins in industry, food production, agriculture, and bioengineering, so high-level production of soluble recombinant proteins in bacterial expression hosts such as E. coli is of great importance (Rosano and Ceccarelli, 2014). Predicting protein solubility upon overexpression in E. coli helps in choosing appropriate strategies to promote correct folding and to handle misfolding (Baneyx and Mujacic, 2004). In the present study, the SOLpro tool was used to predict the solubility of recombinant RIPs from Sambucus spp. overexpressed in E. coli. The results indicated that pebulin, ebulin and the S. nigra RIP are predicted to be insoluble with probabilities of 0.72, 0.6 and 0.53, respectively. In some cases, the growth of the host bacterial cells is remarkably affected by expression of a recombinant toxin in E. coli, which might result from depurination of ribosomes (Chaddock et al., 1994). This issue can be addressed by using E. coli strains specifically selected for toxic protein expression and by modifying the expression conditions (Zuppone et al., 2019). For example, the mature α-luffin gene (a type I RIP from Luffa cylindrica seeds) linked to a SUMO fusion tag was cloned, and the recombinant α-luffin was successfully expressed in E. coli. To prevent the toxic effect of α-luffin on the expression host, a rhamnose promoter was used to tightly control protein expression. A high level of soluble expression of α-luffin was achieved by reducing the induction temperature and by using a fed-batch cultivation system and the SUMO fusion tag (Namvar et al., 2018). Curcin (a type I RIP from Jatropha curcas) was expressed in E. coli strain M15, and the target protein was produced in the form of inclusion bodies (Luo et al., 2007). Likewise, the unglycosylated A-chain sequence of MLI (a type II RIP) was expressed in the BL21 strain of E. coli. The resulting recombinant protein was insoluble, and further purification and refolding of the recombinant MLI led to a pure and homogeneous protein species with an apparent molecular mass of 27 kDa and a pI value of 6.4 (Eck et al., 1999).

Subcellular prediction

Based on the TMHMM results, a transmembrane helix was predicted at the N-terminus of the pebulin sequence with a probability of 94.65%. The expected number of amino acids in the transmembrane helix was 21.08. The results were re-evaluated with the TargetP server using PLANT networks, which showed that the sequence contains a signal peptide (score 0.989) used for entry into the secretory pathway and membrane translocation of the protein. Sequences with a score higher than 0.5 are considered putative signal peptides, and sequences with a score above 0.7 are even more likely to function as such (Petersen et al., 2011).

Disulfide bonds prediction

Disulfide bonds play an important role in protein folding and structural stability. Accurately predicting disulfide bonds from protein sequences is important for modeling the structural and functional characteristics of proteins. Five disulfide bonds were predicted in the pebulin protein by the DIpro algorithm. In all Sambucus spp. type II RIPs, the inter-chain disulfide bond that links the A chain to the B chain (between Cys residues 249 and 259) was identical. However, more intra-chain disulfide bonds were predicted in the pebulin B chain than in the S. ebulus and S. nigra RIPs, and their positions differed (Table 4). The formation of disulfide bonds is often a main part of the folding pathway of a protein, and the position of a bond in the protein structure may also influence its role in stabilizing or folding the protein. The differences among Sambucus RIPs may therefore lead to an alternative folding of the pebulin protein. Moreover, disulfide bonds maintain protein integrity and protect proteins from damage by stabilizing their tertiary and/or quaternary structures; they increase thermodynamic stability, making proteins less susceptible to denaturation and degradation and more resistant to high temperature and pH (Khoo and Norton, 2011). Proteins with high sequence identity do not always exhibit the same disulfide bond pattern. In a study of protein families, the conservation of disulfide bonds in homologous protein domains was investigated, and only 54% conservation was found, showing a poor relationship between sequence identity and disulfide bond conservation (Thangudu et al., 2008). In general, disulfide bonds appear to play important structural or functional roles and are well conserved in proteins with similar folds (Khoo and Norton, 2011). However, it has been shown that some disulfide bonds have no direct role in a protein's function, and removing them may have only minor effects on the structure and function of the protein. In many cases, the removal of a disulfide bond has been accompanied by local changes in the protein structure while the overall fold has remained unchanged (Khoo and Norton, 2011).

Post-translational modifications prediction

Ubiquitination site prediction

Attachment of ubiquitin to lysine residues in proteins (ubiquitination) is one of the most important reversible post-translational modifications controlling the activity of proteins (Peng et al., 2003). Furthermore, protein ubiquitination plays significant roles in various biological functions, including immune response, signal transduction, DNA repair, receptor modulation, and transcription regulation (Ebner et al., 2017). Potential ubiquitination sites in the Sambucus spp. RIP II proteins were evaluated with the UbPred tool. In pebulin, a ubiquitination site was predicted at position 300 with a score of 0.67, which falls in the low-confidence range (0.62 < s < 0.69). No possible ubiquitination sites were detected in the S. ebulus and S. nigra RIPs.

Phosphorylation site prediction

Protein phosphorylation plays important roles in biological processes, e.g., metabolism regulation, proliferation, differentiation, motility, survival and death, apoptosis, and subcellular trafficking (Wang et al., 2014). This type of post-translational modification mainly takes place on serine and threonine residues of eukaryotic proteins, whereas tyrosine phosphorylation is less abundant (van Bentem and Hirt, 2009). As expected, the numbers of potential Ser and Thr phosphorylation sites in pebulin (26 and 18 sites, respectively) were much higher than the number of Tyr phosphorylation sites (7 sites). The results of the phosphorylation site prediction are shown in Fig. 5. The functional effects of phosphorylation are reported to be site-dependent: phosphorylation is functional when it takes place at specific sites rather than at random (Olsen et al., 2006).

Fig. 5

Predicted phosphorylation sites in the pebulin protein sequence isolated from Iranian S. ebulus (NetPhos 3.1a).

Arginine methylation sites prediction

Based on the PRmePRed results, 11 arginine methylation sites were predicted in pebulin, four in chain A and seven in chain B. The positions and scores of the predicted arginine methylation sites are shown in Table 5. The numbers of predicted arginine methylation sites in the S. ebulus and S. nigra RIPs were 8 and 10, respectively.

Table 5

Arginine methylation sites prediction of pebulin.

SeqId		R site	Peptides	Prediction Score
A Chain		121	YTGNYEHLERAARSTREST	0.561527
		124	NYEHLERAARSTRESTNLG	0.776309
		127	HLERAARSTRESTNLGPDP	0.866791
		248	ITGIAILLFRCVSPRSXXX	0.540343

B Chain	B1	14	CPVAASFTKRISGGRDGLC	0.879204
		19	SFTKRISGGRDGLCVDVRN	0.869622
		27	GRDGLCVDVRNGYDTDGTP	0.74138
		59	WTFYKDGTTRSMGKCMTAN	0.569186
		106	GSIINPSSGRVITAPSAAS	0.911514
		116	VITAPSAASRTTLLLDNNI	0.667303
	B2	54	DRTIRVNSDRGLCVTSNGY	0.792351
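
The peptides listed in Table 5 are 19 residues long with the candidate arginine at the center, and positions beyond the chain terminus are padded with X (see site 248 of chain A). The sketch below reproduces that windowing. It is inferred from the table layout rather than taken from the PRmePRed documentation, and the function name `residue_window` is hypothetical.

```python
def residue_window(sequence, position, flank=9, pad="X"):
    """Return the residue at 1-based `position` with `flank` residues on each
    side, padding with `pad` where the window runs past either terminus."""
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise IndexError("position outside the sequence")
    left = sequence[max(0, idx - flank):idx]
    right = sequence[idx + 1:idx + 1 + flank]
    return (
        pad * (flank - len(left)) + left
        + sequence[idx]
        + right + pad * (flank - len(right))
    )

# C-terminal residues of the pebulin A chain (Table 1); the arginine of
# interest is the 10th residue of this fragment (R248 of the full chain).
a_chain_tail = "ITGIAILLFRCVSPRS"
print(residue_window(a_chain_tail, 10))  # ITGIAILLFRCVSPRSXXX
```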

Post-translational methylation of arginine residues significantly influences the structure and function of a protein by changing the bulkiness and hydrophobicity of the modified residues. Hence, it modulates a myriad of essential biological processes, including transcriptional regulation, RNA metabolism, DNA repair, signal transduction, protein sorting, and apoptosis (Ahmad et al., 2011). Protein arginine methyltransferases (PRMTs) methylate a large number of essential proteins and a wide variety of substrates, most of which are either RNA-binding proteins or proteins involved in transcription, such as histones, transcription factors and a number of proteins involved in RNA processing, transport, and translation. RNA methylation, carried out by a variety of RNA methyltransferases, is an epigenetic mechanism that occurs in different RNA species, including tRNA, rRNA, mRNA, tmRNA, snRNA, snoRNA, miRNA, and viral RNA, and affects several biological processes such as RNA stability and mRNA translation (Wang et al., 2014, Wang et al., 2015, Ji and Chen, 2012, Dev et al., 2017).

N-glycosylation sites prediction

Using the NetNGlyc 1.0 server, eight potential N-glycosylation sites were predicted in the pebulin protein, while only one glycosylation site was predicted in the S. ebulus RIP (Fig. 6). More potential motifs were found in pebulin than in the S. ebulus RIP, and the positions of the potential glycosylation sites differed between the two proteins. In eukaryotes, N-glycosylation takes place in the endoplasmic reticulum and/or the Golgi apparatus on secreted or membrane-bound proteins. N-linked glycosylation alters the structure and function of eukaryotic proteins (Schwarz and Aebi, 2011). N-glycans can also stabilize proteins and accelerate folding through an intrinsic chemical mechanism (Glozman et al., 2009). Many studies have revealed that glycoproteins are more stable than their unglycosylated counterparts, even when there are no major structural changes associated with glycosylation (Imperiali and O’Connor, 1999). It has been shown that N-linked glycans may also facilitate the formation of a key segment of secondary structure, serving to enhance the overall stability of the protein or to enable some specific function performed around the attachment site (Qi et al., 2014). Accordingly, the larger number of potential N-glycosylation sites in pebulin compared with the S. ebulus RIP may enhance the reversibility of the unfolding process and improve protein stabilization.
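
N-glycosylation occurs at Asn residues within the sequon Asn-Xaa-Ser/Thr, where Xaa is any residue except Pro. The sketch below only lists such candidate motifs; it does not reproduce the NetNGlyc scoring, which uses neural networks to decide which sequons are likely to be glycosylated. The function name `find_sequons` is hypothetical, and the input is the first 36 residues of the pebulin A chain from Table 1.

```python
import re

# Lookahead so that overlapping sequons are all reported
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each N-X-S/T motif, X != P."""
    seq = sequence.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 3])
            for m in SEQUON.finditer(seq)]

a_chain_start = "IDYPSVSFNLTGAKWTTYRDFIKDLRQIVANGTYEV"
print(find_sequons(a_chain_start))  # [(9, 'NLT'), (31, 'NGT')]
```

A sequon is necessary but not sufficient for glycosylation, so the number of motifs found this way is an upper bound on the sites a predictor can report.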

Fig. 6

N-Glycosylation sites prediction in pebulin (a) and S. ebulus RIP protein (b). The position and score of predicted sites are shown adjacent to each potential glycosylation site.

Secondary structure prediction

The GOR IV method was applied to predict the secondary structures (SS) of the Sambucus spp. RIP II proteins. The results of the SS prediction for pebulin are shown in Fig. 7 and Fig. 8. The pebulin protein comprises 18.43% alpha helix, 28.60% extended strand, and 52.98% random coil (Table 6). Furthermore, the distribution and percentages of the different secondary structures in the Sambucus spp. RIP II proteins are compared in Fig. 9 and Table 6, respectively. The GOR IV analysis revealed that random coil is the most abundant class in all the proteins studied. The random coil is not a true secondary structure but the class of conformations that indicates an absence of regular secondary structure.

Fig. 7

Secondary structure prediction of pebulin protein.

Fig. 8

Detailed secondary structure of pebulin protein chains.

Table 6

Secondary structure prediction of Sambucus sp. RIP II proteins by GOR IV method.

	Pebulin		S. ebulus		S. nigra
Structure	aa No.	%	aa No.	%	aa No.	%
Alpha helix	96	18.43%	98	18.18%	105	19.52%
Extended strand	149	28.60%	177	32.84%	157	29.18%
Random coil	276	52.98%	264	48.98%	276	51.30%
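
The percentages in Table 6 follow directly from the residue counts. As a small worked check for the pebulin column, the sketch below recomputes them; the function name `composition` is hypothetical.

```python
# Residue counts for pebulin, copied from Table 6
PEBULIN_COUNTS = {"Alpha helix": 96, "Extended strand": 149, "Random coil": 276}

def composition(counts):
    """Return the total residue count and the percentage of each class."""
    total = sum(counts.values())
    return total, {name: 100.0 * n / total for name, n in counts.items()}

total, percentages = composition(PEBULIN_COUNTS)
print(total)
for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
```

The three counts sum to 521 residues, which matches the pebulin length reported in Table 3.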

Fig. 9

Comparison of Secondary structure of Sambucus sp. RIP II proteins. (a) pebulin, (b) S. ebulus, and (c) S. nigra.

The SWISS-MODEL template library (SMTL version 2019-06-27, PDB release 2019-06-21) was searched with BLAST (Camacho et al., 2009) and HHBlits (Remmert et al., 2012) for evolutionarily related structures matching the target sequences. Among the 144 templates found, 1hwm.1 was selected as the most appropriate template (Table 7) and used for protein model building. The built model was a hetero-1-1-mer with one N-acetyl-D-glucosamine (NAG) ligand, a GMQE of 0.93, and a QMEAN of −1.65. A schematic representation of the model is shown in Fig. 10. The folding of the pebulin B chain is very similar to that of type II RIP B-chains and lectins and is composed of two beta domains (I and II) (Ferreras et al., 2010).

Table 7

The structural template used for protein modeling.

Template	Seq Identity	Oligo-state	QSQE	Found by	Method	Resolution	Seq Similarity	Coverage	Description
1hwm.1	78.27	Hetero-1-1-mer	0.78	BLAST	X-ray	2.80 Å	0.54	1.00	EBULIN; EBULIN

Fig. 10

Schematic representation of pebulin protein. The chains are shown in different colors, Chain A (green) and Chain B (blue).


Model quality evaluation

The results of the structure assessment of the protein model are shown as a Ramachandran plot (Fig. 11) and MolProbity results (Table 8). The Ramachandran plot visualizes the backbone dihedral angles (psi against phi) of the amino acid residues in the protein structure, and MolProbity reports problems in a 3D model of the protein. There is no Gly among the Ramachandran outliers, but a Pro residue (265) in chain B is among them. Although Ramachandran outliers are usually the consequence of mistakes during data processing (Ramachandran and Sasisekharan, 1968), they can sometimes play a unique role in function (Lenfant et al., 2013).

Fig. 11

General Ramachandran plot for the structure assessment of the pebulin chains. The darkest region is the most favored region, containing more than 90% of residues.
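Each point on a Ramachandran plot is a (phi, psi) pair, and both are dihedral angles over four consecutive backbone atoms: phi uses C(i−1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of the dihedral calculation in plain Python; the helper names and test coordinates are hypothetical, and this is not the code used by the structure assessment tools named above.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (IUPAC sign convention)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Hypothetical coordinates: the fourth atom is rotated 90 degrees
# about the central bond relative to the first atom.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real model, phi would be `dihedral(C_prev, N, CA, C)` and psi would be `dihedral(N, CA, C, N_next)` for each residue with both neighbors present.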

Table 8

The result of the model quality evaluation of pebulin obtained from MolProbity 4.4 tool.

MolProbity Score	2.38
Clash Score	3.37
Ramachandran Favored	92.77%
Ramachandran Outliers	0.98%	B265 PRO, B4 THR, A104 GLU, B87 ASN, B148 TYR

Rotamer Outliers	10.96%	B89 THR, B135 SER, B149 ASN, B37 ILE, A221 GLN, B77 MET, B70 LEU, B39 LEU, B112 SER, A215 SER, A163 GLU, A249 CYS, B63 LYS, B204 LEU, A128 GLU, A92 ASP, B261 ARG, B134 VAL, B116 ARG, B103 SER, A29 VAL, A175 ARG, A230 VAL, A220 LEU, B120 LEU, A173 GLU, B227 VAL, B248 ILE, B229 LEU, A152 ARG, B15 ILE, A137 LEU, B192 LEU, B266 GLN, A96 LEU, B61 MET, B71 ASN, B242 VAL, A154 LEU, A202 LEU, A23 LYS, A44 ARG, A47 GLU, A49 GLN, A219 GLN, A245 LEU, B38 GLN, B90 LYS, B100 ILE, B238 LYS

C-Beta Deviations	4	B232 THR, B19 ARG, B116 ARG, A251 SER
Bad Bonds	1/4136	B187 ASN-_1 NAG
Bad Angles	19/5638	(B187 ASN-_1 NAG), A92 ASP, (B266 GLN-B267 ILE), (A212 SER-A213 PRO), (B264 VAL-B265 PRO), B250 VAL, B57 THR, (A135 ASP-A136 PRO), (B40 PHE-B41 PRO), A93 THR, B77 MET, (B93 LEU-B94 PRO), (B251 PRO-B252 PRO), (B98 SER-B99 ILE), (A185 THR-A186 PRO), B108 ILE, B126 HIS, A118 HIS


Ligand modeling

The Protein-Ligand Interaction Profiler (PLIP) tool was used to analyze non-covalent interactions in protein-ligand complexes from the predicted pebulin protein model. The results of the protein-ligand interaction prediction are shown in Table 9. Among the different ligands, only N-acetyl-D-glucosamine was added to the pebulin model. The interaction of the ligand with amino acid residues and the spatial position of the ligand on the proteins are shown in Fig. 12 and Fig. 13, respectively. In pebulin, the N-acetyl-D-glucosamine molecule was attached to ASN187B and ARG190B by hydrogen bonds, whereas in the S. ebulus RIP protein it was attached by hydrogen bonds to four different amino acid residues (ASP24B, GLN37B, ASN46B and ARG115B) (Fig. 12, Fig. 13).

Table 9

Prediction of ligand-pebulin interactions.

Ligand	Added to Model	Description
NAG	✓	N-ACETYL-D-GLUCOSAMINE
BMA	✕ – Not in contact with the model.	BETA-D-MANNOSE
GAL	✕ – Binding site not conserved.	BETA-D-GALACTOSE
MAN	✕ – Not in contact with the model.	ALPHA-D-MANNOSE

Fig. 12

Protein-ligand interaction of (a) pebulin and (b) S. ebulus RIP protein with N-acetyl-D-glucosamine. The residue numbers in the PDB file of the protein B chain are indicated.

Fig. 13

Representation of ligand and protein interaction, (a) S. ebulus RIP and (b) pebulin.

Fig. 14

(a) Contact map of distance ranges and (b) contact map of property between pebulin protein chains.

Interactions between protein chains

The accessible surface area and the interface statistics between the protein chains are shown in Table 10 and Table 11, respectively. Based on the results, the amino acids that contribute to salt bridge formation are the same in pebulin and S. ebulus RIP, and Cys249A and Cys5B, which form the inter-chain disulfide bond, are conserved. There are three substitutions among the amino acids that contribute to H bond formation: Gly instead of Arg at position 168 and Gln instead of Lys at position 238 in the A chain, and Lys instead of Arg at position 13 in the B chain.

Table 10

Accessible surface area table for pebulin and S. ebulus RIP protein chains.

Title	Pebulin	S. ebulus
Buried area upon the complex formation (Å²)	3333.7	3251.6
Buried area upon the complex formation (%)	13.84	14.14
Interface area (Å²)	1666.85	1625.8
Interface area Chain A (%)	14.14	14.71
Interface area Chain B (%)	13.55	13.61
POLAR Buried area upon the complex formation (Å²)	1891.8	1898.9
POLAR Interface (%)	56.75	58.40
POLAR Interface area (Å²)	945.9	949.45
NON-POLAR Buried area upon the complex formation (Å²)	1441.9	1352.6
NON-POLAR Interface (%)	43.25	41.60
NON-POLAR Interface area (Å²)	720.95	676.3
Residues at the interface_TOT (n)	86	83
Residues at the interface_Chain A	43	41
Residues at the interface_Chain B	43	42
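The values in Table 10 are internally consistent with two simple relations: each interface area equals half of the corresponding buried area, and the polar and non-polar percentages are the respective shares of the total buried area. These relations are inferred here from the tabulated numbers rather than stated by the authors, and the function names are illustrative.

```python
def interface_area(buried_area):
    """Interface area taken as half of the area buried on complex formation."""
    return buried_area / 2.0


def share_percent(part, whole):
    """Percentage of the total buried area contributed by one component."""
    return 100.0 * part / whole


# Buried areas in square angstroms, copied from Table 10.
table10 = {
    "Pebulin": {"buried": 3333.7, "polar": 1891.8, "non_polar": 1441.9},
    "S. ebulus": {"buried": 3251.6, "polar": 1898.9, "non_polar": 1352.6},
}

for protein, row in table10.items():
    print(
        protein,
        round(interface_area(row["buried"]), 2),
        round(share_percent(row["polar"], row["buried"]), 2),
        round(share_percent(row["non_polar"], row["buried"]), 2),
    )
```

The recomputed values reproduce the tabulated interface areas (1666.85 and 1625.8 Å²) and the polar and non-polar interface percentages for both proteins.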

Table 11

Interaction summary statistics of pebulin and S. ebulus RIP chains.

Title	Pebulin	S. ebulus RIP
Number of interacting residues Chain A	77	76
Number of interacting residues Chain B	79	80
Number of hydrophilic-hydrophobic interaction	208	205
Number of hydrophilic-hydrophilic interaction	176	170
Number of hydrophobic-hydrophobic interaction	59	58
Number of salt bridges	3	3
Number of hydrogen bonds	14	15
Number of disulfide bonds	1	1

In Fig. 14a, the residue distances between the A and B chains are shown with color codes. The region of residues 150–250 of chain A is closer than 16 Å to the region of residues 100–250 of chain B. The most frequent interactions between chain A and chain B of pebulin are hydrophilic-hydrophobic interactions (Fig. 14b and Table 11). In Fig. 15, the contact maps of pebulin and S. ebulus RIP are compared. There is no significant difference between the two proteins in contact position or type of residue interaction. However, around position 175 of the S. ebulus RIP chain, more hydrophilic-hydrophilic interactions are observed than in pebulin. In both RIP proteins, the C-terminal region of chain A (residues 200–250) has the highest number of hydrophobic-hydrophobic interactions with two regions of the B chain N-terminal, around positions 10 and 100. On the whole, the results show that residue interactions in the two proteins are reasonably similar; however, the numbers of hydrophilic-hydrophobic and hydrophobic-hydrophobic interactions differ, which may lead to distinctive protein functions and is a good starting point for defining the protein interaction network.

Fig. 15

Contact map of (a) pebulin and (b) S. ebulus RIP protein chains.
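A distance-based contact map between two chains can be sketched as a matrix of pairwise distances compared against a cutoff. The example below uses the 16 Å threshold mentioned in the text; the use of one representative atom (for example C-alpha) per residue, the function name, and the coordinates are assumptions for illustration, since the atom selection and tool settings are not stated here.

```python
import math


def contact_map(chain_a, chain_b, cutoff=16.0):
    """Return a matrix where [i][j] is True if residue i of chain A lies
    closer than `cutoff` angstroms to residue j of chain B."""
    return [[math.dist(a, b) < cutoff for b in chain_b] for a in chain_a]


# Hypothetical per-residue coordinates (one representative atom each).
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (40.0, 0.0, 0.0)]
chain_b = [(0.0, 10.0, 0.0), (0.0, 30.0, 0.0)]

cmap = contact_map(chain_a, chain_b)
for row in cmap:
    print(row)
```

Rows index chain A residues and columns index chain B residues, which matches the usual layout of an inter-chain contact map.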

Geometric and topological properties of the protein structure

The function of proteins is fundamentally related to the geometric and topological properties of protein structures, including surface empty concavities, internal cavities, and channels (Edelsbrunner et al., 1998). The detailed secondary structure of pebulin and the probable functional sites are shown in Fig. 16. Pockets are empty concavities on the protein surface into which the solvent can gain access, i.e., these concavities have mouth openings connecting their interior with the outside bulk solution. The top three empty concavities of pebulin, ranked by area and volume, are shown in Fig. 16.

Fig. 16

(a) Secondary structures and the functional active sites of pebulin largest surface concavity (red boxes). (b) Top three empty concavities of the pebulin protein (based on area and volume).

Phylogenetic analysis results

After three iterations of PSI-BLAST, no new sequences were found above the 0.005 threshold in the PDB protein database. The phylogenetic relationships of the A and B chains of RIP proteins are shown separately in Fig. 17 and Fig. 18. As expected, the phylogenetic tree revealed that the A and B chains of pebulin grouped together with the other Sambucus spp. RIP II protein chains. This indicates that these proteins are highly conserved during evolution and carry out the same functions. The close phylogenetic relationship of the RIP isolated from native Iranian S. ebulus with Sambucus spp. RIPs is shown in Fig. 19.
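Neighbor-joining trees are built from a pairwise distance matrix, and the figure captions state that the scale bar represents the number of differences between sequences. The sketch below computes that distance for aligned sequences. Skipping positions with a gap in either sequence is an assumption, and the sequences and names are hypothetical toy data, not the RIP alignment.

```python
from itertools import combinations


def n_differences(seq1, seq2):
    """Count mismatched positions between two aligned sequences,
    ignoring positions where either sequence has a gap ('-')."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1, seq2)
               if a != b and a != "-" and b != "-")


# Hypothetical aligned toy sequences.
alignment = {
    "seqA": "MKTAY-LV",
    "seqB": "MKSAYGLV",
    "seqC": "MRSAYGIV",
}

distances = {
    (x, y): n_differences(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}
print(distances)
```

A matrix of this kind is the input to the neighbor-joining step; the tree construction itself and the bootstrap resampling are not shown.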

Fig. 17

Phylogenetic tree of the Iranian ebulin (pebulin) B chain similar sequences constructed by Neighbor-joining method. Bootstrap values (1000 replicates) are reported as percentages. The scale bar represents the number of differences between sequences.

Fig. 18

Phylogenetic tree of the Iranian ebulin (pebulin) chain A similar sequences constructed by Neighbor-joining method. Bootstrap values (1000 replicates) are reported as percentages. The scale bar represents the number of differences between sequences.

Fig. 19

Phylogenetic tree of the Iranian ebulin (pebulin) similar sequences constructed by Neighbor-joining method. Bootstrap values (1000 replicates) are reported as percentages. The scale bar represents the number of differences between sequences.

Conclusion

Research on RIPs has increased in recent years because of their use as part of immunotoxins, conjugates or recombinant chimeras for the treatment of several important diseases such as cancer (Allahyari et al., 2017, Naran et al., 2018), AIDS (Hogan et al., 2018) and autoimmune diseases. In addition, some RIPs have antiviral activity against animal and plant viruses (Zhu et al., 2018, Parikh and Tumer, 2004). Pebulin is a type II RIP isolated from native Iranian S. ebulus. In this study, the sequence of pebulin was evaluated using different bioinformatics tools and compared with two type II RIPs obtained from Sambucus species. Generally, the main biological function of Sambucus spp. RIPs is the negative regulation of translation, which prevents protein synthesis through hydrolysis of an N-glycosidic bond at an adenine base in 28S rRNA. Based on the extensive bioinformatics analysis, this protein has distinct properties that differ from those of other Sambucus spp. type II RIPs. Hence, it can be a novel and promising protein for research on RIP proteins.

58 in total

1. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19 Impact factor: 5.469

Review 2. Regulation of small RNA stability: methylation and beyond.

Authors: Lijuan Ji; Xuemei Chen
Journal: Cell Res Date: 2012-03-13 Impact factor: 25.617

3. Locating proteins in the cell using TargetP, SignalP and related tools.

Authors: Olof Emanuelsson; Søren Brunak; Gunnar von Heijne; Henrik Nielsen
Journal: Nat Protoc Date: 2007 Impact factor: 13.491

Review 4. Immunotoxin: A new tool for cancer therapy.

Authors: Hossein Allahyari; Sahar Heidari; Mehdi Ghamgosha; Parvaneh Saffarian; Jafar Amani
Journal: Tumour Biol Date: 2017-02

5. N(6)-methyladenosine Modulates Messenger RNA Translation Efficiency.

Authors: Xiao Wang; Boxuan Simen Zhao; Ian A Roundtree; Zhike Lu; Dali Han; Honghui Ma; Xiaocheng Weng; Kai Chen; Hailing Shi; Chuan He
Journal: Cell Date: 2015-06-04 Impact factor: 41.582

6. Pokeweed antiviral protein (PAP) mutations which permit E.coli growth do not eliminate catalytic activity towards prokaryotic ribosomes.

Authors: J A Chaddock; J M Lord; M R Hartley; L M Roberts
Journal: Nucleic Acids Res Date: 1994-05-11 Impact factor: 16.971

7. Isolation and partial characterization of nigrin b, a non-toxic novel type 2 ribosome-inactivating protein from the bark of Sambucus nigra L.

Authors: T Girbés; L Citores; J M Ferreras; M A Rojo; R Iglesias; R Muñoz; F J Arias; M Calonge; J R García; E Méndez
Journal: Plant Mol Biol Date: 1993-09 Impact factor: 4.076

8. Identification, analysis, and prediction of protein ubiquitination sites.

Authors: Predrag Radivojac; Vladimir Vacic; Chad Haynes; Ross R Cocklin; Amrita Mohan; Joshua W Heyen; Mark G Goebl; Lilia M Iakoucheva
Journal: Proteins Date: 2010-02-01

9. Physicochemical characterization and functional analysis of some snake venom toxin proteins and related non-toxin proteins of other chordates.

Authors: Subhamay Panda; Goutam Chandra
Journal: Bioinformation Date: 2012-09-21

Review 10. Recombinant protein expression in Escherichia coli: advances and challenges.

Authors: Germán L Rosano; Eduardo A Ceccarelli
Journal: Front Microbiol Date: 2014-04-17 Impact factor: 5.640

1 in total

1. Integrated Transcriptome and Proteome Analysis Provides Insight into the Ribosome Inactivating Proteins in Plukenetia volubilis Seeds.

Authors: Guo Liu; Zhihua Wu; Yan Peng; Xiuhua Shang; Liqiong Gao
Journal: Int J Mol Sci Date: 2022-08-24 Impact factor: 6.208

1 in total