
Domain wise docking analyses of the modular chitin binding protein CBP50 from Bacillus thuringiensis serovar konkukian S4.

Ujala Sehar¹, Muhammad Aamer Mehmood, Khadim Hussain, Salman Nawaz, Shahid Nadeem, Muhammad Hussnain Siddique, Habibullah Nadeem, Munazza Gull, Niaz Ahmad, Iqra Sohail, Saba Shahid Gill, Summera Majeed.

Abstract

This paper presents an in silico characterization of the chitin binding protein CBP50 from B. thuringiensis serovar konkukian S4 through homology modeling and molecular docking. CBP50 has a modular structure containing an N-terminal CBM33 domain, two consecutive fibronectin-III (Fn-III) like domains and a C-terminal CBM5 domain. This unique modular structure could not be modeled using ordinary procedures, so domain-wise modeling with MODELLER and docking analyses with AutoDock Vina were performed. The best conformation for each domain was selected using standard procedures. Four amino acid residues of the N-terminal domain, Glu-71, Ser-74, Glu-76 and Gln-90, were found to be involved in protein-substrate interaction. Similarly, residues Trp-20, Asn-21, Ser-23 and Val-30 of the Fn-III like domains and Glu-15, Ala-17, Ser-18 and Leu-35 of the C-terminal domain were involved in substrate binding. Future site-directed mutagenesis of these proposed residues will elucidate the key amino acids involved in the chitin binding activity of CBP50.


Keywords: CBP50; homology modeling; molecular docking; substrate-protein interaction

Year: 2013 PMID: 24307767 PMCID: PMC3842575 DOI: 10.6026/97320630009901

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

The family 33 chitin binding proteins (CBM33) are believed to interact with crystalline chitin, making it accessible for degradation by chitinases [1]. Some of them bind specifically to α-chitin [2], while others prefer β-chitin: ChbB from B. amyloliquefaciens [3] and CBP21 from S. marcescens [1] preferably bind β-chitin. Some chitin binding proteins also act synergistically with chitinases, either specifically or non-specifically [4]. They are believed to have important roles such as antifungal activity, oviparous and cuticular layer development, biosynthesis of fungal cell walls and anti-inflammatory actions [5, 6]. Moreover, they are involved in the adhesion strategy of some pathogenic bacteria [7]. Chitin binding proteins therefore seem to be important not only for biomass turnover but also in different metabolic pathways. Computational packages are now frequently used for the sequence analysis and characterization of proteins. Although they are less reliable than experimental methods, they can still provide a detailed, approximate understanding of structure-function relationships and substrate-protein interactions at almost no cost [8, 9]. We are using B. thuringiensis as a model organism to elucidate substrate utilization pathways in prokaryotes. In this context, we have previously characterized two chitinases (Chi74, Chi39) and a chitin binding protein (CBP24) from B. thuringiensis [10-12]. In the present study, the chitin binding protein CBP50 from strain S4 was characterized using an in silico approach. CBP50 is a unique chitin binding protein, and this is the first study of its kind. It should lead to a better understanding of the principles and mechanism of substrate-protein interaction.

Methodology

Template-based methods were used to predict the structure of CBP50. The amino acid sequence of the target protein was obtained from UniProt [13] (accession number C7DQN9). No three-dimensional (3D) structure of the target protein was available in the PDB, so a homology modeling approach was adopted to predict the 3D structure of CBP50. The SignalP 4.0 server [14] was used to identify the signal peptide for extracellular transport. Primary sequence analyses of the query protein were performed using the structure analysis tools available at ExPASy [15]. The physico-chemical parameters of the protein sequence, including amino acid and atomic composition, isoelectric point and extinction coefficient, were predicted with ProtParam. Secondary structure analyses were performed using ExPASy tools and the PSIPRED server [16]. The domains of CBP50 were predicted with InterPro (protein sequence analysis and classification), the Pfam database and the Conserved Domains Database [17] at NCBI. Motifs reflecting specific protein functions were searched using the Motif Search Library. Because of the large protein size and alignment length constraints, a domain-wise modeling strategy was adopted. Domain-wise multiple sequence alignment of CBP50 was performed with ClustalW [18], using multiple templates selected from different life forms, to identify the conserved residues in each domain. Fully automated modeling on online servers (PHYRE, PS2 and SWISS-MODEL) and manual homology modeling were both used to generate the modeling dataset. From all the models generated by both strategies, the best ones were selected for docking analysis. Homology models of proteins are of great interest for planning and analyzing biological experiments when no experimental 3D structures are available [19].
MODELLER 9.9 was used to generate homology models by comparative modeling based on the 3D atomic coordinates of a template structure [20]. The first step in homology modeling was the selection of templates of known structure for the query protein. For this purpose, a library of experimentally determined protein structures was searched to identify suitable templates. Because the 3D structure of proteins is better conserved than their sequences, it is often possible to identify a homologous protein with a known structure (template) for a given protein sequence. The sequence identity between the query protein and each template was obtained by running BLAST against the PDB [21]. The resulting candidates were shortlisted on four criteria: hits with E-values greater than 0.01 or an alignment length shorter than 85% of the target sequence were eliminated, and functional similarity and origin of the protein were also considered. After template selection, the alignment between the template and target sequences was generated with the align2d function of MODELLER 9.9 [12]. Once a target-template alignment was constructed, MODELLER calculated 3D models of the target using its automodel class. Models of the whole CBP50 and of its individual domains were built separately. Overall, 50 models of native CBP50 and 10 models of each domain were built. The best model was selected by the lowest objective function value and the smallest normalized Discrete Optimized Protein Energy (DOPE) score [22]. In the next step, the loop optimization function of MODELLER 9.9, a facility explicitly designed for loop modeling by satisfaction of spatial restraints, was used for loop modeling of CBP50. Side chains of CBP50 were modeled with a knowledge-based approach using rotamer libraries extracted from high-resolution X-ray structures. The various rotamers were tried successively and scored with a variety of energy functions.
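The two numerical shortlisting criteria and the model selection step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' scripts: the `Hit` record, the function names and any values passed to them are hypothetical, and the functional similarity and origin criteria are left out because they require manual judgment.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit against the PDB (hypothetical record layout)."""
    pdb_id: str
    e_value: float
    alignment_length: int


def shortlist(hits, target_length, max_e_value=0.01, min_coverage=0.85):
    """Keep hits whose E-value is at most 0.01 and whose alignment
    covers at least 85% of the target sequence."""
    return [
        h for h in hits
        if h.e_value <= max_e_value
        and h.alignment_length >= min_coverage * target_length
    ]


def best_model(scores):
    """Return the name of the model with the lowest score.

    `scores` maps model names to normalized DOPE scores (or objective
    function values); lower values indicate better models.
    """
    return min(scores, key=scores.get)
```

In practice the scores passed to `best_model` would be read from the MODELLER log for each generated model; how they are collected is not specified in the text and is left open here.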
CHIMERA [23], which contains rotamer libraries derived from high-resolution X-ray structures, was used for side chain modeling of the CBP50 models. Different tools were used for model evaluation. Structures were assessed by means of MolProbity [24] and the NIH server. With the MolProbity server, Ramachandran outliers, rotamers, Cβ deviations, and bad angles and bond lengths of all the models were assessed. The NIH server gives access to various evaluation tools such as PROCHECK, WHAT_CHECK, ERRAT, VERIFY 3D and PROVE. Structural alignment of proteins was performed using the FATCAT server [25]. The outputs of a structural alignment are a superposition of the atomic coordinate sets and a minimal root mean square deviation (RMSD) between the structures [26]. After generating and evaluating the models with these tools and servers, the models showing the best values for all parameters were selected. Homology models may contain errors. The number of errors (for any given method) mainly depends on two values: the percentage sequence identity between template and target, and the number of errors in the template. After all models had been evaluated, the selected model was validated by superimposing it on its template. Superimposition was done with CHIMERA's structure comparison command; the RMSD of the superimposed structures indicates their divergence from one another and was used to validate the model. To study the protein-substrate interaction, molecular docking of the CBP50 domains with the chitin substrate was performed using AutoDock Vina [27] on Windows 7. AutoDock Vina is a newer program that inherits some of the ideas and approaches of AutoDock 4, such as treating docking as a stochastic global optimization of the scoring function, pre-calculating grid maps (Vina does that internally), and some other implementation tricks, such as pre-calculating the interaction between every atom type pair at every distance.
The docking procedure includes four steps: coordinate file preparation, definition of grid maps, docking calculations and analysis of results. All four steps were carried out iteratively for each domain separately.
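As an illustration of the grid definition step, the sketch below writes an AutoDock Vina configuration file from a grid box. The keys `receptor`, `ligand`, `out`, `center_x/y/z` and `size_x/y/z` are standard Vina configuration options; the file names are hypothetical placeholders, and the box values in the usage note are the ones reported for the N-terminal domain in the docking results. It is a minimal sketch, not the authors' actual setup.

```python
def vina_config(receptor, ligand, out, center, size):
    """Build the text of an AutoDock Vina configuration file.

    center and size are (x, y, z) tuples describing the search box.
    """
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"out = {out}",
    ]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value}")
    return "\n".join(lines) + "\n"


def write_config(path, text):
    """Write the configuration text to disk."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

For example, `vina_config("cbd_n.pdbqt", "chitin.pdbqt", "cbd_n_out.pdbqt", (10.724, 27.8, 11.359), (102, 76, 100))` produces a file that could then be passed to Vina with `vina --config conf.txt`; the three `.pdbqt` names are assumptions made for this example.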

Results & Discussion

The N-terminal signal peptide for extracellular secretion of the protein showed the maximum cleavage site probability (C-score 0.752) between Ala-40 and His-41. The signal peptide of CBP50 therefore comprised 40 residues and was removed. According to ProtParam primary sequence analysis, the target protein had 48 negatively charged and 44 positively charged residues overall, making it net negatively charged, and the computed instability index (II) of 29.86 classifies it as stable. Secondary structure analyses showed that the major portion of the target protein consists of loops (nearly 63.1%), whereas approximately 7.5% was α-helix and 29.5% β-strand. Domain analyses by different tools predicted four domains in CBP50, giving a modular structure with an N-terminal CBM33 domain (CBD-N, His-41 through Leu-209), two Fn-III type fibronectin binding domains (Pro-220 through Thr-304 and Pro-314 through Gly-389) and a C-terminal CBM5 domain (Glu-391 through Val-455), as described previously [5]. Only one motif was found in CBP50 by the PROSITE profile search, namely FN3, a fibronectin type-III domain profile. Two motifs (PD022912 and PD864246) were found by ProDom: PD022912 is a chitin-binding protein fusolin and PD864246 is another chitin-binding motif. Seven motifs were found in the query sequence by PRINTS: GPCRRHODOPSN4 (rhodopsin-like GPCR superfamily signature), FNTYPEIII3 (fibronectin type III repeat signature), FUMRATELYASE3 (fumarate lyase superfamily signature), GPCRRHODOPSN1 (rhodopsin-like GPCR superfamily signature), CADHERIN2 (cadherin signature), FADPNR5 (FAD-dependent pyridine nucleotide reductase signature) and JDOMAIN4 (DnaJ domain signature). The presence of such different motifs in CBP50 reflects its possible involvement in biological functions other than the chitin degradation pathway. 
CBP50 is thus a unique protein, because no other chitin binding protein has shown such a modular structure. This reflects that either B. thuringiensis has been in close contact with several other species during its evolutionary past, or its chitin degradation system evolved in a relatively complex environment. Moreover, the Fn-III like domain is a human originated domain, and its presence in the CBP50 structure shows that B. thuringiensis has been in close connection with humans as well.
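The domain boundaries and ProtParam figures reported above lend themselves to a small worked example. The sketch below stores the four domain ranges as given in the text, computes each domain's length, and applies the usual ProtParam convention that an instability index below 40 predicts a stable protein. The dictionary keys and helper names are made up for this illustration, and the slicing helper assumes residue numbers refer to the full-length precursor sequence (1-based, inclusive).

```python
# Domain boundaries as reported in the text (1-based, inclusive).
DOMAINS = {
    "CBM33 (CBD-N)": (41, 209),
    "Fn-III 1": (220, 304),
    "Fn-III 2": (314, 389),
    "CBM5": (391, 455),
}


def domain_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1


def slice_domain(sequence, start, end):
    """Extract a domain from a full-length sequence string."""
    return sequence[start - 1:end]


def net_charge(n_negative, n_positive):
    """Simple residue-count estimate: positive minus negative residues."""
    return n_positive - n_negative


def is_stable(instability_index, threshold=40.0):
    """ProtParam convention: an index below 40 predicts a stable protein."""
    return instability_index < threshold


for name, (start, end) in DOMAINS.items():
    print(f"{name}: {domain_length(start, end)} residues")
print("net charge:", net_charge(48, 44), "| stable:", is_stable(29.86))
```

With these ranges the domains span 169, 85, 76 and 65 residues, and the residue counts give a net charge estimate of -4, consistent with the classification of the protein as negatively charged and stable.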

Homology modeling:

To predict the 3D structure of CBP50, the protein sequence was submitted to online servers: SWISS-MODEL (automated mode) [28], PS2 (Protein Structure Prediction Server) and Phyre [29]. However, the online servers could model only a few residues of the respective domains. Manual homology modeling was therefore performed with MODELLER 9.9, and models of the CBP50 domains were generated. The advanced search feature of the PDB and PSI-BLAST were used for template selection for each of the four domains. Only two hits were found for the N-terminal domain and six for the C-terminal domain. Several hits were found for the Fn-III domains, but only those with identity greater than 30% were considered. For the N-terminal chitin binding domain, 2BEM, the crystal structure of the chitin binding protein CBP21 from Serratia marcescens [30], was selected as template. CBP21 had three chitin binding domains belonging to CBM33, and the N-terminal domain of the target protein (CBP50) also belongs to CBM33. For the Fn-III like domains, 1K85 chain A, the solution structure of the fibronectin type III domain from Bacillus circulans WL-12 chitinase A1 [31], was selected as template. For the C-terminal domain, 1ED7, the solution structure of the chitin-binding domain of B. circulans WL-12 chitinase A1 [32], was selected as template because it is a representative structure of CBM5, to which the C-terminal domain of the target protein also belongs. Other hits found by the various tools were not representative of CBM5, which is why, despite the low E-value, 1ED7 was selected as template. This short domain is found in many different glycosyl hydrolase enzymes and is presumed to have a carbohydrate binding function. The domain has six aromatic groups that may be important for binding [33]. In the next step, each domain was aligned with its respective template sequence, and two files, in PIR and PAP format, were generated for each domain with the align2d command of MODELLER 9.9. 
On the basis of the alignment files, models were generated for each domain with the “get-model” command of MODELLER 9.9. Twenty models were generated for each domain, and the models with the lowest objective function were selected (Figure 1). MODELLER includes a facility explicitly designed for loop modeling by satisfaction of spatial restraints, so loop optimization of the models was done using the loop refinement script of MODELLER 9.9. With loop refinement, the ERRAT quality factor [34] of each model increased (Table 1, see supplementary material). The MolProbity server [35] was used to identify poor rotamers in each model. The side-chain conformations (rotamers) of the residues were corrected with CHIMERA, which has rotamer libraries extracted from high-resolution X-ray structures. After all models had been evaluated by the various tools and servers, the selected model was validated (Table 2, see supplementary material).

Figure 1

Templates and generated models: A) a) Template (2BEM) downloaded from PDB; b) N-terminal domain of the CBP50 generated by MODELLER. B) a) Template (1K85) downloaded from PDB; b) first Fn-III domain of the CBP50 generated by MODELLER; c) second Fn-III domain model of the CBP50 generated by MODELLER. C) a) Template model (1ED7) downloaded from PDB; b) C-terminal domain model of the CBP50 generated by MODELLER. All images in Figure 1 were created using UCSF CHIMERA 1.5.3.

To verify the best selected model of each domain and of whole CBP50, each model was superimposed on its selected template (Figure 2). Superimposition was done with CHIMERA's structure comparison command; the RMSD value validates the model for each domain. The RMSD between 165 atom pairs of the superposed N-terminal domain was 0.274 Å, while according to the FATCAT server the structure alignment had 167 equivalent positions with an RMSD of 0.32 Å, without twists. Both Fn-III domains were superimposed on their template 1K85. For the first Fn-III like domain, the RMSD was 0.257 Å between 85 atom pairs, whereas according to FATCAT the structure alignment had 85 equivalent positions with an RMSD of 0.26 Å, without twists. For the second Fn-III like domain, the RMSD was 0.441 Å between 62 atom pairs, and the FATCAT server showed a structure alignment of 72 equivalent positions with an RMSD of 0.58 Å, without twists. The C-terminal domain was superimposed on its template 1ED7 and showed an RMSD of 0.705 Å between 33 atom pairs, whereas the structure alignment had 43 equivalent positions with an RMSD of 1.08 Å, without twists (Figure 3).
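For readers unfamiliar with the metric, the RMSD quoted for each superimposition is the square root of the mean squared distance between matched atom pairs. The sketch below computes it for two coordinate lists that are assumed to be already superimposed and paired in the same order; it does not perform the superposition itself, which CHIMERA and FATCAT handle internally. The function name and input layout are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D coordinates.

    coords_a and coords_b are equal-length sequences of (x, y, z)
    tuples in Angstrom, already superimposed and in matching order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    squared_sum = 0.0
    for point_a, point_b in zip(coords_a, coords_b):
        # Squared Euclidean distance for one atom pair.
        squared_sum += sum((a - b) ** 2 for a, b in zip(point_a, point_b))
    return math.sqrt(squared_sum / len(coords_a))
```

Because the value is averaged over the matched pairs only, two tools that match different numbers of positions (for example 62 atom pairs versus 72 equivalent positions) can report different RMSD values for the same pair of structures.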

Figure 2

Superimposition of the models with templates: a) Template (2BEM) and N-terminal domain superimposition; b & c) Template (1K85) and Fn-III domain superimposition; d) Template (1ED7) and C-terminal domain superimposition. The target model is shown in green and the template in magenta.

Figure 3

Domain-wise docking analyses of the CBP50: The CBP50 domains are shown in green while the substrate is shown as a stick model. The best docking for each domain was selected on the basis of lowest RMSD and minimum energy: A) N-terminal domain-substrate interaction; four residues of the N-terminal domain, Glu-71, Ser-74, Glu-76 and Gln-90, interact with the substrate; B) residues Trp-20, Asn-21, Ser-23 and Val-30 of the Fn-III like domain interact with the chitin hexamer; C) residues Glu-15, Ala-17, Ser-18 and Leu-35 of the C-terminal domain form H-bonds with the substrate.

Molecular docking analyses:

Chitin binding proteins are usually placed in families 14, 18 and 33 of the carbohydrate-binding modules [36]. Studies on various family 33 chitin binding proteins have revealed that they interact with insoluble crystalline chitin, making it more accessible for degradation by chitinases [1]. To understand the mode of chitin binding, docking analysis of CBP50 was carried out. Each domain of CBP50 was docked separately with the chitin oligomer, and the best docked complexes were analyzed. All three domains were docked separately using AutoDock Vina, and the output files were analyzed with PyMOL [37]. Because of the dimensional limitations of the grid, docking was performed by defining the grid on a particular portion of each model. The best conformation given by AutoDock Vina was selected on the basis of RMSD cluster ranking and the lowest free energy of the protein-ligand complex. To study the molecular interaction of the N-terminal domain, a grid box with center_x=10.724, center_y=27.8, center_z=11.359, size_x=102, size_y=76, size_z=100 was set, and this portion was allowed to interact with the chitin substrate. The second conformation, with the lowest energy (-10.5 kcal/mol) and distances from the best mode of 20.471 (RMSD l.b.) and 24.613 (RMSD u.b.), was selected. Structural analyses of both chitin binding domains and chitinases have shown that such proteins, except CBP21, contain surface-exposed aromatic residues lined in a cleft, groove or tunnel that are necessary for substrate binding [38-43], whereas CBP21 showed neither a groove nor surface-exposed aromatic residues [30]. The analysis of the N-terminal domain and its high amino acid sequence identity suggest an evolutionary relationship of CBP50 with CBP21 [5]. In the CBD-N, four residues, Glu-71, Ser-74, Glu-76 and Gln-90, interact with the substrate through strong H-bonds (Figure 3). 
Serine and glutamic acid were also found to be involved in substrate binding in previously reported work on the same domain in CBP24, which is also a component of the B. thuringiensis chitinolytic system [12]. Similarly, the Fn-III domain was docked with the grid box set at center_x=3.889, center_y=0.43, center_z=-1.773, size_x=98, size_y=60, size_z=76. The first conformation, with the lowest energy (-10.0 kcal/mol), was analyzed, and four residues, Trp-20, Asn-21, Ser-23 and Val-30, were shown to be involved in substrate interaction. CBP50 is the only chitin binding protein containing an Fn-III like domain [5]. Previously, Fn-III domains have shown cellulose binding activity, but this is the first report of the interaction of this domain with chitin. To investigate the interaction of the C-terminal domain with the substrate, this domain was also docked with the chitin hexamer, with a grid box set at center_x=-4.43, center_y=3.57, center_z=-4.436, size_x=76, size_y=60 and size_z=62. Out of nine conformations, the one with the lowest energy was selected for further analysis. The residues Glu-15, Ala-17, Ser-18 and Leu-35 showed bonding with the chitin substrate (Figure 3). Previous studies have shown that the C-terminal domain of CBP50 is essential for efficient chitin binding, because CBP50 lacking the C-terminal domain showed poor interaction with chitin [5], which is in accordance with our results. Being unique in its structure, CBP50 has provided interesting insights into structure and substrate interaction. Site-directed mutagenesis studies will further elucidate the mechanism of substrate interaction of this protein specifically and will improve understanding of the biodegradation of recalcitrant substrates by prokaryotes.

39 in total

1. The PSIPRED protein structure prediction server.

Authors: L J McGuffin; K Bryson; D T Jones
Journal: Bioinformatics Date: 2000-04 Impact factor: 6.937

2. Structure of a two-domain chitotriosidase from Serratia marcescens at 1.9-A resolution.

Authors: D M van Aalten; B Synstad; M B Brurberg; E Hough; B W Riise; V G Eijsink; R K Wierenga
Journal: Proc Natl Acad Sci U S A Date: 2000-05-23 Impact factor: 11.205

3. Analysis and assessment of comparative modeling predictions in CASP4.

Authors: A Tramontano; R Leplae; V Morea
Journal: Proteins Date: 2001

4. FATCAT: a web server for flexible structure comparison and structure similarity searching.

Authors: Yuzhen Ye; Adam Godzik
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

5. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling.

Authors: Konstantin Arnold; Lorenza Bordoli; Jürgen Kopp; Torsten Schwede
Journal: Bioinformatics Date: 2005-11-13 Impact factor: 6.937

Review 6. Evaluation of the performance of 3D virtual screening protocols: RMSD comparisons, enrichment assessments, and decoy selection--what can we learn from earlier mistakes?

Authors: Johannes Kirchmair; Patrick Markt; Simona Distinto; Gerhard Wolber; Thierry Langer
Journal: J Comput Aided Mol Des Date: 2008-01-15 Impact factor: 3.686

7. Protein structure prediction on the Web: a case study using the Phyre server.

Authors: Lawrence A Kelley; Michael J E Sternberg
Journal: Nat Protoc Date: 2009 Impact factor: 13.491

8. Chitin-binding proteins of Artemia diapause cysts participate in formation of the embryonic cuticle layer of cyst shells.

Authors: Wen-Ming Ma; Hua-Wei Li; Zhong-Min Dai; Jin-Shu Yang; Fan Yang; Wei-Jun Yang
Journal: Biochem J Date: 2013-01-01 Impact factor: 3.857

9. Bioinformatics analysis of the epitope regions for norovirus capsid protein.

Authors: Liping Chen; Di Wu; Lei Ji; Xiaofang Wu; Deshun Xu; Zhiwei Cao; Jiankang Han
Journal: BMC Bioinformatics Date: 2013-03-08 Impact factor: 3.169

10. MolProbity: all-atom structure validation for macromolecular crystallography.

Authors: Vincent B Chen; W Bryan Arendall; Jeffrey J Headd; Daniel A Keedy; Robert M Immormino; Gary J Kapral; Laura W Murray; Jane S Richardson; David C Richardson
Journal: Acta Crystallogr D Biol Crystallogr Date: 2009-12-21