Literature DB >> 26339152

Modeling and structural analysis of human Guanine nucleotide-binding protein-like 3,nucleostemin.

Farinaz Nazmi¹, Mohammad Amin Moosavi², Marveh Rahmati³, Mohammad Ali Hoessinpour-Feizi⁴.

Abstract

Human GNL3 (nucleostemin) is a recently discovered nucleolar protein with pivotal functions in maintaining genomic integrity and determining cell fates of various normal and cancerous stem cells. Recent reports suggest that targeting this GTP-binding protein may have therapeutic value in cancer. Although, sequence analyzing revealed that nucleostemin (NS) comprises 5 permuted GTP-binding motifs, a crystal structure for this protein is missing at Protein Data Bank (PDB). Obviously, any attempt for predicting of NS structure can further our knowledge on its functional sites and subsequently designing molecular inhibitors. Herein, we used bioinformatics tools and could model 262 amino acids of NS (132-393 aa). Initial models were built by MODELLER, refined with Scwrl4 program, and validated with ProsA and Jcsc databases as well as PSVS software. Then, the best quality model was chosen for motif and domain analyzing by Pfam, PROSITE and PRINTS. The final model was visualized by vmd program. This predicted model may pave the way for next studies regarding ligand binding states and interaction sites as well as screening of databases for potential inhibitors.

Entities: Chemical Disease Gene Species

Year: 2015 PMID： 26339152 PMCID： PMC4546995 DOI： 10.6026/97320630011353

Source DB: PubMed Journal: Bioinformation ISSN： 0973-2063

Background

Human GNL3 (nucleostemin) has been viewed as a nucleolar protein with variety of roles, including pre-rRNA processing, cell-cycle control, telomere stability, genomic integrity and selfrenewal maintaining of embryonic and tissue stem cells [1, 2]. Since its discovery in 2002, there are accumulating reports that nucleostemin (NS) is also abundantly expressed in many cancerous cells, and contributed directly to formation of cancer stem cells (CSCs), highlighting its importance as diagnostic marker and/or therapeutic target in cancer [1, 2]. In this line, we and other groups evidenced that NS depletion can inhibit tumor growth in in vitro and in vivo and can lead to inhibition of proliferation and induction of cell death [3, 4]. Although its exact mechanism(s) of action is not clear, this nucleolar protein can interact with some important functional proteins in nucleoplasm, such as p53, mouse double minute protein 2 (MDM2), and telomeric repeat binding factor 1 (TRF1), thereby modulating different fates of the cells [5]. In fact, GTP status of NS is the key factor in its nucleolar-nucleoplasm recycling and interaction with nucleoplasmic proteins [4-5]. The gene encoding NS is a member of a gene family with an MMR_HSR1 domain [6]. The MMR_HSR1 domain consists of five GTP-binding motifs that have been detected in singlecelled microorganisms to high vertebrates [2]. Among them, NS and its homologues, guanine nucleotide binding proteinlike 3 (GNL3L) constitute a subfamily of GTP-binding proteins with a unique domain of circularly permuted GTP-binding motifs [6]. The chromosomal location of human NS is 3p21.1 with 3 typical transcript variants. The first variant encodes a protein with 549 amino acids while variants 2 and 3 missed Nterminal and contain 537 aa [http://www.ncbi.nlm.nih.gov/gene/26354]. Sequence analysis of isoform 1 demonstrated that the encoded protein has one chain consisting basic (B) domain (amino acid 2 to 46), intermediate (I) domain (amino acid 282 to 456), acidic domain (amino acid 456 to 543) and a coiled coil region (amino acid 56 to 95) [5-7]. Functionally, the nucleolar entry of NS needs GTP binding (G) domain and I domain whereas its accumulation into nucleolus is dependent to B domain [7]. Despite these data, however, there is no crystal (three-dimensional) structure for NS in the literature, offering an emerging work for predicting its structure by bioinformatics tolls. In this study we represented a predicted model for target sequence of NS, particularly its GTP-binding motifs, which may be helpful for better understanding its functional sites and subsequently designing therapeutic drugs.

Methodology

Sequence retrieval:

The sequence of human GNL3 in FASTA format was retrieved from Uniprot Knowledge Base (http://www.uniprot.org/uniprot/Q9BVP2) with Q9BVP2 accession number [8].

Disordered residues:

Disordered proteins are a kind of protein that lacks a fixed or ordered three dimensional structures and therefore cannot be predicted, so we firstly tried to find which residues of NS can be potentially disordered through a disorder prediction server (http://iupred.enzim.hu/).

Sequence alignment and homology modeling:

Different blast algorithms (blastp, PSI-blast and PHI-blast) were used against Protein Data Bank (PDB) to choose a suitable template [9]. The template and target sequence were aligned subsequently using ClustalW with the default parameters. Finally, the aligned sequence was used as the input for modeler 9.14 to generate a model.

Model refinement:

Predicted models were then refined using Scwrl4 program for prediction of protein side chain conformations. This program is based on a new algorithm and new potential function that result in improved accuracy [10].

Structural validation:

The resulted structure was subjected to structural quality assessment. The ProsA program was used to assess the energy of residue-residue interaction using a distance based pair potential and the energy was transformed to a score called Zscore. Also, predicted models were assessed with Jcsg [11] server indexes and the best one was chosen. Moreover, we used PSVS program for model validation that give results in PROCHECK [12], MolProbity [13], Verify3D [14] and ProsaII plots and graphs.

Secondary structure:

Stride database was used for secondary structure prediction and computation of α – helical, β – strand and coiled regions [15].

Domain and motif analysis:

The Pfam database is a large collection of protein families [16], each represented by multiple sequence alignments and hidden Markov models (HMMs). Pfam results showed one conserved MMR-HSR1 domain in our query sequence. Amino acid residues within a domain that occur consistently and are responsible for specific function are called motifs. They can be used as fingerprints in evolutionary studies and in assigning a recently sequenced protein to a particular family. Fingerprint in our query sequence were found by motif search server PROSITE [17] and PRINTS [18].

Structure visualization:

The predicted model is visualized by vmd software [19].

Results & discussion

Although NS plays an important role in physiological and pathological conditions of stem and cancerous cells, the data on its structure is insufficient [3– 7]. In a try to predict structure of NS we engaged bioinformatics tools in this study. The first issue that should be consider before starting the modeling process is sequence evaluation of target protein regarding to its tendency for being disordered. The results in Figure 1A showed a low disorder tendency in our target sequence while high tendency was seen in amino acids of 1-131 and 393-549. After evaluation of disordered residues we started modeling process. Different blast algorithms were also performed. Among blast results, the best score belonged to 3CNL chain A which contains 263 amino acids and had 99% coverage with amino acids of 132–393 of NS; there was no structural template with acceptable scores for whole protein at PDB. Although, a secondary structure prediction for whole NS were performed via http://cho-fas.sourceforge.net/index.php (data not shown), we decided to predict our model based on amino acids 132-393 largely because any models predicted from disorder regions (1-131 and 393-549 aa) may not reliable. In addition, the functional GTP-binding motifs were found among amino acids of 132–393. Blast results for target sequence showed 86.7 total score, 8e-20 E value and 26% identity, meaning that it is proper for choosing as a template. In the next step, the template and target sequence were aligned using ClustalW with the default parameters (Figure 2) and further, the aligned sequence was used as the input for modeler 9.14 to generate ten (10) models. After model refinement, different indexes of models were compared by jcsg database to choose the best one and then validated with ProsA and PSVS software (Figure 1). The constructed model is monomer with 263 amino acids; its molecular weight estimated as 28875 Dalton. The Prosa analysis showed that Z-score in the predicted model is -2.29, indicating reasonable side chain interactions (Figure 1B). Ramachandran Plot from Procheck evaluation for refined model showed 82.6% residues in allowed regions, 13.0% in additionally allowed regions and 2.6% in disallowed regions (Figure 1C). Global quality scores are given in supplementary Tables 1 & 2 (see supplementary material). In addition, PROCHECK G-factor for phi-psi and all dihedral angles was performed (Figure 1D & E). With respect to mean and standard deviation for a set of 252 Xray structures < 500 residues, of resolution <= 1.80 Å, R-factor <= 0.25 and R-free <= 0.28; a positive value indicates a ‘better’ score. Results showed no close contacts within 2.2 Å, 2.4 ° RMS deviation for bond angles and 0.018 Å for RMS deviation for bond lengths (Figure 1D & E). ProsaII and Verify3D results also demonstrated that approximately 60% of residues represent a score over 0.2 which indicate an acceptable model (Figure 1F &G). Secondary structure was predicted with Stride database and analyzed with PSVS (Figure 1H). As depicted from Figure 1H, the predicted secondary structure contains 7 α- helices, 3 β- strands, a 3- helix and a lot of Turn and Coil structures. In fact, amino acids 2A-10A, 55A-67A, 137A-144A, 187A-195A, 207A- 221A, 232A-240A, 253A-262A forms α- helices and 15A-19A, 44A-48A, 72A-74A forms β− strands. By contrast, much less β− sheets were observed in 1-132 and 393-549 regions (data not shown). Finally, the predicted three-dimensional (3D) structure was shown in Figure 3A. Motif and domain analysis showed the conserved MMR-HSR1 domain is located between amino acids 256-370 where order of GTP-binding motifs is G4 (amino acids 47-50), G5 (amino acids 75-77), G1 (amino acids 131-138), G2 (amino acids 157-161) and G3 (amino acids 175-178) in our model (Figure 3B).

Figure 1

Structural validation of predicted model for NS protein. A) Disordered residue prediction. Residues under blue line represent low tendency to be disordered; (B) PROSA results for showing Z-score; (C) Ramachandran plot of predicted model; (DE) G-factor calculation for phi-psi (D) and all dihedral angles (E) by PROCHECK. ProsaII (F) and Verify3D (G) results for modeled residues and secondary structure analysis by STRIDE (H).

Figure 2

Sequence alignment using clustalW.

Figure 3

Three dimensional (3D) structure predicted for target sequence of human GNL3 (A) and GTP-binding motifs (B) by MODELLER and visualized VMD

GTP binding sites are good therapeutic targets for drug designing against GTP-binding proteins. Therefore, the sites of 5 GTP-binding motifs in the predicted model are highly important and can be considered as valuable targets. Indeed, GTP-binding is the main mechanism that can control the functions of NS by its shuttling between nucleolus and nucleoplasm [5– 7]. For instance, it has been reported that NS can enter into nucleoplasm to interact with MDM2 and as a results induce p53-depndent cell-cycle arrest and/or apoptosis [4-7]. To get more insights on predicted model, we served the target sequence to corresponding database (http://iupred.enzim.hu/) and predicted disorder and anchor tendency in all residues. Then, we compared given scores for each GTPbinding motif and calculated the average number of score for being disorder and ANCHOR (data not shown). The results presented in Figure S1 showed that G1 and G4 orderly present the lowest scores (0.11 and 0.18) for being disorder while the best ANCHOR score belonged to G4 (0.25). Since a higher ANCHOR score represents a more fixed structure, we proposed that G4 motif may be more reliable target for drug design. Collectively, this study represented a general view of NS structure. Predicted structural model can fit a variety of applications in future studies including, ligand binding states and interaction sites as well as screening of databases for potential lead compounds or inhibitors. It should be mentioned that our model for target sequence of NS is only a predictive model and need to be confirmed in further studies.

19 in total

1. The Protein Data Bank: unifying the archive.

Authors: John Westbrook; Zukang Feng; Shri Jain; T N Bhat; Narmada Thanki; Veerasamy Ravichandran; Gary L Gilliland; Wolfgang Bluhm; Helge Weissig; Douglas S Greer; Philip E Bourne; Helen M Berman
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

Modeling and structural analysis of human Guanine nucleotide-binding protein-like 3,nucleostemin.

Background

Methodology

Sequence retrieval:

Disordered residues:

Sequence alignment and homology modeling:

Model refinement:

Structural validation:

Secondary structure:

Domain and motif analysis:

Structure visualization:

Results & discussion

1. The Protein Data Bank: unifying the archive.

2. Classification and evolution of P-loop GTPases and related ATPases.

3. PROSITE: a documented database using patterns and profiles as motif descriptors.

4. STRIDE: a web server for secondary structure assignment from known atomic coordinates of proteins.

5. MOLPROBITY: structure validation and all-atom contact analysis for nucleic acids and their complexes.

6. VERIFY3D: assessment of protein models with three-dimensional profiles.

7. Turning a new page on nucleostemin and self-renewal.

8. Screening and identification of proteins interacting with nucleostemin.

9. Nucleolar trafficking of nucleostemin family proteins: common versus protein-specific mechanisms.

10. The JCSG high-throughput structural biology pipeline.