Computational analysis of high-risk SNPs in human CHK2 gene responsible for hereditary breast cancer: A functional and structural impact.

Nutan V Badgujar1, Bhoomi V Tarapara1, Franky D Shah1.   

Abstract

CHK2 mutations are now frequently studied, alongside BRCA1/BRCA2, in patients with hereditary breast and ovarian cancer. CHK2 is a tumor suppressor gene that encodes a serine/threonine kinase involved in pathways such as DNA repair, cell cycle regulation and apoptosis in response to DNA damage. It is a well-studied moderate-penetrance gene, regarded as a possible third high-risk susceptibility gene, and is associated with an increased risk of breast cancer. Hence, before planning a large population study, it is useful to scrutinize putative functional SNPs of CHK2 using different computational tools. In this study, we used various computational approaches to identify nsSNPs that are deleterious to the structure and/or function of the CHK2 protein and might therefore contribute to this disease. Computational analysis was performed with different in silico tools, including SIFT, Align GVGD, SNAP2, PROVEAN, PolyPhen-2, PANTHER, PhD-SNP, MUpro, iPTREE-STAB, ConSurf, InterPro, NCBI Conserved Domain Search tool, ModPred, SPARKS-X, RAMPAGE, Verify3D, FTSite, COACH and PyMOL. Out of 78 nsSNPs of the human CHK2 gene, seven were predicted to be the most functionally significant. Among these seven nsSNPs, p.Arg160Gly, p.Gly210Arg and p.Ser415Phe affect highly conserved residues with a conservation score of 9, and three nsSNPs were predicted to be involved in post-translational modification. The p.Arg160Gly and p.Gly210Arg variants may interfere with the phosphopeptide binding site on the conserved FHA domain. The p.Ser415Phe variant may interfere with formation of the activation loop of the protein kinase domain and with the interaction of CHK2 with its ligand. The study concludes that the substitution of serine by phenylalanine at position 415 is a major mutation in the native CHK2 protein which might contribute to its malfunction, ultimately causing disease.
This is the first comprehensive study in which CHK2 gene variants are analyzed using in silico tools; hence it will be of great help when considering large-scale studies and in developing precision medicines related to these polymorphisms in the era of personalized medicine.

Year:  2019        PMID: 31398194      PMCID: PMC6688789          DOI: 10.1371/journal.pone.0220711

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Breast cancer is one of the main causes of cancer-related death among women worldwide, with 5% to 10% of cases being due to hereditary risk [1]. CHK2 is a moderate-penetrance gene and the most extensively studied candidate for a third high-risk susceptibility gene in hereditary breast and ovarian cancer. The CHK2 gene is the human homolog of Rad53 (Saccharomyces cerevisiae) and Cds1 (Schizosaccharomyces pombe). Human CHK2 is a tumor suppressor gene located on the long arm of chromosome 22 at q12.1; it encodes the CHK2 serine/threonine kinase. The protein consists of three major domains: 1) an N-terminal SQ/TQ cluster domain that serves as a site for phosphorylation in response to DNA damage; 2) a forkhead-associated (FHA) protein interaction domain, which is essential for activation in response to DNA damage and is rapidly phosphorylated in response to replication blocks and DNA damage; and 3) a C-terminal domain with serine/threonine kinase activity [4]. Within the FHA domain, residues 112–175 are involved in phosphorylation-dependent dimerization of CHK2 molecules, which allows full activation of CHK2 by trans-autophosphorylation of the activation loop. The major function of the FHA domain is to regulate the kinase activity of CHK2 by interacting with other proteins, thereby mediating protein-protein interactions [2, 3]. CHK2 is activated by the kinases ATM and ATR in response to DNA double-strand breaks or replicative stress [5]. These kinases catalyze the phosphorylation of threonine 68 of CHK2, causing its transient dimerization via the FHA domain, which leads to CHK2 trans-autophosphorylation and full activation. In response to DNA damage, CHK2 is involved in different pathways such as cell cycle regulation, DNA repair and apoptosis. CHK2 phosphorylates downstream cell cycle regulators such as p53, Cdc25 and BRCA1 to activate checkpoint repair or recovery responses and, concurrently, to delay entry into mitosis [6, 7].
Deviation from its normal physiological function is likely to contribute to disease pathogenesis. In particular, the CHK2 variants p.Ile157Thr, p.Asp252Gly, c.1100delC, p.His371Tyr, p.Glu161del, p.Ser428Phe, c.591delA, p.Arg117Gly, p.Thr476Met and p.Asp438Tyr have been significantly associated, as germ-line variants, with hereditary breast and ovarian cancer [8-11]. The c.1100delC mutation, found in 1.4% of the Finnish population and 0.2% of the Polish population, confers a relative risk of developing breast tumors of about 2 for women and 10 for men [12, 13]. Variant p.Ile157Thr, present in 5.3% of the Finnish population and in 4.8% of the Polish population, confers a relative risk of breast cancer of 1.5 [14, 15]. A recent analysis by the Breast Cancer Association Consortium (BCAC) estimated a relative risk of 2.26 for p.Thr367MetfsTer15 (rs555607708) [16]. Limited data are available on the whole CHK2 gene in hereditary breast and ovarian cancer. Further, the impact of missense variants on protein function is not fully known, although substitutions in the FHA domain and the kinase domain have been shown to abolish activity [17-19]. Single nucleotide polymorphisms (SNPs) are a common type of genetic variation in humans, and about 93% of SNPs are present in human genes [20]. SNPs can be present in coding, non-coding or intergenic regions [21, 22]. Non-coding and intergenic SNPs may have a slight impact, but non-synonymous coding SNPs (nsSNPs) have a greater impact on the protein [20]. Identifying the impact of variants on the structure, stability and function of the protein is an important task, as not all reported polymorphisms are deleterious [23]. Therefore, there is a need to understand the deleterious impact of nsSNPs on protein structure and function using recent molecular biology techniques. A large number of SNPs have been reported in the NCBI database to date, and screening these nsSNPs experimentally for their impact on biological function is tedious and costly.
Computational methods can be an efficient alternative. Different computational tools are now used extensively to predict deleterious nsSNPs and their role in protein function, stability and structure maintenance. Taking all this into consideration, the present study aimed to identify deleterious nsSNPs of the human CHK2 gene using SIFT, Align GVGD, SNAP2, PROVEAN, PolyPhen-2, PANTHER, PhD-SNP, I-Mutant, iPTREE-STAB, MUpro etc. Conservation of amino acid residues was predicted using ConSurf. ModPred was used to identify post-translational modification sites present in the protein. The 3D structure of the CHK2 protein was generated using SPARKS-X and refined using ModRefiner. The quality of the model was checked using RAMPAGE and Verify3D. The ligand binding sites were predicted using FTSite and COACH. The 3D structure was visualized, and native and mutant amino acids were labelled, using PyMOL and Swiss PDB Viewer.

Materials and methods

SNP dataset

The data for the human CHK2 gene were retrieved from the following databases for further computational analysis: the UniProt database (https://www.uniprot.org) (UniProtKB ID O96017), and the NCBI database (https://www.ncbi.nlm.nih.gov), which provided the SNPs (rsIDs), the FASTA nucleotide sequence (NC_000022.11) and the amino acid sequence (NP_001005735) [24, 25].

Prediction of functional consequence of non-synonymous SNPs

The functional consequences of the nsSNPs of the human CHK2 gene were analysed using different computational tools.

SIFT

SIFT (Sorting Intolerant From Tolerant) predicts whether an amino acid substitution is deleterious to protein function, based on sequence homology and the physical properties of amino acids [26]. It can be applied to naturally occurring nonsynonymous polymorphisms and laboratory-induced missense variants, and is available at http://siftdna.org/www/SIFT_dbSNP.html. The SIFT score ranges from 0.0 (deleterious) to 1.0 (tolerated). The input query for the SIFT algorithm is the rsIDs of SNPs from dbSNP.

Align GVGD

Align GVGD is a web-based program available at http://agvgd.hci.utah.edu/. It combines the biophysical characteristics of amino acids with protein multiple sequence alignments to predict whether a missense substitution is deleterious [27]. The input query is the FASTA sequence of the protein and the amino acid substitution.

SNAP2

SNAP2 (Screening for Non-Acceptable Polymorphisms 2) predicts the functional consequences of amino acid variation using a neural network classification method [28]. It is a web-based tool available at https://www.rostlab.org/services/SNAP/, for which the input query is the protein sequence of CHK2 in FASTA format.

PROVEAN

PROVEAN (Protein Variation Effect Analyzer) predicts whether a single amino acid variant affects protein function using an alignment-based score [29]. It is an online tool available at http://provean.jcvi.org/index.php, produced by the J. Craig Venter Institute. If the score is at or below the threshold value of −2.5, the variant is predicted to be deleterious, whereas the variant is predicted to be neutral if the score is above −2.5. The input query is the FASTA sequence of the CHK2 protein and the amino acid variants.
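
The cutoff rule can be written as a short helper. This is a minimal sketch, assuming PROVEAN's default threshold of −2.5 with scores at or below the cutoff called deleterious; the function name and the example scores are hypothetical and are not taken from the study.

```python
PROVEAN_CUTOFF = -2.5  # assumed default cutoff, not a value tuned in this study


def classify_provean(score: float, cutoff: float = PROVEAN_CUTOFF) -> str:
    """Label a PROVEAN score as 'Deleterious' or 'Neutral'."""
    return "Deleterious" if score <= cutoff else "Neutral"


# Hypothetical scores, for illustration only.
for example_score in (-6.1, -2.5, -0.4):
    print(example_score, classify_provean(example_score))
```

Keeping the cutoff as a parameter makes it easy to test how sensitive a shortlist is to a stricter or looser threshold.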

PolyPhen-2

PolyPhen-2 (Polymorphism Phenotyping v2) predicts the impact of an amino acid substitution on protein structure and function using straightforward physical and comparative considerations [30]. It is a web-based tool available at http://genetics.bwh.harvard.edu/pph2/. It calculates the PSIC (position-specific independent counts) score. If the score is >0.85 the variant is classed as probably damaging, if the score is >0.15 it is classed as possibly damaging, and the remaining variants are considered benign. The input query for PolyPhen-2 is the FASTA sequence of the CHK2 protein and the amino acid variants.
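
The three score bands described above can be sketched as follows. The function name and the example scores are hypothetical; only the two thresholds (0.85 and 0.15) come from the text.

```python
def classify_polyphen2(score: float) -> str:
    """Map a PolyPhen-2 score to the three classes used in the text."""
    if score > 0.85:
        return "Probably damaging"
    if score > 0.15:
        return "Possibly damaging"
    return "Benign"


# Hypothetical scores, for illustration only.
for example_score in (0.99, 0.50, 0.05):
    print(example_score, classify_polyphen2(example_score))
```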

PANTHER

PANTHER cSNP (Protein Analysis Through Evolutionary Relationships, coding SNP) predicts the functional consequences of variants on the protein. It is an online tool available at http://pantherdb.org/tools/csnpScoreForm.jsp. It compares the protein sequence with a family of evolutionarily related proteins. The longer a position has been preserved, the greater the likely functional impact of an amino acid variant at that position. It calculates the subPSEC (substitution position-specific evolutionary conservation) score on the basis of an alignment of evolutionarily related proteins [31]. The input query is the plain protein sequence, the amino acid variants and the organism (human).

PhD-SNP

PhD-SNP (Predictor of human Deleterious Single Nucleotide Polymorphisms) is a Support Vector Machine (SVM) based method that discriminates between neutral and disease-related single point protein variants [32]. It is an online tool available at http://snps.biofold.org/phd-snp/phd-snp.html. Predictions are based on evolutionary information and a hybrid predictive model. The input query is the plain protein sequence and the position of the SNP along with the new residue.

MUpro

MUpro is a set of machine learning programs that predicts protein stability changes caused by single-site amino acid variants [33]. It is a web-based server available at http://mupro.proteomics.ics.uci.edu/. Predictions are based on both the value and the sign of the energy change, using an SVM and sequence information only. The input query is the plain protein sequence followed by the original and substituted amino acids.

iPTREE-STAB

iPTREE-STAB is a web-based server, available at http://203.64.84.190:8080/IPTREEr/iptree.html, that is based on a decision tree. It predicts the impact of a single amino acid change on protein stability [34]. The input query is the original amino acid and the mutated amino acid residue, together with the three flanking residues on each side of the mutated residue.
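
Preparing this kind of query amounts to slicing a window around the mutated position. The sketch below assumes 1-based residue numbering; the helper name and the toy sequence are hypothetical and do not represent the CHK2 sequence or the server's own input format.

```python
from typing import Tuple


def flanking_window(sequence: str, position: int, flank: int = 3) -> Tuple[str, str, str]:
    """Return (left flank, residue, right flank) for a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    index = position - 1  # convert to 0-based indexing
    left = sequence[max(0, index - flank):index]
    right = sequence[index + 1:index + 1 + flank]
    return left, sequence[index], right


# Toy sequence for illustration only (not CHK2).
toy_sequence = "ACDEFGHIKL"
left, wild_type, right = flanking_window(toy_sequence, 5)
print(left, wild_type, right)
```

Near either end of the sequence the flanks are simply shorter, which is why the left slice is clamped at zero.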

Phylogenetic conservation

ConSurf is a computational tool, available at http://consurf.tau.ac.il, that calculates the evolutionary conservation of each amino acid position from the phylogenetic relations between homologous sequences [35]. ConSurf calculates a conservation score from 0 to 9, which is classified as variable, average or highly conserved. The input query for ConSurf is the FASTA sequence of the CHK2 protein.

Prediction of post translational modification sites

The ModPred server is available at http://www.modpred.org which is used to predict post translational modification sites within CHK2 protein sequence. ModPred is a sequence-based predictor of potential post-translational modification (PTM) sites in proteins. It consists of 34 ensembles of logistic regression models, trained separately on a combined set of 126,036 non-redundant experimentally verified sites for 23 different modifications, obtained from public databases and an ad-hoc literature search [36].

ExAC browser beta

The ExAC browser is freely available at http://exac.broadinstitute.org. The minor allele frequency (MAF) for the nsSNPs of the human CHK2 gene was retrieved from ExAC Browser Beta. The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects and to make summary data available to the wider scientific community. The ExAC browser provides gene- and transcript-centric displays of variation, a critical view for clinical applications. Additionally, it provides a variant display, which includes population frequency and functional annotation data as well as short-read support for the called variant. ExAC has already been used extensively by clinical laboratories worldwide [37]. The input query is the name of the human CHK2 gene.

Prediction of nsSNPs position in different protein domains

The NCBI Conserved Domain Search tool (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and InterPro (https://www.ebi.ac.uk/interpro/) were used to locate the position of SNPs in the different domains of the CHK2 protein [38, 39]. The input query for InterPro is the plain sequence of CHK2, and for the NCBI Conserved Domain Search tool it is the FASTA amino acid sequence of the CHK2 protein.

Protein 3D modelling and structural analysis

The 3D structure of the full-length CHK2 protein is not available in the Protein Data Bank. The 3D structure of the CHK2 protein was therefore generated using the SPARKS-X fold recognition server (http://sparks-lab.org/yueyang/server/SPARKS-X) [40]. The input query for the SPARKS-X server is the FASTA amino acid sequence of the CHK2 protein. The degree of similarity of the templates used by SPARKS-X for 3D model prediction was checked by BLASTp. The 3D structure predicted by SPARKS-X was further refined using ModRefiner (https://zhanglab.ccmb.med.umich.edu/ModRefiner) [41]. The quality of the refined model was checked using Verify3D (http://servicesn.mbi.ucla.edu/Verify3D) and RAMPAGE (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) [42]. The input query for the Verify3D and RAMPAGE analyses is the refined structure predicted using SPARKS-X.

Ligand binding site prediction

The ligand binding sites within the CHK2 protein were predicted using the FTSite server (http://ftsite.bu.edu/) and the COACH server (https://zhanglab.ccmb.med.umich.edu/COACH/). FTSite is a freely available online tool for predicting ligand binding sites. It accurately identifies binding sites in over 94% of apoproteins; its applications include structure-based prediction of protein function, the explanation of functional relationships among proteins, protein engineering and drug design [43]. COACH is a meta-server approach to protein-ligand binding site prediction. Using two comparative methods, TM-SITE and S-SITE, COACH predicts complementary ligand binding sites [44]. The input query for COACH is the refined structure generated by ModRefiner. PyMOL and Swiss PDB Viewer were used to visualize the 3D structure of the protein.

Results

SNP database

The SNPs of the CHK2 gene investigated in the present study were retrieved from the dbSNP database (dbSNP-NCBI: https://www.ncbi.nlm.nih.gov/snp/?term=chek2). It contained a total of 13929 SNPs, of which 753 are missense (nsSNP), 105 are frame shift, 642 are in the 5' UTR, 55 are in the 3' UTR, 50 are nonsense, 13062 are intronic, 50 are stop gained, 19 are in the 3' splice site, 24 are in the 5' splice site and 266 are coding synonymous SNPs (Fig 1). Only the nsSNPs of CHK2 were selected for this investigation.
Fig 1

Distribution of SNPs in different functional classes of CHK2 gene according to the dbSNP database.

Prediction of functional nsSNPs in CHK2

The CHK2 single nucleotide variants obtained from dbSNP were subjected to computational analysis with a variety of tools. Of the 753 nsSNPs of the CHK2 gene, SIFT returned a prediction (tolerated or deleterious) for 78; the remaining 675 were not found in the SIFT results. Of these 78 SNPs, SIFT classified 35 as damaging and 43 as tolerated. To increase the accuracy of the computational predictions, all 78 SNPs scored by SIFT were further evaluated with Align GVGD, SNAP2, PROVEAN, PolyPhen-2 and PANTHER. Align GVGD combines Grantham Variation (GV) and Grantham Deviation (GD) scores to predict whether a missense substitution is deleterious. In Align GVGD, a grade below C15 indicates that the substitution is less likely to affect function, whereas a grade of C65 indicates that it is most likely to affect function. Out of 78 nsSNPs, Align GVGD predicted 43 as most likely affected and 10 as less likely affected. SNAP2 predicts whether an amino acid variation is neutral or has an effect on the function of the query protein by evaluating the mutability landscape of the entire query sequence. Out of 78 SNPs subjected to SNAP2 prediction, 41 were predicted to affect protein function and 37 were predicted to be neutral. Among the 78 SNPs subjected to PROVEAN analysis, 35 were predicted to be deleterious and 43 neutral. Out of 78 SNPs subjected to PolyPhen-2 analysis, 41 were predicted to be probably damaging, 10 possibly damaging and 26 benign, and 1 was not predicted by PolyPhen-2. For every input variant, PolyPhen-2 calculates the PSIC (position-specific independent counts) score. Out of 78 nsSNPs, PANTHER cSNP predicted 37 as probably damaging, 17 as possibly damaging and the remaining 24 as probably benign. The nsSNPs predicted as probably damaging by PolyPhen-2 and PANTHER were considered damaging and used for further analysis.
All 78 nsSNPs of the CHK2 gene were further analyzed for association with disease using PhD-SNP. PhD-SNP is an SVM-based classifier that makes predictions from evolutionary information with a hybrid predictive method, with an accuracy of 78% on human proteins [29]. PhD-SNP gave the most distinctive result, classifying only 20 nsSNPs as disease-related and the remaining 58 as neutral. We predicted stability changes in the CHK2 protein with MuPro and iPTREE-STAB, both of which consider single-site variants. MuPro predicted that 56 nsSNPs decrease the stability of the CHK2 protein and the remaining 22 increase it. iPTREE-STAB predicted that 74 nsSNPs decrease protein stability and 4 increase it. According to some studies, decreased protein stability increases the degradation, misfolding and aggregation of proteins. We shortlisted the nsSNPs that were predicted as deleterious by all 9 tools. In total, 7 of the 78 SNPs met this criterion; they were classified as high risk and selected for further analysis. The results of SIFT, Align GVGD, SNAP2, PROVEAN, PolyPhen-2, PANTHER, PhD-SNP, MuPro and iPTREE-STAB are shown in Table 1.
Table 1

Prediction of functional consequences of nsSNP in human CHK2.

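SNP rs ID | AA variant | SIFT | Align GVGD | SNAP2 | PROVEAN | PolyPhen-2 | PANTHER | PhD-SNP / MuPro / iPTREE-STAB
rs17879961 | I200T | T | C65 | Effect | N | PosD | PosD | N-1-ve
rs17882942 | L555V | T | C25 | Neutral | N | Ben | ProB | N-9-ve
rs17883172 | E544K | T | C55 | Neutral | N | Ben | ProB | N-3-ve
rs17883862 | P85L | T | C65 | Effect | N | ProD | PosD | N-6-ve
rs17886163 | I491S | T | C65 | Neutral | N | Ben | PosD | N-5-ve
rs28909980 | D390N | D | C15 | Effect | D | ProD | ProD | N-0-ve
rs28909982 | R160G | D | C65 | Effect | D | ProD | ProD | Di-5-ve
rs72552322 | G210R | D | C65 | Effect | D | ProD | ProD | Di-7-ve
rs72552323 | I203T | D | C65 | Effect | D | ProD | ProD | Di-2-ve
rs77130927 | R223C | D | C65 | Neutral | D | PosD | PosD | Di-10-ve
rs121908694 | S41F | D | C65 | Effect | N | ProD | ProD | N-6-ve
rs121908701 | R224H | T | C25 | Neutral | N | Ben | ProB | Di-2-ve
rs121908702 | E282K | T | C55 | Neutral | N | ProD | PosD | Di-2-ve
rs121908703 | S399L | T | C65 | Neutral | D | PosD | PosD | N-2-ve
rs121908704 | T444A | T | C55 | Neutral | N | Ben | ProB | N-6-ve
rs121908705 | N489D | T | C15 | Neutral | N | Ben | ProB | N-7-ve
rs121908706 | R517H | T | C25 | Effect | D | ProD | ProD | Di-7-ve
rs137853007 | R188W | D | C65 | Effect | D | ProD | ProD | Di-2-ve
rs137853008 | A17S | T | C65 | Neutral | N | Ben | ProB | N-5-ve
rs137853009 | R223H | D | C25 | Neutral | D | Ben | PosD | Di-4-ve
rs137853010 | R224C | T | C65 | Neutral | N | Ben | ProB | Di-4-ve
rs137853011 | S471F | T | C65 | Neutral | D | ProD | PosD | Di-1-ve
rs138040612 | E571K | T | C55 | Neutral | N | ProD | PosD | N-6-ve
rs139088611 | V494A | T | C55 | Neutral | N | Ben | ProB | Di-0-ve
rs139366548 | Y467H | T | C65 | Effect | D | ProD | ProD | N-8-ve
rs141568342 | E64K | D | C55 | Effect | N | Ben | ProB | N-6-ve
rs141776984 | C286R | T | C65 | Effect | D | PosD | ProD | Di-1-ve
rs142243299 | V25I | T | C25 | Neutral | N | Ben | ProB | N-5-ve
rs142763740 | T519M | D | C65 | Effect | D | ProD | PosD | N-7-ve
rs143611747 | R361H | T | C25 | Neutral | N | Ben | PosD | N-6-ve
rs143965148 | D540N | T | C15 | Neutral | N | Ben | ProB | N-8-ve
rs144850845 | G210E | D | C65 | Effect | D | ProD | ProD | N-1-ve
rs145324174 | C428Y | D | C65 | Effect | D | ProD | ProD | N-1-ve
rs146198085 | N229H | D | C65 | Neutral | D | ProD | ProD | Di-8-ve
rs147877722 | S415F | D | C65 | Effect | D | ProD | ProD | Di-5-ve
rs148053495 | R361C | D | C65 | Effect | D | ProD | PosD | Di-3-ve
rs149501505 | R566C | T | C65 | Effect | N | ProD | ProB | N-5-ve
rs149991239 | T59K | D | C65 | Effect | D | ProD | ProD | N-4-ve
rs199708878 | R3W | D | C65 | Effect | N | ProD | ProD | N-8-ve
rs199749372 | I264V | T | C25 | Neutral | N | Ben | ProD | N-8-ve
rs199859140 | D404H | T | C65 | Neutral | N | PosD | ProB | Di-7-ve
rs200050883 | D481Y | D | C65 | Effect | D | PosD | PosD | Di-7-ve
rs200432447 | R562G | D | C65 | Effect | D | Ben | ProD | N-2-ve
rs200451612 | I264M | T | C0 | Neutral | N | PosD | ProD | N-3-ve
rs200649225 | R449H | T | C25 | Neutral | N | ProD | PosD | N-3-ve
rs200928781 | Y433C | D | C65 | Effect | D | ProD | ProD | N-6-ve
rs201084748 | S5L | T | C65 | Effect | N | Ben | ProB | N-8-ve
rs201206424 | R389C | D | C65 | Effect | D | ProD | ProD | N-7-ve
rs202051128 | I387M | T | C0 | Effect | N | ProD | ProD | N-7-ve
rs202089930 | T426I | D | C65 | Effect | D | ProD | ProD | N-1-ve
rs267606211 | S422F | D | C65 | Effect | D | PosD | ProD | N-2+ve
rs368570187 | R180Q | T | C35 | Neutral | N | Ben | ProB | N-7-ve
rs369070738 | N448S | T | C45 | Neutral | N | Ben | ProB | N-2-ve
rs369223840 | N229S | T | C45 | Neutral | D | ProD | ProD | N-4-ve
rs369256181 | Q11R | T | C35 | Neutral | N | Ben | ProD | N-7-ve
rs370968992 | F518I | T | C15 | Neutral | N | Ben | ProB | N-7-ve
rs371207635 | H382Y | D | C65 | Effect | D | ProD | ProD | N-6-ve
rs371657037 | S53T | D | C55 | Neutral | N | ProD | ProD | N-5-ve
rs372874441 | D177H | T | C65 | Neutral | N | ProD | ProB | N-8-ve
rs373073383 | A435V | D | C55 | Effect | D | ProD | ProD | N-8-ve
rs373648967 | K162R | T | C25 | Neutral | N | Ben | ProB | N-6+ve
rs373959274 | R564Q | D | C35 | Effect | N | ProD | ProD | N-3-ve
rs374395284 | E364A | D | C65 | Effect | D | ProD | ProD | N-4-ve
rs374660293 | L381H | D | C65 | Effect | D | ProD | ProD | N-4-ve
rs375130261 | M424V | D | C15 | Effect | D | ProD | ProD | N-8-ve
rs17880867 | N489K | T | C65 | Neutral | N | Ben | ProB | N-4-ve
rs17881473 | F490I | D | C15 | Effect | D | Ben | ProD | N-8-ve
rs17882922 | L479M | D | C0 | Neutral | N | ProD | ProD | N-1-ve
rs112032663 | G30D | T | C65 | Effect | N | ProD | ProB | N-3+ve
rs113947614 | I264T | T | C65 | Neutral | N | Ben | ProD | N-4-ve
rs137926355 | R144Q | T | C35 | Neutral | N | - | - | N-8-ve
rs141502354 | I386V | T | C25 | Neutral | N | PosD | PosD | N-6-ve
rs142966756 | R191M | D | C65 | Effect | D | ProD | PosD | N-5-ve
rs150677496 | L173Q | T | C65 | Effect | D | ProD | PosD | Di-5-ve
rs151218932 | C18Y | T | C65 | Effect | N | Ben | ProB | N-4-ve
rs372168051 | P225H | D | C65 | Effect | D | ProD | ProD | Di-5-ve
rs375507194 | Q20H | D | C15 | Neutral | N | ProD | ProB | N-6-ve
rs376736188 | Q27E | T | C25 | Effect | N | PosD | ProB | N-9+ve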

Where D: Deleterious; T: Tolerated; GD ≥C65 = most likely affected; GD ≥C0 = less likely affected; N: Neutral; ProD: Probably damaging; ProB: Probably benign; PosD: Possibly damaging; Ben: Benign; Di: Disease; ↓: Decrease; ↑: Increase; -ve: negative; +ve: positive
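
The shortlisting idea, keeping only variants that every tool calls deleterious, can be sketched on a few rows transcribed from Table 1. The strict "all tools agree" rule, the choice of the strongest class per tool, and the restriction to the seven columns shown are assumptions made for illustration; the sketch is not the authors' exact procedure and need not reproduce their shortlist of seven.

```python
# Call that each tool must report for a variant to count as deleterious
# (assumed here to be the strongest class of each tool).
DELETERIOUS_CALL = {
    "SIFT": "D",
    "Align GVGD": "C65",
    "SNAP2": "Effect",
    "PROVEAN": "D",
    "PolyPhen-2": "ProD",
    "PANTHER": "ProD",
    "PhD-SNP": "Di",
}
TOOLS = list(DELETERIOUS_CALL)

# A handful of rows from Table 1, in the column order of TOOLS.
ROWS = {
    "I200T": ["T", "C65", "Effect", "N", "PosD", "PosD", "N"],
    "D390N": ["D", "C15", "Effect", "D", "ProD", "ProD", "N"],
    "R160G": ["D", "C65", "Effect", "D", "ProD", "ProD", "Di"],
    "G210R": ["D", "C65", "Effect", "D", "ProD", "ProD", "Di"],
    "R361C": ["D", "C65", "Effect", "D", "ProD", "PosD", "Di"],
    "S415F": ["D", "C65", "Effect", "D", "ProD", "ProD", "Di"],
}


def consensus_deleterious(rows):
    """Return variants for which every tool reports its deleterious call."""
    shortlisted = []
    for variant, calls in rows.items():
        if all(call == DELETERIOUS_CALL[tool] for tool, call in zip(TOOLS, calls)):
            shortlisted.append(variant)
    return shortlisted


print(consensus_deleterious(ROWS))
```

A looser rule, for example accepting "PosD" as well as "ProD", would only require replacing the equality test with a membership test against a set of accepted calls per tool.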

Conservation profile of deleterious nsSNP in CHK2

Evolutionary information is used to predict whether an amino acid substitution affects protein function. The ConSurf web server was used to calculate the conservation score of each amino acid residue of the CHK2 protein, in order to further analyze the possible effect of the 7 most deleterious nsSNPs predicted by the different computational tools. Results were obtained in the form of a structural representation of the protein (S1 Fig). Highly conserved residues are predicted to be either functional or structural, depending on whether they are located on the protein surface or inside its core. The ConSurf results covered all residues of CHK2, showing their structural and functional conservation levels, but we focused only on the residues whose positions matched the 7 high-risk nsSNPs that we identified. nsSNPs located in conserved regions are considered far more damaging to the protein than those at non-conserved sites [45, 46]. According to the ConSurf output, p.Arg160Gly, p.Gly210Arg and p.Ser415Phe affect highly conserved residues with a conservation score of 9. Four amino acids were predicted to have average conservation. The ConSurf results are shown in Table 2. A summary of the deleterious predictions for each SNP is shown in Fig 2.
Table 2

ConSurf predictions of most deleterious nsSNP showing conservation profile and their post translation sites prediction by ModPred and their clinical significance in clinvar with their minor allelic frequency (MAF).

SNP ID | Residue and position | Conservation score | B/E | F/S | PTM | ClinVar | MAF
rs28909982 | R160G | 9 | E | F | Proteolytic cleavage | conflicting-interpretations-of-pathogenicity, likely-pathogenic | 0.0001320
rs137853007 | R188W | 6 | E | F | Proteolytic cleavage | not-provided, pathogenic, likely-benign, likely-pathogenic | 0.00003296
rs72552323 | I203T | 6 | B | - | - | Variant of uncertain-significance | 0.000008240
rs72552322 | G210R | 9 | E | - | - | uncertain-significance, conflicting-interpretations-of-pathogenicity, likely-pathogenic | 0.00004120
rs77130927 | R223C | 5 | E | - | ADP ribosylation | uncertain-significance, conflicting-interpretations-of-pathogenicity | 0.001360
rs372168051 | P225H | 5 | B | S | - | uncertain-significance | -
rs147877722 | S415F | 9 | E | - | - | uncertain-significance | -

B: Buried; E: exposed; F: functional; S: structural; PTM: post translation modification site; MAF: minor allele frequency

Fig 2

Graphical representation of the position of nsSNP in CHK2 gene and protein.

Post-translational modification sites within the human CHK2 protein were predicted using ModPred. Of the 7 most significant nsSNPs, three (p.Arg160Gly, p.Arg223Cys and p.Arg188Trp) were predicted to lie at post-translational modification sites, including proteolytic cleavage and ADP ribosylation sites. The ModPred results are shown in Table 2.

ExAC

The minor allele frequency (MAF) for the nsSNPs of the human CHK2 gene was retrieved from ExAC Browser Beta (http://exac.broadinstitute.org/gene/ENSG00000183765). The MAF values are shown in Table 2.

According to InterPro and the NCBI Conserved Domain Search tool, two major domains were predicted in the CHK2 protein: the STKc_Chk2 domain (serine/threonine kinase, cell cycle checkpoint kinase 2), comprising amino acids 256–529, and the FHA domain (forkhead-associated domain), comprising amino acids 156–244. Within the catalytic domain, residues in the range 269–411 were predicted to form the ATP binding site, residues in the range 264–471 the dimer interface, residues in the range 269–434 the active site and residues in the range 273–434 the polypeptide substrate binding site. The activation loop of the STKc_Chk2 domain was predicted to comprise 22 amino acid residues (Thr, Asp, Phe, Gly, His, Ser at 415, Lys, Ile, Leu, Gly, Glu, Thr, Ser, Leu, Met, Arg, Thr, Leu, Cys, Gly, Thr, Pro, Thr). The polypeptide binding site on the conserved FHA domain was predicted within residues 160 to 210 and includes Arg at position 160 and Gly at position 210.
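
Locating a variant in a domain is a simple interval lookup. The sketch below uses the two domain ranges reported above and the seven high-risk variants from Table 2; the function names are hypothetical, and variants are assumed to be written in one-letter form such as R160G.

```python
import re
from typing import Optional, Tuple

# Domain boundaries as reported in the text (inclusive residue ranges).
DOMAINS = {
    "FHA": (156, 244),
    "STKc_Chk2": (256, 529),
}

HIGH_RISK_VARIANTS = ["R160G", "R188W", "I203T", "G210R", "R223C", "P225H", "S415F"]


def parse_variant(variant: str) -> Tuple[str, int, str]:
    """Split a one-letter variant such as 'R160G' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognised variant: {variant!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def locate_domain(position: int) -> Optional[str]:
    """Return the name of the domain containing the position, or None."""
    for name, (start, end) in DOMAINS.items():
        if start <= position <= end:
            return name
    return None


for variant in HIGH_RISK_VARIANTS:
    _, position, _ = parse_variant(variant)
    print(variant, locate_domain(position))
```

With these ranges, six of the seven variants fall in the FHA domain and S415F falls in the kinase domain.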

Protein 3D modeling and structural analysis

The 3D structure of the full-length CHK2 protein was not available in the Protein Data Bank. The FASTA amino acid sequence was therefore submitted to SPARKS-X, which generated the 10 best full-length models using different similar templates. The quality of the full-length models was assessed by Z-score (>6). All the templates were subjected to BLASTp analysis to determine their sequence similarity to the CHK2 protein. The 3D structure generated using 3i6wA as a template was used for further analysis. This structure was refined with the ModRefiner server, giving an RMSD of 2.821 and a TM-score of 0.9685 relative to the initial model. The refined structure was then validated using Verify3D and RAMPAGE. Ramachandran plot analysis of the native protein model by RAMPAGE showed 541 (92.6%) residues in the favoured region, 38 (6.5%) residues in the allowed region and 5 (0.9%) residues in the outlier region. Verify3D showed that 66.21% of the amino acids scored ≥0.2 in the 3D-1D profile. The results of both tools are shown in Table 3.
Table 3

Validation of protein structure.

Verify3D | RAMPAGE | RAMPAGE | RAMPAGE
Percentage of amino acids scoring ≥0.2 in the 3D-1D profile | Favoured region | Allowed region | Outlier region
66.21% | 541 (92.6%) | 38 (6.5%) | 5 (0.9%)
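
The RAMPAGE percentages in Table 3 follow directly from the residue counts, which can be checked in a few lines. The variable names are hypothetical; the only inputs are the three counts reported for the native model.

```python
# Residue counts reported by RAMPAGE for the native model (Table 3).
counts = {"favoured": 541, "allowed": 38, "outlier": 5}

total = sum(counts.values())  # residues evaluated in the Ramachandran plot
percent = {region: round(100 * n / total, 1) for region, n in counts.items()}

print(total)
print(percent)
```

The three counts sum to 584 residues, and the rounded shares match the 92.6%, 6.5% and 0.9% given in the table.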
The FTSite server predicted 3 binding sites in the CHK2 protein. The first binding site consisted of residues Lys292, Leu320, Ile329, Ile342, Ile331, Leu344, Thr410, Asp411, Phe412, Gly413, His414 and Ser415. The second binding site consisted of residues Trp93, Tyr199, Ile200, Ala201, Pro225, Leu226, Asn227 and Asp246. The third binding site consisted of residues Trp93, Asn197, Ser198, Tyr199, Asp246, Thr248 and Val249. Two binding sites are presented using PyMOL in Fig 3. The COACH server also predicted Ser415 within ligand binding sites. The detailed results of the COACH prediction are shown in Table 4.
Fig 3

Ft site prediction showing Ser at 415 and Pro at 225 positions in 1st and 2nd ligand binding site respectively.

A) Pink, green and purple coloured mesh are 1st, 2nd and 3rd ligand binding site respectively of human CHK2 protein predicted using FT site server B) Zoom in on interaction at Pro 225 C) Zoom in on at Ser 415.

Table 4

Prediction of ligand binding sites within CHK2 protein using COACH.

COACH Result
C-ScoreCluster SizeName of ligandsResidue number
0.882636MP6269,270,271,277,290,292,329,344,345,346,347,348,350,351,394,395,397,410,411
0.071482K5269,277,290,292,316,320,329,342,344,345,346,347,350,397,411,412
0.0373MG272,392,395,411
0.015807Q269,272,275,276,277,290,292,293,294,344,345,346,347,411
0.0152PEPTIDE271,273,351,353,357,361,390,392,393,394,415,430,431,432,462,466,467,468,471,472,473,479,480
0.01241RA270,271,272,277,290,292,316,320,329,342,344,345,346,410,411,412,416
0.01MG312,315,413
0.01CA535,536,537
0.02CA282,333
0.011MG292,316,411,415
TM-site
C-Score | Cluster Size | Name of ligands | Residue number
0.58 | 308 | ANP, ADP, ATP | 269,270,271,277,290,292,329,344,345,346,347,348,350,351,394,395,397,410,411
0.29 | 11 | AMP, FMM, I76 | 269,277,290,292,316,320,329,342,344,345,346,347,350,397,411,412
0.20 | 30 | III | 271,272,273,274,351,353,357,392,393,394,415,431,432,434,462,467,468,471,472,473,479,480
0.19 | 9 | MG, ANP, B11 | 292,395,411
0.18 | 5 | AF3, MG, PO4 | 272,273,274,390,392,395,411
S-Site
C-Score | Cluster Size | Name of ligands | Residue number
0.41 | 488 | ANP, ADP, ATP | 267,269,270,271,272,273,274,275,277,290,292,316,329,344,345,346,347,348,350,351,354,394,395,397,410,411
0.19 | 66 | MG, MN, IMD | 273,274,292,351,390,392,394,395,410,411
0.14 | 27 | MG, 7PE, MN | 269,270,271,273,274,275,277,290,292,294,304,309,312,313,316,317,320,329,342,344,345,346,347,397,410,411,412,413,414,415,416,423
0.13 | 18 | PDY, IMD, AGX | 353,354,355,356,357,358,359,393,394,433,462,466,467,468,469,470,471,472
0.11 | 9 | III, TAR | 351,353,354,356,357,360,390,392,393,394,411,414,427,428,429,430,431,432,434,438,439,440,441,462,466,467,468,469,470,471,476,477,479,480,482,488,489,491,492,493,494
FINDSITE
C-Score | Cluster Size | Name of ligands | Residue number
0.82 | 115 | Site 1 | 269,270,272,274,275,277,290,292,294,329,344,346,347,351,395,397,410,411
0.04 | 6 | Site 2 | 269,270,272,273,274,275,277,290,292,305,312,329,344,346,347,351,353,357,390,392,393,394,397,410,411,415,427,428,429,430,431,432,433,434,462,468,471
0.04 | 5 | Site 3 | 325,376,379,380,524,535,536,539,540
0.04 | 5 | Site 4 | 273,302,303,305,351,353,354,357,390,392,393,394,415,427,428,429,430,431,432,433,434,462,467,468,471,473,476,479,480
0.01 | 2 | Site 5 | 428,430,434,438,439,476,480

3D structure prediction of mutant and model validation

The 3D structure of the mutant CHK2 protein was generated by substituting serine with phenylalanine at position 415 in the wild-type sequence and submitting the resulting sequence to the SPARKS-X server. The generated 3D structure was further refined using the ModRefiner server, which gave an RMSD of 2.310 and a TM-score of 0.9517. The TM-score indicated the structural deviation of the mutant protein from the native protein. The refined structure was then validated using Verify3D and RAMPAGE. Verify3D showed that 60.75% of the amino acids scored ≥0.2 in the 3D-1D profile. In RAMPAGE, the mutant model showed 93.3% of residues in the favoured region, 31 residues (5.3%) in the allowed region and 8 residues (1.4%) in the outlier region; with more than 90% of residues in the favoured region, the mutant model is of good quality.
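The sequence-editing step described above (replacing the wild-type residue before submission to a modelling server) can be sketched as follows. This is an illustrative sketch only, not the authors' script: the function names are hypothetical, and the input is a toy placeholder sequence with serine at position 415 rather than the real CHK2 sequence, which is not reproduced here.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_protein_variant(variant):
    """Parse a simple missense variant such as 'p.Ser415Phe'.

    Returns (reference residue, 1-based position, alternate residue)
    using one-letter codes. Only plain substitutions are supported.
    """
    match = re.fullmatch(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", variant)
    if match is None:
        raise ValueError(f"unsupported variant notation: {variant!r}")
    ref, pos, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid code in {variant!r}")
    return THREE_TO_ONE[ref], int(pos), THREE_TO_ONE[alt]


def apply_substitution(sequence, variant):
    """Return a copy of the sequence carrying the substitution.

    Raises ValueError if the position is out of range or the wild-type
    residue in the sequence does not match the variant description.
    """
    ref, pos, alt = parse_protein_variant(variant)
    if pos < 1 or pos > len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {pos}, found {sequence[pos - 1]}")
    return sequence[:pos - 1] + alt + sequence[pos:]


if __name__ == "__main__":
    # Toy placeholder: NOT the CHK2 sequence, only Ser at position 415.
    toy_wild_type = "A" * 414 + "S" + "A" * 20
    toy_mutant = apply_substitution(toy_wild_type, "p.Ser415Phe")
    print(toy_wild_type[412:417], "->", toy_mutant[412:417])
```

Checking the wild-type residue before substituting guards against off-by-one errors and against using a sequence from a different isoform, where residue numbering may differ.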

Discussion

The CHK2 gene is a tumor suppressor gene involved in cell-cycle regulation, DNA repair and apoptosis in response to DNA damage. Variants of CHK2 have been implicated in various types of cancer, including breast cancer [47]. Single nucleotide polymorphisms (SNPs) play an important role in most diseases. More than 4 million unique human SNPs have been described in dbSNP, and about 2% of the reported SNPs associated with monogenic diseases are present in protein-coding regions; such SNPs are therefore predicted to be related to complex inherited disease traits [48]. Testing the functional consequences of a variant with a functional assay is the best approach, but it is costly and time-consuming. We therefore used a computational approach, applying in silico tools based on different algorithms to analyse SNVs in the CHK2 gene. To date, 13929 human CHK2 gene SNPs have been reported in the NCBI dbSNP database, located in non-coding, coding and regulatory regions. Coding SNVs cause amino acid variation, which can alter protein function and lead to disease susceptibility. Not all nsSNPs have a major deleterious effect on protein function; some may be neutral. It is therefore necessary to differentiate deleterious SNPs from neutral SNPs in order to analyse the susceptibility of individual SNPs to disease, and to focus on those SNVs that are responsible for structural and functional consequences in the CHK2 protein [49]. However, predicting the pathogenic effect of an nsSNP using a single bioinformatic tool may not be reliable [50]. In the present study, prediction of CHK2 genetic variants was accomplished using sequence- and structure-based bioinformatics tools: SIFT, Align GVGD, SNAP2, PolyPhen-2, PROVEAN, PANTHER, PhD-SNP, MUpro and iPTREE-STAB.
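The idea of combining several predictors rather than trusting a single tool can be illustrated with a simple voting scheme. The sketch below is an assumption about how such a consensus might be computed, not the procedure used in this study: the threshold, the function names and the variant labels are hypothetical, and the deleterious/neutral calls in `EXAMPLE_CALLS` are invented placeholders, not results of any tool.

```python
def consensus_call(predictions, min_fraction=1.0):
    """Return True if enough tools call a variant deleterious.

    predictions: dict mapping tool name -> bool (True = deleterious).
    min_fraction: fraction of tools that must agree; 1.0 means unanimous.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    deleterious_votes = sum(1 for call in predictions.values() if call)
    return deleterious_votes / len(predictions) >= min_fraction


def shortlist(calls_by_variant, min_fraction=1.0):
    """Return the variants that pass the consensus threshold, in input order."""
    return [variant for variant, predictions in calls_by_variant.items()
            if consensus_call(predictions, min_fraction)]


# Invented placeholder calls for illustration only (not study results).
EXAMPLE_CALLS = {
    "variant_A": {"SIFT": True, "PolyPhen-2": True, "PROVEAN": True},
    "variant_B": {"SIFT": True, "PolyPhen-2": True, "PROVEAN": False},
    "variant_C": {"SIFT": False, "PolyPhen-2": False, "PROVEAN": False},
}

if __name__ == "__main__":
    print("Unanimous:", shortlist(EXAMPLE_CALLS))
    print("Majority: ", shortlist(EXAMPLE_CALLS, min_fraction=0.5))
```

A unanimous threshold gives a short, conservative candidate list, while a lower threshold trades specificity for sensitivity; which is appropriate depends on how the shortlist will be followed up.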
According to the studies of Hicks et al. and Thusberg and Vihinen, SIFT and PolyPhen-2 were reported as the best-performing tools for identifying the most deleterious nsSNPs [51, 52]. MUpro and iPTREE-STAB were used to check protein stability. Out of 79 nsSNPs subjected to functional analysis, seven (p.Arg160Gly, p.Arg188Trp, p.Ile203Thr, p.Gly210Arg, p.Arg223Cys, p.Pro225His and p.Ser415Phe) were predicted to be the most deleterious nsSNPs in the human CHK2 protein. To the best of our knowledge, no study has shown a genetic risk of p.Arg160Gly, p.Arg188Trp, p.Ile203Thr, p.Gly210Arg, p.Arg223Cys, p.Pro225His or p.Ser415Phe with any known disease condition. p.Arg160Gly, p.Arg188Trp, p.Ile203Thr, p.Gly210Arg, p.Arg223Cys and p.Pro225His are all part of the FHA domain, which is activated in response to DNA damage. p.Arg160Gly marks the substitution of arginine (a basic amino acid) by glycine (a non-polar amino acid), and p.Gly210Arg the reverse substitution. p.Ile203Thr entails the substitution of isoleucine (non-polar) by threonine (an -OH-containing amino acid), leading to a decrease in protein stability. Also in the FHA domain, two SNPs (p.Arg223Cys and p.Arg188Trp) substitute arginine (a basic amino acid) with cysteine (a sulphur-containing amino acid) and tryptophan (a non-polar aromatic amino acid), respectively, which decreases protein stability. R160, G210 and S415 are highly conserved residues with a conservation score of 9. Four amino acids (R188, I203, R223 and P225) were predicted to be averagely conserved. p.Arg160Gly, p.Gly210Arg and p.Arg223Cys might interfere with post-translational modification of the CHK2 protein, as these residues were predicted by ModPred to be involved in post-translational modifications. Arg160 and Gly210 are present in the phosphopeptide binding site on the conserved FHA domain. Ser415 is present in the STKc_Chk2 domain. The p.Ser415Phe substitution replaces serine (an -OH-containing amino acid) with phenylalanine (a non-polar amino acid).
Ser415 is part of the dimer interface, the catalytic domain of the ATP-binding site and the active site of the STKc_Chk2 domain. Any change in this residue alters the stability of the protein, as predicted by MUpro and iPTREE-STAB. Two amino acids, S415 and P225, were predicted to be involved in ligand-binding site interactions, which suggests that p.Pro225His and p.Ser415Phe might interfere with these interactions. Several studies have investigated the role of CHK2 polymorphisms as genetic determinants of disease susceptibility. Several polymorphisms (p.Ile157Val, p.Asp252Gly, c.1100delC, p.Asp438Tyr and p.His371Tyr) have been reported for the CHK2 gene [53, 54]. Pritzlaff and colleagues assessed multi-gene panel testing in male breast cancer patients and identified pathogenic variants, i.e. c.591delA, p.Arg117Gly, p.Thr476Met, p.Ser428Phe, p.Ile157Thr, p.Gln29* and c.1100delC, in different populations [55]. CHK2 c.1100delC and p.Ile157Thr are the most studied variants in populations all over the world. The inherited variant CHK2 c.1100delC truncates the kinase domain of the CHK2 protein and is responsible for a two-fold increase in breast cancer risk in families of northern and north-western European ancestry [12, 13, 56]. According to the study of Delimitsou and colleagues, the p.Ile160Arg and p.Ile160Thr variants were characterized as damaging and the p.Asp203Gly variant as benign. All these variants were located within the kinase domain [57]. p.Ile160Arg was characterized as intermediate in the 2012 study by Roeb et al. [58].
Different CHK2 variants were categorized as damaging according to in silico tools and a yeast-based assay, i.e. p.Trp93Arg, p.Cys108Arg, p.Arg117Gly, p.Arg145Trp, p.Arg148Gly, p.Ile160Arg, p.Ile160Thr, p.Asp162Gly, p.Asn166Ser, p.Gly167Arg, p.Leu183Ser, p.Leu183Phe, p.Leu236Pro, p.Ile251Phe, p.Arg346Cys, p.Arg346His, p.Asp347Ala, p.Asn352Asp, p.Gly370Glu, p.Cys385Arg, p.Thr387Ser, p.Tyr390Ser, p.Ala392Pro, p.Ala392Val, p.Glu394Lys, p.Cys420Thr, p.Tyr424His, p.Arg474Cys and p.His483Arg [57]. Shaag and colleagues discovered two novel amino acid substitutions, p.Ser428Phe in the kinase domain and p.Pro85Leu in the N-terminal region [59]. The CHK2 sequence variant c.1100delC may contribute to Li-Fraumeni syndrome in Dutch families [60]. In Pakistan, two novel mutations, p.Gln20X and p.Glu85X, in exons 1 and 2 respectively, have been identified in breast cancer patients [61]. However, the results were contradictory among different studies. In the in silico deleteriousness prediction performed in the present study, p.Gln20His and p.Pro85Leu were not predicted to be highly deleterious. However, Pro85 and Gln20 were predicted to be conserved residues with a conservation score of 7. No study available to date has identified the p.Pro225His and p.Ser415Phe nsSNPs as damaging. The CHK2 variants found in this study have not been reported earlier, so they need to be validated to check their significance. The major limitation of this study is that it is an in silico study; the results therefore cannot be extrapolated to humans without validation by wet-lab studies. When predicting the pathogenicity of CHK2 variants, it is important to conduct functional assays in cell lines. In addition, analysing data from epidemiological and genetic studies, as well as segregation analysis, would provide more accurate classification.
As multiple CHK2 variants of unknown clinical significance emerge every day during genetic testing of patients with cancer, rapid variant assessment is of great importance. The in silico assay used herein therefore provides essential, fast and low-cost evaluation for the largest series of tested CHK2 variants to date, thus providing valuable information that can ultimately be implemented in clinical practice. The present study thus indicates that the computational approach provides an alternative way to select SNP targets by considering the role of SNPs in the functional attributes or molecular phenotype of the protein. These results may be helpful for further understanding of CHK2 SNPs in disease susceptibility through laboratory experiments.
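The physico-chemical class changes discussed above for the seven shortlisted substitutions can be tabulated programmatically. This is a minimal sketch under stated assumptions: the class labels for Arg, Gly, Ile, Thr, Cys, Trp, Ser and Phe follow the wording of the Discussion, whereas the labels for Pro (non-polar) and His (basic) are standard textbook classes added here because the text does not classify them. The names `RESIDUE_CLASS` and `class_change` are hypothetical.

```python
import re

# Residue classes as worded in the Discussion; Pro and His are added
# assumptions (standard textbook classes, not stated in the text).
RESIDUE_CLASS = {
    "Arg": "basic",
    "Gly": "non-polar",
    "Ile": "non-polar",
    "Thr": "hydroxyl-containing",
    "Cys": "sulphur-containing",
    "Trp": "non-polar aromatic",
    "Ser": "hydroxyl-containing",
    "Phe": "non-polar",
    "Pro": "non-polar",   # assumption, not from the text
    "His": "basic",       # assumption, not from the text
}

SHORTLISTED_VARIANTS = [
    "p.Arg160Gly", "p.Arg188Trp", "p.Ile203Thr", "p.Gly210Arg",
    "p.Arg223Cys", "p.Pro225His", "p.Ser415Phe",
]


def class_change(variant):
    """Return (wild-type class, mutant class) for a missense variant."""
    match = re.fullmatch(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", variant)
    if match is None:
        raise ValueError(f"unsupported variant notation: {variant!r}")
    ref, _position, alt = match.groups()
    return RESIDUE_CLASS[ref], RESIDUE_CLASS[alt]


if __name__ == "__main__":
    for variant in SHORTLISTED_VARIANTS:
        before, after = class_change(variant)
        print(f"{variant}: {before} -> {after}")
```

Under these labels every one of the seven substitutions changes the residue class, which is consistent with the Discussion's point that the replacements are non-conservative.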

Conclusion

The present study suggests that the structure and function of CHK2 can be disrupted by various nsSNPs. In the native protein of the CHK2 gene, out of 79 SNPs, seven major variants were found: p.Arg160Gly, p.Arg188Trp, p.Ile203Thr, p.Gly210Arg, p.Arg223Cys, p.Pro225His and p.Ser415Phe. Among the seven most significant SNPs, three were at highly conserved residues and four at averagely conserved residues, and three were predicted to be involved in post-translational modifications. The serine-to-phenylalanine variant at position 415 occurs in the activation loop of the protein-kinase domain of the CHK2 protein and is therefore of particular concern, as this is the functional domain of the protein. The p.Ser415Phe SNP might interfere with the interactions of CHK2 with its ligand. These nsSNPs can therefore be considered key candidates in causing diseases related to CHK2 malfunction, and may help in effective drug discovery and the development of precision medicines. Wet-lab experiments are needed to explore the effects of these polymorphisms on the structure and function of the protein.

Consurf prediction showing conservation profile of amino acids in CHK2 gene.

  57 in total

Review 1.  The FHA domain.

Authors:  Daniel Durocher; Stephen P Jackson
Journal:  FEBS Lett       Date:  2002-02-20       Impact factor: 4.124

2.  A CHEK2 genetic variant contributing to a substantial fraction of familial breast cancer.

Authors:  Pia Vahteristo; Jirina Bartkova; Hannaleena Eerola; Kirsi Syrjäkoski; Salla Ojala; Outi Kilpivaara; Anitta Tamminen; Juha Kononen; Kristiina Aittomäki; Päivi Heikkilä; Kaija Holli; Carl Blomqvist; Jiri Bartek; Olli-P Kallioniemi; Heli Nevanlinna
Journal:  Am J Hum Genet       Date:  2002-07-28       Impact factor: 11.025

3.  Structure validation by Calpha geometry: phi,psi and Cbeta deviation.

Authors:  Simon C Lovell; Ian W Davis; W Bryan Arendall; Paul I W de Bakker; J Michael Word; Michael G Prisant; Jane S Richardson; David C Richardson
Journal:  Proteins       Date:  2003-02-15

4.  DNA damage-induced activation of p53 by the checkpoint kinase Chk2.

Authors:  A Hirao; Y Y Kong; S Matsuoka; A Wakeham; J Ruland; H Yoshida; D Liu; S J Elledge; T W Mak
Journal:  Science       Date:  2000-03-10       Impact factor: 47.728

5.  Heterozygous germ line hCHK2 mutations in Li-Fraumeni syndrome.

Authors:  D W Bell; J M Varley; T E Szydlo; D H Kang; D C Wahrer; K E Shannon; M Lubratovich; S J Verselis; K J Isselbacher; J F Fraumeni; J M Birch; F P Li; J E Garber; D A Haber
Journal:  Science       Date:  1999-12-24       Impact factor: 47.728

6.  Understanding human disease mutations through the use of interspecific genetic variation.

Authors:  M P Miller; S Kumar
Journal:  Hum Mol Genet       Date:  2001-10-01       Impact factor: 6.150

Review 7.  CHK2 kinase--a busy messenger.

Authors:  J Bartek; J Falck; J Lukas
Journal:  Nat Rev Mol Cell Biol       Date:  2001-12       Impact factor: 94.444

8.  The ATM-Chk2-Cdc25A checkpoint pathway guards against radioresistant DNA synthesis.

Authors:  J Falck; N Mailand; R G Syljuåsen; J Bartek; J Lukas
Journal:  Nature       Date:  2001-04-12       Impact factor: 49.962

9.  Single nucleotide polymorphisms and recombination rate in humans.

Authors:  M W Nachman
Journal:  Trends Genet       Date:  2001-09       Impact factor: 11.639

10.  Human non-synonymous SNPs: server and survey.

Authors:  Vasily Ramensky; Peer Bork; Shamil Sunyaev
Journal:  Nucleic Acids Res       Date:  2002-09-01       Impact factor: 16.971
