Phenotype Prediction of Pathogenic Nonsynonymous Single Nucleotide Polymorphisms in WFS1.

Xuli Qian1, Luyang Qin1, Guangqian Xing2, Xin Cao1.   

Abstract

Wolfram syndrome (WS) is a rare, progressive, neurodegenerative disorder with an autosomal recessive pattern of inheritance. The gene for WS, Wolfram syndrome 1 (WFS1), is located on human chromosome 4p16.1 and encodes a transmembrane protein. To date, approximately 230 mutations in WFS1 have been confirmed, of which nonsynonymous single nucleotide polymorphisms (nsSNPs) are the most common form of genetic variation. Nonetheless, the relationship between genotype and phenotype for the other nsSNPs of the WFS1 gene is poorly understood. Here, we analysed 395 nsSNPs associated with the WFS1 gene using different computational methods and identified 20 nsSNPs as potentially pathogenic. Furthermore, to identify the amino acid distribution and significance of pathogenic nsSNPs in the WFS1 protein, its transmembrane domain was constructed with the TMHMM server, which suggested that mutations outside the TMhelix could have a greater effect on protein function. The predicted pathogenic mutations for the nsSNPs of the WFS1 gene provide a guide for screening pathogenic mutations.

Year:  2015        PMID: 26435059      PMCID: PMC4592972          DOI: 10.1038/srep14731

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Wolfram syndrome (WS) (MIM 222300), also known as DIDMOAD (diabetes insipidus, insulin-deficient diabetes mellitus, optic atrophy and deafness), is a rare neurodegenerative disorder of autosomal recessive inheritance. Of its symptoms, diabetes mellitus is the most common manifestation, with a median onset age of 6 years [1], and always presents before the age of 16 [2]. The prevalence of WS is approximately 1/700,000 individuals in the UK and 1/100,000 individuals in North America [3]. Since the first report of WS by Wolfram and Wagener in 1938 [4], progressively more cases have been observed. Many studies have investigated the genetic basis of this hereditary disease and have identified loss-of-function mutations in the WFS1 gene as the main cause of the syndrome [5]. WFS1, located on human chromosome 4p16.1, is composed of eight exons, of which only the first is noncoding; most mutations in WFS1 have been identified in exon 8, but some also occur in exons 3, 4, 5 and 6 [6,7,8]. WFS1 encodes the protein wolframin, which is abundantly expressed in pancreas, brain, heart and muscle and is thought to be a novel endoplasmic reticulum (ER) calcium channel or a regulator of channel activity [9,10]. Additionally, wolframin appears to be involved in membrane trafficking, protein processing [11], regulation of intracellular Ca2+ homeostasis [12] and β-cell dysfunction [13,14]. Mutations in the WFS1 gene may result in instability and a significantly reduced half-life of wolframin in the ER, which may in turn cause disease [15]. To date, approximately 230 mutations in WFS1 have been reported (https://lovd.euro-wabb.org/home.php?select_db=WFS1). Although nsSNPs are the most common form of genetic variation among these mutations, the relationship between genotype and phenotype for the other nsSNPs in the WFS1 gene is unclear.
Given the large number of nsSNPs in the WFS1 gene, exploring the functional effects of these SNPs experimentally is expensive and time-consuming. Predicting the phenotypic effects of nsSNPs with computational methods has become a well-established approach [16,17], and several studies have reported its effectiveness in identifying deleterious, disease-related mutations [18,19]. These methods predict pathogenic nsSNPs by identifying properties that damage protein structure and function. This study aims to facilitate investigation of the role of nsSNPs in WFS1 and to identify pathogenic nsSNPs associated with the WFS1 gene using several computational methods. Deleterious and damaging nsSNPs were predicted with SIFT and PolyPhen-2. PhD-SNP and MutPred, which are based on a support vector machine (SVM) and the SIFT algorithm, were used to detect disease-associated nsSNPs. In addition, to identify the amino acid distribution and significance of pathogenic nsSNPs in the WFS1 protein, we constructed the transmembrane domain with the TMHMM server v2.0.

Results

SNP dataset from databases

The nsSNPs were collected from NCBI dbSNP, HGMD, the Deafness Variation Database and the Locus Specific Database. NCBI dbSNP, which contained approximately 1,500 SNPs, was the primary source, and the other three databases were supplemental. After filtering, a total of 395 nsSNPs were identified.

NsSNP prediction results of WFS1

To identify deleterious mutations among the nsSNPs in the WFS1 gene, the SIFT and PolyPhen-2 servers were used to predict whether each mutation was deleterious or damaging. The SIFT server was used to calculate the tolerance index of all 395 collected nsSNPs by evolutionary conservation analysis, and a SIFT score of <0.05 was considered deleterious. We also submitted all 395 nsSNPs to the PolyPhen-2 structure-based analysis server to further analyse the effects of amino acid substitutions (AAS) on structure and function. Of the 395 nsSNPs in the WFS1 gene, 174 were predicted to be deleterious by SIFT; the remainder were tolerated, except for nonsense mutations, for which SIFT provided no score. Among the deleterious nsSNPs, 32 mutations (P7L, G154A, W314R, P346L, Y351C, S353C, R375C, E394V, E394K, S430L, S430W, Y528D, P533S, Y669H, Y669C, Y669S, A684V, A684T, A684G, C690R, C690G, G695V, G702S, G702D, R708C, N714T, G736R, G736D, G736S, G834S, L842F and P885L) were highly deleterious, with SIFT scores of 0.000. Among these highly deleterious nsSNPs, residues 394, 430, 669, 684, 690, 702 and 736 were mutated more often than other residues. In PolyPhen-2, 235 nsSNPs were predicted to be damaging to protein structure and function, of which 89 were highly damaging, with PolyPhen-2 scores of 1.000. After excluding all nonsense mutations, a total of 156 nsSNPs were predicted to be deleterious and damaging by both SIFT and PolyPhen-2 (Table 1). Of these 156 nsSNPs, 28 (P346L, Y351C, S353C, R375C, E394V, E394K, S430L, S430W, Y528D, P533S, Y669H, Y669C, Y669S, A684T, A684G, A684V, C690R, C690G, G695V, G702D, G702S, R708C, G736D, G736R, G736S, G834S, L842F and P885L) were predicted to be highly deleterious and damaging by both algorithms, with SIFT scores of 0.000 and PolyPhen-2 scores of 1.000 (Table 1).
Table 1

Deleterious and damaging nsSNPs of WFS1 prioritised using SIFT and PolyPhen-2 scores.

Amino Acid Change | Nucleotide Variation | SIFT Score | PolyPhen-2 Score | SNP ID*
R24H | G/A | 0.011 | 0.999 | rs71524364
T104I | C/T | 0.021 | 0.992 |
G107E | G/A | 0.004 | 1 | rs71530914
G107R | G/A | 0.003 | 1 | WFS1_00227
Y110N | T/A | 0.023 | 0.999 | CM050353
D118A | A/C | 0.004 | 0.999 | rs71524349
A126T | G/A | 0.007 | 1 | rs145639028
G154A | G/C | 0 | 0.996 | rs71530927
T156M | C/T | 0.002 | 1 |
D171N | G/A | 0.049 | 0.953 |
R177P | G/C | 0.010 | 1 | CM083208
A198V | C/T | 0.047 | 0.875 | rs142687752
E202G | A/G | 0.043 | 0.998 | WFS1_00230
D211N | G/A | 0.017 | 0.813 | rs138682654
R228H | G/A | 0.037 | 1 | rs150771247
E273K | G/A | 0.018 | 0.904 | rs142428158
P292S | C/T | 0.008 | 1 | CM992981
I296S | T/G | 0.003 | 0.688 | CM992982
W314R | T/A | 0 | 0.999 | WFS1_00229
L327I | C/A | 0.013 | 1 | rs71537678
F329I | T/A | 0.031 | 0.99 | rs188848517
P346L | C/T | 0 | 1 | CM073420
F350V | T/G | 0.045 | 0.999 |
Y351C | A/G | 0 | 1 | rs181988441
S353C | C/G | 0 | 1 | rs143547567
C360Y | G/A | 0.001 | 0.999 | rs147157374
T361I | C/T | 0.002 | 1 | WFS1_00075
R375C | C/T | 0 | 1 | rs200095753
R375H | G/A | 0.003 | 1 | rs142671083
T378N | C/A | 0.007 | 0.999 | WFS1_00097
D389E | T/G | 0.007 | 0.978 | rs201282601
E394K | G/A | 0 | 1 | rs373146435
E394V | A/T | 0 | 1 | rs146563951
L402P | T/C | 0.001 | 1 | CM112216
H407R | A/G | 0.010 | 0.684 | rs140407862
V412A | T/C | 0.021 | 0.981 | rs144951440
F417S | T/C | 0.002 | 0.95 | rs111570388
I427S | T/G | 0.005 | 0.903 | CM073419
S430L | C/T | 0 | 1 | WFS1_00218
S430W | C/G | 0 | 1 | WFS1_00194
L432V | C/G | 0.027 | 1 | rs35031397
F439C | T/G | 0.002 | 0.913 | rs141585847
S443I | G/T | 0.002 | 0.997 | CM015195
T455M | C/T | 0.027 | 1 | rs139361521
R456C | C/T | 0.010 | 0.689 | rs144452795
E462G | A/G | 0.016 | 0.99 | rs398123066
E462G | A/G | 0.016 | 0.99 |
C505Y | G/A | 0.001 | 0.998 | CM031397
L506R | T/G | 0.003 | 0.95 | CM043878
L511P | T/C | 0.001 | 0.949 |
Y513S | A/C | 0.036 | 0.98 |
R517H | G/A | 0.024 | 0.986 | rs150394063
R517P | G/C | 0.022 | 0.904 |
M518I | G/A | 0.013 | 0.978 | rs138232538
A519V | C/T | 0.047 | 1 | rs201557396
Y528D | T/G | 0 | 1 | CM087003
P533S | C/T | 0 | 1 | rs146132083
C537Y | G/A | 0.003 | 0.999 | rs199910987
L543R | T/G | 0.003 | 1 | CM031400
V545M | G/A | 0.038 | 0.992 | rs201993978
V546D | T/A | 0.004 | 0.999 | CM031401
R558C | C/T | 0.001 | 1 | rs199946797
R558H | G/A | 0.002 | 1 | CM031402
A575G | C/G | 0.018 | 0.528 | rs71524360
G576S | G/A | 0.031 | 0.882 | rs1805069
V582M | G/A | 0.009 | 0.916 | rs377677092
R587W | C/T | 0.005 | 0.999 | rs138968466
L594R | T/G | 0.001 | 0.999 | rs200288171
A602E | C/A | 0.011 | 0.74 | rs2230720
A602G | C/G | 0.001 | 0.74 |
P607L | C/T | 0.040 | 0.999 | rs373862003
P607R | C/G | 0.010 | 1 | CM033825
R611C | C/T | 0.008 | 0.999 | rs144993516
L637P | T/C | 0.002 | 1 | WFS1_00215
T641M | C/T | 0.018 | 0.985 | rs376626985
R653C | C/T | 0.007 | 1 | rs201064551
E655G | A/G | 0.006 | 0.999 | CM024439
E655K | G/A | 0.015 | 0.995 | CM108408
S662P | T/C | 0.004 | 1 | rs376341411
L664R | T/G | 0.001 | 1 | CM090453
T665I | C/T | 0.002 | 0.976 |
T665N | C/A | 0.005 | 0.544 | rs138258392
T665P | A/C | 0.004 | 0.544 | rs369656458
Y669C | A/G | 0 | 1 | CM983479
Y669H | T/C | 0 | 1 | CM072120
Y669S | A/C | 0 | 1 | CM090454
L672P | T/C | 0.026 | 0.998 | CM056420
G674E | G/A | 0.029 | 1 | CM020990
G674R | G/A | 0.024 | 1 | rs200672755
G674V | G/T | 0.013 | 1 | CM020991
R676C | C/T | 0.030 | 1 | rs201623184
W678L | G/T | 0.008 | 0.999 | CM073425
A684G | C/G | 0 | 1 |
A684T | G/A | 0 | 1 |
A684V | C/T | 0 | 1 | rs387906930
R685C | C/T | 0.003 | 1 | rs112967046
R685P | G/C | 0.023 | 0.999 | CM081852
R685P | G/C | 0.023 | 0.999 |
I688T | T/C | 0.002 | 0.999 |
C690G | T/G | 0 | 1 | CM087004
C690R | T/C | 0 | 1 | CM992988
G695V | G/T | 0 | 1 | rs28937891
T699M | C/T | 0.001 | 1 | rs28937894
W700C | G/T | 0.001 | 1 | CM992989
G702D | G/A | 0 | 1 | CM090455
G702S | G/A | 0 | 1 | rs71532862
R703C | C/T | 0.024 | 1 | rs201888856
K705N | G/C | 0.032 | 0.997 | CM032680
R708C | C/T | 0 | 1 | rs200099217
R708H | G/A | 0.003 | 1 | rs369062548
D713G | A/G | 0.012 | 0.999 | rs143280847
N714T | A/C | 0 | 0.998 | rs397517196
L723P | T/C | 0.001 | 1 |
P724L | C/T | 0.002 | 1 | rs28937890
P724S | C/T | 0.043 | 1 |
R732C | C/T | 0.007 | 1 | rs71526458
R732H | G/A | 0.018 | 1 | rs149013740
G736D | G/A | 0 | 1 | rs71530912
G736R | G/C | 0 | 1 |
G736S | G/A | 0 | 1 | rs71532864
Y739D | T/G | 0.006 | 1 | rs367737581
C742R | T/C | 0.010 | 1 | rs71532865
C742W | C/G | 0.002 | 1 | rs71532866
R756C | C/T | 0.002 | 1 | rs138127684
A761V | C/T | 0.031 | 0.818 | rs71526459
H763P | A/C | 0.014 | 0.995 |
D771G | A/G | 0.011 | 1 | CM015267
D771H | G/C | 0.003 | 1 | CM052942
R772C | C/T | 0.005 | 1 | rs149540655
E776V | A/T | 0.001 | 1 | rs56002719
G780R | G/C | 0.046 | 0.989 | CM012813
G780S | G/A | 0.049 | 0.896 | rs387906931
R791C | C/T | 0.019 | 0.982 | rs200528166
K800E | A/G | 0.038 | 0.958 | rs55674815
L804P | T/C | 0.001 | 1 | WFS1_00226
S807R | A/C | 0.012 | 0.973 | CM020992
E809K | G/A | 0.042 | 0.999 | rs71539673
R818C | C/T | 0.014 | 1 | rs35932623
L829P | T/C | 0.001 | 1 | rs104893883
G831D | G/A | 0.012 | 1 | rs28937895
R832C | C/T | 0.010 | 1 | rs148089728
G834S | G/A | 0 | 1 | rs398124214
L842F | C/T | 0 | 1 | rs71530915
A844T | G/A | 0.047 | 0.973 | CM053436
A844V | C/T | 0.036 | 0.999 | rs200192011
R859P | G/C | 0.004 | 1 | CM052943
R859W | C/T | 0.001 | 1 | rs372298367
H860D | C/G | 0.007 | 0.96 | CM043881
I863M | C/G | 0.003 | 0.977 | rs71524393
E864K | G/A | 0.045 | 1 | rs74315205
R868C | C/T | 0.008 | 1 | rs148611943
R868H | G/A | 0.031 | 1 | rs56393026
A874T | G/A | 0.006 | 1 | rs200775335
K876T | A/C | 0.006 | 0.98 | rs144900514
P885L | C/T | 0 | 1 | rs372855769
A889V | C/T | 0.024 | 0.855 | rs147934586

*In the SNP ID column, the nsSNPs with the prefix “rs” are from dbSNP, and those with the prefix “CM” and “WFS1_” are from HGMD and Locus Specific Database, respectively, and the remaining with no SNP ID are in the Deafness Variation Database. The nsSNPs highlighted in bold are predicted to be highly deleterious and damaging, with a SIFT score of 0, and PolyPhen-2 score of 1.

For further study, we used PhD-SNP and MutPred to investigate whether these 156 deleterious and damaging nsSNPs were associated with disease. PhD-SNP is optimised to classify disease-causing point mutations from a given dataset. MutPred is a web application that classifies an AAS as either disease-associated or neutral in humans and also predicts the molecular cause of a disease-associated or deleterious AAS. Of the 156 nsSNPs, 97 were predicted to be disease-associated by PhD-SNP and 91 by MutPred. Notably, some of the 28 mutations in Table 1 with SIFT scores of 0.000 and PolyPhen-2 scores of 1.000, such as P346L, Y351C, G834S and L842F, were not predicted to be disease-associated by both PhD-SNP and MutPred. This might be because these amino acid positions are conserved, yet substitutions at them do not cause the molecular changes or affect the overall protein structure. In total, 70 nsSNPs were predicted to be disease-associated by both PhD-SNP and MutPred, of which 16, 33 and 21 were classified as very confident, confident and actionable hypotheses, respectively. The most common molecular changes predicted by MutPred in the mutants were gains or losses of helices and sheets. Representative disease-associated nsSNPs and the corresponding AAS of nsSNPs in the WFS1 gene are provided in Table 2. On checking these mutations against their reference sources, we found that most of the predicted nsSNPs had already been reported, which supports the credibility of predictions made with multiple computational methods. Finally, we predicted 20 mutations (F329I, S353C, R375H, R375C, E394K, F439C, R517P, L594R, P607L, S662P, T665I, R732C, R732H, G736D, Y739D, C742R, R832C, R859W, R868C and A874T) to be potentially pathogenic; the other 50 mutations had been previously published or cited (Table 2).
Table 2

Disease-associated nsSNPs of WFS1 predicted using the PhD-SNP and MutPred servers.

Amino Acid Change | g Value | p Value | Molecular Change | Prediction Reliability | SNP ID* | Reported or not
Y110N | 0.849 | 0.0133 | Gain of disorder | Confident Hypotheses | CM050353 | Y [41]
R177P | 0.817 | 0.0021 | Loss of MoRF binding | Very Confident Hypotheses | CM083208 | Y [42]
P292S | 0.942 | 0.0093 | Gain of helix | Very Confident Hypotheses | CM992981 | Y [20]
I296S | 0.867 | 0.0051 | Gain of loop | Very Confident Hypotheses | CM992982 | Y [20]
W314R | 0.884 | 0.0162 | Gain of methylation at W314 | Confident Hypotheses | WFS1_00229 | Y [43]
F329I | 0.774 | 0.0344 | Gain of sheet | Actionable Hypotheses | rs188848517 | N
S353C | 0.502 | 0.0266 | Gain of sheet | Actionable Hypotheses | rs143547567 | N
R375H | 0.670 | 0.0444 | Loss of helix | Actionable Hypotheses | rs142671083 | N
R375C | 0.669 | 0.0444 | Loss of helix | Actionable Hypotheses | rs200095753 | N
E394V | 0.811 | 0.0425 | Gain of helix | Confident Hypotheses | rs146563951 | Y [44]
E394K | 0.826 | 0.0176 | Gain of methylation at E394 | Confident Hypotheses | rs373146435 | N
L402P | 0.679 | 0.0215 | Gain of relative solvent accessibility | Actionable Hypotheses | CM112216 | Y [23]
I427S | 0.828 | 0.0082 | Gain of disorder | Very Confident Hypotheses | CM073419 | Y [45]
S430L | 0.793 | 0.0203 | Loss of loop | Confident Hypotheses | WFS1_00218 | Y [22]
S430W | 0.790 | 0.0266 | Gain of sheet | Confident Hypotheses | WFS1_00194 | Y [23]
F439C | 0.835 | 0.0357 | Loss of sheet | Confident Hypotheses | rs141585847 | N
S443I | 0.836 | 0.0221 | Gain of sheet | Confident Hypotheses | CM015195 | Y [21]
C505Y | 0.975 | 0.0062 | Loss of catalytic residue at P504 | Very Confident Hypotheses | CM031397 | Y [46]
L506R | 0.858 | 0.0196 | Loss of helix | Confident Hypotheses | CM043878 | Y [47]
L511P | 0.748 | 0.0016 | Gain of sheet | Actionable Hypotheses | | Y [25]
R517P | 0.534 | 0.0072 | Loss of helix | Actionable Hypotheses | | N
Y528D | 0.939 | 0.0037 | Loss of sheet | Very Confident Hypotheses | CM087003 | Y [48]
P533S | 0.886 | 0.0228 | Loss of sheet | Confident Hypotheses | rs146132083 | Y [44]
L543R | 0.768 | 0.0228 | Loss of sheet | Actionable Hypotheses | CM031400 | Y [46]
V546D | 0.828 | 0.0037 | Loss of sheet | Very Confident Hypotheses | CM031401 | Y [46]
R558C | 0.890 | 0.0296 | Loss of methylation at R558 | Confident Hypotheses | rs199946797 | Y [49]
R558H | 0.950 | 0.0296 | Loss of methylation at R558 | Confident Hypotheses | CM031402 | Y [46]
L594R | 0.688 | 0.0344 | Gain of sheet | Actionable Hypotheses | rs200288171 | N
P607L | 0.748 | 0.0022 | Gain of helix | Actionable Hypotheses | rs373862003 | N
P607R | 0.954 | 0.0005 | Gain of MoRF binding | Very Confident Hypotheses | CM033825 | Y [50]
L637P | 0.683 | 0.0072 | Loss of helix | Actionable Hypotheses | WFS1_00215 | Y [51]
E655G | 0.756 | 0.0187 | Loss of solvent accessibility | Actionable Hypotheses | CM024439 | Y [44]
E655K | 0.811 | 0.0049 | Gain of MoRF binding | Very Confident Hypotheses | CM108408 | Y [52]
S662P | 0.816 | 0.0312 | Gain of loop | Confident Hypotheses | rs376341411 | N
L664R | 0.926 | 0.0090 | Gain of MoRF binding | Very Confident Hypotheses | CM090453 | Y [53]
T665I | 0.821 | 0.0117 | Gain of helix | Confident Hypotheses | | N
L672P | 0.874 | 0.0076 | Loss of helix | Very Confident Hypotheses | CM056420 | Y [54]
G674R | 0.964 | 0.0328 | Gain of MoRF binding | Confident Hypotheses | rs200672755 | Y [55]
G674V | 0.958 | 0.0325 | Gain of helix | Confident Hypotheses | CM020991 | Y [56]
W678L | 0.933 | 0.0132 | Loss of catalytic residue at A677 | Confident Hypotheses | CM073425 | Y [57]
A684V | 0.755 | 0.0104 | Loss of helix | Actionable Hypotheses | rs387906930 | Y [21]
R685P | 0.859 | 0.0033 | Loss of helix | Very Confident Hypotheses | | Y [58]
C690R | 0.945 | 0.0008 | Gain of MoRF binding | Very Confident Hypotheses | CM992988 | Y [20]
C690G | 0.955 | 0.0115 | Gain of disorder | Confident Hypotheses | CM087004 | Y [48]
G695V | 0.911 | 0.0036 | Gain of sheet | Very Confident Hypotheses | rs28937891 | Y [6]
H696Y | 0.764 | 0.0390 | Gain of sheet | Actionable Hypotheses | WFS1_00098 | Y [59]
W700C | 0.942 | 0.0157 | Loss of MoRF binding | Confident Hypotheses | CM992989 | Y [20]
G702S | 0.887 | 0.0315 | Loss of sheet | Confident Hypotheses | rs71532862 | Y [23]
G702D | 0.96 | 0.0315 | Loss of sheet | Confident Hypotheses | CM090455 | Y [53]
R708C | 0.921 | 0.0182 | Loss of MoRF binding | Confident Hypotheses | rs200099217 | Y [21]
L723P | 0.731 | 0.0045 | Gain of loop | Actionable Hypotheses | | Y [23]
P724L | 0.926 | 0.0336 | Loss of catalytic residue at P724 | Confident Hypotheses | rs28937890 | Y [6]
R732H | 0.855 | 0.0444 | Loss of helix | Confident Hypotheses | rs149013740 | N
R732C | 0.848 | 0.0376 | Loss of helix | Confident Hypotheses | rs71526458 | N
G736D | 0.934 | 0.0425 | Gain of helix | Confident Hypotheses | rs71530912 | N
G736R | 0.965 | 0.0117 | Gain of helix | Confident Hypotheses | | Y [60]
Y739D | 0.736 | 0.0332 | Gain of disorder | Actionable Hypotheses | rs367737581 | N
C742R | 0.814 | 0.013 | Gain of disorder | Confident Hypotheses | rs71532865 | N
E776V | 0.939 | 0.050 | Gain of MoRF binding | Confident Hypotheses | rs56002719 | Y [47]
L804P | 0.768 | 0.0063 | Loss of sheet | Actionable Hypotheses | WFS1_00226 | Y [26]
L829P | 0.928 | 0.0079 | Gain of loop | Very Confident Hypotheses | rs104893883 | Y [61]
G831D | 0.923 | 0.0143 | Gain of helix | Confident Hypotheses | rs28937895 | Y [61]
R832C | 0.505 | 0.0228 | Loss of sheet | Actionable Hypotheses | rs148089728 | N
R859W | 0.596 | 0.0152 | Loss of disorder | Actionable Hypotheses | rs372298367 | N
R859P | 0.853 | 0.0315 | Loss of sheet | Confident Hypotheses | CM052943 | Y [27]
H860D | 0.769 | 0.0104 | Loss of sheet | Actionable Hypotheses | CM043881 | Y [47]
E864K | 0.901 | 0.0016 | Gain of MoRF binding | Very Confident Hypotheses | rs74315205 | Y [62]
R868C | 0.843 | 0.0179 | Loss of disorder | Confident Hypotheses | rs148611943 | N
A874T | 0.769 | 0.0061 | Gain of sheet | Actionable Hypotheses | rs200775335 | N
P885L | 0.953 | 0.0117 | Gain of helix | Confident Hypotheses | rs372855769 | Y [20]

*In the SNP ID column, the nsSNPs with the prefix “rs” are from dbSNP, and those with the prefix “CM” and “WFS1_” are from HGMD and Locus Specific Database, respectively, and the remaining with no SNP ID are in the Deafness Variation Database. The nsSNPs highlighted in bold are potential pathogenic nsSNPs which have not been reported.

Additionally, to better understand how the pathogenic nsSNPs affect protein conformation and result in disease states, we constructed wild-type and mutant proteins with the Robetta and SWISS-MODEL tools (Fig. 1, Supplementary files 1–4). The geometry of the modelled 3D structure was evaluated with PROCHECK by calculating the Ramachandran plot (Fig. 2). The wild-type protein had 99.4% of its residues in the most favoured and allowed regions, and the overall average G-factor was 0.27, indicating that the geometry of the structure was normal. In this step, we randomly selected three predicted nsSNPs (P292S, S443I and G695V) that have been reported to be pathogenic [6,20,21] and compared the wild-type and mutant protein structures. After mutation, the change was not confined to the substituted amino acid but affected the entire protein structure. All three mutant structures (P292S, S443I and G695V) gained or lost some α-helices, suggesting a potential molecular mechanism resulting in WS.
Figure 1

Protein structure predicted by the SWISS-MODEL server.

(A,B) indicate the changes between wild-type and mutant wolframin with the amino acid change P292S, (C,D) depict the structural changes between wild type and mutant S443I, and (E,F) illustrate the effects of G695V. (A,C,E) are protein structures of the wild-type wolframin, and (B,D,F) are structures of the mutant proteins (created by SWISS-MODEL and illustrated with VMD). The yellow arrows and red circles indicate the differences between the wild type and the mutant.

Figure 2

Ramachandran Plot of the wild type wolframin protein structure evaluated by PROCHECK.

Amino acid distribution in the transmembrane domain

To elucidate the amino acid distribution and significance of the predicted pathogenic nsSNPs in wolframin, we constructed its transmembrane domain using the TMHMM server v2.0 (Fig. 3). In this analysis, the transmembrane domain of wolframin was divided into 9 TMhelices, each approximately 23 amino acids long. Eighteen pathogenic mutations were distributed across seven of the TMhelices (none fell in the third or seventh), accounting for 25.71% of all 70 pathogenic mutations; 13 of these were previously known. Notably, most pathogenic mutations in our study were located not in the transmembrane domain but in the C-terminal domain of wolframin (Table 3). Of all 70 pathogenic mutations, 52 (74.29%) were not located in a TMhelix, and 39 of these were located in the C-terminal domain. Of the 52 mutations outside the TMhelices, 37 have been previously reported and 15 were newly predicted to be potentially pathogenic.
Figure 3

Transmembrane domain structure of wolframin and its distribution of mutations [22].

The 70 predicted pathogenic mutations are highlighted with green/red coloured circles compared to “normal” sequence with blue circles. The 50 known pathogenic mutations are depicted in green and the 20 predicted potentially pathogenic mutations are in red. The transmembrane domain is depicted in yellow. The circle with green and red denotes that the locus has a known and predicted mutation.

Table 3

NsSNP distributions of the transmembrane domain of wolframin from the TMHMM server.

Distribution of Transmembrane Domain | Range of Amino Acid | Number of Reported Pathogenic nsSNPs | Number of Predicted Pathogenic nsSNPs | Total Number of nsSNPs in Each Domain | Ratio of Each Domain (%)
Outside | 1–310 | 4 | 0 | 4 | 5.714
TMhelix 1 | 311–333 | 1 | 1 | 2 | 2.857
Inside | 334–339 | 0 | 0 | 0 | 0
TMhelix 2 | 340–362 | 0 | 1 | 1 | 1.429
Outside | 363–404 | 2 | 3 | 5 | 7.142
TMhelix 3 | 405–422 | 0 | 0 | 0 | 0
Inside | 423–428 | 1 | 0 | 1 | 1.429
TMhelix 4 | 429–451 | 3 | 1 | 4 | 5.714
Outside | 452–492 | 0 | 0 | 0 | 0
TMhelix 5 | 493–515 | 3 | 0 | 3 | 4.286
Inside | 516–526 | 0 | 1 | 1 | 1.429
TMhelix 6 | 527–549 | 4 | 0 | 4 | 5.714
Outside | 550–558 | 2 | 0 | 2 | 2.857
TMhelix 7 | 559–581 | 0 | 0 | 0 | 0
Inside | 582–587 | 0 | 0 | 0 | 0
TMhelix 8 | 588–610 | 1 | 2 | 3 | 4.286
Outside | 611–629 | 0 | 0 | 0 | 0
TMhelix 9 | 630–652 | 1 | 0 | 1 | 1.429
Inside* | 653–890 | 28 | 11 | 39 | 55.714
Total | 890-amino acids | 50 | 20 | 70 | 100

*The domain highlighted in bold is the distribution of the C terminal domain.
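The tally in Table 3 can be reproduced with a few lines of code. The following is a minimal sketch, not the procedure used in the study: it takes the TMHMM segment boundaries from Table 3 and the 20 predicted mutations listed in the Results, and counts how many fall in each segment. The function names (position_of, segment_of, tally) and the "name start-end" labels are illustrative assumptions.

```python
from collections import Counter

# Segment boundaries as listed in Table 3 (TMHMM server v2.0 output).
SEGMENTS = [
    ("Outside", 1, 310), ("TMhelix 1", 311, 333), ("Inside", 334, 339),
    ("TMhelix 2", 340, 362), ("Outside", 363, 404), ("TMhelix 3", 405, 422),
    ("Inside", 423, 428), ("TMhelix 4", 429, 451), ("Outside", 452, 492),
    ("TMhelix 5", 493, 515), ("Inside", 516, 526), ("TMhelix 6", 527, 549),
    ("Outside", 550, 558), ("TMhelix 7", 559, 581), ("Inside", 582, 587),
    ("TMhelix 8", 588, 610), ("Outside", 611, 629), ("TMhelix 9", 630, 652),
    ("Inside", 653, 890),  # C-terminal domain
]

# The 20 potentially pathogenic mutations named in the Results.
PREDICTED = [
    "F329I", "S353C", "R375H", "R375C", "E394K", "F439C", "R517P",
    "L594R", "P607L", "S662P", "T665I", "R732C", "R732H", "G736D",
    "Y739D", "C742R", "R832C", "R859W", "R868C", "A874T",
]


def position_of(change: str) -> int:
    """Return the residue number from a label such as 'F329I'."""
    return int(change[1:-1])


def segment_of(position: int) -> str:
    """Return a 'name start-end' label for the segment holding the residue."""
    for name, start, end in SEGMENTS:
        if start <= position <= end:
            return f"{name} {start}-{end}"
    raise ValueError(f"position {position} is outside the 890-residue protein")


def tally(changes):
    """Count mutations per segment."""
    return Counter(segment_of(position_of(c)) for c in changes)


counts = tally(PREDICTED)
in_helix = sum(n for label, n in counts.items() if label.startswith("TMhelix"))
outside_helix = len(PREDICTED) - in_helix
```

With these inputs, in_helix and outside_helix correspond to the 5 and 15 predicted mutations inside and outside the TMhelices in Table 3, and counts["Inside 653-890"] gives the C-terminal total for the predicted set.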

Discussion

WS is a rare autosomal recessive disorder associated with a number of loss-of-function mutations in WFS1 that differ both within and between most affected families. The wide tissue distribution of wolframin and the many WFS1 mutations that result in WS may contribute to the different phenotypes. Growing evidence from cohorts of WS patients points to many clinical signs and to possible correlations between genotype and the development of neurological manifestations and the age at onset of diabetes mellitus, hearing defects and diabetes insipidus [22,23]. Although a large number of WFS1 variants have been identified so far, novel mutations continue to be found in this gene. Furthermore, the pathogenic role of different mutations, polymorphisms and sequence variants of the gene remains largely unknown. Phenotypic prediction of the effects of nsSNPs might identify meaningful changes in genes that alter protein function and thereby induce phenotypic consequences. The large number of SNPs in online databases provides an abundant resource for predicting the phenotypic effects of nsSNPs, and known pathogenic mutations from the literature give us an opportunity to check prediction accuracy, that is, whether the prediction results agree with pathogenic mutations confirmed by in vivo and in vitro experiments. In the present study, we identified 20 potentially pathogenic mutations and 50 known pathogenic mutations using in silico methods. Combining the most common molecular changes predicted by MutPred with the three protein structures predicted by SWISS-MODEL, we infer that the most probable mutational effects causing WS might be gains or losses of α-helices. It is worth noting that some of the predicted pathogenic nsSNPs have been confirmed by in vitro functional studies and by genetic analysis of WS families, which indirectly supports the accuracy of our methods.
For example, p.P724L (c.2171C>T) and p.G695V (c.2084G>T) of WFS1 have been reported to lead to WS and to cause the formation of detergent-insoluble aggregates of wolframin when expressed in COS-7 cells [24]. When p.A684V (c.2051C>T) and p.L511P (c.1532T>C) were ectopically expressed in HEK293 cells, protein levels were reduced compared with wild-type wolframin, strongly indicating that these mutations are disease-causing [21,25]. Meanwhile, direct DNA sequencing and linkage analysis identified p.L804P (c.2411T>C) and p.R859P (c.2576G>C) after screening the entire coding region of the WFS1 gene in a Chinese WS family and in a US family with nonsyndromic hearing loss, respectively [26,27]. WFS1 spans approximately 33.4 kb of genomic DNA, consists of eight exons and produces an 890-amino-acid peptide product (wolframin). Our amino acid distribution results suggest that wolframin contains 9 transmembrane domains. This is consistent with previous research providing experimental evidence that wolframin contains 9 transmembrane segments and is embedded in the membrane in an Ncyt/Clum topology [15]. However, the prediction for wolframin available in the UniProt database gives 11 transmembrane domains (http://www.uniprot.org/uniprot/O76024) (Table 4), and the two predictions differ mainly in TMhelix 5, TMhelix 6 and TMhelix 11. In our result, amino acids 493–515 form TMhelix 5, whereas in UniProt this region is divided between TMhelix 5 and TMhelix 6; similarly, amino acids 653–890 are split into two segments in UniProt. Most studies, with some supporting evidence, consider wolframin to have 9 transmembrane domains; the discrepancy is attributable to differences between the prediction algorithms.
Additionally, our results indicate that mutations outside the TMhelices could have more pronounced functional effects, especially in the C-terminal domain, which holds 39 predicted mutations. Many of the reported missense mutations are located in the C-terminal hydrophilic part of the protein [15], and experiments also support these predictions. De Heredia et al. found that, besides the transmembrane domains, the mutations identified in WS patients also concentrate in the last 100 amino acids of the C-terminus [1]. Using yeast two-hybrid analysis, Zatyka et al. showed that the C-terminal domain of wolframin (amino acids 652–890), which is positioned in the ER lumen, binds the C-terminal domain of the ER-localised Na+/K+ ATPase beta-1 subunit (ATP1B1) [28]. Na+/K+ ATPase deficiency has a crucial role in apoptosis and in neurodegenerative disease, and can be induced by mutations in WFS1, leading to the development of WS [29].
Table 4

Transmembrane domain predictions for wolframin from the TMHMM server and the UniProt database.

TMHMM Domain | TMHMM Range of Amino Acid | UniProt Domain | UniProt Range of Amino Acid
Outside | 1–310 | Outside | 1–313
TMhelix-1 | 311–333 | TMhelix-1 | 314–334
Inside | 334–339 | Inside | 335–339
TMhelix-2 | 340–362 | TMhelix-2 | 340–360
Outside | 363–404 | Outside | 361–401
TMhelix-3 | 405–422 | TMhelix-3 | 402–422
Inside | 423–428 | Inside | 423–426
TMhelix-4 | 429–451 | TMhelix-4 | 427–447
Outside | 452–492 | Outside | 448–464
TMhelix-5 | 493–515 | TMhelix-5 | 465–485
 | | Inside | 486–495
 | | TMhelix-6 | 496–516
Inside | 516–526 | Outside | 517–528
TMhelix-6 | 527–549 | TMhelix-7 | 529–549
Outside | 550–558 | Inside | 550–562
TMhelix-7 | 559–581 | TMhelix-8 | 563–583
Inside | 582–587 | Outside | 584–588
TMhelix-8 | 588–610 | TMhelix-9 | 589–609
Outside | 611–629 | Inside | 610–631
TMhelix-9 | 630–652 | TMhelix-10 | 632–652
Inside* | 653–890 | Topological domain | 653–869
 | | TMhelix-11 | 870–890
Total | 890-amino acids | Total | 890-amino acids

*The domains highlighted in bold are the distributions of the C terminal domain.

In summary, we used extensive functional and structural level analyses to predict potentially pathogenic mutations for nsSNPs in the WFS1 gene and analysed the amino acid distributions of wolframin to provide a guide for screening pathogenic mutations and investigating the function of wolframin. Furthermore, we provide information for predicting the effects of nsSNPs in genes encoding transmembrane proteins and for further research in variant effect prediction.

Materials and Methods

Dataset collection

NsSNP datasets of the WFS1 gene were obtained from the NCBI dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/) [30], HGMD (http://www.hgmd.cf.ac.uk/ac) [31], the Deafness Variation Database (http://deafnessvariationdatabase.org) and the Locus Specific Database (https://lovd.euro-wabb.org/home.php?select_db=WFS1). The amino acid sequence of wolframin was retrieved from the UniProt database (http://www.uniprot.org/). Data for the WFS1 gene were collected from Entrez Gene on the NCBI web site (http://www.ncbi.nlm.nih.gov/genbank/), and the literature search was performed using PubMed, Science Direct, and Web of Science.

Filtering and mining of nsSNPs

Because not all SNPs retrieved from the databases were nsSNPs, some manual filtering was needed. In this process, we eliminated SNPs in the 3′ or 5′ UTRs and synonymous SNPs. For prediction and analysis, the SNP ID, gene name, protein accession, amino acid residue 1 (wild type), amino acid position and amino acid residue 2 (missense) of all nsSNPs were collected from the NCBI dbSNP database, HGMD and the Deafness Variation Database.
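The filtering described here was manual, but the same logic can be sketched in code. The example below is an illustration under stated assumptions, not the pipeline used in the study: each record is a plain dictionary with hypothetical "region" and "change" fields, and only single-letter missense labels such as R24H are recognised, so nonsense changes are dropped by this simplified pattern.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Substitution(NamedTuple):
    wild_type: str   # amino acid residue 1
    position: int    # amino acid position
    mutant: str      # amino acid residue 2


# One-letter codes of the 20 standard amino acids.
_AAS_PATTERN = re.compile(
    r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$"
)


def parse_aas(label: str) -> Optional[Substitution]:
    """Parse a label such as 'R24H'; return None if it is not a missense change."""
    match = _AAS_PATTERN.match(label.strip())
    if match is None:
        return None
    wild_type, position, mutant = match.groups()
    if wild_type == mutant:
        return None  # synonymous: the residue does not change
    return Substitution(wild_type, int(position), mutant)


def keep_nssnps(records: List[Dict[str, str]]) -> List[Substitution]:
    """Drop UTR and synonymous records; the field names are hypothetical."""
    kept = []
    for record in records:
        if record.get("region") != "coding":
            continue  # e.g. 3' or 5' UTR
        substitution = parse_aas(record.get("change", ""))
        if substitution is not None:
            kept.append(substitution)
    return kept


# Illustrative records only; the second and third are made up for the example.
example = keep_nssnps([
    {"region": "coding", "change": "R24H"},
    {"region": "coding", "change": "A100A"},
    {"region": "3'UTR", "change": ""},
])
```

Each Substitution carries the wild-type residue, position and mutant residue that the prediction servers take as input.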

Predicting the phenotype of nsSNPs with the SIFT and PolyPhen-2 tools

After filtering the nsSNPs, we predicted their functional effects with the SIFT (http://sift-dna.org) and PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/) tools. In SIFT, a substitution at a highly conserved position is more likely to be deleterious and receives a SIFT score <0.05, whereas a tolerated substitution has a SIFT score >0.05 [32,33]. For a given AAS and protein accession, PolyPhen-2 extracts various sequence- and structure-based features of the substitution site and feeds them to a probabilistic classifier. The mutation is appraised qualitatively as benign, possibly damaging or probably damaging [34].
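The selection of nsSNPs that are both deleterious and damaging can be sketched as follows. This is a minimal illustration rather than the authors' workflow: the SIFT cutoff of 0.05 comes from the text, whereas treating both "possibly damaging" and "probably damaging" as damaging is an assumption, and the Prediction class and the example variants are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

SIFT_CUTOFF = 0.05  # from the text: a score below 0.05 is deleterious

# Assumption: both non-benign PolyPhen-2 classes count as damaging.
DAMAGING_CLASSES = {"possibly damaging", "probably damaging"}


@dataclass(frozen=True)
class Prediction:
    change: str
    sift_score: Optional[float]  # None when SIFT gives no score (nonsense)
    polyphen_class: str


def is_deleterious(prediction: Prediction) -> bool:
    """SIFT criterion; variants without a score are never deleterious."""
    score = prediction.sift_score
    return score is not None and score < SIFT_CUTOFF


def is_damaging(prediction: Prediction) -> bool:
    """PolyPhen-2 criterion based on the qualitative class."""
    return prediction.polyphen_class.lower() in DAMAGING_CLASSES


def deleterious_and_damaging(predictions: List[Prediction]) -> List[str]:
    """Return the changes flagged by both tools."""
    return [
        p.change for p in predictions
        if is_deleterious(p) and is_damaging(p)
    ]


# Hypothetical inputs, not values from the study.
EXAMPLES = [
    Prediction("variant_1", 0.000, "probably damaging"),
    Prediction("variant_2", 0.030, "benign"),
    Prediction("variant_3", 0.200, "possibly damaging"),
    Prediction("variant_4", None, "probably damaging"),
]

selected = deleterious_and_damaging(EXAMPLES)
```

Only variant_1 passes both criteria here; variant_4 is excluded because it has no SIFT score, mirroring the exclusion of nonsense mutations described in the Results.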

Identifying disease-associated nsSNPs using the PhD-SNP and MutPred tools

PhD-SNP (http://snps.biofold.org/phd-snp) and MutPred (http://mutpred.mutdb.org/) are based on a support vector machine (SVM) and the SIFT algorithm. Briefly, for PhD-SNP, after the protein sequence, position and new residue are entered, the substitution from the wild-type residue to the mutant is encoded in a 20-element vector that is −1 at the position corresponding to the wild-type residue, 1 at the position corresponding to the mutant residue and 0 at the remaining 18 positions. Next, a second 20-element vector encoding the sequence environment is constructed to report the occurrence of each residue type in a window of 19 residues around the mutated residue. With this supervised learning approach, a given mutation is classified as disease-related or neutral [35,36]. MutPred is based on SIFT scores and on the gain or loss of 14 different structural and functional properties. Its output contains two important scores: a general score (g) and a top 5 property score (p). The general score (g) indicates the probability that the AAS is deleterious or disease-associated, whereas the top 5 property score (p) is the P-value indicating whether certain structural and functional properties are affected. Combinations of high general scores and low property scores are referred to as actionable hypotheses, confident hypotheses and very confident hypotheses [37].
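The two 20-element input vectors described for PhD-SNP can be illustrated with a short sketch. This is not the PhD-SNP implementation: the alphabetical ordering of amino acids, the function names, and the choice to centre the 19-residue window on the mutated position (and to count that residue itself) are assumptions made for the example.

```python
from typing import List

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_substitution(wild_type: str, mutant: str) -> List[int]:
    """Return a vector with -1 at the wild-type residue, 1 at the mutant, else 0."""
    if wild_type == mutant:
        raise ValueError("wild-type and mutant residues must differ")
    vector = [0] * len(AMINO_ACIDS)
    vector[AMINO_ACIDS.index(wild_type)] = -1
    vector[AMINO_ACIDS.index(mutant)] = 1
    return vector


def encode_environment(sequence: str, position: int, window: int = 19) -> List[int]:
    """Count each residue type in a window centred on a 1-based position.

    The window is truncated at the ends of the sequence.
    """
    half = window // 2
    centre = position - 1  # convert to a 0-based index
    start = max(0, centre - half)
    end = min(len(sequence), centre + half + 1)
    counts = [0] * len(AMINO_ACIDS)
    for residue in sequence[start:end]:
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # ignore non-standard symbols
            counts[index] += 1
    return counts


# Example: the substitution R -> C and a toy 20-residue sequence.
substitution_vector = encode_substitution("R", "C")
environment_vector = encode_environment("ACDEFGHIKLMNPQRSTVWY", 10)
```

Concatenating the two vectors gives a 40-element feature vector of the kind that could be passed to an SVM classifier.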

Protein structure prediction of pathogenic nsSNPs via Robetta and SWISS-MODEL tools

Because no experimental structure of wolframin is available and no suitable template exists for homology modelling, we chose the Robetta server (http://robetta.bakerlab.org/) to construct the protein structure. Robetta is a full-chain protein structure prediction server for ab initio and comparative modelling, and SWISS-MODEL (http://swissmodel.expasy.org/) is a fully automated, dedicated protein structure homology-modelling server [38,39]. The amino acid sequence of wolframin was retrieved from NCBI (accession number: NP_005996.2). The 3D structure of native wolframin was predicted with the Robetta server, and the mutant proteins were then modelled with SWISS-MODEL using the Robetta-derived structure as the template (Sup.file S). The quality of the modelled native and mutant structures was evaluated with PROCHECK (http://services.mbi.ucla.edu/SAVES/).
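Modelling a mutant protein first requires the mutant amino acid sequence. The sketch below shows one way such a sequence could be prepared in FASTA format before submission to a modelling server. It is an assumption about the preparatory step, not a description of the workflow actually used; the function names, the mutation notation (for example A4G), and the toy sequence are hypothetical.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a substitution such as 'A4G' (wild type, 1-based position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation format: {mutation}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + mutant + sequence[position:]


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i : i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQR"  # toy sequence, not wolframin
    mutant_sequence = apply_substitution(toy_sequence, "A4G")
    print(to_fasta("toy_A4G", mutant_sequence), end="")
```

Checking the wild-type residue against the reference sequence before substituting guards against position offsets between different sequence versions.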

Analysis of the transmembrane domain by the TMHMM server v2.0

TMHMM server v2.0 (http://www.cbs.dtu.dk/services/TMHMM/) is a membrane protein topology prediction method based on a hidden Markov model (HMM) whose architecture corresponds closely to the biological system. TMHMM v2.0, which is regarded as the best-performing transmembrane prediction program currently available, models and predicts the location and orientation of alpha helices in membrane-spanning proteins with high accuracy [40].
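Once the helix boundaries have been predicted, each nsSNP position can be assigned to a transmembrane helix or to a region outside the helices. The sketch below assumes a whitespace-separated, five-column report in the style of TMHMM's long output (sequence name, program, region type, start, end), with no whitespace in the sequence name. The report text, the coordinates, and the function names are hypothetical and do not represent the wolframin prediction.

```python
from typing import List, Tuple

# Hypothetical report in an assumed TMHMM-like long format
SAMPLE_REPORT = """\
# example_protein Length: 400
# example_protein Number of predicted TMHs:  2
example_protein TMHMM2.0 inside       1    99
example_protein TMHMM2.0 TMhelix    100   122
example_protein TMHMM2.0 outside    123   199
example_protein TMHMM2.0 TMhelix    200   222
example_protein TMHMM2.0 inside     223   400
"""


def parse_tmhmm_helices(report: str) -> List[Tuple[int, int]]:
    """Collect (start, end) pairs for every line whose region type is TMhelix."""
    helices = []
    for line in report.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) == 5 and fields[2] == "TMhelix":
            helices.append((int(fields[3]), int(fields[4])))
    return helices


def locate_residue(position: int, helices: List[Tuple[int, int]]) -> str:
    """Report which helix (1-based index) contains the position, if any."""
    for index, (start, end) in enumerate(helices, start=1):
        if start <= position <= end:
            return f"TMhelix {index}"
    return "outside TMhelix"


if __name__ == "__main__":
    helices = parse_tmhmm_helices(SAMPLE_REPORT)
    for position in (50, 110, 210):  # hypothetical mutation positions
        print(position, locate_residue(position, helices))
```

Tabulating the predicted pathogenic positions in this way makes it straightforward to compare how many fall inside and outside the helices.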

Additional Information

How to cite this article: Qian, X. et al. Phenotype Prediction of Pathogenic Nonsynonymous Single Nucleotide Polymorphisms in WFS1. Sci. Rep. 5, 14731; doi: 10.1038/srep14731 (2015).
  58 in total

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  Mutations in the Wolfram syndrome 1 gene (WFS1) are a common cause of low frequency sensorineural hearing loss.

Authors:  I N Bespalova; G Van Camp; S J Bom; D J Brown; K Cryns; A T DeWan; A E Erson; K Flothmann; H P Kunst; P Kurnool; T A Sivakumaran; C W Cremers; S M Leal; M Burmeister; M M Lesperance
Journal:  Hum Mol Genet       Date:  2001-10-15       Impact factor: 6.150

3.  Evaluation of methods for the prediction of membrane spanning regions.

Authors:  S Möller; M D Croning; R Apweiler
Journal:  Bioinformatics       Date:  2001-07       Impact factor: 6.937

4.  Mutation screening of the Wolfram syndrome gene in psychiatric patients.

Authors:  R Torres; E Leroy; X Hu; A Katrivanou; P Gourzis; A Papachatzopoulou; A Athanassiadou; S Beratis; D Collier; M H Polymeropoulos
Journal:  Mol Psychiatry       Date:  2001-01       Impact factor: 15.992

Review 5.  WFS1/wolframin mutations, Wolfram syndrome, and associated diseases.

Authors:  F Khanim; J Kirk; F Latif; T G Barrett
Journal:  Hum Mutat       Date:  2001-05       Impact factor: 4.878

6.  Identification of novel WFS1 mutations in Italian children with Wolfram syndrome.

Authors:  A Tessa; I Carbone; M C Matteoli; C Bruno; C Patrono; I P Patera; F De Luca; R Lorini; F M Santorelli
Journal:  Hum Mutat       Date:  2001-04       Impact factor: 4.878

7.  Molecular characterization of WFS1 in patients with Wolfram syndrome.

Authors:  Johannes M W van den Ouweland; Kim Cryns; Ronald J E Pennings; Inge Walraven; George M C Janssen; J Antonie Maassen; Bernard F E Veldhuijzen; Alexander B Arntzenius; Dick Lindhout; Cor W R J Cremers; Guy Van Camp; Lambert D Dikkeschei
Journal:  J Mol Diagn       Date:  2003-05       Impact factor: 5.568

8.  Identification of p.A684V missense mutation in the WFS1 gene as a frequent cause of autosomal dominant optic atrophy and hearing impairment.

Authors:  Nanna D Rendtorff; Marianne Lodahl; Houda Boulahbel; Ida R Johansen; Arti Pandya; Katherine O Welch; Virginia W Norris; Kathleen S Arnos; Maria Bitner-Glindzicz; Sarah B Emery; Marilyn B Mets; Toril Fagerheim; Kristina Eriksson; Lars Hansen; Helene Bruhn; Claes Möller; Sture Lindholm; Stefan Ensgaard; Marci M Lesperance; Lisbeth Tranebjaerg
Journal:  Am J Med Genet A       Date:  2011-04-28       Impact factor: 2.802

9.  Is there a relationship between Wolfram syndrome carrier status and suicide?

Authors:  Joanna Crawford; Marta A Zielinski; Laura J Fisher; Grant R Sutherland; Robert D Goldney
Journal:  Am J Med Genet       Date:  2002-04-08

10.  Mutations in the WFS1 gene that cause low-frequency sensorineural hearing loss are small non-inactivating mutations.

Authors:  Kim Cryns; Markus Pfister; Ronald J E Pennings; Steven J H Bom; Kris Flothmann; Goele Caethoven; Hannie Kremer; Isabelle Schatteman; Karen A Köln; Tímea Tóth; Susan Kupka; Nikolaus Blin; Peter Nürnberg; Holger Thiele; Paul H van de Heyning; William Reardon; Dafydd Stephens; Cor W R J Cremers; Richard J H Smith; Guy Van Camp
Journal:  Hum Genet       Date:  2002-04-09       Impact factor: 4.132
