Literature DB >> 28324312

Computational pipeline to identify and characterize functional mutations in ornithine transcarbamylase deficiency.

R Magesh1, C George Priya Doss2.   

Abstract

Ornithine transcarbamylase (OTC) (E.C. 2.1.3.3) is one of the enzymes in the urea cycle, which involves in a sequence of reactions in the liver cells. During protein assimilation in our body surplus nitrogen is made, this open nitrogen is altered into urea and expelled out of the body by kidneys, in this cycle OTC helps in the conversion of free toxic nitrogen into urea. Ornithine transcarbamylase deficiency (OTCD: OMIM#311250) is triggered by mutation in this OTC gene. To date more than 200 mutations have been noted. Mutation in OTC gene indicates alteration in enzyme production, which upsets the ability to carry out the chemical reaction. The computational analysis was initiated to identify the deleterious nsSNPs in OTC gene in causing OTCD using five different computational tools such as SIFT, PolyPhen 2, I-Mutant 3, SNPs&Go, and PhD-SNP. Studies on the molecular basis of OTC gene and OTCD have been done partially till date. Hence, in silico categorization of functional SNPs in OTC gene can provide valuable insight in near future in the diagnosis and treatment of OTCD.

Entities:  

Keywords:  I-Mutant 3; OTC; OTCD; PhD-SNP; PolyPhen 2; SIFT; SNPs&Go

Year:  2014        PMID: 28324312      PMCID: PMC4235886          DOI: 10.1007/s13205-014-0216-y

Source DB:  PubMed          Journal:  3 Biotech        ISSN: 2190-5738            Impact factor:   2.406


Introduction

Ornithine transcarbamylase (OTC) catalyzes the formation of citrulline from carbamoyl phosphate and l-ornithine in the urea cycle, deleterious mutations in the human OTC gene disrupts the formation and produces clinical hyperammonemia, which can also lead to encephalopathy with subsequent neurological symptoms or even death. Ornithine transcarbamylase deficiency (OTCD) is the most common inborn error of urea cycle showing X-linked inheritance, which occurs at a predictable frequency of 1 in 14,000 births. Affected individuals show elevated levels of ammonia in their plasma and amplified urinary flow of orotic acid (Lopes-Marques et al. 2012). Males with OTCD show neonatal ammonia intoxication with severe or fatal neurological damage. Those with limited enzymatic OTCD may perhaps have a normal life span, but are at the peak intended for stress-induced hyperammonemic emergencies and incremental neurological damage. Females are carriers who might be asymptomatic, but often show some amount of protein intolerance (Maddalena et al. 1988). The human OTC gene is found on the short arm of the X chromosome with its cytogenetic location being Xp21.1. The size of the gene is 73 kb with an open reading frame of 1,062 nucleotides and holds 10 exons interjected by 9 introns of highly variable size. The OTC gene is expressed entirely in the liver and small intestinal mucosa. It translates a precursor OTC protein containing 354 amino acids and the amino end contains a spearhead sequence of 32 amino acids, which is cleaved in two steps upon integration into the mitochondrial matrix (Ogino et al. 2007). A polymorphism is a germline variation in the nucleotide base of the DNA molecule. As a rule of thumb inheritable variation is termed, a polymorphism if it is present at an allele frequency greater than 1 % in the general population, otherwise, at lower frequencies, it is considered as germline mutation (Strachan and Read 1996). Genetic polymorphisms are present throughout the genome of human. The most common type of polymorphism is single nucleotide polymorphism (SNP) that can occur in the frequency of about 1 out of every 300 nucleotide base pairs, and there are probably more than 10 million SNPs in the human population (The international HapMap and Consortium 2006). Polymorphisms can occur in both coding and non-coding region of the genes and may sometimes, particularly those within exons, have an impact on the structure and function of the protein coded by a particular gene, especially in those cases when the polymorphism leads to an amino acid substitution in evolutionarily conserved functional region of the protein. A polymorphism that takes to an amino acid substitution and is present within an active site of an enzyme, at a substrate-binding site, a DNA-binding site or in other areas of the protein domains may affect the function of the encoded protein. This is particularly correct if the substituted amino acid has a different 3D structure or electrical charge than the wild-type amino acid, as this will alter the conformation or affinity of the enzyme, and make it non-functional, or more or less efficient than the wild-type protein (AliOsman et al. 1997; Hadi et al. 2000; Matullo et al. 2001; Pemble et al. 1994). The loss of stability of proteins is one of the foremost causes of disease. As the proteins are only marginally stable, even small effects on stability alter the thermodynamic equilibrium to make the folded state unstable. Mutational data show that mutations often, if not in the majority of cases, cause significant changes to protein stability which are often on the order of magnitude of the absolute stability of the protein (Guerois et al. 2002). Lowered stability leads to a reduction in a protein’s effective concentration, which in turn causes deficiencies in its ability to perform its biochemical function (Pakula et al. 1986). Mutations in this OTC gene are the main reason for OTCD. Deleterious non-synonymous single nucleotide polymorphism (nsSNP) analysis for the OTC gene has not been projected computationally until now, while they are the center for new investigators. Therefore, in this work, the computational methods namely SIFT, PolyPhen 2, I-Mutant 3, SNPs&Go, and PhD-SNP were used to identify the deleterious nsSNPs that are expected to be affecting the function and structure of the OTC protein.

Materials and methods

Dataset used for SNP annotation

Human OTC gene information data were collected from Online Mendelian Inheritance in Man (OMIM) (Amberger et al. 2009) and Entrez Gene on National Centre for Biological Information (NCBI). The SNP information of CBS was retrieved from the NCBI dbSNP (Sherry et al. 2001), and SWISS-Prot databases (Amos and Rolf 1996). Protein 3D structure was obtained from protein data bank (PDB) (Berman et al. 2000).

Sorting intolerant from tolerant (SIFT)

Sequence homology-based tool SIFT predicts the functional importance of amino acid substitution based on the alignment of highly similar orthologous and/or paralogous protein sequences. SIFT scores were designated as intolerant (0.00–0.05), potentially intolerant (0.051–0.10), borderline (0.101–0.20), or tolerant (0.201–1.00) (Kumar et al. 2009).

PolyPhen 2

PolyPhen2 (Polymorphism Phenotyping) predicts the functional effect of amino acid changes by considering evolutionary conservation, the physico-chemical differences, and the proximity of the substitution to predicted functional domains and/or structural features. A mutation is classified as “probably damaging” if the probabilistic score is above 0.85–1, mutation is classified as “possibly damaging” if the probabilistic score is above 0.15–0.84, and the remaining mutations are classified as benign (Adzhubei et al. 2010).

I-Mutant 3

SVM-based method I-Mutant 3 predicts the protein stability changes upon a single point mutation. It provides free energy change (DDG), which is calculated from the unfolding Gibbs free energy change of the mutated protein minus the unfolding free energy value of the native protein (Kcal/mol). It classifies the predictions in three classes: If DDG is <−0.5 = large decrease of stability, If DDG is between −0.5 and 0.5 = neutral stability and If DDG is >0.5 = large increase of stability (Capriotti et al. 2005).

SNPs&GO

It is a method based on SVMs that predict disease-associated mutations from protein sequence, evolutionary information and functions as encoded in the gene ontology terms. Moreover, it is a server for the predicting single point mutations, which cause disease in humans (Calabrese et al. 2009).

PhD-SNP

PhD-SNP uses SVM-Sequence method and SVM profile to classify the mutation into disease related and neutral polymorphisms. It predicts if the given nsSNP has pathological effect based on the local sequence environment of the mutation. It uses the most accurate mode that enables both sequence and evolutionary profiles (Capriotti et al. 2006).

Structural analysis

To evaluate the structural stability of native and mutant, protein structure analysis was performed. We used the web resource dbSNP to identify the protein coded by OTC. We also confirmed the mutation positions and the mutation residues from this server. These mutation residues and their corresponding positions were in complete agreement with the results obtained from the in silico prediction methods SIFT, PolyPhen 2, I-Mutant 3, SNPs&GO and PhD-SNP. The mutation was performed using SWISS-PDB viewer (Guex and Peitsch 1997), and energy minimization for 3D structures was performed by NOMAD-Ref server (Lindahl et al. 2006). This server uses Gromacs as default force field for energy minimization based on the methods of steepest descent, conjugate gradient and L-BFGS methods. Conjugate gradient method was used for optimizing the 3D structures. Deviation between the two structures was evaluated by their Root Mean Square Deviation (RMSD) values.

Results

A total of about 195 SNPs were collected and their deleterious natures were analyzed by various computational methods.

Analysis of deleterious SNPs using evolutionary-based prediction methods

SIFT algorithm calculates whether an amino acid replacement may have an impact on protein function by aligning similar proteins and calculating a score which tells the evolutionary conservation status of the amino acid of our interest. SIFT scores were obtained for 195 SNPs. SIFT scores were classified as intolerant (0.00–0.05), potentially intolerant (0.051–0.10), borderline (0.101–0.20), and tolerant (0.201–1.00). Approximately 115 (58.97 %) of the SNPs exhibit SIFT scores of 0.0. Another 45 (23 %) of the variants have scores between 0.01 and 0.05. Thus, 82 % of the SNPs are classified as ‘‘intolerant’’ by SIFT. The remaining SNPs were found to be “tolerant”. SIFT gave a prominent result with an 82 % of predictions to be deleterious.

Analysis of deleterious SNPs using structure-based prediction methods

The influences of nsSNPs in protein function were tested using structure-based predictors by applying it to three different methods. The structural levels of changes of 195 nsSNPs were determined by PolyPhen 2. To provide an outline of the distribution of PolyPhen 2 scores, the scores are distributed into three groups. PolyPhen 2 scores falling between 0.85 and 1 are expected to be ‘‘probably damaging’’ to protein structure and function. 157 (80 %) of the nsSNPs were found to have scores in the above-mentioned category. An additional 19 (9.7 %) of the variants exhibited PolyPhen 2 scores of 0.2–0.84, indicative of variants that are ‘‘possibly damaging’’ to protein function, and the remaining 17 (8.7 %) nsSNPs that scored less than 0.02 were designated as ‘‘benign’’. SNPs&GO makes use of sequence and evolutionary information to predict whether a mutation is disease related or not by developing the protein functional annotation. The protein sequences with corresponding UniProt accession numbers were submitted along with their corresponding mutational position, wild-type and mutant-type residue as input to the server. 98 % of the nsSNPs were designated as “disease”. These mutants are found to be disease causing. PhD-SNP predicts the given nsSNPs have pathological effects based on the local sequence environment of the mutation. It classifies the SNPs into disease or neutral based on the most accurate mode that uses both sequence and evolutionary profiles. It showed 64 % of nsSNPs were likely to cause disease on mutation.

Prediction of stability changes

Mutated proteins involved in diseases show a stability change. Predicting the protein stability upon mutation is necessary for understanding structure function relationship of protein. Generally, the stability of a protein is represented by the change in the Gibbs free energy upon folding (ΔG), where an increasingly negative number represents greater stability. Single amino acid substitution in a protein sequence can result in a significant change in the protein’s stability (ΔΔG), where a positive ΔΔG represents a destabilizing mutation and a negative value represents a stabilizing mutation. All the 195 nsSNPs submitted to pathogenic prediction tools were also subjected to protein stability analysis by I-Mutant 3.0. It gave an estimation of 107 nsSNPs (54 %) caused decreased stability, 48 SNPs (24 %) were neutral to the mutation, and 39 SNPs (20 %) increased the stability of protein after mutation. Out of 195 nsSNPs, 92 nsSNPs (47 %) were predicted to be positive by SIFT, PolyPhen 2, I-Mutant 3, SNPs&Go, and PhD-SNP (Table 1).
Table 1

List of nsSNPs in OTC found to be deleterious/neutral by computational methods

S. no.rs IDsVariantsSIFTPolyPhen 2I-Mutant 3SNPs&GOPhD-SNP
1rs72552295M1T0.010.895−0.28DiseaseNeutral
2rs72552296M1I0.030.4650.69DiseaseNeutral
3rs67752076M1V0.140.0640.61DiseaseNeutral
4rs137853257R10PNANA–1.12NANeutral
5rs148660170R23Q0.150.139–0.97DiseaseNeutral
6rs68031618R26Q0.590.002–0.84DiseaseNeutral
7rs199858968G28E0.060.999–0.82DiseaseNeutral
8rs72554306G39C0.011–2.15DiseaseNeutral
9rs72554307R40C011.16DiseaseDisease
10rs72554308R40H0.030.54–0.28DiseaseNeutral
11rs74518351D41G0.310.103–2.91DiseaseNeutral
12 rs72554309 L43F 0.01 1 –1.08 Disease Disease
13rs72554310T44I0.011–2.64DiseaseNeutral
14rs72554311L45V0.010.967–0.7DiseaseNeutral
15 rs72554312 L45P 0 1 –1.26 Disease Disease
16rs1800321K46R0.490.187–1.02DiseaseDisease
17rs67939655N47T0.010.07–0.66DiseaseDisease
18 rs72554315 F48S 0.01 1 –0.12 Disease Disease
19 rs72554316 T49P 0 0.922 –1.82 Disease Disease
20rs201802621G50A10.02–0.01DiseaseNeutral
21 rs72554317 E52G 0 1 –2.37 Disease Disease
22 rs72554318 E52D 0.01 1 –1.61 Disease Disease
23 rs66521141 E52K 0 1 –1.69 Disease Disease
24 rs66677059 I53T 0 1 –2.68 Disease Disease
25rs72554319Y55D0.230.919–0.42DiseaseNeutral
26rs72554320M56T00.197–0.34DiseaseNeutral
27 rs72554321 L57Q 0 1 –0.42 Disease Disease
28 rs72554323 S60L 0 1 –1.77 Disease Disease
29 rs72554324 L63P 0.01 1 –0.99 Disease Disease
30rs72554325I67R0.221–1.61DiseaseNeutral
31rs72554328L76S0.410.994–0.08DiseaseDisease
32rs72554329L77F00.9980.05DiseaseDisease
33 rs72554331 G79E 0 1 –1.51 Disease Disease
34 rs72554332 K80E 0.03 0.468 –1.6 Disease Disease
35rs72554333K80 N00.997–2.25DiseaseNeutral
36 rs72554336 G83R 0 0.923 –1.04 Disease Disease
37 rs72554337 G83D 0 0.824 –1.85 Disease Disease
38 rs72554338 E87K 0 0.96 –1.56 Disease Disease
39rs72554339K88N01–2.09DiseaseNeutral
40 rs72554340 S90G 0 1 –1.65 Disease Disease
41 rs72554341 S90N 0 1 –0.83 Disease Disease
42rs72554342S90R01–2.7DiseaseNeutral
43rs67418243R92G00.979–0.37DiseaseNeutral
44 rs66550389 R92Q 0 1 –0.55 Disease Disease
45 rs72554344 T93A 0 0.966 –1.39 Disease Disease
46rs72554345R94T01–0.6DiseaseNeutral
47rs72554346L95S00.283–2.74DiseaseDisease
48 rs184053962 S96R 0 1 –0.86 Disease Disease
49rs72554347E98K010.85DiseaseNeutral
50rs72554349G100D010.19DiseaseDisease
51rs1133135F101L0.30.2080.61DiseaseNeutral
52rs72554350A102E0.290.9891.09DiseaseNeutral
53rs72554351G105V010.14DiseaseDisease
54 rs72554352 G106R 0 1 –2.08 Disease Disease
55 rs67651903 G106E 0 1 –1.11 Disease Disease
56 rs1800324 L111P 0 1 –0.47 Disease Disease
57 rs66539573 H117R 0.02 0.993 –1.43 Disease Disease
58rs72554356T125M0.210.688–0.84DiseaseNeutral
59 rs72554358 D126G 0 1 –0.68 Disease Disease
60 rs140046498 R129C 0 0.9999 –1.87 Disease Disease
61 rs66656800 R129H 0.02 1 –2.46 Disease Disease
62 rs72556252 L131S 0 1 –2.09 Disease Disease
63rs72556253S132P010.95DiseaseDisease
64rs72556254S132F011.37DiseaseDisease
65rs72556256A135E0.090.1220.53DiseaseDisease
66rs72556257D136V00.9960.86DiseaseDisease
67rs72556258A137T0.040.7610.19DiseaseNeutral
68rs72556259L139S01–0.06DiseaseNeutral
69 rs72556260 A140P 0.04 1 –0.46 Disease Disease
70 rs68026851 R141Q 0 1 –0.27 Disease Disease
71 rs67960011 R141G 0 1 –2.51 Disease Disease
72 rs72556261 V142E 0 1 0.55 Disease Disease
73rs67016166L148S010.1DiseaseNeutral
74rs66741318L148F010.84DiseaseDisease
75rs72556265L151R01–0.21DiseaseNeutral
76rs72556266A152V0.020.7911.19DiseaseNeutral
77rs72556268A155E00.999–0.24DiseaseNeutral
78 rs67890094 A155P 0.01 0.999 –1.22 Disease Disease
79 rs72556269 I159T 0 0.99 –3.21 Disease Disease
80 rs67954347 I160N 0 0.996 –0.25 Disease Disease
81 rs72558497 N161K 0 1 –1.25 Disease Disease
82rs72556270N161D010.44DiseaseDisease
83rs72556271N161S0.030.9980.17DiseaseNeutral
84rs72556272G162E01–1.58DiseaseNeutral
85rs66626662G162R01–0.57DiseaseNeutral
86 rs72556273 S164P 0 1 –0.34 Disease Disease
87rs72556275D165Y01–0.73DiseaseNeutral
88 rs72556276 H168Q 0 1 –1.47 Disease Disease
89 rs72556277 P169A 0 1 –0.9 Disease Disease
90 rs72556278 P169L 0 1 –0.6 Disease Disease
91 rs72556279 I172F 0.01 1 –0.91 Disease Disease
92rs72556280I172M00.964–0.5DiseaseNeutral
93 rs72556281 A174P 0 1 –1.52 Disease Disease
94 rs68033093 D175G 0.01 1 –3.78 Disease Disease
95 rs72556282 Y176H 0 0.999 –1.39 Disease Disease
96 rs72556283 Y176C 0 1 –1.39 Disease Disease
97rs148961194L177F0.021–0.13DiseaseNeutral
98 rs72556284 T178M 0 1 –0.51 Disease Disease
99 rs72556286 L179P 0 1 –0.63 Disease Disease
100 rs72556290 E181G 0 0.793 –0.62 Disease Disease
101rs143746493H182Q0.141–1.93DiseaseNeutral
102rs72556291H182L0.271–1.06DiseaseNeutral
103 rs72556292 Y183D 0.04 1 –1.91 Disease Disease
104 rs72556294 G188R 0 1 –2.08 Disease Disease
105 rs72556295 G188V 0 1 –0.91 Disease Disease
106rs72556296L191F0.130.202–0.08DiseaseDisease
107 rs72556297 L191R 0 0.998 –0.03 Disease Disease
108 rs72556298 S192R 0 0.999 –0.57 Disease Disease
109rs67284661W193R01–0.91DiseaseNeutral
110 rs67294955 G195R 0 1 –1.97 Disease Disease
111rs72556300D196V011.37DiseaseDisease
112 rs66642398 D196N 0 1 –0.09 Disease Disease
113 rs72556301 G197R 0 1 –0.09 Disease Disease
114rs72556302G197E010.26DiseaseDisease
115rs72558403N198I010.26DiseaseDisease
116 rs72558404 N198K 0 1 –0.84 Disease Disease
117 rs72558405 N199D 0 1 –0.97 Disease Disease
118 rs72558406 N199S 0 1 –0.91 Disease Disease
119 rs72558407 L201P 0.01 1 –0.31 Disease Disease
120rs72558408H202Y0.031–2.74DiseaseNeutral
121 rs72558409 H202P 0 0.998 –2.84 Disease Disease
122 rs72558410 S203C 0 1 –3.27 Disease Disease
123rs72558411M205V0.030.904–0.64DiseaseNeutral
124rs72558412M206R0.030.998–2.5DiseaseNeutral
125rs72558413M206I0.130.069–0.95DiseaseNeutral
126rs72558414S207N0.060.805–0.31DiseaseDisease
127rs72558415S207R0.0110.42DiseaseNeutral
128rs72558416A208T0.081–0.05DiseaseNeutral
129rs72558417A209V0.0411.21DiseaseNeutral
130rs72558418K210Q0.030.8970.57DiseaseNeutral
131rs72558419M213L0.170.721–0.67DiseaseNeutral
132rs72558420H214Y0.020.980.4DiseaseNeutral
133rs72558421L215F0.140.994–0.41DiseaseDisease
134rs72558423Q216E0.970.663–1.03DiseaseDisease
135 rs72558424 A217E 0 0.994 –1.02 Disease Disease
136 rs72558425 P220A 0 1 –1.05 Disease Disease
137 rs72558426 P220L 0 1 –1.15 Disease Disease
138rs72558428P225T0.270.954–0.29DiseaseNeutral
139rs67120076P225R0.030.9750.12DiseaseDisease
140rs72558429A233V00.998–1.49DiseaseNeutral
141rs4385598Q235Q0.83NANANANA
142rs67283833E239G0.160.85–0.47DiseaseDisease
143rs72558435T242I0.051–0.23DiseaseDisease
144 rs72558436 L244Q 0 0.854 –0.11 Disease Disease
145 rs72558437 T247K 0.03 0.974 –0.55 Disease Disease
146 rs67330615 A253T 0 1 –2.18 Disease Disease
147rs72558440H255P0.020.017–0.9DiseaseDisease
148 rs142592280 N258S 0 0.688 –1.72 Disease Disease
149rs72558441L260S010.24DiseaseDisease
150 rs67333670 T262K 0 1 –0.62 Disease Disease
151rs72558442D263N010.33DiseaseNeutral
152 rs72558443 D263G 0 1 –1.23 Disease Disease
153 rs72558444 T264A 0.01 0.998 –2.53 Disease Disease
154 rs67156896 T264N 0.03 0.902 –1.76 Disease Disease
155 rs72558445 W265R 0 0.996 –0.87 Disease Disease
156 rs72558446 W265L 0 1 –0.26 Disease Disease
157 rs72558448 S267R 0 1 –0.51 Disease Disease
158 rs72558449 M268T 0 1 –1.98 Disease Disease
159 rs72558450 G269E 0 1 –1.43 Disease Disease
160 rs72558451 Q270E 0.01 0.923 –1.13 Disease Disease
161rs1800328Q270P00.284–2.14DiseaseDisease
162 VAR_004927 Q270R 0 0.977 –1.14 Disease Disease
163 rs72558454 R277W 0 1 –0.46 Disease Disease
164 rs66724222 R277Q 0 1 –0.35 Disease Disease
165rs72558461W298S0.050.987–0.96DiseaseNeutral
166rs72558462L301F0.010.9970.16DiseaseDisease
167rs72558463H302Y011.2DiseaseDisease
168rs67993095H302R01–0.1DiseaseNeutral
169rs67870244H302K010.21DiseaseNeutral
170rs72558464C303Y010.74DiseaseDisease
171rs67468335C303R010.28DiseaseDisease
172rs72558465L304F00.9991.14DiseaseDisease
173 rs67501347 P305H 0 1 –2.28 Disease Disease
174 rs72558467 E310G 0 1 –3.46 Disease Disease
175rs72558468V311M01–0.27DiseaseNeutral
176rs137899554E314A0.110.003–1.44DiseaseNeutral
177 rs72558470 V315F 0 0.999 –0.97 Disease Disease
178 rs67414444 V315D 0 1 –3.9 Disease Disease
179rs72558471F316S00.999–0.05DiseaseNeutral
180 rs72558472 S318F 0.04 0.555 –0.51 Disease Disease
181 rs72558474 R320L 0.01 0.948 –1.64 Disease Disease
182 rs72558476 E326K 0 1 –0.52 Disease Disease
183rs72558478R330G01–2.29DiseaseNeutral
184 rs72558480 W332R 0.03 1 –0.84 Disease Disease
185rs72558486A336S0.160.7910.41DiseaseDisease
186rs72558487V337L0.320.0071.53DiseaseNeutral
187rs199568993M338L0.560.0121.21DiseaseNeutral
188rs72558488V339L0.30.0011.1DiseaseNeutral
189rs72558489S340P0.090.9960.77DiseaseNeutral
190 rs72558490 L341P 0 1 –0.85 Disease Disease
191rs72558491T343K0.920.101–0.48DiseaseNeutral
192rs72558492Y345C0.010.999–1.92DiseaseNeutral
193rs66469337Y345H0.230.042–2.64DiseaseNeutral
194rs72558493P347T0.020.997–0.62DiseaseNeutral
195rs72558495F354C00.791–0.22DiseaseNeutral

rs IDs highlighted in bold were found to be deleterious by SIFT, PolyPhen 2, I-Mutant 3, SNPs&GO and PhD-SNP

List of nsSNPs in OTC found to be deleterious/neutral by computational methods rs IDs highlighted in bold were found to be deleterious by SIFT, PolyPhen 2, I-Mutant 3, SNPs&GO and PhD-SNP According to the computational prediction in OTC gene, structural analysis was performed for the five highly deleterious variants by modeling mutant structures using native X-ray crystallographic structure (PDB ID: 1OTH). An energy minimization study gives the information about the protein structure stability. We checked the total energy for native- and mutant-type structures. In OTC gene, mutation occurred for the native protein in ‘A’ chain of protein structure at position D126G, R141Q, A174P, T178M and G195R. It can be seen that the total energy value and RMSD of native-type and mutant-modeled structures (D126G, A174P, and G195R) were found to be higher (Table 2).
Table 2

Summary of deleterious nsSNPs in the coding region of OTC gene

RS IDSAmino acid positionSIFTPolyPhen 2I-Mutant 3SNPs&GOPHD-SNP
rs72554358D126G01–0.68DiseaseDisease
rs68026851R141Q01–0.27DiseaseDisease
rs72556281A174P01–1.52DiseaseDisease
rs72556284T178M01–0.51DiseaseDisease
rs67294955G195R01–1.97DiseaseDisease
Summary of deleterious nsSNPs in the coding region of OTC gene The mutations for 1OTH at their corresponding positions were performed by SWISS-PDB viewer independently to achieve modeled structures. Then, energy minimizations were performed by NOMAD-Ref server for the native-type protein 1OTH and the mutant-type structures. The RMSD values between the native type (1OTH) and the mutant D126G is 2.01 Å, between the native type and the mutant A174P is 2.82 Å, and between the native type and the mutant G195R is 2.82 Å, respectively. The deviation between the two structures is evaluated by their RMSD values, which could affect the stability and functional activity. The RMSD values of all the mutant structures were all alike. Higher the RMSD value more will be the deviation between native- and mutant-type structures and which in turn changes their functional activity. Superimposition of native with the mutant protein D126G, R141Q, A174P, T178M and G195R of OTC gene is shown in (Fig. 1a–e). The total energy for the native and mutant type structures were found to be −25480.939, −24899.660, −25068.101, −24881.020, −24969.936 and −24608.215 kcal/mol respectively (Table 3).
Fig. 1

Superimposition of native and mutant modeled structures (cartoon shape) of OTC protein. a Superimposed structure of native amino acid aspartic acid (green) with mutant amino acid glycine (red) at position 126. b Superimposed structure of native amino acid arginine (green) with mutant amino acid glutamine (red) at position 141. c Superimposed structure of native amino acid alanine (green) with mutant amino acid proline (red) at position 174. d Superimposed structure of native amino acid threonine (green) with mutant amino acid methionine (red) at position 178. e Superimposed structure of native amino acid glycine (green) with mutant amino acid arginine (red) at position 195

Table 3

RMSD and total energy of native and mutant model of OTC gene

Native and mutant structureRMSD (Å)Total energy (Kcal/mol)Stabilizing residue (Sride)
Native0.00–25,480.9395
D126G2.01–24,899.6603
R141Q1.84–25,068.1014
A174P2.82–24,881.0204
T178M1.94–24,969.9363
G195R2.82–24,608.2153
Superimposition of native and mutant modeled structures (cartoon shape) of OTC protein. a Superimposed structure of native amino acid aspartic acid (green) with mutant amino acid glycine (red) at position 126. b Superimposed structure of native amino acid arginine (green) with mutant amino acid glutamine (red) at position 141. c Superimposed structure of native amino acid alanine (green) with mutant amino acid proline (red) at position 174. d Superimposed structure of native amino acid threonine (green) with mutant amino acid methionine (red) at position 178. e Superimposed structure of native amino acid glycine (green) with mutant amino acid arginine (red) at position 195 RMSD and total energy of native and mutant model of OTC gene

Analysis of local environment changes

Within the range of 4 Å from the mutational point, surrounding amino acid changes were analyzed for native and mutant protein structures. It was observed through PyMOL (DeLano 2002). Figure 2 shows the substitution of hydrophilic residue aspartic acid to hydrophobic residue glycine at position 126, which leads to hydrophobic change at the core of the protein that could result in the destabilization of the gamma turns. The drift in hydrophilic to hydrophobic property can result in the gain of one amino acid LEU 131 in mutant structure.
Fig. 2

Surrounding amino acid changes in native OTC and mutant D126G structures

Surrounding amino acid changes in native OTC and mutant D126G structures Figure 3 illustrates the substitution of the hydrophilic residue arginine with another hydrophilic residue glutamine at position 141, which leads to structural modification at the core region of the protein due to the size of the substituted amino acid, and that could result in affecting the strand portion. The changes in the amino acid size results in loss of four amino acids ARG330, HIS268, LEU139, and THR93 in mutant R141Q structure. Substitution of hydrophobic residue alanine with another hydrophobic residue proline and changes in the surrounding amino acids are shown in Fig. 4. Since the size of the substituted amino acid has the same size of the native residue, these changes were not affected the surrounding amino acids in A174 P-mutant structure. Figure 5 shows the substitution of non-polar hydrophobic amino acid glycine with polar hydrophilic larger amino acid arginine at position 195 of OTC protein. Substitution of small amino acid glycine with large amino acid arginine leads to gain of seven SER267, THR264, ILE200, ASP263, TRP265, ASN198, and LEU252 amino acids in the surrounding region of mutant structure. This change may affect the gamma turn of the native protein. Substitution of polar hydrophobic amino acid threonine at position 178 with non-polar hydrophobic amino acid methionine is shown in Fig. 6. This substitution leads to gain of one amino acid in the mutant structure and this change may affect the helix region of the native OTC protein.
Fig. 3

Surrounding amino acid changes in native OTC and mutant R141Q structures

Fig. 4

Surrounding amino acid changes in native OTC and mutant A174P structures

Fig. 5

Surrounding amino acid changes in native OTC and mutant G195R structures

Fig. 6

Surrounding amino acid changes in native OTC and mutant T178M structures

Surrounding amino acid changes in native OTC and mutant R141Q structures Surrounding amino acid changes in native OTC and mutant A174P structures Surrounding amino acid changes in native OTC and mutant G195R structures Surrounding amino acid changes in native OTC and mutant T178M structures

Secondary structural changes analysis

The number of secondary structure elements such as Beta sheets, Beta–Alpha Beta, Strands, Helices, Helix–Helix Interacs, Beta Turns, and Gamma Turns was calculated for both the native and mutant models (Table 4). It has to note that the observed numbers of secondary structural elements are equal in both native and mutant models except the Helix–Helix Interacs and Beta Turns. There was a slight decrease in the number of beta turns in mutant models D126G, R141Q, A174P, T178M, and G195R as 15, 12, 15, 12, and 15, respectively. The number of beta turn was increased by one in three mutant models R141Q, A174P, and T178M. These secondary structural element changes lead to changes in the physiochemical properties of the mutant structure (Table 5) and it may affect the protein stability and conformation.
Table 4

Secondary structural elements in native and mutant structure of OTC gene

S. no.VariantSheetBeta-alpha-beta unitsStrandsHelicesHelix–Helix interacsBeta turnsGamma turns
1Native2591516233
2D126G25915 15 233
3R141Q25915 12 24 3
4A174P25915 15 24 3
5T178M25915 12 24 3
6G195R25915 15 233

Change in the secondary structure elements are highlighted in bold

Table 5

Changes in the physiochemical properties of native and mutant structure of OTC protein

VariantSizeChargePolarityHydrophobicityDisulfide bondGly or ProModification scoreAccessibilityFree energy changes
Wild typeMutantProtein stabilityReliability index
D126GDecreaseDecreaseDecreaseIncreaseUnchangedApparition2825.17 (intermediate)11.51 (intermediate)Increase4
R141QDecreaseDecreaseUnchangedIncreaseUnchangedUnchanged234.63 (buried)0.37 (buried)Decrease7
A174PIncreaseUnchangedUnchangedDecreaseUnchangedApparition150.00 (buried)0.00 (buried)Decrease2
T178MIncreaseUnchangedDecreaseIncreaseUnchangedUnchanged321.30 (buried)2.71 (buried)Decrease1
G195RIncreaseIncreaseIncreaseDecreaseUnchangedDisparition610.10 (buried)0.11 (buried)Decrease5
Secondary structural elements in native and mutant structure of OTC gene Change in the secondary structure elements are highlighted in bold Changes in the physiochemical properties of native and mutant structure of OTC protein

Discussion

Last decade has witnessed the accelerated expansion of information regarding the genomic variants especially SNPs in public databases as a result of improved second generation sequencing technologies. After polymorphism information has become abundant in public databases, many groups started to develop in silico tools that would computationally calculate the properties of these polymorphisms, particularly trying to extrapolate the effect of polymorphism that has on the phenotype. If dataset on the phenotypic impact is unknown (owing to the insufficiency of clinical data or experimental) or not specified, most of the tools set out to identify whether a polymorphism is detrimental or not. Anyhow, in order for the identification, to be accurate, information had to be accumulated on the features distinguishing neutral from deleterious polymorphisms; many tools and algorithms that support large-scale analyses of SNPs (In particular nsSNPs). Various computational methods have been developed for predicting the significant missense mutations based on sequence and structural methods. With respect to the information utilized by the prediction, existing methods can be roughly grouped into three categories: ‘sequence-based’, ‘structure-based’ and ‘sequence and structure-based’, respectively. Sequence-based methods can be subcategorized into sequence homology-based and single sequence-based methods. Sequence homology-based method methods in this category calculate the probability of the substitutions based on multiple sequence alignments (Ferrer-Costa et al. 2004; Shen and Vihinen 2004). Sequence homology-based tools are derived based on the premise that essential amino acids are conserved in the protein family. Hence, changes at well-conserved positions tend to be predicted as deleterious. This probabilistic method provides information about conserved sites in evolution that are often structurally or functionally important and distinguishes between missense mutations involved in disease and those that are functionally neutral. For sequence homology-based methods, the prediction accuracy depends heavily on the availability of enough homologs in protein databases. Saunders and Baker (2002) showed that the prediction accuracy decreased significantly when fewer than 5–10 homologous sequences are available. An ideal alignment should be composed of a diverse set of orthologous sequences rather than paralogs. Structure-based methods make predictions based on structural information, especially that of amino acid side-chain conformation, over packing and residue–residue contacts (Gonzalez Diaz et al. 2005). The substitution of a wild-type residue may lead to altered chemical and physical properties, thus causing structural arrangements. The third method category combines information on the sequence features, the structural parameters and contacts to characterize the substitution. The incorporation of structural data greatly improves the quality of the multiple sequence alignment and the accuracy of prediction. This is well illustrated by PolyPhen (Ramensky et al. 2002), a multiple sequence alignment server that aligns sequences using structural information. It may outperform the single sequence-based program SIFT (Ng and Henikoff 2003) in predicting the effect of amino acid mutations. In addition to PolyPhen, diverse Web-based programs are used to predict mutation effects based on homology and three-dimensional structural models, e.g., PROMALS3D (Pei et al. 2008), 3Dcoffee (O’Sullivan et al. 2004), Expresso (Armougom et al. 2006), CLUSTALW (Thompson et al. 1994), MUSCLE (Edgar 2004), PRALINE (Simossis and Heringa 2005), SPEM (Zhou and Zhou 2005). The user only needs to provide sequences, the server runs BLAST to identify close homologues of the sequences within the PDB database. Study of the molecular basis of diseases using experimental methods is often labor intensive, and time consuming, especially in cases where there are several missense mutations causing the disease. These studies are difficult to mount on a scale that may be required for characterizing the genetic variants and at times these results might not always reflect the in vivo genotype function in humans. In contrast, precise and useful information about the effects of mutations on protein structure and function can be readily obtained by in silico methods. Our study gains significance by predicting the possible deleterious SNPs in OTC gene, so that the number of SNPs screened for association with diseases can be reduced to those that are most likely to alter gene function. All the above methods defined here follow a similar technique in which each SNP is first labeled with the properties related to damage it may cause on protein structure and function. The resulting feature vector is then used to determine whether a single residue substitution has any effect on protein function or not. Considering SNPs based on the amino acid properties are generally reflected to be an important phenomenon in defining the protein folding, stability, and its function. The results from this paper signify the impact of mutations in OTC gene in causing OTCD. Further, studies possibly will help in uncluttered nature of OTCD. It is hoped that the results obtained from this study would pave the way by providing useful information to the researchers, and can play an important role in bridging the gap between biologists and bioinformaticians.
  34 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

3.  Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations.

Authors:  Raphael Guerois; Jens Erik Nielsen; Luis Serrano
Journal:  J Mol Biol       Date:  2002-07-05       Impact factor: 5.469

4.  MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors:  Robert C Edgar
Journal:  Nucleic Acids Res       Date:  2004-03-19       Impact factor: 16.971

5.  Sequence-based prediction of pathological mutations.

Authors:  C Ferrer-Costa; M Orozco; X de la Cruz
Journal:  Proteins       Date:  2004-12-01

6.  The SWISS-PROT protein sequence data bank and its new supplement TREMBL.

Authors:  A Bairoch; R Apweiler
Journal:  Nucleic Acids Res       Date:  1996-01-01       Impact factor: 16.971

7.  Human non-synonymous SNPs: server and survey.

Authors:  Vasily Ramensky; Peer Bork; Shamil Sunyaev
Journal:  Nucleic Acids Res       Date:  2002-09-01       Impact factor: 16.971

8.  Human glutathione S-transferase theta (GSTT1): cDNA cloning and the characterization of a genetic polymorphism.

Authors:  S Pemble; K R Schroeder; S R Spencer; D J Meyer; E Hallier; H M Bolt; B Ketterer; J B Taylor
Journal:  Biochem J       Date:  1994-05-15       Impact factor: 3.857

9.  A method and server for predicting damaging missense mutations.

Authors:  Ivan A Adzhubei; Steffen Schmidt; Leonid Peshkin; Vasily E Ramensky; Anna Gerasimova; Peer Bork; Alexey S Kondrashov; Shamil R Sunyaev
Journal:  Nat Methods       Date:  2010-04       Impact factor: 28.547

10.  NOMAD-Ref: visualization, deformation and refinement of macromolecular structures based on all-atom normal mode analysis.

Authors:  Erik Lindahl; Cyril Azuara; Patrice Koehl; Marc Delarue
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

View more
  2 in total

1.  Mutation Study of Malaysian Patients with Ornithine Transcarbamylase Deficiency: Clinical, Molecular, and Bioinformatics Analyses of Two Novel Missense Mutations of the OTC Gene.

Authors:  Ernie Zuraida Ali; Yuslina Zakaria; Mohd Amran Mohd Radzi; Lock Hock Ngu; Siti Azma Jusoh
Journal:  Biomed Res Int       Date:  2018-08-05       Impact factor: 3.411

2.  Common polymorphic OTC variants can act as genetic modifiers of enzymatic activity.

Authors:  Mónica Lopes-Marques; Ana Rita Pacheco; Maria João Peixoto; Ana Rita Cardoso; Catarina Serrano; António Amorim; Maria João Prata; David N Cooper; Luísa Azevedo
Journal:  Hum Mutat       Date:  2021-06-03       Impact factor: 4.878

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.