Literature DB >> 32867672

Predicting the most deleterious missense nsSNPs of the protein isoforms of the human HLA-G gene and in silico evaluation of their structural and functional consequences.

Elaheh Emadi1, Fatemeh Akhoundi2, Seyed Mehdi Kalantar1, Modjtaba Emadi-Baygi3,4.   

Abstract

BACKGROUND: The Human Leukocyte Antigen G (HLA-G) protein is an immune tolerogenic molecule with 7 isoforms. The change of expression level and some polymorphisms of the HLA-G gene are involved in various pathologies. Therefore, this study aimed to predict the most deleterious missense non-synonymous single nucleotide polymorphisms (nsSNPs) in HLA-G isoforms via in silico analyses and to examine structural and functional effects of the predicted nsSNPs on HLA-G isoforms.
RESULTS: Out of 301 reported SNPs in dbSNP, 35 missense SNPs in isoform 1, 35 missense SNPs in isoform 5, 8 missense SNPs in all membrane-bound HLA-G isoforms and 8 missense SNPs in all soluble HLA-G isoforms were predicted as deleterious by all eight servers (SIFT, PROVEAN, PolyPhen-2, I-Mutant 3.0, SNPs&GO, PhD-SNP, SNAP2, and MUpro). The Structural and functional effects of the predicted nsSNPs on HLA-G isoforms were determined by MutPred2 and HOPE servers, respectively. Consurf analyses showed that the majority of the predicted nsSNPs occur in conserved sites. I-TASSER and Chimera were used for modeling of the predicted nsSNPs. rs182801644 and rs771111444 were related to creating functional patterns in 5'UTR. 5 SNPs in 3'UTR of the HLA-G gene were predicted to affect the miRNA target sites. Kaplan-Meier analysis showed the HLA-G deregulation can serve as a prognostic marker for some cancers.
CONCLUSIONS: The implementation of in silico SNP prioritization methods provides a great framework for the recognition of functional SNPs. The results obtained from the current study would be called laboratory investigations.

Entities:  

Keywords:  Deleterious SNPs; HLA-G gene; In silico analysis; Missense mutation; Structural and functional impact

Mesh:

Substances:

Year:  2020        PMID: 32867672      PMCID: PMC7457528          DOI: 10.1186/s12863-020-00890-y

Source DB:  PubMed          Journal:  BMC Genet        ISSN: 1471-2156            Impact factor:   2.797


Background

Single-Nucleotide Polymorphisms (SNPs) are the most copious type of human genetic sequence alterations that exist throughout the genome [1-3]. A missense mutation is a type of nonsynonymous (nsSNPs) substitution in which the one amino acid is substituted with another and may produce a mutated protein with structural and functional changes that may lead to disease. One of the main challenges for scientists is to identify SNPs that are pathogenic or related to a particular effect in humans [4]. Nowadays, deleterious nsSNPs in the desired gene can be identified using in -silico approaches. These approaches are reliable, user-friendly, fast, and low cost [5]. The major histocompatibility complex (MHC) is a group of genes encoding essential proteins for the adaptive immune system to identify fragments derived from pathogens [6]. In humans, The MHC complex is also called the human leukocyte antigen (HLA) complex [7]. The HLA genes have been classified into 3 classes I, II, and III. MHC class I genes are divided into two groups: major or classical (HLA-A, HLA-B, and HLA-C) and minor or non-classical (HLA-E, HLA-F, and HLA-G) [8], as they differ from each other by their genetic diversity, expression, structure, and functions [9]. The Human Leukocyte Antigen G (HLA-G) is a protein-coding gene on chromosome 6p21.3 and has an important function in modulation of the immune responses and diseases such as chronic viral infections, autoimmune disorders, transplantation and cancers [10, 11]. HLA-G gene displays low polymorphism but several mature mRNAs can be produced as a result of differential splicing of the primary transcript. The mature mRNAs encode 7 different protein isoforms, 4 of them being membrane-bound (HLA-G1 to G4), and 3 soluble or secreted (HLA-G5 to G7) [11]. Also, Roux et al. reported an inventory of novel HLA-G isoforms that have an extended 5′-region and lack the transmembrane and alpha-1 domains [12]. Soluble HLA-G1 (sHLA-G1) protein can be produced through the proteolysis activity of metalloprotease which maintains the functions of the membrane counterpart completely [13]. The general structure of an HLA-G protein consists of a heavy chain of 3 globular domains (α1, α2, and α3) and a light chain (β-2-microglobulin (B2M)) and a peptide (Fig. 1) [9].
Fig. 1

HLA-G heavy chain gene comprises 7 introns (i1-i7) and 8 exons (each with a distinctive color) with an internal stop codon in Exon 6. As shown in figure each exon encodes a discrete part of the heavy chain, except exon 7 and 8. Alternative splicing events of HLA-G primary transcript (exclusion exon 3 or/and exon 4 and keeping of intron 4 or intron 2 from the final gene transcript) generate seven isoforms. Soluble isoforms lack the transmembrane and cytoplasmic regions due to the intron retention, which includes an immature stop codon. HLA-G5 and HLA-G6 have a tail (21 amino acids) that plays a role in their solubility. HLA-G1 is the complete molecule. HLAG1 is homologous to HLA-G5 and both of them associate with B2M. The signal peptide (exon 1) and α1 domain (exon 2) are existing in all isoforms. Figure modified from Bainbridge et al. [14]

HLA-G heavy chain gene comprises 7 introns (i1-i7) and 8 exons (each with a distinctive color) with an internal stop codon in Exon 6. As shown in figure each exon encodes a discrete part of the heavy chain, except exon 7 and 8. Alternative splicing events of HLA-G primary transcript (exclusion exon 3 or/and exon 4 and keeping of intron 4 or intron 2 from the final gene transcript) generate seven isoforms. Soluble isoforms lack the transmembrane and cytoplasmic regions due to the intron retention, which includes an immature stop codon. HLA-G5 and HLA-G6 have a tail (21 amino acids) that plays a role in their solubility. HLA-G1 is the complete molecule. HLAG1 is homologous to HLA-G5 and both of them associate with B2M. The signal peptide (exon 1) and α1 domain (exon 2) are existing in all isoforms. Figure modified from Bainbridge et al. [14] HLA-G is involved in the control of the immune responses to maintain a fetomaternal tolerance in pregnancies [15]. Interaction of immune effector cells with HLA-G often introduces the suppression of them. The effects of inhibition are performed via three ITIM-bearing receptors expressed on various immune cells: ILT2/CD85/LIR-1, ILT4/CD85d/LIR-2, and KIR2DL4/CD158d [9, 16]. ILT2 is expressed by myeloid and lymphoid cells, ILT4 by myeloid cells, and KIR2DL4 by NK and T CD8+ [16]. The binding site of receptors to HLA-G is different. Interaction of HLA-G with ILT2 requires the association of the α3 domain with β2M but not for binding to ILT4. The KIR2DL4 binds to the α1 domain [11, 15, 16]. sHLA-G and membrane-bound HLA-G isoforms have alike functions. The membrane-bound HLA-G inhibits peripheral natural killer (NK) cytotoxicity and CD4+ cells directly through interaction with ILT2. The decidual NK (dNK) cells up-take and internalize HLA-G from the cell membrane of extravillous trophoblast cells through trogocytosis. HLA-G internalization results in maintaining low cytotoxicity and immunosuppressive status of dNK cells to protect the fetus versus dNKs activity and further release of a set of angiogenic factors to promote vascular remodeling and fetal growth at the beginning of pregnancy. The interaction of HLA-G with ILT2s on CD4+ T cells decreases the alloproliferative effect of CD4+ T cells. Binding sHLA-G to ILT4+ DCs leads to the generation of IL-10 and IL-10-producing DCs can promote the expansion of Tregs (CD4+ CD25highFOXP3) and Tr1s differentiation. Besides, the rapid reproduction, differentiation, and antibody production of B cells are inhibited due to HLA-G interplay with the LILRB1s on B cells. Moreover, apoptosis of CD8+ T cells through activation of the FasR/FasL pathway and endothelial cells are induced by HLA-G5 via interactions with CD8 receptor on CD8+ T cells and CD160 on endothelial cells [15-17]. HLA-G has a restricted tissue-specific protein expression in normal situations examples being extravillous cytotrophoblasts in the placenta, some immune cells, thymic medulla, and cornea. The neo-expression of HLA-G occurs in different pathological situations [15, 18–20]. The expression of HLA-G gene is adjusted mostly by a unique promoter region in comparison with other HLA genes and also at the post-transcriptional control level [21]. A single nucleotide polymorphism of a gene in the coding region or the regulatory region can lead to disease as a result of the expression change or structural and/or function alteration [4]. Most experimental and pathological studies of the HLA-G gene have been focused on polymorphisms in the promoter and 3ˊ UTR regions. The rate of polymorphisms in the coding sequence of this gene is low that indicates a powerful evolutionary pressure acting on the coding sequence [10]. Polymorphisms in the coding region may change the conformation of protein which could lead to modification of protein function including modulating immune responses, production of isoforms, peptide binding, and ability polymerization. HLA-G expression may change by altering the binding affinity of targeted sequences to transcriptional or post-transcriptional factors considering variations in the HLA-G promoter and 3ˊ UTR regions [10]. Concerning the important function of HLA-G in health and diseases in human, the main objectives of this study are to predict the most deleterious missense SNPs in HLA-G1 and HLA-G5, the common most deleterious missense SNPs in membrane-bound HLA-G isoforms, the common most deleterious missense SNPs in soluble HLA-G isoforms and finally to evaluate the impacts of the SNPs on the structure and function of HLA-G protein. The current study presents useful information about the most deleterious missense SNPs and their effects on the structure and function of HLA-G protein. In this paper, we also investigated the correlation between the survival rates of patients in some cancer types with HLA-G expression. The various steps of our study are shown in a flow chart (Fig. 2).
Fig. 2

Flowchart of the different steps of the study

Flowchart of the different steps of the study

Results

Currently, one of the valuable fields of computational genetic research is the identification of SNPs involved in diseases. At present, the advancement of computational biology methods has enabled us to detect the damaging SNPs in the objective genes. Computational methods are used to study the effect of nsSNPs on protein structure and function at the molecular level [22]. In this study, several computational methods were applied to determine the most deleterious common missense SNPs between soluble HLA-G isoforms and the most deleterious common missense SNPs between membrane-bound HLA-G isoforms as well as the most deleterious missense SNPs in HLA-G1 (the longest isoform protein of the HLA-G gene among membrane-bound HLA-G isoforms) and HLA-G5 isoforms (the longest isoform protein of the HLA-G gene among soluble HLA-G isoforms).

SNP dataset of the HLA-G gene from NCBI dbSNP and protein sequences dataset

The desired SNPs of the HLA-G gene were retrieved from the NCBI dbSNP database because it is the most extensive SNP database [23]. SNPs retrieved from NCBI and their corresponding IMGT/HLA alleles are shown in the supplementary Table 1. Of the total reported SNPs in the human HLA-G gene sequence, 301 SNPs are missense (16.38%), 117 SNPs are in 3ˊUTR (6.36%) and 65 SNPs are in 5ˊUTR (3.53%). A pictorial description of the distribution of SNPs in the HLA-G gene represented in percentage terms is shown in Fig. 3. Most tools for analyzing protein require the amino acid sequence, for this reason, the protein sequences of seven HLA-G isoforms were retrieved from the UniProt database. The seven protein isoforms of HLA-G (HLA-G1–7) consist of 338, 246, 154, 246, 319, 227, and 116 amino acids respectively, and a 24-amino acid signal peptide.
Fig. 3

The 3-D pie-chart shows the percentage of missense SNPs, 5′ UTR, 3′ UTR and other types of SNPs in HLA-G gene (according to the dbSNP database on December 2018)

The 3-D pie-chart shows the percentage of missense SNPs, 5′ UTR, 3′ UTR and other types of SNPs in HLA-G gene (according to the dbSNP database on December 2018)

Identification of the most deleterious missense SNPs in HLA-G isoforms using several different servers

At present, there is an extensive range of computational tools used to predict the consequences of missense SNPs on protein structure and function. The in silico methods accuracy for prioritizing candidate deleterious SNPs can be enhanced by incorporating the results of diverse computational tools based on various parameters. Hence, we performed the concordance analysis with SIFT, PROVEAN, PolyPhen-2, I-Mutant 3.0, SNPs&GO, PhD-SNP, SNAP2, and MUpro techniques to predict the most deleterious nsSNPs from the SNP dataset. All the reported missense SNPs for HLA-G were submitted to eight mentioned in silico nsSNP prediction algorithms. We selected missense SNPs that are deleterious in all 8 algorithmic tools manually. Finally, out of total missense SNPs, 35 missense SNPs were predicted as deleterious in isoform 1 (HLA-G1) (Tables 1 and 2), 35 missense SNPs were predicted as deleterious in isoform 5 (HLA-G5) (Supplementary Tables 2 and 3), 8 missense SNPs were predicted as deleterious in all membrane-bound HLA-G isoforms (HLA-G1–4) (Supplementary Tables 4 and 5) and 8 missense SNPs were predicted as deleterious in all soluble HLA-G isoforms (Supplementary Tables 6 and 7) and all further investigations were held for only these missense SNPs.
Table 1

SNPs analyzed in isoform 1by SIFT, PROVEAN, Polyphen 2.0, I-mutant 3.0, SNPs&GO

Isoform 1
SNP rsIDCodonsSubstitutionSIFT predictionPROVEAN predictionPolyPhen-2 predictionI-Mutant DDGSNPs&GO prediction
PredictionScorePredictionScorePredictionScoreSVM3 Prediction EffectDDG ValuePredictionProbability
rs17851921CAC ⇒ CCCH117PDAMAGING0Deleterious9.043Probably damaging1.000Large Increase0.33Disease0.805
CAC ⇒ CTCH117LDeleterious9.930Probably damaging1.000Large Increase0.65Disease0.794
rs111233577CTG ⇒ CGGL290RDAMAGING0Deleterious3.802Probably damaging1.000Large Decrease-1.44Disease0.786
rs142596947CCT ⇒ ACTP234TDAMAGING0Deleterious5.194Probably damaging0.991Large Decrease-1.08Disease0.557
rs144577485CCC ⇒ GCCP209ADAMAGING0Deleterious5.495Possibly damaging0.774Large Decrease-0.99Disease0.575
rs145097667CAT ⇒ TATH287YDAMAGING0Deleterious3.810Probably damaging0.999Large Increase0.54Disease0.743
rs572025435AGG ⇒ AGTR30SDAMAGING0Deleterious3.653Probably damaging0.991Large Decrease-1.17Disease0.542
rs748013931ACC ⇒ CCCT158PDAMAGING0Deleterious5.173Probably damaging1.000Large Decrease-0.52Disease0.696
rs749006959TAT ⇒ TGTY142CDAMAGING0Deleterious8.189Probably damaging1.000Large Increase-1.13Disease0.749
rs750238738ATC ⇒ TTCI237FDAMAGING0Deleterious2.554Probably damaging0.993Large Decrease-1.46Disease0.777
rs756652306GTG ⇒ GCGV285ADAMAGING0Deleterious2.527Probably damaging0.998Large Decrease-1.04Disease0.617
rs760500349CAG ⇒ CTGQ266LDAMAGING0.04Deleterious4.451Probably damaging1.000Large Increase0.19Disease0.706
rs763201540GAC ⇒ AACD53NDAMAGING0Deleterious3.842Probably damaging1.000Large Decrease-1.07Disease0.728
GAC ⇒ TACD53YDeleterious6.941Probably damaging1.000Large Decrease-0.29Disease0.872
rs765275727TGG ⇒ CGGW298RDAMAGING0Deleterious8.828Probably damaging1.000Large Decrease-0.8Disease0.713
rs770027530TGC ⇒ TACC227YDAMAGING0Deleterious7.461Probably damaging1.000Large Decrease-0.12Disease0.854
TGC ⇒ TTCC227FDeleterious7.461Probably damaging1.000Large Decrease0.06Disease0.882
rs770412396CTG ⇒ CCGL102PDAMAGING0.01Deleterious6.187Probably damaging1.000Large Decrease-1.63Disease0.786
rs772834879TAC ⇒ CACY142HDAMAGING0Deleterious-4.54Probably damaging1.000Large Decrease-1.31Disease0.635
rs780697086TGC ⇒ AGCC188SDAMAGING0Deleterious-8.51Probably damaging0.975Large Decrease-0.8Disease0.768
rs781774818CCT ⇒ CATP259HDAMAGING0Deleterious5.764Probably damaging1.000Large Decrease-1.22Disease0.581
rs867319917TGG ⇒ CGGW157RDAMAGING0Deleterious12.658Probably damaging1.000Large Decrease-0.86Disease0.791
rs1200732770GCC ⇒ GACA229DDAMAGING0Deleterious4.021Probably damaging1.000Large Decrease-0.87Disease0.775
rs1260086927CAG ⇒ CCGQ96PDAMAGING0.01Deleterious4.891Probably damaging0.996Large Decrease-0.4Disease0.524
rs1265409678CTC ⇒ CGCL294RDAMAGING0.01Deleterious3.443Probably damaging0.999Large Decrease-1.31Disease0.664
rs1317292772GAT ⇒ AATD143NDAMAGING0Deleterious4.485Probably damaging0.996Large Decrease-0.55Disease0.662
GAT ⇒ CATD143HDeleterious6.283Possibly damaging0.725Large Decrease-0.24Disease0.643
rs1379742188CGC ⇒ AGCR205SDAMAGING0.02Deleterious4.275Possibly damaging0.939Large Decrease-1.2Disease0.640
rs1390270595TAC ⇒ TGCY51CDAMAGING0.03Deleterious6.756Probably damaging1.000Large Decrease-1.29Disease0.724
rs1397132797CTG ⇒ CCGL196PDAMAGING0.01Deleterious5.562Probably damaging1.000Large Decrease-1.66Disease0.760
rs1414848134GAC ⇒ GTCD54VDAMAGING0Deleterious6.752Possibly damaging0.893Large Decrease-1.04Disease0.874
rs143056505CCT ⇒ CTTP234LDAMAGING0Deleterious6.497Possibly damaging0.454Large Decrease-0.33Disease0.711
rs147253884CCC ⇒ CGCP209RDAMAGING0Deleterious6.191Probably damaging0.993Large Decrease-0.61Disease0.763
rs1475659109CCC ⇒ CTCP39LDAMAGING0Deleterious7.485Probably damaging1.000Large Increase-0.48Disease0.582
rs555347515ATG ⇒ AAGM29KDAMAGING0Deleterious3.427Probably damaging0.996Large Decrease-1.48Disease0.774
rs556645753GAC ⇒ GGCD153GDAMAGING0Deleterious6.309Probably damaging0.999Large Decrease-0.73Disease0.511
rs565858069GAC ⇒ CACD130HDAMAGING0.03Deleterious5.927Probably damaging1.000Large Decrease-0.49Disease0.648
rs754527717GAT ⇒ GGTD244GDAMAGING0.01Deleterious3.915Probably damaging0.995Large Decrease-0.09Disease0.561
rs1161818149CTG ⇒ CAGL105QDAMAGING0Deleterious4.572Probably damaging1.000Large Decrease-2.04Disease0.764
CTG ⇒ CCGL105PDeleterious5.349Probably damaging1.000Large Decrease-1.69Disease0.799
Table 2

SNPs analyzed in isoform 1by PhD-SNP, SNAP2, Mupro

Isoform 1
SNP rsIDCodonsSubstitutionPhD-SNPSNAP2MUpro
PredictionScorePredictionScoreExpected AccuracyPredictionDDG Value
rs17851921CAC ⇒ CCCH117PDisease5effect9495%DECREASE-0.94483558
CAC ⇒ CTCH117LDisease5effect8291%INCREASE0.17248775
rs111233577CTG ⇒ CGGL290RDisease6effect5275%DECREASE-1.4553025
rs142596947CCT ⇒ ACTP234TDisease8effect5875%DECREASE-0.8378542
rs144577485CCC ⇒ GCCP209ADisease4effect3066%DECREASE-1.3457662
rs145097667CAT ⇒ TATH287YDisease6effect7985%INCREASE0.00175126
rs572025435AGG ⇒ AGTR30SDisease4effect253%DECREASE-1.582361
rs748013931ACC ⇒ CCCT158PDisease3effect5675%DECREASE-1.0769639
rs749006959TAT ⇒ TGTY142CDisease7effect6980%DECREASE-1.7925753
rs750238738ATC ⇒ TTCI237FDisease5effect4571%DECREASE-0.97093071
rs756652306GTG ⇒ GCGV285ADisease1effect3666%DECREASE-0.89978472
rs760500349CAG ⇒ CTGQ266LDisease6effect7885%DECREASE-0.05227910
rs763201540GAC ⇒ AACD53NDisease5effect5275%DECREASE-0.79149414
GAC ⇒ TACD53YDisease9effect8491%DECREASE-0.45294328
rs765275727TGG ⇒ CGGW298RDisease5effect8791%DECREASE-1.1870648
rs770027530TGC ⇒ TACC227YDisease8effect6880%DECREASE-0.88892871
TGC ⇒ TTCC227FDisease9effect7585%DECREASE-0.71910267
rs770412396CTG ⇒ CCGL102PDisease7effect7985%DECREASE-2.238046
rs772834879TAC ⇒ CACY142HDisease5effect7285%DECREASE-2.1531554
rs780697086TGC ⇒ AGCC188SDisease4effect7585%DECREASE-1.0332928
rs781774818CCT ⇒ CATP259HDisease5effect453%DECREASE-1.1893017
rs867319917TGG ⇒ CGGW157RDisease3effect9395%DECREASE-0.91293557
rs1200732770GCC ⇒ GACA229DDisease8effect7585%DECREASE-0.6414612
rs1260086927CAG ⇒ CCGQ96PDisease6effect4371%DECREASE-0.97929591
rs1265409678CTC ⇒ CGCL294RDisease6effect5775%DECREASE-1.0427076
rs1317292772GAT ⇒ AATD143NDisease4effect2263%DECREASE-1.2006322
GAT ⇒ CATD143HDisease9effect2563%DECREASE-1.2840175
rs1379742188CGC ⇒ AGCR205SDisease5effect3466%DECREASE-0.52183883
rs1390270595TAC ⇒ TGCY51CDisease2effect3966%DECREASE-0.83530596
rs1397132797CTG ⇒ CCGL196PDisease8effect5875%DECREASE-2.0384073
rs1414848134GAC ⇒ GTCD54VDisease5effect7585%DECREASE-0.37383825
rs1430565057CCT ⇒ CTTP234LDisease7effect8191%INCREASE0.3342289
rs1472538844CCC ⇒ CGCP209RDisease3effect4671%DECREASE-1.0509858
rs1475659109CCC ⇒ CTCP39LDisease5effect2963%DECREASE-0.20370663
rs555347515ATG ⇒ AAGM29KDisease6effect2563%DECREASE-1.7776487
rs556645753GAC ⇒ GGCD153GDisease2effect1759%DECREASE-2.384922
rs565858069GAC ⇒ CACD130HDisease5effect253%DECREASE-1.0708696
rs754527717GAT ⇒ GGTD244GDisease2effect753%DECREASE-1.7605711
rs1161818149CTG ⇒ CAGL105QDisease3effect7085%DECREASE-1.8766335
CTG ⇒ CCGL105PDisease6effect7685%DECREASE-2.1510154
SNPs analyzed in isoform 1by SIFT, PROVEAN, Polyphen 2.0, I-mutant 3.0, SNPs&GO SNPs analyzed in isoform 1by PhD-SNP, SNAP2, Mupro

Conservation analysis of the most deleterious nsSNPs in HLA-G isoforms by ConSurf sever

Evolutionary information is essential to investigate further the possible impacts of deleterious nsSNPs [24]. The ConSurf web server characterizes the evolutionary conservation profile of amino acid residues in the protein and whether each amino acid is exposed (on protein surface) or buried (inside protein core) in the protein structure. For example, our ConSurf analysis showed that D53 is an exposed and conserved residue in all soluble HLA-G isoforms and is predicted to have a functional impact on soluble HLA-G isoforms whereas D53 is a buried and conserved residue in isoform 1 and is predicted a structural residue. The ConSurf server produces a colorimetric conservation score as a result. The residues with the utmost change are shown in blue and the conserved residues are shown in purple. The most highly conserved residues are significant for biological function and changing these residues has functional and structural impacts on the proteins [25]. The ConSurf results are compiled in Tables 3, supplementary Tables 8–10, Fig. 4, and supplementary Figs. 1–3. The results showed that the majority of the most deleterious nsSNPs (87.5% in isoform 1 and 86.66% in isoform 5) occur in conserved sites.
Table 3

Evolutionary conservation pattern of amino acids with solvent accessibility in HLA-G1 by ConSurf server

Isoform 1
conservation scoreexposed or buriedpredictionconservation scoreexposed or buriedpredictionconservation scoreexposed or buriedpredictionconservation scoreexposed or buriedprediction
M29L105C188D244
1 (variable)buried-1 (variable)buried-9 (conserved)buriedstructural7 (conserved)exposed-
R30H117L196P259
6 (average)buried-9 (conserved)buriedstructural9 (conserved)buriedstructural9 (conserved)exposedfunctional
P39D130R205Q266
9 (conserved)exposedfunctional8 (conserved)exposedfunctional7 (conserved)exposed-9 (conserved)exposedfunctional
Y51Y142P209V285
9 (conserved)exposedfunctional9 (conserved)buriedstructural9 (conserved)buriedstructural9 (conserved)buriedstructural
D53D143C227H287
9 (conserved)buriedstructural9 (conserved)exposedfunctional9 (conserved)buriedstructural9 (conserved)buriedstructural
D54D153A229L290
8 (conserved)exposedfunctional9 (conserved)exposedfunctional9 (conserved)buriedstructural9 (conserved)buriedstructural
Q96W157P234L294
7 (conserved)exposed-9 (conserved)buriedstructural9 (conserved)exposedfunctional6 (average)buried-
L102T158I237W298
9 (conserved)buriedstructural9 (conserved)exposedfunctional9 (conserved)buriedstructural8 (conserved)exposedfunctional

Conservation score has a range of 1.0 to 9.0. Score 9 represents the most conserved and 1 represents the very variable amino acid. An amino acid, if is preserved and exposed, is a functional residue and if is preserved and buried, is a structural residue

Fig. 4

Consurf analysis of HLA-G1. The degree of conservation of amino acids was shown in the colouring scheme. The color intensity increases based on amino acids conservation grades e.g. turquoise indicates variable sites; white indicates average sites; maroon indicates evolutionarily conserved sites. The most deleterious predicted SNPs are marked below the sequence as red arrows. e is the exposed residue. b is the buried residue. f is an estimated functional residue (highly conserved and exposed). s is an estimated structural residue (highly conserved and buried)

Evolutionary conservation pattern of amino acids with solvent accessibility in HLA-G1 by ConSurf server Conservation score has a range of 1.0 to 9.0. Score 9 represents the most conserved and 1 represents the very variable amino acid. An amino acid, if is preserved and exposed, is a functional residue and if is preserved and buried, is a structural residue Consurf analysis of HLA-G1. The degree of conservation of amino acids was shown in the colouring scheme. The color intensity increases based on amino acids conservation grades e.g. turquoise indicates variable sites; white indicates average sites; maroon indicates evolutionarily conserved sites. The most deleterious predicted SNPs are marked below the sequence as red arrows. e is the exposed residue. b is the buried residue. f is an estimated functional residue (highly conserved and exposed). s is an estimated structural residue (highly conserved and buried)

Prediction of structural and functional modifications due to the most deleterious SNPs on the HLA-G isoforms by MutPred server

The SNPs were predicted as most deleterious also investigated by the Mutpard server to predict the functional effects of SNPs. The most deleterious SNPs that were submitted to this server along with their predicted functional and structural effect on isoforms and the resultant probability scores were represented in Table 4 and supplementary Tables 11–13. For example, W157R in HLA-G1 was found to be highly deleterious with a g score of 0.936 and was predicted to cause the alteration in transmembrane protein with a p score of 0.000015, showing very confident hypothesis. W157R in HLA-G5 was found to be highly deleterious with a g score of 0.93 and was predicted to induce alteration in ordered interface with a p score of 0.0017, showing a very confident hypothesis. Gain of sulfation at D53 was predicted at D53Y in all membrane-bound HLA-G isoforms (g = 0.746 and p= 0.0044 in HLA-G1, g = 0.628 and p = 0.0088 in HLA-G2, g = 0.785 and p = 0.0051 in HLA-G3 and g = 0.75 and p = 0.0046 in HLA-G4). Loss of proteolytic cleavage at R30 was predicted at M29K in all soluble HLA-G isoforms (g = 0.688 and p = 0.0037 in HLA-G5, g = 0.754 and p = 0.0035 in HLA-G6 and g = 0.772 and p = 0.003 in HLA-G7).
Table 4

Prediction of functional effects of the most deleterious SNPs on the HLA-G1 by MutPred

Isoform 1 (ID: 79bb0a85-d537-4be0-bff8-d0fdb06b5211)
SubstitutionMutpred score (>0.50 is considered pathogenic)Molecular mechanism with p-value <= 0.05ProbabilityP-valueprediction
H117P0.823Altered Metal binding0.270.02confident hypotheses
Altered Transmembrane protein0.261.0e-03very confident hypotheses
Altered Ordered interface0.250.03confident hypotheses
H117L0.665Altered Transmembrane protein0.284.8e-04actionable hypotheses
Loss of Strand0.289.6e-03actionable hypotheses
Altered Metal binding0.270.02actionable hypotheses
Altered Ordered interface0.250.02actionable hypotheses
L290R0.644Altered Transmembrane protein0.130.02actionable hypotheses
P234T0.767Altered Ordered interface0.321.8e-03very confident hypotheses
Altered Transmembrane protein0.251.3e-03very confident hypotheses
R30S0.792Altered Ordered interface0.295.0e-03very confident hypotheses
Loss of Proteolytic cleavage at R300.211.2e-03very confident hypotheses
Altered Stability0.200.01confident hypotheses
Altered Metal binding0.163.5e-03very confident hypotheses
T158P0.717Altered Ordered interface0.323.6e-03actionable hypotheses
Gain of Relative solvent accessibility0.290.01actionable hypotheses
Altered Disordered interface0.280.03actionable hypotheses
Loss of Allosteric site at W1570.287.4e-03actionable hypotheses
Altered Metal binding0.274.4e-03actionable hypotheses
Altered Transmembrane protein0.276.6e-04actionable hypotheses
Loss of Helix0.270.05actionable hypotheses
Altered Coiled coil0.100.04actionable hypotheses
Y142C0.681Altered Metal binding0.741.1e-03actionable hypotheses
Altered Disordered interface0.611.7e-04actionable hypotheses
Altered Ordered interface0.501.7e-04actionable hypotheses
Altered Transmembrane protein0.292.1e-04actionable hypotheses
Gain of Relative solvent accessibility0.270.02actionable hypotheses
Loss of Acetylation at K1450.200.04actionable hypotheses
Loss of Methylation at K1450.170.01actionable hypotheses
Loss of Ubiquitylation at K1450.160.04actionable hypotheses
Gain of Disulfide linkage at Y1420.090.05actionable hypotheses
Loss of Sulfation at Y1470.093.5e-03actionable hypotheses
I237F0.658Altered Disordered interface0.330.01actionable hypotheses
Altered Ordered interface0.280.04actionable hypotheses
Altered Transmembrane protein0.261.1e-03actionable hypotheses
Loss of Strand0.260.04actionable hypotheses
Loss of Pyrrolidone carboxylic acid at Q2420.060.02actionable hypotheses
Q266L0.444Altered Ordered interface0.280.03-
Gain of Relative solvent accessibility0.260.03
Loss of Methylation at K2670.233.8e-03
Gain of Acetylation at K2670.220.02
Loss of Pyrrolidone carboxylic acid at Q2660.211.8e-03
Altered Metal binding0.200.03
Gain of Catalytic site at D2620.160.02
Altered Transmembrane protein0.150.01
Gain of Proteolytic cleavage at D2620.140.02
D53Y0.746Altered Ordered interface0.391.1e-03actionable hypotheses
Altered Disordered interface =)0.386.6e-03actionable hypotheses
Altered Metal binding0.371.7e-03actionable hypotheses actionable
Altered Transmembrane protein0.196.6e-03hypotheses
Loss of Proteolytic cleavage at D530.120.03actionable hypotheses
Gain of Pyrrolidone carboxylic acid at Q560.090.01actionable hypotheses
Gain of Sulfation at D530.074.4e-03actionable hypotheses
W298R0.822Altered Ordered interface0.362.1e-03very confident hypotheses
Altered Transmembrane protein0.120.03confident hypotheses
C227Y0.87Altered Disordered interface0.423.2e-03very confident hypotheses
Altered Ordered interface0.394.0e-04very confident hypotheses
Altered Metal binding0.349.6e-03very confident hypotheses
Altered Transmembrane protein0.311.2e-04very confident hypotheses
Loss of Helix0.280.03confident hypotheses
C227F0.888Altered Metal binding0.377.8e-03very confident hypotheses
Altered Disordered interface0.340.01very confident hypotheses
Altered Ordered interface0.284.3e-03very confident hypotheses
Loss of Helix0.270.04confident hypotheses
Altered Transmembrane protein0.269.6e-04very confident hypotheses
L102P0.735Altered Disordered interface0.367.6e-03actionable hypotheses
Gain of Intrinsic disorder0.340.02actionable hypotheses
Altered Transmembrane protein0.311.1e-04actionable hypotheses
Loss of Helix0.314.8e-03actionable hypotheses
Altered DNA binding0.258.9e-03actionable hypotheses
Altered Stability0.210.01actionable hypotheses
Loss of Proteolytic cleavage at R990.130.02actionable hypotheses
Y142H0.667Altered Metal binding0.788.2e-04actionable hypotheses
Altered Ordered interface0.445.4e-04actionable hypotheses
Altered Disordered interface0.358.9e-03actionable hypotheses
Altered Transmembrane protein0.311.1e-04actionable hypotheses
Gain of Relative solvent accessibility0.250.03actionable hypotheses
Gain of Acetylation at K1450.200.04actionable hypotheses
Loss of Methylation at K1450.170.01actionable hypotheses
Altered Stability0.170.02actionable hypotheses
Loss of Ubiquitylation at K145 (0.150.05actionable hypotheses
Altered Coiled coil0.100.04actionable hypotheses
Loss of Sulfation at Y1470.093.5e-03actionable hypotheses
C188S0.795Altered Disordered interface0.290.03confident hypotheses
Altered Ordered interface0.240.03confident hypotheses
Altered DNA binding0.170.03confident hypotheses
Altered Transmembrane protein0.090.05confident hypotheses
P259H0.607Altered Metal binding0.475.9e-03actionable hypotheses
Altered Ordered interface0.280.04actionable hypotheses
Loss of Loop0.270.02actionable hypotheses
Altered Transmembrane protein0.205.0e-03actionable hypotheses
Gain of Catalytic site at D2620.130.03actionable hypotheses
Loss of Proteolytic cleavage at D2620.130.03actionable hypotheses
W157R0.936Altered Ordered interface0.371.8e-03very confident hypotheses
Altered Transmembrane protein0.361.5e-05very confident hypotheses
Gain of Relative solvent accessibility0.333.3e-03very confident hypotheses
Altered Disordered interface0.300.02confident hypotheses
Loss of Allosteric site at W1570.303.7e-03very confident hypotheses
Altered Metal binding0.283.8e-03very confident hypotheses
Altered Coiled coil0.270.01confident hypotheses
A229D0.843Altered Transmembrane protein0.342.4e-05very confident hypotheses
Altered Metal binding0.290.02confident hypotheses
Gain of Relative solvent accessibility0.280.02confident hypotheses
Altered Ordered interface0.279.8e-03very confident hypotheses
Q96P0.683Loss of Helix0.309.0e-03actionable hypotheses
Altered Transmembrane protein0.293.0e-04actionable hypotheses
Altered DNA binding0.256.7e-03actionable hypotheses
Gain of Ubiquitylation at K920.150.04actionable hypotheses
Gain of Proteolytic cleavage at R990.130.02actionable hypotheses
Loss of Pyrrolidone carboxylic acid at Q960.100.01actionable hypotheses
L294R0.527Altered Ordered interface0.278.6e-03actionable hypotheses
D143N0.649Altered Metal binding0.381.5e-03actionable hypotheses
Altered Transmembrane protein0.283.8e-04actionable hypotheses
Altered Ordered interface0.280.04actionable hypotheses
Altered Disordered interface0.280.03actionable hypotheses
Gain of Relative solvent accessibility0.260.03actionable hypotheses
Loss of Acetylation at K1450.190.05actionable hypotheses
Loss of Methylation at K1450.160.01actionable hypotheses
Loss of Ubiquitylation at K1450.150.04actionable hypotheses
Loss of Sulfation at Y1470.093.4e-03actionable hypotheses
D143H0.738Altered Metal binding0.437.0e-03actionable hypotheses
Altered Transmembrane protein0.327.8e-05actionable hypotheses
Altered Disordered interface0.290.03actionable hypotheses
Altered Ordered interface0.277.0e-03actionable hypotheses
Loss of Relative solvent accessibility0.260.03actionable hypotheses
Loss of Acetylation at K1450.190.05actionable hypotheses
Loss of Methylation at K1450.170.01actionable hypotheses
Loss of Ubiquitylation at K1450.160.03actionable hypotheses
Loss of Sulfation at Y1470.093.3e-03actionable hypotheses
R205S0.617Gain of Intrinsic disorder0.340.02actionable hypotheses
Altered Ordered interface0.290.03actionable hypotheses
Loss of Helix0.270.05actionable hypotheses
Y51C0.715Altered Disordered interface0.601.9e-04actionable hypotheses
Altered Metal binding0.534.1e-03actionable hypotheses
Altered Ordered interface0.304.6e-03actionable hypotheses
Loss of Strand0.270.02actionable hypotheses
Altered Transmembrane protein0.205.0e-03actionable hypotheses
Loss of Proteolytic cleavage at D530.120.03actionable hypotheses
Altered Stability0.120.03actionable hypotheses
Gain of Pyrrolidone carboxylic acid at Q560.080.02actionable hypotheses
Loss of Sulfation at Y510.030.02actionable hypotheses
L196P0.884Altered Disordered interface0.340.01confident hypotheses
Altered Ordered interface0.279.0e-03very confident hypotheses
Altered DNA binding0.160.04confident hypotheses
Altered Stability0.140.02confident hypotheses
D54V0.718Altered Metal binding0.553.1e-04actionable hypotheses
Altered Disordered interface0.404.2e-03actionable hypotheses
Altered Ordered interface0.250.02actionable hypotheses
Altered Transmembrane protein0.140.02actionable hypotheses
Loss of Proteolytic cleavage at D530.120.03actionable hypotheses
Gain of Pyrrolidone carboxylic acid at Q560.090.01actionable hypotheses
Loss of Sulfation at Y510.030.02actionable hypotheses
P234L0.809Altered Ordered interface0.331.5e-03very confident hypotheses
Altered Disordered interface0.300.03confident hypotheses
Altered Transmembrane protein0.292.0e-04very confident hypotheses
Loss of Strand0.280.01confident hypotheses
P39L0.659Altered Disordered interface0.270.04actionable hypotheses
Loss of B-factor0.260.04actionable hypotheses
Altered Transmembrane protein0.241.6e-03actionable hypotheses
Altered DNA binding0.239.6e-03actionable hypotheses
Gain of Proteolytic cleavage at R380.140.02actionable hypotheses
M29K0.720Altered Ordered interface0.260.02actionable hypotheses
Loss of Proteolytic cleavage at R300.202.6e-03actionable hypotheses
Altered Stability0.130.03actionable hypotheses
Altered Transmembrane protein0.120.03actionable hypotheses
Altered Metal binding0.117.9e-03actionable hypotheses
D153G0.853Altered Metal binding0.451.6e-04very confident hypotheses
Altered Disordered interface0.359.1e-03very confident hypotheses
Loss of Relative solvent accessibility0.308.2e-03very confident hypotheses
Gain of Strand0.303.1e-03very confident hypotheses
Altered Transmembrane protein0.284.6e-04very confident hypotheses
Altered Ordered interface0.260.01confident hypotheses
Gain of Allosteric site at W1570.267.0e-03very confident hypotheses
Altered Stability0.190.01confident hypotheses
Altered Coiled coil0.100.04confident hypotheses
D130H0.724Altered Transmembrane protein0.301.9e-04actionable hypotheses
Altered Disordered interface0.280.03actionable hypotheses
Altered Metal binding0.250.03actionable hypotheses
Loss of Relative solvent accessibility0.250.04actionable hypotheses
Altered Ordered interface0.240.04actionable hypotheses
Gain of Disulfide linkage at C1250.100.05actionable hypotheses
Gain of Catalytic site at D1260.080.05actionable hypotheses
D244G0.705Altered Ordered interface0.300.02actionable hypotheses
Altered Disordered interface0.280.04actionable hypotheses
Gain of Strand0.270.02actionable hypotheses
Altered Transmembrane protein0.268.2e-04actionable hypotheses
Altered Metal binding0.180.03actionable hypotheses
Gain of Pyrrolidone carboxylic acid at Q2480.154.6e-03actionable hypotheses
Loss of Proteolytic cleavage at R2430.120.03actionable hypotheses
Loss of Catalytic site at R2430.090.04actionable hypotheses
L105Q0.507Altered Disordered interface0.320.01actionable hypotheses
Gain of Intrinsic disorder0.300.05actionable hypotheses
Altered Transmembrane protein0.292.1e-04actionable hypotheses
Altered Ordered interface0.260.01actionable hypotheses
Altered DNA binding0.160.04actionable hypotheses
Altered Stability0.150.02actionable hypotheses
Loss of N-linked glycosylation at N1100.050.02actionable hypotheses
L105P0.702Altered Disordered interface0.330.01actionable hypotheses
Loss of Helix0.290.01actionable hypotheses
Altered Transmembrane protein0.283.7e-04actionable hypotheses
Altered Ordered interface0.260.01actionable hypotheses
Altered Stability0.200.01actionable hypotheses
Altered DNA binding0.160.04actionable hypotheses
Gain of N-linked glycosylation at N1100.050.02actionable hypotheses

The predictions which are very confident hypotheses shown in bold font

Prediction of functional effects of the most deleterious SNPs on the HLA-G1 by MutPred The predictions which are very confident hypotheses shown in bold font

The structural analysis of the most deleterious selected SNPs on HLA-G isoforms by project Hope server

Project HOPE predicted the effects of amino acid substitutions on native structures of HLA-G isoforms, the hydrophobicity, charge, and size change between wild-type and mutant residue and model of the 3D structure. The HOPE reports indicated that there was no exact known structural information for HLA-G1, 3, and 5 isoforms, and HOPE built the models of them based on homologous structures while the 3D-structures of HLA-G2, 4, 6 and 7 isoforms were known. All results of the effects of the most deleterious predicted SNPs on structures of the HLA-G isoforms and the difference in physicochemical properties of amino acids of wild type and mutated residue are reported in detail in Additional file 2 and supplementary Tables 14–16. For instance, rs555347515 mutation caused amino acid substitution from methionine into a lysine at the 29th position (M29K). The inspection of this mutation on HLA-G1 showed the mutated residue is bigger than the wild-type residue and probably will not fit in the core of the protein and the mutant residue has a positive charge, while the wild-type residue is neutral, so the positive charge can lead to protein folding problems. Furthermore, the mutation will lead to the loss of hydrophobic interplays in the center of the protein. Additionally, the structural analysis of M29K on HLA-G1 showed this variation is located inside a cluster of residues annotated in UniProt as the Alpha-1 domain and can disturb the domain structure and function (Additional file 2). Moreover, A/G mutation (rs556645753) resulted in a change of the aspartic acid to glycine at the 153rd position (D153G). The inspection of this mutation on HLA-G5 showed the mutated residue is smaller than the wild-type residue and this might induce loss of interplays and a further hydrophobic residue that can lead to loss of hydrogen bonds and disturb correct confirmation. The negative charge of the wild-type residue will be lost upon this mutation and this can lead to loss of interactions with other molecules or residues. Moreover, the structural analysis of D153G on HLA-G5 showed this variation is located inside a cluster of residues annotated in UniProt as the Alpha-2 domain and can distract this domain and disturb its function. Glycines are very flexible and can abolish the needed rigidity of HLA-G5 in this area (supplementary Table 14).

Modeling of protein

I-TASSER tool created the 5 high-quality 3D structures for each HLA-G isoform from its amino acid sequence. We submitted the protein sequence of each isoform without signal peptide as an input to I-TASSER because there were no most deleterious SNPs in the peptide signal sequence and removing signal peptide from the protein sequence can improve the speed of I-TASSER simulation without loss of modeling accuracy. I-TASSER used the top 10 templates which are structurally closest to query protein sequence to model the protein (supplementary Table 17). Among the 5 predicted models for each HLA-G isoform, the first model was selected because it had the highest confidence score (C-score) and it was used for further investigation using Chimera (Additional file 3). A greater level of C-score indicates a model with great confidence and conversely.

Chimera software

Chimera viewer was utilized to visualize the structures of the HLA-G isoforms using the first model as predicted by I-TASSER (Additional file 4). Furthermore, the structural characteristics of amino acids in wild and mutant protein chains were visualized by Chimera (Additional file 5 and supplementary Tables 18–20). A physicochemical rationale may be presented for the impact on protein activity by visualizing the location of the mutant amino acids [26].

Functional SNPs in UTR predicted by UTRscan tool

The total of the UTR SNPs was investigated by applying UTRscan. Then analyzing the functional elements for every UTR SNP, the result showed that rs182801644 was related to the creation of functional pattern of uORF, and rs771111444 was related to the creation of a functional patterns of uORF and IRES in 5′UTR (Table 5). The internal ribosome entry site (IRES) is an alternative translation initiation mechanism in a cap-independent process in comparison with the ordinary 5′-cap dependent ribosome scanning mechanism [27]. Upstream open reading frames (uORF) is in the 5’UTR of mRNA that can regulate eukaryotic gene expression [28].
Table 5

Table of HLA-G UTR SNPs with functional importance that were predicted by UTRscan tool

SNP IDNucleotide changeUTR positionFunctional element change
rs182801644C/T5′ UTRno pattern → uORF
rs771111444C/G5′ UTR

no pattern → IRES

no pattern → uORF

Table of HLA-G UTR SNPs with functional importance that were predicted by UTRscan tool no pattern → IRES no pattern → uORF

The functional SNPs located in 3′UTRs region predicted by PolymiTRS

3′ untranslated regions (UTR) as the putative target site for miRNAs is a significant gene expression regulator. The SNP in the 3′ UTR region may disrupt and/or create miRNA target sites. PolymiRTS database predicted functional SNPs in 3′ UTR of the HLA-G gene. Among all the SNPs in the 3′UTR region of the HLAG gene, 5 functional SNPs were predicted to affect the miRNA target sites. The details of the effect of these SNPs on the miRNA sites are listed in Table 6. Two SNPs, rs17179101 and rs1063320 disrupt 9 miRNA conserved sites (ancestral allele with support ≥2), while all of them produce 15 novel miRNA target sites.
Table 6

Prediction results of PolymiRTS database

dbSNP IDmiR IDConservationmiRSiteFunctionClass
rs1707hsa-miR-57022CTGACTCctctttC
hsa-miR-5833ctgactCCTCTTTC
rs17179101hsa-miR-44172AGCCCACccctgtD
hsa-miR-46515agcCCACCCCtgtD
hsa-miR-541-3p2aGCCCACCcctgtD
hsa-miR-6082agcCCACCCCtgtD
hsa-miR-654-5p2aGCCCACCcctgtD
hsa-miR-6756-5p2agCCCACCCctgtD
hsa-miR-6766-5p2agCCCACCCctgtD
hsa-miR-6782-5p3agccCACCCCTgtD
hsa-miR-15872AGCCCAAccctgtC
hsa-miR-296-3p6agccCAACCCTgtC
hsa-miR-31472aGCCCAACcctgtC
hsa-miR-3620-5p2AGCCCAAccctgtC
hsa-miR-46742AGCCCAAccctgtC
hsa-miR-6823-5p3agcccAACCCTGtC
hsa-miR-92a-1-5p2agCCCAACCctgtC
rs180827037hsa-miR-875-3p5tttcctTTTCCAGC
rs138249160hsa-miR-25-5p2tCTCCGCCtctgtC
hsa-miR-60873tctCCGCCTCtgtC
rs1063320hsa-miR-3619-3p3tgTGGTCCActgaC
hsa-miR-4776-5p3tgTGGTCCActgaC
hsa-miR-4800-5p3tgtGGTCCACtgaC
hsa-miR-767-5p3tgTGGTGCActgaD

In miRsite, sequences of the miRNA sites were shown. The capital letters show bases complementary to the seed region and SNPs were shown in bold font.

Prediction results of PolymiRTS database In miRsite, sequences of the miRNA sites were shown. The capital letters show bases complementary to the seed region and SNPs were shown in bold font.

Protein-protein interactions analysis

The mutation may change the structure of a protein and thus the function of protein may change. Therefore, mutated protein may interact with other proteins and lead to phenotypic effects. To investigate the interaction of HLA-G with various proteins, the STRING server was used. The interaction analysis revealed that HLAG is related to Beta-2-microglobulin (B2M), Leukocyte immunoglobulin-like receptor subfamily B member 2 (LILRB2), Leukocyte immunoglobulin-like receptor subfamily B member 1 (LILRB1), Killer cell immunoglobulin-like receptor 2DL4 (KIR2DL4), HLA class I histocompatibility antigen, alpha chain F (HLA-F), HLA class I histocompatibility antigen, A-3 alpha chain (HLA-A), HLA class I histocompatibility antigen, Cw-7 alpha chain (HLA-C), HLA class I histocompatibility antigen, alpha chain E(HLA-E), HLA class I histocompatibility antigen, B-7 alpha chain (HLA-B), T-cell surface glycoprotein CD8 alpha chain (CD8A) (Fig. 5).
Fig. 5

Protein–protein interaction network of HLAG with 10 partners

Protein–protein interaction network of HLAG with 10 partners

The effect of high and low expression levels of HLA-G on overall survival (OS) in patients with various cancers

Kaplan-Meier plotter was exerted to analyze the prognostic value of the HLA-G gene expression for breast, ovarian, lung, and gastric cancers by combining gene expression and cancer patient survival. The subjects were divided into 2 categories (high or low expression levels) according to the median expression of HLA-G. Subsequently, the correlation of expression levels and cancer patient’s overall survival rate was evaluated using the Kaplan-Meier plotter. Hazard ratio (HR) with 95% confidence intervals (CI) and logrank p-value were calculated. HLA-G gene in breast cancer had a hazard ratio (HR) = 0.85 (95% CI, 0.69–1.06) and logrank p-value = 0.15; therefore the result was not statistically significant (HLA-G deregulation had not the prognostic value). HLA-G gene in ovarian cancer had an HR = 0.81 (95% CI, 0.71–0.93) and logrank p-value = 0.0023; therefore the result was statistically significant (the relation between the high expression of HLA-G gene and more survival rate). HLA-G gene in lung cancer had a HR = 1.21 (95% CI, 1.07–1.38) and logrank p-value = 0.0029 and in gastric cancer HR = 1.3 (95% CI, 1.09–1.54) and logrank p-value = 0.0027; therefore the results were statistically significant (the relation between the low expression of HLA-G gene and more survival rate) (Fig. 6). The results showed that HLA-G deregulation has distinct implications in different types of cancers. This study shows, the HLA-G deregulation can serve as a prognostic marker for patients with ovarian, lung, and gastric cancer but not for breast cancer.
Fig. 6

The correlation of deregulation of HLA-G gene and overall survival rate of the cancer patients was evaluated using Kaplan-Meier plotter

The correlation of deregulation of HLA-G gene and overall survival rate of the cancer patients was evaluated using Kaplan-Meier plotter

Discussion

A large number of SNPs have been distributed throughout the human genome. Increasing evidence has suggested that SNPs are important and valuable in the search for the etiologies of human diseases/traits, the drug design, and human drug response [29, 30]. But the large number of SNPs causes a challenge for scientists because studying all SNPs with molecular approaches to choose target SNPs is an expensive, time-consuming and laborious task [29, 31, 32]. A better sense of genetic variations in susceptibility to disease and their phenotypic effects and reducing the number of them that should be screened in molecular studies may be provided by applying in silico methods [26, 33]. Among SNPs, missense SNPs are correlated with single amino acid substitution in the coded protein as a result of single nucleotide change in a codon that may have an intense impact on the structure and functionality of the relevant protein [4]. There is considerable data about SNPs in the dbSNP/NCBI database [34]. There were 301 missense mutations in the coding region of human HLA-G gene and in this study we focused on them in order to identify the most deleterious missense mutations that could modify the structure and function of the HLA-G isoforms. Identification of functional missense mutations and their role(s) may allow an individualized method for therapeutic goals [10]. HLA-G acts as an immune tolerogenic molecule, playing a role in various pathologies [10]. HLA-G primary mRNA is spliced into seven alternative mRNAs that encode 7 different isoforms of HLA-G protein: four membrane-bound (HLA-G1 to G4) and three soluble (HLA-G5 to G7) protein isoforms [35]. Full-length HLA-G protein exhibits a heavy chain consisting of α1 (residues 25 to 114), α2 (residues 115 to 206) and α3 (residues 207 to 298) domains and a light chain (B2M) [15, 36]. The HLA-Gl isoform consists of α1, α2 and α3 domains, transmembrane and cytoplasmic regions. The HLA-G2 isoform lacks the α2 domain. The HLA-G3 isoform does not comprise both the α2 and α3 domains. The HLA-G4 isoform lacks the α3 domain. The HLA-G5 isoform comprises the α1, α2 and α3 domains and lacks transmembrane and cytoplasmic domains as a result of intron 4 retention and encoding a C-terminal peptide sequence of twenty-one amino acid residues. HLA-G6 comprises α1 and α3 domains plus a C-terminal peptide sequence of twenty-one amino acid residues encoded by intron 4 retention and lacks transmembrane and cytoplasmic domains. The HLA-G7 isoform has only the α1 domain and lacks transmembrane and cytoplasmic domains as a result of intron 2 retention and encoding a C-terminal peptide sequence of two amino acid residues. All of these isoforms comprise α1 domain [36]. HLA-G expression has been widely studied in various disorders; nevertheless, the HLA-G gene polymorphism has not been evaluated to the same extent [10]. On the other hand, nearly half of the known gene-related damages for human hereditary diseases are amino acid substitutions. Consequently, screening of polymorphisms using in silico analyses to identify missense SNPs that affect the function of the protein and that are associated with the disease is an important task [29]. Therefore, in the present study, an attempt was made to predict the functional missense SNPs in human HLA-G isoforms. 301 missense SNPs of the human HLA-G gene were retrieved from dbSNP and were submitted to in silico tools to predict the functionally important missense SNPs in HLA-G1 and HLA-G5 and the common most deleterious missense SNPs in membrane-bound isoforms and in the soluble isoforms. Existing in silico methods have diverse strengths and weaknesses in predicting the effect of nsSNP because every algorithm uses different parameters for prediction [37, 38]. Therefore, algorithms individually could not be considered as an accurate method for the prediction of functional SNPs [39]. In consequence, screening and prioritizing the candidate functional nsSNPs requires the implementation of different algorithms with different parameters and aspects (e.g. based on evolutionary information and protein structure and/or functional parameters) to combine the advantages of different methods, to enhance the accuracy and reliability of the predictions and to minimize the errors [5, 39–41]. As a general rule, in each study, at least four or five of these tools should be run to obtain a consensus on the effect of single nucleotide polymorphism on the structure and function of the desired protein [37]. In the current investigation, 8 different prediction algorithms were used as follows: SIFT, PROVEAN, PolyPhen-2, I-Mutant 3.0, SNPs&GO, PhD-SNP, SNAP2 and MUpro for the prediction of deleterious missense SNPs present in HLA-G isoforms. SIFT, PROVEAN, PhD SNP and SNP&GO tools predict damaging SNPs based only on the sequence of a protein. PolyPhen-2 and SNAP2 tools predict the functional effects of mutations based on the combination of protein 3D structure and multiple homolog sequence alignment [37]. I-Mutant 3.0 and MUPro tools investigate the effect of candidate SNPs on protein stability [41]. In our analyses, 35 missense substitutions of all the SNPs in HLA-G 1 isoform were predicted to be most deleterious SNPs by all the programs used. These 35 missense substitutions were classified according to the domain where they were located. Nine (25.71%) substitutions (rs555347515, rs572025435, rs1475659109, rs1390270595, rs763201540, rs1414848134, rs1260086927, rs770412396, rs1161818149) are located in the α1 domain, 11 (31.42%) substitutions (rs17851921, rs565858069, rs749006959, rs772834879, rs1317292772, rs556645753, rs867319917, rs748013931, rs780697086, rs1397132797, rs1379742188) are detected in the α2 domain and 15 (42.85%) substitutions (rs144577485, rs1472538844, rs770027530, rs1200732770, rs1430565057, rs142596947, rs750238738, rs754527717, rs781774818, rs760500349, rs756652306, rs145097667, rs111233577, rs1265409678, rs765275727) are located in the α3 domain. Thirty-five missense SNPs were found to be the most deleterious on the stability and function of HLA-G 5 isoform. Twelve (34.28%) substitutions (rs555347515, rs572025435, rs540632198, rs1475659109, rs1390270595, rs763201540, rs1414848134, rs138289952, rs1260086927, rs770412396, rs1161818149, rs776393668) are located in the α1 domain, 12 (34.28%) substitutions (rs17851921, rs565858069, rs749006959, rs772834879, rs1317292772, rs556645753, rs867319917, rs748013931, rs780697086, rs1397132797, rs1438362414, rs1379742188) are sited in the α2 domain and 11 (31.42%) substitutions (rs144577485, rs1472538844, rs770027530, rs1200732770, rs1430565057, rs142596947, rs750238738, rs781774818, rs760500349, rs145097667, rs765275727) are located in the α3 domain. Eight missense mutations in the α1 domain with positions M29K, R30S, Y51C, D53N/Y, D54V, Q96P, L102P and L105Q/P among all membrane-bound HLA-G isoforms and with positions M29K, F32C, Y51C, D53N/Y, D54V, Q96P, L102P, L105P between all soluble HLA-G isoforms were predicted as common deleterious missense mutations. Evidence indicates that all three domains of the heavy chain of HLA-G molecule are involved in inhibiting immune response through interactions with other molecules, for instance, the α1 domain is an important KIR2DL4 recognition site and the LILRB1, LILRB2 and CD8 molecules interact with the α 3 domain. Nucleotide variations in these domains may affect the function of the HLA-G molecule. For example, the mutations around domain α1 and α2 affect peptide loading, peptide diversity, and T-cell recognition [10, 15, 42]. In this study, the selected variations were further investigated by other servers. For the rational prioritization of the selected most deleterious SNPs for further studies, an analysis of the evolutionary conservation of selected missense mutations was performed by ConSurf. The amino acids at the conserved regions of protein across species are biologically and functionally very important and SNPs that alter these amino acids may lead to structural and functional changes in the protein [29, 31]. We have shown that the selected deleterious SNPs in HLA-G1, HLA-G5, the membrane isoforms and the soluble isoforms were mostly in conserved positions and were functional and structural residues, which indicate these SNPs can be deleterious. The MutPred2 web-server predicted the possible molecular mechanisms that result from selected deleterious missense SNPs. The majority of the selected deleterious SNPs were predicted as ‘pathogenic’ (a g score greater than 0.5) and they are depicted as actionable, confident, and very confident hypotheses based on the g score and p score. The most predicted effects of very confident hypotheses in HLA-G1 and HLA-G5 were altered transmembrane protein and altered ordered interface. There was not any common predicted effect as very confident hypothesis among all of the membrane-bound HLA-G isoforms. The common predicted effect as very confident hypothesis among all of the soluble HLA-G isoforms was altered ordered interface resulting from F32C substitution. HOPE investigated the structural effects of the selected deleterious missense SNPs in HLA-G isoforms. The results revealed that nsSNPs are located in each of the three domains (α1, α2 and α3) of HLA-G. Since the function of any protein depends straightly on its tertiary structure, the modification in the structure of the domain can disrupt its function. The native protein 3D structures are very necessary for better understanding of the functional and structural effect of mutations. In the present study, because the 3D structure of all HLA-G isoforms is not available yet in the PDB database [43]; 3D structural models of native HLA-G isoforms were constructed by I-Tasser server and were visualized using Chimera software. Further, Chimera software was used to visualize the structural consequences of amino acid changes. The HLA-G promoter region is special in the class of the HLA genes. The 5′ UTR and 3′ UTR regions of HLA-G gene display many polymorphic sites that may affect HLA-G expression and therefore tissue distribution in healthy and pathological conditions [10]. UTRscan analyzed the 5′ and 3′ UTR SNPs of the HLA-G gene. Two SNPs in the 5′ UTR were determined to create the functional patterns. The rs182801644 was related to creation of the functional pattern of uORF and rs771111444 was related to creation of the functional patterns of uORF and IRES in 5′UTR. The creation of uORF due to SNPs can deregulate the downstream original ORF expression and therefore be the cause of pathological conditions [44]. Furthermore, the presence of new IRES due to SNPs affects the regulation of mRNA translation [45]. To better understand the consequences of these UTR SNPs, investigation at the functional levels is needed. PolymiRTS predicted that 5 functional SNPs are present in the HLA-G mRNA 3′ UTR, two of which them disrupt 9 target sites of the miRNA and all five SNPs create 15 new miRNA target sites. MicroRNAs play an important role in translation regulation. Thus disrupting or creating the microRNA target sites influences the regulation of gene and may lead to pathological conditions [10]. STRING analysis is a global way to understand protein-protein interactions. Any change in protein structure and function can affect its ability to interact with other molecules. STRING map showed the interaction of HLA-G with 10 different proteins. Some experimental studies confirm the interaction of HLA-G with these predicted proteins [9, 10, 14, 15, 17, 18, 46–53]. Lastly, the outcomes obtained from Kaplan Meier bioinformatics analyses indicated that the HLA-G gene deregulation affected the overall survival rate of patients with ovarian, lung and gastric cancer and had the prognostic significance. However, there are some controversies in relation to published original studies as presented in Table 7.
Table 7

The correlation of deregulation of HLA-G gene and overall survival rate of the cancer patients as reported by previous studies in comparison with our results

Type of cancerResults found in our studyResults found in some previous studiesControversyReference
Breast CancerHLA-G gene in breast cancer had a hazard ratio (HR) = 0.85 (95%CI, 0.69 − 1.06) and log-rank p-value= 0.15; therefore the result was not statistically significant.In the whole cohort of patients, HLA-G showed no statistically significant difference in outcome between expression versus no expression for overall survival (P = 0.74).No[54]
Breast cancer patients with positive HLA-G expression had a lower survival rate in comparison with negative HLA-G expression patients (P = 0.028).Yes[55]
HLA-G upregulated expression was confirmed in breast cancer. HLA-G was associated with both improved relapse-free survival and overall survival.Yes[56]
The expression of HLA-G was significantly higher in invasive ductal breast cancer patients with shorter survival time (P = 0.03).Yes[57]
Breast cancer patients with HLA-G-positive tumor cells had shorter disease-free survival, though not significantly (P = 0.14).No[58]
Ovarian CancerThe relation between the high expression of HLA-G gene and more survival rate was statistically significant (less number of patients at risk) (HR = 0.81 (95%CI, 0.71 − 0.93) and log-rank p-value= 0.0023)Ovarian cancer patients with HLA-G expression >17% showed poor survival than those with HLA-G expression <17% group with a P value of 0.04.Yes[59]
The HLA-G5/-G6 was expressed in 79.7% (94/118) of ovarian cancer lesions. lesion HLA-G5/-G6 expression was unrelated to clinicoparameters including histological type, patient age, FIGO stages and patient survival.Yes[60]
Survival was prolonged when ovarian tumors expressed HLA-G (P = 0.008) and HLA-G was an independent predictor for better survival (P = 0.011). Furthermore, longer progression-free survival (P = 0.036) and response to chemotherapy (P = 0.014) was correlated with expression of HLA-G.No[61]
The Kaplan-Meier analysis demonstrated no significant association between survival and HLA-G expression status in ovarian carcinoma patients.Yes[62]
Lung cancerHLA-G gene in lung cancer had a HR = 1.21 (95%CI, 1.07 − 1.38) and log-rank p-value= 0.0029; therefore the result was statistically significant (the relation between the low expression of HLA-G gene and more survival rate)The Higher sHLA-G level above the median (≥50 U/ml) in patients is associated with statistically significant shorter survival time in comparison to the lower sHLA-G expression (P < 0.0001).No[63]
sHLA-G expression was observed in 34.0% (45/131) of the NSCLC lesions, which was unrelated to patient survival.Yes[64]
Plasma sHLA-G above the median level (≥median, 32.0 U/ml) in NSCLC patients is strongly associated with shorter survival time (P = 0.044).No[65]
Patients with sHLA-G <40 ng/ml (p = 0.073) showed prolonged overall survival.No[66]
Patients with HLA-G positive tumors had a significantly shorter survival time than those with tumors that were HLA-G negative (P = 0.001).No[67]
Survival analyses were shown that the HLA class I loss was correlated to recurrence-free survival time.No[68]
Gastric carcinoma patients with HLA-G positive tumors had a significantly shorter survival time than those patients with tumors that were HLA-G negative (P = .001).No[69]
Gastric cancerHLA-G gene in gastric cancer had a HR = 1.3 (95%CI, 1.09 − 1.54) and log-rank p-value= 0.0027; therefore the result was statistically significant (the relation between the low expression of HLA-G gene and more survival rate.Kaplan-Meier analyses indicated that patients with HLA-G-positive gastric cancer had a poorer prognosis than those with HLA-G negative gastric cancer (P = 0.008).No[70]
The overall median survival was worse in gastric adenocarcinoma patients with HLA-G-positive tumors compared to those with HLA-G-negative tumors (p < 0.0001).No[71]
Kaplan–Meier analysis showed that gastric cancer patients with HLA-G expression had a significantly poorer overall survival than those without HLA-G expression at 5 years after the operation.No[72]
The 5-year survival rate of gastric cancer patients in the HLA-G-positive group was significantly higher than the HLA-G-negative group.Yes[73]
The correlation of deregulation of HLA-G gene and overall survival rate of the cancer patients as reported by previous studies in comparison with our results Altogether, the findings of the analyses displayed probable alterations that may disrupt the structure and function of HLA-G protein. The deleterious missense mutations determined in this inspection may have functional effects in HLA-G deregulation and may lead to pathological conditions like cancer.

Conclusion

The implementation of in silico SNP prioritization methods suggests a remarkable framework for the recognition of functional SNPs by reducing the number of alterations that should be screened in molecular studies. Further validation of the results obtained from the current study is recommended using clinical and/or laboratory investigations.

Methods

Extracting SNPs and protein sequences of HLA-G isoforms from the databases

In December 2018, NCBI dbSNP database [74] (https://www.ncbi.nlm.nih.gov/snp/) was used to collect information of missense nsSNPs and SNPs in the UTRs of human HLA-G gene. The amino acid sequences of seven human HLA-G isoforms (UniProt ID: P17693–1, P17693–2, P17693–3, P17693–4, P17693–5, P17693–6 and P17693–7) were obtained from the UniProt database [75] (https://www.uniprot.org/uniprot/P17693) in FASTA format for the next stages in this study.

Predicting the most deleterious missense nsSNPs

We used eight online bioinformatics tools (SIFT, PROVEAN, PolyPhen-2, I-Mutant 3.0, SNPs&GO, PhD-SNP, SNAP2 and MUpro) to increase the precision of prediction of the most deleterious missense nsSNPs. Missense nsSNPs found to be most deleterious using these eight tools were further analyzed by several other programs in the next stages. Sorting intolerant from tolerant (SIFT) [76] (available at https://sift.bii.a-star.edu.sg/) tool expresses whether a missense mutation at special position effects on the structure and function of protein molecule based on sequence homology and the physiochemical characteristics of substituted amino acid. SIFT computes the normalized probability score (SIFT score) for each substitution. The SIFT score has a range of 0.0 to 1.0. The amino acid substitution with a score greater than or equal to 0.05 (≥0.05) is predicted as tolerated (polymorphism) whereas a score less than 0.05 (< 0.05) is predicted to be damaging (related to disease). Protein Variation Effect Analyzer (PROVEAN) (available at provean.jcvi.org/) is another sequence homology-based predictor. It is used to assess the possible functional influence of nonsynonymous (single or multiple nonsynonymous) and in-frame indel (insertions and deletions) variations on a protein. It predicts the variation as deleterious or natural, if the functional impact score is less than or equal to − 2.5 (≤ − 2.5) it is estimated deleterious; score above − 2.5 (> − 2.5) is estimated neutral [77]. Polymorphism Phenotyping version2 (PolyPhen-2) (available at genetics.bwh.harvard.edu/pph2/) is a combination of protein 3D structure and multiple homolog sequence alignment-based method. It predicts the potential consequences of single amino acid substitution on both protein function and structure. The prediction is provided as benign, possibly damaging and probably damaging according to the position-specific independent count (PSIC) scores difference between 2 variants (wild amino acid (aa1) and mutant amino acid (aa2)). PSIC score has a range of 0.0 to 1.0. The amino acid substitution with a score of 0.0 to 0.49 is predicted as benign, with a score of 0.5 to 0.89 is predicted as damaging and with a score of 0.9 to 1 is predicted as probably damaging [78, 79]. I-Mutant 3.0 (available at gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi) is a web server including Support Vector Machine (SVM) based predictors suite. It predicts the effect of a particular amino acid substitution on the stability of protein under default parameters (at room temperature and neutral pH) starting from the protein sequence, mutational position and the corresponding novel residue. The protein stability change can disturb both protein function and structure [80]. I-Mutant 3.0 predicts the protein stability change in the unit of change in Gibbs free energy (ΔΔG or DDG). The DDG value (kcal/mol) is computed from the unfolding Gibbs free energy value of the mutant protein minus the unfolding Gibbs free energy value of native protein. The prediction is classified into three categories: neutral stability of the mutated protein (− 0.5 ≤ DDG ≤ 0.5 kcal/mol), a large decrease of stability of the mutated protein (≤ − 0.5 kcal/mol) and large increase of stability of the mutated protein (> 0.5 kcal/mol) [81]. Single Nucleotide Polymorphism Database (SNPs) & Gene Ontology (GO) (SNPs&GO) (available at snps.biofold.org/snps-and-go/snps-and-go.html) is a GO-integrated and single SVM-based predictor. It predicts whether an amino acid substitution is disease-associated or not using functional GO terms, 3D protein structure and protein sequence evolutionary information. The amino acid substitution is associated with the disease if the probability score is greater than 0.5 (> 0.5) [82]. Predictor of human Deleterious Single Nucleotide Polymorphisms (PhD-SNP) (available at snps.biofold.org/phd-snp/phd-snp.html) is a support vector machine (SVM) based server. This server determines whether a certain amino acid substitution is related to disease or neutral by protein sequence information, protein structure, conservation and solvent accessibility. The output is a probability index with a score of 0.0 to 1.0, when the score is higher than 0.5, the substituted amino acid is pathogenic [77, 81]. Screening for Non-Acceptable Polymorphisms (SNAP2) (available at https://rostlab.org/services/snap/) is a neural network-based tool that classifies amino acid substitutions into effective and neutral on protein function by taking a diversity of sequences and different characteristics into consideration. SNAP2 provides a list of all possible substitutions within the protein sequence with a score, functional effect (neutral or effect) and expected accuracy for any replacement. The expected accuracy shows the level of confidence for each prediction. The results are also displayed in heat map representation [83-85]. MUpro (available at mupro.proteomics.ics.uci.edu/) uses the Support Vector Machine (SVM) to assess the variation in the stability of the protein consequent to amino acid substitutions. The output is a confidence score among − 1 and 1. A confidence score < 0 indicates the substituted amino acid decreases the stability and a score > 0 indicates the substituted amino acid increases the stability [86].

Selecting the most deleterious missense nsSNPs for further study

Missense nsSNPs that were predicted deleterious by all eight servers were selected for further study. The precision of prediction increases to a greater extent by incorporating the scores of all eight servers.

Predicting the evolutionary conservation of the most deleterious missense nsSNPs by ConSurf

ConSurf web-server (available at consurf.tau.ac.il/) estimates the evolutionary conservation of each residue in a protein utilizing a Bayesian algorithm which often provides the possibility of identifying key structural and functional residues. The extent of conservation of residue at a specific position in a protein was computed by phylogenetic information of close homologous sequences. The measure of residue conservation is shown by the conservation score along with the color scheme as follows: 1–4 variable, 5–6 average, and 7–9 conserved. The ConSurf web - server also determines the buried (b) or exposed (e) residues of protein according to the HHPred 3D model. A residue is predicted functional residue if it is very conserved and exposed and a structural residue is predicted if it is very conserved and buried [87, 88].

Studying the most deleterious missense nsSNPs by MutPred2 server

MutPred is a bioinformatics web server (available at mutpred.mutdb.org/). It predicts whether a particular missense mutation in a human protein is disease-associated or not, along with its structural and functional effects (effective molecular characteristics). The result of MutPred consists of two important scores (general (g) score and top 5 molecular properties score (p)), affected PROSITE and ELM motifs and changes of different structural and functional properties. The g score (MutPred score) expresses the probability that the missense mutation is disease-related. The g score is between 0.0 and 1.0. The g score > 0.5 means the substituted amino acid is probably pathogenic and if g score is > 0.75, the mutation is more assurance pathogenic. The top 5 molecular properties score (p) is a P-value that indicates whether predicted changes of functional and structural characteristics of the protein due to the particular missense mutation are statistically significant. The predicted change is confident if p-value is less than 0.05 (< 0.05) and is very confident if p-value is less than 0.01 (< 0.01). The given coalescences of high levels of g scores and low levels of p scores are called hypotheses. Any prediction according to the scores is put in one of these 3 groups: very confident hypotheses (g > 0.75 and p < 0.01), confident hypotheses (g > 0.75 and p < 0.05) and actionable hypotheses (g > 0.5 and p < 0.05 [89, 90].

Analyzing the effects of the most deleterious missense SNPs on the 3D structure of the HLA-G isoforms by HOPE project

Project Have yOur Protein Explained (HOPE) is a web server (available at www.cmbi.ru.nl/hope/) that was used for the investigation of the impacts of a missense mutation on the native protein structure. HOPE will roll up and incorporate available information from UniProtKB, protein’s 3D structure and DAS-servers. As regards the exact 3D-structures of some HLA-G protein isoforms are unknown; HOPE built the model of them based on homologous structures. HOPE processes the gathered data and produces a report, including schematic structures of the wild-type and the mutant amino acids, differences in the properties of wild-type and mutant amino acids and the impacts of a substituted amino acid on the protein structure along with figures and animations [91].

Simulating the three-dimensional (3D) structure of HLA-G isoforms by I-TASSER

To investigate the impact of missense mutations on the structure protein, simulating the protein structure is essential. Iterative Threading ASSEmbly Refinement (I-TASSER) (available at https://zhanglab.ccmb.med.umich.edu/I-TASSER/) is a united program to create the complete protein model and predict protein function based upon the sequence-to-structure-to-function paradigm. Therefore, we used I-TASSER to achieve the high-quality three-dimensional (3D) models of HLA-G protein isoforms by submitting their amino acid sequences in FASTA format. The models are created by excising continuous fragments from threading alignments and iterative structural assembly simulations and their functions are derived by matching the 3D models with other known proteins structurally. I-TASSER produces a report, including predicted secondary and tertiary structures, functional annotations and Gene Ontology terms. The accuracy of predicted models is reflected in the form of the confidence score (C-score). The C-score range is between − 5 and 2. The more values of C-score display higher confidence for the predicted model. Five three-dimensional (3D) models were created for each HLA-G protein isoform and the best model was selected according to C-score values [92, 93].

Analyzing changes in HLA-G isoforms 3D structure due to amino acid substitution by UCSF chimera

UCSF Chimera is a program for molecular visualization, molecular structures study and related data (available at https://www.cgl.ucsf.edu/chimera/). The structures of the HLA-G isoforms predicted with I-TASSER in PDB formatted structure files were visualized by Chimera. Chimera was also used to achieve the 3D mutated models of the wild models of HLA-G isoforms with the most deleterious missense SNPs predicted in this project. The outputs are graphical models [94].

Founding functional SNPs in UTR by the UTRscan (available at http://itbtools.ba.itb.cnr.it/utrscan)

This tool is for scrutinizing UTR functional elements throughout user-submitted sequence data for any of the patterns collected in the UTRsite and UTR databases. UTRsite is a pile of functional sequence patterns found in 5ˊ and 3ˊ UTR sequences. If two or three sequences of each particular UTR SNP are concluded to have various functional patterns, specific UTR SNP is determined to have functional significance [95].

PolymiRTS database 3.0 (polymorphism in microRNAs and their target sites) (available at compbio.uthsc.edu/miRSNP/)

PolymiRTS is a database to analyze the 3’UTR regions of mRNAs in Homo sapiens and mouse for SNPs and INDELs variations in microRNA target sites. The polymorphisms of microRNA target sites may alter miRNA-mRNA interactions and accordingly gene expressions. The variations are divided into four categories according to their effect: “D” (the derived allele disrupts a conserved miRNA site), “N” (the derived allele disrupts a nonconserved miRNA site), “C” (the derived allele creates a new miRNA site) and “O” (the ancestral allele cannot be determined). “D” and “C” groups are most likely to have functional effects because they may lead to loss of normal repression and abnormal gene repression control, respectively. We submitted the HLA-G gene symbol to the program and the analysis was performed automatically on the transcript variant 2 (transcript ID: NM_002127) and functional SNPs were determined [96].

Predicting protein-protein interactions by search tool for the retrieval of interacting proteins (STRING) (available at http://string-db.org/)

STRING is a database of protein-protein interactions. The database contains data from empirical evidences, computational prediction tools and collections of universal text. This provided availability to both experimental and theoretical interaction data of HLA-G [97, 98].

Kaplan-Meier plotter analysis (KM plotter) (available at https://kmplot.com/analysis/)

The Kaplan Meier plotter is a tool to evaluate the impact of 54,000 genes on survival in 21 types of cancer using the microarray gene expression data. A meta-analysis based detection and validation of biomarkers for cancer patients is the primary aim of Kaplan-Meier. The ʽ211528_x_at̕ probe was used for HLA-G gene. Here, the overall survival (OS) is the period of time from the start of a change in specific gene expression (decrease or increase expression) for a cancer, that patients diagnosed with it are still alive. The expression in patients for each cancer was graded and allocated high and low expression groups according to the median level. The overall survival analysis was performed on 1402 cases of breast cancer, 1656 cases of ovarian cancer, 1926 cases of lung cancer and 876 cases of gastric cancer. These two groups of patients for cancer listed above were compared and the survival was evaluated. The p-values less than 0.05 were regarded as statistically significant [99-102]. Additional file 1: Table 5. Analysis of structural effects of deleterious SNPs on HLA-G1 by Project HOPE Table 6. Five models predicted for each human HLA-G isoform by I-TASSER Table 7. Structural representations of native isoforms of HLA-G predicted with I-TASSE and visualized with UCSF Chimera Table 8. Graphical representations of amino acid changes due to the most deleterious SNPs in isoform 1
  90 in total

Review 1.  HLA-G remains a mystery.

Authors:  D Bainbridge; S Ellis; P Le Bouteiller; I Sargent
Journal:  Trends Immunol       Date:  2001-10       Impact factor: 16.687

2.  Implementing an online tool for genome-wide validation of survival-associated biomarkers in ovarian-cancer using microarray data from 1287 patients.

Authors:  Balázs Gyorffy; András Lánczky; Zoltán Szállási
Journal:  Endocr Relat Cancer       Date:  2012-04-10       Impact factor: 5.678

3.  Prediction of protein stability changes for single-site mutations using support vector machines.

Authors:  Jianlin Cheng; Arlo Randall; Pierre Baldi
Journal:  Proteins       Date:  2006-03-01

4.  The BioPlex Network: A Systematic Exploration of the Human Interactome.

Authors:  Edward L Huttlin; Lily Ting; Raphael J Bruckner; Fana Gebreab; Melanie P Gygi; John Szpyt; Stanley Tam; Gabriela Zarraga; Greg Colby; Kurt Baltier; Rui Dong; Virginia Guarani; Laura Pontano Vaites; Alban Ordureau; Ramin Rad; Brian K Erickson; Martin Wühr; Joel Chick; Bo Zhai; Deepak Kolippakkam; Julian Mintseris; Robert A Obar; Tim Harris; Spyros Artavanis-Tsakonas; Mathew E Sowa; Pietro De Camilli; Joao A Paulo; J Wade Harper; Steven P Gygi
Journal:  Cell       Date:  2015-07-16       Impact factor: 41.582

5.  In silico analysis of structural and functional consequences in p16INK4A by deleterious nsSNPs associated CDKN2A gene in malignant melanoma.

Authors:  R Rajasekaran; C George Priya Doss; C Sudandiradoss; K Ramanathan; Rao Sethumadhavan
Journal:  Biochimie       Date:  2008-06-05       Impact factor: 4.079

6.  Matrix metalloproteinase-2 (MMP-2) generates soluble HLA-G1 by cell surface proteolytic shedding.

Authors:  Roberta Rizzo; Alessandro Trentini; Daria Bortolotti; Maria C Manfrinato; Antonella Rotola; Massimiliano Castellazzi; Loredana Melchiorri; Dario Di Luca; Franco Dallocchio; Enrico Fainardi; Tiziana Bellini
Journal:  Mol Cell Biochem       Date:  2013-06-05       Impact factor: 3.396

Review 7.  Implications of the polymorphism of HLA-G on its function, regulation, evolution and disease association.

Authors:  Eduardo A Donadi; Erick C Castelli; Antonio Arnaiz-Villena; Michel Roger; Diego Rey; Philippe Moreau
Journal:  Cell Mol Life Sci       Date:  2010-11-24       Impact factor: 9.261

8.  HLA-G expression is an independent predictor for improved survival in high grade ovarian carcinomas.

Authors:  M J Rutten; F Dijk; C D Savci-Heijink; M R Buist; G G Kenter; M J van de Vijver; E S Jordanova
Journal:  J Immunol Res       Date:  2014-05-27       Impact factor: 4.818

9.  HeatMapViewer: interactive display of 2D data in biology.

Authors:  Guy Yachdav; Maximilian Hecht; Metsada Pasmanik-Chor; Adva Yeheskel; Burkhard Rost
Journal:  F1000Res       Date:  2014-02-13

10.  ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules.

Authors:  Haim Ashkenazy; Shiran Abadi; Eric Martz; Ofer Chay; Itay Mayrose; Tal Pupko; Nir Ben-Tal
Journal:  Nucleic Acids Res       Date:  2016-05-10       Impact factor: 16.971

View more
  2 in total

Review 1.  NELL-1 in Genome-Wide Association Studies across Human Diseases.

Authors:  Xu Cheng; Jiayu Shi; Zhonglin Jia; Pin Ha; Chia Soo; Kang Ting; Aaron W James; Bing Shi; Xinli Zhang
Journal:  Am J Pathol       Date:  2021-12-07       Impact factor: 5.770

2.  Computational Analysis of Deleterious SNPs in NRAS to Assess Their Potential Correlation With Carcinogenesis.

Authors:  Mohammed Y Behairy; Mohamed A Soltan; Mohamed S Adam; Ahmed M Refaat; Ehab M Ezz; Sarah Albogami; Eman Fayad; Fayez Althobaiti; Ahmed M Gouda; Ashraf E Sileem; Mahmoud A Elfaky; Khaled M Darwish; Muhammad Alaa Eldeen
Journal:  Front Genet       Date:  2022-08-16       Impact factor: 4.772

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.