Literature DB >> 27478803

A Bioinformatics Approach to Prioritize Single Nucleotide Polymorphisms in TLRs Signaling Pathway Genes.

Behnam Alipoor¹, Hamid Ghaedi², Mir Davood Omrani², Milad Bastami³, Reza Meshkani¹, Taghi Golmohammadi¹.

Abstract

It has been suggested that single nucleotide polymorphisms (SNPs) in genes involved in Toll-like receptors (TLRs) pathway may exhibit broad effects on function of this network and might contribute to a range of human diseases. However, the extent to which these variations affect TLR signaling is not well understood. In this study, we adopted a bioinformatics approach to predict the consequences of SNPs in TLRs network. The consequences of non-synonymous coding SNPs (nsSNPs) were predicted by SIFT, PolyPhen, PANTHER, SNPs&GO, I-Mutant, ConSurf and NetSurf tools. Structural visualization of wild type and mutant protein was performed using the project HOPE and Swiss PDB viewer. The influence of 5'-UTR and 3'- UTR SNPs were analyzed by appropriate computational approaches. Nineteen nsSNPs in TLRs pathway genes were found to have deleterious consequences as predicted by the combination of different algorithms. Moreover, our results suggested that SNPs located at UTRs of TLRs pathway genes may potentially influence binding of transcription factors or microRNAs. By applying a pathway-based bioinformatics analysis of genetic variations, we provided a prioritized list of potentially deleterious variants. These findings may facilitate the selection of proper variants for future functional and/or association studies.

Entities: Chemical Disease Gene Mutation Species

Keywords: Bioinformatics; in-silico analysis; single nucleotide polymorphisms; toll- like receptors

Year: 2016 PMID： 27478803 PMCID： PMC4947211

Source DB: PubMed Journal: Int J Mol Cell Med ISSN： 2251-9637

Toll-like receptors (TLRs) are a major class of the pattern- recognition receptors of the innate immune system involved in the identification of pathogen-associated molecular patterns (PAMPs) from infectious pathogens (1-2). These trans-membrane proteins engage with PAMPs and trigger activation of intracellular signaling cascades, leading to the induction of genes that regulate the expression of pro- inflammatory cytokines and chemokines (3-4). Due to the critical roles of TLRs signaling network in the initiation of innate immune responses, malfunction of genes involved in this pathway may predispose individuals to numerous human diseases ranging from infectious and chronic inflammatory to cancers and autoimmune diseases (5-6). Accumulating evidence now suggests that genetic variations in TLRs pathway genes may exhibit deleterious effects on gene function, leading to the dysregulation of this signaling pathways (7-8). Single nucleotide polymorphisms (SNPs) are the shortest and the most frequent variations in the human genome. Among these, the functional consequences of untranslated regions (UTRs) and non-synonymous (nsSNPs) SNPs are of special interest, as they can either modulate gene expression or influence protein structure and function (9-10). Although the contribution of SNPs in TLR signaling to human pathological states was addressed by several studies, a comprehensive and prioritized list of SNPs potentially affecting the function and regulation of this pathway is still lacking. Therefore, this study aimed to systematically identify the UTR-SNPs and nsSNPs in genes involved in TLRs signaling network by employing a bioinformatics approach and predicting their deleterious functional and structural consequences.

Materials and methods

Retrieving SNPs in TLRs pathway genes Data on the human TLRs pathway genes were collected from national center for biological information (http://www.ncbi.nlm.nih.gov/) (acce-ssed May 2015) (Table 1). Genes implicated in TLRs pathway and their functional connections were retrieved by querying Kyoto encyclopedia of genes and genomes (KEGG) (http:// www. genome.jp/ kegg/) (accessed May 2015) (Figure 1). SNPs located in TLRs network genes were retrieved from dbSNP (http:// www. ncbi. nlm. nih. gov/SNP/) (accessed June 2015). For each SNP, the following information was recorded: SNP ID, genomic coordinate, and variation type. Protein information of TLR network genes was retrieved from UniProt (http: // www. uniprot.org/) (accessed

Table 1

TLR signaling pathway genes list.

	Name	Gene ID	Location	MIM	Number of SNPs
1	TLR1	7096	Chr 4	601194	321
2	TLR2	7097	Chr 4	603028	537
3	TLR3	7098	Chr 4	603029	400
4	TLR4	21898	Chr 9	603030	606
5	TLR5	7100	Chr 1	603031	790
6	TLR6	10333	Chr 4	605403	854
7	TLR7	51284	Chr X	300365	544
8	TLR8	51311	Chr X	300366	270
9	TLR9	54106	Chr 3	605474	509
10	MYD88	4615	Chr 3	602170	123
11	TIRAP	114609	Chr 11	606252	267
12	IRAK1	3654	Chr X	300283	235
13	IRAK4	51135	Chr 12	606883	601
14	TRAF6	7189	Chr 11	602355	579
15	TRAF3	7187	Chr 14	601896	2570
16	TAB1	10454	Chr 22	602615	1989
17	TAB2	23118	Chr 6	605101	3967
18	MAP3K7	6885	Chr 6	602614	1267
19	IKBKG	8517	Chr X	300248	222
20	IKBKB	3551	Chr 8	603258	1376
21	CHUK	1147	Chr 10	600664	750
22	NFKBIA	4792	Chr 14	164008	143
23	NFKB1	4790	Chr 4	164011	2060
24	MAP2K1	5604	Chr 15	176872	2124
25	MAPK1	5594	Chr 22	176948	2335
26	MAP2K3	5606	Chr 17	602315	1329
27	MAP2K7	5609	Chr 19	603014	317
28	MAPK14	1432	Chr 6	600289	1778
29	MAPK8	5599	Chr 10	601158	2450
30	FOS	2353	Chr 14	164810	101
31	TICAM1	148022	Chr 19	607601	438
32	RIPK1	8737	Chr 6	603453	1322
33	IKBKE	9641	Chr 1	605048	696
34	TBK1	29110	Chr 12	604834	895
35	IRF3	3661	Chr 19	603734	199
36	IRF5	3663	Chr 7	607218	284
37	IRF7	3665	Chr 11	605047	173

Fig. 1

Schematic presentation of gene network implicated in TLR signaling pathway. Direction of signal transduction is exhibited by arrows.

June 2015). Predicting UTR-SNPs consequences To evaluate the conservation score, we used genomic evolutionary rate profiling (GERP) track implemented in UCSC (https://genome.ucsc.edu/) to calculate the GERP++conservation score for each SNPs. Genomic Evolutionary Rate Profiling (GERP) is a method for producing position-specific estimates of evolutionary constraint using maximum likelihood evolutionary rate estimation. Constraint intensity at each individual alignment position is quantified in terms of a "rejected substitutions" (RS) score, defined as the number of substitutions expected under neutrality minus the number of substitutions "observed" at the position. Positive scores represent a substitution deficit (i.e., fewer substitutions than the average neutral site) and thus indicate that a site may be under the evolutionary constraint. Negative scores indicate that a site is probably evolving neutrally; negative scores should not be interpreted as evidence of accelerated rates of evolution because of too many strong confounders, such as alignment uncertainty or rate variance. The effects of UTR-SNPs on local RNA secondary structure were predicted using mode 1 of RNAsnp program (v 1.1). The software requires RNA sequence and SNP as inputs and uses a window of 400 nucleotides, ±200 nucleotide on either side of the SNP position to obtain subsequences and generate the base-pairing probability matrix for the corresponding wild type and mutant alleles. Then, RNAsnp computes the Euclidian distance (d) and Pearson correlation coefficient (r) for all sequence intervals with a minimum length of 50 that have self-contained base pairs to assess structural difference between the wild type and mutant alleles and reports the interval with the maximum base pairing distance (dmax) or minimum correlation coefficient (rmin) along with the corresponding empirical p-value (11). Here, we used both measures independently and defined structure disruptive UTR-SNPs as those with significant dmax or rmin (significance threshold is p< 0.2 as defined by RNAsnp). RegulomeDB Version 1.1 (12) was used to annotate UTR-SNPs with known and predicted regulatory elements of the genome including the regions of DNase hypersensitivity, binding sites and motifs of transcription factors, chromatin state and the expression of quantitative trait loci. To have further annotations, we identified 3'-UTR SNPs residing in microRNAs target sites. A comprehensive dataset of experimentally supported miRNAs target sites, including CLIP-Seq supported interactions from starBase version 2 (http:// starbase.sysu.edu.cn/) (13) and CLASH verified interactions extracted from PolymiRTS database, were compiled (http://compbio.uthsc.edu/miRSNP/) (14). TLR signaling pathway genes list. Schematic presentation of gene network implicated in TLR signaling pathway. Direction of signal transduction is exhibited by arrows. Analyzing the functional and structural conse-quences of non- synonymous SNPs Phenotypic effects of amino acid substitution on protein function were predicted by Sorting intolerant from tolerant (SIFT) (http://sift.jcvi.org/). In this study, a list of nsSNPs (rsIDs) from NCBI's dbSNP database was submitted as a query sequence to SIFT to predict tolerated and deleterious substitutions for every position of sequence. nsSNPs with SIFT score0.05 were classified as deleterious and those>0.05 were classified as tolerated (15). Polymorphism Phenotyping-2 (PolyPhen-2) (http://genetics.bwh.harvard.edu/ pph2/) predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative conside-rations. Input options for this tool are comprised of protein sequence, database ID/ accession number and details of amino acids substitution. For a given substitution, prediction outcome can be one of possibly damaging, probably damaging, and benign (16). Protein analysis through evolutionary relati-onships (PANTHER) (http:// www.pantherdb. org/) estimates the likelihood of a particular nsSNPs to cause a functional impact on the protein. This tool calculates the substitution position-specific evolutionary conservation (subPSEC) score based on an alignment of evolutionarily related proteins. The subPSEC scores are continuous values from 0 (neutral) to about -10 (most likely to be deleterious). A cutoff of -3 corresponds to a 50% probability that a score is deleterious. From this, the probability that a given variant will cause a deleterious effect on protein function is estimated by Pdeleterious, such that a subPSEC score of -3 corresponds to a Pdeleterious of 0.5 (17). SNPs database and gene ontology (GO) (http://snps.biofold.org/snps-and-go/snps-and-go.html) have been optimized to predict if a given single point protein variation can be classified as disease associated or neutral. A probability > 0.5 indicates that the mutation at the protein is disease-related (18). ConSurf web-server (http://consurf.tau.ac.il/) is a bioinformatics tool for estimating the evolutionary conservation of amino acid positions in a protein molecule based on the phylogenetic relations between homologous sequences. The continuous conservation scores are divided into a discrete scale of nine grades for visualization, from the most variable positions (grade 1) colored turqu-oise, through intermediately conserved positions (grade 5) colored white, to the most conserved positions (grade 9) colored maroon. I-Mutant (http:// folding. uib.es/ i-mutant/ i-mutant 2.0.html) is a neural network based web server for the automatic prediction of protein stability changes upon amino acid substitution. This tool provides the scores for free energy alterations, DDG<0 and DDG> 0 indicate reduction and elevation of the stability, respectively (19). NetSurfp (http: //www. cbs.dtu. dk/services /NetSurfP/) predicts the relative and absolute surface accessibility and secondary structure of residues in amino acid sequences. The reliability of the surface accessibility prediction is stated in the form of a Z-score, which cannot predict secondary structures of proteins (20). Project Have your Protein Explained (ProjectHOPE) (http://www.cmbi.ru.nl/hope/home) has been used to study the insight structural features of native protein and the variant models (21). This web server provides three dimensional structural visualization of mutated proteins, and gives the results by using UniProt and DAS prediction servers.

Results

SNP analysis Mining the dbSNP-NCBI and UniProt databases revealed a total of 35802 SNPs in thirty-seven candidate genes in TLRs pathway (Table 2). Among these, 819 and 2502 were located in 5′-UTR and 3′-UTR respectively, and 2172 were identified as nsSNPs.

Table 2

Summary results of SNPs mining of candidate genes in TLRs signaling pathway

Categories			Number of SNPs
Intragenic	exon	Synonymous	1382
	exon	Non-synonymous	2172
	Intron		28654
	Unknown		273
Intergenic	3′-UTR		2502
Intergenic	5′-UTR		819
Total			35802

Summary results of SNPs mining of candidate genes in TLRs signaling pathway Density plot of GERP++ conservation score (RS score). The figure shows that 5'UTR SNPs have higher (more positive) score than 3'UTR SNPs. Structure disruptive UTR SNPs in TLR genes. SNPs positioned above dashed line are those with dmax p-value< 0.2, and hence, designated to be structure disruptive. Common 3’UTR SNPs resided in miRNA target sites Target miRNA SNP MAF dmax p-value Conservation score of UTR SNPs We computed GERP++scores for SNPs in UTRs, which represent an evolutionary conservation extent based on alignment of 35 mammals to hg19. Generally, 5′-UTR SNPs were found to be more conserved than 3′-UTR SNPs (Figure 2). With a cut off RS score of ≥ 2, a total of 480 constrained SNPs (including 85 5′-UTR-SNPs and 395 3′-UTR-SNPs) were identified. Moreover, 1200 SNPs (including 141 5′-UTR-SNPs and 1059 3′-UTR-SNPs) were classified as neutrally evolving, which represents a RS score of ≤0. The most conserved SNPs were found in 3′-UTR of TAB2 (rs138687718, RS score= 6.17), MAPK14 (rs377447706, RS score= 6.17) and FOS (rs45480193, RS score= 6.16).

Fig. 2

Density plot of GERP++ conservation score (RS score). The figure shows that 5'UTR SNPs have higher (more positive) score than 3'UTR SNPs.

Influence of UTR-SNPs on RNA secondary structures Our analysis showed that 313 UTR-SNPs were structure disruptive as defined by dmax p- value P<0.2 (Figure 3). Considering both dmax and rmin, there were 232 unique structure disruptive UTR-SNPs. The top five genes enriched for structure disruptive SNPs were MAPK14 (n= 23), TLR7 (n= 12), TLR4 (n= 10), MAPK1 (n= 10), and TRAF3 (n= 8).

Fig. 3

Structure disruptive UTR SNPs in TLR genes. SNPs positioned above dashed line are those with dmax p-value< 0.2, and hence, designated to be structure disruptive.

Annotation of SNPs with regulatory elements Disease associated variants are enriched in regulatory elements of the genome. Using RegulomeDB, we annotated UTR-SNPs within regulatory elements. 11 UTR-SNPs were associated with transcription factor binding sites (i.e eQTL). These SNPs were found within 3’UTR of TAB1 (rs1010169, rs1010170, rs5757650, rs5750822), RIPK1 (rs9503383, rs9405606), IRF5 (rs752637, rs3807306), IRAK4 (rs4251425) and TLR9 (rs187084) genes. Identification of SNPs residing in miRNA target sites Intersecting 3′-UTR-SNPs with the experimentally validated miRNAs target site datasets, we found 314 SNPs resided in microRNAs target sites. Since miRNA target sites are under selective pressure, we refined SNPs in miRNA target sites by minor allele frequency (MAF) threshold of 0.01 (Table 3).

Table 3

Common 3’UTR SNPs resided in miRNA target sites

NFKBIA	hsa-miR-208a-3p	rs696	0.46	0.07
MYD88	hsa-miR-520f-3p	rs7744	0.14	0.86
TAB2	hsa-miR-4500	rs7896	0.20	0.27
MAPK14	hsa-miR-4306	rs8510	0.18	0.45
MAPK1	hsa-miR-210-3p	rs9340	0.33	0.21
MAPK1	hsa-miR-186-5p	rs13058	0.04	0.01
MAP3K7	hsa-miR-212-3p	rs2131906	0.04	0.38
MAPK14	hsa-miR-381-3p	rs3804451	0.13	0.35
IRAK4	hsa-miR-340-5p	rs4251562	0.04	0.90
MAP3K7	hsa-miR-212-3p	rs9451441	0.01	0.43
TAB2	hsa-miR-33a-5p	rs35859918	0.01	0.47
MAPK1	hsa-miR-217	rs41282607	0.01	0.08
TAB2	hsa-miR-539-5p	rs41288431	0.01	0.82
MAPK1	hsa-miR-488-3p	rs61757976	0.01	0.76
TRAF3	hsa-miR-4500	rs72704737	0.29	0.12

Target miRNA SNP MAF dmax p-value

List of nsSNPs that predicted to be deleterious by both PolyPhen-2 and SIFT tools Abbreviations: P.D; probablydamaging Distribution of SIFT and PolyPhen score of SNPs in coding region. Horizontal and vertical dashed red line correspond to the thresholds for predicting deleterious variants by PolyPhen and SIFT, respectively. Prediction of tolerated and deleterious non-synonymous SNPs by SIFT SIFT analysis predicted that a total of 785 nsSNPs were damaging (score 0.05) and 1322 nsSNPs had tolerated effects on the candidate genes involved in TLR pathway network (score> 0.05) (Figure 4).

Fig. 4

Distribution of SIFT and PolyPhen score of SNPs in coding region. Horizontal and vertical dashed red line correspond to the thresholds for predicting deleterious variants by PolyPhen and SIFT, respectively.

Prediction of damaging non-synonymous SNPs by PolyPhen-2 According to our Polyphen-2 results, 610 nsSNPs were predicted “probably damaging”, 353 nsSNPs were predicted “possibly damaging” and 1068 were classified as benign (Figure 4). To increase the accuracy of predictions, results of SIFT and PolyPhen-2 were joined and SNPs with PolyPhen score> 0.95 and SIFT< 0.05 were selected. Accordingly, 29 nsSNPs passed both criteria and were classified as deleterious/damaging (Table 4).

Table 4

List of nsSNPs that predicted to be deleterious by both PolyPhen-2 and SIFT tools

	Gene Symbol	SNP	Allele	AA substitution	PolyPhen Score	PolyPhenPerediction	SIFT Score	SIFT prediction
1	CHUK	rs56948661	G>A	P623L	1	P.D	0.01	Damaging
2	CHUK	rs61732515	C>G	Q277H	0.999	P.D	0.00	Damaging
3	CHUK	rs112432667	T>C	E492G	0.954	P.D	0.00	Damaging
4	FOS	rs74685695	T>G	V77G	0.999	P.D	0.01	Damaging
5	IRF5	rs112815033	T>C	L450P	1	P.D	0.01	Damaging
6	IRAK4	rs55944915	G>A	R391H	0.999	P.D	0.01	Damaging
7	IRAK4	rs114820168	C>T	R391C	1	P.D	0.00	Damaging
8	MAP3K7	rs77759048	A>T	W55R	1	P.D	0.00	Damaging
9	TBK1	rs34774243	A>G	K291E	0.997	P.D	0.00	Damaging
10	TBK1	rs55824172	C>T	S151F	0.997	P.D	0.00	Damaging
11	TIRAP	rs74937157	T>C	C134R	1	P.D	0.00	Damaging
12	TLR1	rs5743621	G>A	P733L	0.995	P.D	0.00	Damaging
13	TLR1	rs41311402	A>G	L697S	1	P.D	0.00	Damaging
14	TLR1	rs56205407	A>G	I679T	0.999	P.D	0.00	Damaging
15	TLR1	rs117033348	A>G	L144P	1	P.D	0.04	Damaging
16	TLR2	rs5743706	T>A	Y715N	1	P.D	0.01	Damaging
17	TLR2	rs56303479	T>C	L81P	1	P.D	0.00	Damaging
18	TLR2	rs121917864	C>T	R677W	1	P.D	0.00	Damaging
19	TLR3	rs5743316	A>T	N284I	1	P.D	0.00	Damaging
20	TLR3	rs112666655	T>C	L545P	1	P.D	0.00	Damaging
21	TLR3	rs111488413	C>A	P880Q	1	P.D	0.00	Damaging
22	TLR4	rs77214890	G>T	D181Y	1	P.D	0.00	Damaging
23	TLR4	rs80197996	G>T	L470F	1	P.D	0.03	Damaging
24	TLR4	rs55905951	C>G	A676G	1	P.D	0.00	Damaging
25	TLR4	rs55786277	C>T	R804W	0.999	P.D	0.01	Damaging
26	TLR5	rs5744176	T>C	D694G	1	P.D	0.01	Damaging
27	TLR5	rs78098893	T>C	R752G	0.997	P.D	0.01	Damaging
28	TLR6	rs13102250	A>C	L105W	1	P.D	0.01	Damaging
29	TLR9	rs55881257	G>A	R962C	1	P.D	0.01	Damaging

Abbreviations: P.D; probablydamaging

Prediction of functional impact of non-synonymous SNPs on protein by PANTHER and SNPs & GO. According to the PANTHER results, all 29 SNPs possessed the subPSEC score more than −3 and were therefore classified as deleterious (Table 5). As shown in table 5, these SNPs were found to be as disease-associated with the probability >0.5 after analyzing by SNPs & GO.

Table 5

PANTHER and SNPs&GO results for prediction of SNPs as disease associated.

			PANTHER		SNPs&GO
	SNPs	Substitution	subPSEC	Pdeleterious	Prediction	RI	Probability
1	rs56948661	P623L	-4.92855	0.87309	Disease	5	0.742
2	rs61732515	Q277H	-4.61589	0.83423	Disease	3	0.527
3	rs112432667	E492G	-3.99182	0.72945	Disease	4	0.711
4	rs74685695	V77G	-4.06862	0.74433	Disease	1	0.545
5	rs112815033	L450P	-4.36601	0.79674	Disease	0	0.523
6	rs55944915	R391H	-3.64924	0.65684	Disease	0	0.525
7	rs114820168	R391C	-4.67097	0.84171	Disease	3	0.643
8	rs77759048	W55R	-3.3007	0.57461	Disease	4	0.717
9	rs34774243	K291E	-3.56533	0.63768	Disease	5	0.772
10	rs55824172	S151F	-4.7119	0.84708	Disease	6	0.804
11	rs74937157	C134R	-3.47178	0.6158	Disease	2	0.619
12	rs5743621	P733L	-4.51666	0.82005	Disease	2	0.623
13	rs41311402	L697S	-4.23845	0.77529	Disease	4	0.712
14	rs56205407	I679T	-5.35855	0.91361	Disease	7	0.870
15	rs117033348	L144P	-8.17834	0.99439	Disease	5	0.750
16	rs5743706	Y715N	-4.34331	0.79303	Disease	4	0.707
17	rs56303479	L81P	-6.4936	0.97051	Disease	7	0.855
18	rs121917864	R677W	-6.4688	0.96979	Disease	6	0.819
19	rs5743316	N284I	-3.91448	0.71392	Disease	5	0.748
20	rs112666655	L545P	-4.25641	0.77841	Disease	6	0.823
21	rs111488413	P880Q	-8.50881	0.99597	Disease	6	0.811
22	rs77214890	D181Y	-4.48068	0.81467	Disease	0	0.511
23	rs80197996	L470F	-3.94106	0.71931	Disease	4	0.639
24	rs55905951	A676G	-3.16208	0.54043	Disease	0	0.503
25	rs55786277	R804W	-5.10263	0.89116	Disease	5	0.748
26	rs5744176	D694G	-3.42967	0.6058	Disease	4	0.716
27	rs78098893	R752G	-3.16919	0.5422	Disease	2	0.614
28	rs13102250	L105W	-5.09383	0.8903	Disease	2	0.583
29	rs55881257	R962C	-4.48094	0.81471	Disease	1	0.547

Prediction of protein stability analysis by I-Mutant According to I- Mutant results, all mutations expect N284I (rs5743316 in TLR3), S151F (rs55824172 in TBK1) and L105W (rs13102250 in TLR6) were predicted to decrease protein stability, with a free energy change value <0.0 (Table 6).

Table 6

Summary results of nsSNPs analysis by I-mutant and ConSurf.

				I-mutant		ConSurf
	Gene Symbol	SNP	AA substitution	DDG ( Kcal/mol)	Stability	conservation scale	Functional or structural residue
1	CHUK	rs56948661	P623L	-0.97	Decrease	9	F
2	CHUK	rs61732515	Q277H	-1.58	Decrease	7	-
3	CHUK	rs112432667	E492G	-1.06	Decrease	4	-
4	FOS	rs74685695	V77G	-5.25	Decrease	9	S
5	IRF5	rs112815033	L450P	-1.74	Decrease	8	-
6	IRAK4	rs55944915	R391H	-1.32	Decrease	8	F
7	IRAK4	rs114820168	R391C	-0.86	Decrease	8	F
8	MAP3K7	rs77759048	W55R	-1.71	Decrease	8	-
9	TBK1	rs34774243	K291E	-0.82	Decrease	6	-
10	TBK1	rs55824172	S151F	0.01	Increase	9	F
11	TIRAP	rs74937157	C134R	-1.55	Decrease	8	-
12	TLR1	rs5743621	P733L	-1.33	Decrease	8	F
13	TLR1	rs41311402	L697S	-1.51	Decrease	9	S
14	TLR1	rs56205407	I679T	-1.91	Decrease	8	-
15	TLR1	rs117033348	L144P	-0.79	Decrease	9	S
16	TLR2	rs5743706	Y715N	-1.65	Decrease	9	S
17	TLR2	rs56303479	L81P	-1.24	Decrease	9	S
18	TLR2	rs121917864	R677W	-0.83	Decrease	9	F
19	TLR3	rs5743316	N284I	1.23	Increase	9	F
20	TLR3	rs112666655	L545P	-1.10	Decrease	7	-
21	TLR3	rs111488413	P880Q	-1.26	Decrease	9	F
22	TLR4	rs77214890	D181Y	-0.98	Decrease	8	F
23	TLR4	rs80197996	L470F	-0.86	Decrease	9	S
24	TLR4	rs55905951	A676G	-1.19	Decrease	9	S
25	TLR4	rs55786277	R804W	-0.54	Decrease	6	-
26	TLR5	rs5744176	D694G	-1.31	Decrease	9	F
27	TLR5	rs78098893	R752G	-1.49	Decrease	7	-
28	TLR6	rs13102250	L105W	0.91	Increase	9	S
29	TLR9	rs55881257	R962C	-2.62	Decrease	8	F

Abbreviations: DDG; free energy change value (DDG<0: Decrease Stability, DDG>0: Increase Stability). The pH and the temperature were set to7 and 25˚C for all submissions, respectively. F: functional residue; S: structural residue.

Prediction of evolutionary conservation of amino acid position by ConSurf Our ConSurf analysis revealed that all 29 expected SNPs including the Q277H (CHUK), E492G (CHUK), L450P (IRF5), W55R (MAP3K7), K291E (TBK1), C134R (TIRAP), I679T (TLR1), L545P (TLR3), R804W (TLR4) and R752G (TLR5) were located in highly conserved regions and predicted to have functional and structural impacts on TLRs pathway proteins (Table 6). solvent accessibility and three-dimensional analyzes of native and mutant protein structures By combining the results of SIFT, Poly-phen-2, PANTHER, SNPs & GO, I-Mutant 2.0, and ConSurf servers, 19 mutations were found to be more deleterious in candidate genes. Subsequently, these mutations were analyzed for solvent accessibility and stability, and the results were represented in the following paragraphs (see also Table 7). Visualization of structural features of wild type and mutant protein containing the mentioned deleterious variants was performed using the project HOPE and Swiss PDB viewer.

Supplementary Table 1

Surface accessibility of wild-type and mutant variants in TLRs network intermediate molecules.

	Gene Symbol	SNP	AA substitution	Class assignment	AA	AA position	RSA	ASA	Z-fit score
1	CHUK	rs56948661	P623L	ExposedExposed	PL	623623	0.5440.537	77.17998.306	1.1901.088
2	FOS	rs74685695	V77G	BuriedBuried	VG	7777	0.0820.159	12.5812.55	-0.799-0.920
3	IRAK4	rs55944915	R391H	ExposedExposed	RH	391391	0.5000.520	114.4094.58	-0.611-0.727
4	IRAK4	rs114820168	R391C	ExposedExposed	RC	391391	0.5000.477	114.4067.04	-0.611-0.891
5	TBK1	rs55824172	S151F	BuriedBuried	SF	151151	0.1320.116	15.5223.30	0.068-0.048
6	TLR1	rs5743621	P733L	ExposedExposed	PL	733733	0.5750.569	81.57104.23	0.6870.717
7	TLR1	rs41311402	L697S	BuriedBuried	LS	697697	0.0280.030	5.053.49	0.9510.649
8	TLR1	rs117033348	L144P	BuriedBuried	LP	144144	0.0380.035	6.925.023	0.5030.657
9	TLR2	rs5743706	Y715N	BuriedBuried	YN	715715	0.1520.153	32.4622.39	0.1930.253
10	TLR2	rs56303479	L81P	BuriedBuried	LP	8181	0.0380.029	6.934.10	0.3620.758
11	TLR2	rs121917864	R677W	BuriedBuried	RW	677677	0.2430.255	55.6061.32	-0.079-0.088
12	TLR3	rs5743316	N284I	BuriedBuried	NI	284284	0.0830.088	12.1616.33	-1.686-1.081
13	TLR3	rs111488413	P880Q	ExposedExposed	PQ	880880	0.4010.446	56.8879.65	0.1500.108
14	TLR4	rs77214890	D181Y	BuriedBuried	DY	181181	0.2400.258	34.5255.24	0.5280.277
15	TLR4	rs80197996	L470F	BuriedBuried	LF	470470	0.0900.089	16.4017.88	0.0800.247
16	TLR4	rs55905951	A676G	BuriedBuried	AG	676676	0.0330.034	3.622.71	-0.046-0.158
17	TLR5	rs5744176	D694G	BuriedBuried	DG	694694	0.1640.173	23.5713.64	-0.270-0.384
18	TLR6	rs13102250	L105W	BuriedBuried	LW	105105	0.0300.031	5.517.40	0.8430.799
19	TLR9	rs55881257	R962C	ExposedExposed	RC	962962	0.4190.464	95.9565.20	0.0660.045

Abbreviations: RSA: Relative Surface Accessibility; ASA: Absolute Surface Accessibility. Values for wild type and mutant variants are presented by red and green color respectively

PANTHER and SNPs&GO results for prediction of SNPs as disease associated. Summary results of nsSNPs analysis by I-mutant and ConSurf. Abbreviations: DDG; free energy change value (DDG<0: Decrease Stability, DDG>0: Increase Stability). The pH and the temperature were set to7 and 25˚C for all submissions, respectively. F: functional residue; S: structural residue. The rs56948661 in CHUK gene leads to P623L. The residue is located on the surface of the protein and mutation of this residue can disturb the interactions with other molecules or other parts of the protein. Moreover, the mutation can disturb the special backbone conformation induced by proline. Conversion of V77G (rs74685695 in FOS) causes some structural changes in protein. Glycine residue is smaller than valine and this may lead to loss of the interactions. Furthermore, the mutant residue is more hydrophobic and flexible and can disturb the required rigidity of the protein on this position. For rs114820168 in IRAK4, the wild-type (arginine) and mutant (cysteine) amino acids differ in size, hydrophobicity and charge. The difference in charge will disturb the ionic interactions of the wild type residue with D388, E389 and D398. R391H is annotated with rs55944915 in dbSNP database. According to the PISA-database, the mutated residue is involved in a multimer contact. The new residue might be too small to make multimer contacts. In S151F variant, rs55824172 of TBK1 gene, the mutant residue (phenylalanine) is bigger and more hydrophobic than the wild-type (serine). This conversion will cause the loss of hydrogen bonds in the core of the protein resulting in the disruption of correct folding. We found that three SNPs in TLR1, including P733L (rs5743621), L697S (rs41311402) and L144P (rs117033348), were located in highly conserved regions and predicted to have functional and structural impacts on proteins. For P733L, the mutant residue (leucine) is bigger than the wild-type (proline) and is located on surface of the protein, potentially disturbing its interactions. For L697S and L144P, the mutant residues are smaller than the wild-type residues and will cause an empty space in the core of the protein. In addition, all three mutations are predicted to have functional and structural influences on TLR2 protein (Figure 5).

Fig 5

Deep view of superimposed structure of wild and mutant TLR2. A: L81P; B: R677W and C: Y715N. The protein and the side chains of the wild-type and the mutant residue are shown and colored grey, green and red, respectively. Hydrogen bonding interactions and clashes of wild type and mutant TLR4 at position 181. A: the wild-type residue (D) forms hydrogen bonds (green discontinuous line) with L155, V157, A158, L182 and S183; B: substitution of this amino acid with tyrosine will cause loss of hydrogen bonds with A158, L182 and S183. Moreover, the mutation showed a network of clashes (pink discontinuous line) with A158 and S183 residues. Surface accessibility of wild-type and mutant variants in TLRs network intermediate molecules. Abbreviations: RSA: Relative Surface Accessibility; ASA: Absolute Surface Accessibility. Values for wild type and mutant variants are presented by red and green color respectively For L81P (rs56303479),because this residue is part of some interpro domains like leucine-rich repeat, typical subtype, the interaction between these domains could be disturbed by the mutation. The R677W (rs121917864) mutation leads to substitution of arginine by a bigger and more hydrophobic residue named tryptophan. The difference in charge will disturb the ionic interaction made by the arginine with E649 and 656. The third mutation of TLR2 occurs at position 715 (rs5743706). The hydrophobicity of the wild-type (tyrosine) and mutant residue (asparagine) differs and the mutation will cause the loss of hydrophobic interactions in the core of the protein. Finally, the size difference between residues makes that the new residue is not in the correct position to make the same hydrogen bond with S646, as the wild-type residue does. For N284I (rs5743316, in TLR3), due to the difference in hydrophobicity index of residues, the mutation will cause the loss of hydrogen bonds in the core of the protein and may lead to incorrect folding of protein. The second mutation of TLR3 (rs111488413) causes P880Q. This mutant residue is bigger than the wild-type residue and can disturb the protein interactions. Additionally, the hydrophobicity of the residue differs; hence, the mutation may cause the loss of hydrophobic interactions. Concerning D181Y mutation in TLR4 (rs77214890), the difference in charge will disturb the ionic interaction made by the original residue with R234. Moreover, the hydrophobicity of the native and mutant residue differs. Therefore, this mutation causes the loss of hydrogen bonds in the core of the protein leading to disruption of the correct folding (Figure 6). For rs80197996 (L470F) in TLR4, the mutant residue (phenylalanine) is bigger and probably will not fit to bury in the core of the protein. In A676G (rs55905951), the mutant residue is smaller than the wild-type residue. This will cause a possible loss of external interactions. Furthermore, the mutation may cause the loss of hydrophobic interactions with other molecules on the surface of the protein.

Fig 6

Hydrogen bonding interactions and clashes of wild type and mutant TLR4 at position 181. A: the wild-type residue (D) forms hydrogen bonds (green discontinuous line) with L155, V157, A158, L182 and S183; B: substitution of this amino acid with tyrosine will cause loss of hydrogen bonds with A158, L182 and S183. Moreover, the mutation showed a network of clashes (pink discontinuous line) with A158 and S183 residues.

Concerning rs5744176 (D694G) of TLR5, the wild-type residue forms a salt bridge with K692, R752 and K753. The difference in charge will disturb these ionic interactions. Moreover, the aspartic acid forms a hydrogen bond with N726, but due to difference in hydrophobicity, the mutation causes the loss of hydrogen bond. For the L105W (rs13102250) in TLR6, the wild-type (leucine) and mutant (tryptophan) amino acids differ in size. The wild-type residue was buried in the core of the protein, but the mutant residue is bigger and probably will not fit. For rs55881257 (R962C in TLR9) the charge of the wild-type residue will be lost; this can cause the loss of interactions with other molecules or residues. Furthermore, this mutation introduces a more hydrophobic residue at this position, probably resulting to loss of hydrogen bonds.

Discussion

TLRs signaling pathway plays a key role in the host innate immune response. Increasing evidence has suggested that functional SNPs of genes related to TLRs pathway may contribute to diseases ranging from chronic inflammatory to cancers. Since SNPs are the most common genetic variations in human genome, it is expected that genes involved in TLRs pathway contains numerous SNPs. Nevertheless, discriminating deleterious SNPs with potential effects on disease susceptibility from tolerated variants is a major challenge. Therefore, a comprehensive study that systematically analyzes the effects of such SNPs can cost-effectively prioritized SNPs for further analyzes. In-silico analysis of the deleterious effects of SNPs may help to improve our understanding on the biological pathways (22). In this study, we systematically analyzed the SNPs in different parts of genes (5′-UTR, 3′-UTR and coding) in TLRs pathway. A report has suggested that mutation effect prediction algorithms have their own strengths and weaknesses, and therefore, implementing a combination of these tools may help to enhance the accuracy of effect predictions (23). In the present study, we combined the results of the SIFT, PolyPhen, PANTHER, SNPs & GO, I-Mutant and ConSurf algorithms to prioritize the damaging nsSNPs and increase the analysis accuracy. Accordingly, we were able to identify several potentially deleterious nsSNPs in TLRs pathway genes. These SNPs, to the best of our knowledge, have not yet been investigated and therefore may be considered as candidates for association with diseases. These results may pave the ground for future functional and/or association studies and facilitate the process of choosing functional variant for further analyses. UTR-SNPs play important roles in gene regulation and accumulating evidence has indicated their contribution to different diseases. Sequence alteration in these regulatory elements has been shown to interfere with transcription factors or microRNA binding, leading to gene dysregulation (24-25). By applying a bioinformatics approach, we evaluated such effects of UTR-SNPs on TLRs pathway genes and identified numerous disease-associated variants that potentially confer the disease risk through affecting transcription factors or miRNAs binding. TLR9 rs187084, a UTR-SNP which probably interferes with transcription factors binding, has been shown to modify susceptibility to diseases specially renal transplant recipients and cancers (26-27). Several genes of TLRs pathway are regulated post-transcriptionally by miRNAs (28). Our analysis revealed that several SNPs of TLRs network resided in microRNA target sites (Table 3) that may potentially modify miRNA-mediated regulation of these genes. For instance, rs7744 in 3′-UTR of MYD88 and rs696 in 3′-UTR of NFKBIA genes could disrupt the binding of miR-520f-3p and miR-208a-3p, respectively. Matsunaga et al. showed that homozygous minor allele of rs7744 is associated with the severity of ulcerative colitis (29). Moreover, it has been shown that rs696 G>A is associated with the susceptibility to different diseases including coronary artery disease and Behçet's disease (30-31). In conclusion, the current study reports the first pathway-based bioinformatics analysis of SNPs in TLRs pathway genes and provides a prioritized list of functional SNPs potentially affecting regulation and function of the pathway. However, we noticed that the complexities of biological pathways merit the need for more experimentation to validate the true effect of these SNPs on TLRs network. Although the functional significance of the candidate SNPs was not experimentally assessed in this study, we believe that our results will help researchers interested in the roles of SNPs in TLRs pathways genes to focus on proper candidate variants.

31 in total

Review 1. The role of pattern-recognition receptors in innate immunity: update on Toll-like receptors.

Authors: Taro Kawai; Shizuo Akira
Journal: Nat Immunol Date: 2010-04-20 Impact factor: 25.606

2. The *1244 A>G polymorphism of MyD88 (rs7744) is closely associated with susceptibility to ulcerative colitis.

Authors: Kazuhiro Matsunaga; Tomomitsu Tahara; Hisakazu Shiroeda; Toshimi Otsuka; Masakatsu Nakamura; Takeo Shimasaki; Nobuyuki Toshikuni; Natsuko Kawada; Tomoyuki Shibata; Tomiyasu Arisawa
Journal: Mol Med Rep Date: 2013-11-01 Impact factor: 2.952

Review 3. Toll-like receptors and their crosstalk with other innate receptors in infection and immunity.

Authors: Taro Kawai; Shizuo Akira
Journal: Immunity Date: 2011-05-27 Impact factor: 31.745

Review 4. Toll-Like Receptor Pathways in Autoimmune Diseases.

Authors: Ji-Qing Chen; Peter Szodoray; Margit Zeher
Journal: Clin Rev Allergy Immunol Date: 2016-02 Impact factor: 8.667

Review 5. Genetic variation in Toll-like receptors and disease susceptibility.

Authors: Mihai G Netea; Cisca Wijmenga; Luke A J O'Neill
Journal: Nat Immunol Date: 2012-05-18 Impact factor: 25.606

Review 6. Prediction of deleterious nonsynonymous single-nucleotide polymorphism for human diseases.

Authors: Jiaxin Wu; Rui Jiang
Journal: ScientificWorldJournal Date: 2013-01-30

7. Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces.

Authors: Hanka Venselaar; Tim A H Te Beek; Remko K P Kuipers; Maarten L Hekkelman; Gert Vriend
Journal: BMC Bioinformatics Date: 2010-11-08 Impact factor: 3.169

8. Toll-like receptors and human disease: lessons from single nucleotide polymorphisms.

Authors: Yi-Tzu Lin; Amanda Verma; Conrad P Hodgkinson
Journal: Curr Genomics Date: 2012-12 Impact factor: 2.236

9. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data.

Authors: Jun-Hao Li; Shun Liu; Hui Zhou; Liang-Hu Qu; Jian-Hua Yang
Journal: Nucleic Acids Res Date: 2013-12-01 Impact factor: 16.971

10. PolymiRTS Database 3.0: linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways.

Authors: Anindya Bhattacharya; Jesse D Ziebarth; Yan Cui
Journal: Nucleic Acids Res Date: 2013-10-24 Impact factor: 16.971

2 in total

1. Associations of Polymorphisms Localized in the 3'UTR Regions of the KRAS, NRAS, MAPK1 Genes with Laryngeal Squamous Cell Carcinoma.

Authors: Ruta Insodaite; Alina Smalinskiene; Vykintas Liutkevicius; Virgilijus Ulozas; Roberta Poceviciute; Arunas Bielevicius; Laimutis Kucinskas
Journal: Genes (Basel) Date: 2021-10-23 Impact factor: 4.096

2. Comprehensive Computational Analysis of Protein Phenotype Changes Due to Plausible Deleterious Variants of Human SPTLC1 Gene.

Authors: Tayyaba Sadaf; Peter John; Attya Bhatti
Journal: Int J Mol Cell Med Date: 2019-04-23

2 in total