Elena E Korbolina1, Leonid O Bryzgalov1,2, Diana Z Ustrokhanova3, Sergey N Postovalov4, Dmitry V Poverin4, Igor S Damarov1, Tatiana I Merkulova1,3.
Abstract
Currently, the detection of allele asymmetry in gene expression from RNA-seq data or in transcription factor binding from ChIP-seq data is one of the approaches used to identify functional genetic variants that can affect gene expression (regulatory SNPs or rSNPs). In this study, we searched for rSNPs using the data for human pulmonary arterial endothelial cells (PAECs) available from the Sequence Read Archive (SRA). Allele-asymmetric binding and expression events were analyzed in paired ChIP-seq data for the H3K4me3 mark and RNA-seq data obtained for 19 individuals. Two statistical approaches, weighted z-scores and predicted probabilities, were used to improve the efficiency of finding rSNPs. In total, we identified 14,266 rSNPs associated with both allele-specific binding and expression. Among them, 645 rSNPs were associated with GWAS phenotypes; 4746 rSNPs were reported as eQTLs by GTEx, and 11,536 rSNPs were located in 374 candidate transcription factor binding motifs. Additionally, we searched for the rSNPs associated with gene expression using an SRA RNA-seq dataset for 281 clinically annotated human postmortem brain samples and detected eQTLs for 2505 rSNPs. Based on these results, we conducted Gene Ontology (GO), Disease Ontology (DO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses and constructed protein-protein interaction networks to represent the top-ranked biological processes with a possible contribution to the phenotypic outcome.
Keywords: Genotype-Tissue expression; allele-specific events; eQTLs; enrichment analysis; molecular phenotype; protein-protein interaction networks; regulatory SNPs
Year: 2021 PMID: 34298860 PMCID: PMC8303726 DOI: 10.3390/ijms22147240
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. The stages of rSNP search assessing the likely molecular phenotypes and phenotypic outcomes. Arrows show the utilization of sequencing data or publicly available resources; NGS: next generation sequencing; PAECs: pulmonary arterial endothelial cells; SNP: single nucleotide polymorphism; rSNP: regulatory SNP; ASB: allele-specific binding bias; ASE: allele-specific expression bias; PPI: protein-protein interaction.
Figure 2. Enrichment of the resulting rSNP set in GWAS phenotype associations and/or eQTLs. The cutoff threshold was −log α, where α is the probability of a type I error.
Logistic regression analysis prediction parameters. |log2FC1| reflects the ratio between the coverage of the two alleles for the SNP with an ASB effect; |log2FC2|, the same for the position with an ASE effect; |log2FC1/FC2| reflects the difference in asymmetry for the SNP in independent ChIP-seq and RNA-seq experiments; Sign: significance (* p-value < 0.05; *** p-value < 0.001). The dispersion parameter for the binomial family was taken to be 1.
| Parameter | Regression Coefficient | Std. Error | p-value | Sign |
|---|---|---|---|---|
| \|log2FC1\| | −0.547335 | 0.009193 | <2 × 10^−16 | *** |
| \|log2FC2\| | −0.022103 | 0.008592 | 0.0101 | * |
| \|log2FC1/FC2\| | −0.125600 | 0.010718 | <2 × 10^−16 | *** |
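The significance values in this table can be roughly cross-checked from the coefficients and standard errors alone. The sketch below assumes the p-values come from a two-sided Wald z-test (coefficient divided by standard error, compared with a standard normal distribution), which is the usual default for binomial regression summaries. The source does not state this explicitly, so treat it as an assumption. The names `rows` and `wald_p_value` are illustrative only.

```python
import math

# Coefficients and standard errors copied from the table above.
rows = {
    "|log2FC1|": (-0.547335, 0.009193),
    "|log2FC2|": (-0.022103, 0.008592),
    "|log2FC1/FC2|": (-0.125600, 0.010718),
}


def wald_p_value(coef, se):
    """Two-sided p-value for z = coef / se under a standard normal (assumed test)."""
    z = coef / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


p_values = {name: wald_p_value(coef, se) for name, (coef, se) in rows.items()}

for name, (coef, se) in rows.items():
    print(f"{name}: z = {coef / se:.2f}, p = {p_values[name]:.3g}")
```

Under this assumption the |log2FC2| row gives a p-value close to the reported 0.0101, and the other two rows fall far below 2 × 10^−16, in line with the table.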
Figure 3. Q-Q plots showing a significant positive correlation of the number of (a) eQTLs and (b) GWAS SNPs in the resulting rSNP set with predicted probabilities. All rSNPs were divided into 100 equal parts or ranks (the abscissa); the ordinate shows the number of (a) GTEx eQTLs or (b) GWAS associations mapped to all SNPs of a particular rank.
Gradual enrichment of SNP sets with GTEx eQTLs and GWAS associations. ASB: allele-specific binding bias; ASE: allele-specific expression bias; eQTLs: GTEx expression quantitative trait loci; GWAS: all available GWAS-derived SNP-trait associations; n: the number of SNPs; %: percentage relative to the total number of SNPs; pp: predicted probabilities.
| Identified SNPs | n | Overlapping with all GTEx eQTLs, % | Overlapping with the GTEx eQTLs with | Positions contained in GWAS Catalog, % |
|---|---|---|---|---|
| Heterozygous SNPs | ~4.3 | 13 | 8 | 2.1 |
| SNPs with ASB | 58,191 | 15 | 10 | 2.5 |
| SNPs with ASE | 230,553 | 15 | 10 | 2.7 |
| SNPs with both ASB and ASE | 20,321 | 23 | 18 | 3.0 |
| SNPs with both ASB and ASE | 14,898 | 23 | 18 | 3.1 |
| SNPs selected by predicted probabilities, pp > 0.1929408 | 14,543 | 26 | 20 | 3.5 |
| SNPs selected by log regression and | 10,318 | 26 | 21 | 3.7 |
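The "gradual enrichment" in this table can be expressed as a fold change relative to the heterozygous-SNP baseline. The following is a minimal sketch using only percentages copied from the table. The choice of the heterozygous row as the baseline and the helper name `fold_enrichment` are assumptions made for illustration, not a calculation reported in the source.

```python
# Percentages copied from the table above:
# (overlap with all GTEx eQTLs in %, positions in GWAS Catalog in %).
baseline = (13.0, 2.1)   # heterozygous SNPs
selected = (26.0, 3.5)   # SNPs selected by predicted probabilities


def fold_enrichment(subset_pct, baseline_pct):
    """Ratio of the subset percentage to the baseline percentage."""
    return subset_pct / baseline_pct


eqtl_fold = fold_enrichment(selected[0], baseline[0])
gwas_fold = fold_enrichment(selected[1], baseline[1])

print(f"eQTL fold enrichment: {eqtl_fold:.2f}")
print(f"GWAS fold enrichment: {gwas_fold:.2f}")
```

With these inputs the eQTL overlap doubles (26% against 13%) and the GWAS Catalog overlap rises by roughly two thirds (3.5% against 2.1%).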
Figure 4. Distribution of 14,266 rSNPs across human genomic regions.
Figure 5. Normalized tissue distribution of the rSNPs mapped as GTEx eQTLs. The ordinate shows the number of eQTLs (padj < 0.1) identified for all rSNPs in the tissue, normalized to the number of all GTEx eQTLs mapped in the tissue with padj < 0.1; the abscissa shows the tissue source with grouping by type.
Figure 6. rs6507 (C>T) findings. (a) PPI subnetwork for rs6507 with the ‘root’ CCNE1 protein using R STRINGdb. Nodes are colored according to logFC value (green nodes: logFC < −0.5; orange nodes: logFC > 0.5; yellow nodes: |logFC| < 0.5) and node sizes are proportional to |logFC|. (b) The enriched KEGG functional terms for 337 corresponding DEGs are ranked according to the adjusted p-value and displayed in a tabular format. (c) Disruption of CTCF and ZBTB7B binding motifs by rs6507 (C>T). The red bar shows the chromosome location of rs6507. (d) The genome region surrounding rs6507 with visualized ChIP-Seq signal tracks for CTCF as given by ENCODE annotation (ICGC Genome browser). The location of rs6507 is highlighted with the red dotted line.
Figure 7. Graphical visualization of the KEGG pathway (Parkinson’s disease) DEGs found to be associated with the (a) rs16910241 and (b) rs56119169 variants in this study. KEGG pathway diagrams show the genes grouped in different functional units distributed along the pathway. Gene node colors show the direction of the change in expression.
Contingency table of the rs7289432 and rs738904 genotypes for a sample of 2504 individuals from the 1000 Genomes dataset.
| rs7289432 (rows) vs. cSNP (rs738904) 22chr:19179872 (columns) | CC | AC | AA | Total number of genotypes in 1000 Genomes |
|---|---|---|---|---|
| AA | 1084 | 5 | 0 | 1089 |
| AG | 9 | 1042 | 5 | 1056 |
| GG | 0 | 7 | 352 | 359 |
| Total number of genotypes in 1000 Genomes | 1093 | 1054 | 357 | 2504 |
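How tightly the two variants track each other can be summarized directly from this contingency table. The sketch below computes two simple measures: the fraction of individuals on the diagonal (matching genotype classes) and the squared Pearson correlation of allele dosages, which is often used as a rough proxy for linkage disequilibrium r². Both measures and the 0/1/2 dosage coding are choices made here for illustration. The source does not say which statistic, if any, the authors derived from the table.

```python
# Genotype counts copied from the table above.
# Rows: rs7289432 (AA, AG, GG); columns: rs738904 (CC, AC, AA).
counts = [
    [1084, 5, 0],
    [9, 1042, 5],
    [0, 7, 352],
]

total = sum(sum(row) for row in counts)

# Fraction of individuals whose genotype classes match (diagonal cells).
concordance = sum(counts[i][i] for i in range(3)) / total

# Code each genotype as a dosage of 0, 1 or 2 (assumed coding) and
# compute the squared Pearson correlation between the two dosages.
mean_x = sum(i * counts[i][j] for i in range(3) for j in range(3)) / total
mean_y = sum(j * counts[i][j] for i in range(3) for j in range(3)) / total
var_x = sum((i - mean_x) ** 2 * counts[i][j] for i in range(3) for j in range(3)) / total
var_y = sum((j - mean_y) ** 2 * counts[i][j] for i in range(3) for j in range(3)) / total
cov_xy = sum(
    (i - mean_x) * (j - mean_y) * counts[i][j] for i in range(3) for j in range(3)
) / total
dosage_r2 = cov_xy ** 2 / (var_x * var_y)

print(f"total genotypes: {total}")
print(f"concordance: {concordance:.4f}")
print(f"dosage r^2: {dosage_r2:.4f}")
```

Almost all of the 2504 individuals fall on the diagonal, so both measures come out close to 1, consistent with the two SNPs being strongly linked.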