Hengwei Lu, Yi-Ching Tang, Assaf Gottlieb.
Abstract
Gene expression plays a key role in health and disease. Estimating the genetic components underlying gene expression can thus help in understanding disease etiology. Polygenic models termed "transcriptome imputation" are used to estimate the genetic component of gene expression, but these models typically consider only the cis regions of the gene. However, these cis-based models miss large variability in expression for multiple genes. Transcription factors (TFs) that regulate gene expression are natural candidates when looking for additional sources of the missing variability. We developed a hypothesis-driven approach to identify second-tier regulation by variability in TFs. Our approach tested two models representing possible mechanisms by which variations in TFs can affect gene expression: variability in the expression of the TF, and genetic variants within the TF that may affect the binding affinity of the TF to the TF-binding site. We tested our TF models in whole blood and skeletal muscle tissues and identified TF variability that can partially explain missing gene expression for 1035 genes; for 76% of these genes, it explains more than the cis-based models. While the discovered regulation patterns were tissue-specific, in both tissues they were enriched for immune system functionality, elucidating complex regulation patterns. Our hypothesis-driven approach is useful for identifying tissue-specific genetic regulation patterns involving variations in TF expression or binding.
Keywords: transcription factor polymorphism; transcriptional regulation; transcriptome imputation
Year: 2022 PMID: 35627314 PMCID: PMC9140347 DOI: 10.3390/genes13050929
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
Summary statistics of input data per tissue: transcription factors and associated SNPs.
| | Skeletal Muscle | Whole Blood |
|---|---|---|
| Number of samples | 706 | 670 |
| Number of expressed genes | 21,031 | 20,315 |
| Number of expressed genes with an associated TF | 11,130 | 10,563 |
| Number of TFs per gene | 10.8 ± 8.4 | 10.9 ± 8.7 |
| Number of nsSNP * per TF | 1.55 ± 1.77 | 1.57 ± 1.99 |
* nsSNP: non-synonymous SNP.
Figure 1. Illustration of the filtering pipeline to identify TF hit genes. We computed the baseline cis model and compared it with one of the three TF models to identify candidate genes per TF model (A), computed a background model for each of the candidate genes to test their significance (B), and conducted a robustness test to validate the results (C).
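Step (B) of the pipeline in Figure 1, testing a candidate gene against a background model, can be illustrated with a minimal sketch. The caption does not say how the background model is built, so the sketch assumes a permutation background: the TF features are shuffled across samples, and the gain in R² over the cis baseline is recomputed each time. Ordinary least squares stands in for the regression, and the names `fit_r2` and `permutation_pvalue` and all simulated data are hypothetical.

```python
import numpy as np

def fit_r2(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

def permutation_pvalue(cis, tf, y, n_perm=200, seed=0):
    """Empirical p-value for the R^2 gained by adding TF features.

    Assumption: the background is built by permuting TF rows across
    samples, which breaks any TF-gene link while keeping the cis fit.
    """
    rng = np.random.default_rng(seed)
    base = fit_r2(cis, y)
    observed = fit_r2(np.column_stack([cis, tf]), y) - base
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = tf[rng.permutation(len(y))]
        null[i] = fit_r2(np.column_stack([cis, shuffled]), y) - base
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + int(np.sum(null >= observed))) / (1 + n_perm)
    return observed, p_value

# Simulated example (not data from the study).
rng = np.random.default_rng(1)
n = 300
cis = rng.integers(0, 3, size=(n, 3)).astype(float)  # cis SNP dosages
tf = rng.normal(size=(n, 1))                         # TF feature
y = cis @ np.array([0.3, 0.2, 0.1]) + 0.5 * tf[:, 0] + rng.normal(size=n)

observed_gain, p_value = permutation_pvalue(cis, tf, y)
```

A gene would be kept as a candidate only if its observed gain is rarely matched by the permuted background; the threshold and the number of permutations are choices left open here.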
Figure 2. Illustration of the three tested hypotheses regarding the effect of TF genetic polymorphism on the expression of their transcribed genes. We test whether polymorphism in the TF (orange box) affects the transcription levels of the transcribed gene (black box) by testing the added effect of each TF model relative to the baseline cis model. TF-expression model includes association of the TF expression with imputed gene expression (A); TF-binding model includes non-synonymous SNPs within the associated TF boundary (B).
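The "added effect of each TF model relative to the baseline cis model" can be read as a comparison of explained variance between nested models. The following is a minimal sketch under that reading, not the authors' implementation: ordinary least squares is an assumed stand-in for the regression, the function names `r_squared` and `added_r2` are hypothetical, and the data are simulated.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def added_r2(cis, tf_features, y):
    """Return (baseline R^2, cis+TF R^2, gain from the TF features)."""
    base = r_squared(cis, y)
    full = r_squared(np.column_stack([cis, tf_features]), y)
    return base, full, full - base

# Simulated example (not data from the study).
rng = np.random.default_rng(0)
n = 500
cis = rng.integers(0, 3, size=(n, 5)).astype(float)  # cis SNP dosages 0/1/2
tf_expr = rng.normal(size=(n, 1))                    # TF-expression feature
y = (cis @ np.array([0.3, 0.0, 0.2, 0.0, 0.1])
     + 0.5 * tf_expr[:, 0]
     + rng.normal(size=n))

base_r2, full_r2, delta_r2 = added_r2(cis, tf_expr, y)
print(f"cis R^2={base_r2:.3f}, cis+TF R^2={full_r2:.3f}, gain={delta_r2:.3f}")
```

For a TF-binding-style model, `tf_features` would instead hold dosages of non-synonymous SNPs within the TF. Because in-sample R² never decreases when features are added, a held-out or cross-validated R² would be the safer quantity to compare in practice.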
Figure 3. Stacked bar graph of the R² results of the cis baseline model (blue) and the skeletal muscle and whole-blood TF-expression and TF-binding models (orange).