Xiaolong Cao1, Yeting Zhang1,2, Lindsay M Payer3, Hannah Lords1, Jared P Steranka3, Kathleen H Burns3, Jinchuan Xing4,5.
Abstract
BACKGROUND: Mobile elements are a major source of structural variants in the human genome, and some mobile elements can regulate gene expression and transcript splicing. However, the impact of polymorphic mobile element insertions (pMEIs) on gene expression and splicing in diverse human tissues has not been thoroughly studied. The multi-tissue gene expression and whole genome sequencing data generated by the Genotype-Tissue Expression (GTEx) project provide a great opportunity to systematically evaluate the role of pMEIs in regulating gene expression in human tissues.
Keywords: Alternative splicing; Gene expression regulation; Polymorphic mobile element insertions; Quantitative trait loci; Transposable elements
Year: 2020 PMID: 32718348 PMCID: PMC7385971 DOI: 10.1186/s13059-020-02101-4
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Overview of pMEIs in the MELT call set, eQTL, and sQTL analyses
| ME type | MELT raw | MELT HQ | MELT common | eQTL all | eQTL causal | eQTL highest | sQTL all | sQTL causal | sQTL highest |
|---|---|---|---|---|---|---|---|---|---|
| nrAlu | 62,864 | 13,870 | 2157 | 1451 | 562 | 147 | 1071 | 539 | 191 |
| nrL1 | 11,159 | 2130 | 246 | 177 | 81 | 23 | 126 | 71 | 18 |
| nrSVA | 1877 | 558 | 69 | 61 | 32 | 12 | 51 | 27 | 13 |
| rAlu | 3837 | 3687 | 968 | 671 | 253 | 84 | 444 | 202 | 106 |
| rL1 | 192 | 188 | 59 | 42 | 15 | 7 | 28 | 18 | 8 |
| rSVA | 128 | 112 | 21 | 20 | 13 | 9 | 14 | 9 | 6 |
MELT call set: raw—all pMEI loci identified by MELT; HQ—high-quality loci after quality control; common—pMEIs used for the eQTL and sQTL analysis
eQTL/sQTL analysis: all—unique pMEIs in eQTL/sQTL analysis (FDR < 10%); causal—unique pMEIs identified as the causal variant; highest—unique pMEIs identified as the causal variant with the highest causal probability
Summary of genes
| Gene | Total | Expressed | eQTLs | ME causal | ME highest causal |
|---|---|---|---|---|---|
| Protein-coding | 19,820 | 19,064 | 4243 | 1062 | 294 |
| Non-coding | 36,382 | 19,111 | 2099 | 526 | 139 |
Expressed—genes used in the eQTL analysis of at least one tissue
eQTLs—number of unique genes in the ME-only eQTL analysis with FDR < 10%; ME causal and ME highest causal—unique genes with pMEIs predicted as a causal variant or a causal variant with the highest probability, respectively
Fig. 1 Overview of the ME-only eQTL analysis. a The number of detected eQTLs with Benjamini-Hochberg FDR < 10% in each tissue. Bars are colored by tissue clusters based on cis-eQTL as shown in b (tree). b Similarity (Spearman’s correlation coefficient ρ) between different tissues based on cis-eQTL FDR values (lower triangle) and gene expression TPM values (upper triangle). Gene-pMEI pairs with FDR < 10% in at least one tissue are selected for the analysis. The tree on the left of the plot was based on the hierarchical clustering of the cis-eQTL results, and the branches are colored in five groups. Tissue text colors in a and b were based on the hierarchical clustering tree of TPM results (data not shown). c The relationship between the eQTL count (FDR < 10%) and the individual count in different tissues. Tissue text is colored by tissue clusters based on cis-eQTL in b (tree). The axes are in log scale. d Gene-pMEI pair count and the number of tissues in which they were detected as significant for coding and non-coding genes. e Effect size (beta value) distribution for coding and non-coding eQTLs of different types of pMEIs.
Tissue abbreviations: AdS, adipose subcutaneous; AdV, adipose visceral omentum; AG, adrenal gland; ArA, artery aorta; ArC, artery coronary; ArT, artery tibial; BAm, brain amygdala; BAn, brain anterior cingulate cortex BA24; BCa, brain caudate basal ganglia; BCH, brain cerebellar hemisphere; BC, brain cerebellum; BCo, brain cortex; BFC, brain frontal cortex BA9; BHi, brain hippocampus; BHy, brain hypothalamus; BNu, brain nucleus accumbens basal ganglia; BPu, brain putamen basal ganglia; BSp, brain spinal cord cervical c-1; BSu, brain substantia nigra; Br, breast mammary tissue; CE, cells EBV-transformed lymphocytes; CT, cells transformed fibroblasts; CoS, colon sigmoid; CoT, colon transverse; EG, esophagus gastroesophageal junction; EMc, esophagus mucosa; EMs, esophagus muscularis; HA, heart atrial appendage; HL, heart left ventricle; Li, liver; Lu, lung; MSG, minor salivary gland; MuS, muscle skeletal; NT, nerve tibial; O, ovary; Pa, pancreas; Pi, pituitary; Pr, prostate; SN, skin not sun-exposed suprapubic; SS, skin sun-exposed lower leg; SIT, small intestine terminal ileum; Sp, spleen; St, stomach; Te, testis; Th, thyroid; U, uterus; V, vagina; B, whole blood
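The Benjamini-Hochberg FDR < 10% threshold used in Fig. 1a can be illustrated with a minimal sketch. The function name `benjamini_hochberg` and the example p values are hypothetical and are not taken from the study; the authors' actual pipeline is not described here, so this only shows the standard adjustment procedure.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal p values for four gene-pMEI pairs in one tissue.
nominal = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(nominal)
# Keep the pairs that pass the 10% FDR threshold.
significant = [i for i, q in enumerate(fdr) if q < 0.10]
```

In this toy input the first three pairs pass the threshold and the fourth does not.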
Fig. 2 Enrichment of pMEIs in different functional genomic regions of affected genes in eQTL analysis (a–e) and sQTL analysis (f–j). Functional genomic regions include enhancers from the Dragon Enhancers Database (DENdb) (a, f); 10 kb upstream from the transcription start site (TSS) (b, g), 10 kb downstream (c, h), exons (d, i), and introns of the affected gene (e, j). pMEIs were divided into three categories: NS, pMEIs that were not reported to be significantly associated with any gene or ASE in any tissue; related, pMEIs that were significantly associated with at least one gene or ASE but were not reported as causal; causal, pMEIs that were reported as causal for at least one gene or ASE (see the “Methods” section for details). The bar plot shows the proportion of pMEIs in each genomic feature in each category (NS, related, or causal). Values inside the bars are fold enrichment compared to NS, and values above the bars are p values from Fisher’s exact test for significance of enrichment compared to NS. For exons in the eQTL analysis in d, the fold enrichment values are not available because the proportion of pMEIs in exons is zero in NS
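The fold enrichment and Fisher's exact test described for Fig. 2 can be sketched as follows. This is an illustration under stated assumptions: the function name `fisher_enrichment` and all counts are hypothetical, and the test is written as one-sided (enrichment only), which the caption does not specify.

```python
from math import comb


def fisher_enrichment(in_feature_test, total_test, in_feature_ns, total_ns):
    """Compare a pMEI category (e.g. causal) against NS for one genomic feature.

    Returns (fold_enrichment, one_sided_p). The fold enrichment is None when
    the NS proportion is zero, mirroring the "not available" case in Fig. 2d.
    """
    n = total_test + total_ns                # all pMEIs in both categories
    col1 = in_feature_test + in_feature_ns   # all pMEIs inside the feature
    denom = comb(n, total_test)
    upper = min(total_test, col1)
    # Hypergeometric upper tail: probability of seeing at least the observed
    # number of in-feature pMEIs in the tested category (assumed one-sided).
    p = sum(
        comb(col1, k) * comb(n - col1, total_test - k)
        for k in range(in_feature_test, upper + 1)
    ) / denom
    prop_test = in_feature_test / total_test
    prop_ns = in_feature_ns / total_ns
    fold = prop_test / prop_ns if prop_ns > 0 else None
    return fold, p


# Hypothetical counts: 3 of 4 causal pMEIs and 1 of 4 NS pMEIs fall in a feature.
fold, p_value = fisher_enrichment(3, 4, 1, 4)
```

Returning `None` rather than infinity for a zero NS proportion is a design choice made here to match how the caption reports that case.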
Summary of alternative splicing events
| ASEs | Total events (genes) | Events in sQTL (genes) | ME causal (genes) | ME highest causal (genes) |
|---|---|---|---|---|
| A3 | 14,918 (7419) | 537 (456) | 165 (154) | 50 (49) |
| A5 | 14,197 (7144) | 576 (484) | 185 (165) | 55 (53) |
| AF | 70,352 (9036) | 3063 (1332) | 994 (533) | 253 (172) |
| AL | 18,369 (5103) | 887 (513) | 314 (198) | 103 (72) |
| MX | 4803 (2681) | 210 (179) | 71 (61) | 21 (18) |
| RI | 5718 (3237) | 219 (178) | 78 (66) | 25 (23) |
| SE | 37,525 (12,232) | 1692 (1267) | 494 (418) | 154 (135) |
Events in sQTL—number of unique ASEs in the ME-only sQTL analysis with FDR < 10%
ME causal and ME highest causal—number of unique ASEs with pMEIs predicted as a causal variant or a causal variant with the highest probability, respectively
The numbers in the parentheses are the number of genes/gene clusters of the corresponding ASEs. Genes sharing the same exons were merged into gene clusters by SUPPA when calculating PSI scores. Because some genes have multiple ASEs, the overall gene count is not the sum of the gene count in different ASEs
ASEs, alternative splicing events; A3/A5, alternative 3′/5′ splice site; AF/AL, alternative first/last exon; MX, mutually exclusive exon; RI, retained intron; SE, skipping exon
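The PSI (percent spliced in) score mentioned in the footnote above is commonly defined as the share of transcript abundance that carries the inclusion form of an event. The sketch below assumes that definition; the function name `psi` and the TPM values are hypothetical, and it does not reproduce SUPPA's own filtering or gene-cluster handling.

```python
def psi(inclusion_tpms, exclusion_tpms):
    """PSI for one alternative splicing event in one sample.

    inclusion_tpms: TPM of transcripts that contain the inclusion form.
    exclusion_tpms: TPM of transcripts that contain the alternative form.
    Returns None when no transcript of the event is expressed (assumption:
    the score is treated as undefined in that case).
    """
    included = sum(inclusion_tpms)
    total = included + sum(exclusion_tpms)
    if total == 0:
        return None
    return included / total


# Hypothetical skipping-exon event: two inclusion isoforms, one skipping isoform.
example_psi = psi([3.0, 1.0], [4.0])
```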
Fig. 3 Overview of the sQTL analysis in ME-only analysis. a Number of detected sQTLs with Benjamini-Hochberg FDR < 10% in each tissue. Bars are colored by tissue clusters based on cis-sQTL as shown in b (tree). b Similarity (Spearman’s correlation coefficient ρ) between different tissues based on cis-sQTL (lower triangle) and PSI values (upper triangle). ASE-pMEI pairs with FDR < 10% in at least one tissue are selected for the analysis. The tree was based on the hierarchical clustering of the cis-sQTL results, and the branches are colored in four groups. Tissue text colors in a and b were based on the hierarchical clustering tree of PSI results (data not shown). c The relationship between the sQTL count (FDR < 10%) and the individual count in different tissues. Tissue text is colored by tissue clusters based on cis-sQTL in b (tree). The axes are in log scale. d ASE-pMEI pair count and the number of tissues in which they were detected as significant for events internal or at the edge of the gene. e Effect size (beta values) distribution for ASEs internal or at the edge of different pMEIs. Tissue abbreviations are the same as in Fig. 1
Fig. 4 Correlation between eQTL and sQTL analyses. a Correlation of p values of eQTLs and sQTLs. Average -log10(p values) of sQTLs were plotted against -log10(p values) of eQTLs divided into five bins. b Effect size (|beta|) of sQTL versus eQTL. Average |beta| values of sQTLs were plotted against eQTLs with their |beta| values divided into five bins. a, b Error bars are 95% confidence intervals. Only sQTL and eQTL pairs that shared the same gene, tissue, and pMEI were included in the analysis. c The number of pMEIs detected in the eQTL or sQTL analysis. d Count of pMEIs identified in the eQTL or sQTL analysis in different allele frequency groups. The pMEIs were divided into 10 groups based on their allele frequencies so that each group has an equal number of pMEIs
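The equal-sized allele frequency groups described for Fig. 4d can be produced by ranking pMEIs and cutting the ranking into equal slices. The following is a minimal sketch; the function name `equal_count_groups` and the allele frequencies are hypothetical, and how the authors handled ties or uneven group sizes is not stated.

```python
def equal_count_groups(values, n_groups=10):
    """Split item indices into n_groups of (near-)equal size by ascending value."""
    n = len(values)
    # Rank items from the lowest to the highest value.
    order = sorted(range(n), key=lambda i: values[i])
    groups = [[] for _ in range(n_groups)]
    for rank, idx in enumerate(order):
        # Integer arithmetic maps ranks 0..n-1 onto groups 0..n_groups-1.
        groups[rank * n_groups // n].append(idx)
    return groups


# Hypothetical allele frequencies for 20 pMEIs: 0.05, 0.09, ..., 0.81.
allele_freqs = [round(0.05 + 0.04 * i, 2) for i in range(20)]
af_groups = equal_count_groups(allele_freqs, n_groups=10)
```

When the number of pMEIs is not divisible by the number of groups, the group sizes in this sketch differ by at most one.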
Fig. 5 Experimental validation of eQTLs (a) and sQTLs (b). Gene names were labeled in the x-axis, and those underlined showed the effects in the same direction as predicted in the computational analysis. For sQTL experiments, one constitutive exon was included with the alternative exon. Results are shown for the ME-containing construct and the construct without the ME. In b, the direction of the arrow represents the strand of the ME on the chromosome. *p < 0.05, **p < 0.01, ***p < 0.001