Mauro Pala1,2,3, Zachary Zappala4, Mara Marongiu1, Xin Li2, Joe R Davis4, Roberto Cusano1, Francesca Crobu1, Kimberly R Kukurba4, Michael J Gloudemans5, Frederic Reinier6, Riccardo Berutti3,6, Maria G Piras1, Antonella Mulas1, Magdalena Zoledziewska1, Michele Marongiu1, Elena P Sorokin4, Gaelen T Hess4, Kevin S Smith2, Fabio Busonero1, Andrea Maschio1, Maristella Steri1, Carlo Sidore1, Serena Sanna1, Edoardo Fiorillo1, Michael C Bassik4, Stephen J Sawcer7, Alexis Battle8, John Novembre9, Chris Jones6, Andrea Angius1, Gonçalo R Abecasis10, David Schlessinger11, Francesco Cucca1,3, Stephen B Montgomery2,4.
Abstract
Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.
Year: 2017 PMID: 28394350 PMCID: PMC5411016 DOI: 10.1038/ng.3840
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Expression traits with at least one eQTL
We report the number of tests performed and the number of significant QTL associations for different expression traits at a false discovery rate of 5%. Associations listed as significant by BH are those that remain significant after Bonferroni correction and Benjamini-Hochberg adjustment (see Methods).
| Measurement | Trait type | # of tested traits | # of traits with at least one QTL (FDR 5%), BH | # of traits with at least one QTL (FDR 5%), by permutation |
|---|---|---|---|---|
| Gene-level | Protein coding | 11,477 | 8,381 (73%) | 9,019 (79%) |
| Gene-level | lncRNA | 1,694 | 991 (69%) | 1,258 (74%) |
| Gene-level | miRNA precursors | 172 | 48 (27%) | 55 (32%) |
| Gene-level | Other | 1,900 | 935 (39%) | 835 (44%) |
| Gene-level | Total | 15,243 | 10,329 (68%) | 11,167 (73%) |
| Isoform-proportion | Protein coding | 11,116 | 3,865 (35%) | 4,515 (41%) |
| Isoform-proportion | lncRNA | 826 | 335 (41%) | 373 (45%) |
| Isoform-proportion | Other | 661 | 213 (32%) | 323 (49%) |
| Isoform-proportion | Total | 12,603 | 4,413 (35%) | 5,120 (41%) |
Isoform results are reported at the gene level (only one sQTL per gene is reported).
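The two corrections named in the table caption can be illustrated with a minimal sketch. It shows generic Bonferroni and Benjamini-Hochberg adjustments on made-up p-values; how the study combined them is described in its Methods and is not reproduced here. The function names and the example p-values are illustrative assumptions.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, not taken from the study.
pvals = [0.001, 0.008, 0.02, 0.03, 0.6]
n_bonferroni = sum(p <= 0.05 for p in bonferroni(pvals))
n_bh = sum(p <= 0.05 for p in benjamini_hochberg(pvals))
print(n_bonferroni, n_bh)
```

On these toy values the BH adjustment keeps more tests below the 5% threshold than Bonferroni does, which is the usual trade-off between controlling the false discovery rate and controlling the family-wise error rate.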
Independent QTLs segmented by gene type
We report the number of independent QTLs for gene-level and isoform-level analyses. Isoform results are grouped by their respective gene.
| Max # of independent QTLs | Gene-level, protein coding (# of genes) | Gene-level, lncRNA (# of genes) | Gene-level, miRNA precursors (# of genes) | Isoform-proportion, protein coding (# of genes) | Isoform-proportion, lncRNA (# of genes) |
|---|---|---|---|---|---|
| 1 | 4,215 | 598 | 44 | 3,489 | 281 |
| 2 | 2,833 | 386 | 8 | 799 | 60 |
| 3 | 1,235 | 170 | 2 | 165 | 22 |
| 4 | 428 | 66 | 0 | 36 | 8 |
| 5 | 175 | 18 | 1 | 18 | 1 |
| 6 | 82 | 6 | 0 | 3 | 1 |
| 7 | 27 | 5 | 0 | 5 | 0 |
| 8 | 14 | 3 | 0 | 0 | 0 |
| ≥9 | 10 | 6 | 0 | 0 | 0 |
| Total | 9,019 | 1,258 | 55 | 4,515 | 373 |
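As a reading aid, the counts in this table can be summarized as the share of genes with two or more independent QTLs. The short sketch below only re-adds the numbers reported above; the dictionary keys and function name are illustrative.

```python
# Counts copied from the table: index 0 = one independent QTL, last = nine or more.
genes_by_qtl_count = {
    "gene-level protein coding": [4215, 2833, 1235, 428, 175, 82, 27, 14, 10],
    "gene-level lncRNA": [598, 386, 170, 66, 18, 6, 5, 3, 6],
    "gene-level miRNA precursors": [44, 8, 2, 0, 1, 0, 0, 0, 0],
    "isoform-proportion protein coding": [3489, 799, 165, 36, 18, 3, 5, 0, 0],
    "isoform-proportion lncRNA": [281, 60, 22, 8, 1, 1, 0, 0, 0],
}


def multi_qtl_share(counts):
    """Return (total genes, genes with >= 2 independent QTLs, their share)."""
    total = sum(counts)
    multi = total - counts[0]
    return total, multi, multi / total


for label, counts in genes_by_qtl_count.items():
    total, multi, share = multi_qtl_share(counts)
    print(f"{label}: {multi} of {total} genes ({share:.1%})")
```

For gene-level protein-coding genes this gives 4,804 of 9,019 genes, or about 53%, with more than one independent eQTL.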
Figure 1. QTLs show larger effect sizes in Sardinia compared to Europe
The distribution of Spearman correlation coefficients (absolute value) is shown for (a) top expression QTLs (eQTLs) and (b) top allele-specific expression QTLs (aseQTLs) in Sardinia, Geuvadis, and DGN. Top eQTLs and aseQTLs in Sardinia show increased correlations relative to Geuvadis and DGN. To make analyses comparable across studies, 188 unrelated individuals from each study were uniformly processed and analyses were performed on a subset of genes that were quantifiable in all three studies.
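The effect-size measure in this caption, the absolute Spearman correlation, can be sketched as a Pearson correlation on ranks. This is a minimal standard-library illustration, assuming genotypes coded as 0, 1, or 2 alternate alleles and made-up expression values; it is not the study's pipeline.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def abs_spearman(genotypes, expression):
    """Absolute Spearman correlation between genotype and expression."""
    return abs(pearson(average_ranks(genotypes), average_ranks(expression)))


# Hypothetical genotype dosages and expression values for six individuals.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [5.1, 4.8, 6.0, 6.3, 7.2, 7.0]
rho = abs_spearman(genotypes, expression)
print(round(rho, 3))
```

Taking the absolute value makes the measure independent of which allele is counted, so distributions can be compared across cohorts.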
Figure 2. Differentiated eQTLs in Sardinia
(a) Sardinian eQTLs are plotted based on their allele frequency in Europe (measured in the 1000 Genomes Project) and Sardinia. Blue points represent eQTLs in the top 1% of the |ΔAF| distribution. Sample sizes: |ΔAF| > 0.00 (n = 19,108 eQTLs), > 0.05 (n = 6,793), > 0.10 (n = 2,151), > 0.15 (n = 567), > 0.20 (n = 134), and Top 1% (n = 192). (b) eQTLs with larger allele frequency differences compared to Europe have longer tracts of LD decay as potential evidence for recent positive selection. These are compared to eQTLs that have comparable allele frequencies in Sardinia and Europe (allele frequencies within ±2.5%; blue lines) as well as randomly selected, distance-to-TSS-matched, non-eQTL variants with large allele frequency changes (black line). (c) eQTLs linked to multiple sclerosis variants and malaria-associated genes are both enriched for allele frequency differences between Sardinia and Europe. (d) The significance of the top ten trait enrichments for differentiated eQTLs (red = ΔAF, blue = FST) after Bonferroni correction for all possible tests. Traits with fewer than 10 eQTLs in LD were filtered out.
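The binning in panel (a) can be illustrated with a minimal sketch that computes |ΔAF| per variant, counts variants above each threshold, and picks the top fraction of the distribution. The allele frequencies and function names below are hypothetical, and a 20% fraction is used only because the toy list has five entries.

```python
def abs_delta_af(af_sardinia, af_europe):
    """Absolute allele frequency difference between the two populations."""
    return abs(af_sardinia - af_europe)


def count_above(deltas, thresholds):
    """Number of variants whose |delta AF| exceeds each threshold."""
    return {t: sum(1 for d in deltas if d > t) for t in thresholds}


def top_fraction_indices(deltas, fraction):
    """Indices of the variants in the top `fraction` of the |delta AF| distribution."""
    k = max(1, round(len(deltas) * fraction))
    ranked = sorted(range(len(deltas)), key=lambda i: deltas[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical (Sardinia, Europe) allele frequencies for five variants.
toy = [(0.30, 0.28), (0.55, 0.31), (0.12, 0.10), (0.40, 0.52), (0.08, 0.07)]
deltas = [abs_delta_af(s, e) for s, e in toy]
counts = count_above(deltas, [0.00, 0.05, 0.10, 0.15, 0.20])
top = top_fraction_indices(deltas, 0.20)
print(counts, top)
```

With real data the fraction would be 0.01, matching the top 1% highlighted in the figure.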
Figure 3. Outlier gene expression in Sardinian trios
(a) An example of a significant gene expression outlier effect that segregates in a single Sardinian trio. The father and daughter both under-express the RINL gene and share a rare splicing variant. (b) A scatterplot showing the sharing of extreme gene expression patterns between parents and children in 61 Sardinian trios, with significant outliers highlighted (orange = 5% FDR and yellow = 10% FDR). (c) Heterozygous sites in outlier genes show elevated levels of allelic imbalance (AI) in outlier individuals (red) versus the rest of the population (gray). Allelic imbalance (AI) measures the absolute deviance of the reference allele ratio from 0.5 at heterozygous sites. (d) Correlation matrices for gene expression and allele-specific expression within outlier trios suggest that the extreme regulatory effects are restricted to the affected individuals and not primarily a family-specific event due to a shared environment. (e) The relationship between outlier gene expression and allelic imbalance (AI) in outlier (red) and non-outlier (gray) individuals. The mean ± one s.d. is shown for each bin.
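The allelic imbalance definition in panel (c) translates directly into a one-line calculation. The sketch below assumes per-site read counts for the reference and alternate alleles are available; the function name and the counts are illustrative, and any read-depth filtering the study applied is not shown.

```python
def allelic_imbalance(ref_reads, alt_reads):
    """Absolute deviation of the reference allele ratio from 0.5 at one heterozygous site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return abs(ref_reads / total - 0.5)


# Hypothetical read counts at three heterozygous sites.
balanced = allelic_imbalance(50, 50)   # equal expression of both alleles
skewed = allelic_imbalance(90, 10)     # reference allele dominates
one_sided = allelic_imbalance(0, 20)   # only the alternate allele is seen
print(balanced, skewed, one_sided)
```

AI ranges from 0 (both alleles equally expressed) to 0.5 (only one allele observed), so higher values in outlier individuals point to a regulatory effect acting on a single haplotype.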
Figure 4. Properties of rare, shared variants near outlier genes
(a) Relative enrichment in the number of rare variants transmitted between outlier parents and children versus non-outlier parents and children. Relative enrichments were calculated in overlapping windows of 5 kb for the 250 kb regions adjacent to the TSS and TES of outlier genes. Enrichment is measured as the relative risk of finding rare shared variants in outlier versus non-outlier lineages in each window. (b) Shared rare variants in outlier lineages are enriched for functional regions of chromatin in peripheral blood and splice donor/acceptor regions. Enrichments are shown as the log odds ratio derived from Fisher’s exact tests with 95% confidence intervals. (c) The position of shared rare variants is plotted relative to the TSS against the regression coefficient derived from the rare eQTL analysis. The color represents under-expression (blue) and over-expression (yellow) rare eQTLs, and the size indicates relative significance. (d) Metrics of conservation, evolutionary constraint, fitness, and deleteriousness can identify the most significant rare eQTLs. The mean ± one s.d. is shown for each bin.
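The two enrichment measures in panels (a) and (b) can be sketched from a 2×2 table of counts. The relative risk follows the definition in the caption. For the log odds ratio, the sketch uses the sample odds ratio with a Wald 95% interval as a simpler stand-in; it is not the conditional estimate and interval that a Fisher's exact test procedure would give. All counts and function names are hypothetical.

```python
import math


def relative_risk(outlier_hits, outlier_total, control_hits, control_total):
    """Ratio of the rare-shared-variant rate in outlier vs non-outlier lineages."""
    return (outlier_hits / outlier_total) / (control_hits / control_total)


def log_odds_ratio_wald(a, b, c, d, z=1.96):
    """Sample log odds ratio of a 2x2 table [[a, b], [c, d]] with a Wald interval.

    All four cells must be non-zero. This is an approximation, not Fisher's exact method.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or - z * se, log_or + z * se


# Hypothetical window: 30 of 100 outlier lineages vs 10 of 100 non-outlier
# lineages carry a rare shared variant.
rr = relative_risk(30, 100, 10, 100)

# Hypothetical annotation table: variants inside/outside an annotated region,
# in outlier (20, 80) and non-outlier (10, 90) lineages.
log_or, lower, upper = log_odds_ratio_wald(20, 80, 10, 90)
print(rr, log_or, lower, upper)
```

An interval for the log odds ratio that excludes zero indicates enrichment or depletion at the chosen confidence level.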
Figure 5. Gene expression patterns in carriers of rare splicing variants
We identified five splicing variants in under-expression outliers. For each variant, the expression level of the affected gene is shown in red for heterozygous carriers in the Sardinia cohort and in gray for individuals homozygous for the reference allele. The rare splicing variants for each gene are: chr12:g.121570899G>T for P2RX7; chr5:g.68490523G>A for CENPH; chr1:g.152009388C>T for S100A11; chr19:g.39368871C>T for RINL; and chr6:g.33237597C>G for VPS52.