Tuuli Lappalainen, Michael Sammeth, Marc R Friedländer, Peter A C 't Hoen, Jean Monlong, Manuel A Rivas, Mar Gonzàlez-Porta, Natalja Kurbatova, Thasso Griebel, Pedro G Ferreira, Matthias Barann, Thomas Wieland, Liliana Greger, Maarten van Iterson, Jonas Almlöf, Paolo Ribeca, Irina Pulyakhina, Daniela Esser, Thomas Giger, Andrew Tikhonov, Marc Sultan, Gabrielle Bertier, Daniel G MacArthur, Monkol Lek, Esther Lizano, Henk P J Buermans, Ismael Padioleau, Thomas Schwarzmayr, Olof Karlberg, Halit Ongen, Helena Kilpinen, Sergi Beltran, Marta Gut, Katja Kahlem, Vyacheslav Amstislavskiy, Oliver Stegle, Matti Pirinen, Stephen B Montgomery, Peter Donnelly, Mark I McCarthy, Paul Flicek, Tim M Strom, Hans Lehrach, Stefan Schreiber, Ralf Sudbrak, Angel Carracedo, Stylianos E Antonarakis, Robert Häsler, Ann-Christine Syvänen, Gert-Jan van Ommen, Alvis Brazma, Thomas Meitinger, Philip Rosenstiel, Roderic Guigó, Ivo G Gut, Xavier Estivill, Emmanouil T Dermitzakis.
Abstract
Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
Year: 2013 PMID: 24037378 PMCID: PMC3918453 DOI: 10.1038/nature12531
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Figure 1. Transcriptome variation
a) Spearman rank correlation of replicate samples, based on mRNA exon and miRNA quantifications of 5 individuals sequenced 8 and 7 times for mRNA and miRNA, respectively, and separated by the individual or the sequencing lab being the same or different. The quantifications have been normalized only for the total number of mapped reads (see Fig. S11 for correlations after normalization). b) The proportion of expression level variation (as opposed to splicing) of the total transcription variation between individuals in each population, measured per gene. c) Proportion of genes with differential expression levels and/or transcript usage between population pairs, out of the total listed on the right-hand side. d) Network of significant miRNA families (P<0.001; yellow) and their significantly associated mRNA targets (P<0.05; purple). The edges display negative (green) and positive (red) associations.
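As an illustration of the replicate comparison in panel a, the sketch below computes a Spearman rank correlation between two quantification vectors using only the Python standard library. The function names and the toy values are hypothetical and are not taken from the study's pipeline. Because the statistic depends only on ranks, scaling one sample by a positive constant (for example, by its total number of mapped reads) leaves the correlation unchanged.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical exon quantifications for two replicates of one individual.
replicate_a = [120.0, 5.0, 43.0, 0.0, 310.0]
replicate_b = [98.0, 7.0, 51.0, 1.0, 280.0]
rho = spearman(replicate_a, replicate_b)
```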
Numbers of transcriptome features with a QTL (FDR 5%)
| | Total | EUR (n=373) | YRI (n=89) | Union |
|---|---|---|---|---|
| exon eQTL | 12981 genes | 7390 | 2369 | 7825 |
| gene eQTL | 13703 genes | 3259 | 501 | 3773 |
| transcript ratio QTL | 7855 genes | 620 | 83 | 639 |
| mirQTL | 644 miRNAs | 57 | 15 | 60 |
| Transcribed repeat eQTL | 43875 repeats | 5763 | 1055 | 6069 |
Figure 2. Transcriptome QTLs
a) Enrichment of EUR exon eQTLs in functional annotations for the 1st, 2nd, 5th and 10th best associating eQTL variant per gene, relative to a matched null set of variants denoted by the horizontal line. The numbers are −log10 p-values of a Fisher test between the best eQTL and the null. b) Classification of changes caused by transcript ratio QTLs. c) The rank of the best Omni2.5M SNP among the significant EUR eQTL variants per gene. d) DGKD gene locus where an intronic SNP rs838705 is associated with calcium levels (red), and the top eQTL variant 21 kb downstream (blue) is a very likely causal variant, close to the TSS of two transcripts in the MEF2A,C binding region.
Figure 3. Allele-specific effects on expression and transcript structure
a) Sharing of allele-specific expression (ASE) and transcript structure (ASTS) signals: the distribution of ASTS p-values of the sites with significant (p<0.005) ASE in the same individual, and vice versa. The ASE p-values are calculated from sites sampled to exactly 30 reads. The numbers denote the pi1 statistic measuring the enrichment of low p-values. b) Frequency of significant ASE events in the population (x-axis) and their effect size (|0.5 – REF/TOTAL|), calculated per ASE SNP. Only ASE SNPs with >=20 heterozygote individuals with >=30 reads were included, and the estimates were corrected for coverage bias and false positives by sampling and permutations. c) Enrichment of variants in regulatory annotations relative to a matched null distribution for the most significant eQTL variants, and for the subset of these that are also rSNPs. Categories with the highest amount of data are shown (see Fig. S36 for all categories, see also Fig. 2a).
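The ASE quantities named in this caption can be sketched in a few lines of Python. The effect size |0.5 – REF/TOTAL| and the downsampling to exactly 30 reads come from the caption. The exact two-sided binomial test against an expected reference ratio of 0.5 is an assumption made here for illustration, since the caption does not state which test was used. All function names are hypothetical.

```python
import random
from math import comb


def ase_effect_size(ref_reads, total_reads):
    """Effect size |0.5 - REF/TOTAL| as defined in the caption."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return abs(0.5 - ref_reads / total_reads)


def downsample_ref_count(ref_reads, alt_reads, depth=30, rng=None):
    """Sample `depth` reads without replacement; return the reference count."""
    rng = rng or random.Random()
    if ref_reads + alt_reads < depth:
        raise ValueError("site has fewer reads than the requested depth")
    reads = [1] * ref_reads + [0] * alt_reads  # 1 = reference allele read
    return sum(rng.sample(reads, depth))


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value (ASSUMED test, not from the source).

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


# Hypothetical heterozygous site with 52 reference and 18 alternative reads.
ref_at_30 = downsample_ref_count(52, 18, depth=30, rng=random.Random(0))
p_value = binomial_two_sided_p(ref_at_30, 30)
effect = ase_effect_size(ref_at_30, 30)
```

Fixing every site at the same depth makes p-values comparable across sites, because the power of a count-based test would otherwise grow with coverage.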
Figure 4. Transcriptome effects of loss-of-function variants
A) Nonsense-mediated decay due to premature stop codon variants was measured using allele-specific expression. The distribution of non-reference allele ratios (on the y-axis) for premature stop variants sorted on the x-axis according to derived allele frequency, split into sites predicted to trigger and to escape NMD. The dots denote the median across individuals, and the vertical lines show the range of ratios for variants carried by several individuals. The grey vertical lines denote derived allele frequencies of 0, 0.001 and 0.01. B) Exon inclusion scores for variable exons for individuals that carry 0, 1 or 2 copies of variants that destroy a splice motif, with p-value from a Mann-Whitney test.