Nicholas E Banovich, Yang I Li, Anil Raj, Michelle C Ward, Peyton Greenside, Diego Calderon, Po Yuan Tung, Jonathan E Burnett, Marsha Myrthil, Samantha M Thomas, Courtney K Burrows, Irene Gallego Romero, Bryan J Pavlovic, Anshul Kundaje, Jonathan K Pritchard, Yoav Gilad.
Abstract
Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells both to study the impact of genetic variation on gene regulation across different cell types and as models for complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type-specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we used the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type-specific chromatin accessibility.
Year: 2017 PMID: 29208628 PMCID: PMC5749177 DOI: 10.1101/gr.224436.117
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Systematic measurements of molecular phenotypes across reprogramming and differentiation. (A) Summary of data collection. (B) Correlation matrix of gene expression from our samples and from ENCODE (*) and GTEx samples. Our LCL samples cluster most closely with ENCODE LCL samples, while our iPSC and iPSC-CM lines cluster most closely with H1-ESC (ENCODE) and heart (GTEx), respectively. Dark purple: GTEx bone marrow. (C) Violin plots of the per-individual log2 average squared distance from the mean (Supplemental Materials) for iPSC, LCL, and iPSC-CM gene expression levels. Corresponding plots for chromatin accessibility and DNA methylation levels are shown in Supplemental Figure S7.
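The dispersion measure in panel C can be sketched in a few lines of Python. This is a minimal illustration under an assumed definition: for each feature (gene), take the mean across individuals; for each individual, average the squared deviations from those means over all features; then take log2. The exact normalization is described in the Supplemental Materials and may differ, and the function name and toy matrix below are hypothetical.

```python
import numpy as np

def per_individual_log2_msd(expr: np.ndarray) -> np.ndarray:
    """Log2 of each individual's average squared distance from the mean.

    expr: array of shape (n_features, n_individuals), e.g. normalized
    expression levels for one cell type (assumed layout).
    Returns one value per individual (column).
    """
    # Mean of each feature (row) across individuals
    feature_means = expr.mean(axis=1, keepdims=True)
    # Squared deviation of every measurement from its feature mean
    sq_dev = (expr - feature_means) ** 2
    # Average over features for each individual, then take log2
    return np.log2(sq_dev.mean(axis=0))

# Toy example (made-up numbers): 3 features x 4 individuals
toy = np.array([
    [1.0, 2.0, 3.0, 6.0],
    [0.0, 0.0, 4.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
])
print(per_individual_log2_msd(toy))
```

Individuals whose profiles sit far from the panel mean get larger values, so a wider violin indicates more inter-individual spread within that cell type.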
Figure 2. Mechanisms of cell-type–specific regulatory variation. (A) QQ-plot of LCL and iPSC eQTL signal conditioned on LCL- and iPSC-specific caQTLs. The higher enrichment of LCL (iPSC) eQTLs among LCL (iPSC) caQTLs links cell-type–specific regulation of chromatin accessibility to cell-type–specific regulation of gene expression. (B) Chromatin accessibility signal around cell-type–specific caQTLs in the corresponding cell types (black rectangles) and in the other cell types. The lack of accessibility in other cell types suggests that cell-type–specific caQTLs often affect cell-type–specific accessible regions (e.g., panel C). (C,D) Examples of cell-type–specific regulatory effects of genetic variation. The SNP is correlated with accessibility of an iPSC-specific open chromatin region in iPSCs only (C), or of a nonspecific open chromatin region in LCLs only (D). (E) Scatter plot of iPSC and LCL chromatin accessibility at iPSC-specific caQTLs. About 20% of iPSC-specific caQTLs are also accessible in LCLs; the corresponding plot for LCL-specific caQTLs is shown in Supplemental Figure S15. (F) Example of an iPSC-specific caQTL that is also an iPSC-specific eQTL. SNP rs9367277 is associated both with chromatin accessibility of a strong enhancer and with expression of the CD2AP gene in iPSCs. Interestingly, rs9367277 lies in a transposable element of the ERVL family, which is preferentially activated in embryonic stem cells (Kunarso et al. 2010).
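The conditioning in panel A amounts to drawing separate QQ-curves for the eQTL p-values of SNPs inside and outside caQTLs; enrichment shows up as the caQTL curve rising above the other. The sketch below assumes uniform expected quantiles i/(n + 1) and uses simulated p-values; the function names and the simulation are illustrative, not the authors' pipeline.

```python
import numpy as np

def qq_coordinates(pvalues):
    """Expected vs. observed -log10(p) for a QQ-plot against Uniform(0, 1)."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    # Expected uniform order statistics i / (n + 1), i = 1..n, matched to sorted p
    expected = np.arange(1, n + 1) / (n + 1)
    return -np.log10(expected), -np.log10(p)

def conditional_qq(eqtl_p, is_caqtl):
    """Separate QQ coordinates for eQTL p-values of SNPs inside vs. outside caQTLs."""
    eqtl_p = np.asarray(eqtl_p, dtype=float)
    mask = np.asarray(is_caqtl, dtype=bool)
    return {
        "caQTL": qq_coordinates(eqtl_p[mask]),
        "other": qq_coordinates(eqtl_p[~mask]),
    }

# Simulated p-values (hypothetical): 900 null SNPs, 100 caQTL SNPs with real signal
rng = np.random.default_rng(0)
null_p = rng.uniform(size=900)
signal_p = rng.beta(0.3, 1.0, size=100)  # skewed toward small p-values
pvals = np.concatenate([null_p, signal_p])
mask = np.concatenate([np.zeros(900, dtype=bool), np.ones(100, dtype=bool)])

qq = conditional_qq(pvals, mask)
exp_ca, obs_ca = qq["caQTL"]
# Plotted, obs_ca vs. exp_ca would lie above the diagonal; qq["other"] would track it
```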
Figure 3. Predicting chromatin activity from sequence using deep neural networks. (A) OrbWeaver is a four-layer neural network in which the parameters of the first convolutional layer are fixed to known position weight matrices (PWMs) of human transcription factors. Each convolutional and dense layer uses the rectified linear unit (ReLU) activation function. (B) The OrbWeaver model for one cell type predicts open chromatin in other cell types poorly (gray), indicating that the model captures cell-type–specific regulatory elements. (C) Transcription factors (TFs) important for each locus were identified using DeepLIFT scores; this panel shows the top TFs for each of the seven categories of chromatin activity and the fraction of loci they explain. (D) An example of a locus that is open in both iPSCs and LCLs but was identified as an iPSC-specific caQTL. The left subpanels show the raw ATAC-seq signal in each cell type, stratified by genotype at the most significant SNP of the iPSC caQTL. The right subpanels show the marginal change in OrbWeaver predictions caused by mutating the reference base at each position to an alternate base. The sequence shown corresponds to the shaded region in the left subpanels, and the reported Δpred values are the change in prediction between the two alleles of the most significant SNP. The TF identified by DeepLIFT as important for this locus is YB-1, a factor highly expressed in all three cell types. (E) Scatter plot comparing the observed allelic imbalance at iPSC caQTLs, estimated with WASP, against the predicted difference in median chromatin activity between haplotypes tagged by the two alleles of the causal SNP. Note that the OrbWeaver model was learned from the reference genome sequence alone and had no information about genetic variation in the population when its parameters were learned.
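Two ideas from panels A and D, a first convolutional layer with fixed PWM weights followed by ReLU, and in silico mutagenesis that records Δpred for every alternate base, can be illustrated with a toy model. This is not OrbWeaver: the 3-bp "CAT" motif, the max-pooling step, and the single linear readout weight are hypothetical stand-ins chosen so the arithmetic is easy to follow; a real model would scan many TF PWMs and pass the activations through trained dense layers.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len, 4) one-hot matrix in A, C, G, T order."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]

def fixed_pwm_layer(x, pwms):
    """Scan fixed (non-trainable) motif matrices along the sequence, then apply ReLU.

    x: (L, 4) one-hot sequence; pwms: list of (w, 4) log-odds matrices of equal width.
    Returns an (L - w + 1, n_motifs) activation matrix.
    """
    w = pwms[0].shape[0]
    windows = np.stack([x[i:i + w] for i in range(x.shape[0] - w + 1)])  # (P, w, 4)
    scores = np.einsum("pwk,mwk->pm", windows, np.stack(pwms))  # motif match scores
    return np.maximum(scores, 0.0)  # ReLU

def toy_predict(seq, pwms, readout):
    """Hypothetical stand-in for a trained model: max-pool each motif, linear readout."""
    act = fixed_pwm_layer(one_hot(seq), pwms)
    return float(act.max(axis=0) @ readout)

def in_silico_mutagenesis(seq, predict):
    """Change in prediction for substituting each alternate base at each position."""
    ref_pred = predict(seq)
    delta = np.zeros((len(seq), 4))  # reference-base entries stay 0
    for i, ref_base in enumerate(seq):
        for j, alt in enumerate(BASES):
            if alt != ref_base:
                mutant = seq[:i] + alt + seq[i + 1:]
                delta[i, j] = predict(mutant) - ref_pred
    return delta

# Hypothetical motif: +1 for the matching base of "CAT", -1 otherwise
motif = np.full((3, 4), -1.0)
for pos, base in enumerate("CAT"):
    motif[pos, BASES.index(base)] = 1.0
pwms = [motif]
readout = np.array([0.5])
seq = "GGCATGG"

def predict(s):
    return toy_predict(s, pwms, readout)

delta = in_silico_mutagenesis(seq, predict)
print(predict(seq))
print(delta)
```

In this toy case the reference prediction is 1.5 (one perfect CAT match scoring 3, times 0.5). Any substitution of the central C drops the best match score to 1 and the prediction to 0.5 (Δpred = -1.0), while substitutions at position 0 leave the prediction unchanged, mirroring how the per-base Δpred track in panel D highlights bases inside a TF motif.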
Figure 4. Modeling complex disease using iPSC-derived cells. (A) Heat map of enrichment P-values of GWAS signals near genes with cell-type–specific expression (Supplemental Materials). (B) Enrichment of SNPs associated with four different diseases in different partitions of the genome, computed using LD score regression (point estimates ±95% confidence intervals). In both analyses, the autoimmune traits (multiple sclerosis [MS] or Crohn's disease [CD] and rheumatoid arthritis [RA]) are enriched near genes and in chromatin that are more active in LCLs, whereas the heart-related traits (coronary artery disease [CAD] and myocardial infarction [MI]) are enriched in regions active in iPSC-CMs.
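Panel B reports enrichment point estimates with 95% confidence intervals. In stratified LD score regression, the enrichment of an annotation is the proportion of SNP heritability it explains divided by the proportion of SNPs it contains. The sketch below assumes the heritability proportion and a standard error have already been estimated (LDSC obtains its standard errors with a block jackknife, which is not reproduced here) and forms a normal-approximation interval; the Annotation class and the example numbers are hypothetical.

```python
from dataclasses import dataclass

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal

@dataclass
class Annotation:
    name: str
    prop_snps: float      # fraction of SNPs falling in the annotation
    prop_h2: float        # estimated fraction of SNP heritability in the annotation
    enrichment_se: float  # standard error of the enrichment estimate (given, not computed)

def enrichment_with_ci(a: Annotation):
    """Enrichment = Pr(h2) / Pr(SNPs), with a normal-approximation 95% CI."""
    est = a.prop_h2 / a.prop_snps
    half_width = Z_95 * a.enrichment_se
    return est, est - half_width, est + half_width

# Hypothetical annotation: 2% of SNPs carrying 10% of the heritability
lcl = Annotation("LCL open chromatin (hypothetical)", prop_snps=0.02,
                 prop_h2=0.10, enrichment_se=1.5)
est, lo, hi = enrichment_with_ci(lcl)
print(f"{lcl.name}: {est:.2f} [{lo:.2f}, {hi:.2f}]")
```

An interval that excludes 1 indicates that the annotation carries more heritability than its share of SNPs would predict, which is how the LCL-active and iPSC-CM-active partitions are compared across traits.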