Sándor Spisák, Kate Lawrenson, Yanfang Fu, István Csabai, Rebecca T Cottman, Ji-Heui Seo, Christopher Haiman, Ying Han, Romina Lenci, Qiyuan Li, Viktória Tisza, Zoltán Szállási, Zachery T Herbert, Matthew Chabot, Mark Pomerantz, Norbert Solymosi, Simon A Gayther, J Keith Joung, Matthew L Freedman.
Abstract
The vast majority of disease-associated single-nucleotide polymorphisms (SNPs) mapped by genome-wide association studies (GWASs) are located in the non-protein-coding genome, but establishing the functional and mechanistic roles of these sequence variants has proven challenging. Here we describe a general pipeline in which candidate functional SNPs are first evaluated by fine mapping, epigenomic profiling, and epigenome editing, and then interrogated for causal function by using genome editing to create isogenic cell lines followed by phenotypic characterization. To validate this approach, we analyzed the 6q22.1 prostate cancer risk locus and identified rs339331 as the top-scoring SNP. Epigenome editing confirmed that the rs339331 region possessed regulatory potential. By using transcription activator-like effector nuclease (TALEN)-mediated genome editing, we created a panel of isogenic 22Rv1 prostate cancer cell lines representing all three genotypes (TT, TC, CC) at rs339331. Introduction of the 'T' risk allele increased transcription of the regulatory factor X6 (RFX6) gene, increased homeobox B13 (HOXB13) binding at the rs339331 region, and increased deposition of the enhancer-associated H3K4me2 histone mark at the rs339331 region compared to lines homozygous for the 'C' protective allele. The cell lines also differed in cellular morphology and adhesion, and pathway analysis of differentially expressed genes suggested an influence of androgens. In summary, we have developed and validated a widely accessible approach that can be used to establish functional causality for noncoding sequence variants identified by GWASs.
Year: 2015 PMID: 26398868 PMCID: PMC4746056 DOI: 10.1038/nm.3975
Source DB: PubMed Journal: Nat Med ISSN: 1078-8956 Impact factor: 53.440
Figure 1. Overview of the CAUSEL pipeline
(a) Fine Mapping – An initial GWAS identifies a trait-associated locus (green). Fine mapping reduces the number of candidate causal SNPs (blue). (b) Epigenomic Profiling – Analysis of the colocalization of SNPs with epigenetic features can further prioritize causal variants for epigenome and genome editing. Epigenome Editing – The regulatory potential of the candidate SNPs can be interrogated using epigenome-editing reagents. (c) Genome Editing – The candidate SNP can be altered using nuclease-induced HDR. Because the efficiency of HDR can be low, single-cell cloning and genotyping are necessary. (d) Phenotypic Characterization – The isogenic cell lines can undergo phenotypic assessment for a range of traits, including measurement of gene expression levels and cell-based functional assays. Abbreviations: GWAS = genome-wide association study; DNaseI = DNaseI hypersensitivity peak; HM = histone marks, including H3K4me2 and H3K27ac sites; TF = transcription factor binding sites; DBD = DNA-binding domain (TALE or gRNA-mediated dCas9); LSD1 = lysine-specific histone demethylase 1A; VP64 = VP64 artificial transcription factor activator; T = mutant allele; C = wild-type allele; gDNA = genomic DNA; FokI = FokI nuclease; DSB = double-strand break; HDR = homology-directed repair; ssODN = single-stranded oligonucleotide HDR template carrying the required alteration; eDNA = edited DNA; RT-qPCR = quantitative real-time PCR; CBA = cell-based assay.
Figure 2. Genetic and epigenetic landscape of the 6q22.1 region
(a) Fine mapping of the 6q22 prostate cancer risk region (data from [13]). Each dot represents a SNP, and its association with prostate cancer risk (–log(P value) from a 1-degree-of-freedom Wald test) in a multiethnic cohort (N=18,031 cases and N=18,030 controls) is plotted on the y-axis. rs339331 is shown in purple. The colors represent the degree of linkage disequilibrium with rs339331. (b) Fine mapping revealed 27 correlated variants in this region (green); however, only the rs339331 SNP (red) co-localizes with multiple epigenetic features. (c) Publicly available epigenomic data, including DNaseI peaks (light blue) and H3K4me2 (dark blue). Transcription factor ChIP-seq in LNCaP cells reveals binding of HOXB13 (royal blue), FoxA1 (yellow), and AR (green). The red track shows the AR ChIP-seq enrichment from a human prostate tumor tissue sample. (d) Genomic locations of rs339331, the HOXB13 binding site (red), amplification oligos for HOXB13 ChIP-qPCR (blue), and the TALE-LSD1 or TALE activator DNA-binding locations (DBD represents DNA-binding domain; golden (DBD_1) and purple (DBD_2)). (e) HOXB13 ChIP-qPCR performed in primary prostate tumors. RFX6 expression in response to site-specific recruitment of TALE-LSD1 (f) and TALE-ACT (g) to rs339331. All ChIP-qPCR and gene expression calculations are based on the mean ± standard deviation of three independent experiments (n=9). P values were obtained using the unpaired two-tailed Student’s t-test; **P<0.01.
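For readers unfamiliar with the statistic named in the caption, the following is a minimal sketch of an unpaired two-tailed Student's t-test with pooled variance, written with the Python standard library only. The function names and the toy input values are assumptions made for illustration; they are not the authors' code or data, and the P value is obtained by numerically integrating the t density rather than by a statistics package.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def students_t_test(a, b, steps=2000):
    """Unpaired two-tailed Student's t-test assuming equal variances.

    Returns (t statistic, degrees of freedom, two-tailed P value).
    `steps` must be even because Simpson's rule is used for the integral.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance across both groups
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / se
    # Integrate the density from 0 to |t| with Simpson's rule
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # Two-tailed P value: probability mass outside [-|t|, |t|]
    p = max(0.0, 1 - 2 * area)
    return t, df, p


# Toy values only (hypothetical, not measurements from the study)
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
t_stat, dof, p_value = students_t_test(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof}, P = {p_value:.4f}")
```

The sketch raises a division error if both groups have zero variance; a production analysis would normally rely on an established statistics library instead.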
Figure 3. High-throughput sequencing pipeline and barcoding strategy
(a) Identification of isogenic cell lines by single-cell cloning. This process consists of transferring colonies into tissue culture plates and making replica plates: one for DNA extraction and genotyping and one for continued growth of colonies. The rest of the figure focuses on the sequencing pipeline. In this example, there are five separate plates for genotyping. (b) Each plate (represented by the blue bars) is barcoded by a unique amplicon, and each amplicon contains the area of interest (denoted by the red hash mark). Each amplicon is shifted by 2–3 base pairs relative to the previous one. Target amplicons are generated by PCR using cell lysate from the plates. (c) Within an individual plate, conventional dual barcoding is performed. Thus, the same well (e.g., well A1) on separate plates will have the same conventional dual barcodes but will be distinguished by the amplicon, which is unique for each plate. (d) Amplicon sequencing and variant identification after high-throughput sequencing. Each clone is identified by its plate number (amplicon barcode) and well position (conventional barcode). A full computational pipeline has been developed and is available upon request. See Methods for full details. (e) Each clone can have one of three possible outcomes: unchanged, indels created by the NHEJ pathway, and knock-ins created by the HDR pathway.
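The two-level barcoding described in panels (b) to (e) can be sketched roughly as follows. This is not the authors' pipeline (which the caption says is available upon request); it is a toy illustration under stated assumptions: the reference sequence, SNP position, amplicon length, plate offsets, and index-to-well table are all hypothetical, reads are assumed error-free, and a simple length comparison stands in for a real alignment step.

```python
from typing import Optional, Tuple

# Hypothetical reference window; NOT the real sequence around rs339331.
REFERENCE = "ACGTTGCAAGCTTAGGCTAACGTTAGCCATGGATCCTTAGGCA"
SNP_INDEX = 20      # hypothetical 0-based SNP position; reference base is "C"
AMPLICON_LEN = 30   # hypothetical amplicon length
SEED_LEN = 8        # leading bases used to locate the amplicon start

# Hypothetical plate layout: each amplicon starts 2-3 bp after the previous one.
PLATE_BY_OFFSET = {0: "plate1", 2: "plate2", 5: "plate3", 7: "plate4", 10: "plate5"}

# Hypothetical dual-index pairs mapped to well positions.
WELL_BY_INDEX_PAIR = {("AAAC", "TTTG"): "A1", ("AAAC", "CCGA"): "A2"}


def classify_read(read: str, offset: int, donor_allele: str = "T") -> str:
    """Classify one amplicon read against the expected reference slice."""
    expected = REFERENCE[offset:offset + AMPLICON_LEN]
    if read == expected:
        return "unchanged"
    if len(read) != len(expected):
        # Crude proxy: a length change is treated as an NHEJ-derived indel.
        return "indel (NHEJ)"
    snp_pos = SNP_INDEX - offset
    knock_in = expected[:snp_pos] + donor_allele + expected[snp_pos + 1:]
    if read == knock_in:
        return "knock-in (HDR)"
    return "other substitution"


def identify_clone(read: str, index1: str, index2: str,
                   donor_allele: str = "T") -> Optional[Tuple[str, str, str]]:
    """Return (plate, well, outcome), or None if either barcode is unknown."""
    # Plate: inferred from where the read starts within the reference.
    offset = REFERENCE.find(read[:SEED_LEN])
    plate = PLATE_BY_OFFSET.get(offset)
    # Well: looked up from the conventional dual-index pair.
    well = WELL_BY_INDEX_PAIR.get((index1, index2))
    if plate is None or well is None:
        return None
    return plate, well, classify_read(read, offset, donor_allele)


wild_type = REFERENCE[5:5 + AMPLICON_LEN]
print(identify_clone(wild_type, "AAAC", "TTTG"))  # ('plate3', 'A1', 'unchanged')
```

The sketch classifies a single read; a real clone carries more than one allele, so per-clone genotypes would need the reads from each plate and well to be aggregated before a call is made.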
Figure 4. Sequencing reveals allelic diversity created by genome editing
(a) Summary plot representing the 459 alleles identified by sequencing 1,832 clones (1,920 clones – 40 failed reactions – 48 controls); each row is an allele and black lines indicate deletions. All deletion variants are listed and sorted by the starting position of the deletion. The four bases (A, C, G, T) are color coded; an insertion (I) larger than 1 bp is indicated by light blue cells, and a deletion (D) is indicated by black cells. Sequence logos show a 21-base-pair core region surrounding rs339331 (top) and the positions and deletion frequencies for each nucleotide in the core sequence (bottom). (b) Heatmap showing the frequency of clones with a given deletion size (x-axis) and insertion or substitution size (y-axis). (c) Frequency distribution of the different mutation classes across the 1,832 clones. Genotypes are indicated by C or T; “Mut” is defined as a substitution, insertion, or deletion.
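As a rough illustration of the bookkeeping behind panel (a), the sketch below checks the clone count stated in the caption, sorts deletion alleles by start position, and computes a per-position deletion frequency over a 21 bp core window. The `(start, length)` representation of a deletion and the toy alleles are assumptions made for this example; they are not taken from the study's data.

```python
CORE_LEN = 21  # length of the core region named in the caption

# Clone accounting as stated in the caption
TOTAL_CLONES = 1920
FAILED_REACTIONS = 40
CONTROLS = 48
ANALYZED_CLONES = TOTAL_CLONES - FAILED_REACTIONS - CONTROLS


def deletion_frequency(deletions, core_len=CORE_LEN):
    """Fraction of alleles in which each core position is deleted.

    `deletions` is a list of (start, length) pairs in 0-based core
    coordinates (a hypothetical representation chosen for this sketch).
    """
    counts = [0] * core_len
    for start, length in deletions:
        # Clip each deletion to the core window before counting
        for pos in range(max(start, 0), min(start + length, core_len)):
            counts[pos] += 1
    total = len(deletions)
    return [c / total for c in counts] if total else [0.0] * core_len


# Toy deletion alleles (hypothetical)
toy_deletions = [(8, 5), (10, 1), (0, 3), (9, 4)]

# Sort alleles by deletion start, as the caption describes for the summary plot
sorted_deletions = sorted(toy_deletions)

frequencies = deletion_frequency(toy_deletions)
print(ANALYZED_CLONES)
print(sorted_deletions)
print(frequencies[10])
```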
Figure 5. Genotypic status at rs339331 causally affects RFX6 gene expression, HOXB13 binding, and the H3K4me2 histone modification
(a) Sanger sequencing of the two TALEN HDR-modified (C/C and T/T) and parental (C/T) 22Rv1 cell lines. The rs339331 position is in larger font. (b) RFX6 mRNA abundance was evaluated by qPCR in two clones of each HDR-modified cell line (CC_1/169 and CC_2/096 represent two independent CC clones, and TT_1/160 and TT_5/138 represent two independent TT clones). (c) Genomic location and DNA chromatogram of two SNPs in the RFX6 gene in the 22Rv1 cell line: rs339331 (intron 4) and the rs12202378 heterozygous reporter SNP in intron 12. (d) Each row represents one of the rs339331 genotypes (green) and the two columns represent rs12202378 (blue) sequenced in genomic DNA (gDNA) and heteronuclear cDNA. Genomic DNA (gDNA) was used as a control for allelic balance. HOXB13 enrichment (e) and H3K4me2 enrichment (f) were measured by ChIP-qPCR at the rs339331 site. All calculations are based on the mean ± standard deviation of three independent experiments (n=9). P values were obtained using the unpaired two-tailed Student’s t-test; ***P<0.001.
Figure 6. Genotype at rs339331 alters morphology, cellular adhesion, and transcripts that are predicted to be regulated by androgens
(a) The 22Rv1 cell lines of each genotype were cultured in serum-containing medium for 48 hours and analyzed by phase microscopy at 100× magnification. (b) The TT clones are significantly more adherent to plastic and to collagen IV; mean ± standard deviation of three independent experiments (n=12). P values were obtained using the unpaired two-tailed Student’s t-test; ***P<0.001. (c) Venn diagram displaying the number of differentially expressed genes for each pairwise comparison between the isogenic cell lines. (d) Androgenic compounds and the androgen receptor (AR) (grey) are among the most significant predicted upstream regulators of genes differentially expressed between TT and CC clones.