Teng Fei, Wei Li, Jingyu Peng, Tengfei Xiao, Chen-Hao Chen, Alexander Wu, Jialiang Huang, Chongzhi Zang, X Shirley Liu, Myles Brown.
Abstract
Although millions of transcription factor binding sites, or cistromes, have been identified across the human genome, defining which of these sites is functional in a given condition remains challenging. Using CRISPR/Cas9 knockout screens and gene essentiality or fitness as the readout, we systematically investigated the essentiality of over 10,000 FOXA1 and CTCF binding sites in breast and prostate cancer cells. We found that essential FOXA1 binding sites act as enhancers to orchestrate the expression of nearby essential genes through the binding of lineage-specific transcription factors. In contrast, CRISPR screens of the CTCF cistrome revealed 2 classes of essential binding sites. The first class of essential CTCF binding sites acts like FOXA1 sites, as enhancers that regulate the expression of nearby essential genes, while a second class of essential CTCF binding sites was identified at topologically associated domain (TAD) boundaries and displays distinct characteristics. Using regression methods trained on our screening data and public epigenetic profiles, we developed a model to predict essential cis-elements with high accuracy. The model for FOXA1 essentiality correctly predicts noncoding variants associated with cancer risk and progression. Taken together, CRISPR screens of cis-regulatory elements can define the essential cistrome of a given factor and can inform the development of predictive models of cistrome function.
Keywords: CRISPR screen; CTCF; FOXA1; cistrome; enhancer
Year: 2019 PMID: 31727847 PMCID: PMC6911175 DOI: 10.1073/pnas.1908155116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Genome-wide CRISPR screens for FOXA1 binding sites in T47D cells, an ER-positive breast cancer cell line. (A) Top essential genes in T47D cells, identified by genome-wide CRISPR gene screens. A smaller RRA score (computed by the MAGeCK algorithm) indicates stronger negative selection of the corresponding gene. (B) The design of the FOXA1 screening library. FOXA1 binding sites are preselected as indicated, followed by sgRNA scanning to identify all possible guides within the binding sites. sgRNAs that have low predicted specificity or cleavage efficacy are then filtered out. From the remaining sgRNAs, up to 20 guides that are close to the binding site summit are selected. (C) An overview of the functions of FOXA1 binding sites in T47D cells, including strong intergenic binding sites (blue dots) and essential binding sites near essential genes (red dots). For each binding site, the β-score, a measure of gene selection in the screen, is calculated using the MAGeCK-VISPR algorithm that we previously developed. A positive (or negative) β-score indicates that the gene/binding site is positively (or negatively) selected. (D) The functional analysis of genes near essential FOXA1 binding sites using the GREAT prediction tool (46). Terms related to breast cancer are highlighted in red. (E) The percentage of all (and essential) FOXA1 binding sites that are within 100 kb of essential genes in T47D cells, and the percentage of all (and essential) genes near essential FOXA1 binding sites. Essential genes are genes with the 10% lowest β-scores from genome-wide CRISPR gene screens in T47D cells. ***P < 0.001. (F) The epigenetic features of essential binding sites vs. nonessential binding sites.
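The guide selection described in panel B (filter out low-scoring sgRNAs, then keep up to 20 guides near the summit) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the `Guide` fields, the score scales, and the 0.5 cutoffs are hypothetical placeholders, since the caption does not state how specificity and cleavage efficacy were scored or thresholded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    position: int       # cut-site coordinate on the same chromosome as the summit
    specificity: float  # hypothetical predicted specificity score (higher is better)
    efficiency: float   # hypothetical predicted cleavage efficiency (higher is better)


def select_guides(guides, summit, min_specificity=0.5, min_efficiency=0.5, max_guides=20):
    """Drop low-scoring guides, then keep up to max_guides closest to the summit.

    The thresholds are assumptions made for illustration, not values from the paper.
    """
    kept = [
        g for g in guides
        if g.specificity >= min_specificity and g.efficiency >= min_efficiency
    ]
    # Rank the surviving guides by absolute distance to the binding site summit.
    kept.sort(key=lambda g: abs(g.position - summit))
    return kept[:max_guides]
```

A binding site with fewer surviving guides than `max_guides` simply keeps all of them, which is consistent with the 12 to 20 sgRNAs per binding site reported in the library summary.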
A summary of the CTCF and FOXA1 cistrome-targeting libraries
| | FOXA1 library | CTCF library | Total |
| Binding sites | 6,110 | 5,564 | 11,674 |
| sgRNAs (12 to 20 sgRNAs per binding site) | 96,962 | 97,002 | 193,964 |
| Essential genes | 146 | 146 | |
| Gene-targeting sgRNAs (5 sgRNAs per gene) | 730 | 730 | |
| AAVS1-targeting sgRNAs | 267 | 267 | |
Fig. 2. Features of FOXA1 binding sites in T47D and LNCaP cells. (A) The chromatin features of selected binding sites that reached statistical significance in the cistrome screens. (B) Possible features that are tested for association with binding site functions in the screens. (C) The rankings of all features associated with the functions of FOXA1 binding sites. For each feature, we compare its signal distribution between the top 5% of essential sites vs. other sites. The average of P values (calculated using the Mann–Whitney U test) across 2 cell lines is used to measure the relevance of each feature. (D) The β-scores of all sites in T47D and LNCaP cells. Sites are colored by their appearance in the 2 cell lines: Sites that appear only in T47D cells are colored blue, and sites that appear in both cell lines are colored red. (E and F) The β-score distribution of the strongest FOXA1 or ESR1 sites vs. others in T47D and LNCaP cells. (G) The binding signals of the top essential FOXA1 binding sites vs. nonessential sites in LNCaP cells.
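The feature ranking in panel C compares each feature's signal between the top 5% of essential sites and all other sites with a Mann–Whitney U test. A minimal pure-Python sketch of that comparison is given below. It rests on several assumptions not stated in the caption: "top essential" is taken to mean the most negative β-scores, the P value uses the two-sided normal approximation without tie correction, and the function names are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for x, two-sided P) using the normal approximation, no tie correction."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        average_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return u_x, math.erfc(abs(z) / math.sqrt(2))


def feature_p_value(beta_scores, feature_signal, top_fraction=0.05):
    """P value for one feature: top fraction of essential sites vs. all other sites.

    Assumes the most negative beta-scores mark the most essential sites.
    """
    order = sorted(range(len(beta_scores)), key=lambda idx: beta_scores[idx])
    n_top = max(1, int(len(order) * top_fraction))
    top = [feature_signal[idx] for idx in order[:n_top]]
    rest = [feature_signal[idx] for idx in order[n_top:]]
    return mann_whitney_u(top, rest)[1]
```

Computing `feature_p_value` once per cell line and averaging the two results would mirror the relevance measure described in the caption.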
Fig. 3. Genome-wide CRISPR screens for CTCF binding sites. (A) CTCF binding site selection procedure in screening library design. (B) The β-scores of all CTCF binding sites in T47D and LNCaP cells. Binding sites are colored by their appearance in the 2 cell lines: Binding sites that appear only in T47D (or LNCaP) cells are colored blue (or green), while common binding sites are colored red. (C) The cumulative distribution of β-scores of T47D cell-specific and LNCaP cell-specific CTCF binding sites in T47D (red) and LNCaP (blue) cells. The P values are calculated by the Kolmogorov–Smirnov test. (D) The chromatin features of selected binding sites that reached statistical significance in the cistrome screens. (E) The rankings of all features associated with the functions of CTCF binding sites. For each feature, we compare its signal distribution between the top 5% of essential binding sites vs. other binding sites. The average of P values (calculated using the Mann–Whitney U test) across 2 cell lines is used to measure the relevance of each feature. (F) The β-score distribution of the strongest CTCF binding sites vs. others, and the binding strength of the top essential CTCF binding sites vs. other binding sites in T47D cells. (G) The percentage of all (and essential) CTCF binding sites that are within 100 kb of essential genes in T47D cells, and the percentage of all (and essential) genes near essential CTCF binding sites. Essential genes are genes with the 10% lowest β-scores from genome-wide CRISPR gene screens in T47D cells. **P < 0.01, ***P < 0.001. (H) The epigenetic features of essential binding sites vs. nonessential binding sites.
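The proximity analysis in panel G (the share of binding sites within 100 kb of an essential gene) can be illustrated with the sketch below. It assumes, as a simplification not confirmed by the caption, that distance is measured from the binding site position to a single reference coordinate per gene such as the transcription start site. The function name and input layout are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def fraction_near_genes(sites, gene_positions, window=100_000):
    """Fraction of (chrom, pos) sites with a gene coordinate within +/- window bp.

    gene_positions holds (chrom, coordinate) pairs; using one coordinate per gene
    (e.g., the TSS) is an assumption made for this illustration.
    """
    by_chrom = defaultdict(list)
    for chrom, coord in gene_positions:
        by_chrom[chrom].append(coord)
    for coords in by_chrom.values():
        coords.sort()

    near = 0
    for chrom, pos in sites:
        coords = by_chrom.get(chrom, [])
        # First gene coordinate at or beyond the left edge of the window.
        i = bisect_left(coords, pos - window)
        if i < len(coords) and coords[i] <= pos + window:
            near += 1
    return near / len(sites) if sites else 0.0
```

Calling the function once with all binding sites and once with only the essential ones gives the two percentages that the panel compares.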
Fig. 4. Essential CTCF binding sites display distinct types of CTCF binding. (A) The β-score distribution of genes near essential CTCF binding sites, compared with all genes in the genome. The P value is calculated using the Mann–Whitney U test. (B) The functional analysis of genes near essential CTCF binding sites using the GREAT prediction tool (46). Enriched terms related to DNA damage and stress response are highlighted in red. (C) The β-score distribution of binding sites in the boundaries of TADs. (D) The β-score distribution of binding sites in CTCF anchors, that is, regions that contact chromosome loops and carry CTCF motifs in a head-to-head orientation. The anchor annotation is extracted from Hi-C experiments (31). (E) The H3K27ac signal strength distribution of essential binding sites in CTCF anchors, other essential binding sites, and all CTCF binding sites.
Fig. 5. Predicting and validating essential enhancers. (A) The receiver operating characteristic (ROC) curves of different approaches for predicting FOXA1 binding site essentiality using different combinations of features. The AUC values using the SVM and individual features are also shown. (B) The precision-recall (PR) curves for the approaches in A. The area under the PR curve (AUPR) of each approach is shown. (C) An overview of the design of the pgRNA library and the screening strategy. Up to 25 pgRNAs are designed to knock out each binding site. (D) The ROC curve of the primary screening results in pgRNA screens. (E) The validation procedure of the prediction model using 125 DNase I binding sites that are not included in the sgRNA screening library. (F) The ROC curve of the prediction model (and 2 single features) in predicting the functions of 125 DNase I binding sites. (G) The enrichment of breast cancer-related variants over essential enhancers in T47D cells. The adjusted P values (using the χ2 test) are shown. Circle sizes indicate the fold enrichment over essential enhancers (>2 or <2). (H) The predicted score distribution of all FOXA1-bound enhancers, and enhancers that carry variants of “breast cancer (early onset).” The P value is calculated using the Wilcoxon rank-sum test. (I and J) The same analyses as in G and H, in LNCaP cells.
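The AUC values reported for the ROC curves in panels A, D, and F summarize how well a score separates essential from nonessential sites. As a rough illustration of the metric only, and not of the authors' SVM or regression model, the sketch below computes AUC as the probability that a randomly chosen essential site scores higher than a randomly chosen nonessential one. The function name and the 0/1 label encoding are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: 1 for essential sites, 0 for nonessential sites (assumed encoding).
    scores: predicted essentiality, higher meaning more likely essential.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of sites, which is acceptable for a validation set of 125 sites but would be replaced by a rank-based computation for the full screening library.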
A summary of the secondary paired-guide RNA screening library
| | Binding sites | Paired-guide RNAs |
| Enhancers near negatively selected genes (ESR1, FOXA1, GATA3, MYC) | 92 | 2,291 (up to 25 pairs per binding site) |
| Enhancers near positively selected genes (PTEN, TSC1, RB1, CSK) | 46 | 1,150 (25 pairs per binding site) |
| Selected hits in CTCF/FOXA1 screens | 58 | 1,450 (25 pairs per binding site) |
| Promoters of the selected genes | N.A. | 259 (∼25 pairs per promoter) |
| Positive control (pairs targeting AAVS1 loci and the exons of essential genes) | 146 genes | 730 (5 pairs per gene) |
| Negative control (pairs targeting AAVS1 loci) | N.A. | 400 |
N.A., not applicable.