Han Xu, Tengfei Xiao, Chen-Hao Chen, Wei Li, Clifford A Meyer, Qiu Wu, Di Wu, Le Cong, Feng Zhang, Jun S Liu, Myles Brown, X Shirley Liu.
Abstract
The CRISPR/Cas9 system has revolutionized mammalian somatic cell genetics. Genome-wide functional screens using CRISPR/Cas9-mediated knockout or dCas9 fusion-mediated inhibition/activation (CRISPRi/a) are powerful techniques for discovering phenotype-associated gene function. We systematically assessed the DNA sequence features that contribute to single guide RNA (sgRNA) efficiency in CRISPR-based screens. Leveraging the information from multiple designs, we derived a new sequence model for predicting sgRNA efficiency in CRISPR/Cas9 knockout experiments. Our model confirmed known features and suggested new features including a preference for cytosine at the cleavage site. The model was experimentally validated for sgRNA-mediated mutation rate and protein knockout efficiency. Tested on independent data sets, the model achieved significant results in both positive and negative selection conditions and outperformed existing models. We also found that the sequence preference for CRISPRi/a is substantially different from that for CRISPR/Cas9 knockout and propose a new model for predicting sgRNA efficiency in CRISPRi/a experiments. These results facilitate the genome-wide design of improved sgRNA for both knockout and CRISPRi/a studies.
Year: 2015 PMID: 26063738 PMCID: PMC4509999 DOI: 10.1101/gr.191452.115
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
A collection of CRISPR screening data sets used in the study
Figure 1. A schematic view of procedures for sgRNA selection and categorization. (A,B) Venn diagrams showing the overlap of essential genes between human HL-60 and KBM-7 cells (A) and two biological replicates in mouse ESC JM8 cells (B). (C–E) Scatter plots showing the log2 fold-change of sgRNA abundance in negative selection upon cell growth. (C) sgRNAs targeting essential ribosomal genes in Wang data. (D) sgRNAs targeting essential nonribosomal genes in Wang data. (E) sgRNAs targeting essential genes in Koike-Yusa data. The dashed lines represent the threshold chosen to determine efficient and inefficient sgRNAs.
Figure 2. Preference of nucleotide sequences that impact sgRNA efficiency. (A–C) Logos showing the sequence preference of the three sgRNA sets defined in Figure 1. The height of the nucleotides represents the log odds ratio of nucleotide frequency between efficient and inefficient sgRNAs. (D) A logo showing the selected features that reproducibly impact sgRNA efficiency in the three sgRNA sets. The height of the nucleotides represents the coefficients computed from the Elastic-Net. (E,F) Scatter plots showing the correlation of sequence preference for sgRNAs targeting ribosomal versus nonribosomal genes in Wang data (E) and sgRNAs in Wang data versus Koike-Yusa data (F). Each dot represents a nucleotide in a 40-bp region centered on the spacer. The sequence preference is measured as the log2 odds ratio of nucleotide frequency between efficient and inefficient sgRNAs.
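The log2 odds ratio used as the preference measure in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the pseudocount of 0.5 (added only to avoid division by zero), and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def log2_odds_by_position(efficient, inefficient, pseudocount=0.5):
    """Return {(position, nucleotide): log2 odds ratio} for aligned sequences.

    The odds of seeing a nucleotide at a position among efficient sgRNAs
    are divided by the same odds among inefficient sgRNAs.
    The pseudocount is an assumption of this sketch, not taken from the paper.
    """
    length = len(efficient[0])
    if any(len(seq) != length for seq in efficient + inefficient):
        raise ValueError("all sequences must have the same length")

    scores = {}
    for pos in range(length):
        eff_counts = Counter(seq[pos] for seq in efficient)
        ineff_counts = Counter(seq[pos] for seq in inefficient)
        for nt in "ACGT":
            # Smoothed nucleotide frequencies in each set.
            p = (eff_counts[nt] + pseudocount) / (len(efficient) + 4 * pseudocount)
            q = (ineff_counts[nt] + pseudocount) / (len(inefficient) + 4 * pseudocount)
            scores[(pos, nt)] = math.log2((p / (1 - p)) / (q / (1 - q)))
    return scores


# Toy, made-up sequences: C is enriched at position 0 of the efficient set.
toy_efficient = ["CA", "CA", "CT"]
toy_inefficient = ["GA", "GA", "GT"]
toy_scores = log2_odds_by_position(toy_efficient, toy_inefficient)
```

A positive value means the nucleotide is enriched among efficient sgRNAs at that position, a negative value means it is depleted, and zero means no difference between the two sets.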
Figure 3. Experimental validation of the sequence model in predicting sgRNA efficiency. (A) A SURVEYOR gel picture (top) and a bar chart (bottom) showing the indel rates of the sgRNAs predicted to be inefficient (low sequence score) or efficient (high sequence score). The sgRNAs were selected to target the AAVS1 locus. The experiment was conducted in 293T cells. (B) A scatter plot showing the correlation of the predicted sequence scores and the protein knockout efficiency for sgRNAs targeting AR and FOXA1 in LNCaP-abl cells. The knockout efficiency is measured as the percentage of reduction in protein level upon sgRNA infection.
Figure 4. Predicting sgRNA efficiency from sequence context in CRISPR/Cas9 knockout screens. (A) ROC curves showing the predictive power of the proposed model. (Red) Threefold cross-validation on the sgRNAs targeting ribosomal genes in Wang data; (blue) trained on ribosomal genes, and tested on nonribosomal genes in Wang data; (green) trained on Wang data, and tested on Koike-Yusa data. The black error bars on the red curve represent standard deviations computed from 10 iterations of random sampling in cross-validation. (B) ROC curves comparing the performance of the proposed model and the Doench et al. (2014) model in predicting sgRNA efficiency in Shalem data. (C) Scatter plot showing the correlation between the predicted sequence score and the relative sgRNA abundance for ABL1 and BCR in KBM-7 cells. The P-values were computed based on the Pearson correlation test. (D) Box plot showing the distributions of correlations between sequence scores and relative sgRNA abundances for essential and nonessential genes in KBM-7. The distribution of random background was computed by permuting the sequence scores within each gene in the data set. (E) Distributions of relative sgRNA abundances in KBM-7 cells, where the sgRNAs were categorized based on the predicted efficiency and the essentiality of their targeted genes.
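ROC curves such as those in panels A and B are commonly summarized by the area under the curve (AUC). The sketch below computes the AUC as the fraction of (efficient, inefficient) sgRNA pairs in which the efficient sgRNA receives the higher predicted score. It is an illustration only: the function name and the toy scores are assumptions, and the record above does not state how the authors computed their curves.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted sequence scores, one per sgRNA.
    labels: truthy for efficient sgRNAs, falsy for inefficient ones.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy, made-up scores: perfect separation, then a chance-level ordering.
auc_perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
auc_chance = roc_auc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0])
```

An AUC of 1.0 means every efficient sgRNA outscores every inefficient one, and 0.5 corresponds to a random ranking. The pairwise loop is quadratic in the number of sgRNAs, which is acceptable for a sketch but slow for genome-wide libraries.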
Figure 5. Assessment of the sequence models in predicting sgRNA efficiency in positive selection experiments. (A–E) Bar charts showing the capability of selection and the experimental reproducibility for predicted efficient and inefficient sgRNAs. The tested sgRNAs target the genes known to be involved in the resistance to different drug treatment or external stimulus. (F) ROC curves comparing the performance of the proposed model and the Doench et al. (2014) model in predicting sgRNA efficiency in positive selection experiments. In the evaluation, the positive test set consists of the sgRNAs selected in all replicates in B–E; and the negative test set consists of those not selected in B–E.
Figure 6. Preference of the length and sequence context of spacers in CRISPR/dCas9 inhibition (CRISPRi) and activation (CRISPRa) screens. (A) Distribution of phenotype scores (Gilbert et al. 2014) for sgRNAs targeting the top 500 essential genes and the control sgRNAs in CRISPRi experiments. The dashed line represents the threshold chosen to determine efficient and inefficient sgRNAs. (B) A bar chart showing the effect of spacer length on sgRNA efficiency. (C) Logos showing the sequence preference of spacers. The height of the nucleotides represents the log odds ratio of nucleotide frequency between efficient and inefficient sgRNAs. The nucleotide at the 5′ end of the spacers is fixed as guanine in the library design and is excluded from the logos. (D) Bar charts comparing the performance of CRISPRi model and CRISPR/Cas9 KO model in predicting sgRNA efficiency in CRISPRi negative selection, CRISPRi positive selection upon CTx-DTA treatment, and CRISPRa negative selections in Gilbert data and Konermann data. The length of spacers is 20 nt. Cross-validation was used to assess the performance of the CRISPRi model in the CRISPRi negative selection experiment. The error bars represent the standard deviations in 10 iterations of threefold cross-validation. The P-value was computed using a paired t-test.
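The error bars described in panel D come from repeating threefold cross-validation 10 times. One way such a procedure might be organized is sketched below, under stated assumptions: `repeated_kfold_scores` and the `evaluate` callback are hypothetical names, the fold assignment by random shuffling is a guess at the general scheme, and the toy callback stands in for training a model and computing a performance metric.

```python
import random
import statistics


def repeated_kfold_scores(n_items, evaluate, k=3, iterations=10, seed=0):
    """Run k-fold cross-validation `iterations` times with fresh shuffles.

    evaluate(train_idx, test_idx) is a hypothetical callback that trains on
    the first index list and returns a performance score on the second.
    Returns (mean, standard deviation) of the per-iteration mean scores.
    """
    rng = random.Random(seed)
    per_iteration = []
    for _ in range(iterations):
        order = list(range(n_items))
        rng.shuffle(order)
        # Deal shuffled indices into k disjoint folds of near-equal size.
        folds = [order[i::k] for i in range(k)]
        fold_scores = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            fold_scores.append(evaluate(train_idx, test_idx))
        per_iteration.append(statistics.mean(fold_scores))
    return statistics.mean(per_iteration), statistics.stdev(per_iteration)


# Toy callback: checks the split is valid and returns the held-out fraction.
def toy_evaluate(train_idx, test_idx):
    assert not set(train_idx) & set(test_idx)
    return len(test_idx) / (len(train_idx) + len(test_idx))


cv_mean, cv_sd = repeated_kfold_scores(9, toy_evaluate)
```

In practice the callback would fit the sequence model on the training sgRNAs and return a metric such as the AUC on the held-out fold; the standard deviation across iterations is what an error bar of this kind would display.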