CRISPR-FOCUS: A web server for designing focused CRISPR screening experiments.

Qingyi Cao1, Jian Ma2, Chen-Hao Chen3,4, Han Xu3,4, Zhi Chen1, Wei Li3,4, X Shirley Liu3,4.   

Abstract

The recently developed CRISPR screen technology, based on the CRISPR/Cas9 genome editing system, enables genome-wide interrogation of gene functions in an efficient and cost-effective manner. Although many computational algorithms and web servers have been developed to design single-guide RNAs (sgRNAs) with high specificity and efficiency, algorithms specifically designed for conducting CRISPR screens are still lacking. Here we present CRISPR-FOCUS, a web-based platform to search and prioritize sgRNAs for CRISPR screen experiments. With official gene symbols or RefSeq IDs as the only mandatory input, CRISPR-FOCUS filters and prioritizes sgRNAs based on multiple criteria, including efficiency, specificity, sequence conservation, isoform structure, as well as genomic variations including Single Nucleotide Polymorphisms and cancer somatic mutations. CRISPR-FOCUS also provides pre-defined positive and negative control sgRNAs, as well as other necessary sequences in the construct (e.g., U6 promoters to drive sgRNA transcription and RNA scaffolds of the CRISPR/Cas9). These features allow users to synthesize oligonucleotides directly based on the output of CRISPR-FOCUS. Overall, CRISPR-FOCUS provides a rational and high-throughput approach for sgRNA library design that enables users to efficiently conduct a focused screen experiment targeting up to thousands of genes. (CRISPR-FOCUS is freely available at http://cistrome.org/crispr-focus/).

Year:  2017        PMID: 28873439      PMCID: PMC5584922          DOI: 10.1371/journal.pone.0184281

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)–CRISPR-associated protein 9 (Cas9) system has become a prominent genome-editing technique [1-2]. Based on the CRISPR/Cas9 system, CRISPR screening is a high-throughput technology that enables researchers to examine the effect of perturbing tens of thousands of genes in parallel [3-5]. In a CRISPR-based screening experiment, pools of single-guide RNAs (sgRNAs) designed to target different genomic loci are delivered into cells by a lentiviral system, and the function of a gene is inferred by comparing, across conditions, the abundance of cell populations bearing sgRNAs that target that gene. CRISPR screening has been applied to interrogate gene functions in different contexts, including immune response [6-7], cancer progression [8-10] and metastasis [11], and more recently it has also been used to identify the functions of non-coding elements [12-18]. Many CRISPR screening experiments are conducted as unbiased, genome-scale approaches, for which several genome-wide screening libraries are available [3,8-9,19]. Focused screens are also conducted in many studies, where researchers use a small-scale library to target gene sets of specific interest (e.g., oncogenes/tumor suppressors for oncologists or cytokines for immunologists) [20], to validate hits of genome-wide screens [7], or to reduce the cost of screens (e.g., in in vivo settings [11]). To design libraries for CRISPR screens (especially focused screens), several computational tools can be applied [19,21-30]. However, most of these algorithms provide optimized sgRNAs for only one or several genes/sequences [22-23,29].
A few web-based tools with nominal batch-design capacity require users to provide the target sequence for each individual gene, have strict size limits on the uploaded sequence file, accept only a limited number (mostly 10–20) of gene IDs as input, or base their work on mining public domain libraries [19,25-26,30]. Some other tools with substantial batch-design capacity are not web-based, and require users to download the whole database, compile the source code and fine-tune up to dozens of parameters [21,24,27-28]. Therefore, a user-friendly automatic tool is needed to facilitate the design process of CRISPR screen experiments. Another issue in library design is the rational evaluation and selection of sgRNAs based on multiple criteria. Preferably, an sgRNA should have fewer off-target effects (based on the alignment of the spacer sequence across the whole genome [23,26-28]) and higher on-target knockout efficiency (determined mainly by the sgRNA sequence context [19,31]), and considering both has proved necessary [9,32]. Other factors, such as sequence conservation [20] and the isoform structures of target genes [25,32], also have a marked impact on the results of screen experiments. Once multiple scores are calculated for all candidate sgRNAs, a method is needed for prioritizing and filtering the sgRNAs. Common practices include weight-averaging all scores by assigning a fixed (or empirical) weight to each criterion [19,24], or applying the filters one by one and then ranking the candidates lexicographically [21]. These approaches might be too loose or too rigid in sgRNA selection, because the distribution of these scores might vary among different genes. To reach optimal sgRNA ranking results, an ideal method should consider all criteria and summarize them appropriately in a context-dependent way.
In light of these requirements of CRISPR screen experiments, we developed CRISPR-FOCUS, a web-based method for library design of CRISPR screens. With minimal user input, CRISPR-FOCUS selects a user-specified number of sgRNAs per gene, targeting up to one thousand genes in the human or mouse genome. SgRNAs in the output are ranked by their summary score, which is a comprehensive evaluation of efficiency, specificity, target sequence conservation and coverage of multiple isoforms. To our knowledge, CRISPR-FOCUS is the only web-based tool that is specifically optimized for CRISPR screening experiments.

Methods and implementation

Overview

The scheme of CRISPR-FOCUS is presented in Fig 1. All possible sgRNA candidates that have a canonical Protospacer Adjacent Motif (PAM) in the human and mouse genomes are discovered and stored in the backend database. For each candidate sequence, all attributes (described in detail below) are pre-computed and stored. When a user performs a query through the web interface, CRISPR-FOCUS retrieves all possible candidates, prioritizes them and returns the top ones with the highest scores.
Fig 1

The main scheme of CRISPR-FOCUS.
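
As an illustration of the candidate discovery step, the following minimal Python sketch scans both strands of a sequence for spacers followed by an NGG PAM, the canonical PAM of SpCas9. It is not the CRISPR-FOCUS backend code; the function names and the 0-based, strand-local coordinate convention are assumptions made for this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_candidates(seq: str, spacer_len: int = 20):
    """Yield (strand, start, spacer) for every spacer followed by an NGG PAM.

    `start` is the 0-based offset of the spacer on the scanned strand
    (hypothetical convention, chosen only for this sketch).
    """
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # A site needs spacer_len bases plus a 3-base PAM (N, G, G).
        for i in range(len(s) - spacer_len - 2):
            pam = s[i + spacer_len : i + spacer_len + 3]
            if pam[1:] == "GG":
                yield strand, i, s[i : i + spacer_len]


if __name__ == "__main__":
    for hit in find_candidates("A" * 20 + "TGG"):
        print(hit)
```

The `spacer_len` argument would be set to 19 or 20 to match the two sgRNA lengths offered by the web portal.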

Criteria for sgRNA performance evaluation

To reach the best CRISPR-based knockout effect, the selection of sgRNAs should be optimized to (1) maximize their on-target cleavage effects (i.e., maximize efficiency), (2) minimize potential off-target effects (i.e., maximize specificity), (3) ensure the fidelity of their sequence to the corresponding target loci (and avoid regions with possible genomic variations), and (4) account for the importance of the target region (evaluated by sequence conservation and isoform structure). CRISPR-FOCUS evaluates every sgRNA with the following indices.

Efficiency

The cleavage efficiency of a sgRNA is a major factor that determines the sensitivity of a screen experiment [4]. We used SSC [31], a computational algorithm that we previously developed, to predict the cleavage efficiency of candidate sgRNAs. SSC takes a spacer sequence and its flanking sequences as input, and uses a Least Absolute Shrinkage and Selection Operator (LASSO) model to calculate an efficiency score for each sgRNA. CRISPR-FOCUS filters out sgRNAs with an efficiency score below zero.

Specificity

For each candidate sgRNA, CRISPR-FOCUS first calculates its specificity score [33] to evaluate its overall similarity to putative off-target genomic loci. SgRNAs that have perfect-match off-targets are further divided into three categories according to their off-target positions: (1) non-exon hits that do not overlap with exons of any coding or non-coding genes, (2) exon (but non-coding) hits that overlap with exons of non-coding genes, and (3) coding region hits that overlap with exons of coding genes. These sgRNAs may be considered in a rescue step (described later).

The effect of possible variations

SgRNAs are usually designed based on the reference genome sequence. Their knockout efficiencies may be reduced when the genomic sequence in the cells differs from the reference, especially because of mutations. CRISPR-FOCUS therefore prefers sgRNAs that cover few or no variations, including Single Nucleotide Polymorphisms (SNPs) and somatic mutations (especially in cancer), and such sgRNAs are preferentially chosen in the selection procedure. CRISPR-FOCUS retrieves SNP information from dbSNP [34] and annotates each sgRNA with all SNPs whose minor allele frequency (MAF) is higher than 0.05. If screen experiments are conducted in cancer cells, users can also choose whether to avoid recurrent somatic mutations from different cancer types (using the COSMIC database [35]).
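
The SNP annotation can be pictured as a simple interval lookup. The sketch below is only illustrative: the `Snp` record, the 1-based inclusive coordinates and the in-memory list are assumptions for this example, not the data model of CRISPR-FOCUS or of dbSNP. Only the MAF threshold of 0.05 comes from the text.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    """Hypothetical minimal SNP record used only in this sketch."""
    chrom: str
    pos: int     # 1-based genomic position
    maf: float   # minor allele frequency


def common_snps_in_target(chrom, start, end, snps, maf_cutoff=0.05):
    """Return SNPs with MAF above the cutoff inside [start, end] (1-based, inclusive)."""
    return [
        s for s in snps
        if s.chrom == chrom and start <= s.pos <= end and s.maf > maf_cutoff
    ]


if __name__ == "__main__":
    snps = [Snp("chr1", 105, 0.20), Snp("chr1", 110, 0.01), Snp("chr2", 105, 0.40)]
    # Only the chr1:105 SNP is both inside the target and common enough.
    print(common_snps_in_target("chr1", 100, 122, snps))
```

A genome-scale implementation would more likely use an indexed structure (for example a sorted position array per chromosome) rather than a linear scan.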

Sequence conservation

Regions of a gene that are more highly conserved across species are more likely to be important, as they usually encode conserved functional domains (such as the catalytic center of an enzyme or the DNA-binding domain of a transcription factor) whose disruption is more likely to abolish gene function [20]. CRISPR-FOCUS annotates each sgRNA with the average phastCons conservation score [36] of the corresponding target position.

Isoform structure

Some genes have multiple isoforms (or transcripts) with different structures. To completely knock out a gene, a sgRNA should ideally target as many isoforms as possible. For each exon region, CRISPR-FOCUS calculates an “isoform commonality score”, defined as the percentage of isoforms that use this exon. SgRNAs targeting exon regions with higher scores are preferred.
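
A minimal sketch of the isoform commonality score follows. It assumes that an exon is identified by its exact (start, end) coordinates and that a gene's isoforms are given as a mapping from transcript ID to a set of exons; how CRISPR-FOCUS handles partially overlapping exons is not stated in the text, so this is one possible reading rather than the tool's implementation.

```python
def isoform_commonality(exon, isoforms):
    """Percentage of isoforms that contain `exon`.

    exon:     (start, end) tuple
    isoforms: dict mapping transcript ID -> set of (start, end) exons
              (hypothetical input layout for this example)
    """
    if not isoforms:
        raise ValueError("gene has no annotated isoforms")
    using = sum(1 for exons in isoforms.values() if exon in exons)
    return 100.0 * using / len(isoforms)


if __name__ == "__main__":
    gene = {
        "tx1": {(100, 200), (300, 400)},
        "tx2": {(100, 200)},
        "tx3": {(100, 200), (500, 600)},
        "tx4": {(300, 400)},
    }
    for exon in [(100, 200), (300, 400), (500, 600)]:
        print(exon, isoform_commonality(exon, gene))
```

In this toy gene the first exon is used by three of four transcripts, so an sgRNA targeting it would be preferred over one targeting the exon unique to `tx3`.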

SgRNA selection and ranking

For each gene in the query, CRISPR-FOCUS first retrieves the genomic coordinates of all exons and collects all sgRNA candidates that overlap with these regions. It then performs a “filter and rescue” procedure (described in detail in S1 File) to rank all candidates and pick the top ones. In the filtering step, CRISPR-FOCUS filters out sgRNAs that are empirically regarded as “bad” candidates, including sgRNAs that (1) overlap with a SNP or mutation locus, (2) contain >40% guanine (‘G’) bases, which has been observed to be associated with higher off-target effects [37], or (3) perfectly match putative off-target loci within the genome.

The remaining sgRNAs are ranked by a summary score, which is a weighted summary of the efficiency, specificity, phastCons conservation and isoform commonality scores, with all weights dynamically defined by the Criteria Importance Through Intercriteria Correlation (CRITIC) method [38]. The purpose of this method is to determine an objective weight for each criterion in multiple-criteria decision problems. Briefly, in CRITIC a value C is calculated to quantify the amount of information transmitted by criterion j, which is determined by both the contrast intensity and the conflict of the decision criteria. The contrast intensity is represented by the standard deviation of j, while the conflict is measured as the multiplicative aggregation of one minus the correlation coefficients between j and the rest of the criteria. Finally, the objective weight w is generated by normalizing C by the sum of all C values.

If the number of remaining sgRNAs does not reach the desired number, CRISPR-FOCUS executes a “rescue” step to retrieve more sgRNAs. At this stage, sgRNAs with potential off-target hits are rescued in the following order: (1) sgRNAs with non-exon off-target hits only, (2) sgRNAs with off-target hits located on non-coding elements but not coding regions, (3) sgRNAs with off-target hits located on coding regions. SgRNAs within the same category are prioritized by their number of off-target hits, or by the summary score if two candidates have the same number of hits within the same category. A detailed flowchart of the whole procedure is depicted in Fig 2.
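
The CRITIC weighting described above can be sketched in a few lines of Python. This example follows a commonly cited formulation of CRITIC, in which each criterion is min-max normalized and its information value is the standard deviation multiplied by the sum of (1 - r) over the other criteria; whether CRISPR-FOCUS normalizes or aggregates in exactly this way is an assumption, since the details are in S1 File. The function names and the handling of constant criteria are also choices made for this sketch.

```python
from math import sqrt
from statistics import pstdev


def _pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant (sketch choice)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def critic_weights(scores):
    """Compute CRITIC weights.

    scores: dict mapping criterion name -> list of values, one per sgRNA.
    Returns a dict mapping criterion name -> weight (weights sum to 1).
    """
    names = list(scores)
    # Min-max normalize each criterion so standard deviations are comparable.
    norm = {}
    for name in names:
        lo, hi = min(scores[name]), max(scores[name])
        span = hi - lo
        norm[name] = [(v - lo) / span if span else 0.0 for v in scores[name]]
    info = {}
    for j in names:
        contrast = pstdev(norm[j])  # contrast intensity
        conflict = sum(1.0 - _pearson(norm[j], norm[k]) for k in names if k != j)
        info[j] = contrast * conflict
    total = sum(info.values())
    if total == 0:
        # No criterion carries information; fall back to equal weights.
        return {name: 1.0 / len(names) for name in names}
    return {name: info[name] / total for name in names}


if __name__ == "__main__":
    toy = {
        "efficiency": [0.1, 0.5, 0.9, 0.3],
        "specificity": [0.9, 0.5, 0.1, 0.7],
        "conservation": [0.5, 0.5, 0.5, 0.5],
    }
    print(critic_weights(toy))
```

Because the weights are recomputed from the candidates of each gene, a criterion that does not vary among a gene's candidates (such as `conservation` in the toy data) contributes nothing to the ranking for that gene, which is the context-dependent behavior the method is chosen for.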
Fig 2

Workflow of the sgRNA selection/ranking process in CRISPR-FOCUS.

The sgRNA selection/ranking process in CRISPR-FOCUS is composed of (A) a filter step and (B) a rescue step.
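
The two steps in Fig 2 can be sketched as follows. This is a simplified reading of the procedure described in the text, not the code behind CRISPR-FOCUS: the `Candidate` fields are hypothetical, the summary score is assumed to be pre-computed, and treating a negative efficiency score, variant overlap and >40% G content as non-rescuable is an assumption (the text only says that sgRNAs with off-target hits are rescued).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical sgRNA record for this sketch."""
    spacer: str
    summary_score: float
    efficiency: float = 0.0
    overlaps_variant: bool = False
    non_exon_hits: int = 0        # perfect-match off-targets outside exons
    noncoding_exon_hits: int = 0  # ... in exons of non-coding genes
    coding_hits: int = 0          # ... in exons of coding genes

    @property
    def offtarget_hits(self) -> int:
        return self.non_exon_hits + self.noncoding_exon_hits + self.coding_hits

    @property
    def rescue_tier(self) -> int:
        """0 = no perfect-match off-target; 1 to 3 follow the rescue order."""
        if self.coding_hits:
            return 3
        if self.noncoding_exon_hits:
            return 2
        if self.non_exon_hits:
            return 1
        return 0


def g_fraction(spacer: str) -> float:
    return spacer.upper().count("G") / len(spacer)


def select_sgrnas(candidates, n):
    """Return up to n sgRNAs: filtered candidates first, then rescued ones."""
    usable = [
        c for c in candidates
        if c.efficiency >= 0 and not c.overlaps_variant and g_fraction(c.spacer) <= 0.40
    ]
    # (A) Filter step: keep candidates without perfect-match off-targets.
    passed = sorted(
        (c for c in usable if c.rescue_tier == 0),
        key=lambda c: -c.summary_score,
    )
    if len(passed) >= n:
        return passed[:n]
    # (B) Rescue step: order by category, then hit count, then summary score.
    rescued = sorted(
        (c for c in usable if c.rescue_tier > 0),
        key=lambda c: (c.rescue_tier, c.offtarget_hits, -c.summary_score),
    )
    return (passed + rescued)[:n]
```

The sort key in the rescue step encodes the stated priority directly: category first, number of off-target hits second, and the summary score only as a tie-breaker.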

The web portal

The web portal of CRISPR-FOCUS (Fig 3) accepts a list of gene IDs (either official gene symbols or RefSeq IDs) as input, and returns the designated number of sgRNA candidates per gene. Users can input up to 1000 genes and retrieve up to 30 sgRNAs per gene. Users can also select sgRNAs for either Homo sapiens or Mus musculus. The web portal uses the Common Gateway Interface (CGI) to receive input, and all backend scripts are written in Python.
Fig 3

The main user interface of CRISPR-FOCUS.

A screenshot of the CRISPR-FOCUS website (http://cistrome.org/crispr-focus/) is shown.

CRISPR-FOCUS also provides other options to accommodate different requirements, including the selection of different sgRNA lengths (19 or 20 nt) [5,39]. As commonly used constituents of the CRISPR/Cas9 delivery system, the human U6 promoter and the SpCas9 scaffold can be appended to the output, allowing users to synthesize the library directly from the output. Furthermore, CRISPR-FOCUS includes a set of negative control sgRNAs (targeting several known “safe-harbor” loci within the human or mouse genome) [40-41] and positive control sgRNAs (targeting 58 essential ribosome genes identified in [31]). The input and output formats are described in Table A in S2 File. CRISPR-FOCUS is based on genome assemblies hg38 (for human) and mm10 (for mouse), and the versions of the public domain databases used to annotate sgRNAs can be found in Table B in S2 File.

Results and discussion

CRISPR-FOCUS provides a high-throughput platform for rational sgRNA library design for CRISPR screen experiments. It can complete a full-scale design (up to 1000 target genes with 30 sgRNAs each) in about twenty seconds. To our knowledge, CRISPR-FOCUS is now the only web-based sgRNA design tool that provides a batch-processing mode for custom CRISPR library design, as well as the most comprehensive tool for sgRNA performance evaluation. By shortening the distance from “silico to bench”, CRISPR-FOCUS facilitates the design of screening experiments and promotes high-throughput functional studies in various fields.

S1 File. The schema for sgRNA ranking and selection. (DOCX)

S2 File. Additional supporting information. (DOCX)
  38 in total

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells.

Authors:  Yuexin Zhou; Shiyou Zhu; Changzu Cai; Pengfei Yuan; Chunmei Li; Yanyi Huang; Wensheng Wei
Journal:  Nature       Date:  2014-04-09       Impact factor: 49.962

3.  Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes.

Authors:  Adam Siepel; Gill Bejerano; Jakob S Pedersen; Angie S Hinrichs; Minmei Hou; Kate Rosenbloom; Hiram Clawson; John Spieth; Ladeana W Hillier; Stephen Richards; George M Weinstock; Richard K Wilson; Richard A Gibbs; W James Kent; Webb Miller; David Haussler
Journal:  Genome Res       Date:  2005-07-15       Impact factor: 9.043

4.  CRISPRdirect: software for designing CRISPR/Cas guide RNA with reduced off-target sites.

Authors:  Yuki Naito; Kimihiro Hino; Hidemasa Bono; Kumiko Ui-Tei
Journal:  Bioinformatics       Date:  2014-11-20       Impact factor: 6.937

5.  Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains.

Authors:  Junwei Shi; Eric Wang; Joseph P Milazzo; Zihua Wang; Justin B Kinney; Christopher R Vakoc
Journal:  Nat Biotechnol       Date:  2015-05-11       Impact factor: 54.908

6.  Sequence determinants of improved CRISPR sgRNA design.

Authors:  Han Xu; Tengfei Xiao; Chen-Hao Chen; Wei Li; Clifford A Meyer; Qiu Wu; Di Wu; Le Cong; Feng Zhang; Jun S Liu; Myles Brown; X Shirley Liu
Journal:  Genome Res       Date:  2015-06-10       Impact factor: 9.043

7.  BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis.

Authors:  Matthew C Canver; Elenoe C Smith; Falak Sher; Luca Pinello; Neville E Sanjana; Ophir Shalem; Diane D Chen; Patrick G Schupp; Divya S Vinjamur; Sara P Garcia; Sidinh Luc; Ryo Kurita; Yukio Nakamura; Yuko Fujiwara; Takahiro Maeda; Guo-Cheng Yuan; Feng Zhang; Stuart H Orkin; Daniel E Bauer
Journal:  Nature       Date:  2015-09-16       Impact factor: 49.962

8.  sgRNAcas9: a software package for designing CRISPR sgRNA and evaluating potential off-target cleavage sites.

Authors:  Shengsong Xie; Bin Shen; Chaobao Zhang; Xingxu Huang; Yonglian Zhang
Journal:  PLoS One       Date:  2014-06-23       Impact factor: 3.240

9.  COSMIC: exploring the world's knowledge of somatic mutations in human cancer.

Authors:  Simon A Forbes; David Beare; Prasad Gunasekaran; Kenric Leung; Nidhi Bindal; Harry Boutselakis; Minjie Ding; Sally Bamford; Charlotte Cole; Sari Ward; Chai Yin Kok; Mingming Jia; Tisham De; Jon W Teague; Michael R Stratton; Ultan McDermott; Peter J Campbell
Journal:  Nucleic Acids Res       Date:  2014-10-29       Impact factor: 16.971

10.  CRISPRseek: a bioconductor package to identify target-specific guide RNAs for CRISPR-Cas9 genome-editing systems.

Authors:  Lihua J Zhu; Benjamin R Holmes; Neil Aronin; Michael H Brodsky
Journal:  PLoS One       Date:  2014-09-23       Impact factor: 3.240
