Cas-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cas9.

Jeongbin Park¹, Jin-Soo Kim², Sangsu Bae³.

Abstract

MOTIVATION: CRISPR-derived RNA-guided endonucleases (RGENs) have been widely used for both gene knockout and knock-in at the level of single or multiple genes. RGENs are now available for forward genetic screens at genome scale, but single guide RNA (sgRNA) selection at this scale is difficult.
RESULTS: We develop an online tool, Cas-Database, a genome-wide gRNA library design tool for Cas9 nucleases from Streptococcus pyogenes (SpCas9). With an easy-to-use web interface, Cas-Database allows users to select optimal target sequences simply by changing the filtering conditions. Furthermore, it provides a powerful way to select multiple optimal target sequences from thousands of genes at once for the creation of a genome-wide library. Cas-Database also provides a web application programming interface (web API) for advanced bioinformatics users.
AVAILABILITY AND IMPLEMENTATION: Free access at http://www.rgenome.net/cas-database/
CONTACT: sangsubae@hanyang.ac.kr or jskim01@snu.ac.kr
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Substances: RNA, Guide

Year: 2016 PMID: 27153724 PMCID: PMC4920116 DOI: 10.1093/bioinformatics/btw103

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

RNA-guided endonucleases (RGENs) derived from the Type II clustered regularly interspaced short palindromic repeats (CRISPR)-Cas (CRISPR-associated) system, an adaptive immune response in bacteria and archaea, have been harnessed in many genome engineering applications such as gene knockout and knock-in in various organisms (Doudna and Charpentier, 2014; Kim and Kim, 2014; Sander and Joung, 2014; Shalem ). Recently, a few groups have undertaken genome-wide Cas9-mediated genetic screens (Chen ; Gilbert ; Hart ; Koike-Yusa ; Konermann ; Shalem ; Wang ; Zhou ) in human and other mammalian cells. The selection of target sequences is an initial, rate-limiting step in RGEN applications. We and other groups have developed a number of web-based tools and command-line programs for single-guide RNA (sgRNA) design or off-target site identification (Aach ; Bae ; Cradick ; Doench ; Heigwer ; Hsu ; Lei ; Montague ; Naito ; O’Brien and Bailey, 2014; Park ; Sander ; Upadhyay and Sharma, 2014; Xiao ; Xie ). Although these tools are very useful for choosing sgRNAs that target one gene, selecting sgRNAs at the genome scale remains challenging. Two genome-wide databases for SpCas9 design have been built previously (Hodgkins ; MacPherson and Scherf, 2015). However, although they are useful for selecting sgRNAs that target a small number of genes, neither offers an easy way to select optimal target sequences at once from thousands of genes for genome-wide library construction in a variety of organisms.

Here we describe Cas-Database, a web-based genome-wide gRNA design tool for SpCas9 nucleases for genome-scale screening experiments. Cas-Database contains all available targets of SpCas9 nucleases, which recognize 5′-NGG-3′ protospacer-adjacent motif (PAM) sequences, in all coding sequence (CDS) regions throughout the whole genome of a selected organism, based on the Ensembl database (Cunningham ). Each target site has the following information: GC content, relative cleavage position in the CDS, constitutive exon coverage, a microhomology-associated out-of-frame score (Bae ), and potential off-target sequences with up to 2-nt mismatches. In addition, JBrowse (Skinner ) is used to display all available target sites graphically in a zoomable interface with genomic location information. Cas-Database provides a fast and easy way to select optimal target sequences in genes of interest from a variety of organisms. Additionally, it offers a powerful way to select optimal target sequences in many genes simultaneously. Selecting sgRNA sequences for each gene is made simple by a ‘cart’ system, similar to those used by online shops, which was implemented using AJAX (asynchronous JavaScript and XML).

Currently, Cas-Database supports sgRNA design in 12 different organisms, including five vertebrates (human, rat, mouse, pig and zebrafish), one insect (Drosophila melanogaster), one nematode (Caenorhabditis elegans) and five plants (Arabidopsis thaliana, tomato, banana, grapes and soybean). All processes required to generate the genome-wide database have been automated using a pipeline of scripts, so that we can easily update information about the existing organisms or add new organisms to the database. We plan to continue to add and support more organisms in the future.

2 Methods

2.1 Target selection for SpCas9 nucleases

The latest whole genome sequence and associated annotation data for each organism in the Ensembl database were automatically retrieved and saved to our server using an in-house program written in Python. To allow easy access to the annotation database, we used the BioMart protocol (Kasprzyk, 2011; Kasprzyk ) for communicating with the Ensembl server. After retrieving the genome sequence and annotation data associated with each organism, we first searched for all possible targets that contain 5′-NGG-3′ PAM sequences in CDS regions and then calculated several characteristics of each target, e.g. sgRNA GC content, relative cleavage position in the CDS, common exon coverage throughout the various transcripts of the gene, and a microhomology-associated out-of-frame score.
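The target search described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 20-nt protospacer length and the definition of the relative cleavage position (cut site 3 bp upstream of the PAM, expressed as a percentage of the CDS length) are assumptions made for illustration, and exon coverage and the out-of-frame score are omitted.

```python
import re


def gc_content(seq: str) -> float:
    """Return the percentage of G/C bases in seq."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(cds: str, guide_len: int = 20):
    """Yield protospacers followed by a 5'-NGG-3' PAM on either strand."""
    cds = cds.upper()
    n = len(cds)
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    for strand, seq in (("+", cds), ("-", reverse_complement(cds))):
        for match in pattern.finditer(seq):
            # SpCas9 cleaves 3 bp upstream of the PAM.
            cut = match.start() + guide_len - 3
            if strand == "-":
                cut = n - cut  # convert to a forward-strand coordinate
            yield {
                "protospacer": match.group(1),
                "pam": match.group(2),
                "strand": strand,
                "gc_percent": round(gc_content(match.group(1)), 1),
                "relative_cut_position": round(100.0 * cut / n, 1),
            }
```

Calling `list(find_targets(cds))` on a CDS string returns one dictionary per candidate site; in a real pipeline the coordinates would additionally be mapped from the CDS back to the genome.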

2.2 Searching for potential off-target sites

After identifying all possible targets for SpCas9 nucleases in CDS regions and calculating the characteristics of each target, we next used the Cas-OFFinder program (Bae ) to search for potential off-target sites that differ from each on-target site by up to 2 nt and that contain 5′-NGG-3′ or 5′-NAG-3′ PAM sequences. Because this step is very time-consuming, we ran Cas-OFFinder on an OpenCL-enabled cluster computer such as Chundoong (http://chundoong.snu.ac.kr/) for all sites in parallel. During this process, we periodically validated that each search node had finished correctly. After validation, all Cas-OFFinder output files were retrieved from the Chundoong server and moved to our local storage site. Then, for each selected target, the total number of potential off-targets was counted; the genomic location of these potential off-targets, i.e. whether they reside in CDS, UTR, intron or intergenic regions, was also evaluated.
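To make the off-target criterion concrete, the following Python sketch counts sites that differ from a guide by up to two mismatches and are followed by an NGG or NAG PAM. It is a naive linear scan written for illustration only; it is not Cas-OFFinder and would be far too slow for a whole genome. The function names are hypothetical, and the count for zero mismatches includes the on-target site itself.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_off_targets(guide: str, genome: str, max_mismatches: int = 2) -> dict:
    """Count candidate sites per mismatch number on both strands.

    A site qualifies if it is followed by a 5'-NGG-3' or 5'-NAG-3' PAM
    and differs from the guide at no more than max_mismatches positions.
    """
    guide = guide.upper()
    k = len(guide)
    counts = {m: 0 for m in range(max_mismatches + 1)}
    for seq in (genome.upper(), reverse_complement(genome)):
        for i in range(len(seq) - k - 2):
            pam = seq[i + k : i + k + 3]
            if pam[1] not in "AG" or pam[2] != "G":
                continue  # PAM is neither NGG nor NAG
            mismatches = hamming(guide, seq[i : i + k])
            if mismatches <= max_mismatches:
                counts[mismatches] += 1
    return counts
```

The returned dictionary maps the number of mismatches (0, 1, 2) to the number of matching sites, which corresponds to the per-target off-target counts described in the text.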

2.3 Inserting information into the database

We rearranged all of the resulting data and constructed a SQL database that contains all possible SpCas9 nuclease targets in CDS regions and related characteristics of each target. We chose PostgreSQL (http://www.postgresql.org/) as the database server, which showed the best performance in our case.
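The layout of such a database can be sketched as follows. The table and column names are hypothetical (the actual Cas-Database schema is not described here), and Python's built-in `sqlite3` module is used as a stand-in for PostgreSQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical schema: one row per gene and one row per SpCas9 target.
SCHEMA = """
CREATE TABLE gene (
    ensembl_id TEXT PRIMARY KEY,
    symbol     TEXT,
    organism   TEXT
);
CREATE TABLE target (
    id             INTEGER PRIMARY KEY,
    ensembl_id     TEXT REFERENCES gene(ensembl_id),
    sequence       TEXT,     -- protospacer plus PAM
    gc_percent     REAL,
    cds_position   REAL,     -- relative cleavage position in the CDS (%)
    exon_coverage  REAL,     -- coverage across transcript variants (%)
    oof_score      REAL,     -- microhomology-associated out-of-frame score
    offtargets_mm0 INTEGER,  -- sites with 0 mismatches (includes on-target)
    offtargets_mm1 INTEGER,
    offtargets_mm2 INTEGER
);
CREATE INDEX target_by_gene ON target (ensembl_id);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an empty in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def targets_for_gene(conn, ensembl_id: str, max_mm1: int = 0):
    """Return (sequence, gc_percent) rows for a gene, best out-of-frame score first."""
    cursor = conn.execute(
        "SELECT sequence, gc_percent FROM target "
        "WHERE ensembl_id = ? AND offtargets_mm1 <= ? "
        "ORDER BY oof_score DESC",
        (ensembl_id, max_mm1),
    )
    return cursor.fetchall()
```

Indexing the target table by gene identifier keeps per-gene lookups fast, which matters when thousands of genes are queried in one session.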

2.4 Web interface

We built a web interface for Cas-Database using the Python web framework Django (https://www.djangoproject.com/). Because of its modern and easily implemented database application program interface (database API), the code for storing and retrieving data from the PostgreSQL database is simplified and web site maintenance is easy. For creating the HTML part of the interface, we used the web development framework Bootstrap (http://getbootstrap.com/) and the JavaScript library jQuery (https://jquery.com/). We also implemented asynchronous data upload and download using the asynchronous JavaScript and XML (AJAX) technique to achieve a fast response time for each user. Because the results are retrieved from the server asynchronously, user requests are run in parallel rather than in sequence. As a result, search and filtering results are displayed immediately after a user changes the input data or filtering conditions.

3 Results and discussion

3.1 Cas-Database overview

Cas-Database provides a simple and easy way to design optimal sgRNAs. All available genes from a selected organism are listed on the main page of the Cas-Database website, as shown in Figure 1A. Users can easily search for desired genes by querying with the gene symbol, Ensembl ID or gene description. The search results are instantly displayed on the screen after every keystroke through the use of the AJAX web technique (Fig. 1B).

Fig. 1.

Cas-Database overview. (A) All available genes from selected organisms are listed with their Ensembl ID, description and biotype information on the main page of the Cas-Database web site. Users can preview gene information and manually select optimal sgRNAs for each gene using the ‘Quick Info’ function, or select sgRNAs for many genes automatically using the ‘Add to Collection’ function. (B) Cas-Database’s top search panel provides a powerful searching function. Users can easily search for a desired gene by querying the gene symbol, Ensembl ID or gene description. The results are updated instantly on the screen via the AJAX web technique. (C) When working with hundreds or thousands of genes, users can upload text files that contain Ensembl IDs or gene symbols separated by line breaks or spaces.

For a desired gene, users can use the ‘Quick Info’ function or the ‘Add to Collection’ function. The ‘Quick Info’ function is useful for selecting sgRNAs manually for each gene; one can preview all available targets in a specific gene by clicking on the ‘Quick Info’ button. Moreover, users can add further genes, just as items are added to an online shopping cart, and select sgRNAs for many genes simultaneously using the ‘Add to Collection’ function. In addition, hundreds or thousands of genes can be imported from text files that contain gene symbols or Ensembl IDs, with individual entries separated by line breaks or spaces (Fig. 1C).
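A gene list in the upload format described above (entries separated by line breaks or spaces) can be prepared or checked with a few lines of Python. This is only a sketch of the file format; the function name is hypothetical and the removal of duplicates is an added convenience, not a documented behaviour of Cas-Database.

```python
def parse_gene_list(text: str) -> list[str]:
    """Split a gene list on any whitespace and drop duplicates, keeping order."""
    seen = set()
    genes = []
    for token in text.split():
        if token not in seen:
            seen.add(token)
            genes.append(token)
    return genes
```

Because `str.split()` without arguments treats spaces, tabs and line breaks alike, the same function handles one-entry-per-line files and space-separated lists.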

3.2 Use of the ‘Quick Info’ function

Cas-Database provides an easy way to preview sgRNAs that target a specific gene. Clicking on the ‘Quick info’ button immediately provides detailed information about the gene, transcripts and targets in a new dialog box (Fig. 2A–C). Furthermore, the genomic locus of each target is displayed on the graphical genome browser, JBrowse (Skinner ) (Fig. 2B). If one clicks on a target in the browser, the corresponding target (Fig. 2C) will be shown on the Cas-Database web page for added convenience.

Fig. 2.

The ‘Quick Info’ function of Cas-Database. Clicking on the ‘Quick Info’ button as described for Figure 1A results in the rapid display of useful information about the selected gene: (A) gene description, transcript variant information that includes the CDS region, (B) location of sgRNAs in the gene visualized by JBrowse and (C) all filtered sgRNAs with useful descriptions. Additionally, all target sites are listed below, with transcript ID, GC content, genomic location, relative position in the CDS, exon coverage throughout all transcript variants of the gene, a microhomology-associated out-of-frame score and the number of mismatched nucleotides. Users can alter filter conditions at the top of the dialog.

Cas-Database also offers a powerful filtering function. Using the filtering feature at the top of the web page, as shown in Figure 2A, users can change the filtering conditions and rapidly preview the new results. Filtering criteria include GC content, relative position in the CDS, exon coverage throughout all transcript variants of the gene, a microhomology-associated out-of-frame score (Bae ), exclusion of targets that contain four thymidine nucleotides in tandem (Braglia ) and the number of mismatched nucleotides.
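The filtering criteria listed above can be expressed as a single predicate, as in the Python sketch below. The class names, field names and all default thresholds are illustrative assumptions and are not the default conditions used by Cas-Database.

```python
from dataclasses import dataclass


@dataclass
class Target:
    sequence: str
    gc_percent: float
    cds_position: float    # relative position in the CDS (%)
    exon_coverage: float   # coverage across transcript variants (%)
    oof_score: float       # microhomology-associated out-of-frame score
    offtargets: tuple      # numbers of sites with 0, 1 and 2 mismatches


@dataclass
class Filter:
    # All thresholds below are placeholders chosen for illustration.
    gc_min: float = 20.0
    gc_max: float = 80.0
    cds_position_max: float = 50.0
    exon_coverage_min: float = 100.0
    oof_score_min: float = 60.0
    exclude_tttt: bool = True
    max_offtargets: tuple = (1, 0, 0)  # zero mismatches: the on-target only


def passes(target: Target, flt: Filter) -> bool:
    """Return True if the target satisfies every filtering condition."""
    if not flt.gc_min <= target.gc_percent <= flt.gc_max:
        return False
    if target.cds_position > flt.cds_position_max:
        return False
    if target.exon_coverage < flt.exon_coverage_min:
        return False
    if target.oof_score < flt.oof_score_min:
        return False
    # Four thymidines in tandem are excluded, as in the text above.
    if flt.exclude_tttt and "TTTT" in target.sequence.upper():
        return False
    return all(n <= limit for n, limit in zip(target.offtargets, flt.max_offtargets))
```

Relaxing a condition then amounts to constructing a new `Filter`, for example `Filter(oof_score_min=50.0)`, and re-applying `passes` to the same list of targets.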

3.3 Use of the ‘Add to Collection’ function

Cas-Database also provides a novel function that allows the selection of sgRNAs from hundreds or thousands of genes at once through the use of a ‘cart’ system. Users can collect desired genes either by clicking on the ‘Add to Collection’ button on the main web page (Fig. 3A) or by uploading a text file as discussed above and shown in Figure 1C. After all desired genes are collected, clicking on the ‘Select optimal sgRNAs’ button will open the results page, which will list all available sgRNAs filtered by the default conditions as shown in Figure 3B. Users can easily change the filtering conditions, including the total count of sgRNAs for each gene, in the filter section that appears at the top of the results page (Fig. 3B). Because the AJAX technique is used, the retrieval processes run independently of each other, resulting in fast loading speeds; e.g. the loading time for 1000 genes is about 2 min under the default conditions.

Fig. 3.

The ‘Add to Collection’ function of Cas-Database. (A) Cas-Database provides a unique and easy way to select many genes at once by the implementation of a ‘cart’ system. Users can collect desired genes by clicking on the ‘Add to Collection’ button or by uploading a text file as described in the Figure 1C legend. Note that one can also select genes from different organisms. By clicking on the ‘Select optimal gRNAs’ button, users will proceed to the next step. (B) The results page will list all available sgRNAs filtered by the default conditions on the top panel. A colored indicator represents the selection status for each gene, e.g. green (selected completely), yellow (selected partially) or red (not selected at all). A user can download only the genes for which sgRNAs were successfully designed, and repeat the process for the remaining genes by changing the filter criteria and clicking on the ‘Remove “green” genes from list’ button.

Whether enough sgRNAs have been selected for each gene after filtering is shown by colored indicators: green (selected completely), yellow (selected partially) or red (not selected at all), as shown in Figure 3B. In this step, users can download the sequences of all selected sgRNAs for every gene, or only those for genes with a green indicator, i.e. genes for which sgRNAs were selected completely under the current filtering conditions. Users can then eliminate genes with green indicators from the list by clicking the ‘Remove green genes from list’ button and alter the filtering conditions. After resetting the filtering conditions, users can again download the genes with green indicators and repeat this process until sgRNAs have been selected completely for most of the remaining genes. Finally, if a few genes are still left, users can manually select targets for those genes using the ‘Quick Info’ function on the main page. As a result, users can select optimal sgRNAs from hundreds or thousands of genes and download a list of sgRNAs for every gene with the target-specific information.
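The iterative procedure described above (select, set aside the ‘green’ genes, relax the filter, repeat) can be sketched in Python as follows. This is an illustration of the workflow rather than the code behind the web site; the function names are hypothetical, and it is assumed that a gene counts as ‘selected completely’ when the requested number of sgRNAs per gene has been reached.

```python
def selection_status(n_selected: int, n_wanted: int) -> str:
    """Map the number of selected sgRNAs to the indicator color."""
    if n_selected >= n_wanted:
        return "green"   # selected completely
    if n_selected > 0:
        return "yellow"  # selected partially
    return "red"         # not selected at all


def iterative_selection(candidates: dict, filters: list, n_wanted: int):
    """Apply successively relaxed filters until genes are fully covered.

    candidates maps a gene to its list of candidate targets; filters is a
    list of predicates ordered from strictest to most relaxed.
    Returns (library, remaining): the chosen targets per completed gene and
    the genes that still need manual selection.
    """
    remaining = dict(candidates)
    library = {}
    for keep in filters:
        for gene in list(remaining):
            chosen = [t for t in remaining[gene] if keep(t)][:n_wanted]
            if selection_status(len(chosen), n_wanted) == "green":
                library[gene] = chosen
                del remaining[gene]  # the 'remove green genes' step
    return library, remaining
```

Genes left in `remaining` after the last filter correspond to those for which targets would be picked by hand.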

3.4 Web API

For users familiar with bioinformatics, Cas-Database also provides a web application programming interface (web API). If users send queries through hypertext transfer protocol (HTTP) requests to our database server, the results will be returned in the JavaScript Object Notation (JSON) format. Thus, researchers can easily create their own simple scripts for automated data retrieval. Details about the web interface are described in the Supplementary data.
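A script of the kind mentioned above might look like the following Python sketch, which uses only the standard library. The endpoint path and query parameters are deliberately left as arguments because they are documented in the Supplementary data and are not reproduced here; only the base URL is taken from the article, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Base URL as given in the article; endpoint paths must come from the
# Supplementary data and are NOT defined in this sketch.
BASE_URL = "http://www.rgenome.net/cas-database/"


def build_query_url(path: str, params: dict) -> str:
    """Combine the base URL, an endpoint path and URL-encoded parameters."""
    return urljoin(BASE_URL, path) + "?" + urlencode(params)


def fetch_json(url: str, timeout: float = 30.0):
    """Send an HTTP GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The structure of the decoded object depends on the endpoint, so a script would typically inspect one response by hand before automating retrieval for many genes.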

3.5 Update of Cas-Database

The entire process of database creation, from retrieving an organism’s genome sequence and its associated annotation data from the Ensembl server to creating a new database, is fully automated by our in-house scripts. The time to build a new database depends on the organism’s genome size, e.g. the creation of a new database for zebrafish (1.32 GB) took about three days. Currently, Cas-Database supports 12 different model organisms, including five vertebrates (human, rat, mouse, pig and zebrafish), one insect (D. melanogaster), one nematode (C. elegans) and five plants (A. thaliana, tomato, banana, grapes and soybean), and we plan to update Cas-Database continuously to obtain the most recent versions of genomic sequences for the existing organisms. We also plan to add data from other organisms in the Ensembl database. In addition, we will update the database to allow for alternative CRISPR/Cas nucleases such as Cpf1 (Zetsche ).

4 Conclusion

Cas-Database is an easy-to-use web-based tool for designing sgRNAs for SpCas9 nucleases on a genome scale. It can be applied to construct optimal sgRNA libraries that target thousands of coding sequences in 12 different organisms for genome-wide knockout screening experiments. Because Cas-Database contains all available targets in CDS regions as well as target-related information, including data about potential off-target sites, users can easily access the data through the interactive web interface or web API. The web interface was made using cutting-edge web development techniques such as AJAX, so the website is highly responsive to user input and the output results load quickly.

References (35 in total)

1. CasOT: a genome-wide Cas9/gRNA off-target searching tool.

Authors: An Xiao; Zhenchao Cheng; Lei Kong; Zuoyan Zhu; Shuo Lin; Ge Gao; Bo Zhang
Journal: Bioinformatics Date: 2014-01-02 Impact factor: 6.937

2. Flexible guide-RNA design for CRISPR applications using Protospacer Workbench.

Authors: Cameron Ross MacPherson; Artur Scherf
Journal: Nat Biotechnol Date: 2015-06-29 Impact factor: 54.908

Review 3. Genome editing. The new frontier of genome engineering with CRISPR-Cas9.

Authors: Jennifer A Doudna; Emmanuelle Charpentier
Journal: Science Date: 2014-11-28 Impact factor: 47.728

4. Microhomology-based choice of Cas9 nuclease target sites.

Authors: Sangsu Bae; Jiyeon Kweon; Heon Seok Kim; Jin-Soo Kim
Journal: Nat Methods Date: 2014-07 Impact factor: 28.547

5. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells.

Authors: Yuexin Zhou; Shiyou Zhu; Changzu Cai; Pengfei Yuan; Chunmei Li; Yanyi Huang; Wensheng Wei
Journal: Nature Date: 2014-04-09 Impact factor: 49.962

6. Genetic screens in human cells using the CRISPR-Cas9 system.

Authors: Tim Wang; Jenny J Wei; David M Sabatini; Eric S Lander
Journal: Science Date: 2013-12-12 Impact factor: 47.728

Review 7. High-throughput functional genomics using CRISPR-Cas9.

Authors: Ophir Shalem; Neville E Sanjana; Feng Zhang
Journal: Nat Rev Genet Date: 2015-04-09 Impact factor: 53.242

8. CRISPRdirect: software for designing CRISPR/Cas guide RNA with reduced off-target sites.

Authors: Yuki Naito; Kimihiro Hino; Hidemasa Bono; Kumiko Ui-Tei
Journal: Bioinformatics Date: 2014-11-20 Impact factor: 6.937

9. WGE: a CRISPR database for genome engineering.

Authors: Alex Hodgkins; Anna Farne; Sajith Perera; Tiago Grego; David J Parry-Smith; William C Skarnes; Vivek Iyer
Journal: Bioinformatics Date: 2015-05-14 Impact factor: 6.937

10. sgRNAcas9: a software package for designing CRISPR sgRNA and evaluating potential off-target cleavage sites.

Authors: Shengsong Xie; Bin Shen; Chaobao Zhang; Xingxu Huang; Yonglian Zhang
Journal: PLoS One Date: 2014-06-23 Impact factor: 3.240
