Literature DB >> 30587106

Web-based design and analysis tools for CRISPR base editing.

Gue-Ho Hwang¹, Jeongbin Park^2,3, Kayeong Lim^4,5, Sunghyun Kim⁴, Jihyeon Yu^1,6, Eunchong Yu¹, Sang-Tae Kim⁷, Roland Eils^2,8, Jin-Soo Kim^4,5, Sangsu Bae^9,10.

Abstract

BACKGROUND: As a result of its simplicity and high efficiency, the CRISPR-Cas system has been widely used as a genome editing tool. Recently, CRISPR base editors, which consist of deactivated Cas9 (dCas9) or Cas9 nickase (nCas9) linked with a cytidine or a guanine deaminase, have been developed. Base editing tools will be very useful for gene correction because they can produce highly specific DNA substitutions without the introduction of any donor DNA, but dedicated web-based tools to facilitate the use of such tools have not yet been developed.
RESULTS: We present two web tools for base editors, named BE-Designer and BE-Analyzer. BE-Designer provides all possible base editor target sequences in a given input DNA sequence with useful information including potential off-target sites. BE-Analyzer, a tool for assessing base editing outcomes from next generation sequencing (NGS) data, provides information about mutations in a table and interactive graphs. Furthermore, because the tool runs client-side, large amounts of targeted deep sequencing data (< 1 GB) do not need to be uploaded to a server, substantially reducing running time and increasing data security. BE-Designer and BE-Analyzer can be freely accessed at http://www.rgenome.net/be-designer/ and http://www.rgenome.net/be-analyzer /, respectively.
CONCLUSION: We develop two useful web tools to design target sequence (BE-Designer) and to analyze NGS data from experimental results (BE-Analyzer) for CRISPR base editors.

Entities: Chemical Disease Gene Species

Keywords: Base editing; CRISPR; Genome editing; NGS analysis; Web-based tool

Mesh：

Year: 2018 PMID： 30587106 PMCID： PMC6307267 DOI： 10.1186/s12859-018-2585-4

Source DB: PubMed Journal: BMC Bioinformatics ISSN： 1471-2105 Impact factor: 3.169

Background

CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR associated), an immune system in bacteria and archaea that targets nucleic acids of viruses and plasmids, is now widely used as a genome editing tool because of its convenience and high efficiency [1-5]. The most popular endonuclease, type II CRISPR-Cas9, makes DNA double-stranded breaks (DSBs) at a desired site with the help of its single-guide RNA (sgRNA) [6-8]. The DSBs provoke the cell’s own repair systems: error-prone non-homologous end joining (NHEJ) and error-free homology-directed repair (HDR), resulting in gene knock-out and knock-in (or gene correction), respectively. However, it is relatively difficult to induce gene corrections such as one nucleotide substitutions because HDR occurs rarely in mammalian cells compared to NHEJ [9]. Furthermore, Cas9 can frequently induce DSBs at undesired sites with sequences similar to that of the sgRNA [10, 11]. Recently, CRISPR-mediated base editing tools have been developed. These tools enable the direct conversion of one nucleotide to another without producing DSBs in the target sequence and without the introduction of donor DNA templates. The initial base editors (named BEs), composed of dCas9 [12] or nCas9 [13] linked to a cytidine deaminase such as APOBEC1 (apolipoprotein B editing complex 1) [14] or AID (activation-induced deaminase) [15], substitute C for T. Later, adenine base editors (ABEs) were constructed by using tRNA adenine deaminase (TadA), evolved to enable the direct conversion of A to G in DNA [16]. Because of their ability to make highly specific DNA substitutions, these base editing tools will be very useful for gene correction [17-22], but to the best of our knowledge, a user-friendly and freely-available web-based tool for their design and analysis has not yet been developed. Here, we present dedicated web toolkits, named BE-Designer and BE-Analyzer, to aid researchers in choosing sgRNAs to target desired DNA sequences and to assess base editing outcomes from next generation sequencing (NGS) data. BE-Designer provides researchers with a list of all possible sgRNAs for targeting given input DNA sequences, along with useful information including their potential off-target sites, for 319 registered organisms, presently. After introducing CRISPR base editors into a population of cells, researchers ultimately perform targeted deep sequencing to measure mutation efficiencies and analyze DNA mutation patterns. BE-Analyzer analyzes and summarizes NGS data in a user’s web browser; because of the advantages of JavaScript, there is no need to upload data to a server or install local tools. BE-Analyzer also optionally accepts control data from CRISPR-untreated cells and displays the output in an additional nucleotide mutation table so that users can easily compare the data from CRISPR-treated and untreated cells.

Implementation

BE-designer overview

BE-Designer is a sgRNA designing tool for CRISPR base editors. BE-Designer rapidly provides a list of all possible sgRNA sequences from a given input DNA sequence along with useful information: possible editable sequences in a target window, relative target positions, GC content, and potential off-target sites. Basically, the interface of BE-Designer was developed using Django as a backend program.

Input panels in BE-designer

BE-Designer presently provides analysis for CRISPR base editors based on SpCas9 from Streptococcus pyogenes, which recognizes 5’-NGG-3′ protospacer-adjacent motif (PAM) sequences, as well as SpCas9 variants: SpCas9-VQR (5’-NGAN-3′), SpCas9-EQR (5’-NGAG-3′), SpCas9-VRER (5’-NGCG-3′), xCas9 3.7 (TLIKDIV SpCas9; 5’-NGR-3′ and 5’-NG-3′) [23-25]. BE-Designer also provides analysis for CRISPR base editors based on StCas9 from Streptococcus thermophilus (5’-NNAGAAW-3′), CjCas9 from Campylobaccter jejuni (5’-NNNVRYAC-3′), SaCas9 from Staphylococcus aureus (5’-NNGRRT-'3) and its engineered form, SaCas9-KKH (5’-NNNRRT-'3) [26-28]. Currently, BE-Designer supports sgRNA design in 319 different organisms, including vertebrates, insects, plants, and bacteria. Users can input DNA sequences directly in the target sequence panel of the web site or upload a text file containing DNA sequences. The DNA sequence should be a raw string comprised of IUPAC nucleotide codes or FASTA formatted text. By using an analysis parameter, users can manually select the type of base editor, either BE or ABE, and the base editing window in the target DNA (Fig. 1a).

Fig. 1

Overview of BE-Designer. a BE-Designer allows analysis of potential target sequences for base editors based on WT and variant forms of CRISPR-Cas9/-Cpf1 endonucleases, which recognize a variety of PAM sequences. The application supports 319 reference genomes from a variety of organisms including vertebrates, insects, plants, and bacteria. Furthermore, users can select base editing windows for different CRISPR base editors. b After a user clicks on the submit button, BE-Designer rapidly displays the results page showing all possible target sequences and associated useful information: target nucleotides, colored red in the base editing window, and their relative position and GC content. Possible off-target sequences from throughout the selected genome, which differ by up to 2 nucleotides from the on-target sequences, are supplied. In addition, BE-Designer offers a link to the corresponding Ensembl genome browser for each off-target site

Selection of sgRNAs

Within a given DNA sequence, BE-Designer finds all possible target sites based on input parameters; in the base editing window, target nucleotides are highlighted in red, and their relative position and GC content are indicated. BE-Designer then invokes Cas-OFFinder [29] to search throughout the entire genome of interest for possible off-target sequences that differ by up to 2 nucleotides from the on-target sequences (Additional file 1: Figure S1).

Result visualization

BE-Designer produces a result table that contains the target sequences with useful information [30] as shown in Fig. 1b. BE-Designer uses AJAX (Asynchronous JavaScript and Extensible Markup Language) to show results instantly; thus, users can filter the results according to GC content and mismatch numbers without refreshing the whole web page. In addition, if the Ensembl annotation is available for the given reference genome, BE-Designer offers a link to the corresponding Ensembl genome browser web page that displays the sequence information near any off-target loci.

BE-analyzer overview

Due to its high sensitivity and precision, targeted deep sequencing is the best method for assessing the results of base editing. BE-Analyzer accepts targeted deep-sequencing data and analyzes them to calculate base conversion ratios. In addition to the interactive table and graphs showing the results, BE-Analyzer also provides a full list of all query sequences aligned to a given wild-type (WT) sequence, so that users can confirm mutation patterns manually. BE-Analyzer wholly runs on a client-side web browser so that there is no need to upload very large NGS datasets (< 1 GB) to a server, reducing a time-consuming step in genome editing analysis. The BE-Analyzer interface was also developed using Django as a backend program. The core algorithm of BE-Analyzer was written in C++ and then trans-compiled to WebAssembly with Emscripten (http://kripken.github.io/emscripten-site/).

Input panels in BE-analyzer

NGS data are typically composed of a pair of Fastq files from paired-end sequencing, or a single Fastq file from single-read sequencing. BE-Analyzer allows both types; if the input is a pair of Fastq files, BE-Analyzer first merges them by the JavaScript port of fastq-join, a part of ea-utils (https://expressionanalysis.github.io/ea-utils//). As an option, users can additionally upload data from a CRISPR-untreated control to compare it with data from the treated sample (Fig. 2a). In this case, BE-Analyzer analyzes the two datasets simultaneously and compares them to exclude background mutations found in the control sample.

Fig. 2

BE-Analyzer input panels. a BE-Analyzer allows various types of NGS data files: single-end reads, paired-end reads, or merged sequencing data. Moreover, BE-Analyzer optionally accepts data from CRISPR-untreated control samples. b BE-Analyzer requires basic information: a full WT sequence for reference, the type of base editor, the desired base editing window, and the target DNA sequence. Additionally, analysis parameters for flanking windows on each side of the target and a minimum frequency are required To analyze query sequences in NGS data, BE-Analyzer requires basic information: a full WT sequence for reference, the type of base editor, the desired base editing window, and the target DNA sequence (Fig. 2b). Previous studies have reported the optimal target window for each base editor. For example, BE3 usually induces base conversion in a region ranging from 13 to 17 nucleotide (nt) upstream of the PAM, and TARGET-AID is most efficient within a region 15 to 19 nt upstream of the PAM. Basically, BE-Analyzer provides the optimal default values with reference to previous studies, but users can freely revise the value manually. On the other hand, it has been reported that base editors can introduce substitutions outside of the DNA target sequences at a low frequency [15]. Therefore, BE-Analyzer is implemented to allow additional flanking windows on each side of the target for analysis by the use of a relevant parameter.

Analysis of NGS data

From uploaded NGS data, BE-Analyzer first defines 15-nt indicator sequences on both sides of the given reference sequence; only identified queries that have both indicator sequences, with ≤1 nt mismatches, are collected. Then, BE-Analyzer counts the recurrent frequency of each sequence and sorts queries in descending order. In this procedure, sequences with frequencies below the minimum are discarded. Each sequence is aligned to the reference sequence with EMBOSS needle (https://www.ebi.ac.uk/Tools/psa/emboss_needle/) (Additional file 1: Figure S1). As a result, the aligned sequences are classified into four different groups based on the presence of a hyphen (−). If hyphens are found in the reference sequence or query, the query is classified as an insertion or deletion by a comparison of the number of hyphens in the two sequences. If hyphens (inserted or deleted sequences) are not found in a given target window including the additional flanking regions, the query is referred as a WT sequence [31]. Otherwise, the queries that contain a few mismatched nucleotides in the given target window are classified as substitutions (Additional file 1: Figure S2). Among the query sequences defined as substitutions, if there are desired base conversions, i.e. C to D (A, G, or T) for BE and A to G for ABE, in the given target window, BE-Analyzer further analyzes them to calculate the ultimate base editing efficiency and to display the base editing patterns in interactive tables and graphs. A table showing statistics, base editing efficiencies, information about expected amino acids, and the categorized align result tab are displayed using Bootstrap library. Bar graphs and heat maps of substitution patterns are visualized using Plotly.js (https://plot.ly/javascript/). The results are summarized as a table with 9 columns (Fig. 3a): (i) ‘Total Sequence’ indicates the number of all reads present in the Fastq file, (ii) ‘With both indicator sequences’ indicates the number of reads having both indicator sequences, (iii) ‘More than minimum frequency’ indicates the number of reads that remain after the reads that appear with less than the minimum frequency are removed, (iv, v, vi) ‘Wild type’, ‘Insertions’, and ‘Deletions’ indicate the number of reads in each category, (vii) the 7th column indicates the number of reads having at least one base substitution, (viii) the 8th column indicates the number of reads that have nucleotide conversions induced by CRISPR base editors in target windows, and (ix) the 9th column indicates the intended substitution rate (such as ‘C to T Substitution Rate’), obtained by dividing the number of reads that have intended conversions in the base editing window with the number of reads above the minimum frequency (3rd column).

Fig. 3

Overview of the BE-Analyzer results page. a The results are summarized in a table that includes the number of sequence reads with WT or different mutation patterns. Ultimately, the ratio of intended substitutions induced by CRISPR base editors is calculated. b For query sequences classified as substitutions, the substitution table shows the percentages of each of the 4 nucleotides at each position in the target window. For users’ convenience, expected amino acid sequences are provided. c Graphic plots show the substitution efficiencies (left) and the C to D transition patterns in the targeting region, with the ratio of types of nucleotide changes shown as C to T (red), C to G (black), and C to A (green) at each position (right). d All filtered sequences from the input data are aligned to the reference sequence. Users can confirm the mutated sequences manually For base editing, it is crucial to know how the mutation of one or a few nucleotides changes the amino acid sequence. To address this issue, BE-Analyzer provides the expected amino acid sequences for three different reading frames, so that users can select among three possible start positions (Fig. 3b). For each nucleotide, BE-Analyzer displays the nucleotide mutation rate in detail, highlighted with a color gradient. Although cytidine deaminases mainly introduce C to T transitions in the base editing window, C to A or G transitions may also occur in flanking regions with low probability. Thus, BE-Analyzer shows the substitution rate at each site in the flanking windows and the C to D transition pattern in the target windows (Fig. 3c). In the C to D substitution graph, each transition pattern is presented with its percentile rate, and the type of transition indicated by color (red-black-green). Optionally, if users previously uploaded data from a CRISPR-untreated control, BE-Analyzer displays the substitution rate at each of those sites in the negative direction. Furthermore, for users’ convenience, BE-Analyzer shows substitution patterns within the flanking windows with a heat map, which enables visualization of the dominant substitution patterns as well as background patterns. At the bottom of the results page, a list of categorized sequence reads aligned to the reference sequence is presented (Fig. 3d). Users can confirm all filtered sequences from the input data in this table and can also save the results by clicking the ‘Download Data’ button.

Conclusions

BE-Designer is an easy-to-use web tool for optimal selection of sgRNAs in a given target sequence. It identifies all possible target sequences in a given sequence and displays information about each target sequence, including predicted mutation patterns, mutation positions, and potential off-target sites. Users can easily select the optimal sgRNA sequence for current base editors. On the other hand, Benchling, Inc., a company developing biotech platforms, also provides a CRISPR-mediated base editor designing tool (https://benchling.com/). We carefully compare our BE-Designer with the Benchling’s designer as summarized in Table 1.

Table 1

Comparison between BE-Designer and a Benchling’s designing tool

	BE-Designer	Benchling’s
Login process	None	Obligation
BE (C to T)	Possible	Possible
ABE (A to G)	Possible	Not possible
Base editing window	Not limited	Flexible (13 ~ 20)
Provided organism types	319	164
Provided CRISPR variants	12 types	8 types
Predicted amino acids information	No	Yes
Guide RNA length	Flexible (15 ~ 25)	Limited (20)
Off-target information	List	List + Score
Visualization	Table	Sequence Map

Comparison between BE-Designer and a Benchling’s designing tool BE-Analyzer is another web tool for instant assessment of deep sequencing data obtained after treatment with base editors. BE-Analyzer instantly analyzes deep sequencing data at a client-side web browser and displays the results using interactive tables and graphs for users’ convenience. Useful information, including the ratio of intended conversions, transition patterns, and sequence alignments, is provided so that users can easily infer how frequently and where intended or unwanted substitutive mutations are generated. Figure S1. The internal programs used in this study for implementation of BE-Designer and BE-Analyzer. Figure S2. The workflow for classifying query sequences in BE-Analyzer. (DOCX 338 kb)

33 in total

Review 1. Design and analysis of CRISPR-Cas experiments.

Authors: Ruth E Hanna; John G Doench
Journal: Nat Biotechnol Date: 2020-04-13 Impact factor: 54.908

Review 2. CRISPR/Cas mediated base editing: a practical approach for genome editing in oil palm.

Authors: Rajesh Yarra; Hongxing Cao; Longfei Jin; Yang Mengdi; Lixia Zhou
Journal: 3 Biotech Date: 2020-06-15 Impact factor: 2.406

3. In situ CRISPR-Cas9 base editing for the development of genetically engineered mouse models of breast cancer.

Authors: Stefano Annunziato; Catrin Lutz; Linda Henneman; Jinhyuk Bhin; Kim Wong; Bjørn Siteur; Bas van Gerwen; Renske de Korte-Grimmerink; Maria Paz Zafra; Emma M Schatoff; Anne Paulien Drenth; Eline van der Burg; Timo Eijkman; Siddhartha Mukherjee; Katharina Boroviak; Lodewyk Fa Wessels; Marieke van de Ven; Ivo J Huijbers; David J Adams; Lukas E Dow; Jos Jonkers
Journal: EMBO J Date: 2020-01-13 Impact factor: 11.598

Review 4. Technologies and Computational Analysis Strategies for CRISPR Applications.

Authors: Kendell Clement; Jonathan Y Hsu; Matthew C Canver; J Keith Joung; Luca Pinello
Journal: Mol Cell Date: 2020-07-02 Impact factor: 17.970

5. Increasing the efficiency and targeting range of cytidine base editors through fusion of a single-stranded DNA-binding protein domain.

Authors: Xiaohui Zhang; Liang Chen; Biyun Zhu; Liren Wang; Caiyu Chen; Mengjia Hong; Yifan Huang; Huiying Li; Honghui Han; Bailian Cai; Weishi Yu; Shuming Yin; Lei Yang; Zuozhen Yang; Meizhen Liu; Ying Zhang; Zhiyong Mao; Yuxuan Wu; Mingyao Liu; Dali Li
Journal: Nat Cell Biol Date: 2020-05-11 Impact factor: 28.824

6. Amelioration of an Inherited Metabolic Liver Disease through Creation of a De Novo Start Codon by Cytidine Base Editing.

Authors: Lei Yang; Liren Wang; Yanan Huo; Xi Chen; Shuming Yin; Yaqiang Hu; Xiaohui Zhang; Rui Zheng; Hongquan Geng; Honghui Han; Xueyun Ma; Meizhen Liu; Haibo Li; Weishi Yu; Mingyao Liu; Jun Wang; Dali Li
Journal: Mol Ther Date: 2020-05-07 Impact factor: 11.454