CRISPRMatch: An Automatic Calculation and Visualization Tool for High-throughput CRISPR Genome-editing Data Analysis.

Qi You^1,2, Zhaohui Zhong^3, Qiurong Ren^3, Fakhrul Hassan^3, Yong Zhang^3, Tao Zhang^1,2.

Abstract

Custom-designed nucleases, including CRISPR-Cas9 and CRISPR-Cpf1, are widely used for precise genome editing. High coverage, low cost and quantifiability make high-throughput sequencing (NGS) an effective method for assessing the efficiency of custom-designed nucleases. However, in contrast to standardized transcriptome protocols, genome-editing NGS data lack a user-friendly pipeline that connects different tools to automatically calculate mutations, evaluate editing efficiency and produce a more comprehensive dataset that can be visualized. Here, we have developed an automatic stand-alone toolkit based on Python scripts, named CRISPRMatch, to process high-throughput genome-editing data from CRISPR nuclease-transformed protoplasts. It integrates analysis steps including read mapping and read-count normalization, calculation of mutation frequency (deletions and insertions), evaluation of genome-editing efficiency and accuracy, and visualization of the results (tables and figures). Both CRISPR-Cas9 and CRISPR-Cpf1 nucleases are supported by the CRISPRMatch toolkit, and the integrated code has been released on GitHub (https://github.com/zhangtaolab/CRISPRMatch).

Keywords: CRISPR; NGS data; automatic pipeline; genome-editing efficiency; mutation calculation

Year: 2018 PMID: 29989077 PMCID: PMC6036748 DOI: 10.7150/ijbs.24581

Source DB: PubMed Journal: Int J Biol Sci ISSN: 1449-2288 Impact factor: 6.580

Introduction

Custom-designed nucleases can achieve targeted gene knock-out by creating mutations (insertions, deletions and replacements) at double-strand DNA break (DSB) sites 1, 2. Because they modify target DNA efficiently, nuclease tools such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and clustered regularly interspaced short palindromic repeats (CRISPR) reagents have become popular genome-editing approaches in various organisms 3-6. Compared with ZFNs and TALENs, the CRISPR-associated protein (Cas) system (CRISPR-Cas) achieves better specificity at the target locus by using RNA-guided nucleases, and offers further advantages including optimized target design, high efficiency and multiplexed gene editing in a single step. As a revolutionary genome-editing tool 5, 8, CRISPR-Cas has been widely used in genetic manipulation, such as disease treatment and crop breeding. At present, the CRISPR-Cas9 system, which recognizes the canonical G-rich protospacer adjacent motif (PAM) 5'-NGG-3', and the CRISPR-Cpf1 system, which recognizes a T-rich PAM on the 5' side of the protospacer, are the major genome-editing toolboxes 9-11. Owing to its cost-effectiveness, high coverage and precise quantification, high-throughput sequencing has been used to screen genome-editing outcomes caused by CRISPR nucleases 12, 13, leading to a dramatic accumulation of high-throughput screening data from edited genomic DNA 14. In response, several bioinformatics tools have become available for next-generation sequencing (NGS) CRISPR screening and analysis. Notable among these are GenomeCRISPR 15, CRISPRcloud 16 and Cpf1-Database 17, which are web platforms for visualizing and analyzing pooled screening data, and CRISPR-DAV 18, CRISPR-GA 19, CRISPResso 20, BATCH-GE 21 and Cas-analyzer 22, which are stand-alone software packages for analyzing CRISPR genome-editing experiments. However, these tools have limitations.
For example, the web platforms cannot analyze batches of samples; CRISPR-Cas9 and CRISPR-Cpf1 NGS data cannot be processed and compared in a single run; and mutation efficiency and the accuracy of genome-editing experiments are neither summarized nor well visualized. In addition, NGS analysis of mutagenesis in transformed protoplasts has become a fast method for evaluating genome-editing efficiency and accuracy. In previous work, Tang et al. 13, 23 developed efficient CRISPR genome-editing systems in which constructs were transformed into plant protoplasts and mutagenesis was conveniently assessed by NGS. This strategy enables rapid evaluation of the targeted mutation efficiency of DNA endonucleases. However, in contrast to standardized transcriptome protocols 24, 25, high-throughput sequencing data from edited genomes are difficult to analyze for mutations and to present without an appropriate pipeline that connects different tools, calculates mutations automatically and produces a more organized dataset that can be properly visualized. Here, we have developed an automatic stand-alone toolkit using Python scripts, named CRISPRMatch, a new platform that integrates NGS read mapping, read-count normalization, calculation of mutation frequency (deletions and insertions), genome-editing efficiency statistics at each position of the target region, and presentation of the results in multiple formats. After a few independent packages are installed, CRISPRMatch can analyze a series of CRISPR-Cas9 or CRISPR-Cpf1 NGS samples and compare the efficiency and accuracy of genome-editing endonucleases in a single run. These packages include BWA (Burrows-Wheeler Aligner) 26, SAMtools 27 and Picard, which are used for read mapping and mutation calling. Pysam 27 assists in read classification, indel detection and mutation calculation.
Matplotlib 28 is used to visualize read alignments, mutation details at each position, and genome-editing efficiency statistics across target regions. We then validated the ability of CRISPRMatch to process both CRISPR-Cas9 and CRISPR-Cpf1 NGS samples using a few simple input files. As expected, it automatically output a set of charts, figures and tables, which we used to evaluate the benefits of CRISPRMatch for presenting results, visualizing mutation details, and evaluating and checking genome-editing efficiency.

Results

CRISPRMatch Implementation

CRISPRMatch was designed to implement the strategy used in our previous Cpf1 work 13 by integrating analysis steps such as read mapping, calculation of mutation frequency (deletions and insertions), evaluation of the accuracy and efficiency of genome-editing systems, and visual output (tables and figures). The NGS mutagenesis data were obtained from transformed rice protoplasts, which made it possible to obtain genome-editing data and test the mutation efficiency of different CRISPR editing systems quickly. Both CRISPR-Cas9 and CRISPR-Cpf1 nucleases are supported by the CRISPRMatch toolkit. The integrated pipeline, including data processing, analysis and output, is executed automatically (Fig 1). First, paired-end sequencing reads, which are preferred, must be joined into single long reads by FLASH (http://ccb.jhu.edu/software/FLASH/). The joined reads were mapped to the target editing region with BWA using default parameters. Alignment files were sorted and indexed with Picard and SAMtools. Second, the genome-editing system type and the target region for mutation calculation were determined. Based on the characteristics of each sequence-specific CRISPR system, we manually defined a cleavage region for each of the two endonucleases. For Cas9, the region (5'-3') covered 10 bp upstream, the guide RNA (gRNA) target sequence, the 'NGG' protospacer adjacent motif (PAM) and 10 bp downstream. For Cpf1, the region (5'-3') covered the 'TTTN' PAM, the CRISPR RNA (crRNA) target sequence and 30 bp downstream. Third, deletions and insertions were detected in each mapped read with the Pysam package. The mutation type of a read was assigned by the following rule: a read containing only deletions was classified into the deletion group; a read containing only insertions, or insertions together with replacements, was classified into the insertion group; and a read containing both deletions and insertions was classified into the insertion-and-deletion group.
Thus, reads from the genome-editing region were divided into three mutation types for frequency calculation: reads with deletions only, reads with insertions only, and reads with both deletions and insertions (Fig. 2A). To account for differences in sequencing depth, the mapped read count of each sample was normalized to one million, so that mutation frequency represents the number of mutated reads per million mapped reads. Finally, summaries of mutation frequency and per-position details of genome-editing efficiency were plotted with the matplotlib package.
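The cleavage-region definition, the read-classification rule and the per-million normalization described above can be sketched in plain Python. This is a minimal illustration, not CRISPRMatch's own code: the function names are hypothetical, the protospacer is assumed to lie on the forward strand of the target region with 0-based, half-open coordinates, and reads are represented only by their start position and (operation, length) CIGAR tuples, the form pysam exposes as `AlignedSegment.cigartuples`. Replacements are ignored because, under the stated rule, they never change a read's group.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# CIGAR operation codes in the order used by pysam's AlignedSegment.cigartuples
MATCH, INS, DEL, REF_SKIP, SOFT_CLIP, HARD_CLIP, PAD, EQUAL, DIFF = range(9)
CONSUMES_REF = {MATCH, DEL, REF_SKIP, EQUAL, DIFF}


def cleavage_window(system: str, spacer_start: int, spacer_len: int) -> Tuple[int, int]:
    """Half-open [start, end) region of the target sequence where mutations are counted.

    Assumption: the protospacer is on the forward strand at 0-based spacer_start.
    """
    if system == "Cas9":
        # 10 bp upstream + gRNA target + 3 nt NGG PAM + 10 bp downstream
        return max(0, spacer_start - 10), spacer_start + spacer_len + 3 + 10
    if system == "Cpf1":
        # 4 nt TTTN PAM directly 5' of the crRNA target + target + 30 bp downstream
        return max(0, spacer_start - 4), spacer_start + spacer_len + 30
    raise ValueError(f"unknown system: {system}")


def classify_read(ref_start: int, cigar: Iterable[Tuple[int, int]],
                  window: Tuple[int, int]) -> Optional[str]:
    """Return 'deletion', 'insertion', 'both', or None if the window has no indel.

    Substitutions are not inspected, so insertion + replacement stays 'insertion'.
    """
    win_start, win_end = window
    pos = ref_start  # current 0-based reference position
    has_ins = has_del = False
    for op, length in cigar:
        if op == INS and win_start <= pos < win_end:
            has_ins = True
        elif op == DEL and pos < win_end and pos + length > win_start:
            has_del = True
        if op in CONSUMES_REF:
            pos += length
    if has_ins and has_del:
        return "both"
    if has_del:
        return "deletion"
    if has_ins:
        return "insertion"
    return None


def per_million(groups: List[Optional[str]], mapped_reads: int) -> dict:
    """Mutated reads per million mapped reads, per group and in total."""
    counts = Counter(g for g in groups if g is not None)
    scale = 1_000_000 / mapped_reads
    freq = {g: counts[g] * scale for g in ("deletion", "insertion", "both")}
    freq["all"] = sum(freq.values())
    return freq


if __name__ == "__main__":
    window = cleavage_window("Cpf1", spacer_start=50, spacer_len=23)  # (46, 103)
    reads = [
        (40, [(MATCH, 20), (DEL, 5), (MATCH, 60)]),                         # deletion at 60
        (40, [(MATCH, 30), (INS, 2), (MATCH, 55)]),                         # insertion at 70
        (40, [(MATCH, 20), (DEL, 3), (MATCH, 10), (INS, 1), (MATCH, 50)]),  # both
        (40, [(MATCH, 85)]),                                                # unedited
        (0, [(MATCH, 10), (DEL, 2), (MATCH, 80)]),                          # deletion outside window
    ]
    groups = [classify_read(start, cigar, window) for start, cigar in reads]
    print(groups)  # ['deletion', 'insertion', 'both', None, None]
    print(per_million(groups, mapped_reads=len(reads)))
```

With a real BAM file, each read could be passed as `classify_read(read.reference_start, read.cigartuples, window)`; whether CRISPRMatch treats indels at the window boundary the same way is not stated in the text.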

Figure 1

Pipeline of CRISPRMatch. The pipeline involves several steps: building the genome (target gene) index, mapping deep-sequencing reads (paired-end reads are first joined into single long reads by FLASH), detecting mutations (three types: deletion, insertion, and deletion plus insertion), calculating efficiency and exporting results. Software including BWA, Picard and SAMtools was used for read mapping. Two cleavage regions of the endonucleases were defined: for Cas9, the region (5'-3') covered 10 bp upstream, the guide RNA (gRNA) target sequence, the 'NGG' protospacer adjacent motif (PAM) and 10 bp downstream; for Cpf1, the region (5'-3') covered the 'TTTN' PAM, the CRISPR RNA (crRNA) target sequence and 30 bp downstream.
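The index-building and mapping stage of this pipeline can be approximated with standard command-line calls. The sketch below only builds the shell command strings for one sample and does not run them. It rests on several assumptions not stated in the text: BWA is run with its `bwa mem` algorithm (the text says only "BWA with default parameters"), SAMtools performs both sorting and indexing in place of the Picard step, and FLASH writes merged reads to its default `PREFIX.extendedFrags.fastq` file. The `mapping_commands` helper and all file names are hypothetical.

```python
import shlex
from typing import List


def mapping_commands(ref_fa: str, r1: str, r2: str, sample: str) -> List[str]:
    """Shell commands, in execution order, for one paired-end sample (hypothetical helper)."""
    joined = f"{sample}.extendedFrags.fastq"  # FLASH's merged-read file for output prefix `sample`
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        shlex.join(["bwa", "index", ref_fa]),                                  # index the target region(s)
        shlex.join(["flash", "-o", sample, r1, r2]),                           # join overlapping read pairs
        shlex.join(["bwa", "mem", ref_fa, joined]) + " > " + shlex.quote(sam),  # map joined reads
        shlex.join(["samtools", "sort", "-o", bam, sam]),                      # coordinate-sort alignments
        shlex.join(["samtools", "index", bam]),                                # write the BAM index
    ]


if __name__ == "__main__":
    for cmd in mapping_commands("target.fa", "s1_R1.fq", "s1_R2.fq", "s1"):
        print(cmd)
```

Each string could be executed with `subprocess.run(cmd, shell=True, check=True)`; the reference index only needs to be built once per target file. `shlex.join` requires Python 3.8 or later.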

Figure 2

Description of output files. (A) Bar charts summarizing all types of mutation in genome-editing data (Treatment) and control data (Control). Purple, orange, blue and green bars show the frequency of all mutations, reads with deletions only, reads with insertions only, and reads with both deletions and insertions, respectively. (B) Summary of deletions among samples in a group. The bar charts compare deletion frequency between a genome-editing sample (LbCpf1-OsDEP1-crRNA01) and the control sample (LbCpf1-OsDEP1-crRNA01 Control). The x-axis covers the genome-editing region; labels highlighted in red are components of the PAM. (C) Deletion frequency of each sequencing sample (sample name "GeneY-crRNA-treatment"). The x-axis covers the genome-editing region; labels highlighted in red are components of the PAM. (D) Alignment results for a subset of reads (sample name "GeneY-crRNA-treatment"). The colored matrix (top) distinguishes nucleotide bases and deletions (marked by "-") with different colors. Each line shows the composition of a read, and the number on the left is the count of all reads with the same arrangement. The FASTA output (bottom) is an alternative representation of the alignment; the numbers give the counts of the reads shown below them.
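The read counts shown beside each arrangement in panel D amount to collapsing identical aligned reads. A minimal sketch of that idea follows, assuming each read has already been projected onto the target-region coordinates as a string with "-" at deleted positions. The FASTA header format (`>pattern1_count3`) and the integer base codes for the color matrix are hypothetical choices, not CRISPRMatch's documented output.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical integer codes for a color matrix; "-" marks a deleted base
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4, "-": 5}


def collapse_alignments(aligned: List[str]) -> List[Tuple[str, int]]:
    """Group identical aligned reads; most frequent arrangement first, ties by sequence."""
    return sorted(Counter(aligned).items(), key=lambda item: (-item[1], item[0]))


def to_fasta(collapsed: List[Tuple[str, int]]) -> str:
    """FASTA text whose headers carry the number of reads with each arrangement."""
    lines = []
    for rank, (seq, count) in enumerate(collapsed, start=1):
        lines.append(f">pattern{rank}_count{count}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


def to_matrix(collapsed: List[Tuple[str, int]]) -> List[List[int]]:
    """One row per arrangement, one integer code per reference position."""
    return [[BASE_CODE[base] for base in seq] for seq, _ in collapsed]


if __name__ == "__main__":
    reads = ["ACGTTAGC", "ACG--AGC", "ACGTTAGC", "ACG--AGC", "ACGTTAGC", "A-----GC"]
    collapsed = collapse_alignments(reads)
    print(to_fasta(collapsed), end="")
    print(to_matrix(collapsed)[1])  # [0, 1, 2, 5, 5, 0, 2, 1]
```

The integer matrix could then be drawn with matplotlib's `imshow` and a six-color `ListedColormap`, which gives a plot similar in spirit to the colored matrix in panel D.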

Evaluation and checking of genome-editing efficiency

For result analysis, read alignments are presented in three formats for clear visualization: first, all reads are output in FASTA format; second, the alignment result of each sample is converted into a matrix; and third, the matrix is plotted with colors that distinguish mutations directly (Fig. 2D-E). Users can therefore inspect genome-editing results in the target region in different ways. For example, the FASTA file contains all genome-editing cases and shows the nucleotide composition of each edited read (Fig. 2D), while the colored matrix summarizes mutations (Fig. 2E). In addition to the alignment results, CRISPRMatch plots the deletion rate at each nucleotide of each sample (Fig. 2C). The positions of deleted bases relative to the reference sequence were counted, and their frequencies were calculated with Pysam and a stand-alone Python script. The deletion frequency plot (Fig. 2C) was consistent with the read alignment results (Fig. 2D-E). To evaluate the efficiency of genome-editing experiments, genome-editing samples were compared with the control sample. Because genome-editing samples were replicated, the mean value of the treatment replicates was calculated at each position. For clear visualization, the deletion frequencies of genome-edited samples and control samples were displayed together (Fig. 2B). Notably, the difference in deletion frequency between treatments and control was obvious, and the effective editing region was consistent with previous results 10.
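The per-position deletion frequency and the replicate averaging described above can be illustrated with a short sketch. It is a simplification, not the paper's Pysam-based script: each read is assumed to be already projected onto reference coordinates as a string with "-" at deleted positions (insertions removed), frequencies are expressed as fractions of reads (multiply by 1,000,000 for a per-million scale), and the function names are hypothetical.

```python
from typing import List, Sequence


def deletion_frequency(aligned: Sequence[str]) -> List[float]:
    """Fraction of reads carrying a deletion ('-') at each reference position."""
    if not aligned:
        raise ValueError("no aligned reads")
    length = len(aligned[0])
    if any(len(read) != length for read in aligned):
        raise ValueError("aligned reads must share the reference coordinates")
    return [sum(read[i] == "-" for read in aligned) / len(aligned) for i in range(length)]


def mean_profile(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Position-wise mean over replicate samples."""
    return [sum(column) / len(column) for column in zip(*profiles)]


def treatment_minus_control(treatment: Sequence[float], control: Sequence[float]) -> List[float]:
    """Per-position excess deletion frequency of the treatment over the control."""
    return [t - c for t, c in zip(treatment, control)]


if __name__ == "__main__":
    rep1 = ["ACGT", "A--T", "A--T", "ACGT"]
    rep2 = ["ACGT", "AC-T", "ACGT", "ACGT"]
    control = ["ACGT", "ACGT", "ACGT", "ACGT"]
    treated = mean_profile([deletion_frequency(rep1), deletion_frequency(rep2)])
    print(treated)  # [0.0, 0.25, 0.375, 0.0]
    print(treatment_minus_control(treated, deletion_frequency(control)))
```

Plotting the treated and control profiles side by side over the target region corresponds to the comparison shown in Fig. 2B.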

Discussion and conclusion

CRISPRMatch provides an automatic pipeline for analyzing a list of high-throughput CRISPR genome-editing datasets. Compared with existing CRISPR analysis tools, CRISPRMatch can process batches of CRISPR-Cas9 and CRISPR-Cpf1 samples in a single run, detect mutation efficiency and evaluate the accuracy of genome-editing experiments, compare the efficiencies of different genome-editing systems, and summarize the results in figures and tables. CRISPRMatch was mainly developed for genome-editing data from CRISPR nuclease-transformed protoplasts, with which the targeted mutation efficiency of DNA endonucleases and guide RNA target regions can be evaluated quickly. CRISPRMatch is easy to install and use. The target region sequence, sample information and sequencing data are the only input files required by the pipeline. The sample information must include the sample name, the locations of the PAM and gRNA/crRNA on the target region, and the relationships between replicate samples and control samples. All output files are text or PDF files, which are suitable for checking and inspecting the details of genome-editing efficiency (examples are available on GitHub). CRISPRMatch mainly detects deletions and insertions. Because of their complexity, replacement frequencies are only calculated and provided as a reference value for assessing nuclease efficiency. In conclusion, CRISPRMatch enables effective analysis of NGS genome-editing data from CRISPR-transformed protoplasts and outputs mutation results automatically. NGS mutagenesis data from organisms or individuals are supported as well. The integrated code has been released on GitHub (https://github.com/zhangtaolab/CRISPRMatch). Its simplicity and broad applicability make the software suitable for the genome engineering field. We hope CRISPRMatch will contribute to the construction of standard analysis pipelines for high-throughput genome-editing data.
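The required sample information (sample name, PAM and gRNA/crRNA locations, and the links between replicate and control samples) maps naturally onto a small record type. The sketch below parses a tab-separated sheet with hypothetical column names (`sample`, `group`, `role`, `pam_start`, `spacer_start`, `spacer_end`) and groups treatment replicates with their controls. The actual CRISPRMatch input format is documented in the GitHub repository and may differ.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    name: str
    group: str          # experiment linking treatment replicates to their control
    role: str           # "treatment" or "control"
    pam_start: int      # 0-based PAM position on the target region
    spacer_start: int   # 0-based start of the gRNA/crRNA target
    spacer_end: int     # end (exclusive) of the gRNA/crRNA target


def read_sample_sheet(text: str) -> Dict[str, Dict[str, List[Sample]]]:
    """Parse a tab-separated sheet (hypothetical layout) and group samples by experiment and role."""
    groups: Dict[str, Dict[str, List[Sample]]] = defaultdict(
        lambda: {"treatment": [], "control": []})
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        sample = Sample(row["sample"], row["group"], row["role"], int(row["pam_start"]),
                        int(row["spacer_start"]), int(row["spacer_end"]))
        if sample.role not in ("treatment", "control"):
            raise ValueError(f"{sample.name}: role must be 'treatment' or 'control'")
        groups[sample.group][sample.role].append(sample)
    return dict(groups)


SHEET = (
    "sample\tgroup\trole\tpam_start\tspacer_start\tspacer_end\n"
    "rep1\tcrRNA01\ttreatment\t46\t50\t73\n"
    "rep2\tcrRNA01\ttreatment\t46\t50\t73\n"
    "ctrl\tcrRNA01\tcontrol\t46\t50\t73\n"
)

if __name__ == "__main__":
    for group, members in read_sample_sheet(SHEET).items():
        print(group, [s.name for s in members["treatment"]], [s.name for s in members["control"]])
```

Keeping the replicate-to-control relationship explicit in a `group` column is one way to let a pipeline average replicates and pick the matching control automatically.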

References (27 in total; 10 listed)

1. Genome editing assessment using CRISPR Genome Analyzer (CRISPR-GA).

Authors: Marc Güell; Luhan Yang; George M Church
Journal: Bioinformatics Date: 2014-07-01 Impact factor: 6.937

2. A Single Transcript CRISPR-Cas9 System for Efficient Genome Editing in Plants.

Authors: Xu Tang; Xuelian Zheng; Yiping Qi; Dengwei Zhang; Yan Cheng; Aiting Tang; Daniel F Voytas; Yong Zhang
Journal: Mol Plant Date: 2016-05-19 Impact factor: 13.164

3. High-throughput functional genomics using CRISPR-Cas9 (Review).

Authors: Ophir Shalem; Neville E Sanjana; Feng Zhang
Journal: Nat Rev Genet Date: 2015-04-09 Impact factor: 53.242

4. Cpf1-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cpf1.

Authors: Jeongbin Park; Sangsu Bae
Journal: Bioinformatics Date: 2018-03-15 Impact factor: 6.937

5. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

6. Transcription activator-like effector nucleases enable efficient plant genome engineering.

Authors: Yong Zhang; Feng Zhang; Xiaohong Li; Joshua A Baller; Yiping Qi; Colby G Starker; Adam J Bogdanove; Daniel F Voytas
Journal: Plant Physiol Date: 2012-11-02 Impact factor: 8.340

7. Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting.

Authors: Tomas Cermak; Erin L Doyle; Michelle Christian; Li Wang; Yong Zhang; Clarice Schmidt; Joshua A Baller; Nikunj V Somia; Adam J Bogdanove; Daniel F Voytas
Journal: Nucleic Acids Res Date: 2011-04-14 Impact factor: 16.971

8. RAMPred: identifying the N(1)-methyladenosine sites in eukaryotic transcriptomes.

Authors: Wei Chen; Pengmian Feng; Hua Tang; Hui Ding; Hao Lin
Journal: Sci Rep Date: 2016-08-11 Impact factor: 4.379

9. GenomeCRISPR - a database for high-throughput CRISPR/Cas9 screens.

Authors: Benedikt Rauscher; Florian Heigwer; Marco Breinig; Jan Winter; Michael Boutros
Journal: Nucleic Acids Res Date: 2016-10-26 Impact factor: 16.971

10. CRISPR-Cas9 Based Genome Editing Reveals New Insights into MicroRNA Function and Regulation in Rice.

Authors: Jianping Zhou; Kejun Deng; Yan Cheng; Zhaohui Zhong; Li Tian; Xu Tang; Aiting Tang; Xuelian Zheng; Tao Zhang; Yiping Qi; Yong Zhang
Journal: Front Plant Sci Date: 2017-09-13 Impact factor: 5.753