Qi You1,2, Zhaohui Zhong3, Qiurong Ren3, Fakhrul Hassan3, Yong Zhang3, Tao Zhang1,2.
Abstract
Custom-designed nucleases, including CRISPR-Cas9 and CRISPR-Cpf1, are widely used for precise genome editing. Its high coverage, low cost and quantifiability make high-throughput sequencing (NGS) an effective method for assessing the efficiency of custom-designed nucleases. However, in contrast to standardized transcriptome protocols, NGS data from genome-editing experiments lack a user-friendly pipeline that connects the different tools, automatically calculates mutations, evaluates editing efficiency and summarizes the results in a more comprehensive dataset that can be visualized. Here, we have developed an automatic stand-alone toolkit based on Python scripts, named CRISPRMatch, to process high-throughput genome-editing data from CRISPR nuclease-transformed protoplasts. It integrates analysis steps including mapping reads and normalizing read counts, calculating mutation frequencies (deletion and insertion), evaluating the efficiency and accuracy of genome editing, and visualizing the results (tables and figures). Both CRISPR-Cas9 and CRISPR-Cpf1 nucleases are supported by the CRISPRMatch toolkit, and the integrated code has been released on GitHub (https://github.com/zhangtaolab/CRISPRMatch).
Keywords: CRISPR; NGS data; automatic pipeline; genome-editing efficiency; mutation calculation
Year: 2018 PMID: 29989077 PMCID: PMC6036748 DOI: 10.7150/ijbs.24581
Source DB: PubMed Journal: Int J Biol Sci ISSN: 1449-2288 Impact factor: 6.580
Figure 1. Pipeline of CRISPRMatch. Several steps are involved: building the genome (target gene) index, mapping deep-sequencing reads (paired-end reads were first joined into single long reads by FLASH), detecting mutations (three types: deletion, insertion, and deletion plus insertion), calculating efficiency, and exporting results. BWA, Picard and SAMtools were used for read mapping. Two cleavage regions were defined for the endonucleases. For Cas9, the region (5'-3') covered 10 base pairs upstream, the guide RNA (gRNA), the 'NGG' protospacer adjacent motif (PAM), and 10 base pairs downstream. For Cpf1, the region (5'-3') covered the 'TTTN' PAM, the CRISPR RNA (crRNA), and 30 base pairs downstream.
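The two cleavage regions defined in the caption can be sketched as a small coordinate calculation. This is only an illustration, not the CRISPRMatch implementation: the function name `cleavage_region`, the 0-based half-open coordinates, and the convention that `guide_start` is the first base of the gRNA/crRNA on the 5'-3' target strand are all assumptions made here. The PAM lengths (3 nt for 'NGG', 4 nt for 'TTTN') follow from the motifs named in the caption.

```python
def cleavage_region(nuclease, guide_start, guide_len):
    """Return the (start, end) analysis window as 0-based, half-open coordinates.

    Hypothetical helper: guide_start is assumed to be the first base of the
    gRNA/crRNA on the strand where the target reads 5'-3'.
    """
    if nuclease == "cas9":
        # 10 bp upstream + gRNA + 3-nt 'NGG' PAM + 10 bp downstream
        start = guide_start - 10
        end = guide_start + guide_len + 3 + 10
    elif nuclease == "cpf1":
        # 4-nt 'TTTN' PAM (5' of the crRNA) + crRNA + 30 bp downstream
        start = guide_start - 4
        end = guide_start + guide_len + 30
    else:
        raise ValueError(f"unsupported nuclease: {nuclease!r}")
    # Clamp to the start of the reference so the window never goes negative.
    return max(start, 0), end


cas9_window = cleavage_region("cas9", guide_start=50, guide_len=20)
cpf1_window = cleavage_region("cpf1", guide_start=50, guide_len=23)
```

Under these assumptions, a 20-nt gRNA starting at position 50 gives a 43-bp Cas9 window, and a 23-nt crRNA starting at the same position gives a 57-bp Cpf1 window.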
Figure 2. Description of output files. (A) Bar charts summarizing all kinds of mutations in the genome-editing data (Treatment) and the control data (Control). The purple, orange, blue and green bars represent the frequency of all mutations, reads with deletion only, reads with insertion only, and reads with both deletion and insertion, respectively. (B) Summary of deletions among samples in a group. The bar charts display the deletion frequency in a genome-editing sample (LbCpf1-OsDEP1-crRNA01) and in the control sample (LbCpf1-OsDEP1-crRNA01 Control). The x-axis covers the genome-editing region; labels highlighted in red are components of the PAM. (C) Deletion frequency of each sequencing sample (sample name "GeneY-crRNA-treatment"). The x-axis covers the genome-editing region; labels highlighted in red are components of the PAM. (D) Alignment results for a subset of reads (sample name "GeneY-crRNA-treatment"). The colored matrix (top) distinguishes nucleic acid bases and deletions (marked by "-") by color. Each row shows the bases of a read, and the number on the left is the count of all reads with the same arrangement. The FASTA format (bottom) is another output style for the alignment; the numbers are the counts of the reads listed below them.
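The four frequencies plotted in panel (A) can be illustrated with a minimal sketch that classifies each mapped read by the insertion (`I`) and deletion (`D`) operations in its SAM CIGAR string. The function names `classify_read` and `mutation_frequencies` are hypothetical, and this is not the CRISPRMatch code: for brevity it counts indels anywhere in the read, whereas the toolkit is described as working within the defined cleavage region.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def classify_read(cigar):
    """Label a read as 'deletion', 'insertion', 'both' or 'unmodified'."""
    ops = {op for _, op in CIGAR_OP.findall(cigar)}
    has_del = "D" in ops
    has_ins = "I" in ops
    if has_del and has_ins:
        return "both"
    if has_del:
        return "deletion"
    if has_ins:
        return "insertion"
    return "unmodified"


def mutation_frequencies(cigars):
    """Return the fraction of reads in each mutation class, plus their sum."""
    total = len(cigars)
    if total == 0:
        return {}
    counts = Counter(classify_read(c) for c in cigars)
    freq = {k: counts.get(k, 0) / total for k in ("deletion", "insertion", "both")}
    # 'all' corresponds to reads carrying any insertion or deletion.
    freq["all"] = sum(freq.values())
    return freq


# Toy input: one unmodified read and one read of each mutation type.
example = ["100M", "40M5D55M", "40M2I58M", "30M1I10M3D59M"]
frequencies = mutation_frequencies(example)
```

Running the same function on treatment and control reads separately would give the paired values that a Treatment-versus-Control bar chart compares.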