Lihua Julie Zhu, Michael Lawrence, Ankit Gupta, Hervé Pagès, Alper Kucukural, Manuel Garber, Scot A Wolfe.
Abstract
BACKGROUND: Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased, genome-wide detection of nuclease cleavage sites. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed.
Keywords: Bioconductor; CRISPR; GUIDE-seq; Genome editing; Off-targets analysis
Year: 2017 PMID: 28506212 PMCID: PMC5433024 DOI: 10.1186/s12864-017-3746-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Overview of the GUIDEseq analysis workflow. Schematic representation of the GUIDEseq analysis pipeline. Input files required for preprocessing and for the GUIDEseq package are represented by annotated colored arrows. First, Preprocessing Utilities are supplied to demultiplex the Illumina FASTQ files based on the index information and to map the sequence files to the reference genome. This generates the experimental input files (BAM and UMI files) needed for the GUIDEseq pipeline, which the end-user supplements with information on the guide RNA (gRNA) and PAM element. Key steps carried out by the algorithms within the GUIDEseq pipeline are indicated under the different headers. Details about the R-based commands and variables used within GUIDEseq are presented in the Use Cases within the main text, and are described in full in the Installation and Usage Section [see Additional file 1] and in the manual pages associated with the program.
Fig. 2. Schematic of the GUIDE-seq library features used for unique read identification. Schematic overview of the two sequencing libraries that are generated using the GUIDE-seq method [19]. Each library (forward and reverse) has a different GUIDE-seq oligo tag fragment (red or blue) that is part of the resulting read 2 sequences. Paired-end reads from different libraries are aggregated based on the p5 and p7 indices. Unique reads within each library are defined based on three identifiers: the unique molecular index (UMI) in the p5 index read, the p5 adaptor genomic ligation site, and the GUIDE-seq dsODN integration site. Redundant reads are discarded. For the purposes of peak calling, unique paired-end reads are condensed into single-base genomic ranges that define the position of the GUIDE-seq dsODN integration site and the genomic reference sequence strand associated with read 2.
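The read-uniqueness rule in the Fig. 2 caption can be illustrated with a minimal Python sketch. It is not the GUIDEseq package's R implementation; the `Read` record, its field names, and the choice to keep the first read seen for each key are assumptions made only for illustration.

```python
from typing import NamedTuple

class Read(NamedTuple):
    """Hypothetical record for one paired-end read within a single library."""
    umi: str               # unique molecular index from the p5 index read
    chrom: str
    ligation_site: int     # p5 adaptor genomic ligation position
    integration_site: int  # GUIDE-seq dsODN integration position
    strand: str            # reference strand of read 2, "+" or "-"

def unique_reads(reads):
    """Keep one read per (UMI, ligation site, integration site) key."""
    seen = set()
    kept = []
    for r in reads:
        key = (r.umi, r.chrom, r.ligation_site, r.integration_site)
        if key not in seen:  # redundant reads are discarded
            seen.add(key)
            kept.append(r)
    return kept

def to_single_base_ranges(reads):
    """Condense each unique read to (chromosome, integration site, strand)."""
    return [(r.chrom, r.integration_site, r.strand) for r in reads]
```

The chromosome is included in the key because genomic positions are only meaningful together with it; the strand is carried along for peak calling but is not part of the uniqueness key in this sketch.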
Fig. 3. Unique read aggregation into peaks for the identification of potential nuclease cleavage sites. Strand-specific unique reads, defined by the GUIDE-seq dsODN integration site and the read 2 genomic reference sequence strand, are aggregated over a user-defined window size (default 20 bases) to define strand-specific peaks. Windows with a read number greater than or equal to a user-defined threshold (default = 5) are called peaks. In addition, the signal-to-noise ratio (SNratio) and a p-value are computed based on the local background window (defaults: 5 kb window size and Poisson distribution), and these can also be employed as filters if desired. For each integration site, the Crick peak should precede the corresponding Watson peak based on the library construction method [19]. Consequently, this order is required to combine counts from the Watson and Crick peaks over a user-defined window size (default 40 bases). This aggregate "score" is used to rank peaks. The genomic region surrounding each peak (adjustable variables, default 20 bases on each side) is used to search for sequences with homology to the nuclease sequence preference, based on the input guide sequence (gRNA.file), the PAM sequence (PAM), and the allowed mismatches within each element as defined by the parameters max.mismatch, PAM.pattern and allowed.mismatch.PAM. The GUIDE-seq data shown were generated in-house for SpCas9 programmed with a sgRNA to recognize VEGFA site 2 (TS2; protospacer underlined, PAM in red) [11], where the most common dsODN integration site falls at the expected cleavage site within this sequence (green line, hg19).
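The window-and-threshold step described in the Fig. 3 caption can be sketched in Python for the reads of one strand on one chromosome. This is a simplified illustration under stated assumptions, not the package's algorithm: the function names, the greedy left-to-right windowing, the way the background rate is scaled to the window, and the pseudo-count of one background read are all choices made here. Merging of Crick and Watson peaks is not shown.

```python
import math
from bisect import bisect_left, bisect_right

def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def call_peaks(positions, window=20, min_reads=5, bg_window=5000):
    """Call peaks from single-base integration sites on one strand."""
    pos = sorted(positions)
    peaks = []
    i = 0
    while i < len(pos):
        start = pos[i]
        j = bisect_right(pos, start + window - 1)  # reads inside the window
        count = j - i
        if count >= min_reads:
            centre = start + window // 2
            lo = bisect_left(pos, centre - bg_window // 2)
            hi = bisect_right(pos, centre + bg_window // 2)
            background = (hi - lo) - count  # local reads outside the window
            # Assumption: floor the background at one read to avoid a zero rate.
            lam = max(background, 1) * window / bg_window
            peaks.append({
                "start": start,
                "end": start + window - 1,
                "count": count,
                "sn_ratio": count / lam,
                "p_value": poisson_sf(count, lam),
            })
            i = j  # continue after the called window
        else:
            i += 1
    return peaks
```

With the defaults above, a cluster of six sites within 20 bases is called as a peak, while isolated sites elsewhere in the 5 kb background only contribute to the expected rate.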
Fig. 4. Venn diagram generated using combineOfftargets to depict the overlaps of off-target sites between three different nuclease variants. Example of the output from the combineOfftargets function (Example 6) comparing the overlap in GUIDE-seq identified off-target sites for wild-type Cas9, Split-Cas9 (dual NLS) [51], and the highly specific SpCas9MT3-ZFP [25] programmed with a sgRNA recognizing VEGFA site 2 (TS2) [11].
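The counts shown in a three-way Venn diagram such as Fig. 4 are the sizes of the mutually exclusive overlap regions between the site sets. A minimal Python sketch of that counting follows; it is not how combineOfftargets is implemented, and the function name and any site identifiers passed to it are hypothetical placeholders.

```python
from itertools import combinations

def venn_regions(named_sets):
    """Count sites in each exclusive Venn region.

    named_sets maps a sample name to a set of off-target site identifiers.
    The result maps a tuple of sample names to the number of sites found
    in exactly those samples and in no others.
    """
    names = list(named_sets)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(named_sets[n] for n in combo))
            outside = set().union(
                *(named_sets[n] for n in names if n not in combo)
            )
            regions[combo] = len(inside - outside)
    return regions
```

Because the regions are exclusive, their counts sum to the size of the union of all input sets, which is a convenient sanity check on any overlap table.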