Deciphering relationship between microhomology and in-frame mutation occurrence in human CRISPR-based gene knockout.

Guohui Chuai¹, Fayu Yang², Jifang Yan¹, Yanan Chen¹, Qin Ma³, Chi Zhou¹, Chenyu Zhu¹, Feng Gu², Qi Liu¹.

Year: 2016 PMID: 27301063 PMCID: PMC5022128 DOI: 10.1038/mtna.2016.35

Source DB: PubMed Journal: Mol Ther Nucleic Acids ISSN: 2162-2531 Impact factor: 10.183

To the Editor: CRISPR-based gene editing is widely implemented in various cell types and has great potential for molecular therapy.[1] The CRISPR-Cas9 system creates sequence-specific double-strand DNA breaks that are repaired predominantly by the error-prone nonhomologous end-joining (NHEJ) pathway, often resulting in gene inactivation through frameshift alleles.[1,2,3,4,5,6,7] However, CRISPR-based gene knockout (KO) often also produces in-frame variants that retain functionality, which reduces KO efficiency. Recently, Sangsu Bae et al. pioneered the use of microhomology prediction to improve CRISPR-based KO efficiency in cell lines through in silico selection of target sites that reduce in-frame mutations.[2] They reported that the preference for in-frame mutations at a given target site can be predicted from its microhomology profile, and that an alternative NHEJ pathway, i.e., microhomology-mediated end joining (MMEJ), occurs.[2,3] A score was defined to predict the microhomology-based out-of-frame mutation preference.[2,4] Their approach of improving CRISPR-based KO by reducing in-frame mutations is creative, since in-frame mutations can retain protein functionality and therefore reduce KO efficiency. Nevertheless, further work is still needed to systematically investigate the relationship between sequence microhomology and in-frame mutations, as well as other factors that may influence the occurrence of in-frame mutations in CRISPR-based KO, taking advantage of posterior analysis of high-throughput next-generation sequencing data from CRISPR-based KO experiments. To address this issue, we performed a comprehensive analysis of the 68-sgRNA HeLa cell line deep sequencing data[2] using our pipeline CAGE (CRISPR KO Analysis based on Genomic Editing data) (Supplementary Materials). We investigated in depth the relationship between microhomology profile and in-frame mutation occurrence, and present new clues for efficient CRISPR-based sgRNA design in terms of reducing in-frame mutations.

Microhomology Profile May not be Considered as a General sgRNA Design Feature

Although several previous studies reported the involvement of MMEJ in mutations introduced by CRISPR-Cas or TALEN,[2,8] the reported frequencies of MMEJ-mediated indels remain controversial, probably because of the vague definitions distinguishing NHEJ from MMEJ and the heterogeneity of the cell types tested. In our study, we strictly followed the review article by Mitch McVey et al.,[3] which indicated that MMEJ and NHEJ can be distinguished by the length of the microhomologous sequence: a microhomologous sequence of 5–25 bp suggests that MMEJ was triggered, whereas microhomology shorter than 5 bp indicates NHEJ.[3] Based on this definition, our sequence-level analysis pipeline identified only one occurrence of MMEJ (microhomology over 5 bp) among all the single-deletion reads (1/134,008), indicating that the MMEJ pathway is rarely used compared with NHEJ-based DNA repair, at least in HeLa cells (Supplementary Table S1, Supplementary Table S3a). This is not surprising, as MMEJ serves as a complementary pathway only when NHEJ is unavailable.[3] We then tested whether microhomology is a crucial factor for the frameshifting paradigm in the HeLa dataset by performing a contingency table analysis (Supplementary Table S2) that compares the enrichment ratio of in-frame mutations between those occurring with and without microhomology (Supplementary Materials). We found no statistically significant correlation between the frameshifting paradigm and microhomology for all the 68 sgRNAs. We further investigated the microhomology profile in mouse embryonic stem cells (mESC)[9] and zebrafish cells[10] based on posterior analysis of the next-generation sequencing data using CAGE. We found that in these two cell types MMEJ also rarely occurred, and the contingency-table-based statistical analysis indicated that for most sgRNAs the correlation between the frameshifting paradigm and microhomology is not statistically significant (Supplementary Materials). Besides our study, recent work by John Doench et al. reported that “Microhomology features, suggested to improve sgRNA activity, were predictive on their own but did not improve performance when added to our final model”. Their study, from the viewpoint of feature selection for tuning an sgRNA activity prediction model, also indicated that microhomology features may be redundant for the final prediction performance.[11,12]
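The length-based distinction and the contingency table comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the CAGE pipeline itself: the function names, the toy reference sequence, and the read counts are hypothetical, the 5–25 bp range is treated as inclusive, and Fisher's exact test is an assumed choice because the text does not name the test applied to the contingency table.

```python
from math import comb

def microhomology_length(ref, start, length):
    """Microhomology at the junctions of a deletion removing ref[start:start+length].

    The deletion window can slide left or right without changing the
    resulting allele while the bases entering and leaving it match; the
    total slide distance (capped at the deletion length) is returned.
    """
    right = 0
    while (start + length + right < len(ref)
           and ref[start + right] == ref[start + length + right]):
        right += 1
    left = 0
    while (start - 1 - left >= 0
           and ref[start - 1 - left] == ref[start + length - 1 - left]):
        left += 1
    return min(left + right, length)

def classify_repair(mh_len):
    """Assumed rule: 5-25 bp (inclusive) -> MMEJ, under 5 bp -> NHEJ."""
    if mh_len < 5:
        return "NHEJ"
    if mh_len <= 25:
        return "MMEJ"
    return "unclassified"

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    n, row1, col1 = a + b + c + d, a + b, a + c

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Toy reference with a 6-bp repeat (ACCTGA) flanking the deleted segment.
ref = "GGTACCTGATTCGACCTGACCA"
mh = microhomology_length(ref, 3, 10)
print(mh, classify_repair(mh))

# Hypothetical read counts for one sgRNA:
# rows = with / without microhomology, columns = in-frame / out-of-frame.
print(fisher_exact_two_sided(30, 70, 28, 72))
```

A large p-value for a given sgRNA would be read as no detectable association between microhomology and in-frame mutations at that site.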

Further Experimental Validation Using an EGFP Reporter System Detects No Microhomology-Related NHEJ

To test our hypothesis in another cell type, we performed a CRISPR-based gene knockout experiment in HEK293 cells using our enhanced green fluorescent protein (EGFP) reporter system, as previously described.[13] Because CRISPR/Cas9-mediated gene knockout generally relies on functional NHEJ events, we analyzed CRISPR/Cas9-mediated EGFP KO to obtain functional NHEJ events (EGFP-negative cells), which is a more straightforward way to study functional microhomology-related NHEJ rather than total NHEJ. Specifically, we designed three sgRNAs (Supplementary Table S3b) to target the EGFP DNA sequence (Supplementary Materials). Next, the EGFP gene was inactivated by transfection of the corresponding CRISPR/Cas9 plasmids. The EGFP-negative cells were obtained by fluorescence-activated cell sorting. The whole coding sequence for GFP was amplified and cloned into the cloning vector, and individual clones were sequenced by Sanger sequencing. Lastly, we checked the functional indel pattern in the sequencing results and found no microhomology pattern in the related NHEJ-mediated indels (Supplementary Table S3b). It should be noted that this analysis of NHEJ patterns in EGFP-negative cells focuses on functional NHEJ rather than total NHEJ; within that scope, it indicates that MMEJ is rare in this test.

Frequency of Out-of-Frame Deletions/Indels is not a Proper Indicator for sgRNA Efficiency Estimation

To estimate sgRNA efficiency, Sangsu Bae et al. defined the out-of-frame score, which correlated well with the frequency of out-of-frame indels in their study. That frequency was calculated as the ratio of out-of-frame reads among all deletion/indel reads per sgRNA, but we consider it more appropriate to use the ratio of out-of-frame reads among all sequencing reads per sgRNA to quantitatively represent sgRNA efficiency. Notably, the occurrence of an indel is the prerequisite for CRISPR-based KO with respect to the frameshifting paradigm. Careful analysis of the sequencing data indicated that although many sgRNAs have a high frequency of out-of-frame deletions among indels, they generate very few indels in the first place, resulting in low KO efficiency. We calculated the Pearson correlation coefficient between the out-of-frame scores and the frequencies of out-of-frame reads among all sequence reads for one TALEN and two RGEN datasets,[2] and the correlations were significantly lower than those in the previous report.
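The difference between the two frequency definitions can be made concrete with a small Python sketch. The function names and all read counts below are hypothetical and chosen only to illustrate the argument; indel sizes are given as net length changes in bp, and a read counts as out-of-frame when that size is not a multiple of 3.

```python
from math import sqrt

def out_of_frame_ratios(indel_sizes, total_reads):
    """Return (ratio among indel reads, ratio among all sequencing reads).

    indel_sizes: net length change (bp) of every read carrying an indel,
                 negative for deletions and positive for insertions.
    total_reads: all sequencing reads for the sgRNA, edited or not.
    """
    out_of_frame = sum(1 for size in indel_sizes if size % 3 != 0)
    among_indels = out_of_frame / len(indel_sizes) if indel_sizes else 0.0
    among_all = out_of_frame / total_reads
    return among_indels, among_all

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else 0.0

# Hypothetical sgRNA A: few indels (20 of 1000 reads), mostly out-of-frame.
ratios_a = out_of_frame_ratios([-1] * 18 + [-3] * 2, 1000)
# Hypothetical sgRNA B: many indels (600 of 1000 reads), fewer out-of-frame.
ratios_b = out_of_frame_ratios([-2] * 360 + [-6] * 240, 1000)

print("A:", ratios_a)
print("B:", ratios_b)
```

In this toy case sgRNA A ranks higher by the indel-based ratio, while sgRNA B ranks higher by the all-reads ratio. The `pearson` helper can then be applied to per-sgRNA out-of-frame scores and either ratio to compare the resulting correlations.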

A Learning-Based Model to Predict the Out-of-Frame Mutation Occurrence Rate in sgRNA Design

We first collected a comprehensive set of genomic features for sgRNAs and modeled their effects on the frequency of out-of-frame reads among all sequencing reads (defined as the “OTF ratio”, Supplementary Materials). These features were dummy coded (Supplementary Table S4), and the genomic feature representation of the 68-sgRNA samples in the HeLa cell line is presented in Supplementary Table S5. The features were incorporated into a LASSO model, and crucial features were selected (Supplementary Table S6, Supplementary Materials). Our prediction model was fivefold cross-validated on the 68 sgRNAs, achieving a mean correlation of 0.87 (P value < 0.01) in predicting the out-of-frame mutation occurrence rate with the selected genomic features. We then generated a group of epigenetic features (Supplementary Table S8) describing the 68 sgRNAs (Supplementary Table S7) and modeled their predictive ability; these epigenetic features seemed to have less predictive power, probably because of the small number of samples tested. We further tested our epigenetic model on three relatively larger sgRNA efficiency datasets[14] with improved prediction abilities (Supplementary Table S9). Recent work also indicates that both sequence composition and locus accessibility are important in determining sgRNA KO efficiency.[15]

Supplementary Material

Table S1. sgRNA-Indel table of the 68-sgRNA HeLa cell line dataset.
Table S2. A contingency table analysis to investigate the correlation between frameshifting paradigm and microhomology.
Table S3. Microhomology analysis for the 68-sgRNA HeLa cell line dataset and EGFP dataset.
Table S4. Dummy coding scheme for genomic features.
Table S5. The genomic feature representation of the 68-sgRNA HeLa cell line dataset.
Table S6. The selected genomic factors that may influence the OTF ratio by LASSO model of the 68-sgRNA HeLa cell line dataset.
Table S7. Epigenetic feature representations of 4 datasets.
Table S8. Epigenetic feature description of 3 cell lines for 4 datasets.
Table S9. The prediction performance of three sgRNA efficiency datasets presented by X.
Methods Materials
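As an illustration of the modeling approach in the section on the learning-based model (dummy-coded features, LASSO feature selection, fivefold cross-validation scored by correlation), the following Python sketch may help. It is not the authors' implementation: the per-position nucleotide encoding, the coordinate-descent solver, the penalty `alpha`, and the synthetic sequences and OTF ratios are all assumptions made for demonstration, since the actual feature set is defined only in the supplementary tables.

```python
import random
from math import sqrt

def one_hot(seq, alphabet="ACGT"):
    """Dummy-code a sequence: one 0/1 indicator per (position, base) pair."""
    return [1.0 if base == letter else 0.0 for base in seq for letter in alphabet]

def soft_threshold(value, threshold):
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_fit(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - b0 - Xw||^2 + alpha*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w, b0 = [0.0] * p, sum(y) / n
    resid = [yi - b0 for yi in y]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
        shift = sum(resid) / n  # re-centre the unpenalised intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, w

def predict(b0, w, row):
    return b0 + sum(wj * xj for wj, xj in zip(w, row))

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else 0.0

def cross_validate(X, y, alpha, k=5, seed=0):
    """Mean held-out Pearson correlation over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[f::k] for f in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        b0, w = lasso_fit([X[i] for i in train], [y[i] for i in train], alpha)
        preds = [predict(b0, w, X[i]) for i in fold]
        scores.append(pearson(preds, [y[i] for i in fold]))
    return sum(scores) / k

# Synthetic data only: 68 random 8-nt sequences with a made-up OTF ratio.
rng = random.Random(1)
seqs = ["".join(rng.choice("ACGT") for _ in range(8)) for _ in range(68)]
otf = [0.5 + 0.3 * (s[-1] == "G") - 0.2 * (s[0] == "T") + rng.gauss(0, 0.05)
       for s in seqs]
mean_r = cross_validate([one_hot(s) for s in seqs], otf, alpha=0.01)
print("mean fivefold correlation:", round(mean_r, 3))
```

Coefficients driven exactly to zero by the L1 penalty correspond to features the model discards, which is the sense in which LASSO selects crucial features.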

Author contributions

G.H.C., F.Y.Y., J.F.Y., Y.N.C., Z.C., C.Y.Z., Q.L., and F.G. performed the whole data analysis and pipeline construction. Q.M. analyzed the prediction model and helped polish the manuscript. F.G. compared NHEJ with MMEJ based on the sequencing data. Q.L. and G.H.C. conceived the study and wrote the manuscript.

14 in total

1. Knockout rats generated by embryo microinjection of TALENs.

Authors: Laurent Tesson; Claire Usal; Séverine Ménoret; Elo Leung; Brett J Niles; Séverine Remy; Yolanda Santiago; Anna I Vincent; Xiangdong Meng; Lei Zhang; Philip D Gregory; Ignacio Anegon; Gregory J Cost
Journal: Nat Biotechnol Date: 2011-08-05 Impact factor: 54.908

2. Targeted genome editing in primate embryos. (Review)

Authors: Xiangyu Guo; Xiao-Jiang Li
Journal: Cell Res Date: 2015-06-02 Impact factor: 25.617

3. Rapid Reverse Genetic Screening Using CRISPR in Zebrafish.

Authors: Arish N Shah; Crystal F Davey; Alex C Whitebirch; Adam C Miller; Cecilia B Moens
Journal: Zebrafish Date: 2015-07-08 Impact factor: 1.985

4. Microhomology-based choice of Cas9 nuclease target sites.

Authors: Sangsu Bae; Jiyeon Kweon; Heon Seok Kim; Jin-Soo Kim
Journal: Nat Methods Date: 2014-07 Impact factor: 28.547

5. A guide to genome engineering with programmable nucleases. (Review)

Authors: Hyongbum Kim; Jin-Soo Kim
Journal: Nat Rev Genet Date: 2014-04-02 Impact factor: 53.242

6. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library.

Authors: Hiroko Koike-Yusa; Yilong Li; E-Pien Tan; Martin Del Castillo Velasco-Herrera; Kosuke Yusa
Journal: Nat Biotechnol Date: 2013-12-23 Impact factor: 54.908

7. Sequence determinants of improved CRISPR sgRNA design.

Authors: Han Xu; Tengfei Xiao; Chen-Hao Chen; Wei Li; Clifford A Meyer; Qiu Wu; Di Wu; Le Cong; Feng Zhang; Jun S Liu; Myles Brown; X Shirley Liu
Journal: Genome Res Date: 2015-06-10 Impact factor: 9.043

8. Comparison of non-canonical PAMs for CRISPR/Cas9-mediated DNA cleavage in human cells.

Authors: Yilan Zhang; Xianglian Ge; Fayu Yang; Liping Zhang; Jiayong Zheng; Xuefang Tan; Zi-Bing Jin; Jia Qu; Feng Gu
Journal: Sci Rep Date: 2014-06-23 Impact factor: 4.379

9. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9.

Authors: John G Doench; Nicolo Fusi; Meagan Sullender; Mudra Hegde; Emma W Vaimberg; Jennifer Listgarten; Katherine F Donovan; Ian Smith; Zuzana Tothova; Craig Wilen; Robert Orchard; Herbert W Virgin; David E Root
Journal: Nat Biotechnol Date: 2016-01-18 Impact factor: 54.908

10. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach.

Authors: Raj Chari; Prashant Mali; Mark Moosburner; George M Church
Journal: Nat Methods Date: 2015-07-13 Impact factor: 28.547
