PAVOOC: designing CRISPR sgRNAs using 3D protein structures and functional domain annotations.

Moritz Schaefer, Djork-Arné Clevert, Bertram Weiss, Andreas Steffen

Abstract

SUMMARY: Single-guide RNAs (sgRNAs) targeting the same gene can vary significantly in efficacy and specificity. PAVOOC (Prediction And Visualization of On- and Off-targets for CRISPR) is a web-based CRISPR sgRNA design tool that employs state-of-the-art machine learning models to prioritize the most effective candidate sgRNAs. In contrast to other tools, it maps sgRNAs to functional domains and protein structures and visualizes cut sites on the corresponding protein crystal structures. Furthermore, PAVOOC supports the generation of homology-directed repair templates for genome editing experiments and the visualization of the mutated amino acids in 3D.
AVAILABILITY AND IMPLEMENTATION: PAVOOC is available at https://pavooc.me and accessible using modern browsers (Chrome/Chromium recommended). The source code is hosted at github.com/moritzschaefer/pavooc under the MIT License. The backend, including the data processing steps, is implemented in Python 3 and the frontend in ReactJS. All components run in a simple Docker environment.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press.

Year:  2019        PMID: 30445568      PMCID: PMC6596878          DOI: 10.1093/bioinformatics/bty935

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The discovery of the CRISPR/Cas system (Cong et al.; Jinek et al., 2012) was a breakthrough in genome editing. An important application of CRISPR/Cas is to induce a targeted knockout (KO) of a gene of interest. Such KO experiments help to study the essentiality of the targeted genes in a given cellular context (e.g. a cancer cell line bearing certain genomic alterations) and ultimately support the validation of a new drug target (Moore, 2015). Shi et al. (2015) showed that the effect of a CRISPR-based KO can be boosted by targeting functionally relevant regions of a protein, because in-frame mutations (indels) are more likely to have a significant effect in these regions than in non-functional regions. Another application of CRISPR/Cas is to precisely introduce missense mutations into a genome and study the effects of the resulting perturbations. In both applications, single-guide RNAs (sgRNAs) direct the Cas9 enzyme towards the genomic region of interest, so that Cas9 cuts the DNA at the targeted position. Genome editing experiments additionally require a template sequence that contains the desired nucleotide sequence. A number of tools have been published that facilitate and automate the design of sgRNAs for CRISPR KO experiments (Hough et al.; Listgarten et al., 2018; Meier et al.; Stemmer et al.). In this application note, we present PAVOOC (Prediction And Visualization of On- and Off-targets for CRISPR), a modern web application that supports wet lab biologists in designing and selecting optimal sgRNAs and template sequences for KO and genome editing experiments. It combines machine learning-based on- and off-target scoring, multi-attribute ranking, mapping of cut sites onto protein structures and integration of cancer cell line data.

2 Materials and methods

PAVOOC is a web application for designing and visualizing sgRNAs for gene KO and genome editing experiments. For KO experiments, the user provides a set of genes (as gene symbols or Ensembl identifiers). PAVOOC then generates a table that contains a user-defined number of sgRNAs for each of these genes. These sgRNAs are prioritized based on the scoring function in Equation (1), which combines weighted on- and off-target scores with an indicator of whether the targeted region lies within a protein domain. The on-target score is calculated using the Azimuth model, whereas the cutting frequency determination score is used to assess off-target effects (Doench et al., 2016). The sgRNA selection for a gene can be analyzed further and modified in a detail view (see Supplementary Fig. S1). The detail view consists of three synchronized panels: the ‘LineUp’ ranking table on the upper right, the protein structure view on the upper left and the sequence view in the bottom panel of the page. The LineUp-based (Gratzl et al.) sgRNA ranking table lets the user adjust the weights of the on- and off-target scores individually in order to prioritize the sgRNAs accordingly. For each sgRNA, the LineUp table displays whether the targeted genomic region lies within a protein domain and whether the optionally selected cancer cell line contains a single nucleotide variation at that position. The sequence view at the bottom is based on Biodalliance (Down et al., 2011) and shows the gene annotation, all regions targeted by the sgRNAs, protein domains and cancer cell line alteration data, supporting tailored sgRNA design for the cell line under study. On the left side, available 3D protein structures from RCSB (Berman et al., 2000) are shown, and sgRNA-related cut sites are mapped and highlighted on the structure using the NGL viewer (Rose and Hildebrand, 2015).
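The weighted prioritization described above can be illustrated with a minimal Python sketch. Equation (1) is not reproduced in this text, so the linear form below, the default weights and the names Guide, combined_score and rank_guides are assumptions made for illustration only and are not taken from the PAVOOC source code. The sketch also assumes that both scores lie in [0, 1] and that a higher off-target score means higher specificity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Guide:
    """Hypothetical container for one candidate sgRNA."""
    sequence: str
    on_target: float   # assumed in [0, 1], higher = more efficient
    off_target: float  # assumed in [0, 1], higher = more specific
    in_domain: bool    # True if the cut site lies within a protein domain


def combined_score(guide: Guide, w_on: float = 1.0, w_off: float = 1.0,
                   w_domain: float = 0.5) -> float:
    """Assumed linear combination; not the published Equation (1)."""
    return (w_on * guide.on_target
            + w_off * guide.off_target
            + w_domain * float(guide.in_domain))


def rank_guides(guides: List[Guide], top_n: int, **weights: float) -> List[Guide]:
    """Return the top_n guides ordered by descending combined score."""
    ordered = sorted(guides, key=lambda g: combined_score(g, **weights),
                     reverse=True)
    return ordered[:top_n]
```

Changing a weight, for example rank_guides(guides, 5, w_off=0.0), mimics the interactive weight adjustment in the ranking table: the same candidates are simply re-sorted under the new weighting.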
With the cut sites mapped onto the structure, the user can assess the position of each Cas9 cut site on the protein and thus prioritize sgRNAs that are more likely to affect functionally relevant regions of a gene. Furthermore, when designing genome editing experiments, the structure view enables amino acid editing and displays the designed alterations directly on the protein structure. We integrated genomic sequence data from UCSC (assembly hg19) (Consortium et al.). The genomic annotations, including genes, transcripts and exons, were taken from the GENCODE project (Harrow et al.). Cancer cell line alteration data were taken from the Cancer Cell Line Encyclopedia (Barretina et al., 2012) (based on hg19). To facilitate the mapping between genomic and protein coordinates, we used only the canonical transcript from APPRIS (Rodriguez et al.). Exons that are not contained in that transcript are not considered in our application. SIFTS (Velankar et al.) mappings are used to derive genomic coordinates of PDB structures. A structured overview of our pipeline is shown in Supplementary Figure S2. All data shown in the application are pre-processed offline and stored in a non-relational database. Guide search and off-target scoring are performed using FlashFry (McKenna and Shendure, 2017).
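The mapping between genomic and protein coordinates on a single canonical transcript can be sketched as follows. This is an illustrative simplification, not the PAVOOC pipeline: the function name genomic_to_residue is hypothetical, the coding segments are assumed to be given as 1-based inclusive (start, end) pairs that contain only coding bases, and the further step from a residue in the transcript to a residue in a PDB structure (for which the text uses SIFTS mappings) is not covered.

```python
from typing import List, Optional, Tuple


def genomic_to_residue(pos: int, cds_exons: List[Tuple[int, int]],
                       strand: str) -> Optional[int]:
    """Map a 1-based genomic position to a 1-based amino acid index.

    cds_exons holds the coding segments of one transcript as 1-based,
    inclusive (start, end) genomic intervals. Returns None when the
    position falls outside the coding sequence (intron, UTR, intergenic).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Walk the exons in transcript order: ascending on '+', descending on '-'.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    coding_offset = 0  # number of coding bases in the exons already passed
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            return (coding_offset + within) // 3 + 1
        coding_offset += end - start + 1
    return None
```

A cut site that maps to None lies outside the coding sequence of the chosen transcript, which mirrors the statement above that exons not contained in the canonical transcript are not considered.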

3 Discussion

PAVOOC provides a convenient means to design optimal sgRNAs for KO and genome editing experiments. A machine learning-based scoring system guides the user towards sgRNAs with high predicted on-target activity and low predicted off-target activity. Through the integration of structural data, PAVOOC displays cut sites on the corresponding protein crystal structures, so that sgRNAs that cut in functionally relevant regions can be selected. Integration of cancer cell line data ensures that existing genomic alterations are considered during sgRNA selection. The tool was used internally to design a domain-targeting genome-wide sgRNA library. PAVOOC is hosted on GitHub and is actively maintained. As such, it provides an open platform on which to build and integrate CRISPR use cases that are not yet supported. The PEP 8-compliant Python code and the ReactJS-based frontend lower the barrier to entry for developers. The application runs in a Docker environment, which makes it easy to host on premises.
References

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Initial sequencing and analysis of the human genome.

Authors:  E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; 
H Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal:  Nature       Date:  2001-02-15       Impact factor: 49.962

3.  A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.

Authors:  Martin Jinek; Krzysztof Chylinski; Ines Fonfara; Michael Hauer; Jennifer A Doudna; Emmanuelle Charpentier
Journal:  Science       Date:  2012-06-28       Impact factor: 47.728

4.  Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs.

Authors:  Jennifer Listgarten; Michael Weinstein; Benjamin P Kleinstiver; Alexander A Sousa; J Keith Joung; Jake Crawford; Kevin Gao; Luong Hoang; Melih Elibol; John G Doench; Nicolo Fusi
Journal:  Nat Biomed Eng       Date:  2018-01-10       Impact factor: 25.671

5.  Dalliance: interactive genome viewing on the web.

Authors:  Thomas A Down; Matias Piipari; Tim J P Hubbard
Journal:  Bioinformatics       Date:  2011-01-19       Impact factor: 6.937

6.  The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.

Authors:  Jordi Barretina; Giordano Caponigro; Nicolas Stransky; Kavitha Venkatesan; Adam A Margolin; Sungjoon Kim; Christopher J Wilson; Joseph Lehár; Gregory V Kryukov; Dmitriy Sonkin; Anupama Reddy; Manway Liu; Lauren Murray; Michael F Berger; John E Monahan; Paula Morais; Jodi Meltzer; Adam Korejwa; Judit Jané-Valbuena; Felipa A Mapa; Joseph Thibault; Eva Bric-Furlong; Pichai Raman; Aaron Shipway; Ingo H Engels; Jill Cheng; Guoying K Yu; Jianjun Yu; Peter Aspesi; Melanie de Silva; Kalpana Jagtap; Michael D Jones; Li Wang; Charles Hatton; Emanuele Palescandolo; Supriya Gupta; Scott Mahan; Carrie Sougnez; Robert C Onofrio; Ted Liefeld; Laura MacConaill; Wendy Winckler; Michael Reich; Nanxin Li; Jill P Mesirov; Stacey B Gabriel; Gad Getz; Kristin Ardlie; Vivien Chan; Vic E Myer; Barbara L Weber; Jeff Porter; Markus Warmuth; Peter Finan; Jennifer L Harris; Matthew Meyerson; Todd R Golub; Michael P Morrissey; William R Sellers; Robert Schlegel; Levi A Garraway
Journal:  Nature       Date:  2012-03-28       Impact factor: 49.962

7.  Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains.

Authors:  Junwei Shi; Eric Wang; Joseph P Milazzo; Zihua Wang; Justin B Kinney; Christopher R Vakoc
Journal:  Nat Biotechnol       Date:  2015-05-11       Impact factor: 54.908

8.  Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9.

Authors:  John G Doench; Nicolo Fusi; Meagan Sullender; Mudra Hegde; Emma W Vaimberg; Jennifer Listgarten; Katherine F Donovan; Ian Smith; Zuzana Tothova; Craig Wilen; Robert Orchard; Herbert W Virgin; David E Root
Journal:  Nat Biotechnol       Date:  2016-01-18       Impact factor: 54.908

9.  NGL Viewer: a web application for molecular visualization.

Authors:  Alexander S Rose; Peter W Hildebrand
Journal:  Nucleic Acids Res       Date:  2015-04-29       Impact factor: 16.971

10.  FlashFry: a fast and flexible tool for large-scale CRISPR target design.

Authors:  Aaron McKenna; Jay Shendure
Journal:  BMC Biol       Date:  2018-07-05       Impact factor: 7.431
