SUMMARY: Single-guide RNAs (sgRNAs) targeting the same gene can vary significantly in efficacy and specificity. PAVOOC (Prediction And Visualization of On- and Off-targets for CRISPR) is a web-based CRISPR sgRNA design tool that employs state-of-the-art machine learning models to prioritize the most effective candidate sgRNAs. In contrast to other tools, it maps sgRNAs to functional domains and protein structures and visualizes cut sites on the corresponding protein crystal structures. Furthermore, PAVOOC supports the generation of homology-directed repair templates for genome editing experiments and the visualization of the mutated amino acids in 3D.
AVAILABILITY AND IMPLEMENTATION: PAVOOC is available at https://pavooc.me and accessible using modern browsers (Chrome/Chromium recommended). The source code is hosted at github.com/moritzschaefer/pavooc under the MIT License. The backend, including the data processing steps, is implemented in Python 3 and the frontend in ReactJS. All components run in a simple Docker environment.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
1 Introduction
The discovery of the CRISPR/Cas system (Cong et al.; Jinek et al., 2012) was a breakthrough in genome editing. An important application of CRISPR/Cas is to induce a targeted knockout (KO) of a gene of interest. Such KO experiments help to study the essentiality of the targeted genes in a given cellular context (e.g. a cancer cell line bearing certain genomic alterations) and can ultimately support the validation of a new drug target (Moore, 2015). Shi et al. (2015) showed that the effect of a CRISPR-based KO can be boosted by targeting functionally relevant regions of a protein, because in-frame mutations (indels) in these regions are more likely to have a significant effect than those in non-functional regions. Another application of CRISPR/Cas is to precisely introduce missense mutations into a genome and study the effects of the resulting perturbations. In both applications, single-guide RNAs (sgRNAs) direct the Cas9 enzyme to the genomic region of interest so that Cas9 can cut the DNA at the targeted position. Genome editing experiments additionally require a template sequence that contains the desired nucleotide sequence.
A number of tools have been published that facilitate and automate the design of sgRNAs for CRISPR KO experiments (Hough et al.; Listgarten et al., 2018; Meier et al.; Stemmer et al.). In this application note, we present PAVOOC (Prediction And Visualization of On- and Off-targets for CRISPR), a modern web application that supports wet lab biologists in designing and selecting optimal sgRNAs and template sequences for KO and genome editing experiments. It does so through machine learning-based on- and off-target scoring, multi-attribute ranking, mapping of cut sites onto protein structures and integration of cancer cell line data.
2 Materials and methods
PAVOOC is a web application for designing and visualizing sgRNAs for gene KO and genome editing experiments. For KO experiments, a set of genes has to be provided (as gene symbols or Ensembl identifiers). PAVOOC then generates a table that contains a user-defined number of sgRNAs for each of these genes. These sgRNAs are prioritized by the scoring function in Equation (1), which combines weighted on- and off-target scores with an indicator of whether the targeted region lies within a protein domain. The on-target score is calculated using the Azimuth model, whereas the cutting frequency determination score is used to assess off-target effects (Doench et al., 2016).
The sgRNA selection for a gene can be further analyzed and modified in a detail view (see Supplementary Fig. S1). The detail view consists of three synchronized panels: the 'LineUp' ranking table on the upper right, the protein structure view on the upper left and the sequence view at the bottom of the page. The sgRNA ranking table, which is based on LineUp (Gratzl et al.), allows the weights of the on- and off-target scores to be adjusted individually so that the sgRNAs are prioritized accordingly. For each sgRNA, the LineUp table displays whether the targeted genomic region lies within a protein domain and whether the optionally selected cancer cell line contains a single nucleotide variation at that position. The sequence view at the bottom is based on Biodalliance (Down et al.) and shows the gene annotation, all regions targeted by the sgRNAs, protein domains and cancer cell line alteration data, which supports tailoring the sgRNA design to the cell line under study. On the left side, available 3D protein structures from RCSB (Berman et al., 2000) are shown, and the sgRNA cut sites are mapped and highlighted on the structure using the NGL viewer (Rose and Hildebrand, 2015).
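The prioritization described above can be illustrated with a minimal Python sketch. The exact form of Equation (1) is not reproduced in this text, so the linear combination below, the default weights and the names `Guide`, `combined_score` and `top_guides` are assumptions made for illustration only; this is not PAVOOC's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Guide:
    """Hypothetical record for one candidate sgRNA."""
    name: str
    on_target: float   # predicted efficacy in [0, 1], e.g. from an Azimuth-style model
    off_target: float  # aggregated off-target penalty in [0, 1]; higher is worse (assumed convention)
    in_domain: bool    # True if the cut site lies within an annotated protein domain


def combined_score(guide: Guide, w_on: float = 1.0, w_off: float = 1.0,
                   domain_bonus: float = 0.1) -> float:
    """Assumed linear form: reward on-target activity, penalize off-targets,
    and add a fixed bonus for guides that cut inside a protein domain."""
    bonus = domain_bonus if guide.in_domain else 0.0
    return w_on * guide.on_target - w_off * guide.off_target + bonus


def top_guides(guides: List[Guide], n: int, **weights: float) -> List[Guide]:
    """Return the n best guides under the chosen weights, best first."""
    ranked = sorted(guides, key=lambda g: combined_score(g, **weights), reverse=True)
    return ranked[:n]


candidates = [
    Guide("g1", on_target=0.80, off_target=0.10, in_domain=False),
    Guide("g2", on_target=0.75, off_target=0.10, in_domain=True),
    Guide("g3", on_target=0.90, off_target=0.60, in_domain=False),
]
best_two = top_guides(candidates, n=2)
```

Under the default weights, the domain bonus lifts `g2` above `g1`, while `g3` drops out because of its off-target penalty. Setting `w_off=0.0` ranks `g3` first, which mirrors how adjusting the weights in the ranking table changes the prioritization.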
By mapping cut sites onto the structure, the user can assess where Cas9 cuts relative to the protein structure and thus prioritize sgRNAs that are more likely to affect functionally relevant regions of a gene. Furthermore, when designing genome editing experiments, the structure view allows amino acids to be edited and the designed alterations to be displayed directly on the protein structure.
We integrated genomic sequence data from UCSC in version hg19 (Consortium). The genomic annotations, including genes, transcripts and exons, were taken from the GENCODE project (Harrow et al.). Cancer cell line alteration data were taken from the Cancer Cell Line Encyclopedia (Barretina et al., 2012) (based on hg19). To facilitate the mapping between genomic and protein coordinates, we used only the canonical transcript from APPRIS (Rodriguez et al.). Exons that are not contained in that transcript are not considered in our application. SIFTS (Velankar et al.) mappings are used to derive the genomic coordinates of PDB structures. A structured overview of our pipeline is shown in Supplementary Figure S2.
All data shown in the application are pre-processed offline and stored in a non-relational database. Guide search and off-target scoring are performed using FlashFry (McKenna and Shendure, 2017).
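Restricting the annotation to a single canonical transcript makes the mapping from a genomic cut position to an amino acid index a simple walk over the coding exons. The following Python sketch illustrates the idea; the function name `genomic_to_residue`, the 0-based half-open coordinate convention and the toy exon coordinates are assumptions for illustration and are not taken from the PAVOOC code base.

```python
from typing import List, Optional, Tuple


def genomic_to_residue(pos: int, cds_exons: List[Tuple[int, int]],
                       strand: str) -> Optional[int]:
    """Map a 0-based genomic position to a 1-based amino acid index.

    cds_exons holds the coding parts of the exons of one transcript as
    half-open (start, end) intervals. Returns None if the position does
    not fall into any coding exon of that transcript.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Visit exons in transcription order: ascending on '+', descending on '-'.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    coding_offset = 0  # number of coding bases in the exons already passed
    for start, end in ordered:
        if start <= pos < end:
            within = pos - start if strand == "+" else end - 1 - pos
            return (coding_offset + within) // 3 + 1
        coding_offset += end - start
    return None


# Toy transcript with two coding exons of 6 bp each (4 codons in total).
exons = [(100, 106), (200, 206)]
residue_plus = genomic_to_residue(200, exons, "+")   # first base of the second exon
residue_minus = genomic_to_residue(100, exons, "-")  # last coding base on the minus strand
intronic = genomic_to_residue(150, exons, "+")       # between the exons
```

Positions outside the coding exons of the chosen transcript return `None`, which corresponds to the statement above that exons not contained in the canonical transcript are not considered.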
3 Discussion
Our new tool PAVOOC provides a convenient means to design optimal sgRNAs for KO and genome editing experiments. A machine learning-based scoring system guides the user towards sgRNAs with high predicted on-target activity and low predicted off-target effects. Through the integration of structural data, PAVOOC can display cut sites on the corresponding protein crystal structures, so that sgRNAs that cut in functionally relevant regions can be selected. Integration of cancer cell line data allows existing genomic alterations to be taken into account during sgRNA selection. The tool was used internally to design a domain-targeting genome-wide sgRNA library.
PAVOOC is hosted on GitHub and is an actively maintained project. As such, it provides an open platform on which to build and integrate CRISPR use cases that are not yet covered. The PEP8-compliant Python code and the ReactJS-based frontend lower the barrier to entry for developers. The application runs in a Docker environment, which makes it easy to host on premises.
Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971
Authors: E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; et al. Journal: Nature Date: 2001-02-15 Impact factor: 49.962
Authors: Martin Jinek; Krzysztof Chylinski; Ines Fonfara; Michael Hauer; Jennifer A Doudna; Emmanuelle Charpentier Journal: Science Date: 2012-06-28 Impact factor: 47.728
Authors: Jennifer Listgarten; Michael Weinstein; Benjamin P Kleinstiver; Alexander A Sousa; J Keith Joung; Jake Crawford; Kevin Gao; Luong Hoang; Melih Elibol; John G Doench; Nicolo Fusi Journal: Nat Biomed Eng Date: 2018-01-10 Impact factor: 25.671
Authors: Jordi Barretina; Giordano Caponigro; Nicolas Stransky; Kavitha Venkatesan; Adam A Margolin; Sungjoon Kim; Christopher J Wilson; Joseph Lehár; Gregory V Kryukov; Dmitriy Sonkin; Anupama Reddy; Manway Liu; Lauren Murray; Michael F Berger; John E Monahan; Paula Morais; Jodi Meltzer; Adam Korejwa; Judit Jané-Valbuena; Felipa A Mapa; Joseph Thibault; Eva Bric-Furlong; Pichai Raman; Aaron Shipway; Ingo H Engels; Jill Cheng; Guoying K Yu; Jianjun Yu; Peter Aspesi; Melanie de Silva; Kalpana Jagtap; Michael D Jones; Li Wang; Charles Hatton; Emanuele Palescandolo; Supriya Gupta; Scott Mahan; Carrie Sougnez; Robert C Onofrio; Ted Liefeld; Laura MacConaill; Wendy Winckler; Michael Reich; Nanxin Li; Jill P Mesirov; Stacey B Gabriel; Gad Getz; Kristin Ardlie; Vivien Chan; Vic E Myer; Barbara L Weber; Jeff Porter; Markus Warmuth; Peter Finan; Jennifer L Harris; Matthew Meyerson; Todd R Golub; Michael P Morrissey; William R Sellers; Robert Schlegel; Levi A Garraway Journal: Nature Date: 2012-03-28 Impact factor: 49.962
Authors: Junwei Shi; Eric Wang; Joseph P Milazzo; Zihua Wang; Justin B Kinney; Christopher R Vakoc Journal: Nat Biotechnol Date: 2015-05-11 Impact factor: 54.908
Authors: John G Doench; Nicolo Fusi; Meagan Sullender; Mudra Hegde; Emma W Vaimberg; Jennifer Listgarten; Katherine F Donovan; Ian Smith; Zuzana Tothova; Craig Wilen; Robert Orchard; Herbert W Virgin; David E Root Journal: Nat Biotechnol Date: 2016-01-18 Impact factor: 54.908