Oded Shor1,2,3, Roy Rabinowitz3,4, Daniel Offen3,4, Felix Benninger1,2,3.
Abstract
The CRISPR-Cas system has transformed the field of gene editing and created opportunities for novel genome engineering therapeutics. The field has progressed significantly, and CRISPR-Cas9 was recently utilized in clinical trials to target disease-causing mutations. Existing tools aim to predict the on-target efficacy and potential genome-wide off-targets by scoring a particular gRNA according to an array of gRNA design principles or machine learning algorithms based on empirical results of large numbers of gRNAs. However, such tools are unable to predict the editing outcome of variant Cas enzymes and can only assess potential off-targets related to reference genomes. Here, we employ normal mode analysis (NMA) to investigate the structure of the Cas9 protein complexed with its gRNA and target DNA and explore the function of the protein. Our results demonstrate the feasibility and validity of NMA to predict the activity and specificity of SpyCas9 in the presence of mismatches by comparison to empirical data. Furthermore, despite the absence of their exact structures, this method accurately predicts the enzymatic activity of known high-fidelity engineered Cas9 variants.
Keywords: CRISPR; CRISPR activity; CRISPR computational modelling; CRISPR specificity; In silico activity simulation; Normal mode analysis; Structure function
Year: 2022 PMID: 35521548 PMCID: PMC9062324 DOI: 10.1016/j.csbj.2022.04.026
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. General scheme – NMA predicts the activity and specificity in a sequence-dependent manner. NMA yields entropy scores that correlate with empirical SpyCas9 activity data. Modifications were made to all parts of the structure: protein (high-fidelity variant mutations), DNA (four different EMX1 sites) and sgRNA (mismatch assay) while retaining high correlations. PDB: 5F9R [12].
Fig. 2. SpyCas9 empirical activity and structure-based entropy. a) Heatmap representations of previously reported empirical SpyCas9 activity (specificity measured as the ratio of mismatch/perfect match) and the entropy of the DNA and of the SpyCas9 protein in the presence of single-base mismatches in four loci within the EMX1 gene. The color scale bar orientation is determined by the direction of the correlation (positive/negative). b) Correlations between the empirical activity (x) and the entropy of the DNA (y). c) Correlations between the empirical activity (x) and the entropy of the protein (y). d) Correlations between the entropy of the DNA (x) and the entropy of the protein (y). All correlation plots are shown with a 95% confidence interval and p-value < 0.00005 (N = 57). The correlation values represent the Pearson correlation coefficient (r).
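The Pearson correlation coefficient (r) reported in these panels can be illustrated with a minimal sketch. The function name `pearson_r` and the numbers below are hypothetical and chosen only for illustration; they are not data from the study, and this is not presented as the authors' analysis code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only (NOT measurements from the paper):
# one activity ratio and one entropy score per mismatch position.
activity = [1.0, 0.8, 0.6, 0.4, 0.2]
entropy = [0.1, 0.2, 0.3, 0.4, 0.5]

r = pearson_r(activity, entropy)
print(f"r = {r:.3f}")
```

In this toy input the entropy rises linearly as the activity falls, so the coefficient is strongly negative. The sign of r is what the caption refers to when it says the color scale orientation follows the direction of the correlation.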
Fig. 3. The correlation between the empirical activity in the presence of mismatches and the entropy of each amino acid in the structure of SpyCas9 for each mismatch. a) Absolute values of the Pearson correlation coefficient r, measured for all amino acids along the structure of SpyCas9 in the presence of mismatches in four genomic loci. The measured entropy relates to the α-carbon of each amino acid. The dashed line represents a threshold of r = 0.55. Regions containing residues with r greater than the threshold in more than one site are marked in light blue. The 2D representation of the protein domains shows the regions in which the entropy of the amino acids best correlates with the empirical activity data. Scale range 0 < r < 0.8. b) The structure of SpyCas9 highlighting the residues with r > 0.55 (mesh). Colors indicate the number of sites (1–3) in which the r value for this residue crossed the threshold (left). The right panel is a 3D representation of the protein domains. The target strand DNA (TS-DNA), non-target strand (NTS-DNA) and the sgRNA are represented as simplified lines, while the protein is visualized as a cartoon. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
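The thresholding described in this caption (absolute r above 0.55, counted per residue across sites) can be sketched as follows. The function name, the site labels and the per-residue values are assumptions made for illustration; only the 0.55 threshold and the "more than one site" criterion come from the caption.

```python
def count_sites_above_threshold(r_by_site, threshold=0.55):
    """For each residue index, count the sites where |r| exceeds the threshold.

    r_by_site maps a site label to a list of per-residue r values;
    all lists are assumed to have the same length.
    """
    lengths = {len(values) for values in r_by_site.values()}
    if len(lengths) != 1:
        raise ValueError("every site needs one r value per residue")
    n_residues = lengths.pop()
    counts = [0] * n_residues
    for values in r_by_site.values():
        for i, r in enumerate(values):
            if abs(r) > threshold:
                counts[i] += 1
    return counts


# Toy per-residue correlations for three hypothetical sites (NOT study data).
r_by_site = {
    "site1": [0.10, -0.60, 0.70],
    "site2": [0.56, 0.20, -0.90],
    "site3": [0.00, 0.58, 0.30],
}

counts = count_sites_above_threshold(r_by_site)
# Residues that cross the threshold in more than one site.
recurrent = [i for i, c in enumerate(counts) if c > 1]
print(counts, recurrent)
```

Using the absolute value keeps strongly negative correlations, which matters here because the caption plots |r| rather than the signed coefficient.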
Fig. 4. NMA predicts and replicates specificity and activity of eight SpyCas9 variants with improved specificity. a) Entropy profile heatmaps of SpyCas9 variants in the presence of gRNA mismatches at the EMX1 – site3 locus, measured at the DNA molecule (chain C – TS-DNA). b) Average activity and specificity scores as previously reported and determined by the TTISS method. c) Correlation between the activity score of each variant and its corresponding average entropy score. The correlation plot is shown with a 95% confidence interval and p-value = 0.024123 (N = 9). d) The Pearson correlation coefficient (r) of each position within the gRNA, representing the feasibility of each position to predict the activity outcome (average per variant) using the entropy score (average per position per variant). # = 0.05 < p-value < 0.06, * = p-value < 0.05, ** = p-value < 0.005.
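The significance markers in panel d follow a simple mapping from p-value to symbol, sketched below. The function name `significance_marker` is hypothetical, and the handling of a p-value of exactly 0.05 (left unmarked) is an assumption, since the legend does not define that case.

```python
def significance_marker(p_value):
    """Map a p-value to the marker used in the Fig. 4d legend.

    '**' for p < 0.005, '*' for p < 0.05, '#' for 0.05 < p < 0.06,
    and an empty string otherwise. A value of exactly 0.05 is left
    unmarked here (assumption: the legend does not cover it).
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < 0.005:
        return "**"
    if p_value < 0.05:
        return "*"
    if 0.05 < p_value < 0.06:
        return "#"
    return ""


# The p-value quoted for panel c falls in the single-asterisk band.
print(significance_marker(0.024123))
```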