Zhuokun Li, Xiaojue Wang, Dongyang Xu, Dengwei Zhang, Dan Wang, Xuechen Dai, Qi Wang, Zhou Li, Ying Gu, Wenjie Ouyang, Shuchang Zhao, Baoqian Huang, Jian Gong, Jing Zhao, Ao Chen, Yue Shen, Yuliang Dong, Wenwei Zhang, Xun Xu, Chongjun Xu, Yuan Jiang.
Abstract
Here, we report a sensitive DocMF system that uses next-generation sequencing chips to profile protein-DNA interactions. Using DocMF, we successfully identified a variety of endonuclease recognition sites and the protospacer adjacent motif (PAM) sequences of different CRISPR systems. DocMF can simultaneously screen both 5' and 3' PAMs with high coverage. For SpCas9, we found the noncanonical PAMs 5'-NAG-3' (~5%) and 5'-NGA-3' (~1.6%), in addition to its common PAM, 5'-NGG-3' (~89.9%). The more relaxed PAM sequences of two uncharacterized Cas endonucleases, VeCas9 and BvCas12a, were extensively characterized using DocMF. Moreover, we observed that dCas9, a DNA binding protein lacking endonuclease activity, preferentially bound to the previously reported 5'-NGG-3' sequence. In summary, our studies demonstrate that DocMF is the first tool with the capacity to exhaustively assay both the binding and the cutting properties of different DNA binding proteins.
Year: 2020 PMID: 32789179 PMCID: PMC7399529 DOI: 10.1126/sciadv.abb3350
Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136
Fig. 1. DocMF overview.
(A) Biochemistry and illustration and (B) bioinformatics workflow.
Fig. 2. Restriction endonuclease cut site identification using DocMF.
(A) Box plots for the motifs with the top 50 log10(site rates). Outliers’ DNA sequences (orange) and the sum of outliers’ site rates (green) are shown. (B) Cumulative site rates for Bgl I. (C) A sequence logo representation of the 372 motifs for Bgl I.
Fig. 3. PAM identification for SpCas9 using DocMF.
(A) PAM DNB library preparation illustration. The synthetic oligo region contains a known 25-nt SpCas9 protospacer sequence (orange) flanked by 5′ and 3′ PAM regions with 15 random nucleotides each (green). Hundreds of copies of each random PAM-flanked protospacer are incorporated per DNB, and only one copy is shown. (B) The relative read frequency at both the 5′ end and the 3′ end for SpCas9. The X axis shows all combinations of 7-nt sequences, sorted in descending order by the difference between the two ends. (C) PAM sequence for SpCas9.
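The ordering described for panel (B) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the inputs are assumed to be plain lists of 7-nt strings already extracted from each end, and the sign of the difference (5′ minus 3′) is an arbitrary assumption, since the caption does not state which end is subtracted from which.

```python
from collections import Counter


def relative_frequencies(seqs):
    """Map each 7-nt sequence to its share of all reads in `seqs`."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def rank_by_end_difference(five_prime, three_prime):
    """Return (kmer, freq_5p, freq_3p) rows sorted by freq_5p - freq_3p, descending.

    Assumption: the difference is taken as 5' minus 3'; swap the terms for the
    opposite convention. Ties are broken alphabetically for reproducibility.
    """
    f5 = relative_frequencies(five_prime)
    f3 = relative_frequencies(three_prime)
    rows = [(k, f5.get(k, 0.0), f3.get(k, 0.0)) for k in set(f5) | set(f3)]
    return sorted(rows, key=lambda r: (-(r[1] - r[2]), r[0]))


# Toy input: three distinct 7-mers observed at each end.
five = ["AAAAAAA", "AAAAAAA", "CCCCCCC", "GGGGGGG"]
three = ["CCCCCCC", "CCCCCCC", "AAAAAAA", "GGGGGGG"]
ranked = rank_by_end_difference(five, three)
```

Sequences absent from one end are given a frequency of zero at that end, so they still appear in the ranking.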
Fig. 4. PAM identification in novel CRISPR-Cas systems using DocMF.
(A) The relative read frequency at both the 5′ end and the 3′ end for VeCas9. (B) The relative read frequency at both the 5′ end and the 3′ end for BvCas12. Consensus PAM sequence by frequency plot with all detected 7-nt sequences for VeCas9 (C) and BvCas12 (D). PAM sequence by sequence logo for VeCas9 generated by all detected 7-nt sequences (E) and by the top 1000 7-nt sequences from FET analysis (F). PAM sequence by sequence logo for BvCas12 generated by all detected 7-nt sequences (G) and by the top 1000 7-nt sequences from FET analysis (H). (I) In vitro validation of VeCas9 PAM sequences. Nine 7-nt sequences each above/below the cutoff were selected. The FET ranking numbers are shown in red. NC, negative control. (J) In vitro validation of BvCas12 PAM sequences. Five 7-nt sequences above the cutoff and two 7-nt sequences below the cutoff were selected.
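The captions refer to ranking 7-nt sequences by FET (Fisher's exact test) but do not describe how the contingency tables were built. The sketch below is therefore only one plausible setup, under stated assumptions: each 7-mer is compared against all other 7-mers across two hypothetical count tables (`counts_a` and `counts_b`), and a one-sided p-value is computed directly from the hypergeometric distribution using the standard library.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, col2 = a + b, a + c, b + d
    total = comb(col1 + col2, row1)
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(col2, row1 - x) for x in range(a, upper + 1))
    return tail / total


def rank_by_fet(counts_a, counts_b):
    """Rank 7-mers by enrichment in counts_a relative to counts_b (smallest p first).

    Assumption: the table for each 7-mer is [this 7-mer, all others] by [a, b];
    the source does not specify the actual design.
    """
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    rows = []
    for kmer in set(counts_a) | set(counts_b):
        a, c = counts_a.get(kmer, 0), counts_b.get(kmer, 0)
        rows.append((kmer, fisher_greater(a, total_a - a, c, total_b - c)))
    return sorted(rows, key=lambda r: (r[1], r[0]))


# Toy counts: AAAAGGG is enriched in condition a, CCCCTTT in condition b.
ranked = rank_by_fet({"AAAAGGG": 30, "CCCCTTT": 10}, {"AAAAGGG": 10, "CCCCTTT": 30})
```

Taking the first 1000 rows of such a ranking would correspond to a "top 1000" selection; for large data sets a library routine such as SciPy's `fisher_exact` would be the more practical choice.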
Fig. 5. PAM wheel results.
(A) PAM wheel for VeCas9. The upper yellow box indicates each position of the PAM sequence, and the arrow illustrates the orientation of each base. The area of a ring sector for a given base at a given position represents that base's frequency at that position. (B) PAM wheel for BvCas12. (C) In vitro validation of VeCas9 PAM wheel results. NYARRMY is the consensus sequence based on positional frequency, while ACAAGCC is the 58th FET-ranked sequence, included as a positive control. (D) In vitro validation of BvCas12 PAM wheel results. NNNTTTN or NNNTYTN is the consensus sequence based on positional frequency, while AATTTTG is the 70th FET-ranked sequence, included as a positive control. NC (negative control): positive PAMs incubated with the corresponding protein but without any sgRNA.
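The consensus sequences in panels (C) and (D) use IUPAC degenerate nucleotide codes (for example, Y = C or T, R = A or G, M = A or C, N = any base). As a reading aid, the sketch below checks whether a concrete 7-nt sequence is compatible with such a consensus and enumerates the sequences a consensus covers. The function names are hypothetical, and this is not part of the DocMF analysis itself.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(consensus, seq):
    """True if every base of `seq` is allowed by the code at the same position."""
    return len(consensus) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(consensus, seq)
    )


def expand(consensus):
    """Yield every concrete sequence covered by a degenerate consensus."""
    for bases in product(*(IUPAC[code] for code in consensus)):
        yield "".join(bases)


vecas9_hit = matches("NYARRMY", "ACAAGCC")
bvcas12_hit = matches("NNNTTTN", "AATTTTG")
n_vecas9 = sum(1 for _ in expand("NYARRMY"))
```

Under these codes, both positive-control sequences named in the caption are compatible with the corresponding consensus, and NYARRMY covers 4 × 2 × 1 × 2 × 2 × 2 × 2 = 128 distinct 7-nt sequences.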
Fig. 6. DocMF used to identify protein binding site of dCas9.
(A) The relative binding strength at both the 5′ end and the 3′ end for dCas9. The X axis shows all combinations of 7-nt sequences, sorted alphabetically in Excel. (B) Sequence logo for dCas9, generated from the detected 7-nt sequences with the highest relative binding strength.