Jeffry D Sander, Deepak Reyon, Morgan L Maeder, Jonathan E Foley, Stacey Thibodeau-Beganny, Xiaohong Li, Maureen R Regan, Elizabeth J Dahlborg, Mathew J Goodwin, Fengli Fu, Daniel F Voytas, J Keith Joung, Drena Dobbs.
Abstract
BACKGROUND: Precise and efficient methods for gene targeting are critical for detailed functional analysis of genomes and regulatory networks and for potentially improving the efficacy and safety of gene therapies. Oligomerized Pool ENgineering (OPEN) is a recently developed method for engineering C2H2 zinc finger proteins (ZFPs) designed to bind specific DNA sequences with high affinity and specificity in vivo. Because generation of ZFPs using OPEN requires considerable effort, a computational method for identifying the sites in any given gene that are most likely to be successfully targeted by this method is desirable.
Year: 2010 PMID: 21044337 PMCID: PMC3098093 DOI: 10.1186/1471-2105-11-543
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Base composition differs in active versus inactive ZFP target sites. A) Total base counts for active and inactive ZFP target sites (from ZFTS135, a dataset of 135 experimentally validated 9-bp target sites, see Additional File 1 - Table S1) reveal that variation in the average frequency of each base differentiates active versus inactive target sites. The total number of G and T residues relative to A and C is inflated because currently available OPEN pools are designed to target GNN and TNN triplets. B) Positional base counts, i.e., average base counts for each position within target site triplets (1st, 2nd, 3rd), suggest that thymine bases negatively impact ZFP binding at all three positions. C) An iceLogo [50] generated from ZFTS135 illustrates the difference in percentage composition of nucleotides at each position, from 1 to 9 (5' to 3'), between the positive class and the entire dataset. For example, 78% of all sites in ZFTS135 have a G in position 1, whereas 88% of all active sites have a G at position 1, resulting in a difference of 10%. Positive difference values indicate that, on average, the indicated bases are favored at those positions in active sites; negative difference values indicate that the indicated bases are disfavored. These position-specific differences in percentage composition also support the conclusion that thymine bases tend to occur in inactive targets (i.e., they have large negative propensities).
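The three views of a target site described in the caption (total base counts, positional base counts within triplets, and the position-specific difference in percentage composition) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names and the four example sequences are hypothetical and are not taken from ZFTS135.

```python
from collections import Counter

BASES = "ACGT"


def _check(site):
    # A target site here is a 9-bp string over A, C, G, T (three triplets).
    if len(site) != 9 or set(site) - set(BASES):
        raise ValueError(f"expected a 9-bp site over {BASES}, got {site!r}")


def base_counts(site):
    """Total count of each base in one 9-bp target site (panel A view)."""
    _check(site)
    counts = Counter(site)
    return {b: counts.get(b, 0) for b in BASES}


def positional_base_counts(site):
    """Counts of each base at triplet positions 1, 2 and 3 (panel B view)."""
    _check(site)
    result = {pos: {b: 0 for b in BASES} for pos in (1, 2, 3)}
    for i, base in enumerate(site):
        result[i % 3 + 1][base] += 1
    return result


def composition_difference(active_sites, all_sites, position, base):
    """Percent of active sites with `base` at 1-based `position`,
    minus the same percent over all sites (panel C view)."""
    def pct(sites):
        return 100.0 * sum(s[position - 1] == base for s in sites) / len(sites)
    return pct(active_sites) - pct(all_sites)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    all_sites = ["GAAGTGGCT", "TAAGTGGCT", "GCCGTTGAT", "GGGGAAGCA"]
    active = ["GAAGTGGCT", "GGGGAAGCA"]
    print(base_counts(all_sites[0]))
    print(positional_base_counts(all_sites[0]))
    print(composition_difference(active, all_sites, 1, "G"))
```

With the figures quoted in the caption, the same subtraction gives 88% - 78% = 10% for G at position 1.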
Performance of classifiers in predicting active OPEN target sites
| Classifier | Target site | ROC AUC | Correlation | Accuracy % | | |
|---|---|---|---|---|---|---|
| | | 0.89 | 0.61 | 87 | 90 | 94 |
| | Base Counts | 0.79 | 0.57 | 87 | 89 | 94 |
| | Positional Base Counts | 0.84 | 0.59 | 87 | 88 | 97 |
| | Sequence Identity | 0.76 | 0.48 | 84 | 86 | 95 |
| | Base Counts | 0.78 | 0.54 | 85 | 89 | 92 |
| | Positional Base Counts | 0.84 | 0.63 | 88 | 90 | 95 |
Figure 2. Receiver Operating Characteristic (ROC) curves for Naïve Bayes and SVM classifiers.
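For readers unfamiliar with the ROC AUC values reported for these classifiers, the area under an ROC curve equals the probability that a randomly chosen active site receives a higher score than a randomly chosen inactive site, with ties counted as one half. The sketch below computes it that way. The function name, labels and scores are hypothetical illustrations and are unrelated to the published results or to how the authors computed their figures.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 1 (active) or 0 (inactive)
    scores: classifier scores, higher meaning "more likely active"
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive example")
    # Each (active, inactive) pair scores 1 if ranked correctly, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: three active and two inactive sites.
    labels = [1, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.1]
    print(round(roc_auc(labels, scores), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of active from inactive sites.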
Performance of ZiFOpT on an independent test set (ZFTS140)
| Confidence Score | Accuracy % | ||
|---|---|---|---|
| ≥ 6 | 90 | 90 | 100 |
| < 6 | 67 | 73 | 85 |
Summary of zebrafish OPEN ZFN target sites, classified by ZiFOpT
| Confidence Score (Active Sites) | False Positive Rate¹ (FPR) | # of zebrafish transcripts targeted² | Average # of ZFN target sites² in transcripts containing nuclease sites | # of potential target sites² eliminated by using ZiFOpT |
|---|---|---|---|---|
| ** | ** | 25,174 (86%) | 4.5 | 0 (0%) |
| > 4 | 24% | 15,565 (53%) | 2.3 | 78,934 (69%) |
| > 6 | 14% | 12,622 (43%) | 2.0 | 89,580 (78%) |
| > 8 | 7% | 6,942 (24%) | 1.5 | 103,877 (90%) |

¹ Estimated from training data. ² In coding exons 1-3. ** No classification.