Pei Fen Kuan1, Scott Powers2, Shuyao He3, Kaiqiao Li3, Xiaoyu Zhao2, Bo Huang4.
Abstract
BACKGROUND: CRISPR is a versatile gene editing tool which has revolutionized genetic research in the past few years. Optimizing sgRNA design to improve the efficiency of target/DNA cleavage is critical to ensure the success of CRISPR screens.
Keywords: CRISPR; Machine learning; Predictive modeling; Thermodynamics
Year: 2017 PMID: 28587596 PMCID: PMC5461693 DOI: 10.1186/s12859-017-1697-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Pairwise correlation plot for each class of features. Left column is the pairwise correlation plot between ribosomal and non-ribosomal genes from [10]. Middle column is the pairwise correlation plot between ribosomal genes from [10] and mESC essential genes from [11]. Right column is the pairwise correlation plot between non-ribosomal genes from [10] and mESC essential genes from [11]. Each point is a feature
Fig. 2 Pairwise correlation plot for each class of features. Left column is the pairwise correlation plot between ribosomal and non-ribosomal genes from [10]. Middle column is the pairwise correlation plot between ribosomal genes from [10] and mESC essential genes from [11]. Right column is the pairwise correlation plot between non-ribosomal genes from [10] and mESC essential genes from [11]. Each point is a feature
Fig. 3 Top 10 most informative features ranked by AUC by dataset. The last panel is the ranking by average AUC aggregating the three datasets
AUC, Youden index (J), Sensitivity (Se) and Specificity (Sp) from the 3-way cross validation within dataset 1 (ribosomal genes)
| Feature class | AUC | J | Se | Sp |
|---|---|---|---|---|
| PD Mono | 0.826 | 0.535 | 0.855 | 0.680 |
| PD Dinuc | 0.848 | 0.575 | 0.788 | 0.787 |
| Freq | 0.778 | 0.441 | 0.677 | 0.764 |
| Align | 0.613 | 0.188 | 0.746 | 0.442 |
| Thermo | 0.525 | 0.086 | 0.812 | 0.273 |
| Packer | 0.601 | 0.186 | 0.634 | 0.551 |
| PhyChem | 0.722 | 0.380 | 0.711 | 0.669 |
| PseKNC | 0.731 | 0.376 | 0.683 | 0.693 |
| Comb Feature | 0.867 | 0.618 | 0.826 | 0.792 |
Comb Feature: PD Mono+PD Dinuc+Freq+Thermo+Packer+PhyChem+PseKNC. We reported the average performance from the 3-way cross validation over 10 iterations of random sampling
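The Youden index in the table above follows the standard definition J = Se + Sp − 1. As a quick consistency check, the minimal sketch below recomputes J from the reported Se and Sp for a few rows; the function name `youden_j` and the `rows` dictionary are illustrative, not part of the authors' software. Because the reported values are rounded averages over 10 iterations, some other rows can differ from the recomputed J by about 0.001.

```python
def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden index: J = Se + Sp - 1 (standard definition)."""
    return sensitivity + specificity - 1.0


# (Se, Sp, reported J) copied from the dataset 1 cross-validation table.
rows = {
    "PD Mono": (0.855, 0.680, 0.535),
    "PD Dinuc": (0.788, 0.787, 0.575),
    "Comb Feature": (0.826, 0.792, 0.618),
}

for name, (se, sp, reported) in rows.items():
    # Print the recomputed value next to the reported one for comparison.
    print(f"{name}: J = {youden_j(se, sp):.3f} (reported {reported:.3f})")
```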
AUC, Youden index (J), Sensitivity (Se) and Specificity (Sp) from intra-platform comparison (training set: ribosomal genes, test set: non-ribosomal genes)
| Feature class | AUC | J | Se | Sp |
|---|---|---|---|---|
| PD Mono | 0.785 | 0.443 | 0.717 | 0.726 |
| PD Dinuc | 0.792 | 0.478 | 0.765 | 0.713 |
| Freq | 0.700 | 0.332 | 0.779 | 0.553 |
| Align | 0.594 | 0.159 | 0.881 | 0.278 |
| Thermo | 0.616 | 0.222 | 0.639 | 0.580 |
| Packer | 0.637 | 0.207 | 0.431 | 0.776 |
| PhyChem | 0.659 | 0.241 | 0.633 | 0.608 |
| PseKNC | 0.647 | 0.243 | 0.694 | 0.549 |
| Comb Feature | 0.806 | 0.492 | 0.851 | 0.641 |
Comb Feature: PD Mono+PD Dinuc+Thermo+Packer+PhyChem
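Se and Sp depend on the score cutoff used to call an sgRNA efficient. The page does not state how the cutoff was chosen; a common convention, assumed here, is to pick the cutoff that maximizes J. The sketch below illustrates that convention on made-up scores and labels; `best_youden_threshold` is a hypothetical helper, not the authors' code.

```python
from typing import Sequence, Tuple


def best_youden_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Return (threshold, J, Se, Sp) for the cutoff with the largest J.

    A sample is called positive when its score is >= threshold.
    Ties in J are resolved in favour of the lowest threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        se = tp / n_pos
        sp = tn / n_neg
        j = se + sp - 1.0
        if best is None or j > best[1]:
            best = (t, j, se, sp)
    return best


# Toy data (invented for illustration): 1 = efficient sgRNA, 0 = inefficient.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

threshold, j, se, sp = best_youden_threshold(toy_scores, toy_labels)
print(f"threshold={threshold}, J={j:.3f}, Se={se:.3f}, Sp={sp:.3f}")
```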
AUC, Youden index (J), Sensitivity (Se) and Specificity (Sp) from inter-platform comparison (training set: ribosomal and non-ribosomal genes, test set: mESC essential genes)
| Feature class | AUC | J | Se | Sp |
|---|---|---|---|---|
| PD Mono | 0.797 | 0.486 | 0.751 | 0.735 |
| PD Dinuc | 0.832 | 0.544 | 0.792 | 0.752 |
| Freq | 0.751 | 0.382 | 0.716 | 0.667 |
| Align | 0.574 | 0.131 | 0.490 | 0.641 |
| Thermo | 0.641 | 0.261 | 0.817 | 0.444 |
| Packer | 0.667 | 0.241 | 0.514 | 0.726 |
| PhyChem | 0.726 | 0.351 | 0.718 | 0.632 |
| PseKNC | 0.733 | 0.370 | 0.660 | 0.709 |
| Comb Feature | 0.848 | 0.566 | 0.843 | 0.722 |
| azimuth | 0.795 | 0.463 | 0.857 | 0.607 |
| sgRNA Scorer | 0.669 | 0.288 | 0.548 | 0.739 |
| azimuth (retrained) | 0.833 | 0.543 | 0.787 | 0.756 |
| sgRNA Scorer (retrained) | 0.804 | 0.474 | 0.786 | 0.688 |
Comb Feature: PD Mono+PD Dinuc+Freq+Align+Thermo+Packer+PhyChem+PseKNC. The azimuth and sgRNA Scorer rows are results from the software of [7] and [27], respectively, each developed using different training datasets. The azimuth (retrained) and sgRNA Scorer (retrained) rows are results obtained by refitting the algorithms on the current training set (ribosomal and non-ribosomal genes)
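The AUC values compared across methods can be read as the probability that a randomly chosen efficient sgRNA receives a higher score than a randomly chosen inefficient one (the Mann-Whitney interpretation of the area under the ROC curve). The sketch below computes AUC this way on invented toy data; `auc_rank` is an illustrative name, and nothing here reproduces the authors' pipeline or the azimuth or sgRNA Scorer software.

```python
from typing import Sequence


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 1 = efficient sgRNA, 0 = inefficient.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

print(f"AUC = {auc_rank(toy_scores, toy_labels):.3f}")
```

The pairwise loop is quadratic in the number of sgRNAs; it is chosen here for clarity, and a sort-based rank computation would be preferable for large screens.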
Fig. 4 AUC curves for our proposed predictive model using combination features (Comb Feature), azimuth and sgRNA Scorer. The azimuth and sgRNA Scorer curves are results from the software of [7] and [27], respectively, each developed using different training datasets. The azimuth (retrained) and sgRNA Scorer (retrained) curves are results obtained by refitting the algorithms on the current training set (ribosomal and non-ribosomal genes)