Ali Haisam Muhammad Rafid, Md Toufikuzzaman, Mohammad Saifur Rahman, M Sohel Rahman.
Abstract
BACKGROUND: The latest works on CRISPR genome editing tools mainly employ deep learning techniques. However, deep learning models lack explainability and are harder to reproduce. We were motivated to build an accurate genome editing tool, using sequence-based features and traditional machine learning, that can compete with deep learning models.
Keywords: CRISPR; Cas9; Deep learning; Machine learning; sgRNA
Year: 2020 PMID: 32487025 PMCID: PMC7268231 DOI: 10.1186/s12859-020-3531-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Training pipelines: the steps of building the final prediction model. (a) The pipeline for experimental setup A. We extracted only position-independent and position-specific features. The steps of splitting the dataset and selecting features are described in the “Results” section. We used the default parameters when training the SVM. (b) The pipeline for experimental setup B. The feature extraction and dataset splitting steps are the same as in experimental setup A, but in the feature selection step we used extremely randomized trees (the feature selection criteria are described in the “Results” section). We performed hyperparameter tuning on the SVM and retrained it with the best hyperparameters. (c) The pipeline for experimental setup C. It is exactly the same as the pipeline for experimental setup B, except that we considered the n-Gapped Di-nucleotide feature type in the feature extraction step.
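The three feature types named in the caption can be illustrated with a minimal sketch. The function names, the feature-key format and the choice to count gapped pairs without regard to position are all assumptions made for illustration; the exact definitions used in the paper are given in its methods and may differ.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # assumes sequences contain only A, C, G and T


def position_independent(seq, k=2):
    """Count every k-mer in seq, regardless of where it occurs."""
    counts = {"".join(p): 0 for p in product(NUCLEOTIDES, repeat=k)}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return counts


def position_specific(seq, k=1):
    """Mark which k-mer starts at each position (sparse one-hot encoding)."""
    # Key format "<k-mer>_<position>" is a hypothetical naming scheme.
    return {f"{seq[i:i + k]}_{i}": 1 for i in range(len(seq) - k + 1)}


def gapped_dinucleotide(seq, gap):
    """Count nucleotide pairs separated by exactly `gap` intervening bases."""
    counts = {}
    for i in range(len(seq) - gap - 1):
        key = f"{seq[i]}{'-' * gap}{seq[i + gap + 1]}"
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    toy = "ACGTAC"  # toy sequence, not a real sgRNA
    print(position_independent(toy)["AC"])
    print(sorted(position_specific(toy)))
    print(gapped_dinucleotide(toy, gap=1))
```

In this sketch, setups A and B would use only the first two functions, while setup C would add the output of `gapped_dinucleotide` to the feature vector.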
Fig. 2 Comparison of the performance of various methods under the first three experimental settings (A, B and C). The Y-axis denotes ROC-AUC and the X-axis denotes the cell types. In all three settings, CRISPRpred(SEQ) convincingly outperforms DeepCRISPR in 3 out of 4 cell lines, namely HCT116, HeLa and HL60. However, in HEK293, DeepCRISPR performs far better than CRISPRpred(SEQ) (see also the relevant discussion in the “Results on HEK293 cell” section). CRISPRpred(SEQ)-C performs slightly better than CRISPRpred(SEQ)-B, which in turn outperforms CRISPRpred(SEQ)-A in all cell lines.
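ROC-AUC, the metric on the Y-axis, can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name and the toy labels and scores are illustrative assumptions, and the pairwise loop is meant for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the methods in the figure are compared.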
The results of 3-fold cross-validation hyperparameter tuning for Experiment B
| C | γ = 0.0001 | γ = 0.001 | γ = 0.01 |
|---|---|---|---|
| 1 | 0.702 | 0.775 | 0.759 |
| 10 | 0.733 | 0.781 | 0.758 |
| 100 | 0.765 | 0.781 | 0.758 |
All values in the table are ROC-AUC scores; rows correspond to C and columns to γ. The best result was achieved with C=10 and γ=0.001
The results of 3-fold cross-validation hyperparameter tuning for Experiment C
| C | γ = 0.0001 | γ = 0.001 | γ = 0.01 |
|---|---|---|---|
| 1 | 0.705 | 0.783 | 0.763 |
| 10 | 0.732 | 0.788 | 0.762 |
| 100 | 0.763 | 0.788 | 0.762 |
All values in the table are ROC-AUC scores; rows correspond to C and columns to γ. The best result was achieved with C=10 and γ=0.001
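The selection step behind the two tables can be sketched as a lookup over the reported grids. The scores below are copied from the tables for Experiments B and C. In both, C=10 and C=100 tie at γ=0.001; the tie-breaking rule used here (prefer the smaller C, then the smaller γ) is an assumption that happens to reproduce the reported choice, and the source does not state how the tie was actually resolved.

```python
# ROC-AUC values from the two tables, stored as {C: {gamma: score}}.
GRID_B = {
    1:   {0.0001: 0.702, 0.001: 0.775, 0.01: 0.759},
    10:  {0.0001: 0.733, 0.001: 0.781, 0.01: 0.758},
    100: {0.0001: 0.765, 0.001: 0.781, 0.01: 0.758},
}
GRID_C = {
    1:   {0.0001: 0.705, 0.001: 0.783, 0.01: 0.763},
    10:  {0.0001: 0.732, 0.001: 0.788, 0.01: 0.762},
    100: {0.0001: 0.763, 0.001: 0.788, 0.01: 0.762},
}


def best_setting(grid):
    """Return (C, gamma, score) with the highest score.

    Ties go to the smallest C, then the smallest gamma (assumed rule).
    """
    candidates = [(c, g, s) for c, row in grid.items() for g, s in row.items()]
    return min(candidates, key=lambda t: (-t[2], t[0], t[1]))


if __name__ == "__main__":
    print(best_setting(GRID_B))
    print(best_setting(GRID_C))
```

Preferring the smaller C on a tie favors the more strongly regularized SVM, which is a common but not universal convention.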
Fig. 3 Comparison of the performance of various methods under the three experimental settings D, E and F. The Y-axis denotes ROC-AUC and the X-axis denotes the cell type that is left out. In all three settings, CRISPRpred(SEQ) outperforms DeepCRISPR when the HL60 cell line is left out. CRISPRpred(SEQ)-E and CRISPRpred(SEQ)-F achieve scores slightly better than DeepCRISPR when HeLa is left out. None of the settings beat DeepCRISPR when HCT116 is left out, although they achieve scores close to that of DeepCRISPR.
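The leave-one-cell-out evaluation described in this caption amounts to training on every cell line except one and testing on the one held out. A minimal sketch of that split follows; the record layout, field names and toy sequences are hypothetical and are not taken from the paper's dataset.

```python
def leave_one_cell_out(records, held_out):
    """Split records into (train, test), holding out one cell line entirely."""
    train = [r for r in records if r["cell"] != held_out]
    test = [r for r in records if r["cell"] == held_out]
    if not test:
        raise ValueError(f"no records for cell line {held_out!r}")
    return train, test


# Toy records; sequences and labels are made up for illustration only.
records = [
    {"cell": "HCT116", "sgrna": "ACGTACGTACGTACGTACGT", "label": 1},
    {"cell": "HeLa",   "sgrna": "TTGACCGTAGCTAGCTAACG", "label": 0},
    {"cell": "HL60",   "sgrna": "GGCATCGATCGATTACGCAT", "label": 1},
    {"cell": "HEK293", "sgrna": "CATGCATGCCGATAGCTAGC", "label": 0},
]

if __name__ == "__main__":
    for cell in ("HCT116", "HeLa", "HL60"):
        train, test = leave_one_cell_out(records, cell)
        print(cell, len(train), len(test))
```

Because no example from the held-out cell line is seen during training, the resulting score reflects how well a model transfers to an unseen cell type.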