Yifan Sun, Yi Xiong, Qian Xu, Dongqing Wei.
Abstract
Combination drugs that act on multiple targets simultaneously are promising candidates for combating complex diseases because of their improved efficacy and reduced side effects. However, exhaustive screening of all possible drug combinations is extremely time-consuming and impractical. Here, we present a novel Hadoop-based approach to predicting drug combinations that takes advantage of the MapReduce programming model, which improves the scalability of the prediction algorithm. By integrating the gene expression data of multiple drugs, we implemented data preprocessing, support vector machine, and naïve Bayesian classifiers on Hadoop for the prediction of drug combinations. The experimental results suggest that our Hadoop-based model achieves much higher efficiency in the big data processing steps with satisfactory performance. We believe that the proposed approach can help accelerate the prediction of potentially effective drug combinations as the number of possible combinations grows exponentially. The source code and datasets are available upon request.
Year: 2014 PMID: 25147789 PMCID: PMC4134802 DOI: 10.1155/2014/196858
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Algorithm 1. The workflow of the scalable version of the naïve Bayesian algorithm implemented with MapReduce.
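The body of Algorithm 1 is not reproduced here, so the following is only a minimal sketch of how naïve Bayes training is commonly expressed as a map step and a reduce step. It simulates the pattern in memory rather than on Hadoop. All function names (`map_record`, `shuffle`, `reduce_counts`, `train`, `predict`), the toy records, and the use of add-one (Laplace) smoothing are assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import defaultdict


def map_record(record):
    """Map step: emit (key, 1) pairs for one labelled training record."""
    features, label = record
    yield ("class", label), 1
    for index, value in enumerate(features):
        yield ("feat", label, index, value), 1


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the counts collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def train(records):
    """Run map, shuffle, and reduce over all records to obtain the count table."""
    pairs = (pair for record in records for pair in map_record(record))
    return reduce_counts(shuffle(pairs))


def predict(counts, features, n_values=2):
    """Pick the label with the highest log-posterior (add-one smoothing assumed)."""
    labels = [key[1] for key in counts if key[0] == "class"]
    total = sum(counts[("class", label)] for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        n_label = counts[("class", label)]
        score = math.log(n_label / total)
        for index, value in enumerate(features):
            hits = counts.get(("feat", label, index, value), 0)
            score += math.log((hits + 1) / (n_label + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two binary features per record.
records = [((1, 0), "pos"), ((1, 1), "pos"), ((0, 0), "neg"), ((0, 1), "neg")]
model = train(records)
```

Because the reduce step only sums counts, the map output can be partitioned across workers and merged in any order, which is what makes this classifier straightforward to scale out.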
Comparison of the accuracy of the SVM-based prediction models using various feature representations and kernel functions.
| Feature representation | Linear | Polynomial | Gaussian | Tanh |
|---|---|---|---|---|
| Linear addition pattern | 47.7% | 47.7% | 47.7% | 53.0% |
| Zhao's frequent pattern | 50.0% | 55.1% | 57.4% | 56.2% |
| Our frequent pattern | 62.2% | 64.6% | 69.1% | 65.4% |
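For reference, the four kernel families named in the column headers can be written out in their standard textbook forms. This is a sketch only: the parameter names and default values (`gamma`, `coef0`, `degree`) are assumptions, and the source does not state which settings or SVM library were used.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=1.0):
    # Also called the RBF kernel: exp(-gamma * ||x - y||^2).
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def tanh_kernel(x, y, gamma=1.0, coef0=0.0):
    # Also called the sigmoid kernel.
    return math.tanh(gamma * dot(x, y) + coef0)
```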
The performance of the independent test using our definition of frequent pattern and Gaussian kernel.
| Run | ACC | SN | SP | F1 |
|---|---|---|---|---|
| 1 | 67.7% | 70.6% | 64.3% | 0.706 |
| 2 | 65.0% | 54.5% | 77.8% | 0.632 |
| 3 | 60.9% | 44.4% | 71.4% | 0.471 |
| 4 | 64.0% | 66.7% | 60.0% | 0.690 |
| 5 | 68.2% | 61.5% | 77.8% | 0.696 |
| 6 | 65.5% | 41.7% | 82.4% | 0.500 |
| 7 | 77.8% | 64.3% | 92.3% | 0.750 |
| 8 | 72.2% | 76.9% | 60.0% | 0.800 |
| 9 | 72.0% | 66.7% | 80.0% | 0.741 |
| 10 | 70.4% | 66.7% | 75.0% | 0.714 |
| Average | 68.4% | 61.4% | 74.1% | 0.670 |
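The metrics in this table all follow from a confusion matrix. The sketch below assumes the standard definitions of accuracy (ACC), sensitivity (SN), specificity (SP), and F1 score. The source does not report the underlying counts, so the counts used here are hypothetical, chosen only because they reproduce the values listed for run 7 (77.8%, 64.3%, 92.3%, 0.750).

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute ACC, SN, SP, and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero; no guarding is done here.
    """
    acc = (tp + tn) / (tp + fn + tn + fp)
    sn = tp / (tp + fn)            # sensitivity (recall)
    sp = tn / (tn + fp)            # specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)
    return {"acc": acc, "sn": sn, "sp": sp, "f1": f1}


# Hypothetical counts: 14 positive and 13 negative test pairs.
metrics = classification_metrics(tp=9, fn=5, tn=12, fp=1)
print({name: round(value, 3) for name, value in metrics.items()})
```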
The performance of the one-class SVM classifiers using different kernel functions.
| Metric | Linear | Polynomial | Gaussian | Tanh |
|---|---|---|---|---|
| ACC | 46.1% | 81.2% | 88.2% | 80.3% |
Comparison of the average running time of the scalable and sequential versions.
| Mining steps | Scalable version | Sequential version |
|---|---|---|
| Microarray processing | 2 h 3 min | 6 h 18 min |
| Feature construction | 8 min 34 s | 18 min 3 s |
| Naïve Bayesian | 15 s | 3 s |
| SVM grid search | 27 min 6 s | 1 h 11 min |
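The timings above can be turned into speedup ratios (sequential time divided by scalable time). The helper below is a small illustrative sketch; `to_seconds` and the `timings` dictionary are names introduced here, and the durations are copied from the table.

```python
import re

# Seconds per unit; "m" is accepted as a variant spelling of "min".
_UNIT_SECONDS = {"h": 3600, "min": 60, "m": 60, "s": 1}


def to_seconds(text):
    """Convert a duration such as '2 h 3 min' or '27 min 6 s' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(h|min|m|s)\b", text):
        total += int(value) * _UNIT_SECONDS[unit]
    return total


# (scalable, sequential) durations for each mining step.
timings = {
    "Microarray processing": ("2 h 3 min", "6 h 18 min"),
    "Feature construction": ("8 min 34 s", "18 min 3 s"),
    "Naive Bayesian": ("15 s", "3 s"),
    "SVM grid search": ("27 min 6 s", "1 h 11 min"),
}

speedups = {
    step: to_seconds(sequential) / to_seconds(scalable)
    for step, (scalable, sequential) in timings.items()
}

for step, ratio in speedups.items():
    print(f"{step}: {ratio:.2f}x")
```

The ratios come out at roughly 3.07x, 2.11x, 0.20x, and 2.62x. A value below 1, as for the naïve Bayesian step, means the scalable version took longer than the sequential one for that step.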