P B Jayaraj, Mathias K Ajay, M Nufail, G Gopakumar, U C A Jaleel.
Abstract
BACKGROUND: In-silico methods are an integral part of the modern drug discovery paradigm. Virtual screening, an in-silico method, is used to refine data models and reduce the chemical space on which wet-lab experiments need to be performed. Virtual screening of a ligand data model requires large-scale computations, making it a highly time-consuming task. This process can be sped up by implementing parallelized algorithms on a Graphics Processing Unit (GPU).
Keywords: CUDA; GPU computing; In-silico drug discovery; Ligand based drug discovery; Random forest classifier; Virtual screening
Year: 2016 PMID: 26933453 PMCID: PMC4772510 DOI: 10.1186/s13321-016-0124-8
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
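The abstract and keywords name a random forest classifier for ligand-based screening but do not spell out the algorithm here. As a generic illustration only (not the authors' CUDA implementation), the sketch below shows why the task suits data-parallel hardware: each molecule is classified by a majority vote over the trees, independently of every other molecule. The tree encoding, the three toy trees, the feature vectors, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical toy trees over binary feature vectors.
# An internal node is (feature_index, threshold, left, right); a leaf is a class label.
TREES = [
    (0, 0.5, "inactive", (1, 0.5, "inactive", "active")),
    (1, 0.5, "inactive", "active"),
    (2, 0.5, "inactive", "active"),
]

def predict_tree(node, features):
    """Walk one tree from the root to a leaf and return the leaf label."""
    while isinstance(node, tuple):
        index, threshold, left, right = node
        node = left if features[index] <= threshold else right
    return node

def predict_forest(trees, features):
    """Majority vote over all trees for a single molecule."""
    votes = Counter(predict_tree(tree, features) for tree in trees)
    return votes.most_common(1)[0][0]

def screen(trees, molecules):
    """Classify every molecule independently; no molecule depends on another."""
    return [predict_forest(trees, features) for features in molecules]

# Hypothetical feature vectors for three molecules.
molecules = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
]
labels = screen(TREES, molecules)
print(labels)
```

Because `screen` has no dependency between iterations, the loop over molecules is the natural unit to distribute across GPU threads in a parallel version.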
Fig. 1 CPU–GPU execution control flow in the proposed parallel algorithm
Datasets used
| PubChem bioassay datasets | Number of molecules | Date of access |
|---|---|---|
| AID 1332 | 1193 | 22/07/2014 |
| AID 492952 | 2294 | 22/07/2014 |
| AID 651616 | 5569 | 22/07/2014 |
| AID 2330 | 36,869 | 22/07/2014 |
| AID 893 | 68,532 | 24/09/2014 |
| AID 778 | 95,859 | 22/07/2014 |
| AID 434955 | 323,578 | 28/07/2014 |
| AID 2314 | 2,964,564 | 10/09/2015 |
Hardware configuration used
| Particulars | GPU1 | GPU2 |
|---|---|---|
| GPU | NVIDIA GeForce GTX 780 | TESLA K20 |
| CUDA cores | 2304 | 2496 |
| GPU clock speed | 941 MHz | 706 MHz |
| Graphics memory | 3072 MB | 4800 MB |
| Memory bandwidth | 288.4 GB/s | 208 GB/s |
| Peak performance | 4 TFLOPS | 3.52 TFLOPS |
| Compute capability | 3.5 | 3.5 |
| CPU | Intel Core i7 | Xeon 2650 |
Performance of random forest virtual screening in the serial environment
| Dataset | Recall | Precision | F-score | ROC area | Accuracy |
|---|---|---|---|---|---|
| AID 1332 | 0.68 | 0.5 | 0.58 | 0.73 | 0.89 |
| AID 492952 | 0.68 | 0.87 | 0.76 | 0.65 | 0.68 |
| AID 651616 | 0.77 | 0.95 | 0.85 | 0.49 | 0.74 |
| AID 2330 | 0.58 | 0.37 | 0.45 | 0.67 | 0.93 |
| AID 893 | 0.73 | 0.49 | 0.59 | 0.74 | 0.94 |
| AID 778 | 0.49 | 0.34 | 0.40 | 0.62 | 0.78 |
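The F-score column in the table above can be cross-checked against the recall and precision columns. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall); the source does not state the formula, but the reported values are consistent with it to within rounding. The function name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (recall, precision, reported F-score) copied from the serial table above.
serial_rows = {
    "AID 1332": (0.68, 0.50, 0.58),
    "AID 492952": (0.68, 0.87, 0.76),
    "AID 651616": (0.77, 0.95, 0.85),
    "AID 2330": (0.58, 0.37, 0.45),
    "AID 893": (0.73, 0.49, 0.59),
    "AID 778": (0.49, 0.34, 0.40),
}

# Largest gap between the recomputed and the reported F-score.
max_gap = max(
    abs(f_score(precision, recall) - reported)
    for recall, precision, reported in serial_rows.values()
)

for name, (recall, precision, reported) in serial_rows.items():
    print(f"{name}: recomputed {f_score(precision, recall):.3f}, reported {reported:.2f}")
```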
Performance of random forest virtual screening on GPU
| Dataset | Recall | Precision | F-score | ROC area | Accuracy |
|---|---|---|---|---|---|
| AID 1332 | 0.74 | 0.52 | 0.61 | 0.75 | 0.92 |
| AID 492952 | 0.73 | 0.87 | 0.8 | 0.66 | 0.72 |
| AID 651616 | 0.78 | 0.93 | 0.85 | 0.65 | 0.74 |
| AID 2330 | 0.54 | 0.38 | 0.45 | 0.68 | 0.93 |
| AID 893 | 0.72 | 0.49 | 0.58 | 0.73 | 0.94 |
| AID 778 | 0.48 | 0.34 | 0.4 | 0.62 | 0.78 |
Depth-breadth threshold crossover analysis for AID 2314 training set
| Crossover value | Running time (s) | Recall | Precision | F-score | ROC area | Accuracy |
|---|---|---|---|---|---|---|
| 1000 | 10.53 | 0.63 | 0.40 | 0.49 | 0.681 | 0.91 |
| 5000 | 10.75 | 0.62 | 0.39 | 0.48 | 0.679 | 0.91 |
| 10,000 | 11.04 | 0.62 | 0.39 | 0.48 | 0.677 | 0.90 |
| 15,000 | 10.58 | 0.62 | 0.40 | 0.48 | 0.681 | 0.90 |
| 20,000 | 10.42 | 0.63 | 0.40 | 0.49 | 0.687 | 0.91 |
| 25,000 | 10.07 | 0.62 | 0.39 | 0.48 | 0.681 | 0.90 |
| 30,000 | 10.37 | 0.62 | 0.39 | 0.48 | 0.681 | 0.91 |
| 40,000 | 10.73 | 0.62 | 0.40 | 0.48 | 0.685 | 0.91 |
| 50,000 | 12.43 | 0.62 | 0.39 | 0.48 | 0.682 | 0.91 |
Running time of serial and GPU versions of random forest virtual screening for training
| Dataset | No. of molecules | Time (s) for serial | Time (s) for GPU |
|---|---|---|---|
| AID 1332 | 1193 | 0.0689 | 1.114 |
| AID 492952 | 2294 | 0.2095 | 1.2054 |
| AID 651616 | 5569 | 0.6641 | 1.7984 |
| AID 2330 | 36,869 | 4.2428 | 2.5487 |
| AID 893 | 68,532 | 11.5306 | 3.2746 |
| AID 778 | 95,859 | 35.8603 | 9.9027 |
| AID 434955 | 323,578 | 81.323 | 10.44 |
| AID 2314 | 330,664 | 100.57 | 10.77 |
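The training times above are easier to compare as a speedup ratio, serial time divided by GPU time. The following is a small sketch of that arithmetic using the table's values as given; the variable and function names are illustrative.

```python
# (dataset, molecules, serial seconds, GPU seconds) copied from the table above.
training_times = [
    ("AID 1332", 1193, 0.0689, 1.114),
    ("AID 492952", 2294, 0.2095, 1.2054),
    ("AID 651616", 5569, 0.6641, 1.7984),
    ("AID 2330", 36869, 4.2428, 2.5487),
    ("AID 893", 68532, 11.5306, 3.2746),
    ("AID 778", 95859, 35.8603, 9.9027),
    ("AID 434955", 323578, 81.323, 10.44),
    ("AID 2314", 330664, 100.57, 10.77),
]

def speedup(serial_seconds, gpu_seconds):
    """Ratio above 1 means the GPU version is faster than the serial one."""
    return serial_seconds / gpu_seconds

speedups = {name: speedup(serial, gpu) for name, _, serial, gpu in training_times}

# First dataset (in table order) for which the GPU version wins.
first_gpu_win = next(name for name, _, serial, gpu in training_times if gpu < serial)

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
print("GPU first faster at:", first_gpu_win)
```

On these figures the GPU version is slower for the three smallest datasets and first overtakes the serial version at AID 2330 (36,869 molecules), which suggests that fixed GPU overhead dominates on small inputs.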
Running time of serial and GPU versions of random forest virtual screening for classification
| Dataset | No. of molecules | Date of access | Time (s) for serial | Time (s) for GPU |
|---|---|---|---|---|
| Gdb 17–0.5 million | 0.5 million | 12/11/2014 | 1.0296 | 6.5776 |
| Gdb 17–1 million | 1 million | 12/11/2014 | 215.379 | 13.5085 |
| Gdb 17–2 million | 2 million | 12/11/2014 | 1516.4383 | 25.9176 |
| Gdb 17–2.5 million | 2.5 million | 12/11/2014 | * | 32.0973 |
| Gdb 17–5 million | 5 million | 12/11/2014 | * | 69.4101 |
| Gdb 17–7.5 million | 7.5 million | 12/11/2014 | * | 104.1336 |
| Gdb 17–10 million | 10 million | 12/11/2014 | * | 129.7067 |
* Serial exception error
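One way to read the GPU column of the classification table is as throughput, molecules classified per second. The sketch below computes it from the table's values; the names are illustrative.

```python
# (library size in molecules, GPU seconds) copied from the classification table above.
gpu_runs = [
    (500_000, 6.5776),
    (1_000_000, 13.5085),
    (2_000_000, 25.9176),
    (2_500_000, 32.0973),
    (5_000_000, 69.4101),
    (7_500_000, 104.1336),
    (10_000_000, 129.7067),
]

def throughput(molecules, seconds):
    """Molecules classified per second."""
    return molecules / seconds

rates = [throughput(molecules, seconds) for molecules, seconds in gpu_runs]

# Ratio of the best to the worst rate; a value near 1 means near-linear scaling.
spread = max(rates) / min(rates)

for (molecules, _), rate in zip(gpu_runs, rates):
    print(f"{molecules:>10,} molecules: {rate:,.0f} molecules/s")
print(f"best/worst throughput ratio: {spread:.2f}")
```

For these rows the rate stays between roughly 72,000 and 78,000 molecules per second, so GPU classification time grows close to linearly with library size.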
Fig. 2 Classification time comparison: serial versus GPU