Lateefat Temitope Afolabi, Faisal Saeed, Haslinda Hashim, Olutomilayo Olayemi Petinrin.
Abstract
Pharmacologically active molecules can provide remedies for a range of illnesses and infections, so the search for such bioactive molecules has been an enduring mission. There is therefore a need for a more suitable, reliable, and robust classification method to improve the prediction of new bioactive molecules. In this paper, we adopt a recently developed approach that combines AdaBoost with several different base classifiers for the prediction of new bioactive molecules. We conducted the experiments on the widely used MDL Drug Data Report (MDDR) database. The proposed boosting method generated better results than the other machine learning methods evaluated. This finding suggests that the method is suitable for inclusion among the in silico tools used in cheminformatics, computational chemistry, and molecular biology.
Year: 2018 PMID: 29329334 PMCID: PMC5766097 DOI: 10.1371/journal.pone.0189538
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
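The abstract's core method is AdaBoost. As a rough illustration of how boosting reweights training molecules between rounds, the sketch below implements discrete binary AdaBoost with one-feature decision stumps as the weak learner. This is an assumption for illustration only: the ensembles evaluated in the tables pair AdaBoost with other base learners (Bagging, JRip, J48, PART, random forest, random tree), labels here are +1 (active in the target class) or -1 (not), and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical weak learner: a decision stump (feature index, threshold, polarity).
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Return +polarity if feature j reaches the threshold, else -polarity."""
    j, threshold, polarity = stump
    return polarity if x[j] >= threshold else -polarity


def fit_stump(X: List[Sequence[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively choose the stump with the lowest weighted training error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                candidate = (j, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(candidate, xi) != yi)
                if err < best_err:
                    best, best_err = candidate, err
    return best, best_err


def adaboost_fit(X, y, rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete binary AdaBoost; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:
            break  # weak learner is no better than chance
        err = max(err, 1e-10)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        if err <= 1e-10:
            break  # training set already fitted perfectly
        # Raise weights of misclassified samples, lower the rest, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Logical AND: no single stump fits it, but three boosted stumps do.
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [-1, -1, -1, 1]
    model = adaboost_fit(X, y, rounds=3)
    print([adaboost_predict(model, x) for x in X])  # [-1, -1, -1, 1]
```

The toy AND problem shows the key idea: each round focuses the weights on the samples the previous stumps got wrong, and the weighted vote recovers a labelling that no individual weak learner can.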
Activity classes for dataset DS1.
| Activity Index | Activity Class | Active Molecules | Pairwise Similarity (Mean) |
|---|---|---|---|
| 31420 | Renin inhibitors | 1130 | 0.573 |
| 71523 | HIV protease inhibitors | 750 | 0.446 |
| 37110 | Thrombin inhibitors | 803 | 0.419 |
| 31432 | Angiotensin II AT1 antagonists | 943 | 0.403 |
| 42731 | Substance P antagonists | 1246 | 0.339 |
| 06233 | 5HT3 antagonists | 752 | 0.351 |
| 06245 | 5HT reuptake inhibitors | 359 | 0.345 |
| 07701 | D2 antagonists | 395 | 0.345 |
| 06235 | 5HT1A agonists | 827 | 0.343 |
| 78374 | Protein kinase C inhibitors | 453 | 0.323 |
| 78331 | Cyclooxygenase inhibitors | 636 | 0.268 |
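The "Pairwise Similarity (Mean)" column summarises how structurally diverse each activity class is. The record does not say which fingerprint or similarity coefficient was used; the sketch below assumes binary fingerprints stored as sets of on-bit indices and the Tanimoto coefficient, averaged over all unordered pairs of molecules within one class. The function names are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A ∩ B| / |A ∪ B| for two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Convention (assumed): two all-zero fingerprints count as identical.
    return len(a & b) / union if union else 1.0


def mean_pairwise_similarity(fingerprints: Sequence[FrozenSet[int]]) -> float:
    """Average Tanimoto similarity over every unordered pair in one activity class."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two molecules")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy_class = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({1, 2, 3, 4})]
    print(round(mean_pairwise_similarity(toy_class), 3))  # 0.667
```

A lower mean indicates a more structurally heterogeneous set of actives, which is why the DS3 classes (roughly 0.24 to 0.38) are described as heterogeneous and the DS2 classes (roughly 0.41 to 0.67) as homogeneous.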
Activity classes for dataset DS3.
| Activity Index | Activity Class | Active Molecules | Pairwise Similarity (Mean) |
|---|---|---|---|
| 09249 | Muscarinic (M1) agonists | 900 | 0.257 |
| 12455 | NMDA receptor antagonists | 1400 | 0.311 |
| 12464 | Nitric oxide synthase inhibitors | 505 | 0.237 |
| 31281 | Dopamine β-hydroxylase inhibitors | 106 | 0.324 |
| 43210 | Aldose reductase inhibitors | 957 | 0.37 |
| 71522 | Reverse transcriptase inhibitors | 700 | 0.311 |
| 75721 | Aromatase inhibitors | 636 | 0.318 |
| 78331 | Cyclooxygenase inhibitors | 636 | 0.382 |
| 78348 | Phospholipase A2 inhibitors | 617 | 0.291 |
| 78351 | Lipoxygenase inhibitors | 2111 | 0.365 |
Sensitivity measure for the prediction of new bioactive molecules with DS1 (normal dataset).
| Class | Activity Index | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT |
|---|---|---|---|---|---|---|---|---|
| 1 | 31420 | 0.978 | 0.983 | 0.979 | 0.980 | 0.978 | 0.977 | |
| 2 | 71523 | 0.933 | 0.945 | 0.941 | 0.951 | |||
| 3 | 37110 | 0.980 | 0.978 | 0.978 | 0.980 | 0.976 | 0.971 | |
| 4 | 31432 | 0.990 | 0.995 | 0.990 | 0.986 | 0.992 | 0.989 | |
| 5 | 42731 | 0.986 | 0.980 | 0.970 | 0.970 | 0.971 | 0.968 | |
| 6 | 06233 | 0.973 | 0.979 | 0.964 | 0.961 | 0.951 | 0.969 | |
| 7 | 06245 | 0.905 | 0.872 | 0.855 | 0.861 | 0.905 | 0.886 | |
| 8 | 07701 | 0.851 | 0.830 | 0.823 | 0.810 | 0.843 | 0.813 | |
| 9 | 06235 | 0.941 | 0.949 | 0.935 | 0.906 | 0.900 | 0.933 | |
| 10 | 78374 | 0.945 | 0.943 | 0.932 | 0.943 | 0.951 | 0.916 | |
| 11 | 78331 | 0.970 | 0.973 | 0.973 | 0.947 | 0.951 | 0.961 | |
| 0.950 | 0.946 | 0.935 | 0.934 | 0.956 | 0.940 | |||
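Each row of the sensitivity tables reports, for one activity class, the fraction of that class's molecules that were predicted correctly: sensitivity = TP / (TP + FN). The sketch below assumes each activity class is scored one-vs-rest from multi-class predictions; the record does not spell out the protocol. The final unlabelled row of each table appears to be the unweighted mean over activity classes (the eleven values in the first numeric column of the DS1 table average to about 0.950). The activity indices are reused purely as class names and the toy predictions are invented for illustration.

```python
from typing import Dict, Hashable, Sequence


def sensitivity(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                positive: Hashable) -> float:
    """TP / (TP + FN), treating `positive` as the positive class (one-vs-rest)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


def per_class_sensitivity(y_true, y_pred) -> Dict[Hashable, float]:
    """Sensitivity for every class in y_true (class labels must be sortable)."""
    return {c: sensitivity(y_true, y_pred, c) for c in sorted(set(y_true))}


def class_mean(scores: Dict[Hashable, float]) -> float:
    """Unweighted mean over classes, as in a table's summary row."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = ["31420", "31420", "71523", "71523", "37110"]
    y_pred = ["31420", "71523", "71523", "71523", "31420"]
    scores = per_class_sensitivity(y_true, y_pred)
    print(scores)              # {'31420': 0.5, '37110': 0.0, '71523': 1.0}
    print(class_mean(scores))  # 0.5
```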
Sensitivity measure for the prediction of new bioactive molecules with DS3 (heterogeneous).
| Class | Activity Index | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT |
|---|---|---|---|---|---|---|---|---|
| 1 | 09249 | 0.980 | 0.972 | 0.979 | 0.970 | 0.968 | 0.974 | |
| 2 | 12455 | 0.955 | 0.942 | 0.942 | 0.946 | 0.949 | ||
| 3 | 12464 | 0.909 | 0.899 | 0.907 | 0.909 | 0.893 | ||
| 4 | 31281 | 0.953 | 0.934 | 0.868 | 0.887 | 0.915 | 0.896 | |
| 5 | 43210 | 0.950 | 0.934 | 0.947 | 0.943 | 0.937 | ||
| 6 | 71522 | 0.914 | 0.916 | 0.913 | 0.897 | 0.909 | 0.880 | |
| 7 | 75721 | 0.976 | 0.961 | 0.945 | 0.951 | 0.970 | 0.956 | |
| 8 | 78331 | 0.838 | 0.796 | 0.808 | 0.832 | 0.841 | 0.838 | |
| 9 | 78348 | 0.898 | 0.878 | 0.901 | 0.890 | 0.891 | 0.867 | |
| 10 | 78351 | 0.943 | 0.962 | 0.958 | 0.942 | 0.945 | 0.949 | |
| 0.934 | 0.921 | 0.914 | 0.917 | 0.931 | 0.914 | |||
Sensitivity measure for the prediction of new bioactive molecules with DS2 (homogeneous).
| Class | Activity Index | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT |
|---|---|---|---|---|---|---|---|---|
| 1 | 07707 | 0.966 | 0.961 | 0.966 | 0.956 | 0.956 | 0.966 | |
| 2 | 07708 | 0.968 | 0.968 | 0.962 | 0.974 | 0.949 | 0.949 | |
| 3 | 31420 | 0.995 | 0.993 | 0.995 | 0.992 | |||
| 4 | 42710 | 0.982 | 0.973 | 0.973 | 0.982 | 0.973 | 0.964 | |
| 5 | 64100 | 0.972 | 0.978 | 0.981 | 0.977 | 0.975 | 0.977 | |
| 6 | 64200 | 0.734 | 0.772 | 0.715 | 0.759 | 0.722 | 0.759 | |
| 7 | 64220 | 0.996 | 0.993 | 0.995 | 0.995 | 0.995 | 0.996 | |
| 8 | 64300 | 0.968 | 0.952 | 0.952 | 0.968 | 0.968 | ||
| 9 | 65000 | 0.995 | 0.995 | 0.995 | ||||
| 10 | 75755 | 0.993 | 0.993 | |||||
| 0.958 | 0.959 | 0.953 | 0.956 | 0.958 | 0.960 | |||
Specificity measure for the prediction of new bioactive molecules with DS1 (normal dataset).
| Class | Activity Index | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT |
|---|---|---|---|---|---|---|---|---|
| 1 | 31420 | 0.995 | 0.996 | 0.978 | 0.996 | |||
| 2 | 71523 | 0.997 | 0.997 | 0.997 | 0.997 | 0.941 | 0.996 | |
| 3 | 37110 | 0.996 | 0.980 | 0.997 | ||||
| 4 | 31432 | 0.998 | 0.992 | 0.998 | ||||
| 5 | 42731 | 0.995 | 0.994 | 0.992 | 0.971 | 0.995 | 0.993 | |
| 6 | 06233 | 0.997 | 0.997 | 0.994 | 0.951 | 0.997 | 0.994 | |
| 7 | 06245 | 0.996 | 0.996 | 0.861 | 0.995 | |||
| 8 | 07701 | 0.994 | 0.996 | 0.993 | 0.993 | 0.810 | 0.995 | |
| 9 | 06235 | 0.991 | 0.988 | 0.990 | 0.900 | 0.991 | 0.990 | |
| 10 | 78374 | 0.998 | 0.998 | 0.998 | 0.997 | 0.943 | 0.997 | |
| 11 | 78331 | 0.996 | 0.994 | 0.951 | 0.995 | |||
| 0.996 | 0.996 | 0.995 | 0.934 | 0.995 | ||||
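Specificity is the complementary measure, TN / (TN + FP): the fraction of molecules outside a class that were not assigned to it. Under the same one-vs-rest assumption as for sensitivity, every other activity class counts as a negative, so negatives far outnumber positives; this may be part of why most specificity values sit above 0.99. The toy labels below are invented and the function name is illustrative.

```python
from typing import Hashable, Sequence


def specificity(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                positive: Hashable) -> float:
    """TN / (TN + FP) with `positive` as the positive class and all others negative."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tn / (tn + fp)


if __name__ == "__main__":
    y_true = ["31420", "31420", "71523", "71523", "37110"]
    y_pred = ["31420", "71523", "71523", "71523", "31420"]
    for c in ("31420", "71523", "37110"):
        # One false positive among three negatives gives 2/3 for the first two classes.
        print(c, round(specificity(y_true, y_pred, c), 3))
```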
Specificity measure for the prediction of new bioactive molecules with DS3 (heterogeneous).
| Class | Activity Index | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT |
|---|---|---|---|---|---|---|---|---|
| 1 | 09249 | 0.996 | 0.996 | 0.995 | 0.996 | 0.996 | 0.994 | |
| 2 | 12455 | 0.991 | 0.989 | 0.987 | 0.988 | 0.989 | 0.985 | |
| 3 | 12464 | 0.996 | 0.998 | 0.996 | 0.995 | 0.994 | 0.996 | |
| 4 | 31281 | 0.999 | 0.999 | 0.999 | 0.999 | |||
| 5 | 43210 | 0.995 | 0.996 | 0.995 | 0.994 | 0.996 | 0.994 | |
| 6 | 71522 | 0.993 | 0.997 | 0.998 | 0.994 | 0.994 | 0.995 | |
| 7 | 75721 | 0.997 | 0.997 | 0.996 | 0.996 | 0.997 | ||
| 8 | 78331 | 0.990 | 0.993 | 0.991 | 0.989 | 0.989 | 0.989 | |
| 9 | 78348 | 0.992 | 0.995 | 0.995 | 0.994 | 0.992 | 0.993 | |
| 10 | 78351 | 0.974 | 0.956 | 0.971 | 0.974 | 0.965 | 0.971 | |
| 0.993 | 0.992 | 0.992 | 0.992 | 0.993 | 0.991 | |||
Specificity measure for the prediction of new bioactive molecules with DS2 (homogeneous).
| Class | Activity Index | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT |
|---|---|---|---|---|---|---|---|---|
| 1 | 07707 | |||||||
| 2 | 07708 | 0.998 | 0.998 | 0.998 | 0.998 | |||
| 3 | 31420 | 0.997 | 0.997 | 0.997 | ||||
| 4 | 42710 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | |
| 5 | 64100 | 0.989 | 0.990 | 0.987 | 0.990 | 0.989 | 0.990 | |
| 6 | 64200 | 0.993 | 0.995 | 0.995 | 0.995 | 0.994 | 0.995 | |
| 7 | 64220 | 0.998 | 0.998 | 0.998 | ||||
| 8 | 64300 | 0.999 | 0.999 | 0.999 | ||||
| 9 | 65000 | 0.999 | 0.999 | 0.999 | 0.999 | |||
| 10 | 75755 | 0.999 | ||||||
| 0.997 | 0.997 | |||||||
AUC measure for the prediction of new bioactive molecules with DS1 (normal dataset).
| Class | Activity Index | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT |
|---|---|---|---|---|---|---|---|---|
| 1 | 31420 | 0.987 | 0.990 | 0.988 | 0.988 | 0.987 | 0.987 | |
| 2 | 71523 | 0.965 | 0.971 | 0.970 | ||||
| 3 | 37110 | 0.989 | 0.988 | 0.987 | 0.988 | 0.987 | 0.984 | |
| 4 | 31432 | 0.995 | 0.995 | 0.992 | 0.995 | 0.994 | ||
| 5 | 42731 | 0.991 | 0.988 | 0.982 | 0.981 | 0.981 | 0.981 | |
| 6 | 06233 | 0.986 | 0.988 | 0.981 | 0.978 | 0.973 | 0.982 | |
| 7 | 06245 | 0.951 | 0.935 | 0.926 | 0.929 | 0.951 | 0.941 | |
| 8 | 07701 | 0.923 | 0.912 | 0.908 | 0.902 | 0.920 | 0.904 | |
| 9 | 06235 | 0.966 | 0.971 | 0.962 | 0.948 | 0.945 | 0.962 | |
| 10 | 78374 | 0.972 | 0.971 | 0.965 | 0.971 | 0.975 | 0.957 | |
| 11 | 78331 | 0.984 | 0.985 | 0.985 | 0.971 | 0.973 | 0.978 | |
| 0.973 | 0.971 | 0.965 | 0.965 | 0.976 | 0.967 | |||
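AUC is the area under the ROC curve, equivalently the probability that a randomly chosen positive molecule is scored above a randomly chosen negative one, with ties counting one half. The sketch below computes it directly from that pairwise definition (quadratic in the number of molecules, so only suitable for small examples). When a classifier emits hard 0/1 predictions, the ROC curve has a single operating point and AUC reduces to (sensitivity + specificity) / 2; the first numeric columns of the DS1 tables appear consistent with this (0.851 and 0.994 for class 07701 average to about 0.923), although the record does not state how AUC was computed. The function name and toy data are illustrative.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Graded scores: 3 of the 4 positive/negative pairs are ordered correctly.
    print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
    # Hard 0/1 predictions: sensitivity = 2/3, specificity = 2/3, so AUC = 2/3.
    print(auc([1, 1, 0, 1, 0, 0], [1, 1, 1, 0, 0, 0]))
```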
AUC measure for the prediction of new bioactive molecules with DS3 (heterogeneous).
| Class | Activity Index | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT |
|---|---|---|---|---|---|---|---|---|
| 1 | 09249 | 0.984 | 0.988 | 0.983 | 0.982 | 0.984 | ||
| 2 | 12455 | 0.973 | 0.967 | 0.965 | 0.967 | 0.967 | ||
| 3 | 12464 | 0.953 | 0.949 | 0.951 | 0.953 | 0.945 | ||
| 4 | 31281 | 0.977 | 0.967 | 0.934 | 0.943 | 0.958 | 0.948 | |
| 5 | 43210 | 0.973 | 0.965 | 0.971 | 0.969 | 0.976 | 0.966 | |
| 6 | 71522 | 0.954 | 0.957 | 0.954 | 0.946 | 0.954 | 0.938 | |
| 7 | 75721 | 0.987 | 0.979 | 0.971 | 0.974 | 0.984 | 0.977 | |
| 8 | 78331 | 0.914 | 0.894 | 0.899 | 0.911 | 0.919 | 0.914 | |
| 9 | 78348 | 0.945 | 0.937 | 0.948 | 0.941 | 0.944 | 0.930 | |
| 10 | 78351 | 0.960 | 0.957 | 0.957 | 0.960 | 0.960 | ||
| 0.963 | 0.956 | 0.953 | 0.954 | 0.962 | 0.953 | |||
AUC measure for the prediction of new bioactive molecules with DS2 (homogeneous).
| Class | Activity Index | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT |
|---|---|---|---|---|---|---|---|---|
| 1 | 07707 | 0.983 | 0.980 | 0.983 | 0.978 | 0.978 | 0.983 | |
| 2 | 07708 | 0.984 | 0.983 | 0.981 | 0.987 | 0.974 | 0.974 | |
| 3 | 31420 | 0.996 | 0.996 | 0.995 | ||||
| 4 | 42710 | 0.991 | 0.986 | 0.986 | 0.991 | 0.986 | 0.982 | |
| 5 | 64100 | 0.981 | 0.984 | 0.984 | 0.985 | 0.983 | 0.984 | |
| 6 | 64200 | 0.864 | 0.884 | 0.855 | 0.877 | 0.859 | 0.877 | |
| 7 | 64220 | 0.996 | 0.997 | 0.997 | 0.997 | |||
| 8 | 64300 | 0.984 | 0.976 | 0.976 | 0.984 | 0.984 | ||
| 9 | 65000 | 0.998 | 0.998 | 0.997 | 0.997 | 0.998 | ||
| 10 | 75755 | 0.997 | 0.997 | |||||
| 0.977 | 0.978 | 0.975 | 0.977 | 0.978 | 0.979 | |||
Rankings of the existing best-performing classifier (LSVM) and the AdaBoost ensemble classifiers, based on Kendall’s W test results for the MDDR datasets using the sensitivity measure.
| Datasets | W | χ² | p | Ranks | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.506 | 33.387 | 0.000 | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT | ||
| 4.45 | 4.18 | 2.36 | 2.68 | 5.86 | 2.55 | ||||||
| 0.086 | 5.176 | 0.521 | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT | ||
| 4.4 | 4.1 | 3.1 | 4.1 | 3.25 | 4.4 | ||||||
| 0.397 | 23.827 | 0.001 | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT | ||
| 5.10 | 3.70 | 2.55 | 2.85 | 5.35 | 2.75 | ||||||
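Kendall's W measures how consistently the activity classes (acting as m "judges") rank the n = 7 classifiers. Each class ranks the classifiers by score, with higher scores receiving higher ranks, and W = 12S / (m²(n³ − n) − mΣT), where S is the sum of squared deviations of the classifiers' rank sums from their mean and T = Σ(t³ − t) over groups of t tied values. The test statistic is χ² = m(n − 1)W with n − 1 degrees of freedom: for the first row of the table above, 11 × 6 × 0.506 ≈ 33.4, consistent with the reported 33.387 and with m = 11 activity classes as in DS1, and χ² = 5.176 with 6 degrees of freedom gives p ≈ 0.521, as in the second row. The sketch assumes the standard tie-corrected W (the record does not say whether a tie correction was applied) and, to stay within the standard library, uses a closed-form chi-square tail that is exact only for even degrees of freedom. Function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> Tuple[List[float], float]:
    """Rank 1..n (higher value -> higher rank); ties share their mean rank.

    Also returns the tie term sum(t**3 - t) over groups of tied values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def kendalls_w(scores: Sequence[Sequence[float]]) -> float:
    """scores[j][i] = score of classifier i on activity class j (higher is better)."""
    m, n = len(scores), len(scores[0])
    rank_sums = [0.0] * n
    total_ties = 0.0
    for row in scores:
        ranks, tie_term = average_ranks(row)
        total_ties += tie_term
        rank_sums = [r + s for r, s in zip(ranks, rank_sums)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * total_ties)


def kendall_chi2(w: float, m: int, n: int) -> float:
    """Chi-square statistic for W, with n - 1 degrees of freedom."""
    return m * (n - 1) * w


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is exact only for even df."""
    if df % 2:
        raise ValueError("this sketch only handles even degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    print(round(kendall_chi2(0.506, m=11, n=7), 2))  # 33.4
    print(round(chi2_sf_even_df(5.176, df=6), 3))    # 0.521
```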
Rankings of the existing best-performing classifier (LSVM) and the AdaBoost ensemble classifiers, based on Kendall’s W test results for the MDDR datasets using the AUC measure.
| Datasets | W | χ² | p | Ranks | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.600 | 39.573 | 0.000 | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT | ||
| 4.50 | 5.91 | 4.41 | 2.18 | 2.27 | 2.73 | ||||||
| 0.122 | 7.293 | 0.295 | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT | ||
| 4.35 | 3.90 | 2.95 | 4.30 | 3.15 | 4.55 | ||||||
| 0.486 | 29.133 | 0.000 | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT | ||
| 5.25 | 3.50 | 2.55 | 2.75 | 5.50 | 2.60 | ||||||
Rankings of the existing best-performing classifier (LSVM) and the AdaBoost ensemble classifiers, based on Kendall’s W test results for the MDDR datasets using the specificity measure.
| Datasets | W | χ² | p | Ranks | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.413 | 27.287 | 0.000 | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT | ||
| 4.64 | 4.45 | 2.27 | 2.73 | 5.36 | 3.09 | ||||||
| 0.043 | 2.562 | 0.862 | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT | ||
| 3.70 | 4.30 | 3.90 | 3.80 | 3.70 | 3.95 | ||||||
| 0.432 | 25.895 | 0.000 | LSVM | Ada_Bag | Ada_Jrip | Ada_J48 | Ada_PART | Ada_RF | Ada_RT | ||
| 4.05 | 5.65 | 4.50 | 2.55 | 2.55 | 3.00 | ||||||
Fig 1. Accuracy rates for the prediction of new bioactive molecules with MDDR (DS1, DS2 and DS3).
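Fig 1 reports accuracy rates. Assuming these are overall multi-class accuracies (the fraction of all test molecules assigned to their true activity class), the sketch below builds a sparse confusion matrix and divides its diagonal by its total; the record does not state whether accuracy was instead computed per class or averaged. Function names and toy labels are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Counter:
    """Counter keyed by (true class, predicted class)."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Correct predictions (confusion-matrix diagonal) divided by all predictions."""
    counts = confusion_counts(y_true, y_pred)
    correct = sum(n for (t, p), n in counts.items() if t == p)
    return correct / sum(counts.values())


if __name__ == "__main__":
    y_true = ["31420", "31420", "71523", "71523", "37110"]
    y_pred = ["31420", "71523", "71523", "71523", "31420"]
    print(accuracy(y_true, y_pred))  # 0.6
```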
Activity classes for dataset DS2.
| Activity Index | Activity Class | Active Molecules | Pairwise Similarity (Mean) |
|---|---|---|---|
| 07707 | Adenosine (A1) agonists | 207 | 0.424 |
| 07708 | Adenosine (A2) agonists | 156 | 0.484 |
| 31420 | Renin inhibitors | 1130 | 0.584 |
| 42710 | Monocyclic β-lactams | 111 | 0.596 |
| 64100 | Cephalosporins | 1301 | 0.512 |
| 64200 | Carbacephems | 158 | 0.503 |
| 64220 | Carbapenems | 1051 | 0.414 |
| 64300 | Penicillin | 126 | 0.444 |
| 65000 | Antibiotic, macrolide | 388 | 0.673 |
| 75755 | Vitamin D analogues | 455 | 0.569 |