Cai Zhong, Jiali Ai, Yaxin Yang, Fangyuan Ma, Wei Sun.
Abstract
Virtual screening can significantly reduce the experimental time and cost of early drug discovery. Multi-class drug classification can speed up virtual screening by quickly predicting the most likely class of a drug. In this study, 1019 drug molecules with actual therapeutic effects were collected from multiple databases and the literature, and the molecules were grouped into sets according to therapeutic effect and mechanism of action. Molecular descriptors and molecular fingerprints were computed from SMILES strings to quantify molecular structure. After the data set was divided with the Kennard–Stone method, the results of five classification algorithms and of their combinations through a fusion method were compared to identify the best-performing combination. For each specific data set, the best-performing model was then used to predict the validation set. On the test set, prediction accuracy reached 0.862 and the kappa coefficient reached 0.808; the highest classification accuracy on the validation set was 0.873. A more reliable molecular set was identified, which could be used to predict potential attributes of unknown drug compounds and even to discover new uses for old drugs. We hope this research can serve as a reference for the simultaneous virtual screening of multiple drug classes in the future.
Keywords: Dempster–Shafer theory; Kennard–Stone division; molecular descriptor; molecular fingerprint
Year: 2022 PMID: 35956770 PMCID: PMC9369618 DOI: 10.3390/molecules27154807
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
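The Kennard–Stone division named in the abstract picks training samples that cover descriptor space as evenly as possible: it starts from the two most distant molecules and repeatedly adds the molecule farthest from everything already selected. Below is a minimal pure-Python sketch of the classic algorithm, assuming Euclidean distance on pre-scaled descriptor vectors. The paper's distance metric, scaling, and train/test ratio are not given here, and the toy data and `n_train` value are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kennard_stone(X: Sequence[Sequence[float]], n_train: int) -> Tuple[List[int], List[int]]:
    """Split sample indices into (train, test) with the Kennard-Stone algorithm."""
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")
    # Full pairwise Euclidean distance matrix (O(n^2); fine for about 1000 molecules).
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Seed the training set with the two samples that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda pair: dist[pair[0]][pair[1]])
    train = [i0, j0]
    rest = [k for k in range(n) if k not in (i0, j0)]
    # nearest[k]: distance from candidate k to its closest already-selected sample.
    nearest = {k: min(dist[k][i0], dist[k][j0]) for k in rest}
    while len(train) < n_train:
        # Max-min criterion: take the candidate farthest from the current training set.
        pick = max(rest, key=lambda k: nearest[k])
        train.append(pick)
        rest.remove(pick)
        for k in rest:
            nearest[k] = min(nearest[k], dist[k][pick])
    return train, rest


# Toy 2-D "descriptor" vectors (hypothetical, for illustration only).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
train, test = kennard_stone(X, n_train=3)
print(train, test)
```

The central point (index 4) lands in the test set because it is close to points already selected; on real data the descriptor matrix of the molecular set would replace `X`.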
Performance based on combinatorial descriptor data. Results are reported as mean ± standard deviation; the same applies to Table A3, Table A4, Table A5, and Table A6.
| Indicators | Algorithms | S4 | S3 | S2 | S1 |
|---|---|---|---|---|---|
| Q | RF | 0.824 ± 0.03 | 0.834 ± 0.023 | 0.828 ± 0.026 | 0.832 ± 0.027 |
|  | SVM | 0.849 ± 0.025 | 0.856 ± 0.022 | 0.852 ± 0.021 | 0.857 ± 0.024 |
|  | LR | 0.808 ± 0.031 | 0.811 ± 0.025 | 0.808 ± 0.025 | 0.811 ± 0.027 |
|  | LDA | 0.693 ± 0.029 | 0.736 ± 0.03 | 0.751 ± 0.027 | 0.758 ± 0.029 |
|  | ABT | 0.803 ± 0.03 | 0.812 ± 0.03 | 0.807 ± 0.027 | 0.808 ± 0.027 |
|  | DS12 | 0.854 ± 0.025 | 0.861 ± 0.021 | 0.858 ± 0.023 | 0.862 ± 0.025 |
|  | DS13 | 0.833 ± 0.025 | 0.843 ± 0.023 | 0.837 ± 0.023 | 0.842 ± 0.023 |
|  | DS23 | 0.843 ± 0.026 | 0.851 ± 0.021 | 0.845 ± 0.025 | 0.847 ± 0.024 |
|  | DS123 | 0.851 ± 0.024 | 0.858 ± 0.021 | 0.853 ± 0.022 | 0.858 ± 0.024 |
|  | DS1234 | 0.794 ± 0.028 | 0.808 ± 0.024 | 0.813 ± 0.024 | 0.823 ± 0.027 |
|  | DS1235 | 0.851 ± 0.024 | 0.859 ± 0.021 | 0.853 ± 0.022 | 0.858 ± 0.024 |
|  | DS12345 | 0.834 ± 0.024 | 0.84 ± 0.022 | 0.84 ± 0.022 | 0.847 ± 0.023 |
| Kappa | RF | 0.761 ± 0.04 | 0.772 ± 0.03 | 0.761 ± 0.035 | 0.763 ± 0.037 |
|  | SVM | 0.797 ± 0.032 | 0.804 ± 0.029 | 0.798 ± 0.029 | 0.802 ± 0.033 |
|  | LR | 0.743 ± 0.04 | 0.744 ± 0.032 | 0.739 ± 0.033 | 0.74 ± 0.036 |
|  | LDA | 0.599 ± 0.037 | 0.649 ± 0.038 | 0.667 ± 0.035 | 0.673 ± 0.039 |
|  | ABT | 0.736 ± 0.039 | 0.746 ± 0.039 | 0.737 ± 0.035 | 0.736 ± 0.036 |
|  | DS12 | 0.803 ± 0.033 | 0.81 ± 0.028 | 0.805 ± 0.031 | 0.808 ± 0.035 |
|  | DS13 | 0.775 ± 0.032 | 0.786 ± 0.03 | 0.776 ± 0.032 | 0.78 ± 0.032 |
|  | DS23 | 0.789 ± 0.033 | 0.797 ± 0.028 | 0.789 ± 0.033 | 0.789 ± 0.033 |
|  | DS123 | 0.799 ± 0.032 | 0.807 ± 0.028 | 0.799 ± 0.03 | 0.802 ± 0.033 |
|  | DS1234 | 0.725 ± 0.035 | 0.741 ± 0.031 | 0.746 ± 0.032 | 0.756 ± 0.036 |
|  | DS1235 | 0.799 ± 0.031 | 0.808 ± 0.029 | 0.799 ± 0.03 | 0.803 ± 0.033 |
|  | DS12345 | 0.777 ± 0.031 | 0.782 ± 0.029 | 0.781 ± 0.029 | 0.787 ± 0.032 |
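Figure 2 reports Q values and kappa coefficients, and the abstract quotes a test-set accuracy of 0.862 with a kappa of 0.808. Assuming Q is the overall fraction of correctly classified molecules and kappa is Cohen's kappa, `kappa = (p_o - p_e) / (1 - p_e)` with `p_o` the observed agreement and `p_e` the agreement expected by chance, both can be computed from true and predicted labels as sketched below. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def q_and_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Overall accuracy (assumed to be Q) and Cohen's kappa for multi-class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(y_true)
    # Observed agreement p_o: fraction of molecules whose predicted class is correct.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement p_e: sum over classes of P(true = c) * P(pred = c).
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    # If p_e == 1, both sequences use a single identical class, so agreement is perfect.
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return p_o, kappa


# Hypothetical labels for five molecules.
y_true = ["C1", "C1", "C2", "C2", "C3"]
y_pred = ["C1", "C2", "C2", "C2", "C3"]
q, kappa = q_and_kappa(y_true, y_pred)
print(q, kappa)
```

Because `p_e >= 0`, kappa can never exceed Q, which is consistent with the second (kappa) block of each table being lower than the first (Q) block.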
Performance based on Mordred descriptor data.
| Indicators | Algorithms | S4 | S3 | S2 | S1 |
|---|---|---|---|---|---|
| Q | RF | 0.807 ± 0.027 | 0.818 ± 0.025 | 0.816 ± 0.03 | 0.823 ± 0.025 |
|  | SVM | 0.841 ± 0.024 | 0.852 ± 0.023 | 0.847 ± 0.025 | 0.853 ± 0.021 |
|  | LR | 0.811 ± 0.024 | 0.82 ± 0.024 | 0.811 ± 0.026 | 0.818 ± 0.024 |
|  | LDA | 0.555 ± 0.045 | 0.609 ± 0.043 | 0.637 ± 0.033 | 0.656 ± 0.031 |
|  | ABT | 0.775 ± 0.032 | 0.781 ± 0.03 | 0.777 ± 0.028 | 0.784 ± 0.029 |
|  | DS12 | 0.843 ± 0.025 | 0.855 ± 0.024 | 0.85 ± 0.025 | 0.857 ± 0.023 |
|  | DS13 | 0.831 ± 0.024 | 0.841 ± 0.025 | 0.834 ± 0.026 | 0.845 ± 0.024 |
|  | DS23 | 0.834 ± 0.021 | 0.844 ± 0.023 | 0.839 ± 0.025 | 0.846 ± 0.022 |
|  | DS123 | 0.843 ± 0.023 | 0.851 ± 0.024 | 0.845 ± 0.027 | 0.855 ± 0.022 |
|  | DS1234 | 0.769 ± 0.031 | 0.779 ± 0.033 | 0.781 ± 0.031 | 0.791 ± 0.028 |
|  | DS1235 | 0.844 ± 0.023 | 0.851 ± 0.023 | 0.846 ± 0.027 | 0.855 ± 0.022 |
|  | DS12345 | 0.828 ± 0.025 | 0.833 ± 0.027 | 0.83 ± 0.027 | 0.841 ± 0.024 |
| Kappa | RF | 0.74 ± 0.035 | 0.752 ± 0.033 | 0.747 ± 0.039 | 0.754 ± 0.034 |
|  | SVM | 0.787 ± 0.031 | 0.8 ± 0.03 | 0.791 ± 0.033 | 0.796 ± 0.029 |
|  | LR | 0.748 ± 0.031 | 0.757 ± 0.032 | 0.744 ± 0.033 | 0.751 ± 0.031 |
|  | LDA | 0.435 ± 0.054 | 0.495 ± 0.052 | 0.527 ± 0.039 | 0.548 ± 0.039 |
|  | ABT | 0.701 ± 0.041 | 0.706 ± 0.04 | 0.699 ± 0.036 | 0.705 ± 0.038 |
|  | DS12 | 0.789 ± 0.033 | 0.804 ± 0.031 | 0.794 ± 0.033 | 0.801 ± 0.03 |
|  | DS13 | 0.774 ± 0.031 | 0.784 ± 0.033 | 0.774 ± 0.034 | 0.786 ± 0.032 |
|  | DS23 | 0.779 ± 0.027 | 0.789 ± 0.031 | 0.781 ± 0.033 | 0.789 ± 0.03 |
|  | DS123 | 0.79 ± 0.03 | 0.798 ± 0.031 | 0.789 ± 0.035 | 0.799 ± 0.029 |
|  | DS1234 | 0.694 ± 0.039 | 0.705 ± 0.043 | 0.705 ± 0.04 | 0.716 ± 0.036 |
|  | DS1235 | 0.791 ± 0.03 | 0.798 ± 0.031 | 0.789 ± 0.036 | 0.8 ± 0.029 |
|  | DS12345 | 0.77 ± 0.033 | 0.775 ± 0.037 | 0.77 ± 0.035 | 0.781 ± 0.033 |
Performance based on MACCS fingerprint data.
| Indicators | Algorithms | S4 | S3 | S2 | S1 |
|---|---|---|---|---|---|
| Q | RF | 0.812 ± 0.026 | 0.816 ± 0.027 | 0.81 ± 0.023 | 0.819 ± 0.022 |
|  | SVM | 0.807 ± 0.027 | 0.808 ± 0.025 | 0.806 ± 0.026 | 0.815 ± 0.023 |
|  | LR | 0.716 ± 0.031 | 0.722 ± 0.029 | 0.718 ± 0.026 | 0.727 ± 0.027 |
|  | LDA | 0.711 ± 0.033 | 0.719 ± 0.029 | 0.715 ± 0.025 | 0.727 ± 0.027 |
|  | ABT | 0.691 ± 0.034 | 0.689 ± 0.034 | 0.673 ± 0.027 | 0.686 ± 0.028 |
|  | DS12 | 0.819 ± 0.029 | 0.819 ± 0.026 | 0.819 ± 0.023 | 0.822 ± 0.022 |
|  | DS13 | 0.78 ± 0.029 | 0.787 ± 0.027 | 0.785 ± 0.026 | 0.793 ± 0.025 |
|  | DS23 | 0.791 ± 0.032 | 0.791 ± 0.027 | 0.789 ± 0.027 | 0.8 ± 0.024 |
|  | DS123 | 0.806 ± 0.03 | 0.807 ± 0.027 | 0.806 ± 0.027 | 0.812 ± 0.022 |
|  | DS1234 | 0.774 ± 0.03 | 0.78 ± 0.025 | 0.78 ± 0.023 | 0.79 ± 0.025 |
|  | DS1235 | 0.806 ± 0.029 | 0.806 ± 0.027 | 0.805 ± 0.027 | 0.811 ± 0.022 |
|  | DS12345 | 0.774 ± 0.03 | 0.78 ± 0.025 | 0.78 ± 0.024 | 0.79 ± 0.025 |
| Kappa | RF | 0.75 ± 0.035 | 0.753 ± 0.036 | 0.742 ± 0.031 | 0.751 ± 0.029 |
|  | SVM | 0.745 ± 0.035 | 0.745 ± 0.032 | 0.739 ± 0.034 | 0.749 ± 0.03 |
|  | LR | 0.627 ± 0.04 | 0.632 ± 0.037 | 0.622 ± 0.034 | 0.631 ± 0.035 |
|  | LDA | 0.623 ± 0.041 | 0.63 ± 0.037 | 0.621 ± 0.033 | 0.633 ± 0.035 |
|  | ABT | 0.596 ± 0.042 | 0.59 ± 0.042 | 0.566 ± 0.035 | 0.579 ± 0.037 |
|  | DS12 | 0.761 ± 0.038 | 0.759 ± 0.033 | 0.755 ± 0.031 | 0.758 ± 0.028 |
|  | DS13 | 0.709 ± 0.038 | 0.716 ± 0.035 | 0.709 ± 0.034 | 0.717 ± 0.032 |
|  | DS23 | 0.725 ± 0.041 | 0.723 ± 0.035 | 0.717 ± 0.036 | 0.728 ± 0.031 |
|  | DS123 | 0.744 ± 0.038 | 0.742 ± 0.035 | 0.737 ± 0.035 | 0.743 ± 0.029 |
|  | DS1234 | 0.703 ± 0.039 | 0.708 ± 0.032 | 0.703 ± 0.031 | 0.714 ± 0.032 |
|  | DS1235 | 0.744 ± 0.038 | 0.742 ± 0.035 | 0.737 ± 0.035 | 0.743 ± 0.029 |
|  | DS12345 | 0.703 ± 0.039 | 0.708 ± 0.032 | 0.703 ± 0.031 | 0.714 ± 0.032 |
Performance based on topological fingerprint data.
| Indicators | Algorithms | S4 | S3 | S2 | S1 |
|---|---|---|---|---|---|
| Q | RF | 0.813 ± 0.029 | 0.821 ± 0.029 | 0.819 ± 0.024 | 0.821 ± 0.029 |
|  | SVM | 0.837 ± 0.026 | 0.835 ± 0.029 | 0.827 ± 0.028 | 0.828 ± 0.028 |
|  | LR | 0.796 ± 0.027 | 0.804 ± 0.028 | 0.795 ± 0.025 | 0.797 ± 0.024 |
|  | LDA | 0.599 ± 0.064 | 0.588 ± 0.065 | 0.571 ± 0.059 | 0.577 ± 0.059 |
|  | ABT | 0.759 ± 0.034 | 0.761 ± 0.031 | 0.753 ± 0.03 | 0.755 ± 0.032 |
|  | DS12 | 0.832 ± 0.027 | 0.834 ± 0.026 | 0.833 ± 0.025 | 0.834 ± 0.024 |
|  | DS13 | 0.816 ± 0.027 | 0.825 ± 0.028 | 0.825 ± 0.025 | 0.827 ± 0.025 |
|  | DS23 | 0.828 ± 0.029 | 0.825 ± 0.027 | 0.818 ± 0.026 | 0.822 ± 0.025 |
|  | DS123 | 0.832 ± 0.026 | 0.829 ± 0.027 | 0.828 ± 0.025 | 0.829 ± 0.025 |
|  | DS1234 | 0.774 ± 0.031 | 0.771 ± 0.032 | 0.751 ± 0.029 | 0.752 ± 0.029 |
|  | DS1235 | 0.832 ± 0.027 | 0.829 ± 0.028 | 0.828 ± 0.025 | 0.829 ± 0.025 |
|  | DS12345 | 0.794 ± 0.029 | 0.796 ± 0.03 | 0.777 ± 0.026 | 0.779 ± 0.027 |
| Kappa | RF | 0.753 ± 0.037 | 0.761 ± 0.038 | 0.756 ± 0.033 | 0.759 ± 0.039 |
|  | SVM | 0.786 ± 0.034 | 0.781 ± 0.036 | 0.77 ± 0.037 | 0.771 ± 0.036 |
|  | LR | 0.736 ± 0.034 | 0.745 ± 0.035 | 0.731 ± 0.033 | 0.735 ± 0.032 |
|  | LDA | 0.5 ± 0.07 | 0.483 ± 0.072 | 0.459 ± 0.066 | 0.464 ± 0.067 |
|  | ABT | 0.687 ± 0.043 | 0.686 ± 0.039 | 0.676 ± 0.039 | 0.678 ± 0.04 |
|  | DS12 | 0.78 ± 0.034 | 0.781 ± 0.034 | 0.778 ± 0.034 | 0.779 ± 0.032 |
|  | DS13 | 0.76 ± 0.034 | 0.769 ± 0.036 | 0.768 ± 0.033 | 0.77 ± 0.032 |
|  | DS23 | 0.776 ± 0.036 | 0.77 ± 0.034 | 0.76 ± 0.035 | 0.765 ± 0.032 |
|  | DS123 | 0.78 ± 0.034 | 0.775 ± 0.034 | 0.772 ± 0.033 | 0.774 ± 0.032 |
|  | DS1234 | 0.707 ± 0.039 | 0.7 ± 0.04 | 0.673 ± 0.038 | 0.674 ± 0.038 |
|  | DS1235 | 0.78 ± 0.034 | 0.775 ± 0.035 | 0.772 ± 0.033 | 0.774 ± 0.032 |
|  | DS12345 | 0.732 ± 0.036 | 0.733 ± 0.037 | 0.706 ± 0.035 | 0.709 ± 0.036 |
Performance based on Morgan fingerprint data.
| Indicators | Algorithms | S4 | S3 | S2 | S1 |
|---|---|---|---|---|---|
| Q | RF | 0.781 ± 0.028 | 0.767 ± 0.026 | 0.765 ± 0.028 | 0.772 ± 0.029 |
|  | SVM | 0.775 ± 0.034 | 0.766 ± 0.029 | 0.763 ± 0.029 | 0.772 ± 0.031 |
|  | LR | 0.753 ± 0.03 | 0.73 ± 0.031 | 0.725 ± 0.026 | 0.732 ± 0.031 |
|  | LDA | 0.586 ± 0.051 | 0.563 ± 0.038 | 0.505 ± 0.041 | 0.513 ± 0.042 |
|  | ABT | 0.652 ± 0.035 | 0.645 ± 0.036 | 0.642 ± 0.034 | 0.646 ± 0.033 |
|  | DS12 | 0.788 ± 0.029 | 0.774 ± 0.028 | 0.773 ± 0.028 | 0.78 ± 0.028 |
|  | DS13 | 0.785 ± 0.026 | 0.767 ± 0.027 | 0.763 ± 0.026 | 0.773 ± 0.03 |
|  | DS23 | 0.771 ± 0.033 | 0.76 ± 0.028 | 0.757 ± 0.029 | 0.764 ± 0.029 |
|  | DS123 | 0.786 ± 0.03 | 0.771 ± 0.027 | 0.771 ± 0.027 | 0.778 ± 0.029 |
|  | DS1234 | 0.696 ± 0.034 | 0.681 ± 0.032 | 0.66 ± 0.028 | 0.666 ± 0.035 |
|  | DS1235 | 0.785 ± 0.03 | 0.77 ± 0.026 | 0.771 ± 0.028 | 0.777 ± 0.028 |
|  | DS12345 | 0.725 ± 0.033 | 0.714 ± 0.032 | 0.699 ± 0.027 | 0.705 ± 0.03 |
| Kappa | RF | 0.705 ± 0.036 | 0.683 ± 0.034 | 0.672 ± 0.038 | 0.676 ± 0.04 |
|  | SVM | 0.698 ± 0.044 | 0.683 ± 0.037 | 0.672 ± 0.038 | 0.679 ± 0.041 |
|  | LR | 0.671 ± 0.039 | 0.636 ± 0.04 | 0.622 ± 0.036 | 0.623 ± 0.044 |
|  | LDA | 0.466 ± 0.061 | 0.433 ± 0.047 | 0.363 ± 0.046 | 0.371 ± 0.049 |
|  | ABT | 0.545 ± 0.044 | 0.531 ± 0.043 | 0.522 ± 0.044 | 0.522 ± 0.044 |
|  | DS12 | 0.716 ± 0.038 | 0.695 ± 0.036 | 0.687 ± 0.037 | 0.69 ± 0.038 |
|  | DS13 | 0.711 ± 0.033 | 0.682 ± 0.036 | 0.67 ± 0.035 | 0.677 ± 0.041 |
|  | DS23 | 0.697 ± 0.043 | 0.679 ± 0.036 | 0.668 ± 0.038 | 0.673 ± 0.04 |
|  | DS123 | 0.713 ± 0.039 | 0.691 ± 0.035 | 0.683 ± 0.037 | 0.688 ± 0.038 |
|  | DS1234 | 0.599 ± 0.043 | 0.576 ± 0.041 | 0.542 ± 0.034 | 0.547 ± 0.046 |
|  | DS1235 | 0.713 ± 0.039 | 0.69 ± 0.034 | 0.684 ± 0.037 | 0.687 ± 0.038 |
|  | DS12345 | 0.635 ± 0.041 | 0.618 ± 0.041 | 0.591 ± 0.033 | 0.596 ± 0.04 |
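The abstract says the better combination is found by comparing results across algorithms, fusion methods, and molecular sets. Assuming the criterion is simply the highest mean accuracy (Q), that comparison is an argmax over a results table. The sketch below uses a few mean Q values copied from the combinatorial-descriptor table; the selection rule and the helper name `best_combination` are assumptions, since the paper may also weigh kappa or the standard deviation.

```python
from typing import Dict, Tuple

# Mean Q values copied from the combinatorial-descriptor table (a subset of rows).
Q_F1: Dict[str, Dict[str, float]] = {
    "SVM":   {"S4": 0.849, "S3": 0.856, "S2": 0.852, "S1": 0.857},
    "DS12":  {"S4": 0.854, "S3": 0.861, "S2": 0.858, "S1": 0.862},
    "DS123": {"S4": 0.851, "S3": 0.858, "S2": 0.853, "S1": 0.858},
}


def best_combination(table: Dict[str, Dict[str, float]]) -> Tuple[str, str, float]:
    """Return (method, molecular set, mean Q) for the highest mean Q in the table."""
    return max(((method, mol_set, q) for method, row in table.items()
                for mol_set, q in row.items()),
               key=lambda item: item[2])


print(best_combination(Q_F1))  # ('DS12', 'S1', 0.862)
```

The top entries differ by less than 0.01, while the reported standard deviations are about 0.02 to 0.03, so a ranking by mean alone may not separate them reliably.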
Figure 1. Classification results obtained by RF. (a) Results from the Mordred descriptor set; (b) results from the Morgan fingerprint set.
Figure 2. Classification results based on molecular set S4. (a) Q values; (b) kappa coefficients.
Figure 3. Six types of descriptor information included in the five data sets.
Figure 4. Classification results based on molecular set S4 and combinatorial descriptors.
Number of correct predictions made by the fusion method on the external validation set.
| Descriptor Sets | S4 | S3 | S2 | S1 |
|---|---|---|---|---|
| Combinatorial descriptor | 73 | 76 | 73 | 70 |
| Mordred descriptor | 74 | 74 | 75 | 72 |
| MACCS fingerprint | 70 | 74 | 74 | 71 |
| Topological fingerprint | 74 | 75 | 74 | 75 |
| Morgan fingerprint | 73 | 69 | 75 | 70 |
Prediction results of the “F1—DS12 on S3” model for the external validation set.
| Drugs | True Categories | Predicted Categories |
|---|---|---|
| Oliceridine | analgesics | antineoplastic drugs |
| Cyproheptadine | analgesics | analgesics |
| Methylergometrine | analgesics | analgesics |
| Ubrogepant | analgesics | antineoplastic drugs |
| Lasmiditan | analgesics | antineoplastic drugs |
| Talaporfin | antineoplastic drugs | antineoplastic drugs |
| Avapritinib | antineoplastic drugs | antineoplastic drugs |
| Tazemetostat | antineoplastic drugs | antineoplastic drugs |
| Capmatinib | antineoplastic drugs | antineoplastic drugs |
| Lurbinectedin | antineoplastic drugs | antineoplastic drugs |
| Abiraterone acetate | antineoplastic drugs | antineoplastic drugs |
| Sotorasib | antineoplastic drugs | antineoplastic drugs |
| Tamoxifen | antineoplastic drugs | analgesics |
| Fulvestrant | antineoplastic drugs | antineoplastic drugs |
| Anastrozole | antineoplastic drugs | antiviral drugs |
| Letrozole | antineoplastic drugs | antifungals |
| Exemestane | antineoplastic drugs | antineoplastic drugs |
| Zanubrutinib | antineoplastic drugs | antineoplastic drugs |
| Apalutamide | antineoplastic drugs | antineoplastic drugs |
| Darolutamide | antineoplastic drugs | antineoplastic drugs |
| Glasdegib | antineoplastic drugs | antineoplastic drugs |
| Duvelisib | antineoplastic drugs | antineoplastic drugs |
| Tofacitinib | antineoplastic drugs | antineoplastic drugs |
| Enzalutamide | antineoplastic drugs | antineoplastic drugs |
| Berzosertib | antineoplastic drugs | antineoplastic drugs |
| Mobocertinib | antineoplastic drugs | antineoplastic drugs |
| Vebicorvir | antiviral drugs | antineoplastic drugs |
| Rifampicin | antineoplastic drugs, antibacterial drugs | antibacterial drugs |
| Cytarabine | antineoplastic drugs, antiviral drugs | antineoplastic drugs |
| Seliciclib | antineoplastic drugs, antiviral drugs | antineoplastic drugs |
| Celecoxib | analgesics, antineoplastic drugs | antidiabetic drugs |
| Pomalidomide | analgesics, antineoplastic drugs | analgesics |
| Acetylcysteine | analgesics, antineoplastic drugs, antiviral drugs | antineoplastic drugs |
| Salicylic acid | analgesics, antineoplastic drugs, antifungals | analgesics |
| Suxibuzone | analgesics, antineoplastic drugs | analgesics |
| Promethazine | analgesics, antiviral drugs | analgesics |
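In the table above, multi-role drugs have several true classes but receive a single predicted class. This excerpt does not state the scoring rule; a natural one, assumed here, counts a prediction as correct when it matches any of the drug's true classes. The sketch applies that assumed rule to the nine multi-role rows shown; `is_correct` is a hypothetical helper.

```python
from typing import List, Set, Tuple

# (drug, true classes, predicted class) for the multi-role rows of the table above.
ROWS: List[Tuple[str, Set[str], str]] = [
    ("Rifampicin", {"antineoplastic drugs", "antibacterial drugs"}, "antibacterial drugs"),
    ("Cytarabine", {"antineoplastic drugs", "antiviral drugs"}, "antineoplastic drugs"),
    ("Seliciclib", {"antineoplastic drugs", "antiviral drugs"}, "antineoplastic drugs"),
    ("Celecoxib", {"analgesics", "antineoplastic drugs"}, "antidiabetic drugs"),
    ("Pomalidomide", {"analgesics", "antineoplastic drugs"}, "analgesics"),
    ("Acetylcysteine", {"analgesics", "antineoplastic drugs", "antiviral drugs"}, "antineoplastic drugs"),
    ("Salicylic acid", {"analgesics", "antineoplastic drugs", "antifungals"}, "analgesics"),
    ("Suxibuzone", {"analgesics", "antineoplastic drugs"}, "analgesics"),
    ("Promethazine", {"analgesics", "antiviral drugs"}, "analgesics"),
]


def is_correct(true_classes: Set[str], predicted: str) -> bool:
    """Assumed rule: a multi-role prediction is correct if it hits any true class."""
    return predicted in true_classes


misses = [drug for drug, truth, pred in ROWS if not is_correct(truth, pred)]
print(len(ROWS) - len(misses), misses)  # 8 ['Celecoxib']
```

Under this rule only celecoxib, predicted as an antidiabetic drug, is missed among the multi-role rows shown.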
Number of correct predictions for each drug class by several models. In the first column, F1–F5 denote the descriptor sets; details are given in Table A1 of Appendix A. The seven drug classes are denoted C1 to C7 in order, and the number after each slash is the number of validation drugs in that group. The same notation is used in Table 4.
| Models | C1/7 | C2/21 | C3/26 | C4/10 | C5/5 | C6/6 | C7/2 | Single-role total/77 | Multi-role/10 | Total/87 |
|---|---|---|---|---|---|---|---|---|---|---|
| F1—DS12 on S3 | 4 | 18 | 26 | 8 | 5 | 6 | 0 | 67 | 9 | 76 |
| F1—DS12 on S4 | 4 | 16 | 25 | 8 | 4 | 5 | 2 | 64 | 7 | 71 |
|  | 4 | 16 | 26 | 6 | 5 | 6 | 2 | 65 | 10 | 75 |
| F2—DS12 on S3 | 4 | 17 | 26 | 7 | 4 | 3 | 1 | 62 | 10 | 72 |
| F2—DS12 on S4 | 4 | 14 | 25 | 8 | 5 | 5 | 2 | 63 | 10 | 73 |
| F3—SVM on S3 | 4 | 15 | 26 | 7 | 5 | 5 | 2 | 64 | 10 | 74 |
| F3—DS12 on S4 | 3 | 15 | 26 | 5 | 5 | 4 | 2 | 60 | 8 | 68 |
|  | 5 | 16 | 25 | 6 | 5 | 6 | 2 | 65 | 10 | 75 |
|  | 5 | 17 | 25 | 5 | 5 | 5 | 2 | 64 | 10 | 74 |
| F4—DS12 on S4 | 4 | 16 | 26 | 5 | 5 | 5 | 2 | 63 | 10 | 73 |
| F5—SVM on S2 | 6 | 13 | 26 | 4 | 3 | 4 | 0 | 56 | 9 | 65 |
| F5—DS12 on S4 | 5 | 15 | 26 | 7 | 3 | 6 | 2 | 64 | 9 | 73 |
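The per-class counts above translate directly into recall values: each count is divided by the group size after the slash in the header (C1/7 through C7/2, Multi-role/10, Total/87). A minimal sketch using the counts from the F1, DS12 on S4 row; the helper name `summarize` is hypothetical.

```python
from typing import Dict, List

CLASS_SIZES: List[int] = [7, 21, 26, 10, 5, 6, 2]  # single-role groups C1..C7, 77 drugs
MULTI_ROLE_SIZE = 10
TOTAL = sum(CLASS_SIZES) + MULTI_ROLE_SIZE          # 87 validation drugs


def summarize(correct: List[int], multi_correct: int) -> Dict[str, float]:
    """Per-class recall for C1..C7, multi-role recall, and overall accuracy."""
    if len(correct) != len(CLASS_SIZES):
        raise ValueError("expected one count per class C1..C7")
    out = {f"C{i + 1}": c / n for i, (c, n) in enumerate(zip(correct, CLASS_SIZES))}
    out["multi-role"] = multi_correct / MULTI_ROLE_SIZE
    out["overall"] = (sum(correct) + multi_correct) / TOTAL
    return out


# Counts copied from the row for model F1, DS12 on S4, in the table above.
stats = summarize([4, 16, 25, 8, 4, 5, 2], multi_correct=7)
print({k: round(v, 3) for k, v in stats.items()})
```

For this row, C1 (4 of 7) has the lowest single-class recall and C7 (2 of 2) the highest, while the overall value is 71/87.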
The predicted probabilities for the four drugs.
| Drugs | Models | C1 | C2 | C3 | C4 | C5 | C6 | C7 |
|---|---|---|---|---|---|---|---|---|
| Rifampicin | F1—DS12 on S3 | 0 | 0.005 | 0.994 | 0 | 0 | 0 | 0 |
|  | F2—SVM on S2 | 0.028 | 0.079 | 0.833 | 0.012 | 0.023 | 0.008 | 0.017 |
|  | F4—DS123 on S3 | 0.002 | 0.005 | 0.991 | 0.001 | 0 | 0 | 0 |
|  | F4—DS12 on S3 | 0 | 0.001 | 0.998 | 0 | 0 | 0 | 0 |
| Celecoxib | F1—DS12 on S3 | 0.395 | 0.1 | 0.003 | 0.046 | 0.008 | 0.447 | 0 |
|  | F2—SVM on S2 | 0.387 | 0.08 | 0.15 | 0.117 | 0.025 | 0.236 | 0.005 |
|  | F4—DS123 on S3 | 0.535 | 0.321 | 0.053 | 0.038 | 0.009 | 0.041 | 0.002 |
|  | F4—DS12 on S3 | 0.609 | 0.318 | 0.021 | 0.012 | 0.004 | 0.035 | 0.001 |
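Each row of the probability table gives one model's probabilities over C1 to C7, and the predicted class is presumably the most probable one. The sketch assumes C1 to C7 follow the row order of the drug-class table in this section (analgesics first, antiarrhythmics last), an order consistent with rifampicin's high C3 (antibacterial) probability; `CLASSES` and `predict_class` are illustrative names.

```python
from typing import List, Sequence

# Assumed mapping of C1..C7 to class names (row order of the drug-class table).
CLASSES: List[str] = [
    "analgesics", "antineoplastic drugs", "antibacterial drugs", "antiviral drugs",
    "antifungals", "antidiabetic drugs", "antiarrhythmics",
]


def predict_class(probs: Sequence[float]) -> str:
    """Return the class name with the highest predicted probability."""
    if len(probs) != len(CLASSES):
        raise ValueError("expected one probability per class C1..C7")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASSES[best]


# Celecoxib rows copied from the table above.
print(predict_class([0.395, 0.1, 0.003, 0.046, 0.008, 0.447, 0]))        # F1, DS12 on S3
print(predict_class([0.609, 0.318, 0.021, 0.012, 0.004, 0.035, 0.001]))  # F4, DS12 on S3
```

The first call returns "antidiabetic drugs", matching the misclassification of celecoxib listed earlier in this section, and the small margin over C1 (0.447 versus 0.395) shows how narrowly the analgesic class was missed.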
Figure 5. Acquisition of the molecular sets. S2 is obtained by checking each drug's therapeutic mechanism, S3 by checking other potential therapeutic effects, and S4 by checking applicable objects and experimental stage.
Number of drug molecules in each of the four molecular sets.
| Drug Classes | Molecular Set S1 | Molecular Set S2 | Molecular Set S3 | Molecular Set S4 |
|---|---|---|---|---|
| Analgesics | 228 | 209 | 183 | 164 |
| Antineoplastic | 211 | 209 | 189 | 165 |
| Antibacterial drugs | 296 | 294 | 285 | 261 |
| Antiviral drugs | 108 | 108 | 102 | 99 |
| Antifungals | 64 | 64 | 57 | 54 |
| Antidiabetic drugs | 70 | 70 | 66 | 63 |
| Antiarrhythmics | 42 | 42 | 39 | 38 |
| Total | 1019 | 996 | 921 | 844 |
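The classes in the table above are imbalanced (antibacterial drugs outnumber antiarrhythmics roughly seven to one), so it helps to know what Q a trivial classifier would reach. A minimal sketch, assuming Q is overall accuracy, that recomputes each set size and the Q of always predicting the largest class; `baseline_and_imbalance` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Class counts copied from the table above, in row order (analgesics ... antiarrhythmics).
COUNTS: Dict[str, List[int]] = {
    "S1": [228, 211, 296, 108, 64, 70, 42],
    "S2": [209, 209, 294, 108, 64, 70, 42],
    "S3": [183, 189, 285, 102, 57, 66, 39],
    "S4": [164, 165, 261, 99, 54, 63, 38],
}


def baseline_and_imbalance(counts: List[int]) -> Tuple[int, float, float]:
    """Return (set size, majority-class Q, largest/smallest class ratio)."""
    total = sum(counts)
    return total, max(counts) / total, max(counts) / min(counts)


for name, counts in COUNTS.items():
    total, q0, ratio = baseline_and_imbalance(counts)
    print(f"{name}: n={total}, majority-class Q={q0:.3f}, largest/smallest={ratio:.1f}")
```

The majority-class Q is only about 0.29 to 0.31 for every set, well below every Q value in the performance tables; the kappa coefficient likewise discounts agreement expected by chance.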
Molecule sets, descriptor sets, and classification methods compared in this study.
| Molecule Sets | Descriptor Sets | Classification Methods |
|---|---|---|
| S1 with 1019 molecules | Combinatorial descriptors (F1) | RF (1) |
| S2 with 996 molecules | Mordred descriptors (F2) | SVM (2) |
| S3 with 921 molecules | MACCS fingerprints (F3) | LR (3) |
| S4 with 844 molecules | Topological fingerprints (F4) | LDA (4) |
|  | Morgan fingerprints (F5) | ABT (5) |
|  |  | Fusion of RF and SVM (DS12) |
|  |  | Fusion of RF and LR (DS13) |
|  |  | Fusion of SVM and LR (DS23) |
|  |  | Fusion of RF, SVM, and LR (DS123) |
|  |  | Fusion of RF, SVM, LR, and LDA (DS1234) |
|  |  | Fusion of RF, SVM, LR, and ABT (DS1235) |
|  |  | Fusion of five single classifiers (DS12345) |
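The fusion methods listed above (DS12, DS123, and so on) combine single classifiers using Dempster–Shafer theory, named in the keywords. This excerpt does not say how each classifier's output becomes a basic probability assignment (BPA), so the sketch assumes class probabilities are used as masses on singleton classes, optionally discounted so that some mass sits on the whole frame Θ. `to_bpa` and its `discount` parameter are hypothetical; only Dempster's rule of combination itself is standard.

```python
from functools import reduce
from typing import List, Sequence, Tuple

# A BPA here is (masses on singleton classes C1..Cn, mass on the whole frame Theta).
BPA = Tuple[List[float], float]


def to_bpa(probs: Sequence[float], discount: float = 0.0) -> BPA:
    """Hypothetical conversion: keep (1 - discount) of each class probability
    and assign the remaining mass to Theta (Shafer discounting)."""
    return [p * (1.0 - discount) for p in probs], discount


def combine(m1: BPA, m2: BPA) -> BPA:
    """Dempster's rule for BPAs whose focal elements are singletons and Theta."""
    s1, t1 = m1
    s2, t2 = m2
    # Conflict K: product mass falling on pairs of different singleton classes.
    conflict = sum(a * b for i, a in enumerate(s1) for j, b in enumerate(s2) if i != j)
    if conflict >= 1.0:
        raise ValueError("total conflict; Dempster's rule is undefined")
    norm = 1.0 - conflict
    # {c} receives mass from {c}&{c}, {c}&Theta, and Theta&{c}.
    singles = [(a * b + a * t2 + t1 * b) / norm for a, b in zip(s1, s2)]
    return singles, t1 * t2 / norm


def fuse(bpas: Sequence[BPA]) -> BPA:
    """Combine several classifiers, e.g. RF and SVM for DS12 (the rule is associative)."""
    return reduce(combine, bpas)


# Hypothetical RF and SVM outputs over three classes.
fused, theta = fuse([to_bpa([0.6, 0.3, 0.1]), to_bpa([0.5, 0.4, 0.1])])
print([round(x, 3) for x in fused], theta)

# Two fully opposed classifiers, each discounted by 0.2.
opposed, theta2 = fuse([to_bpa([1.0, 0.0, 0.0], 0.2), to_bpa([0.0, 1.0, 0.0], 0.2)])
print([round(x, 3) for x in opposed], round(theta2, 3))
```

With `discount=0.0` the rule reduces to a normalized product of the probability vectors. With a positive discount, as in the second example, two classifiers that each put all their mass on different classes still leave both classes a share of the fused mass instead of producing total conflict.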
Figure 6. The whole study process: (a) the workflow for comparing classification results based on different data sets; (b) the workflow for further validation.