Benjamin Bajželj, Viktor Drgan.
Abstract
Drug-induced liver injury is a major concern in the drug development process. Expensive and time-consuming in vitro and in vivo studies do not reflect the complexity of the phenomenon. Complementary to wet lab methods are in silico approaches, which present a cost-efficient method for toxicity prediction. The aim of our study was to explore the capabilities of counter-propagation artificial neural networks (CPANNs) for the classification of an imbalanced dataset related to idiosyncratic drug-induced liver injury and to develop a model for prediction of the hepatotoxic potential of drugs. Genetic algorithm optimization of CPANN models was used to build models for the classification of drugs into hepatotoxic and non-hepatotoxic class using molecular descriptors. For the classification of an imbalanced dataset, we modified the classical CPANN training algorithm by integrating random subsampling into the training procedure of CPANN to improve the classification ability of CPANN. According to the number of models accepted by internal validation and according to the prediction statistics on the external set, we concluded that using an imbalanced set with balanced subsampling in each learning epoch is a better approach compared to using a fixed balanced set in the case of the counter-propagation artificial neural network learning methodology.Entities:
Keywords: QSAR; counter-propagation artificial neural networks; genetic algorithm; hepatotoxicity; imbalanced dataset
Year: 2020 PMID: 31979300 PMCID: PMC7037161 DOI: 10.3390/molecules25030481
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
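The modified training procedure described in the abstract, drawing a class-balanced random subsample of the training set anew in each learning epoch, can be sketched as follows. This is an illustrative toy implementation, not the authors' code: `balanced_epoch_indices` and `train_cpann` are hypothetical names, classes are assumed to be labeled 0/1, and the network is reduced to bare winner-take-all Kohonen and output-layer updates with no neighborhood function.

```python
import numpy as np

def balanced_epoch_indices(y, rng):
    """Draw all minority-class objects plus an equal-sized random
    subsample of the majority class (hypothetical helper)."""
    classes, counts = np.unique(y, return_counts=True)
    min_idx = np.flatnonzero(y == classes[np.argmin(counts)])
    maj_idx = rng.choice(np.flatnonzero(y == classes[np.argmax(counts)]),
                         size=min_idx.size, replace=False)
    idx = np.concatenate([min_idx, maj_idx])
    rng.shuffle(idx)
    return idx

def train_cpann(X, y, n_neurons=25, epochs=50, lr=0.5, seed=0):
    """Toy counter-propagation network with per-epoch balanced subsampling.
    W is the Kohonen (input) layer; O is the output layer holding class weights."""
    rng = np.random.default_rng(seed)
    W = rng.random((n_neurons, X.shape[1]))
    O = np.zeros((n_neurons, 2))
    for epoch in range(epochs):
        eta = lr * (1.0 - epoch / epochs)          # decaying learning rate
        for i in balanced_epoch_indices(y, rng):   # fresh balanced draw each epoch
            win = np.argmin(((W - X[i]) ** 2).sum(axis=1))   # winning neuron
            W[win] += eta * (X[i] - W[win])                  # move toward input
            O[win] += eta * (np.eye(2)[int(y[i])] - O[win])  # move toward class target
    return W, O
```

Because a new majority-class subsample is drawn every epoch, all majority-class objects eventually contribute to training, unlike a single fixed balanced set that discards most of them.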
Figure 1. Venn diagram of selected compounds. Compounds in our dataset that we classified either as DILI-positive or DILI-negative were extracted from different literature sources. Only LiverTox contained compounds that were not present in any of the other sources.
Distribution of compounds into classes for the datasets used.
| | | TR | | TE1 | | TE2 | | VA | |
|---|---|---|---|---|---|---|---|---|---|
| Ninit | Set | Hepat. | Non-H. | Hepat. | Non-H. | Hepat. | Non-H. | Hepat. | Non-H. |
| 182 | a | 108 | 296 | 20 | 20 | 20 | 20 | 20 | 20 |
| | b | 108 | 108 | 20 | 83 | 20 | 83 | 20 | 82 |
| 98 | a | 108 | 296 | 20 | 20 | 20 | 20 | 20 | 20 |
| | b | 108 | 108 | 20 | 83 | 20 | 83 | 20 | 82 |
| 50 | a | 108 | 296 | 20 | 20 | 20 | 20 | 20 | 20 |
| | b | 108 | 108 | 20 | 83 | 20 | 83 | 20 | 82 |
TR—training set; TE1—the first test set; TE2—the second test set; VA—external validation set. Ninit—number of initial descriptors in the dataset. Hepat.—number of compounds in the hepatotoxic class. Non-H.—number of compounds in the non-hepatotoxic class. a—Imbalanced training set used during optimization. b—Manually balanced training set used during optimization.
Figure 2. Selection of validation set compounds for the first approach using the Kohonen top-map.
Figure 3. Selection of validation set compounds for the second approach using the Kohonen top-map.
Average values of sensitivity and specificity obtained for models optimized using optimization criterion 1. Average values and standard deviations are given for 100 models built using selected descriptors and different permutations of objects in the training set.
| OC1 | | TR | | TE1 | | TE2 | | VA | |
|---|---|---|---|---|---|---|---|---|---|
| Nin. | model | Sens. | Spec. | Sens. | Spec. | Sens. | Spec. | Sens. | Spec. |
| 182 | 1 a | 0.81 ± 0.02 | 0.87 ± 0.02 | 0.82 ± 0.07 | 0.84 ± 0.03 | 0.74 ± 0.07 | 0.70 ± 0.04 | 0.56 ± 0.06 | 0.68 ± 0.04 |
| 2 b | 0.92 ± 0.02 | 0.79 ± 0.02 | 0.90 ± 0.06 | 0.92 ± 0.05 | 0.76 ± 0.08 | 0.78 ± 0.09 | 0.64 ± 0.08 | 0.72 ± 0.08 | |
| 3 b | 0.91 ± 0.02 | 0.79 ± 0.02 | 0.86 ± 0.06 | 0.90 ± 0.06 | 0.72 ± 0.09 | 0.78 ± 0.09 | 0.63 ± 0.08 | 0.73 ± 0.09 | |
| 4 b | 0.92 ± 0.02 | 0.78 ± 0.02 | 0.88 ± 0.07 | 0.90 ± 0.07 | 0.71 ± 0.07 | 0.77 ± 0.08 | 0.65 ± 0.07 | 0.75 ± 0.08 | |
| 5 b | 0.86 ± 0.03 | 0.76 ± 0.03 | 0.85 ± 0.07 | 0.86 ± 0.07 | 0.72 ± 0.08 | 0.71 ± 0.08 | 0.57 ± 0.08 | 0.78 ± 0.08 | |
| 6 b | 0.92 ± 0.02 | 0.81 ± 0.02 | 0.91 ± 0.06 | 0.92 ± 0.06 | 0.72 ± 0.07 | 0.74 ± 0.09 | 0.70 ± 0.07 | 0.66 ± 0.07 | |
| 7 b | 0.88 ± 0.03 | 0.77 ± 0.03 | 0.77 ± 0.08 | 0.83 ± 0.08 | 0.71 ± 0.07 | 0.71 ± 0.07 | 0.73 ± 0.07 | 0.77 ± 0.09 | |
| 8 b | 0.88 ± 0.02 | 0.77 ± 0.03 | 0.90 ± 0.04 | 0.88 ± 0.06 | 0.77 ± 0.08 | 0.70 ± 0.08 | 0.64 ± 0.07 | 0.68 ± 0.09 | |
| 98 | 1 b | 0.89 ± 0.02 | 0.76 ± 0.02 | 0.91 ± 0.06 | 0.90 ± 0.05 | 0.72 ± 0.08 | 0.70 ± 0.09 | 0.74 ± 0.08 | 0.48 ± 0.10 |
| 2 b | 0.89 ± 0.02 | 0.77 ± 0.02 | 0.87 ± 0.05 | 0.92 ± 0.04 | 0.77 ± 0.05 | 0.70 ± 0.06 | 0.60 ± 0.06 | 0.57 ± 0.08 | |
| 3 b | 0.89 ± 0.02 | 0.77 ± 0.02 | 0.86 ± 0.05 | 0.91 ± 0.04 | 0.75 ± 0.06 | 0.71 ± 0.07 | 0.60 ± 0.06 | 0.58 ± 0.09 | |
| 4 b | 0.90 ± 0.02 | 0.77 ± 0.02 | 0.86 ± 0.05 | 0.91 ± 0.06 | 0.76 ± 0.07 | 0.71 ± 0.06 | 0.68 ± 0.07 | 0.63 ± 0.08 | |
| 5 b | 0.94 ± 0.02 | 0.82 ± 0.02 | 0.84 ± 0.07 | 0.92 ± 0.05 | 0.70 ± 0.06 | 0.71 ± 0.07 | 0.69 ± 0.06 | 0.65 ± 0.07 | |
| 6 b | 0.85 ± 0.02 | 0.71 ± 0.02 | 0.88 ± 0.07 | 0.83 ± 0.07 | 0.73 ± 0.07 | 0.73 ± 0.07 | 0.69 ± 0.06 | 0.49 ± 0.09 | |
| 7 b | 0.95 ± 0.02 | 0.80 ± 0.02 | 0.88 ± 0.05 | 0.89 ± 0.07 | 0.75 ± 0.07 | 0.77 ± 0.08 | 0.54 ± 0.08 | 0.68 ± 0.07 | |
| 8 b | 0.96 ± 0.02 | 0.79 ± 0.02 | 0.87 ± 0.06 | 0.88 ± 0.06 | 0.75 ± 0.08 | 0.74 ± 0.08 | 0.53 ± 0.07 | 0.67 ± 0.07 | |
| 9 b | 0.95 ± 0.02 | 0.81 ± 0.02 | 0.87 ± 0.05 | 0.90 ± 0.06 | 0.76 ± 0.07 | 0.79 ± 0.07 | 0.53 ± 0.08 | 0.70 ± 0.07 | |
| 10 b | 0.78 ± 0.03 | 0.73 ± 0.03 | 0.81 ± 0.08 | 0.80 ± 0.06 | 0.71 ± 0.07 | 0.75 ± 0.06 | 0.63 ± 0.06 | 0.62 ± 0.07 | |
| 11 b | 0.87 ± 0.03 | 0.71 ± 0.03 | 0.80 ± 0.08 | 0.84 ± 0.07 | 0.72 ± 0.09 | 0.76 ± 0.08 | 0.64 ± 0.08 | 0.58 ± 0.09 | |
| 12 b | 0.93 ± 0.02 | 0.80 ± 0.02 | 0.93 ± 0.07 | 0.93 ± 0.05 | 0.71 ± 0.06 | 0.77 ± 0.08 | 0.66 ± 0.07 | 0.66 ± 0.08 | |
| 13 b | 0.82 ± 0.03 | 0.75 ± 0.02 | 0.84 ± 0.06 | 0.86 ± 0.08 | 0.72 ± 0.08 | 0.74 ± 0.07 | 0.72 ± 0.08 | 0.64 ± 0.06 | |
| 14 b | 0.85 ± 0.02 | 0.76 ± 0.02 | 0.86 ± 0.07 | 0.90 ± 0.07 | 0.70 ± 0.08 | 0.72 ± 0.08 | 0.70 ± 0.07 | 0.64 ± 0.07 | |
| 15 b | 0.79 ± 0.03 | 0.73 ± 0.02 | 0.90 ± 0.05 | 0.92 ± 0.06 | 0.77 ± 0.05 | 0.72 ± 0.07 | 0.64 ± 0.05 | 0.77 ± 0.07 | |
| 16 b | 0.79 ± 0.03 | 0.72 ± 0.02 | 0.89 ± 0.07 | 0.93 ± 0.05 | 0.80 ± 0.06 | 0.71 ± 0.06 | 0.64 ± 0.06 | 0.78 ± 0.07 | |
| 50 | 1 a | 0.77 ± 0.02 | 0.71 ± 0.02 | 0.88 ± 0.04 | 0.83 ± 0.04 | 0.77 ± 0.04 | 0.73 ± 0.04 | 0.64 ± 0.03 | 0.68 ± 0.04 |
| 2 a | 0.77 ± 0.02 | 0.72 ± 0.02 | 0.86 ± 0.04 | 0.84 ± 0.02 | 0.76 ± 0.04 | 0.75 ± 0.03 | 0.63 ± 0.04 | 0.71 ± 0.04 | |
| 3 a | 0.77 ± 0.02 | 0.71 ± 0.02 | 0.89 ± 0.04 | 0.82 ± 0.04 | 0.77 ± 0.03 | 0.73 ± 0.05 | 0.64 ± 0.03 | 0.68 ± 0.04 | |
| 4 a | 0.81 ± 0.02 | 0.79 ± 0.02 | 0.85 ± 0.08 | 0.83 ± 0.03 | 0.72 ± 0.06 | 0.71 ± 0.03 | 0.59 ± 0.07 | 0.73 ± 0.04 | |
| 5 a | 0.86 ± 0.03 | 0.86 ± 0.02 | 0.84 ± 0.06 | 0.81 ± 0.03 | 0.73 ± 0.08 | 0.71 ± 0.04 | 0.58 ± 0.08 | 0.70 ± 0.04 | |
| 6 b | 0.84 ± 0.03 | 0.72 ± 0.03 | 0.80 ± 0.07 | 0.84 ± 0.06 | 0.70 ± 0.07 | 0.71 ± 0.07 | 0.61 ± 0.08 | 0.64 ± 0.09 | |
OC1—optimization criterion 1; TR—Training set; TE1—The first test set; TE2—The second test set; VA—External validation set. Nin.—Number of initial descriptors in the dataset; model—Model number; Sens.—Sensitivity; Spec.—Specificity. a Imbalanced training set used during optimization. b Manually balanced training set used during optimization.
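The sensitivity (Sens.) and specificity (Spec.) values reported throughout these tables are the standard binary-classification rates TP/(TP + FN) and TN/(TN + FP). A minimal computation, assuming the hepatotoxic class is coded as the positive class 1 (the function name is illustrative, not from the paper):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP),
    with class 1 (hepatotoxic) taken as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

On an imbalanced set, reporting both rates matters: a model that always predicts the majority class scores high specificity but zero sensitivity, which is why the tables list the pair rather than overall accuracy.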
Average values of sensitivity and specificity obtained for models optimized using optimization criterion 2. Average values and standard deviations are given for 100 models built using selected descriptors and different permutations of objects in the training set.
| OC2 | | TR | | TE1 | | TE2 | | VA | |
|---|---|---|---|---|---|---|---|---|---|
| Nin. | model | Sens. | Spec. | Sens. | Spec. | Sens. | Spec. | Sens. | Spec. |
| 182 | 1 a | 0.92 ± 0.02 | 0.92 ± 0.02 | 0.93 ± 0.05 | 0.85 ± 0.03 | 0.76 ± 0.07 | 0.72 ± 0.04 | 0.39 ± 0.09 | 0.69 ± 0.04 |
| 2 a | 0.93 ± 0.02 | 0.92 ± 0.02 | 0.94 ± 0.05 | 0.85 ± 0.04 | 0.76 ± 0.07 | 0.72 ± 0.04 | 0.39 ± 0.08 | 0.67 ± 0.04 | |
| 3 a | 0.90 ± 0.03 | 0.90 ± 0.03 | 0.83 ± 0.07 | 0.86 ± 0.04 | 0.72 ± 0.08 | 0.71 ± 0.04 | 0.50 ± 0.08 | 0.76 ± 0.04 | |
| 4 a | 0.89 ± 0.03 | 0.89 ± 0.03 | 0.84 ± 0.07 | 0.85 ± 0.04 | 0.74 ± 0.07 | 0.73 ± 0.04 | 0.50 ± 0.10 | 0.76 ± 0.04 | |
| 5 a | 0.77 ± 0.02 | 0.79 ± 0.03 | 0.82 ± 0.06 | 0.86 ± 0.04 | 0.70 ± 0.09 | 0.74 ± 0.04 | 0.61 ± 0.08 | 0.77 ± 0.04 | |
| 6 a | 0.80 ± 0.02 | 0.84 ± 0.02 | 0.91 ± 0.05 | 0.90 ± 0.04 | 0.73 ± 0.08 | 0.72 ± 0.04 | 0.58 ± 0.06 | 0.73 ± 0.03 | |
| 7 a | 0.81 ± 0.03 | 0.84 ± 0.02 | 0.93 ± 0.04 | 0.88 ± 0.04 | 0.71 ± 0.06 | 0.74 ± 0.04 | 0.56 ± 0.07 | 0.72 ± 0.03 | |
| 8 a | 0.82 ± 0.03 | 0.86 ± 0.02 | 0.84 ± 0.08 | 0.79 ± 0.04 | 0.73 ± 0.08 | 0.72 ± 0.04 | 0.57 ± 0.08 | 0.73 ± 0.04 | |
| 9 a | 0.82 ± 0.03 | 0.87 ± 0.02 | 0.86 ± 0.08 | 0.78 ± 0.04 | 0.74 ± 0.09 | 0.71 ± 0.04 | 0.59 ± 0.09 | 0.70 ± 0.04 | |
| 10 b | 0.92 ± 0.02 | 0.79 ± 0.02 | 0.85 ± 0.06 | 0.95 ± 0.05 | 0.72 ± 0.08 | 0.72 ± 0.08 | 0.65 ± 0.07 | 0.76 ± 0.08 | |
| 11 b | 0.92 ± 0.02 | 0.79 ± 0.02 | 0.86 ± 0.06 | 0.94 ± 0.05 | 0.71 ± 0.07 | 0.76 ± 0.07 | 0.68 ± 0.06 | 0.69 ± 0.10 | |
| 12 b | 0.92 ± 0.02 | 0.79 ± 0.02 | 0.85 ± 0.06 | 0.95 ± 0.05 | 0.72 ± 0.07 | 0.76 ± 0.07 | 0.67 ± 0.07 | 0.68 ± 0.07 | |
| 13 b | 0.91 ± 0.02 | 0.78 ± 0.02 | 0.83 ± 0.08 | 0.91 ± 0.06 | 0.76 ± 0.07 | 0.75 ± 0.08 | 0.63 ± 0.07 | 0.69 ± 0.08 | |
| 14 b | 0.92 ± 0.02 | 0.77 ± 0.02 | 0.84 ± 0.08 | 0.89 ± 0.06 | 0.73 ± 0.07 | 0.76 ± 0.09 | 0.62 ± 0.07 | 0.71 ± 0.08 | |
| 15 b | 0.95 ± 0.02 | 0.85 ± 0.02 | 0.91 ± 0.05 | 0.95 ± 0.05 | 0.70 ± 0.09 | 0.74 ± 0.08 | 0.67 ± 0.06 | 0.81 ± 0.07 | |
| 16 b | 0.95 ± 0.02 | 0.84 ± 0.02 | 0.92 ± 0.04 | 0.95 ± 0.05 | 0.72 ± 0.07 | 0.73 ± 0.07 | 0.69 ± 0.05 | 0.81 ± 0.07 | |
| 17 b | 0.95 ± 0.02 | 0.84 ± 0.02 | 0.92 ± 0.04 | 0.95 ± 0.05 | 0.73 ± 0.09 | 0.75 ± 0.08 | 0.68 ± 0.06 | 0.80 ± 0.07 | |
| 18 b | 0.88 ± 0.02 | 0.75 ± 0.02 | 0.85 ± 0.07 | 0.92 ± 0.06 | 0.72 ± 0.07 | 0.72 ± 0.07 | 0.61 ± 0.10 | 0.66 ± 0.08 | |
| 19 b | 0.90 ± 0.02 | 0.78 ± 0.02 | 0.84 ± 0.07 | 0.82 ± 0.07 | 0.74 ± 0.08 | 0.71 ± 0.08 | 0.59 ± 0.08 | 0.63 ± 0.09 | |
| 20 b | 0.90 ± 0.02 | 0.77 ± 0.02 | 0.80 ± 0.09 | 0.81 ± 0.09 | 0.75 ± 0.06 | 0.72 ± 0.09 | 0.71 ± 0.09 | 0.64 ± 0.09 | |
| 21 b | 0.91 ± 0.02 | 0.79 ± 0.02 | 0.79 ± 0.08 | 0.81 ± 0.07 | 0.73 ± 0.05 | 0.72 ± 0.07 | 0.60 ± 0.06 | 0.60 ± 0.08 | |
| 98 | 1 b | 0.82 ± 0.03 | 0.75 ± 0.02 | 0.86 ± 0.08 | 0.82 ± 0.08 | 0.73 ± 0.07 | 0.77 ± 0.08 | 0.70 ± 0.07 | 0.64 ± 0.09 |
| 2 b | 0.91 ± 0.02 | 0.78 ± 0.02 | 0.78 ± 0.08 | 0.91 ± 0.06 | 0.71 ± 0.08 | 0.71 ± 0.08 | 0.61 ± 0.08 | 0.74 ± 0.08 | |
| 3 b | 0.89 ± 0.02 | 0.76 ± 0.02 | 0.86 ± 0.08 | 0.91 ± 0.06 | 0.70 ± 0.07 | 0.74 ± 0.06 | 0.64 ± 0.07 | 0.59 ± 0.08 | |
| 4 b | 0.89 ± 0.03 | 0.73 ± 0.02 | 0.89 ± 0.07 | 0.92 ± 0.05 | 0.73 ± 0.07 | 0.72 ± 0.08 | 0.67 ± 0.08 | 0.56 ± 0.09 | |
| 5 b | 0.85 ± 0.02 | 0.78 ± 0.02 | 0.84 ± 0.06 | 0.93 ± 0.06 | 0.77 ± 0.06 | 0.79 ± 0.10 | 0.63 ± 0.08 | 0.74 ± 0.06 | |
| 6 b | 0.88 ± 0.03 | 0.77 ± 0.03 | 0.78 ± 0.08 | 0.81 ± 0.09 | 0.75 ± 0.07 | 0.79 ± 0.07 | 0.70 ± 0.09 | 0.71 ± 0.06 | |
| 7 b | 0.87 ± 0.02 | 0.78 ± 0.02 | 0.78 ± 0.07 | 0.90 ± 0.08 | 0.76 ± 0.06 | 0.84 ± 0.07 | 0.62 ± 0.07 | 0.73 ± 0.07 | |
| 8 b | 0.93 ± 0.02 | 0.76 ± 0.02 | 0.86 ± 0.07 | 0.93 ± 0.05 | 0.81 ± 0.07 | 0.72 ± 0.08 | 0.68 ± 0.06 | 0.66 ± 0.06 | |
| 9 b | 0.91 ± 0.02 | 0.76 ± 0.02 | 0.83 ± 0.08 | 0.91 ± 0.06 | 0.79 ± 0.06 | 0.75 ± 0.07 | 0.60 ± 0.06 | 0.68 ± 0.08 | |
| 10 b | 0.91 ± 0.02 | 0.76 ± 0.02 | 0.80 ± 0.08 | 0.90 ± 0.06 | 0.78 ± 0.07 | 0.75 ± 0.07 | 0.60 ± 0.08 | 0.68 ± 0.07 | |
| 11 b | 0.91 ± 0.02 | 0.75 ± 0.02 | 0.81 ± 0.08 | 0.91 ± 0.06 | 0.77 ± 0.07 | 0.75 ± 0.08 | 0.60 ± 0.08 | 0.67 ± 0.09 | |
| 12 b | 0.82 ± 0.03 | 0.70 ± 0.03 | 0.84 ± 0.08 | 0.85 ± 0.07 | 0.75 ± 0.08 | 0.76 ± 0.08 | 0.68 ± 0.09 | 0.59 ± 0.07 | |
| 13 b | 0.86 ± 0.03 | 0.74 ± 0.03 | 0.83 ± 0.08 | 0.88 ± 0.06 | 0.70 ± 0.07 | 0.71 ± 0.07 | 0.55 ± 0.08 | 0.71 ± 0.08 | |
| 14 b | 0.91 ± 0.02 | 0.78 ± 0.02 | 0.80 ± 0.08 | 0.90 ± 0.05 | 0.70 ± 0.08 | 0.82 ± 0.06 | 0.67 ± 0.07 | 0.55 ± 0.08 | |
| 15 b | 0.91 ± 0.02 | 0.77 ± 0.02 | 0.81 ± 0.08 | 0.89 ± 0.06 | 0.71 ± 0.06 | 0.81 ± 0.06 | 0.68 ± 0.08 | 0.57 ± 0.08 | |
| 16 b | 0.86 ± 0.02 | 0.78 ± 0.02 | 0.88 ± 0.07 | 0.93 ± 0.06 | 0.73 ± 0.08 | 0.75 ± 0.06 | 0.64 ± 0.06 | 0.73 ± 0.07 | |
| 17 b | 0.85 ± 0.03 | 0.79 ± 0.02 | 0.85 ± 0.08 | 0.92 ± 0.06 | 0.72 ± 0.08 | 0.79 ± 0.05 | 0.64 ± 0.06 | 0.74 ± 0.07 | |
| 18 b | 0.81 ± 0.02 | 0.79 ± 0.02 | 0.87 ± 0.06 | 0.92 ± 0.05 | 0.76 ± 0.06 | 0.81 ± 0.06 | 0.65 ± 0.06 | 0.75 ± 0.07 | |
| 19 b | 0.91 ± 0.02 | 0.77 ± 0.02 | 0.89 ± 0.07 | 0.88 ± 0.07 | 0.73 ± 0.09 | 0.71 ± 0.09 | 0.61 ± 0.08 | 0.51 ± 0.09 | |
| 20 b | 0.87 ± 0.02 | 0.73 ± 0.02 | 0.78 ± 0.08 | 0.83 ± 0.06 | 0.73 ± 0.08 | 0.73 ± 0.07 | 0.64 ± 0.07 | 0.56 ± 0.08 | |
| 21 b | 0.90 ± 0.02 | 0.72 ± 0.02 | 0.81 ± 0.07 | 0.88 ± 0.06 | 0.71 ± 0.07 | 0.72 ± 0.08 | 0.53 ± 0.08 | 0.63 ± 0.09 | |
| 50 | 1 a | 0.76 ± 0.03 | 0.74 ± 0.03 | 0.85 ± 0.07 | 0.78 ± 0.04 | 0.71 ± 0.08 | 0.72 ± 0.05 | 0.61 ± 0.08 | 0.70 ± 0.04 |
| 2 a | 0.77 ± 0.03 | 0.74 ± 0.04 | 0.86 ± 0.07 | 0.78 ± 0.05 | 0.72 ± 0.07 | 0.71 ± 0.04 | 0.61 ± 0.07 | 0.69 ± 0.04 | |
| 3 a | 0.77 ± 0.04 | 0.74 ± 0.04 | 0.85 ± 0.08 | 0.78 ± 0.04 | 0.71 ± 0.08 | 0.71 ± 0.05 | 0.60 ± 0.07 | 0.69 ± 0.04 | |
| 4 b | 0.97 ± 0.01 | 0.83 ± 0.02 | 0.88 ± 0.06 | 0.85 ± 0.08 | 0.71 ± 0.08 | 0.72 ± 0.08 | 0.55 ± 0.07 | 0.79 ± 0.08 | |
| 5 b | 0.96 ± 0.02 | 0.83 ± 0.02 | 0.87 ± 0.07 | 0.85 ± 0.06 | 0.71 ± 0.08 | 0.70 ± 0.08 | 0.59 ± 0.07 | 0.75 ± 0.08 | |
OC2—optimization criterion 2; TR—Training set; TE1—The first test set; TE2—The second test set; VA—External validation set. Nin.—Number of initial descriptors in the dataset; model—Model number; Sens.—Sensitivity; Spec.—Specificity. a Imbalanced training set used during optimization. b Manually balanced training set used during optimization.
Average values of sensitivity and specificity obtained for models optimized using optimization criterion 3. Average values and standard deviations are given for 100 models built using selected descriptors and different permutations of objects in the training set.
| OC3 | | TR | | TE1 | | TE2 | | VA | |
|---|---|---|---|---|---|---|---|---|---|
| Nin. | model | Sens. | Spec. | Sens. | Spec. | Sens. | Spec. | Sens. | Spec. |
| 182 | 1 a | 0.91 ± 0.03 | 0.91 ± 0.03 | 0.88 ± 0.06 | 0.81 ± 0.04 | 0.71 ± 0.08 | 0.70 ± 0.04 | 0.47 ± 0.08 | 0.74 ± 0.04 |
| 2 a | 0.90 ± 0.03 | 0.90 ± 0.03 | 0.83 ± 0.06 | 0.86 ± 0.04 | 0.70 ± 0.09 | 0.71 ± 0.04 | 0.52 ± 0.09 | 0.74 ± 0.04 | |
| 3 a | 0.86 ± 0.03 | 0.85 ± 0.03 | 0.86 ± 0.06 | 0.83 ± 0.05 | 0.77 ± 0.08 | 0.70 ± 0.04 | 0.50 ± 0.10 | 0.73 ± 0.04 | |
| 4 b | 0.94 ± 0.02 | 0.82 ± 0.02 | 0.93 ± 0.06 | 0.89 ± 0.06 | 0.74 ± 0.07 | 0.72 ± 0.08 | 0.63 ± 0.08 | 0.65 ± 0.07 | |
| 5 b | 0.94 ± 0.02 | 0.81 ± 0.02 | 0.94 ± 0.05 | 0.91 ± 0.06 | 0.73 ± 0.08 | 0.71 ± 0.08 | 0.65 ± 0.07 | 0.64 ± 0.07 | |
| 6 b | 0.93 ± 0.02 | 0.84 ± 0.02 | 0.90 ± 0.04 | 0.96 ± 0.05 | 0.72 ± 0.07 | 0.71 ± 0.09 | 0.69 ± 0.06 | 0.66 ± 0.09 | |
| 7 b | 0.93 ± 0.02 | 0.83 ± 0.02 | 0.91 ± 0.04 | 0.96 ± 0.05 | 0.73 ± 0.07 | 0.71 ± 0.08 | 0.66 ± 0.07 | 0.66 ± 0.08 | |
| 8 b | 0.93 ± 0.02 | 0.82 ± 0.02 | 0.90 ± 0.05 | 0.92 ± 0.06 | 0.73 ± 0.08 | 0.71 ± 0.07 | 0.61 ± 0.06 | 0.74 ± 0.07 | |
| 9 b | 0.93 ± 0.02 | 0.82 ± 0.02 | 0.89 ± 0.05 | 0.92 ± 0.06 | 0.76 ± 0.07 | 0.73 ± 0.07 | 0.63 ± 0.07 | 0.75 ± 0.07 | |
| 10 b | 0.89 ± 0.03 | 0.77 ± 0.03 | 0.84 ± 0.06 | 0.90 ± 0.07 | 0.75 ± 0.08 | 0.72 ± 0.09 | 0.73 ± 0.07 | 0.64 ± 0.07 | |
| 11 b | 0.88 ± 0.02 | 0.78 ± 0.02 | 0.89 ± 0.07 | 0.91 ± 0.06 | 0.70 ± 0.08 | 0.72 ± 0.09 | 0.64 ± 0.08 | 0.68 ± 0.08 | |
| 12 b | 0.86 ± 0.02 | 0.81 ± 0.03 | 0.88 ± 0.08 | 0.96 ± 0.06 | 0.75 ± 0.07 | 0.73 ± 0.08 | 0.65 ± 0.07 | 0.72 ± 0.07 | |
| 13 b | 0.87 ± 0.02 | 0.81 ± 0.02 | 0.86 ± 0.07 | 0.95 ± 0.05 | 0.73 ± 0.07 | 0.72 ± 0.09 | 0.68 ± 0.06 | 0.72 ± 0.08 | |
| 14 b | 0.87 ± 0.02 | 0.80 ± 0.02 | 0.87 ± 0.07 | 0.95 ± 0.05 | 0.75 ± 0.08 | 0.72 ± 0.08 | 0.67 ± 0.07 | 0.73 ± 0.07 | |
| 15 b | 0.92 ± 0.02 | 0.83 ± 0.02 | 0.84 ± 0.06 | 0.92 ± 0.06 | 0.72 ± 0.07 | 0.71 ± 0.08 | 0.63 ± 0.06 | 0.73 ± 0.06 | |
| 16 b | 0.93 ± 0.02 | 0.83 ± 0.02 | 0.85 ± 0.07 | 0.91 ± 0.06 | 0.72 ± 0.07 | 0.71 ± 0.09 | 0.64 ± 0.06 | 0.78 ± 0.08 | |
| 17 b | 0.94 ± 0.02 | 0.81 ± 0.02 | 0.93 ± 0.06 | 0.91 ± 0.06 | 0.75 ± 0.07 | 0.77 ± 0.08 | 0.66 ± 0.06 | 0.72 ± 0.08 | |
| 18 b | 0.94 ± 0.02 | 0.81 ± 0.02 | 0.92 ± 0.06 | 0.91 ± 0.07 | 0.74 ± 0.08 | 0.72 ± 0.08 | 0.65 ± 0.05 | 0.72 ± 0.08 | |
| 19 b | 0.96 ± 0.01 | 0.86 ± 0.02 | 0.92 ± 0.06 | 0.95 ± 0.05 | 0.70 ± 0.07 | 0.70 ± 0.06 | 0.58 ± 0.06 | 0.72 ± 0.06 | |
| 98 | 1 a | 0.86 ± 0.03 | 0.89 ± 0.02 | 0.91 ± 0.06 | 0.76 ± 0.04 | 0.71 ± 0.08 | 0.71 ± 0.05 | 0.52 ± 0.10 | 0.65 ± 0.04 |
| 2 b | 0.95 ± 0.02 | 0.82 ± 0.02 | 0.79 ± 0.07 | 0.93 ± 0.06 | 0.70 ± 0.06 | 0.77 ± 0.06 | 0.57 ± 0.07 | 0.68 ± 0.06 | |
| 3 b | 0.91 ± 0.02 | 0.78 ± 0.02 | 0.82 ± 0.07 | 0.88 ± 0.06 | 0.70 ± 0.08 | 0.71 ± 0.07 | 0.68 ± 0.08 | 0.61 ± 0.08 | |
| 4 b | 0.91 ± 0.02 | 0.79 ± 0.02 | 0.80 ± 0.07 | 0.90 ± 0.05 | 0.73 ± 0.07 | 0.71 ± 0.06 | 0.69 ± 0.07 | 0.64 ± 0.07 | |
| 5 b | 0.92 ± 0.02 | 0.80 ± 0.02 | 0.85 ± 0.07 | 0.95 ± 0.05 | 0.71 ± 0.09 | 0.74 ± 0.07 | 0.63 ± 0.07 | 0.65 ± 0.07 | |
| 6 b | 0.87 ± 0.02 | 0.77 ± 0.02 | 0.78 ± 0.08 | 0.87 ± 0.07 | 0.73 ± 0.06 | 0.72 ± 0.07 | 0.63 ± 0.07 | 0.63 ± 0.07 | |
| 7 b | 0.86 ± 0.02 | 0.75 ± 0.02 | 0.79 ± 0.07 | 0.87 ± 0.07 | 0.73 ± 0.07 | 0.71 ± 0.07 | 0.62 ± 0.07 | 0.63 ± 0.08 | |
| 8 b | 0.87 ± 0.02 | 0.76 ± 0.02 | 0.79 ± 0.08 | 0.85 ± 0.08 | 0.73 ± 0.06 | 0.73 ± 0.08 | 0.60 ± 0.08 | 0.61 ± 0.09 | |
| 9 b | 0.86 ± 0.03 | 0.75 ± 0.03 | 0.73 ± 0.09 | 0.92 ± 0.06 | 0.72 ± 0.07 | 0.72 ± 0.08 | 0.67 ± 0.09 | 0.64 ± 0.08 | |
| 10 b | 0.87 ± 0.02 | 0.77 ± 0.02 | 0.74 ± 0.09 | 0.93 ± 0.06 | 0.75 ± 0.06 | 0.75 ± 0.06 | 0.70 ± 0.08 | 0.58 ± 0.09 | |
| 11 b | 0.92 ± 0.02 | 0.80 ± 0.02 | 0.84 ± 0.07 | 0.91 ± 0.06 | 0.76 ± 0.08 | 0.80 ± 0.06 | 0.71 ± 0.07 | 0.68 ± 0.07 | |
| 12 b | 0.92 ± 0.02 | 0.80 ± 0.02 | 0.84 ± 0.06 | 0.90 ± 0.06 | 0.71 ± 0.07 | 0.80 ± 0.08 | 0.66 ± 0.08 | 0.68 ± 0.08 | |
| 13 b | 0.92 ± 0.02 | 0.79 ± 0.02 | 0.86 ± 0.07 | 0.93 ± 0.05 | 0.73 ± 0.08 | 0.80 ± 0.08 | 0.68 ± 0.07 | 0.68 ± 0.07 | |
| 14 b | 0.86 ± 0.02 | 0.79 ± 0.03 | 0.74 ± 0.06 | 0.92 ± 0.06 | 0.71 ± 0.08 | 0.78 ± 0.08 | 0.58 ± 0.07 | 0.73 ± 0.09 | |
| 15 b | 0.86 ± 0.03 | 0.79 ± 0.03 | 0.72 ± 0.07 | 0.91 ± 0.05 | 0.72 ± 0.08 | 0.78 ± 0.08 | 0.53 ± 0.07 | 0.68 ± 0.09 | |
| 16 b | 0.92 ± 0.02 | 0.79 ± 0.02 | 0.88 ± 0.06 | 0.90 ± 0.05 | 0.75 ± 0.05 | 0.72 ± 0.07 | 0.64 ± 0.06 | 0.64 ± 0.08 | |
| 17 b | 0.93 ± 0.02 | 0.80 ± 0.02 | 0.73 ± 0.09 | 0.96 ± 0.04 | 0.70 ± 0.07 | 0.71 ± 0.08 | 0.64 ± 0.07 | 0.70 ± 0.08 | |
| 18 b | 0.93 ± 0.02 | 0.80 ± 0.02 | 0.79 ± 0.08 | 0.93 ± 0.05 | 0.72 ± 0.07 | 0.77 ± 0.08 | 0.63 ± 0.07 | 0.68 ± 0.08 | |
| 19 b | 0.92 ± 0.02 | 0.81 ± 0.02 | 0.74 ± 0.08 | 0.87 ± 0.06 | 0.73 ± 0.08 | 0.79 ± 0.06 | 0.67 ± 0.09 | 0.65 ± 0.08 | |
| 20 b | 0.91 ± 0.02 | 0.81 ± 0.02 | 0.83 ± 0.07 | 0.91 ± 0.05 | 0.74 ± 0.07 | 0.76 ± 0.07 | 0.56 ± 0.08 | 0.68 ± 0.08 | |
| 21 b | 0.91 ± 0.02 | 0.81 ± 0.02 | 0.83 ± 0.08 | 0.92 ± 0.06 | 0.74 ± 0.07 | 0.75 ± 0.08 | 0.57 ± 0.09 | 0.66 ± 0.08 | |
| 22 b | 0.92 ± 0.02 | 0.81 ± 0.02 | 0.85 ± 0.07 | 0.95 ± 0.05 | 0.76 ± 0.07 | 0.81 ± 0.07 | 0.60 ± 0.08 | 0.68 ± 0.09 | |
| 23 b | 0.91 ± 0.02 | 0.80 ± 0.02 | 0.77 ± 0.08 | 0.87 ± 0.07 | 0.72 ± 0.08 | 0.70 ± 0.07 | 0.63 ± 0.07 | 0.65 ± 0.08 | |
| 24 b | 0.91 ± 0.02 | 0.80 ± 0.03 | 0.78 ± 0.09 | 0.89 ± 0.05 | 0.72 ± 0.08 | 0.73 ± 0.07 | 0.59 ± 0.07 | 0.64 ± 0.07 | |
| 25 b | 0.91 ± 0.02 | 0.81 ± 0.02 | 0.84 ± 0.08 | 0.96 ± 0.04 | 0.71 ± 0.07 | 0.74 ± 0.07 | 0.58 ± 0.07 | 0.69 ± 0.09 | |
| 26 b | 0.94 ± 0.02 | 0.84 ± 0.02 | 0.78 ± 0.08 | 0.92 ± 0.06 | 0.83 ± 0.06 | 0.72 ± 0.08 | 0.62 ± 0.08 | 0.66 ± 0.07 | |
| 27 b | 0.94 ± 0.02 | 0.84 ± 0.02 | 0.81 ± 0.09 | 0.93 ± 0.06 | 0.83 ± 0.06 | 0.74 ± 0.07 | 0.59 ± 0.08 | 0.66 ± 0.06 | |
| 28 b | 0.94 ± 0.02 | 0.84 ± 0.02 | 0.81 ± 0.07 | 0.92 ± 0.06 | 0.83 ± 0.05 | 0.72 ± 0.09 | 0.61 ± 0.08 | 0.63 ± 0.06 | |
| 29 b | 0.96 ± 0.02 | 0.85 ± 0.02 | 0.83 ± 0.08 | 0.95 ± 0.05 | 0.70 ± 0.09 | 0.74 ± 0.08 | 0.63 ± 0.10 | 0.63 ± 0.09 | |
| 30 b | 0.96 ± 0.01 | 0.85 ± 0.02 | 0.83 ± 0.07 | 0.92 ± 0.06 | 0.72 ± 0.07 | 0.70 ± 0.07 | 0.54 ± 0.07 | 0.66 ± 0.08 | |
| 31 b | 0.97 ± 0.02 | 0.86 ± 0.02 | 0.82 ± 0.07 | 0.94 ± 0.04 | 0.72 ± 0.07 | 0.72 ± 0.07 | 0.54 ± 0.07 | 0.67 ± 0.07 | |
| 32 b | 0.96 ± 0.02 | 0.85 ± 0.02 | 0.85 ± 0.07 | 0.95 ± 0.04 | 0.71 ± 0.08 | 0.71 ± 0.06 | 0.55 ± 0.07 | 0.62 ± 0.07 | |
| 33 b | 0.88 ± 0.03 | 0.77 ± 0.03 | 0.82 ± 0.08 | 0.91 ± 0.07 | 0.78 ± 0.07 | 0.73 ± 0.08 | 0.67 ± 0.08 | 0.68 ± 0.08 | |
| 34 b | 0.88 ± 0.03 | 0.77 ± 0.03 | 0.80 ± 0.08 | 0.91 ± 0.06 | 0.77 ± 0.07 | 0.73 ± 0.07 | 0.66 ± 0.08 | 0.70 ± 0.08 | |
| 35 b | 0.88 ± 0.03 | 0.77 ± 0.03 | 0.80 ± 0.07 | 0.91 ± 0.06 | 0.75 ± 0.08 | 0.72 ± 0.06 | 0.66 ± 0.09 | 0.68 ± 0.08 | |
| 36 b | 0.96 ± 0.02 | 0.83 ± 0.02 | 0.74 ± 0.07 | 0.93 ± 0.06 | 0.79 ± 0.06 | 0.71 ± 0.06 | 0.65 ± 0.07 | 0.66 ± 0.07 | |
| 37 b | 0.95 ± 0.02 | 0.82 ± 0.02 | 0.73 ± 0.07 | 0.94 ± 0.05 | 0.79 ± 0.06 | 0.71 ± 0.06 | 0.65 ± 0.06 | 0.65 ± 0.07 | |
| 38 b | 0.96 ± 0.02 | 0.82 ± 0.02 | 0.76 ± 0.07 | 0.90 ± 0.06 | 0.78 ± 0.06 | 0.70 ± 0.07 | 0.65 ± 0.08 | 0.65 ± 0.06 | |
| 39 b | 0.86 ± 0.03 | 0.71 ± 0.03 | 0.85 ± 0.08 | 0.83 ± 0.06 | 0.70 ± 0.09 | 0.72 ± 0.08 | 0.68 ± 0.07 | 0.63 ± 0.09 | |
| 40 b | 0.87 ± 0.02 | 0.71 ± 0.03 | 0.82 ± 0.08 | 0.82 ± 0.06 | 0.74 ± 0.08 | 0.71 ± 0.08 | 0.66 ± 0.06 | 0.61 ± 0.08 | |
| 41 b | 0.83 ± 0.02 | 0.77 ± 0.02 | 0.80 ± 0.08 | 0.90 ± 0.06 | 0.71 ± 0.07 | 0.78 ± 0.06 | 0.54 ± 0.07 | 0.59 ± 0.08 | |
| 42 b | 0.82 ± 0.02 | 0.76 ± 0.03 | 0.80 ± 0.07 | 0.90 ± 0.06 | 0.76 ± 0.08 | 0.73 ± 0.07 | 0.53 ± 0.07 | 0.54 ± 0.06 | |
| 43 b | 0.83 ± 0.03 | 0.74 ± 0.02 | 0.81 ± 0.08 | 0.91 ± 0.05 | 0.76 ± 0.08 | 0.73 ± 0.06 | 0.56 ± 0.07 | 0.54 ± 0.07 | |
| 44 b | 0.95 ± 0.02 | 0.79 ± 0.02 | 0.71 ± 0.09 | 0.90 ± 0.04 | 0.74 ± 0.07 | 0.71 ± 0.08 | 0.64 ± 0.07 | 0.69 ± 0.07 | |
| 45 b | 0.94 ± 0.02 | 0.81 ± 0.02 | 0.84 ± 0.06 | 0.90 ± 0.06 | 0.71 ± 0.07 | 0.79 ± 0.06 | 0.76 ± 0.07 | 0.65 ± 0.07 | |
| 50 | 1 a | 0.92 ± 0.02 | 0.92 ± 0.02 | 0.82 ± 0.05 | 0.79 ± 0.04 | 0.72 ± 0.07 | 0.70 ± 0.03 | 0.53 ± 0.06 | 0.69 ± 0.04 |
| 2 a | 0.91 ± 0.02 | 0.93 ± 0.02 | 0.83 ± 0.07 | 0.83 ± 0.04 | 0.72 ± 0.07 | 0.70 ± 0.04 | 0.60 ± 0.07 | 0.66 ± 0.04 | |
| 3 b | 0.92 ± 0.02 | 0.78 ± 0.02 | 0.82 ± 0.07 | 0.83 ± 0.07 | 0.71 ± 0.09 | 0.71 ± 0.06 | 0.61 ± 0.07 | 0.77 ± 0.08 | |
| 4 b | 0.87 ± 0.03 | 0.76 ± 0.02 | 0.84 ± 0.09 | 0.85 ± 0.07 | 0.73 ± 0.08 | 0.71 ± 0.07 | 0.66 ± 0.07 | 0.66 ± 0.08 | |
| 5 b | 0.87 ± 0.03 | 0.75 ± 0.02 | 0.85 ± 0.07 | 0.83 ± 0.09 | 0.72 ± 0.08 | 0.72 ± 0.08 | 0.66 ± 0.07 | 0.66 ± 0.07 | |
| 6 b | 0.95 ± 0.02 | 0.83 ± 0.02 | 0.80 ± 0.08 | 0.93 ± 0.06 | 0.72 ± 0.07 | 0.73 ± 0.07 | 0.55 ± 0.08 | 0.62 ± 0.07 | |
| 7 b | 0.96 ± 0.02 | 0.83 ± 0.02 | 0.81 ± 0.07 | 0.93 ± 0.06 | 0.71 ± 0.08 | 0.74 ± 0.07 | 0.58 ± 0.07 | 0.63 ± 0.07 | |
OC3—optimization criterion 3; TR—Training set; TE1—The first test set; TE2—The second test set; VA—External validation set. Nin.—Number of initial descriptors in the dataset; model—Model number; Sens.—Sensitivity; Spec.—Specificity. a Imbalanced training set used during optimization. b Manually balanced training set used during optimization.
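The genetic algorithm optimization used to select descriptors operates on binary inclusion masks. A generic one-point-crossover GA sketch is shown below; it is not the authors' implementation, and all names and parameter values (`ga_select`, population size, mutation rate) are illustrative. It assumes a user-supplied `fitness(mask)` that builds and internally validates a model on the selected descriptors.

```python
import numpy as np

def ga_select(fitness, n_desc, pop=20, gens=30, p_mut=0.02, seed=0):
    """Generic GA over binary descriptor masks (illustrative sketch).
    Keeps the top half each generation (elitism) and refills the bottom
    half with mutated one-point-crossover children."""
    rng = np.random.default_rng(seed)
    P = rng.random((pop, n_desc)) < 0.5          # random initial population
    for _ in range(gens):
        scores = np.array([fitness(m) for m in P])
        P = P[np.argsort(scores)[::-1]]          # sort best-first
        children = []
        while len(children) < pop // 2:
            a, b = P[rng.integers(0, pop // 2, size=2)]  # parents from top half
            cut = rng.integers(1, n_desc)
            child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
            flip = rng.random(n_desc) < p_mut            # bit-flip mutation
            children.append(np.where(flip, ~child, child))
        P[pop // 2:] = children
    scores = np.array([fitness(m) for m in P])
    return P[np.argmax(scores)]
```

In the study's setting, `fitness` would score a CPANN model trained on the masked descriptors against the internal test sets; elitism guarantees the best mask found so far is never lost between generations.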
Most important descriptors for models using the first modeling approach.
| Descriptor | Description | Id |
|---|---|---|
| J_D/Dt | Balaban-like index from distance/detour matrix | 8.945987 |
| GATS5v | Geary autocorrelation of lag 5 weighted by van der Waals volume | 8.042931 |
| H% | Percentage of H atoms | 7.579833 |
| SpMin1_Bh(s) | Smallest eigenvalue n. 1 of Burden matrix weighted by I-state | 6.506972 |
| CATS2D_02_AA | CATS2D Acceptor-Acceptor at lag 02 | 5.406386 |
| IC2 | Information content index (neighborhood symmetry of 2-order) | 5.3672 |
| GATS1v | Geary autocorrelation of lag 1 weighted by van der Waals volume | 4.810587 |
| GATS2v | Geary autocorrelation of lag 2 weighted by van der Waals volume | 4.727913 |
| BAC | Balaban centric index | 4.58365 |
| SpPosA_X | Normalized spectral positive sum from chi matrix | 4.303807 |
| P_VSA_LogP_6 | P_VSA-like on LogP, bin 6 | 3.877732 |
| C-006 | CH2RX | 3.640303 |
| P_VSA_e_3 | P_VSA-like on Sanderson electronegativity, bin 3 | 3.475949 |
| P_VSA_MR_2 | P_VSA-like on Molar Refractivity, bin 2 | 3.236547 |
| MATS8m | Moran autocorrelation of lag 8 weighted by mass | 3.138591 |
| nCsp3 | Number of sp3 hybridized carbon atoms | 2.675997 |
| PDI | Packing density index | 2.585321 |
| P_VSA_m_4 | P_VSA-like on mass, bin 4 | 2.511289 |
| SpAD_EA(dm) | Spectral absolute deviation from edge adjacency mat. weighted by dipole moment | 2.35969 |
| CATS2D_04_AA | CATS2D Acceptor-Acceptor at lag 04 | 2.332174 |
| X5Av | Average valence connectivity index of order 5 | 2.196837 |
| X5A | Average connectivity index of order 5 | 2.100552 |
Most important descriptors for models using the second modeling approach.
| Descriptor | Description | Id |
|---|---|---|
| JGI6 | Mean topological charge index of order 6 | 3.502671 |
| JGI4 | Mean topological charge index of order 4 | 3.398279 |
| SdssC | Sum of dssC E-states | 3.372717 |
| H% | Percentage of H atoms | 3.287295 |
| Uc | Unsaturation count | 3.071672 |
| P_VSA_LogP_6 | P_VSA-like on LogP, bin 6 | 2.985918 |
| H-052 | H attached to C0(sp3) with 1X attached to next C | 2.805648 |
| MAXDN | Maximal electrotopological negative variation | 2.620215 |
| Chi1_EA(dm) | Connectivity-like index of order 1 from edge adjacency mat. weighted by dipole moment | 2.576782 |
| SpMax_B(m) | Leading eigenvalue from Burden matrix weighted by mass | 2.540987 |
| GATS5m | Geary autocorrelation of lag 5 weighted by mass | 2.480224 |
| SpAD_EA(dm) | Spectral absolute deviation from edge adjacency mat. weighted by dipole moment | 2.416726 |
| GATS1i | Geary autocorrelation of lag 1 weighted by ionization potential | 2.365205 |
| SsssN | Sum of sssN E-states | 2.352358 |
| SpMAD_EA(bo) | Spectral mean absolute deviation from edge adjacency mat. weighted by bond order | 2.344266 |
| ChiA_B(s) | Average Randic-like index from Burden matrix weighted by I-state | 2.338868 |
| NssO | Number of atoms of type ssO | 2.223777 |
| VE2sign_A | Average coefficient of the last eigenvector from adjacency matrix | 2.169277 |
| MATS2p | Moran autocorrelation of lag 2 weighted by polarizability | 2.169277 |
| MATS1p | Moran autocorrelation of lag 1 weighted by polarizability | 2.103962 |
| SpMin1_Bh(v) | Smallest eigenvalue n. 1 of Burden matrix weighted by van der Waals volume | 2.089443 |
| ChiA_B(v) | Average Randic-like index from Burden matrix weighted by van der Waals volume | 2.043667 |
| Rbrid | Ring bridge count | 2.040469 |
| nCsp3 | Number of sp3 hybridized carbon atoms | 2.038696 |
| C-040 | R-C(=X)-X / R-C#X / X=C=X | 2.022 |
Figure 4. Modeling workflow. A total of 1259 Dragon descriptors were calculated for our dataset. The number of descriptors was reduced based on pairwise correlation and the frequency of the most common value. For the first modeling approach, where selection of the training set resulted in balanced classes, 102 validation objects were removed. For the second approach, where selection of the training set resulted in imbalanced classes, 40 objects were removed. For both approaches, the number of descriptors was further reduced: 50, 98, and 182 descriptors were selected. Based on the selected descriptors, objects for the training set (TR), test set 1 (TE1), and test set 2 (TE2) were selected. The genetic algorithm was applied to optimize the models. Models passing the criteria were selected for internal validation with TE2. Lastly, models passing internal validation were externally validated with the corresponding validation set.
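The descriptor pre-reduction step in this workflow, dropping descriptors whose most common value is too frequent and then one of each highly correlated pair, might be sketched as below. The cutoff values and the function name are illustrative assumptions; the exact thresholds used in the study are not stated in this excerpt.

```python
import numpy as np

def reduce_descriptors(X, corr_cutoff=0.95, freq_cutoff=0.9):
    """Return column indices to keep. First drop near-constant descriptors
    (most common value occurs in >= freq_cutoff of objects), then keep only
    the first of each pair with |r| >= corr_cutoff. Cutoffs are illustrative."""
    X = np.asarray(X, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        _, counts = np.unique(X[:, j], return_counts=True)
        if counts.max() / X.shape[0] < freq_cutoff:   # informative enough?
            keep.append(j)
    X = X[:, keep]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in range(X.shape[1]):
        if all(corr[j, s] < corr_cutoff for s in selected):
            selected.append(j)                        # not redundant with kept set
    return [keep[j] for j in selected]
```

Filtering near-constant columns first also avoids undefined correlations for zero-variance descriptors in the second pass.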