Alaa Tharwat, Yasmine S Moemen, Aboul Ella Hassanien.
Abstract
Measuring toxicity is one of the main steps in drug development. Hence, there is a high demand for computational models that predict the toxic effects of potential drugs. In this study, we used a dataset that covers four toxicity effects: mutagenic, tumorigenic, irritant and reproductive. The proposed model consists of three phases. In the first phase, rough set-based methods are used to select the most discriminative features, which reduces the classification time and improves the classification performance. Because the class distribution is imbalanced, the second phase applies different sampling methods, such as Random Under-Sampling, Random Over-Sampling and the Synthetic Minority Oversampling Technique, to balance the datasets. An ITerative Sampling (ITS) method is proposed to avoid the limitations of those methods. The ITS method has two steps. The first step (sampling step) iteratively modifies the prior distribution of the minority and majority classes. In the second step, a data cleaning method removes the overlap produced by the first step. In the third phase, a Bagging classifier is used to classify an unknown drug as toxic or non-toxic. The experimental results showed that the proposed model performed well in classifying unknown samples for all four toxic effects in the imbalanced datasets.
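The three baseline sampling methods named in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation and it does not cover the proposed ITS method; the function names (`random_over_sample`, `random_under_sample`, `smote_like`) and the parameter `k` are assumptions introduced here for illustration only.

```python
import random
from collections import Counter


def random_over_sample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all classes
    match the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def random_under_sample(samples, labels, seed=0):
    """Keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for x in rng.sample(pool, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y


def smote_like(minority, n_new, k=3, seed=0):
    """SMOTE-style synthesis: interpolate between a minority sample and one of
    its k nearest minority neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

Over-sampling keeps every original sample but repeats minority points, under-sampling discards majority points, and the SMOTE-style variant creates new points between existing minority samples, so each trades off differently between information loss and overlap.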
Year: 2016 PMID: 27934950 PMCID: PMC5146749 DOI: 10.1038/srep38660
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Dataset description.
| Feature No. | Name | Feature No. | Name |
|---|---|---|---|
| 1 | Total Molecular Weight | 17 | Electron Negative Atoms |
| 2 | Molecular Weight | 18 | Stereo Centers |
| 3 | Absolute Weight | 19 | Rotatable Bonds |
| 4 | cLogP (Octanol/Water, partition coefficient) | 20 | Rings |
| 5 | cLogS (Aqueous solubility) | 21 | Aromatic Rings |
| 6 | H-Acceptors (Hydrogen bond Acceptor) | 22 | Aromatic Atoms |
| 7 | H-Donors (Hydrogen bond donor) | 23 | sp3-Atoms |
| 8 | Total Surface Area | 24 | Symmetric atoms |
| 9 | Polar Surface Area | 25 | Amides (acid amide) |
| 10 | Druglikeness | 26 | Amines |
| 11 | Molecular Shape Index | 27 | AlkylAmines |
| 12 | Molecular Flexibility | 28 | Aromatic Amines |
| 13 | Molecular Complexity | 29 | Aromatic Nitrogen |
| 14 | Non Hydrogen Atoms | 30 | Basic Nitrogen |
| 15 | Non-Carbon/Hydrogen Atoms | 31 | Acidic Oxygen |
| 16 | Metal Atoms | | |
Distribution of the two classes of each toxic effect.
| Toxic effect | #Samples in Positive Class | #Samples in Negative Class | Imbalance ratio |
|---|---|---|---|
| Mutagenic Effect | 90 = 16.28% | 463 = 83.73% | 5.14 |
| Tumorigenic Effect | 90 = 16.28% | 463 = 83.73% | 5.14 |
| Reproductive Effect | 187 = 33.82% | 366 = 66.18% | 1.96 |
| Irritant Effect | 67 = 12.12% | 486 = 87.88% | 7.25 |
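The percentages and imbalance ratios in this table can be recomputed from the sample counts. The sketch below assumes the imbalance ratio is the majority-class count divided by the minority-class count, which is consistent with the tabulated ratios; the helper name `class_distribution` is hypothetical.

```python
def class_distribution(n_pos, n_neg):
    """Return class percentages and the majority/minority imbalance ratio."""
    total = n_pos + n_neg
    return {
        "positive_pct": 100.0 * n_pos / total,
        "negative_pct": 100.0 * n_neg / total,
        "imbalance_ratio": max(n_pos, n_neg) / min(n_pos, n_neg),
    }


# (positive, negative) sample counts taken from the table above
counts = {
    "Mutagenic": (90, 463),
    "Tumorigenic": (90, 463),
    "Reproductive": (187, 366),
    "Irritant": (67, 486),
}

for effect, (pos, neg) in counts.items():
    d = class_distribution(pos, neg)
    print(f"{effect}: {d['positive_pct']:.2f}% positive, "
          f"{d['negative_pct']:.2f}% negative, "
          f"ratio {d['imbalance_ratio']:.2f}")
```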
Figure 1. Block diagram of the proposed model.
The selected features using QRFS, DMFS and EBFS rough set methods.
| Rough set reduction method | Mutagenic effect: selected features | Mutagenic effect: no. of features (red. rate %) | Tumorigenic effect: selected features | Tumorigenic effect: no. of features (red. rate %) | Irritant effect: selected features | Irritant effect: no. of features (red. rate %) | Reproductive effect: selected features | Reproductive effect: no. of features (red. rate %) |
|---|---|---|---|---|---|---|---|---|
| QRFS | {1, 4, | 13 (≈58.1%) | {2, | 11 (≈64.5%) | {4, | 14 (≈54.8%) | { | 14 (≈54.8%) |
| DMFS | {4, 5, | 9 (≈71%) | { | 11 (≈64.5%) | { | 11 (≈64.5%) | { | 14 (≈54.8%) |
| EBFS | {5, 6, | 11 (≈64.5%) | { | 12 (≈61.3%) | {4, | 13 (≈58.1%) | All Features | 31 (0%) |
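The reduction rates in this table follow from the number of selected features and the 31 features of the original dataset. A minimal sketch, assuming the reduction rate is the percentage of features removed (the function name `reduction_rate` is hypothetical):

```python
TOTAL_FEATURES = 31  # number of features in the dataset description table


def reduction_rate(n_selected, n_total=TOTAL_FEATURES):
    """Percentage of the original features removed by feature selection."""
    return 100.0 * (n_total - n_selected) / n_total


# Feature counts that appear in the table
for n in (9, 11, 12, 13, 14, 31):
    print(f"{n} features kept: reduction rate {reduction_rate(n):.1f}%")
```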
Figure 2. A comparison between the QRFS, EBFS, and DMFS methods in terms of CPU time for the mutagenic, tumorigenic, irritant, and reproductive effects.
Accuracy, sensitivity, specificity and geometric mean (GM) of the proposed model using all features and the selected features using QRFS, DMFS and EBFS rough set methods.
| Assessment Method | Mutagenic Effect | Tumorigenic Effect | Irritant Effect | Reproductive Effect | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | All | QRFS | DMFS | EBFS | All | QRFS | DMFS | EBFS | All | QRFS | DMFS | EBFS | All | QRFS | DMFS | EBFS |
| Accuracy | 82.6 | 82.5 | 81.4 | 82.5 | 82.3 | 83 | 85 | 85.8 | 80.0 | 68.3 | 67.2 | |||||
| Sensitivity | 49.8 | 46.7 | 47.4 | 35.2 | 34.7 | 36.5 | 27.2 | 29.9 | 20.4 | 52.3 | 52.6 | |||||
| Specificity | 87.4 | 86.9 | 87.9 | 84.9 | 84.3 | 85.9 | 89 | 88.3 | 82.6 | 76.5 | 75.3 | |||||
| GM | 55.2 | 38.3 | 62.6 | 55.4 | 55.7 | 58.2 | 50.5 | 52.4 | 45.5 | 64.8 | 62.7 | |||||
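The four assessment measures can be computed from a binary confusion matrix. The sketch below assumes the usual definitions, with the geometric mean taken as the square root of sensitivity times specificity; the source does not spell out its formulas here, and the confusion-matrix counts in the example are hypothetical, not results from the paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and geometric mean (GM) from the
    counts of true/false positives and negatives."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    gm = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "gm": gm,
    }


# Hypothetical counts for a dataset with 90 positive and 463 negative samples
print(binary_metrics(tp=45, fn=45, tn=400, fp=63))

# A classifier that labels every sample negative: high accuracy, but GM is zero
print(binary_metrics(tp=0, fn=90, tn=463, fp=0))
```

The second call shows why GM is reported alongside accuracy on imbalanced data: predicting only the majority class still yields high accuracy, while GM drops to zero because no positive sample is detected.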
Figure 3. ROC curves of the proposed model using all and selected features: (a) Mutagenic effect, (b) Tumorigenic effect, (c) Reproductive effect and (d) Irritant effect.
Figure 4. Classification time of the proposed model using all features and the selected features.
Figure 5. Classification results for the tumorigenic and reproductive effects with and without pre-processing, using the features selected by the EBFS, QRFS and DMFS methods: (a) Tumorigenic effect, (b) Reproductive effect.
Figure 6. Classification results for the irritant and mutagenic effects with and without pre-processing, using the features selected by the EBFS, QRFS and DMFS methods: (a) Irritant effect, (b) Mutagenic effect.
Figure 7. ROC curves of the proposed model using the original and selected features: (a) Mutagenic effect, (b) Tumorigenic effect, (c) Reproductive effect and (d) Irritant effect.
Figure 8. The number of samples in the minority and majority classes in the two steps of the ITS method, using the mutagenic effect dataset and the features selected by the EBFS algorithm.
Figure 9. The between-class and within-class variance of the minority and majority classes in the two steps of the ITS method, using the mutagenic effect dataset and the features selected by the EBFS algorithm.