Swapnil Chavan, Ran Friedman, Ian A Nicholls.
Abstract
A k-nearest neighbor (k-NN) classification model was constructed for 118 RDT NEDO (Repeated Dose Toxicity New Energy and industrial technology Development Organization; currently known as the Hazard Evaluation Support System (HESS)) database chemicals, employing two acute toxicity (LD50)-based classes as a response and using a series of eight PaDEL software-derived fingerprints as predictor variables. A model developed using Estate type fingerprints correctly predicted the LD50 classes for 70 of 94 training set chemicals and 19 of 24 test set chemicals. An individual category was formed for each of the chemicals by extracting its corresponding k-analogs that were identified by k-NN classification. These categories were used to perform the read-across study for prediction of the chronic toxicity, i.e., Lowest Observed Effect Levels (LOEL). We have successfully predicted the LOELs of 54 of 70 training set chemicals (77%) and 14 of 19 test set chemicals (74%) to within an order of magnitude from their experimental LOEL values. Given the success thus far, we conclude that if the k-NN model predicts LD50 classes correctly for a certain chemical, then the k-analogs of such a chemical can be successfully used for data gap filling for the LOEL. This model should support the in silico prediction of repeated dose toxicity.
Keywords: Estate fingerprint; LD50; LOEL; category formation; classification model; k-nearest neighbor; read-across
Year: 2015 PMID: 26006240 PMCID: PMC4463722 DOI: 10.3390/ijms160511659
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
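The category-formation step described in the abstract (find the k nearest analogs of a query by fingerprint similarity, then assign the LD50 class by vote) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Tanimoto similarity, the majority vote, the toy 6-bit fingerprints and all names (`tanimoto`, `k_analogs`, `predict_class`) are assumptions, since the record does not state which distance metric or voting rule was used.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two equal-length bit lists."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def k_analogs(query_fp, training, k=3):
    """Return the k training records most similar to the query fingerprint."""
    ranked = sorted(
        training,
        key=lambda rec: tanimoto(query_fp, rec["fp"]),
        reverse=True,
    )
    return ranked[:k]


def predict_class(analogs):
    """Majority vote over the LD50 classes of the analogs (assumed rule)."""
    votes = Counter(rec["ld50_class"] for rec in analogs)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 6-bit fingerprints and LD50 classes.
training = [
    {"name": "A", "fp": [1, 1, 0, 0, 1, 0], "ld50_class": 1},
    {"name": "B", "fp": [1, 1, 0, 0, 0, 0], "ld50_class": 1},
    {"name": "C", "fp": [0, 0, 1, 1, 0, 1], "ld50_class": 2},
    {"name": "D", "fp": [0, 0, 1, 1, 0, 0], "ld50_class": 2},
    {"name": "E", "fp": [1, 0, 0, 0, 1, 0], "ld50_class": 1},
]
query = [1, 1, 0, 0, 1, 1]

analogs = k_analogs(query, training, k=3)   # the query's "category"
predicted = predict_class(analogs)
print([rec["name"] for rec in analogs], predicted)
```

The list returned by `k_analogs` plays the role of the per-chemical category; the same analogs are later reused for read-across of the LOEL.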
Parameters of eight k-NN classification models.
| Entry | Fingerprint | Validation | NER | k | Sensitivity (Class 1) | Sensitivity (Class 2) | Specificity (Class 1) | Specificity (Class 2) |
|---|---|---|---|---|---|---|---|---|
| 1 | CDK | Fitting | 0.69 | 9 | 0.77 | 0.61 | 0.61 | 0.77 |
| | | CV | 0.76 | 9 | 0.84 | 0.68 | 0.68 | 0.84 |
| | | External | 0.74 | 9 | 0.79 | 0.70 | 0.70 | 0.79 |
| 2 | Estate | Fitting | 0.75 | 3 | 0.73 | 0.76 | 0.76 | 0.73 |
| | | CV | 0.74 | 3 | 0.77 | 0.71 | 0.71 | 0.77 |
| | | External | 0.81 | 3 | 0.71 | 0.90 | 0.90 | 0.71 |
| 3 | Extended CDK | Fitting | 0.70 | 9 | 0.80 | 0.61 | 0.61 | 0.80 |
| | | CV | 0.74 | 9 | 0.80 | 0.68 | 0.68 | 0.80 |
| | | External | 0.79 | 9 | 0.79 | 0.80 | 0.80 | 0.79 |
| 4 | Graph | Fitting | 0.68 | 3 | 0.75 | 0.61 | 0.61 | 0.75 |
| | | CV | 0.70 | 3 | 0.79 | 0.61 | 0.61 | 0.79 |
| | | External | 0.76 | 3 | 0.71 | 0.80 | 0.80 | 0.71 |
| 5 | Klekota-Roth | Fitting | 0.68 | 10 | 0.70 | 0.66 | 0.66 | 0.70 |
| | | CV | 0.77 | 10 | 0.77 | 0.76 | 0.76 | 0.77 |
| | | External | 0.72 | 10 | 0.64 | 0.80 | 0.80 | 0.64 |
| 6 | MACCS | Fitting | 0.76 | 7 | 0.79 | 0.74 | 0.74 | 0.79 |
| | | CV | 0.73 | 7 | 0.77 | 0.68 | 0.68 | 0.77 |
| | | External | 0.72 | 7 | 0.64 | 0.80 | 0.80 | 0.64 |
| 7 | Pubchem | Fitting | 0.77 | 1 | 0.80 | 0.74 | 0.74 | 0.80 |
| | | CV | 0.73 | 1 | 0.77 | 0.68 | 0.68 | 0.77 |
| | | External | 0.77 | 1 | 0.64 | 0.90 | 0.90 | 0.64 |
| 8 | Substructure | Fitting | 0.68 | 5 | 0.73 | 0.63 | 0.63 | 0.73 |
| | | CV | 0.66 | 5 | 0.80 | 0.53 | 0.53 | 0.80 |
| | | External | 0.71 | 5 | 0.71 | 0.70 | 0.70 | 0.71 |
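The columns of the table above are related in a simple way for a two-class problem: the specificity of one class equals the sensitivity of the other, and the non-error rate (NER) is the mean of the two class sensitivities. The sketch below shows that relationship. The confusion-matrix counts are not given in this record; the values used here (41 and 29 correct out of 56 and 38) are inferred from the reported Estate fitting rates and the training-set class sizes, and should be read as an assumption.

```python
def class_metrics(tp, fn, fp, tn):
    """Two-class metrics from a confusion matrix, treating Class 1 as 'positive'.

    tp: Class 1 chemicals predicted as Class 1
    fn: Class 1 chemicals predicted as Class 2
    fp: Class 2 chemicals predicted as Class 1
    tn: Class 2 chemicals predicted as Class 2
    """
    sens1 = tp / (tp + fn)   # fraction of Class 1 correctly recognised
    sens2 = tn / (tn + fp)   # fraction of Class 2 correctly recognised
    return {
        "sensitivity_class1": sens1,
        "sensitivity_class2": sens2,
        "specificity_class1": sens2,   # two-class case: mirrors the other class
        "specificity_class2": sens1,
        "ner": (sens1 + sens2) / 2,    # non-error rate (balanced accuracy)
    }


# Inferred (not reported) counts for the Estate model, fitting row:
# 56 Class 1 and 38 Class 2 training chemicals, 70 of 94 correct overall.
metrics = class_metrics(tp=41, fn=15, fp=9, tn=29)
print({name: round(value, 2) for name, value in metrics.items()})
```

With these counts the rounded values agree with the Estate fitting row (NER 0.75, sensitivities 0.73 and 0.76).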
Summary of predicted lowest observed effect levels (LOELs) of all training set queries, obtained as the arithmetic mean of the LOELs of the corresponding k-nearest analogs (3 analogs) from the Estate fingerprint-based k-NN model.
| Entry | Query LOEL | Analog 1 LOEL | Analog 2 LOEL | Analog 3 LOEL | Predicted LOEL | Fold_diff |
|---|---|---|---|---|---|---|
| 1 | 30 | 50 | 625 | 1.2 | 225.40 | 7.51 |
| 2 | 50 | 30 | 625 | 1.2 | 218.73 | 4.37 |
| 3 | 200 | 100 | 750 | 150 | 333.33 | 1.67 |
| 4 | 10 | 10 | 250 | 30 | 96.67 | 9.67 |
| 5 | 30 | 20 | 30 | 30 | 26.67 | 1.12 |
| 6 | 70 | 300 | 20 | 20 | 113.33 | 1.62 |
| 7 | 5 | 150 | 200 | 6 | 118.67 | 23.73 |
| 8 | 100 | 200 | 750 | 200 | 383.33 | 3.83 |
| 9 | 150 | 200 | 100 | 30 | 110.00 | 1.36 |
| 10 | 1000 | 1000 | 11 | 100 | 370.33 | 2.70 |
| 11 | 30 | 1000 | 100 | 150 | 416.67 | 13.89 |
| 12 | 0.75 | 5 | 6 | 50 | 20.33 | 27.11 |
| 13 | 30 | 20 | 200 | 10 | 76.67 | 2.56 |
| 14 | 3130 | 1000 | 100 | 600 | 566.67 | 1.84 |
| 15 | 100 | 300 | 750 | 40 | 363.33 | 3.63 |
| 16 | 10 | 10 | 250 | 30 | 96.67 | 9.67 |
| 17 | 30 | 60 | 60 | 50 | 56.67 | 1.89 |
| 18 | 1000 | 30 | 1000 | 1000 | 676.67 | 1.47 |
| 19 | 60 | 30 | 60 | 50 | 46.67 | 1.29 |
| 20 | 600 | 3130 | 160 | 62.5 | 1117.50 | 1.86 |
| 21 | 20 | 30 | 200 | 10 | 80.00 | 4.00 |
| 22 | 750 | 1000 | 100 | 200 | 433.33 | 1.73 |
| 23 | 25 | 30 | 250 | 10 | 96.67 | 3.87 |
| 24 | 200 | 300 | 100 | 750 | 383.33 | 1.92 |
| 25 | 1000 | 3130 | 100 | 100 | 1110.00 | 1.11 |
| 26 | 250 | 200 | 3130 | 70 | 1133.33 | 4.53 |
| 27 | 200 | 30 | 20 | 200 | 83.33 | 2.40 |
| 28 | 300 | 200 | 200 | 100 | 166.67 | 1.80 |
| 29 | 160 | 600 | 1000 | 1000 | 866.67 | 5.42 |
| 30 | 350 | 1000 | 625 | 625 | 750.00 | 2.14 |
| 31 | 60 | 200 | 750 | 40 | 330.00 | 5.50 |
| 32 | 100 | 240 | 250 | 10 | 166.67 | 1.67 |
| 33 | 100 | 200 | 100 | 1000 | 433.33 | 4.33 |
| 34 | 30 | 30 | 30 | 100 | 53.33 | 1.78 |
| 35 | 3130 | 2500 | 10 | 350 | 953.33 | 1.09 |
| 36 | 1000 | 1000 | 11 | 30 | 347.00 | 2.88 |
| 37 | 40 | 30 | 60 | 60 | 50.00 | 1.25 |
| 38 | 100 | 40 | 20 | 30 | 30.00 | 1.11 |
| 39 | 1000 | 750 | 100 | 100 | 316.67 | 1.05 |
| 40 | 300 | 70 | 20 | 250 | 113.33 | 2.65 |
| 41 | 200 | 150 | 100 | 30 | 93.33 | 2.14 |
| 42 | 30 | 100 | 300 | 200 | 200.00 | 6.67 |
| 43 | 30 | 30 | 20 | 30 | 26.67 | 1.12 |
| 44 | 5 | 10 | 3130 | 2500 | 1880.00 | 376.00 |
| 45 | 40 | 100 | 300 | 1000 | 466.67 | 11.67 |
| 46 | 2 | 20 | 30 | 30 | 26.67 | 13.33 |
| 47 | 1.2 | 30 | 625 | 500 | 385.00 | 320.83 |
| 48 | 240 | 100 | 250 | 10 | 120.00 | 2 |
| 49 | 6 | 11 | 100 | 150 | 87.00 | 14.50 |
| 50 | 250 | 100 | 240 | 40 | 126.67 | 1.97 |
| 51 | 11 | 1000 | 6 | 1000 | 668.67 | 60.79 |
| 52 | 2 | 30 | 1000 | 30 | 353.33 | 176.67 |
| 53 | 62.5 | 6 | 600 | 3130 | 1245.33 | 19.93 |
| 54 | 100 | 1000 | 1000 | 300 | 766.67 | 7.67 |
| 55 | 40 | 200 | 100 | 150 | 150.00 | 3.75 |
| 56 | 10 | 200 | 100 | 200 | 166.67 | 16.67 |
| 57 | 20 | 70 | 300 | 1000 | 456.67 | 22.83 |
| 58 | 200 | 300 | 100 | 100 | 166.67 | 1.20 |
| 59 | 300 | 100 | 1000 | 40 | 380.00 | 1.27 |
| 60 | 100 | 1000 | 30 | 1000 | 676.67 | 6.77 |
| 61 | 30 | 30 | 1000 | 0.78 | 343.59 | 11.45 |
| 62 | 20 | 70 | 20 | 300 | 130.00 | 6.50 |
| 63 | 20 | 30 | 30 | 100 | 53.33 | 2.67 |
| 64 | 30 | 100 | 150 | 30 | 93.33 | 3.11 |
| 65 | 500 | 250 | 1.2 | 10 | 87.07 | 1.91 |
| 66 | 200 | 781 | 40 | 60 | 293.67 | 1.47 |
| 67 | 1000 | 1000 | 100 | 240 | 446.67 | 2.23 |
| 68 | 625 | 1.2 | 30 | 60 | 30.40 | 6.85 |
| 69 | 10 | 2500 | 3130 | 1000 | 2210.00 | 221.00 |
| 70 | 2500 | 10 | 3130 | 1000 | 1380.00 | 1.81 |
| 71 | 100 | 100 | 150 | 30 | 93.33 | 1.07 |
| 72 | 60 | 30 | 60 | 50 | 46.67 | 1.29 |
| 73 | 50 | 30 | 60 | 60 | 50.00 | 1.00 |
| 74 | 1000 | 350 | 1000 | 30 | 460.00 | 2.17 |
| 75 | 625 | 625 | 625 | 350 | 533.33 | 1.17 |
| 76 | 0.78 | 350 | 625 | 625 | 533.33 | 683.76 |
| 77 | 40 | 100 | 240 | 250 | 196.67 | 4.92 |
| 78 | 5 | 5 | 10 | 3130 | 1048.33 | 209.67 |
| 79 | 30 | 1.2 | 625 | 500 | 375.40 | 12.51 |
| 80 | 250 | 20 | 100 | 500 | 206.67 | 1.21 |
| 81 | 2 | 30 | 60 | 60 | 50.00 | 25.00 |
| 82 | 250 | 10 | 100 | 240 | 116.67 | 2.14 |
| 83 | 30 | 10 | 10 | 250 | 90.00 | 3.00 |
| 84 | 20 | 100 | 100 | 750 | 316.67 | 15.83 |
| 85 | 6 | 62.5 | 3130 | 70 | 1087.50 | 181.25 |
| 86 | 60 | 781 | 200 | 350 | 443.67 | 7.39 |
| 87 | 6 | 625 | 625 | 15 | 421.67 | 70.28 |
| 88 | 30 | 30 | 60 | 60 | 50.00 | 1.67 |
| 89 | 100 | 625 | 625 | 15 | 421.67 | 4.22 |
| 90 | 100 | 150 | 30 | 200 | 126.67 | 1.27 |
| 91 | 781 | 60 | 200 | 30 | 96.67 | 2.69 |
| 92 | 625 | 625 | 625 | 350 | 533.33 | 1.17 |
| 93 | 15 | 100 | 625 | 6 | 243.67 | 16.24 |
| 94 | 625 | 625 | 625 | 350 | 533.33 | 1.17 |
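The read-across arithmetic behind the table above is short enough to state in code: the predicted LOEL is the arithmetic mean of the analog LOELs, and the fold difference compares it with the query's experimental LOEL. The sketch below assumes the fold difference is the symmetric ratio (larger value divided by smaller), which reproduces rows such as entries 1 and 47; the function names are illustrative only.

```python
def predict_loel(analog_loels):
    """Read-across prediction: arithmetic mean of the analogs' LOELs."""
    return sum(analog_loels) / len(analog_loels)


def fold_diff(observed, predicted):
    """Symmetric fold difference (assumed definition): always >= 1."""
    return max(observed, predicted) / min(observed, predicted)


# Entry 1 of the training set table: query LOEL 30, analogs 50, 625 and 1.2.
pred_1 = predict_loel([50, 625, 1.2])
fold_1 = fold_diff(30, pred_1)

# Entry 47: query LOEL 1.2, analogs 30, 625 and 500.
pred_47 = predict_loel([30, 625, 500])
fold_47 = fold_diff(1.2, pred_47)

print(round(pred_1, 2), round(fold_1, 2))
print(round(pred_47, 2), round(fold_47, 2))
```

Because the mean is taken on the raw scale, a single analog with a very high LOEL (as in entry 47) dominates the prediction and produces a large fold difference.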
Summary of predicted LOELs of all test set queries from the Estate fingerprint-based k-NN model.
| Sr. | Query LOEL | Analog 1 LOEL | Analog 2 LOEL | Analog 3 LOEL | Predicted LOEL | Fold_diff |
|---|---|---|---|---|---|---|
| 1 | 30 | 30 | 30 | 20 | 26.67 | 1.13 |
| 2 | 30 | 30 | 1000 | 6 | 345.33 | 11.51 |
| 3 | 1.5 | 200 | 5 | 0.75 | 68.58 | 45.72 |
| 4 | 1250 | 750 | 1000 | 100 | 616.67 | 2.03 |
| 5 | 50 | 781 | 60 | 250 | 363.67 | 7.27 |
| 6 | 0.1 | 30 | 10 | 10 | 16.67 | 166.67 |
| 7 | 1000 | 1000 | 1000 | 11 | 670.33 | 1.49 |
| 8 | 20 | 150 | 200 | 6 | 118.67 | 5.93 |
| 9 | 20 | 5 | 10 | 5 | 6.67 | 3.00 |
| 10 | 100 | 250 | 100 | 500 | 283.33 | 2.83 |
| 11 | 110 | 1000 | 1000 | 11 | 670.33 | 6.09 |
| 12 | 1000 | 1000 | 100 | 750 | 616.67 | 1.62 |
| 13 | 33 | 30 | 10 | 10 | 16.67 | 1.98 |
| 14 | 30 | 3130 | 2500 | 200 | 1943.33 | 64.78 |
| 15 | 10 | 781 | 60 | 30 | 290.33 | 29.03 |
| 16 | 300 | 100 | 300 | 40 | 146.67 | 2.05 |
| 17 | 2 | 30 | 30 | 10 | 23.33 | 11.67 |
| 18 | 200 | 350 | 1000 | 1000 | 783.33 | 3.92 |
| 19 | 125 | 10 | 250 | 40 | 100.00 | 1.25 |
| 20 | 50 | 1000 | 100 | 30 | 376.67 | 7.53 |
| 21 | 100 | 350 | 625 | 6 | 327.00 | 3.27 |
| 22 | 10 | 6 | 15 | 10 | 10.33 | 1.03 |
| 23 | 150 | 1000 | 750 | 200 | 650.00 | 4.33 |
| 24 | 4 | 6 | 625 | 625 | 418.67 | 104.67 |
Training set queries sorted into qualified and non-qualified categories based on their k-NN model-based predicted class, and further divided by order-of-magnitude difference.
| Fold_diff# | Qualified Category (number of queries) | Non-Qualified Category (number of queries) | Total |
|---|---|---|---|
| <10 | 54 | 17 | 71 |
| 10–100 | 12 | 4 | 16 |
| >100 | 4 | 3 | 7 |
| Total | 70 | 24 | 94 |

# Order of magnitude: fold differences (Fold_diff) of <10, 10–100 and >100.
Test set query categorization (qualified and non-qualified) based on k-NN model-based predicted class, and further divided based upon order of magnitude difference.
| Fold_diff# | Qualified Category (number of queries) | Non-Qualified Category (number of queries) | Total |
|---|---|---|---|
| <10 | 14 | 3 | 17 |
| 10–100 | 4 | 1 | 5 |
| >100 | 1 | 1 | 2 |
| Total | 19 | 5 | 24 |

# Order of magnitude: fold differences (Fold_diff) of <10, 10–100 and >100.
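The "Total" column of the test set categorization can be reproduced by binning the 24 Fold_diff values listed in the test set LOEL table. The following is a small sketch of that tally; the handling of values falling exactly on a boundary (10 or 100) is an assumption, as the record does not specify it, and no such value occurs in the data.

```python
from collections import Counter


def fold_bin(fold):
    """Assign a fold difference to an order-of-magnitude bin."""
    if fold < 10:
        return "<10"
    if fold <= 100:          # boundary handling is an assumption
        return "10-100"
    return ">100"


# Fold_diff column of the test set LOEL table, entries 1 to 24.
test_fold_diffs = [
    1.13, 11.51, 45.72, 2.03, 7.27, 166.67, 1.49, 5.93,
    3.00, 2.83, 6.09, 1.62, 1.98, 64.78, 29.03, 2.05,
    11.67, 3.92, 1.25, 7.53, 3.27, 1.03, 4.33, 104.67,
]

counts = Counter(fold_bin(f) for f in test_fold_diffs)
print(counts["<10"], counts["10-100"], counts[">100"])
```

Splitting each bin into qualified and non-qualified queries additionally requires knowing whether the k-NN model predicted each query's LD50 class correctly, which is not listed per entry in this record.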
Figure 1. Summary of LOEL prediction for the training set queries from the qualified category.
Test set query categories with their 3 respective analogs.
| Entry | Data | Query | Analog 1 | Analog 2 | Analog 3 | Predicted LOEL | Fold_diff |
|---|---|---|---|---|---|---|---|
| 3 | Structure | | | | | | |
| | LD50 | 64 | 1525 | 1000 | 26 | | |
| | LOEL | 1.5 | 200 | 5 | 0.75 | 68.58 | 45.72 |
| 22 | Structure | | | | | | |
| | LD50 | 400 | 640 | 535 | 256 | | |
| | LOEL | 10 | 6 | 15 | 10 | 10.33 | 1.03 |
| 24 | Structure | | | | | | |
| | LD50 | 953 | 640 | 891 | 1072 | | |
| | LOEL | 4 | 6 | 625 | 625 | 418.67 | 104.67 |
Literature survey of QSAR models for prediction of repeated dose toxicity endpoint.
| Method | Training Set Chemicals | Test Set Chemicals | Training Set Prediction | Test Set Prediction | Comment | Reference |
|---|---|---|---|---|---|---|
| Multivariate analysis | 234 | none | 95% within factor of 5 | none | No external prediction | [ |
| MLR | 234 | none | none | [ | ||
| MLR | 86 | 16 | [ | |||
| PLS | 445 | none | none | No external prediction | [ | |
| Read-across | 500 | none | none | none | 33 chemical categories formed | [ |
| 254 | 179 | [ |
Classifications of the 118 chemicals in the training and test sets prior to k-NN model construction.
| Class | Description | LD50 (mg/kg/day) | Number of Entries | Training Set Entries | Test Set Entries |
|---|---|---|---|---|---|
| Class 1 | Highly toxic, toxic and harmful | ≤2000 | 70 | 56 | 14 |
| Class 2 | Non-harmful | >2000 | 48 | 38 | 10 |
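The two-class response in the table above amounts to a single threshold at an LD50 of 2000. The sketch below encodes that rule together with the "qualified" test used in the categorization tables, under the assumption (suggested by the matching counts of 70 and 19 correctly classified chemicals) that a query is qualified when its k-NN predicted class equals its experimental class. Function names are illustrative.

```python
def ld50_class(ld50):
    """Class 1 (highly toxic, toxic and harmful) if LD50 <= 2000, else Class 2."""
    return 1 if ld50 <= 2000 else 2


def is_qualified(experimental_ld50, predicted_class):
    """Assumed rule: a query is 'qualified' when the k-NN class is correct."""
    return ld50_class(experimental_ld50) == predicted_class


# LD50 values 64 and 1525 appear in the test set category table;
# 2500 is a hypothetical value chosen to fall in Class 2.
print(ld50_class(64), ld50_class(1525), ld50_class(2500))
print(is_qualified(64, predicted_class=1), is_qualified(64, predicted_class=2))
```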