Arthur C Silva¹, Joyce V V B Borba¹,², Vinicius M Alves², Steven U S Hall¹, Nicholas Furnham³, Nicole Kleinstreuer⁴, Eugene Muratov²,⁵, Alexander Tropsha², Carolina Horta Andrade¹.
Abstract
Eye irritation and corrosion are fundamental considerations in developing chemicals to be used in or near the eye, from cleaning products to ophthalmic solutions. Unfortunately, animal testing is currently the standard method for identifying compounds that cause eye irritation or corrosion. Yet regulatory agencies, both in the USA and abroad, are increasingly pushing for the development of New Approach Methodologies (NAMs) that reduce the need for animal testing and address the unmet need to modernize the safety evaluation of chemical hazards. To further the development and application of computational NAMs in chemical safety assessment, in this study we collected the largest expertly curated dataset of compounds tested for eye irritation and corrosion and used these data to build and validate binary and multiclass Quantitative Structure-Activity Relationship (QSAR) models that can reliably assess the eye irritation/corrosion potential of novel, untested compounds. QSAR models were generated with the Random Forest (RF) and Multi-Descriptor Read Across (MuDRA) machine learning (ML) methods and validated using a 5-fold external cross-validation protocol. These models demonstrated high balanced accuracy (CCR of 0.68-0.88), sensitivity (SE of 0.61-0.84), positive predictive value (PPV of 0.65-0.90), specificity (SP of 0.56-0.91), and negative predictive value (NPV of 0.68-0.85). Overall, MuDRA models outperformed RF models and were applied to predict the irritation/corrosion potential of compounds from the Inactive Ingredient Database, which contains components present in FDA-approved drug products, and from the Cosmetic Ingredient Database, the European Commission's source of information on cosmetic substances. All models built and validated in this study are publicly available at the STopTox web portal (https://stoptox.mml.unc.edu/). These models can be employed as reliable tools for identifying potential eye irritant/corrosive compounds.
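The abstract refers to a 5-fold external cross-validation protocol. As an illustration only, the sketch below shows one common way to set up such a protocol: compounds are shuffled within each class and dealt into five folds, and each fold serves once as the external set while the remaining four folds form the training set. The stratification by class, the fixed random seed, and the function name `stratified_five_fold` are assumptions made for this example, not details taken from the study.

```python
import random
from collections import defaultdict


def stratified_five_fold(labels, n_folds=5, seed=0):
    """Split compound indices into n_folds external folds (hypothetical helper).

    Class proportions are kept roughly equal in every fold. Returns a list of
    (train_indices, external_indices) pairs; every compound appears in exactly
    one external set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = {}
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across the folds.
        for position, idx in enumerate(members):
            fold_of[idx] = position % n_folds

    splits = []
    for k in range(n_folds):
        external = [i for i in range(len(labels)) if fold_of[i] == k]
        train = [i for i in range(len(labels)) if fold_of[i] != k]
        splits.append((train, external))
    return splits


# Toy example: 10 irritants and 5 non-irritants give 3 compounds per external fold.
labels = ["irritant"] * 10 + ["NC"] * 5
for train, external in stratified_five_fold(labels):
    print(len(train), sorted(external))
```

In this kind of protocol a model is fitted on each training split and scored only on its external fold, so every reported statistic comes from compounds the model did not see during training.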
Year: 2021 PMID: 35935266 PMCID: PMC9355119 DOI: 10.1016/j.ailsci.2021.100028
Source DB: PubMed Journal: Artif Intell Life Sci ISSN: 2667-3185
Previously published QSAR models of eye irritation.
| Author | Curation | Cross-validation | Y-rand or external set | AD | Number of compounds | Metrics | AI/Discriminant method | Descriptors | Year | Model availability |
|---|---|---|---|---|---|---|---|---|---|---|
| Basant et al. [ | Yes | Yes | Yes | Yes | 107 | Training: 77–94% | CT, RT | PaDEL | 2016 | Unavailable |
| Verma et al. [ | No | No | External set only | Yes | 816 training | CCR = 72.3% | DT | Molecular weight, logP, melting point, aqueous solubility, lipid solubility | 2015 | Unavailable |
| Liew et al. [ | Yes | Yes | External set only | Yes | 2108 split in multiple categories | Training: CCR = 65–100% | SVM, kNN | PaDEL | 2013 | Publicly available [ |
| Wang et al. [ | Yes | Yes | External set only | Yes | 6015 training 1504 test | CCR = 92–95% | ANN, kNN, NB, SVM | Atom pair, E-state fingerprint, CDK fingerprints, Klekota–Roth fingerprint, MACCS fingerprint, PubChem fingerprint and substructure fingerprint | 2017 | Unavailable |
| Jing Lu [ | No | No | External set only | No | 1845 training 496 test | CCR = 68% | Read Across | CODESSA | 2017 | Unavailable |
| Geerts et al. [ | No | No | No | No | 80 | CCR = 60–80% | Third-party software | None | 2018 | Unavailable |
| Bhhatarai et al. [ | No | No | No | No | 1644 | CCR = 74–80% | Third-party software | None | 2016 | Unavailable |
| Luechtefeld et al. [ | No | No | External set only | No | 929 | CCR = 73%–100% | DT, kNN | PubChem2D fingerprint | 2016 | Unavailable |
| Luechtefeld et al. [ | No | Yes | External set only | No | 15,760 | CCR = 98% | Read Across | PubChem2D fingerprint | 2018 | Unavailable |
| Verma et al. [ | No | No | External set only | No | 2928 | Training: CCR = 85% | ANN | ADMET predictor | 2015 | Unavailable |
| Worth and Cronin [ | No | Yes | No | No | 119 | CCR = 60–73% | LDA, CT, LR | Molecular weight | 2003 | Unavailable |
| Cruz-Monteagudo et al. [ | No | LOO | No | No | 46 | Acc = 80.43% | LDA | LogP | 2006 | Unavailable |
| Solimeo et al. [ | Yes | Yes | No | No | 75 | CCR = 82–92% | RF, kNN | Dragon, MOE | 2012 | Available by request |
| Patlewicz et al. [ | No | No | No | No | 29 | R [ | ANN | Logcmc, logP, molvol, mas $n-mas | 2000 | Unavailable |
| Sugai et al. [ | No | LOO | No | No | 138 | Acc = 86.3%, Validation = 74% | ALS | Physico-chemical descriptors | 1990 | Unavailable |
| Cronin et al. [ | No | No | No | No | 53 | R [ | LDA, LR | ClogP, kappa indices, molecular connectivity indices | 1994 | Unavailable |
| Barratt et al. [ | No | No | No | No | 46 | N/A | PCA | ClogP, mol. vol., Dipole moment, | 1995 | Unavailable |
| Abraham et al. [ | No | No | No | No | 91 | R2 = 0.94 | LR | Liquid vapor pressure, mr, | 1998 | Unavailable |
CT = Classification Trees; RT = Regression Trees; DT = Decision Trees; SVM = Support Vector Machines; kNN = k-Nearest Neighbors; ANN = Artificial Neural Networks; NB = Naïve Bayes; LDA = Linear Discriminant Analysis; LR = Linear Regression; RF = Random Forest; PCA = Principal Component Analysis; ALS = Adaptive Least Squares; AD = Applicability Domain; Y-rand = Y-randomization; LOO = Leave-One-Out; CCR = Correct Classification Rate; Acc = Accuracy; N/A = not applicable; R2 = coefficient of determination (squared correlation coefficient).
Compounds must be sent to the authors to be predicted.
Fig. 1. Data compilation and curation workflow.
Fig. 2. Self-organizing map (SOM) of the chemical space covered by the modeling set. Red circles represent corrosives, yellow circles represent irritants, and green circles represent the NC class. Blue-green regions indicate compounds that are structurally similar to their neighbors, whereas yellow-orange-red regions indicate an abrupt change in chemical structure relative to neighboring compounds. The dataset is notably complex: structurally similar compounds can belong to different classes, which makes building multiclass models challenging.
Fig. 3. Venn diagram showing the overlap between the correct predictions made by all models for the eye irritation dataset.
Fig. 4. Examples of compounds correctly predicted only by MuDRA.
Statistical characteristics of multiclass QSAR models for eye irritation and eye corrosion.
| Model | Descriptor | CCR | Se | PPV | Sp | NPV | F1 | MCC | Coverage |
|---|---|---|---|---|---|---|---|---|---|
| Multiclass models generated with RF modeling method | | | | | | | | | |
| 11 | RDKit | 0.62 | 0.61 | 0.62 | 0.63 | 0.62 | 0.51 | 0.21 | 1 |
| | RDKit-AD | 0.60 | 0.60 | 0.60 | 0.60 | 0.61 | 0.52 | 0.22 | 1 |
| 12 | Dragon | 0.63 | 0.66 | 0.63 | 0.61 | 0.64 | 0.52 | 0.31 | 1 |
| | Dragon-AD | 0.60 | 0.61 | 0.61 | 0.58 | 0.58 | 0.52 | 0.31 | 1 |
| 13 | MACCS | 0.65 | 0.63 | 0.65 | 0.66 | 0.64 | 0.55 | 0.38 | 1 |
| | MACCS-AD | 0.64 | 0.67 | 0.64 | 0.62 | 0.64 | 0.56 | 0.39 | 1 |
| 14 | Morgan | 0.63 | 0.67 | 0.63 | 0.60 | 0.64 | 0.50 | 0.39 | 1 |
| | Morgan-AD | 0.60 | 0.71 | 0.62 | 0.49 | 0.60 | 0.51 | 0.39 | 1 |
| Multiclass model generated with MuDRA modeling method | | | | | | | | | |
| 15 | Multi | 0.74 | 0.60 | 0.84 | 0.87 | 0.89 | 0.62 | 0.57 | 1 |
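MuDRA (Model 15 above) draws on neighbors found in several descriptor spaces at once. The following is a simplified read-across sketch in that spirit, not the published MuDRA algorithm: for each descriptor space it takes the k nearest training compounds by Euclidean distance, pools those neighbors, and returns the class with the largest summed similarity weight. The weighting 1/(1 + d), the value of k, the function name `multi_descriptor_read_across`, and the toy descriptor spaces `A` and `B` are hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def multi_descriptor_read_across(query, training, k=3):
    """Predict a class for `query` by pooling neighbors from several spaces.

    query:    dict mapping descriptor-space name -> feature vector
    training: list of (descriptor_dict, label) pairs using the same space names
    Returns the label with the largest summed similarity weight.
    """
    votes = defaultdict(float)
    for space, q_vec in query.items():
        # Rank training compounds by distance in this descriptor space only.
        ranked = sorted(
            (euclidean(q_vec, desc[space]), label) for desc, label in training
        )
        for dist, label in ranked[:k]:
            # Closer neighbors contribute larger weights (assumed weighting).
            votes[label] += 1.0 / (1.0 + dist)
    return max(votes, key=votes.get)


# Toy data with two hypothetical descriptor spaces, "A" and "B".
training = [
    ({"A": [0.0, 0.0], "B": [0.0]}, "NC"),
    ({"A": [0.1, 0.0], "B": [0.2]}, "NC"),
    ({"A": [5.0, 5.0], "B": [9.0]}, "corrosive"),
    ({"A": [5.1, 5.0], "B": [9.1]}, "corrosive"),
]
print(multi_descriptor_read_across({"A": [5.0, 5.1], "B": [9.05]}, training, k=2))
```

A practical implementation would need distances from different descriptor spaces (for example, Dragon descriptors versus MACCS fingerprints) to be on a comparable scale, and binary fingerprints are usually compared with Tanimoto similarity rather than Euclidean distance.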
Statistical characteristics of binary QSAR models for eye irritation and eye corrosion assessed by 5-fold cross-validation.
| Model | Descriptor | CCR | Se | PPV | Sp | NPV | F1 | MCC | Coverage |
|---|---|---|---|---|---|---|---|---|---|
| Binary models for eye irritation generated with RF algorithm | | | | | | | | | |
| 1 | RDKit | 0.76 | 0.77 | 0.76 | 0.76 | 0.77 | 0.77 | 0.53 | 1 |
| | RDKit-AD | 0.77 | 0.78 | 0.77 | 0.76 | 0.77 | 0.77 | 0.53 | 0.96 |
| 2 | Morgan | 0.77 | 0.84 | 0.73 | 0.69 | 0.81 | 0.78 | 0.53 | 1 |
| | Morgan-AD | 0.72 | 0.88 | 0.70 | 0.56 | 0.81 | 0.78 | 0.47 | 0.91 |
| 3 | Dragon | 0.77 | 0.78 | 0.77 | 0.77 | 0.78 | 0.78 | 0.55 | 1 |
| | Dragon-AD | 0.77 | 0.79 | 0.76 | 0.75 | 0.78 | 0.77 | 0.54 | 0.97 |
| 4 | MACCS | 0.77 | 0.80 | 0.75 | 0.73 | 0.79 | 0.77 | 0.53 | 1 |
| | MACCS-AD | 0.76 | 0.81 | 0.74 | 0.71 | 0.79 | 0.77 | 0.52 | 0.99 |
| Binary model for eye irritation generated with MuDRA algorithm | | | | | | | | | |
| 5 | Multi | 0.88 | 0.89 | 0.90 | 0.86 | 0.85 | 0.90 | 0.76 | 1 |
| Binary models for eye corrosion generated with RF algorithm | | | | | | | | | |
| 6 | RDKit | 0.70 | 0.70 | 0.71 | 0.71 | 0.70 | 0.70 | 0.41 | 1 |
| | RDKit-AD | 0.75 | 0.61 | 0.86 | 0.89 | 0.68 | 0.72 | 0.52 | 0.88 |
| 7 | Morgan | 0.68 | 0.76 | 0.65 | 0.59 | 0.71 | 0.70 | 0.36 | 1 |
| | Morgan-AD | 0.75 | 0.76 | 0.81 | 0.75 | 0.69 | 0.78 | 0.50 | 0.85 |
| 8 | Dragon | 0.72 | 0.73 | 0.72 | 0.71 | 0.73 | 0.72 | 0.44 | 1 |
| | Dragon-AD | 0.76 | 0.67 | 0.84 | 0.86 | 0.69 | 0.75 | 0.54 | 0.92 |
| 9 | MACCS | 0.76 | 0.73 | 0.78 | 0.79 | 0.74 | 0.75 | 0.52 | 1 |
| | MACCS-AD | 0.77 | 0.64 | 0.88 | 0.91 | 0.71 | 0.74 | 0.57 | 0.98 |
| Binary model for eye corrosion generated with MuDRA algorithm | | | | | | | | | |
| 10 | Multi | 0.85 | 0.84 | 0.86 | 0.86 | 0.83 | 0.85 | 0.69 | 1 |
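The statistics in these tables follow standard confusion-matrix definitions, and the tabulated CCR values agree with CCR computed as balanced accuracy, the mean of sensitivity and specificity (for example, Model 5: (0.89 + 0.86)/2 ≈ 0.88). The sketch below computes these metrics for a binary model. The `in_domain` flags stand in for whatever applicability-domain criterion is used, which is not described here; when they are supplied, only compounds inside the domain are scored and Coverage is the fraction retained, consistent with the "-AD" rows reporting Coverage below 1. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(y_true, y_pred, in_domain=None):
    """Compute CCR, Se, PPV, Sp, NPV, F1, MCC and Coverage (hypothetical helper).

    Labels are 1 for the positive class (e.g. irritant) and 0 otherwise.
    If `in_domain` is given, only compounds flagged True are scored.
    Empty confusion-matrix margins are not guarded against and raise
    ZeroDivisionError.
    """
    if in_domain is None:
        in_domain = [True] * len(y_true)
    pairs = [(t, p) for t, p, ok in zip(y_true, y_pred, in_domain) if ok]

    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    se = tp / (tp + fn)    # sensitivity: recall of the positive class
    sp = tn / (tn + fp)    # specificity: recall of the negative class
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    ccr = (se + sp) / 2    # correct classification rate = balanced accuracy
    f1 = 2 * ppv * se / (ppv + se)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    coverage = len(pairs) / len(y_true)  # fraction of compounds scored
    return {"CCR": ccr, "Se": se, "PPV": ppv, "Sp": sp, "NPV": npv,
            "F1": f1, "MCC": mcc, "Coverage": coverage}


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred)["CCR"])  # 0.75
```

For the multiclass models, CCR is commonly generalized as the mean of the per-class sensitivities; the sketch above covers only the binary case.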