Masamitsu Honma, Airi Kitazawa, Alex Cayley, Richard V Williams, Chris Barber, Thierry Hanser, Roustem Saiakhov, Suman Chakravarti, Glenn J Myatt, Kevin P Cross, Emilio Benfenati, Giuseppa Raitano, Ovanes Mekenyan, Petko Petkov, Cecilia Bossa, Romualdo Benigni, Chiara Laura Battistelli, Alessandro Giuliani, Olga Tcheremenskaia, Christine DeMeo, Ulf Norinder, Hiromi Koga, Ciloy Jose, Nina Jeliazkova, Nikolay Kochev, Vesselina Paskaleva, Chihae Yang, Pankaj R Daga, Robert D Clark, James Rathman.
Abstract
The International Conference on Harmonization (ICH) M7 guideline allows the use of in silico approaches for predicting Ames mutagenicity for the initial assessment of impurities in pharmaceuticals. This is the first international guideline that addresses the use of quantitative structure-activity relationship (QSAR) models in lieu of actual toxicological studies for human health assessment. Therefore, QSAR models for Ames mutagenicity now require higher predictive power for identifying mutagenic chemicals. To increase the predictive power of QSAR models, larger experimental datasets from reliable sources are required. The Division of Genetics and Mutagenesis, National Institute of Health Sciences (DGM/NIHS) of Japan recently established a unique proprietary Ames mutagenicity database containing 12140 new chemicals that have not been previously used for developing QSAR models. The DGM/NIHS provided this Ames database to QSAR vendors to validate and improve their QSAR tools. The Ames/QSAR International Challenge Project was initiated in 2014 with 12 QSAR vendors testing 17 QSAR tools against these compounds in three phases. We now present the final results. All tools were considerably improved by participation in this project. Most tools achieved >50% sensitivity (positive prediction among all Ames positives) and predictive power (accuracy) was as high as 80%, almost equivalent to the inter-laboratory reproducibility of Ames tests. To further increase the predictive power of QSAR tools, accumulation of additional Ames test data is required as well as re-evaluation of some previous Ames test results. Indeed, some Ames-positive or Ames-negative chemicals may have previously been incorrectly classified because of methodological weakness, resulting in false-positive or false-negative predictions by QSAR tools. These incorrect data hamper prediction and are a source of noise in the development of QSAR models. 
It is thus essential to establish a large benchmark database consisting only of well-validated Ames test results to build more accurate QSAR models.
Year: 2019 PMID: 30357358 PMCID: PMC6402315 DOI: 10.1093/mutage/gey031
Source DB: PubMed Journal: Mutagenesis ISSN: 0267-8357 Impact factor: 3.000
Participants in Ames/QSAR International Challenge Project
| QSAR vendor | QSAR tool | Methodology |
|---|---|---|
| 1. Lhasa Limited (UK) | a. Derek Nexus | Rule |
| | b. Sarah Nexus | Statistical |
| 2. MultiCASE Inc (USA) | c. CASE Ultra statistical-based | Statistical |
| | d. CASE Ultra rule-based | Rule |
| 3. Leadscope Inc (USA) | e. Leadscope statistical-based | Statistical |
| | f. Leadscope rule-based | Rule |
| 4. Istituto di Ricerche Farmacologiche Mario Negri IRCCS (Italy) | g. CAESAR | Statistical |
| | h. SARPY | Rule |
| | i. KNN | Statistical |
| 5. LMC - Bourgas University (Bulgaria) | j. TIMES_AMES | Rule |
| 6. Istituto Superiore di Sanita (Italy) | k. Toxtree | Rule |
| 7. Prous Institute (Spain) | l. Symmetry | Statistical |
| 8. Swedish Toxicology Science Research Center (Sweden) | m. AZAMES | Statistical |
| 9. Fujitsu Kyushu Systems Limited (Japan) | n. ADMEWORKS | Statistical |
| 10. IdeaConsult Ltd. (Bulgaria) | o. AMBIT | Statistical |
| 11. Molecular Networks GmbH and Altamira LLC (USA) | p. ChemTunes•ToxGPS | Statistical |
| 12. Simulations Plus, Inc (USA) | q. MUT_Risk | Statistical |
Number of Chemicals in Ames/QSAR International Challenge Project
| Class | Phase I (2014–2015) | Phase II (2015–2016) | Phase III (2016–2017) | Total (2014–2017) |
|---|---|---|---|---|
| Class A | 183 (4.7%) | 253 (6.6%) | 236 (5.4%) | 672 (5.5%) |
| Class B | 383 (9.8%) | 309 (8.1%) | 393 (8.9%) | 1085 (8.9%) |
| Class C | 3336 (85.5%) | 3267 (85.3%) | 3780 (85.7%) | 10383 (85.6%) |
| Total | 3902 | 3829 | 4409 | 12140 |
| Only chemicals with molecular weight >800 | | | | |
| Class A | 3 (1.8%) | 1 (2.4%) | 2 (0.8%) | 6 (1.3%) |
| Class B | 16 (9.7%) | 1 (2.4%) | 9 (3.8%) | 26 (5.8%) |
| Class C | 146 (88.5%) | 39 (95.1%) | 229 (95.4%) | 414 (92.8%) |
| Subtotal | 165 | 41 | 240 | 446 |
2 × 2 (2 × 3) contingency matrix for Ames mutagenicity classification
| QSAR prediction class | Experimental Class A (Strong positive) | Experimental Class B (Positive) | Experimental Class C (Negative) |
|---|---|---|---|
| Positive | True Positive (A) (TPA) | True Positive (B) (TPB) | False Positive (FP) |
| Negative | False Negative (FN) | False Negative (FN) | True Negative (TN) |

True Positive (TP) = TPA + TPB. False Negative (FN) is a single cell covering both Class A and Class B chemicals predicted negative.
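The matrix above can be tallied from per-chemical records along the following lines. This is only a sketch: the record layout (pairs of experimental class and prediction), the `None` marker for a chemical that received no prediction, and the names `tally`, `FNA`, `FNB` and `no_call` are assumptions made for illustration, and the toy records are hypothetical rather than project data.

```python
from collections import Counter

def tally(records):
    """Count contingency cells from (experimental_class, prediction) pairs.

    experimental_class: "A" (strong positive), "B" (positive) or "C" (negative)
    prediction: "positive", "negative", or None when the tool gave no call
    """
    counts = Counter()
    for exp_class, pred in records:
        if pred is None:
            counts["no_call"] += 1              # excluded from the matrix itself
        elif exp_class in ("A", "B"):
            prefix = "TP" if pred == "positive" else "FN"
            counts[prefix + exp_class] += 1     # TPA, TPB, FNA or FNB
        else:
            counts["FP" if pred == "positive" else "TN"] += 1
    # Pooled cells as defined under the matrix
    counts["TP"] = counts["TPA"] + counts["TPB"]
    counts["FN"] = counts["FNA"] + counts["FNB"]
    return counts

# Hypothetical toy records, not taken from the challenge data
toy = [("A", "positive"), ("A", "negative"), ("B", "positive"),
       ("C", "negative"), ("C", "positive"), ("C", None)]
cells = tally(toy)
```

Keeping the Class A false negatives separate from the pooled FN cell is a design choice here; it is what makes a Class A specific sensitivity computable later.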
Performance metrics used to evaluate classifiers
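| Performance metric | Calculation and description |
|---|---|
| A-Sensitivity (A-SENS) | TPA / (TPA + FN_A), where FN_A is the number of Class A chemicals predicted negative; fraction of Class A (strong positive) chemicals predicted positive |
| Sensitivity (SENS) | TP / (TP + FN); fraction of all Ames positives (Class A + B) predicted positive |
| Specificity (SPEC) | TN / (TN + FP); fraction of Ames negatives (Class C) predicted negative |
| Accuracy (ACC) | (TP + TN) / (TP + TN + FP + FN); fraction of predictions that are correct |
| Balanced Accuracy (BA) | (SENS + SPEC) / 2 |
| Positive Prediction Value (PPV) | TP / (TP + FP); fraction of positive predictions that are Ames positive |
| Negative Prediction Value (NPV) | TN / (TN + FN); fraction of negative predictions that are Ames negative |
| Matthews Correlation Coefficient (MCC) | (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)) |
| Coverage (COV) | Number of chemicals for which the tool returned a prediction / total number of chemicals |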
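As a rough illustration of the metrics named in this table, the sketch below computes them from contingency counts using the standard textbook definitions. Two points are assumptions rather than statements from the source: that A-SENS is the fraction of Class A chemicals predicted positive, and that all metrics except coverage are computed only over chemicals that received a prediction. The function name `metrics`, its parameters, and the example counts are hypothetical.

```python
import math

def metrics(tpa, tpb, fna, fnb, fp, tn, total):
    """Standard classifier metrics from contingency counts.

    tpa/tpb: Class A / Class B chemicals predicted positive
    fna/fnb: Class A / Class B chemicals predicted negative
    fp, tn:  Class C chemicals predicted positive / negative
    total:   all chemicals submitted, including those without a prediction
    """
    tp = tpa + tpb
    fn = fna + fnb
    predicted = tp + fn + fp + tn          # chemicals that received a call
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "A-SENS": tpa / (tpa + fna),       # assumed definition, see lead-in
        "SENS": sens,
        "SPEC": spec,
        "ACC": (tp + tn) / predicted,
        "BA": (sens + spec) / 2,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "MCC": (tp * tn - fp * fn) / mcc_den,
        "COV": predicted / total,
    }

# Hypothetical counts chosen for easy arithmetic, not challenge results
m = metrics(tpa=40, tpb=30, fna=10, fnb=20, fp=50, tn=350, total=550)
```

With a class balance similar to the challenge set (roughly 15% positives), accuracy is dominated by specificity, which is one reason balanced accuracy and MCC are reported alongside it.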
Summary of the performance metrics of QSAR tools in Phase I challenge with 3902 chemicals
| Vendors | QSAR tools (module) | A-SENS (%) | SENS (%) | SPEC (%) | ACC (%) | BA (%) | PPV (%) | NPV (%) | MCC | COV (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Lhasa Limited | *Derek_Nexus v.4.0.5 | 72.1 | 58.8 | 86.2 | 82.2 | 72.5 | 41.9 | 92.5 | 0.39 | 100.0 |
| | *Sarah Nexus v. 1.2 (model v.1.1.2) | 64.7 | 51.2 | 82.0 | 77.6 | 66.6 | 32.3 | 91.0 | 0.28 | 80.0 |
| MultiCASE Inc | *BM_PHARMA v1.5.2.0 (Statistical approach; SALM/ECOLI consensus) | 65.5 | 52.9 | 84.7 | 80.1 | 68.8 | 37.1 | 91.3 | 0.33 | 90.5 |
| | *GT_EXPERT v1.5.2.0 (Rule based) | 81.3 | 69.8 | 75.0 | 74.2 | 72.4 | 32.8 | 93.4 | 0.34 | 91.0 |
| Leadscope Inc | *Statistical-based (QSAR) | 76.2 | 58.7 | 83.2 | 79.8 | 71.0 | 35.8 | 92.6 | 0.34 | 86.0 |
| | *Rule-based (alerts) | 74.1 | 60.2 | 79.0 | 76.3 | 69.6 | 32.0 | 92.3 | 0.31 | 94.3 |
| Istituto di Ricerche Farmacologiche Mario Negri IRCCS | *CAESAR (training set based on Kazius et al.) | 76.9 | 69.5 | 64.8 | 65.5 | 67.2 | 25.1 | 92.6 | 0.25 | 99.6 |
| | *SARPY (training set based on Kazius et al.) | 68.1 | 61.9 | 66.3 | 65.7 | 64.1 | 23.8 | 91.1 | 0.20 | 99.6 |
| LMC-Bourgas University | TIMES-AMES (in domain TIMES model) | 79.4 | 49.5 | 88.8 | 81.8 | 68.9 | 46.1 | 89.7 | 0.37 | 14.5 |
| | *TIMES-AMES (including out of domain) | 59.3 | 50.4 | 76.9 | 73.0 | 63.7 | 26.9 | 90.1 | 0.22 | 99.9 |
| Istituto Superiore di Sanita | *ToxTree 2.6.6 | 74.9 | 65.3 | 68.0 | 67.6 | 66.7 | 25.7 | 92.0 | 0.24 | 99.9 |
| Prous Institute | *Symmetry | 51.4 | 43.8 | 80.3 | 75.0 | 62.1 | 27.4 | 89.4 | 0.20 | 99.9 |
| Swedish Toxicology Science Research Center | *Swetox AZAMES | 56.2 | 38.6 | 91.5 | 83.9 | 65.1 | 43.1 | 89.9 | 0.32 | 97.1 |
| Fujitsu Kyushu Systems Limited | *ADMEWORKS/Predictor Ames-V71 | 58.5 | 46.5 | 80.1 | 76.0 | 63.3 | 24.2 | 91.6 | 0.20 | 57.7 |
| IdeaConsult Ltd. | *Ambit consensus model | 59.1 | 43.6 | 86.1 | 80.0 | 64.9 | 34.4 | 90.1 | 0.27 | 93.6 |
| Molecular Networks GmbH and Altamira LLC | *ChemTunes•ToxGPS Ames | 77.7 | 65.7 | 76.1 | 74.5 | 70.9 | 32.9 | 92.6 | 0.33 | 90.3 |
| Simulations Plus, Inc | *MUT_Risk-0 | 82.8 | 70.0 | 62.5 | 63.6 | 66.3 | 24.8 | 92.2 | 0.23 | 83.6 |
| | MUT_Risk-1 | 62.0 | 48.0 | 84.3 | 78.9 | 66.2 | 35.1 | 90.2 | 0.29 | 83.6 |
*The QSAR tool with the module was statistically evaluated in Table 8 and Figure 1.
Averages and ranges of the performance metrics of QSAR tools in the Ames/QSAR challenge project
| Performance metric | Phase I | Phase II | Phase III |
|---|---|---|---|
| A-Sensitivity (%) | 68.7 (51.4–82.8) | 73.2 (55.3–89.5) | 70.2 (42.7–78.6) |
| Sensitivity (%) | 56.7 (38.6–70.0) | 58.0 (41.6–72.1) | 57.1 (31.7–67.6) |
| Specificity (%) | 77.7 (62.5–91.5) | 84.2 (64.9–92.8) | 79.9 (60.7–93.0) |
| Accuracy (%) | 74.7 (63.6–83.9) | 80.3 (65.8–87.7) | 76.7 (61.1–87.3) |
| Balanced accuracy (%) | 67.2 (62.1–72.5) | 71.1 (64.0–78.9) | 68.5 (62.0–74.4) |
| MCC | 0.28 (0.20–0.39) | 0.37 (0.24–0.50) | 0.31 (0.17–0.44) |
| Coverage (%) | 91.4 (57.7–100) | 89.1 (22.7–100) | 92.3 (74.5–100) |
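The Phase I accuracy entry in this table can be reproduced from the Phase I summary table, on the assumption (not stated explicitly in the text here) that averages and ranges are taken over the 16 starred modules only. The list below copies the ACC (%) values of those starred rows; the helper name `summarize` is hypothetical.

```python
from statistics import mean

# ACC (%) of the 16 starred modules in the Phase I summary table, in table order
phase1_acc = [82.2, 77.6, 80.1, 74.2, 79.8, 76.3, 65.5, 65.7,
              73.0, 67.6, 75.0, 83.9, 76.0, 80.0, 74.5, 63.6]

def summarize(values):
    """Return (mean rounded to one decimal, minimum, maximum)."""
    return round(mean(values), 1), min(values), max(values)

avg, low, high = summarize(phase1_acc)
print(f"{avg} ({low}–{high})")  # prints 74.7 (63.6–83.9)
```

The two unstarred Phase I rows (the in-domain TIMES-AMES model and MUT_Risk-1) are left out; including them would shift the mean away from the tabulated 74.7.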
Figure 1. Receiver operating characteristic (ROC) graph of Ames mutagenicity prediction for the QSAR tools evaluated in this study. Sensitivity to Class A or Class A + B chemicals and specificity to Class C chemicals are presented. Each dot represents one QSAR tool.
Summary of the performance metrics of QSAR tools in Phase II challenge with 3829 chemicals
| Vendors | QSAR tools (module) | A-SENS (%) | SENS (%) | SPEC (%) | ACC (%) | BA (%) | PPV (%) | NPV (%) | MCC | COV (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Lhasa Limited | *Derek_Nexus v.4.2.0 | 73.5 | 54.3 | 90.1 | 84.8 | 72.2 | 48.4 | 92.0 | 0.42 | 100.0 |
| | Sarah Nexus v. 1.2 (model v.1.1.2) | 61.0 | 46.0 | 88.2 | 82.2 | 67.1 | 39.0 | 90.9 | 0.32 | 82.9 |
| | Sarah Nexus v. 2.0.1 (model v.1.1.19) | 63.7 | 48.3 | 89.0 | 83.4 | 68.7 | 41.4 | 91.5 | 0.35 | 84.1 |
| | *Sarah Nexus v. 2.0.1 (model v.1.1.19) + NIHS1 | 66.5 | 52.4 | 88.6 | 83.5 | 70.5 | 42.7 | 92.0 | 0.38 | 83.3 |
| MultiCASE Inc | *BM_PHARMA v1.5.2.0 (Statistical approach; SALM/ECOLI consensus) | 89.5 | 72.1 | 85.6 | 83.5 | 78.9 | 48.4 | 94.2 | 0.50 | 65.3 |
| | *GT_EXPERT v1.5.2.0 (Rule based) | 84.1 | 67.9 | 83.1 | 80.8 | 75.5 | 42.1 | 93.5 | 0.43 | 89.4 |
| Leadscope Inc | *Statistical-based QSAR (rebuild I) | 79.0 | 63.9 | 88.0 | 84.5 | 76.0 | 47.2 | 93.6 | 0.46 | 90.7 |
| | *Rule-based (Alerts) | 76.6 | 57.8 | 90.6 | 85.8 | 74.2 | 51.6 | 92.6 | 0.46 | 93.8 |
| Istituto di Ricerche Farmacologiche Mario Negri IRCCS | *CAESAR (training set based on Kazius et al.) | 80.6 | 67.8 | 74.2 | 73.3 | 71.0 | 31.2 | 93.1 | 0.32 | 99.9 |
| | *SARPY (training set based on Kazius et al. + Phase I) | 61.6 | 45.9 | 90.1 | 83.7 | 68.0 | 44.0 | 90.7 | 0.35 | 97.7 |
| | *KNN (training set based on Hansen et al. + Phase I) | 55.3 | 41.6 | 89.5 | 82.4 | 65.5 | 40.7 | 89.8 | 0.31 | 98.8 |
| LMC-Bourgas University | TIMES-AMES (In domain TIMES model) | 80.0 | 51.0 | 93.5 | 87.1 | 72.3 | 58.2 | 91.5 | 0.47 | 18.0 |
| | *TIMES-AMES (Including out of domain) | 60.5 | 47.5 | 83.8 | 78.4 | 65.7 | 33.4 | 90.3 | 0.27 | 98.0 |
| Istituto Superiore di Sanita | *ToxTree 2.6.6 | 73.9 | 59.3 | 78.1 | 75.3 | 68.7 | 31.7 | 91.8 | 0.30 | 100.0 |
| Prous Institute | *Symmetry | 81.7 | 61.9 | 85.9 | 82.4 | 73.9 | 43.0 | 92.9 | 0.41 | 99.9 |
| Swedish Toxicology Science Research Center | *Swetox AZAMES | 76.8 | 56.5 | 92.8 | 87.7 | 74.7 | 56.3 | 92.9 | 0.49 | 93.1 |
| Fujitsu Kyushu Systems Limited | *ADMEWORKS/Predictor Ames-V71 | 68.3 | 55.4 | 79.3 | 74.7 | 67.4 | 39.1 | 88.1 | 0.31 | 22.7 |
| IdeaConsult Ltd. | *Ambit consensus model | 58.5 | 44.8 | 83.1 | 77.5 | 64.0 | 31.4 | 89.8 | 0.24 | 100.0 |
| Molecular Networks GmbH and Altamira LLC | ChemTunes•ToxGPS Ames (original) | 70.5 | 56.9 | 91.6 | 86.5 | 74.3 | 54.4 | 92.4 | 0.48 | 96.2 |
| | *ChemTunes•ToxGPS Ames (enhanced) | 77.1 | 66.5 | 83.9 | 81.3 | 75.2 | 42.3 | 93.4 | 0.42 | 92.7 |
| Simulations Plus, Inc | *MUT_Risk-0 | 81.7 | 71.0 | 64.9 | 65.8 | 68.0 | 27.4 | 92.3 | 0.27 | 89.9 |
*The QSAR tool with the module was statistically evaluated in Table 8 and Figure 1.
Summary of the performance metrics of QSAR tools in Phase III challenge with 4409 chemicals
| Vendors | QSAR tools (module) | A-SENS (%) | SENS (%) | SPEC (%) | ACC (%) | BA (%) | PPV (%) | NPV (%) | MCC | COV (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Lhasa Limited | *Derek Nexus v. 5.0.1 | 70.8 | 54.7 | 83.3 | 79.2 | 69.0 | 35.2 | 91.7 | 0.32 | 100.0 |
| | Sarah Nexus v. 2.0.1 (model v. 1.1.19) | 59.1 | 44.0 | 82.3 | 77.3 | 63.2 | 27.7 | 90.6 | 0.22 | 80.2 |
| | Sarah research prototype prediction | 83.1 | 70.4 | 74.4 | 73.8 | 72.4 | 32.7 | 93.4 | 0.34 | 67.7 |
| | *Sarah Nexus v. 2.0.1 (model v. 1.1.19) + NIHS1 & NIHS2 | 72.2 | 60.5 | 78.1 | 75.7 | 69.3 | 30.5 | 92.6 | 0.30 | 79.5 |
| MultiCASE Inc | *Statistical approach; SALM/ECOLI consensus | 69.8 | 50.6 | 92.8 | 87.3 | 71.7 | 51.0 | 92.7 | 0.44 | 85.9 |
| | *RULE BASED (GT_EXPERT) | 77.3 | 65.5 | 74.5 | 73.1 | 70.0 | 31.5 | 92.3 | 0.31 | 86.4 |
| Leadscope Inc | *Statistical-based QSAR (rebuild II) | 69.8 | 59.6 | 83.6 | 80.3 | 71.6 | 36.4 | 92.9 | 0.36 | 87.3 |
| | *Rule-based (alerts; Bacterial mutagenicity v2) | 69.4 | 55.5 | 87.1 | 82.8 | 71.3 | 40.5 | 92.5 | 0.38 | 88.4 |
| Istituto di Ricerche Farmacologiche Mario Negri IRCCS | *CAESAR (training set based on Kazius et al.) | 78.0 | 67.6 | 67.0 | 67.1 | 67.3 | 25.4 | 92.5 | 0.25 | 100.0 |
| | *SARPY (training set based on Kazius et al.) | 72.5 | 63.3 | 60.7 | 61.1 | 62.0 | 21.1 | 90.9 | 0.17 | 100.0 |
| | *KNN batch (training set based on Hansen et al. + Phase I & II) | 42.7 | 31.7 | 93.0 | 84.3 | 62.4 | 43.3 | 89.1 | 0.28 | 92.9 |
| LMC-Bourgas University | TIMES AMES mutagenicity v.14.14. (In domain TIMES model) | 85.7 | 47.0 | 87.3 | 81.0 | 67.2 | 40.3 | 90.0 | 0.32 | 9.7 |
| | *TIMES AMES mutagenicity v.14.14. (Including out of domain) | 64.0 | 50.0 | 78.2 | 74.2 | 64.1 | 27.6 | 90.4 | 0.23 | 99.9 |
| Istituto Superiore di Sanita | *ToxTree 2.6.6 | 73.3 | 60.9 | 69.2 | 68.0 | 65.1 | 24.7 | 91.4 | 0.22 | 100.0 |
| Swedish Toxicology Science Research Center | *SwetoxAZAMES v2 | 77.1 | 61.0 | 87.7 | 83.9 | 74.4 | 44.5 | 93.3 | 0.43 | 91.2 |
| Fujitsu Kyushu Systems Limited | *ADMEWORKS AMES | 60.6 | 46.3 | 87.8 | 82.2 | 67.1 | 37.1 | 91.3 | 0.31 | 74.5 |
| IdeaConsult Ltd. | *Ambit consensus model | 70.3 | 60.1 | 80.0 | 77.1 | 70.1 | 33.6 | 92.3 | 0.32 | 99.1 |
| Molecular Networks GmbH and Altamira LLC | *ChemTunes•ToxGPS Ames (enhanced) | 78.6 | 65.2 | 82.7 | 80.3 | 74.0 | 38.2 | 93.6 | 0.39 | 95.4 |
| Simulations Plus, Inc | *MUT_Risk-8.5 | 76.0 | 61.5 | 72.0 | 70.5 | 66.8 | 27.1 | 91.7 | 0.25 | 96.4 |
*The QSAR tool with the module was statistically evaluated in Table 8 and Figure 1.
Figure 2. Two aromatic amines predicted as Ames-positive by almost all QSAR tools, but negative in the actual Ames test (Class C). Two Ames test results for chemical (a) using strain TA100 in the presence of S9, and three Ames test results for chemical (b) using strain TA98 in the presence of S9 are shown.
Figure 3. Ames test results for 4′-(chloroacetyl) acetanilide, which was examined by four laboratories as part of an NTP validation program using the TA1537 strain with or without rat S9.