Catrin Hasselgren, Joel Bercu, Alex Cayley, Kevin Cross, Susanne Glowienke, Naomi Kruhlak, Wolfgang Muster, John Nicolette, M Vijayaraj Reddy, Roustem Saiakhov, Krista Dobo.
Abstract
Pharmaceutical applicants conduct (Q)SAR assessments on identified and theoretical impurities to predict their mutagenic potential. Two complementary models, one rule-based and one statistical-based, are used, followed by expert review. (Q)SAR models are continuously updated to improve predictions, with new versions typically released yearly. Numerous releases of (Q)SAR models will therefore occur during the typical 6-7 years of drug development leading up to new drug registration, so it is important to understand the impact of model updates on impurity mutagenicity predictions over time. Compounds representative of pharmaceutical impurities were analyzed with three rule-based and three statistical-based models covering a 4-8 year period, with the time frame for each model depending on when it was first made available. The largest changes in the combined outcome of two complementary models were from positive or equivocal to negative and from negative to equivocal. Importantly, the cumulative change from negative to positive predictions was small in all models (<5%) and was further reduced when complementary models were combined in a consensus fashion. We conclude that model updates of the type evaluated in this manuscript would not necessarily require re-running a (Q)SAR prediction unless there is a specific need. However, original (Q)SAR predictions should be evaluated when finalizing the commercial route of synthesis for marketing authorization.
Keywords: Computational models; ICH M7; Impurities; Mutagenic; Pharmaceuticals; (Q)SAR; Version update
Year: 2020 PMID: 33058939 PMCID: PMC7734868 DOI: 10.1016/j.yrtph.2020.104807
Source DB: PubMed Journal: Regul Toxicol Pharmacol ISSN: 0273-2300 Impact factor: 3.271
Fig. 1. An illustration of the workflow used to generate a dataset of 3367 compounds from PubChem representing an area of chemical space and compound distribution relevant to the ICH M7 workflow.
Fig. 2. A similarity graph with a force-directed layout showing the chemical space covered by the Vitic Intermediates dataset (green data points) compared with the PubChem ICH M7 dataset (blue data points).
Fig. 3. Similarity of selected examples from the Vitic Intermediates dataset to the structures extracted from the PubChem database.
Fig. 4. Pie charts illustrating the distribution of mutagenicity alerts activated in Derek Nexus version 6.0.1 by the Vitic Intermediates and PubChem datasets.
Leadscope models.
| Year | Leadscope version | Statistical-based QSAR | Rule-based SAR | Type of update |
|---|---|---|---|---|
| 2010 | LSE 2.7 | Salmonella v2 (3579 compounds) | N/a | statistical model features updated |
| 2013 | LSE 3.1 | Salmonella v3 (3979 compounds) | N/a | statistical model training set and features updated |
| 2014 | LSE 3.2 | Salmonella v3 (3979 compounds) | Genetox Alerts v1 (162 rules) | first alerts release |
| 2015 | LSE 3.3 | Salmonella v3 (3979 compounds) | Genetox Alerts v2 (215 rules) | alerts updated |
| 2016 | LSE 3.4 | Salmonella v3 (3979 compounds) | Genetox Alerts v3 (213 rules) | alerts updated |
| 2017 | LSE 3.5 | Salmonella v3 (3979 compounds) | Genetox Alerts v4 (224 rules) | alerts updated |
| 2018 | LSE 3.6 | Bacterial Mutation v1 (9189 compounds) | Genetox Alerts v5 (226 rules) | alerts updated, statistical model training set and features updated |
Lhasa models.
| Year | Nexus version | Statistical-based QSAR | Rule-based SAR | Type of update |
|---|---|---|---|---|
| 2012 | 1.5 | n/a | 2.0.2 | Updates to Derek Nexus knowledge base. |
| 2013 | 1.5 | n/a | 3.0.1 | Updates to Derek Nexus knowledge base, including introduction of alert for aryl boronic acids. |
| 2014 | 1.7.6 | 1.1.2 | 4.0.6 | Sarah Nexus new software. Updates to Derek Nexus knowledge base. Introduction of negative predictions in Derek Nexus. |
| 2014 | 2.0 | 1.2 | 4.1 | Updates to Derek Nexus knowledge base. Changes to Sarah Nexus software interface. |
| 2016 | 2.1.1 | 2.0.1 | 5.0.2 | Update to Sarah Nexus fragmentation algorithm. Updates to Sarah Nexus training set. Updates to Derek Nexus knowledge base. |
| 2018 | 2.2.1 | 3.0.0 | 6.0.1 | Updates to Sarah Nexus training set. Updates to Derek Nexus knowledge base. |
Number of compounds predicted in each category of the Leadscope models.
| General Metric Descriptor | Statistical V2 (2010) | Statistical V3 (2013) | Statistical V4 (2018) | Rule-Based V1 (2014) | Rule-Based V2 (2015) | Rule-Based V3 (2016) | Rule-Based V4 (2017) | Rule-Based V5 (2018) | Consensus V1 (2014) | Consensus V2 (2015) | Consensus V3 (2016) | Consensus V4 (2017) | Consensus V5 (2018) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| # Chemicals Positive | 502 | 505 | 666 | 1031 | 1000 | 927 | 718 | 715 | 1181 | 1164 | 1095 | 962 | 923 |
| # Chemicals Negative | 2312 | 2413 | 2225 | 2216 | 2222 | 2264 | 2246 | 2314 | 1894 | 1891 | 1930 | 1911 | 1975 |
| # Chemicals Equivocals | 308 | 287 | 407 | 41 | 89 | 119 | 347 | 283 | 213 | 256 | 285 | 438 | 400 |
| # Chemicals out of domain | 245 | 162 | 69 | 79 | 56 | 57 | 56 | 55 | 79 | 56 | 57 | 56 | 69 |
Fig. 5. A–D: Cumulative changes for each prediction category from 2014 to 2018. The numbers represent the average change.
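
Changes between prediction categories of the kind summarized in Fig. 5 can be tallied per compound once each compound has a category in two model versions. The following is a minimal sketch, not the authors' procedure: the compound identifiers and predictions are hypothetical, the function names are illustrative, and expressing a change as a percentage of all compounds shared by both versions is an assumption about the denominator.

```python
from collections import Counter

def transition_counts(old, new):
    """Count (old_category, new_category) pairs for compounds in both versions."""
    shared = old.keys() & new.keys()
    return Counter((old[c], new[c]) for c in shared)

def percent_changed(old, new, src, dst):
    """Percentage of shared compounds that moved from category src to dst.

    Assumption: the denominator is every compound predicted by both versions.
    """
    counts = transition_counts(old, new)
    total = sum(counts.values())
    return 100.0 * counts[(src, dst)] / total if total else 0.0

# Hypothetical predictions for four compounds in two releases.
v2014 = {"cmpd-1": "Negative", "cmpd-2": "Positive",
         "cmpd-3": "Equivocal", "cmpd-4": "Negative"}
v2018 = {"cmpd-1": "Negative", "cmpd-2": "Negative",
         "cmpd-3": "Negative", "cmpd-4": "Equivocal"}

print(transition_counts(v2014, v2018))
print(percent_changed(v2014, v2018, "Negative", "Positive"))
```

In this toy data no compound moves from negative to positive, which is the transition the abstract singles out as the one of most concern.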
Mapping model specific output to common analysis terms.
| Mapped Term | Leadscope Statistical | Leadscope Rule-Based | MultiCASE Statistical | MultiCASE Rule-Based | Lhasa Statistical | Lhasa Rule-Based |
|---|---|---|---|---|---|---|
| Positive | Positive | Positive | Positive | Positive | Positive | Alerting Structure with reasoning of equivocal or higher* |
| Negative | Negative | Negative | Negative | Negative | Negative | Negative/Negative with misclassified features |
| Equivocal | Indeterminate | Indeterminate | Inconclusive | Inconclusive | Equivocal | – |
| Out of domain | Not in domain | Not in domain | Out of domain | Out of domain | Outside domain | Negative with unclassified features |
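
The mapping in this table amounts to a lookup from each model's native output label to a shared category. A minimal Python sketch follows. The vendor labels are copied from the table; the common terms "Positive", "Negative", "Equivocal" and "Out of domain" follow the category names used in the count tables, and the dictionary layout and the function name `to_common_term` are illustrative assumptions rather than part of any vendor software.

```python
# Native output label -> common analysis term, keyed by model.
# Labels come from the mapping table; the structure is an assumption.
COMMON_TERMS = {
    "Leadscope": {  # same labels for the statistical and rule-based models
        "Positive": "Positive",
        "Negative": "Negative",
        "Indeterminate": "Equivocal",
        "Not in domain": "Out of domain",
    },
    "MultiCASE": {  # same labels for the statistical and rule-based models
        "Positive": "Positive",
        "Negative": "Negative",
        "Inconclusive": "Equivocal",
        "Out of domain": "Out of domain",
    },
    "Lhasa Statistical": {
        "Positive": "Positive",
        "Negative": "Negative",
        "Equivocal": "Equivocal",
        "Outside domain": "Out of domain",
    },
    "Lhasa Rule-Based": {  # no equivocal category in the table
        "Alerting Structure with reasoning of equivocal or higher": "Positive",
        "Negative": "Negative",
        "Negative with misclassified features": "Negative",
        "Negative with unclassified features": "Out of domain",
    },
}

def to_common_term(model: str, raw_label: str) -> str:
    """Translate a model-specific label; fail loudly on anything unmapped."""
    try:
        return COMMON_TERMS[model][raw_label.strip()]
    except KeyError:
        raise ValueError(f"No mapping for {raw_label!r} from {model!r}") from None

print(to_common_term("Leadscope", "Indeterminate"))
print(to_common_term("Lhasa Rule-Based", "Negative with unclassified features"))
```

Raising an error for unmapped labels, rather than defaulting to a category, keeps a renamed output in a new model version from being silently miscounted.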
Number of compounds predicted in each category of the Lhasa models.
| General Metric Descriptor | Statistical V.B (2014) | Statistical V.C (2016) | Statistical V.D (2018) | Rule-Based V.A (2012) | Rule-Based V.B (2013) | Rule-Based V.C (2014) | Rule-Based V.D (2014) | Rule-Based V.E (2016) | Rule-Based V.F (2018) | Consensus V.A (2014) | Consensus V.B (2016) | Consensus V.C (2018) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| # Chemicals Positive | 593 | 741 | 696 | 1006 | 1004 | 1009 | 1009 | 997 | 1009 | 1183 | 1265 | 1220 |
| # Chemicals Negative | 2058 | 2029 | 2103 | 2361 | 2363 | 2256 | 2256 | 2274 | 2255 | 1666 | 1698 | 1759 |
| # Chemicals Equivocals | 553 | 461 | 441 | – | – | – | – | – | – | 346 | 267 | 256 |
| # Chemicals out of domain | 163 | 136 | 127 | 0 | 0 | 102 | 102 | 96 | 103 | 172 | 137 | 132 |
Versions A and B of the Lhasa statistical model produce the same results for the dataset, because the changes between these versions were made only to the software interface and not to the model itself.
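
A simple consistency check on a count table like the Lhasa one is that the four categories in each column add up to the dataset size of 3367 compounds given in the Fig. 1 caption. The sketch below runs that check. It assumes that the six equivocal counts belong to the three statistical and three consensus versions, with the rule-based model reporting no equivocal category (entered as 0); the variable and function names are illustrative.

```python
DATASET_SIZE = 3367  # number of compounds stated in the Fig. 1 caption

# (positive, negative, equivocal, out of domain) per model version.
# Assumption: rule-based versions have no equivocal category, so 0 is used.
lhasa_counts = {
    "Statistical V.B (2014)": (593, 2058, 553, 163),
    "Statistical V.C (2016)": (741, 2029, 461, 136),
    "Statistical V.D (2018)": (696, 2103, 441, 127),
    "Rule-Based V.A (2012)": (1006, 2361, 0, 0),
    "Rule-Based V.B (2013)": (1004, 2363, 0, 0),
    "Rule-Based V.C (2014)": (1009, 2256, 0, 102),
    "Rule-Based V.D (2014)": (1009, 2256, 0, 102),
    "Rule-Based V.E (2016)": (997, 2274, 0, 96),
    "Rule-Based V.F (2018)": (1009, 2255, 0, 103),
    "Consensus V.A (2014)": (1183, 1666, 346, 172),
    "Consensus V.B (2016)": (1265, 1698, 267, 137),
    "Consensus V.C (2018)": (1220, 1759, 256, 132),
}

def mismatched_columns(counts, expected_total):
    """Return {version: total} for every column that does not sum to expected_total."""
    return {
        version: sum(values)
        for version, values in counts.items()
        if sum(values) != expected_total
    }

print(mismatched_columns(lhasa_counts, DATASET_SIZE))
```

An empty result means every column accounts for all compounds; a non-empty one points at the version whose counts are worth re-checking against the source.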
Fig. 6. A–D: Consensus changes for each prediction category.
Fig. 7. Percentage of chemicals with unchanged predictions across models released between 2014 and 2018. Blue bars represent inter-vendor combinations and green bars represent intra-vendor combinations; Stat = statistical model, Alert = rule-based model. The average is calculated as the mean.
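
The quantity plotted in Fig. 7, the share of chemicals whose prediction stays the same across releases, can be computed per model or model combination and then averaged with the mean. The sketch below is one plausible reading, not the authors' code: a compound counts as unchanged only if its category is identical in every release considered, and the release data and names are hypothetical.

```python
from statistics import mean

def percent_unchanged(releases):
    """Percentage of compounds with the same category in every release.

    releases: list of dicts mapping compound id -> category, one per version.
    Only compounds present in all releases are considered.
    """
    if not releases:
        raise ValueError("at least one release is required")
    shared = set.intersection(*(set(r) for r in releases))
    if not shared:
        return 0.0
    unchanged = sum(1 for c in shared if len({r[c] for r in releases}) == 1)
    return 100.0 * unchanged / len(shared)

# Hypothetical predictions from three successive releases of one model.
model_a = [
    {"a": "Negative", "b": "Positive", "c": "Equivocal", "d": "Negative"},
    {"a": "Negative", "b": "Positive", "c": "Negative", "d": "Negative"},
    {"a": "Negative", "b": "Negative", "c": "Negative", "d": "Negative"},
]
# Hypothetical second model with two releases.
model_b = [
    {"a": "Negative", "b": "Positive"},
    {"a": "Negative", "b": "Positive"},
]

per_model = [percent_unchanged(model_a), percent_unchanged(model_b)]
print(per_model, mean(per_model))
```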
MultiCASE models.
| Year | CASE Ultra version | Statistical-based QSAR | Rule-based SAR | Type of update |
|---|---|---|---|---|
| 2012 | 1450 | A7B (3535 records) | N/a | First official release |
| 2013 | 1460 | A7B (3535 records) | N/a | Minor update |
| 2014 | 1520 | GT1_A7B (3979 records) | GT_EXPERT (125 rules, 8556 records) | Major algorithm update, new models |
| 2015 | 1603 | GT1_A7B (3979 records) | GT_EXPERT (125 rules, 8556 records) | Major algorithm update |
| 2017 | 1623 | GT1_A7B (3979 records) | GT_EXPERT (174 rules, 11461 records) | Minor update |
| 2018 | 1704 | GT1_BMUT (13514 records) | GT_EXPERT (198 alerts, 13514 records) | Major algorithm and models update |
Number of compounds predicted in each category of the MultiCASE models.
| General Metric Descriptor | Statistical V1450 (2012) | Statistical V1460 (2013) | Statistical V1520 (2014) | Statistical V1603 (2015) | Statistical V1623 (2017) | Statistical V1704 (2018) | Rule-Based V1520 (2014) | Rule-Based V1603 (2015) | Rule-Based V1623 (2017) | Rule-Based V1704 (2018) | Consensus V1520 (2014) | Consensus V1603 (2015) | Consensus V1623 (2017) | Consensus V1704 (2018) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| # Chemicals Positive | 985 | 985 | 683 | 683 | 683 | 654 | 1014 | 954 | 954 | 875 | 1188 | 1139 | 1139 | 1061 |
| # Chemicals Negative | 1888 | 1888 | 1961 | 1963 | 1962 | 2071 | 2134 | 2175 | 2175 | 2279 | 1690 | 1733 | 1732 | 1839 |
| # Chemicals Equivocals | 319 | 319 | 530 | 530 | 530 | 496 | 16 | 49 | 49 | 60 | 249 | 262 | 496 | 294 |
| # Chemicals out of domain | 175 | 175 | 193 | 191 | 192 | 145 | 201 | 187 | 187 | 153 | 239 | 232 | 233 | 173 |