Ulf Norinder, Glenn Myatt, Ernst Ahlberg.
Abstract
The occurrence of mutagenicity in primary aromatic amines has been investigated using conformal prediction. The results show that it is possible to develop mathematically proven valid models using conformal prediction, and that the existence of uncertain prediction classes, namely both (both classes assigned to a compound) and empty (no class assigned to a compound), gives the user additional information on how to use, further develop, and possibly improve future models. The study also indicates that different sets of fingerprints yield models whose ability to discriminate varies with the chosen level of acceptable errors.
Keywords: aromatic amines; confidence; conformal prediction; mutagenicity; random forest
Year: 2018 PMID: 30158463 PMCID: PMC6163496 DOI: 10.3390/biom8030085
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Results from the fivefold cross-conformal prediction across 50 test sets.
| Descriptors a | Set b | No. of Compounds per Set | Significance Level c | Validity, Mutagenic Class | Validity, Nonmutagenic Class | Efficiency d | Kappa f | MCC g | BA e | Sensitivity h | Specificity h | Percentage Both Class i | Percentage Empty Class j |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LS fingerprints | internal | 656 | 0.15 | 0.851 | 0.849 | 0.695 | 0.537 | 0.543 | 0.784 | 0.786 | 0.782 | 30.5 | 0 |
| LS fingerprints | external | 280 | 0.15 | 0.857 | 0.871 | 0.718 | 0.581 | 0.589 | 0.809 | 0.804 | 0.814 | 28.2 | 0 |
| LS fingerprints | internal | 656 | 0.20 | 0.803 | 0.797 | 0.824 | 0.485 | 0.493 | 0.759 | 0.764 | 0.753 | 17.5 | 0.2 |
| LS fingerprints | external | 280 | 0.20 | 0.794 | 0.818 | 0.861 | 0.508 | 0.518 | 0.773 | 0.763 | 0.782 | 13.9 | 0 |
| LS fingerprints | internal | 656 | 0.25 | 0.753 | 0.746 | 0.914 | 0.452 | 0.460 | 0.742 | 0.746 | 0.738 | 7.1 | 1.4 |
| LS fingerprints | external | 280 | 0.25 | 0.748 | 0.766 | 0.952 | 0.467 | 0.476 | 0.751 | 0.742 | 0.759 | 4.2 | 0.6 |
| LS fingerprints | internal | 656 | 0.30 | 0.707 | 0.697 | 0.927 | 0.449 | 0.457 | 0.741 | 0.742 | 0.740 | 1.6 | 5.7 |
| LS fingerprints | external | 280 | 0.30 | 0.696 | 0.717 | 0.931 | 0.478 | 0.490 | 0.759 | 0.741 | 0.776 | 0.2 | 6.7 |
| LS PAA features | internal | 656 | 0.15 | 0.853 | 0.851 | 0.793 | 0.596 | 0.601 | 0.813 | 0.815 | 0.811 | 20.7 | 0 |
| LS PAA features | external | 280 | 0.15 | 0.857 | 0.855 | 0.826 | 0.625 | 0.630 | 0.826 | 0.826 | 0.826 | 17.4 | 0.0 |
| LS PAA features | internal | 656 | 0.20 | 0.802 | 0.795 | 0.908 | 0.541 | 0.548 | 0.786 | 0.790 | 0.782 | 8.5 | 0.7 |
| LS PAA features | external | 280 | 0.20 | 0.810 | 0.802 | 0.937 | 0.567 | 0.573 | 0.798 | 0.802 | 0.794 | 5.9 | 0.4 |
| LS PAA features | internal | 656 | 0.25 | 0.753 | 0.746 | 0.930 | 0.532 | 0.539 | 0.782 | 0.783 | 0.781 | 2.3 | 4.7 |
| LS PAA features | external | 280 | 0.25 | 0.758 | 0.748 | 0.945 | 0.559 | 0.565 | 0.794 | 0.798 | 0.791 | 0.3 | 5.2 |
| LS PAA features | internal | 656 | 0.30 | 0.703 | 0.697 | 0.879 | 0.556 | 0.563 | 0.794 | 0.795 | 0.794 | 0.3 | 11.9 |
| LS PAA features | external | 280 | 0.30 | 0.706 | 0.705 | 0.862 | 0.608 | 0.613 | 0.818 | 0.820 | 0.816 | 0.0 | 13.8 |
a Leadscope descriptors (see Section 4.2.1 and Section 4.2.2); b internal and external test sets (see Section Model validation for a description); c significance level (fraction of acceptable errors, e.g., 0.15 corresponds to 15%); d fraction of single-class (mutagenic or nonmutagenic) predictions; e balanced accuracy; f Cohen's kappa; g Matthews correlation coefficient; h sensitivity and specificity are calculated using only compounds classified with a single label; i percentage of compounds predicted as both; j percentage of compounds predicted as empty. Abbreviations: LS, general Leadscope; PAA, primary aromatic amine.
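The table quantities can be illustrated with a minimal sketch that derives them from a list of prediction sets. The function name `summarize`, the dictionary keys, and the ten-compound toy data are assumptions made for illustration and are not taken from the study. The sketch uses the usual conformal convention that a prediction is valid when the set contains the true label, so a both prediction counts as correct and an empty prediction as an error.

```python
def summarize(true_labels, prediction_sets,
              positive="mutagenic", negative="nonmutagenic"):
    """Summarise two-class conformal prediction sets (illustrative helper)."""
    pairs = list(zip(true_labels, prediction_sets))
    n = len(pairs)

    def validity(label):
        # Valid when the prediction set contains the true label:
        # "both" is always valid, "empty" never is.
        members = [s for t, s in pairs if t == label]
        return sum(label in s for s in members) / len(members)

    # Single-label predictions only, as (true label, predicted label).
    singles = [(t, next(iter(s))) for t, s in pairs if len(s) == 1]

    def recall(label):
        # Share of single-label predictions for true class `label` that are right.
        hits = [p == label for t, p in singles if t == label]
        return sum(hits) / len(hits)

    sensitivity, specificity = recall(positive), recall(negative)
    return {
        f"validity_{positive}": validity(positive),
        f"validity_{negative}": validity(negative),
        "efficiency": len(singles) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "percent_both": 100 * sum(len(s) == 2 for _, s in pairs) / n,
        "percent_empty": 100 * sum(len(s) == 0 for _, s in pairs) / n,
    }


# Toy data (hypothetical): five mutagenic and five nonmutagenic compounds.
M, N = "mutagenic", "nonmutagenic"
true_labels = [M] * 5 + [N] * 5
prediction_sets = [{M}, {M}, {M, N}, {N}, {M}, {N}, {N}, {M, N}, {M}, set()]
summary = summarize(true_labels, prediction_sets)
```

In this toy case seven of ten compounds receive a single label, so efficiency is 0.7, with 20% both and 10% empty predictions; sensitivity and specificity are computed on those seven compounds only.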
Figure 1. Efficiency (single class predictions) for the models: (a) based on Leadscope fingerprints; (b) based on Leadscope PAA features. Black bars, internal test sets; grey bars, external test sets.
Figure 2. Calculation of conformal prediction p-values and class assignments. Abbreviations: cmpd, compound; Prob, probability; Calibr, calibration.
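The calculation named in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it assumes a class-conditional (Mondrian) setup in which the conformity score is the predicted probability for a class, and the function names, the calibration probabilities, and the test compound's probabilities are all hypothetical.

```python
def mondrian_p_value(calibration_scores, test_score):
    """p-value for one class: share of calibration scores from that class
    that are no larger than the test compound's score (with +1 smoothing)."""
    not_greater = sum(1 for s in calibration_scores if s <= test_score)
    return (not_greater + 1) / (len(calibration_scores) + 1)


def assign_classes(p_values, significance):
    """Keep every class whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


def describe(prediction_set):
    """Name the outcome: a single class, 'both', or 'empty'."""
    if len(prediction_set) == 1:
        return next(iter(prediction_set))
    return "both" if prediction_set else "empty"


# Hypothetical calibration probabilities, grouped by the true class.
calibration = {
    "mutagenic": [0.35, 0.55, 0.60, 0.72, 0.80, 0.91, 0.95, 0.66, 0.48],
    "nonmutagenic": [0.30, 0.45, 0.52, 0.58, 0.64, 0.70, 0.77, 0.85, 0.93],
}
# Hypothetical predicted probabilities for one test compound.
test_probabilities = {"mutagenic": 0.62, "nonmutagenic": 0.38}

p_values = {
    label: mondrian_p_value(calibration[label], test_probabilities[label])
    for label in calibration
}
outcomes = {eps: describe(assign_classes(p_values, eps)) for eps in (0.15, 0.25, 0.55)}
```

For this compound the p-values are 0.5 (mutagenic) and 0.2 (nonmutagenic), so the outcome is both at a significance level of 0.15, mutagenic at 0.25, and empty at 0.55. This mirrors the trend in the table, where both predictions fade and empty predictions appear as the significance level rises.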
Figure 3. The fivefold cross-validation procedure using conformal prediction (cross-conformal prediction).
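A rough sketch of a fivefold cross-conformal procedure is given below. Several parts are assumptions not confirmed by this record: the folds are stratified by class, the five per-fold p-values are combined with the median, and a toy one-dimensional centroid scorer (`fit_centroids`) stands in for the random forest model. All function names and data are hypothetical.

```python
import random
from statistics import median


def p_value(calibration_scores, test_score):
    """Share of calibration scores no larger than the test score (+1 smoothing)."""
    not_greater = sum(1 for s in calibration_scores if s <= test_score)
    return (not_greater + 1) / (len(calibration_scores) + 1)


def fit_centroids(rows):
    """Toy stand-in for the model: conformity is the negative distance
    from a compound's descriptor value to the class centroid."""
    centroids = {}
    for label in {y for _, y in rows}:
        values = [x for x, y in rows if y == label]
        centroids[label] = sum(values) / len(values)
    return lambda x, label: -abs(x - centroids[label])


def stratified_folds(rows, n_folds, seed):
    """Shuffle each class separately and deal its compounds across the folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted({y for _, y in rows}):
        members = [row for row in rows if row[1] == label]
        rng.shuffle(members)
        for i, row in enumerate(members):
            folds[i % n_folds].append(row)
    return folds


def cross_conformal_p_values(rows, test_x, fit=fit_centroids, n_folds=5, seed=0):
    """Each fold serves once as calibration set; the model is fitted on the rest."""
    labels = sorted({y for _, y in rows})
    folds = stratified_folds(rows, n_folds, seed)
    per_fold = {label: [] for label in labels}
    for k, calibration in enumerate(folds):
        proper_training = [row for j, fold in enumerate(folds) if j != k for row in fold]
        score = fit(proper_training)
        for label in labels:
            cal_scores = [score(x, y) for x, y in calibration if y == label]
            per_fold[label].append(p_value(cal_scores, score(test_x, label)))
    # Assumed aggregation: median of the per-fold p-values.
    return {label: median(values) for label, values in per_fold.items()}


# Hypothetical 1-D data: mutagenic values near 2.0, nonmutagenic values near 0.0.
rows = [(1.1 + 0.2 * i, "mutagenic") for i in range(10)]
rows += [(-0.9 + 0.2 * i, "nonmutagenic") for i in range(10)]
p = cross_conformal_p_values(rows, test_x=2.0)
```

A test compound at 2.0 lies in the middle of the mutagenic cluster, so its aggregated mutagenic p-value is high while the nonmutagenic p-value stays at the minimum attainable with two calibration compounds per class and fold (1/3).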