Alfred W Mayhew¹, David O Topping², Jacqueline F Hamilton¹.
Abstract
Electrospray ionization (ESI) is widely used as an ionization source for the analysis of complex mixtures by mass spectrometry. However, compounds ionize with very different efficiencies in the ESI source, so instrument responses can vary by orders of magnitude, often in ways that are hard to predict. This precludes the use of ESI for quantitative analysis when authentic standards are not available. Relative ionization efficiency (RIE) scales have been proposed as a route to predicting the response of compounds in ESI. In this work, an RIE scale was constructed for 51 carboxylic acids spanning a wide range of additional functionalities and used to build a model for predicting the RIEs of unknown compounds. Although the data set contains a limited number of compounds, we explore the usefulness of building a predictor with popular supervised regression techniques, encoding each compound as a combination of structural features using a range of common "fingerprints". Bayesian ridge regression, with compounds encoded using features designed for activity coefficient models, gave the best predictive model, with an R² score of 0.62 and a root-mean-square error (RMSE) of 0.362. These scores are comparable to those obtained in previous studies, but the approach does not require the physical properties of the compounds to be measured or predicted first, potentially reducing the time required to make predictions.
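The pipeline described above (encode each compound as a vector of structural-feature counts, then fit a Bayesian ridge regressor) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the group vocabulary `GROUPS` ("COOH", "CH2", "OH") is hypothetical and stands in for the real Nannoolal, UNIFAC or AIOMFAC group definitions, and the regressor follows the common evidence-maximization scheme (as used by scikit-learn's `BayesianRidge`, minus its very weak Gamma hyperpriors). In that scheme the noise precision `alpha` and the weight-prior precision `lam` are re-estimated on every iteration. The names `encode`, `fit_bayesian_ridge` and `predict` are illustrative only.

```python
# Minimal sketch: count-based fingerprint + Bayesian ridge regression (evidence maximization).
# HYPOTHETICAL group vocabulary; real fingerprints (Nannoolal, UNIFAC, AIOMFAC) define their own groups.
GROUPS = ["COOH", "CH2", "OH"]


def encode(group_counts):
    """Map a {group: count} dict to a fixed-order feature vector (hypothetical fingerprint)."""
    return [float(group_counts.get(g, 0)) for g in GROUPS]


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_bayesian_ridge(X, y, n_iter=300, tol=1e-8):
    """Return (coef, intercept); alpha and lam are re-estimated each iteration."""
    n, d = len(X), len(X[0])
    # Centre features and target so the intercept is handled separately.
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yc = [v - y_mean for v in y]
    xtx = [[sum(r[i] * r[j] for r in Xc) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    alpha = 1.0 / (sum(t * t for t in yc) / n + 1e-12)  # noise precision
    lam = 1.0  # precision of the Gaussian prior on the weights
    coef = [0.0] * d
    for _ in range(n_iter):
        # Posterior covariance (sigma) and posterior mean (new_coef) of the weights.
        sigma = invert([[alpha * xtx[i][j] + (lam if i == j else 0.0) for j in range(d)]
                        for i in range(d)])
        new_coef = [alpha * sum(sigma[i][j] * xty[j] for j in range(d)) for i in range(d)]
        # Effective number of well-determined parameters: d - lam * trace(sigma).
        gamma = d - lam * sum(sigma[i][i] for i in range(d))
        sse = sum((t - sum(a * w for a, w in zip(r, new_coef))) ** 2 for r, t in zip(Xc, yc))
        # Evidence-maximization updates for both precisions.
        lam = gamma / (sum(w * w for w in new_coef) + 1e-12)
        alpha = (n - gamma) / (sse + 1e-12)
        converged = sum(abs(a - b) for a, b in zip(coef, new_coef)) < tol
        coef = new_coef
        if converged:
            break
    intercept = y_mean - sum(w * m for w, m in zip(coef, x_mean))
    return coef, intercept


def predict(coef, intercept, x):
    """Predicted target (e.g. log RIE) for one encoded compound."""
    return intercept + sum(w * v for w, v in zip(coef, x))
```

Because `lam` is learned from the data rather than fixed, the amount of shrinkage adapts to the data set. That is one plausible reason a Bayesian ridge model can stay stable on a small set of 51 compounds where ordinary least squares overfits, although the record itself does not state the cause.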
Year: 2020; PMID: 32363303; PMCID: PMC7191837; DOI: 10.1021/acsomega.0c00732
Source DB: PubMed; Journal: ACS Omega; ISSN: 2470-1343
Figure 1. Scale of measured log relative ionization efficiencies (log RIEs), relative to benzoic acid. A table of log RIEs can be found in the Supporting Information, Table S1.
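A scale like the one in Figure 1 can be sketched as follows, assuming that the RIE of a compound is its instrument response per unit concentration divided by that of benzoic acid measured under the same conditions, and that "log" means log10. Neither assumption is spelled out in this record. The function names and the example compounds and numbers are hypothetical.

```python
import math


def log_rie(signal, conc, ref_signal, ref_conc):
    """log10 of (signal per unit concentration) relative to the reference compound (assumed definition)."""
    return math.log10((signal / conc) / (ref_signal / ref_conc))


def rie_scale(measurements, ref_name):
    """measurements: {name: (signal, concentration)}.
    Returns (name, log RIE) pairs sorted from most to least efficiently ionized."""
    ref_signal, ref_conc = measurements[ref_name]
    scale = [(name, log_rie(s, c, ref_signal, ref_conc)) for name, (s, c) in measurements.items()]
    return sorted(scale, key=lambda item: item[1], reverse=True)


# Hypothetical measurements; the reference compound always sits at log RIE = 0.
data = {"benzoic acid": (100.0, 1.0), "acid A": (1000.0, 1.0), "acid B": (5.0, 0.5)}
for name, value in rie_scale(data, "benzoic acid"):
    print(f"{name}: {value:+.2f}")
```

On this scale a compound that gives ten times the reference response per unit concentration sits at +1, and one giving a tenth of it sits at −1.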
R² Scores for a Selection of Fingerprint-Model Combinations for the log RIE Predictionsᵃ
| model | composition | Nannoolal primary | Nannoolal secondary | Le Bas | UNIFAC | AIOMFAC |
|---|---|---|---|---|---|---|
| linear regression | –0.046 | –2.8 × 10²² | –0.19 | –0.068 | –3.2 × 10²³ | –8.3 × 10²² |
| Bayesian ridge | –0.090 | 0.42 | –0.078 | –0.026 | 0.60 | 0.62 |
| decision tree | –0.21 | 0.32 | –0.37 | 0.44 | 0.30 | 0.27 |
| MLP | –0.61 | –0.38 | –0.033 | –0.37 | –0.080 | –0.30 |
| passive aggressive | –3.5 | 0.16 | –1.3 | –1.6 | 0.090 | –0.18 |
| random forest | 0.37 | 0.12 | –0.073 | 0.43 | 0.069 | 0.16 |
| SGD | –0.066 | –0.0025 | –0.085 | –0.16 | 0.030 | 0.038 |
| SVR | –43 | 0.078 | –0.56 | –92 | –0.51 | –0.41 |
ᵃNote that R² is calculated using the article's coefficient-of-determination equation; hence, R² can be negative, indicating a prediction worse than simply using the average log RIE value.
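The note above implies the usual definition R² = 1 − SS_res/SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the measured log RIEs from their mean. The sketch below assumes that standard definition, since the equation itself is not reproduced in this record, and pairs it with the RMSE quoted in the abstract. Predicting the mean for every compound scores exactly 0, and any model whose squared errors exceed the spread of the data scores below 0.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (negative if worse than predicting the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target (here log RIE)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y = [0.0, 1.0, 2.0]
print(r2_score(y, [1.0, 1.0, 1.0]))  # 0.0  (always predicting the mean)
print(r2_score(y, [2.0, 1.0, 0.0]))  # -3.0 (worse than the mean)
```

Because SS_res is unbounded, R² has no lower limit. This is how values of order −10²² can appear in the linear-regression row when the fitted coefficients blow up.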
Figure 2. Predictions produced by Bayesian ridge regression with compounds represented as AIOMFAC fingerprints. The solid black line is the 1:1 line (perfect predictions would lie along it). The dotted black lines lie 2 × RMSE from the 1:1 line.
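The ±2 × RMSE band in Figure 2 can be quantified by counting how many predictions fall between the dotted lines. This sketch assumes the band is measured vertically, as the difference between predicted and measured log RIE, which matches lines drawn at y = x ± 2 × RMSE. The function name `fraction_within_band` and the example data are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def fraction_within_band(measured, predicted, k=2.0):
    """Fraction of points whose vertical distance from the 1:1 line is at most k * RMSE."""
    band = k * rmse(measured, predicted)
    inside = sum(1 for m, p in zip(measured, predicted) if abs(p - m) <= band)
    return inside / len(measured)


# Hypothetical data: eight exact predictions and one off by 3 give RMSE = 1,
# so the band is +/-2 and only the outlier falls outside it.
measured = [float(i) for i in range(9)]
predicted = measured[:8] + [11.0]
print(fraction_within_band(measured, predicted))
```

Since the band width is itself derived from the errors, a single large outlier widens it. A band based on RMSE therefore says more about typical scatter than about worst-case error.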