Roman Szucs, Roland Brown, Claudio Brunelli, James C Heaton, Jasna Hradski.
Abstract
Pharmaceutical drug development relies heavily on Reversed-Phase Liquid Chromatography methods. These methods are used to characterize active pharmaceutical ingredients and drug products by separating the main component from related substances such as process-related impurities or degradation products of the main component. The results presented here indicate that retention models based on Quantitative Structure Retention Relationships can be used to de-risk methods used in pharmaceutical analysis and to identify optimal conditions for separating known sample constituents from postulated or hypothetical components. Predicting retention times for hypothetical components in established methods is highly valuable because these compounds are not usually readily available for analysis. Here we discuss the development and optimization of retention models, the selection of the most relevant structural molecular descriptors, and regression model building and validation. We also present a practical example applied to chromatographic method development and discuss how the accuracy of these models affects the selection of optimal separation parameters.
Keywords: Quantitative Structure Retention Relationships; chromatographic method development; pharmaceutical analysis
Year: 2021 PMID: 33917733 PMCID: PMC8068189 DOI: 10.3390/ijms22083848
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Pairwise structural similarities expressed as Tanimoto index. See text for details.
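As a rough illustration of the quantity plotted in Figure 1, the Tanimoto index of two fingerprints is the number of shared on-bits divided by the number of bits set in either. The sketch below assumes fingerprints are given as sets of on-bit positions; the function names and example fingerprints are hypothetical and are not taken from the paper, which does not state here which fingerprint type was used.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto index of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pairwise_similarity(fingerprints):
    """Return the full symmetric matrix of pairwise Tanimoto indices."""
    n = len(fingerprints)
    return [
        [tanimoto(fingerprints[i], fingerprints[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical fingerprints for three compounds (illustrative only).
fingerprints = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3}]
matrix = pairwise_similarity(fingerprints)
```

Here the first two compounds share two of five distinct on-bits, so their index is 0.4, while the third shares none with the first and scores 0.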
List of the 30 Most Frequently Selected Descriptors by Evolutionary Search.
| Descriptor | Description | Descriptor | Description |
|---|---|---|---|
| CATS2D_03_DL | CATS2D Donor-Lipophilic at lag 03 | LOGP_N-oct | Log Octanol/water |
| CATS2D_09_DA | CATS2D Donor-Acceptor at lag 09 | CATS2D_09_NL | CATS2D Negative-Lipophilic at lag 09 |
| F03[C-O] | Frequency of C-O at topological distance 3 | GATS5s | Geary autocorrelation of lag 5 weighted by I-state |
| GATS6e | Geary autocorrelation of lag 6 weighted by Sanderson electronegativity | GATS6m | Geary autocorrelation of lag 6 weighted by mass |
| GATS7m | Geary autocorrelation of lag 7 weighted by mass | HATS4e | leverage-weighted autocorrelation of lag 4/weighted by Sanderson electronegativity |
| HATS5s | leverage-weighted autocorrelation of lag 5/weighted by I-state | Mor10p | signal 10/weighted by polarizability |
| AMW | average molecular weight | BLTA96 | Verhaar Algae base-line toxicity from MLOGP (mmol/L) |
| Mor24p | signal 24/weighted by polarizability | N-075 | R--N--R/R--N--X |
| nArCOOR | number of esters (aromatic) | NNRS | normalized number of ring systems |
| TDB07m | 3D Topological distance-based descriptors: lag 7 weighted by mass | TDB08s | 3D Topological distance-based descriptors: lag 8 weighted by I-state |
| a_acc | Number of hydrogen bond acceptor atoms | logS | Log of the aqueous solubility |
| PEOE_VSA_NEG | Total negative van der Waals surface area | PEOE_VSA+0 | Sum of vi where qi is in the range of 0.00–0.05 |
| SMR_VSA7 | Sum of vi such that Ri > 0.56 | ACACDO | H-bond acceptor and donor |
| L0LgS | Solubility profiling coefficient | L2LgS | Solubility profiling coefficient |
| pctFU4 | Percent unionized species at pH 4 | pctFU6 | Percent unionized species at pH 6 |
Regression Algorithms and their settings.
| Algorithm | Settings |
|---|---|
| Support Vector Machine | Normalized training data |
| Gaussian Processes | Without hyperparameter tuning |
| Multiple Linear Regression | M5 attribute selection method |
| Random Forest | WEKA default settings |
| Partial Least Squares (PLS) | Optimal Number of PLS factors determined |
Figure 2. Comparison of Root Mean Square Error (RMSE) (a) and Correlation Coefficient (R) (b) for all calculated descriptors, their combinations, and all regression algorithms. Each bar corresponds to the average value for all training sets. Figure 2a also contains the average value for all applied algorithms. SVM: support vector machine; GPR: Gaussian processes regression; MLR: multiple linear regression; RF: random forest; PLS: partial least squares.
Figure 3. Predicted vs experimental retention times (tR) for six screening conditions. See Table 4 for the details of experiments.
RMSE and R values for test sets at six screening conditions. See Table 4 for the experiment details.
| | Experiment #1 | Experiment #2 | Experiment #3 | Experiment #4 | Experiment #5 | Experiment #6 |
|---|---|---|---|---|---|---|
| RMSE | 0.4262 | 0.9981 | 0.3472 | 1.0133 | 0.4091 | 0.8401 |
| R | 0.9769 | 0.9763 | 0.9851 | 0.9792 | 0.9799 | 0.9874 |
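For readers who want to reproduce this kind of summary, the two statistics in the table can be computed from paired predicted and experimental retention times as sketched below. This is a minimal sketch that assumes R is the Pearson correlation coefficient, which the table itself does not state; the function names and the retention times are hypothetical and are not data from the paper.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(predicted, observed):
    """Pearson correlation coefficient (assumed meaning of R here)."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical retention times in minutes (illustrative only).
experimental_tr = [2.0, 4.0, 6.0, 8.0]
predicted_tr = [2.5, 3.5, 6.5, 7.5]

error = rmse(predicted_tr, experimental_tr)
correlation = pearson_r(predicted_tr, experimental_tr)
```

Reporting both values is useful because a high R can coexist with a sizeable RMSE, as the table shows for experiments 2 and 4.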
Screening sequence used to optimize the column temperature and gradient elution. See Materials and Methods for other conditions.
| Experiment | Column Temperature (°C) | Gradient Profile a |
|---|---|---|
| 1 | 20 | Time = 0 min, %B = 5%; |
| 2 | 20 | Time = 0 min, %B = 5%; |
| 3 | 40 | Time = 0 min, %B = 5%; |
| 4 | 40 | Time = 0 min, %B = 5%; |
| 5 | 60 | Time = 0 min, %B = 5%; |
| 6 | 60 | Time = 0 min, %B = 5%; |
a Followed by 4 min equilibration.
Figure 4. Resolution heat map for the key predictive sample set (KPSS). Intensity represents overall chromatogram resolution. High resolution is depicted in red and low resolution in blue. (a) Constructed from experimental retention times. (b) Constructed from Quantitative Structure Retention Relationship (QSRR) predicted retention times. The diamond indicates the center point selected from the model created from experimental retention times.
Figure 5. Predicted chromatogram for KPSS components from the retention model built from experimentally determined retention times (RtModelEXP) (solid line) and the retention model built from QSRR predicted retention times (RtModelQSRR) (dashed line). Column temperature 40 °C. Gradient profile: Time = 0 min, %B = 15%; Time = 12 min, %B = 45%; Time = 17 min, %B = 95%. See Materials and Methods for other details.
Figure 6. Portion (%) of all combinations of compounds containing two to ten components for which RtModelEXP and RtModelQSRR predicted baseline separation (Resolution Coefficient (RC) = 1). The total number of combinations evaluated is in parentheses. The black line corresponds to the model built from predicted data and the red line to the model built from a mixture of predicted and experimental data. See text for details.
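The enumeration behind Figure 6 can be sketched as follows: for each subset size, every combination of compounds is checked for baseline separation and the percentage of passing combinations is reported. The paper's definition of RC is not reproduced in this section, so the sketch substitutes a simple stand-in criterion, a minimum retention-time gap between adjacent peaks; that criterion, the `min_gap` parameter, the function names, and the retention times are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def is_baseline_separated(times, min_gap):
    """Stand-in criterion (assumption): every adjacent pair of peaks
    is at least `min_gap` apart in retention time."""
    ordered = sorted(times)
    return all(b - a >= min_gap for a, b in zip(ordered, ordered[1:]))


def separated_portion(retention_times, k, min_gap):
    """Return (percentage of k-compound combinations that pass, total count)."""
    total = comb(len(retention_times), k)
    hits = sum(
        is_baseline_separated(subset, min_gap)
        for subset in combinations(retention_times, k)
    )
    return 100.0 * hits / total, total


# Hypothetical retention times in minutes (illustrative only).
times = [1.0, 1.2, 3.0, 5.0]
portion, total = separated_portion(times, 2, 0.5)
```

The number of combinations grows quickly with subset size, which is why the figure reports the total evaluated for each size alongside the percentage.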
Figure 7. Portion (%) of pairwise RC values calculated from RtModelQSRR falling within a certain interval of the RC values calculated from RtModelEXP. See text for details.