Heidi E Steiner, Jason B Giles, Hayley Knight Patterson, Jianglin Feng, Nihal El Rouby, Karla Claudio, Leiliane Rodrigues Marcatto, Leticia Camargo Tavares, Jubby Marcela Galvez, Carlos-Alberto Calderon-Ospina, Xiaoxiao Sun, Mara H Hutz, Stuart A Scott, Larisa H Cavallari, Dora Janeth Fonseca-Mendoza, Jorge Duconge, Mariana Rodrigues Botton, Paulo Caleb Junior Lima Santos, Jason H Karnes.
Abstract
Populations used to create warfarin dose prediction algorithms largely lacked participants reporting Hispanic or Latino ethnicity. While previous research suggests nonlinear modeling improves warfarin dose prediction, this research has mainly focused on populations with primarily European ancestry. We compare the accuracy of stable warfarin dose prediction using linear and nonlinear machine learning models in a large cohort enriched for US Latinos and Latin Americans (ULLA). Each model was tested using the same variables as published by the International Warfarin Pharmacogenetics Consortium (IWPC) and using an expanded set of variables including ethnicity and warfarin indication. We used a multiple linear regression model and three nonlinear regression models: Bayesian Additive Regression Trees, Multivariate Adaptive Regression Splines, and Support Vector Regression. We compared each model's ability to predict stable warfarin dose within 20% of actual stable dose, confirming trained models in a 30% testing dataset with 100 rounds of resampling. In all patients (n = 7,030), inclusion of additional predictor variables led to a small but significant improvement in prediction of dose relative to the IWPC algorithm (47.8% versus 46.7% in IWPC, p = 1.43 × 10⁻¹⁵). Nonlinear models using IWPC variables did not significantly improve prediction of dose over the linear IWPC algorithm. In ULLA patients alone (n = 1,734), IWPC performed similarly to all other linear and nonlinear pharmacogenetic algorithms. Our results reinforce the validity of IWPC in a large, ethnically diverse population and suggest that additional variables that capture warfarin dose variability may improve warfarin dose prediction algorithms.
Keywords: Hispanic; Latino; anticoagulant; machine learning; pharmacogenetics; warfarin
Year: 2021 PMID: 34776967 PMCID: PMC8585774 DOI: 10.3389/fphar.2021.749786
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
FIGURE 1. Dose Prediction Algorithm Creation and Testing. International Warfarin Pharmacogenetics Consortium (IWPC) data and data from US Latinos and Latin Americans (ULLA) were used for prediction independently and merged to test a combined sample. Linear and nonlinear models were fit with IWPC model variables and a set of extended variables in addition to IWPC predictors after a 70/30 training-testing split. All models were assessed for their ability to predict dose within 20% of actual. 100 replicates were performed from data splitting to model assessment.
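The workflow in the Figure 1 caption (70/30 split, model fit, scoring within 20% of actual dose, 100 replicates) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the inclusive 20% boundary, the fixed random seed, and the placeholder `fit_median` model are all hypothetical, and the rows passed in would be each patient's predictors and stable weekly dose.

```python
import random
import statistics


def pct_within_20(actual, predicted):
    """Percentage of predictions within 20% of the actual dose (boundary inclusive: an assumption)."""
    hits = sum(abs(p - a) <= 0.20 * a for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


def mean_abs_error(actual, predicted):
    """Mean absolute error between predicted and actual dose."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def evaluate(rows, fit, n_replicates=100, test_fraction=0.30, seed=0):
    """Repeat split -> fit -> score; return per-replicate % within 20% and MAE.

    rows: list of (features, dose) pairs.
    fit:  callable taking training rows and returning a predict(features) function.
    """
    rng = random.Random(seed)
    pct_scores, mae_scores = [], []
    for _ in range(n_replicates):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        n_test = round(test_fraction * len(shuffled))
        test, train = shuffled[:n_test], shuffled[n_test:]
        predict = fit(train)
        actual = [dose for _, dose in test]
        predicted = [predict(features) for features, _ in test]
        pct_scores.append(pct_within_20(actual, predicted))
        mae_scores.append(mean_abs_error(actual, predicted))
    return pct_scores, mae_scores


def fit_median(train_rows):
    """Placeholder model (hypothetical): always predicts the training-set median dose."""
    med = statistics.median(dose for _, dose in train_rows)
    return lambda features: med
```

Any model with the same `fit` interface could be swapped in for `fit_median`; the per-replicate scores can then be summarized, for example with `statistics.median(pct_scores)`.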
Subject Characteristics in IWPC and ULLA cohorts.
| Characteristic | IWPC | ULLA (n = 1,734) | p value |
|---|---|---|---|
| Age, years (mean (SD)) | 59.8 (14.5) | 59.7 (13.8) | 0.917 |
| Height, cm (median [IQR]) | 166.88 [160.02–176.02] | 166.00 [160.00–172.72] | <0.001 |
| Weight, kg (median [IQR]) | 75.40 [62.27–89.70] | 75.00 [65.00–85.00] | 0.476 |
| Weekly Warfarin Dose, mg (median [IQR]) | 28.00 [19.25–38.50] | 30.00 [22.50–37.50] | <0.001 |
| CYP2C9 diplotype (n [%]) | | | <0.001 |
| *1/*1 | 3717 (73.6) | 1384 (79.8) | |
| *1/*2 | 650 (12.9) | 198 (11.4) | |
| *1/*3 | 450 (8.9) | 83 (4.8) | |
| *2/*2 | 46 (0.9) | 24 (1.4) | |
| *2/*3 | 62 (1.2) | 13 (0.7) | |
| *3/*3 | 16 (0.3) | 4 (0.2) | |
| Missing | 108 (2.1) | 28 (1.6) | |
| VKORC1 −1639 G>A genotype (n [%]) | | | <0.001 |
| GG | 1503 (29.8) | 729 (42.0) | |
| AG | 1806 (35.8) | 778 (44.9) | |
| AA | 1639 (32.5) | 217 (12.5) | |
| Missing | 101 (2.0) | 10 (0.6) | |
| Race (n [%]) | | | <0.001 |
| White | 2794 (55.3) | 1153 (66.5) | |
| Asian | 1527 (30.2) | 0 (0.0) | |
| Black or African American | 451 (8.9) | 292 (16.8) | |
| Mixed or Missing | 277 (5.5) | 289 (16.7) | |
| Ethnicity (n [%]) | | | <0.001 |
| Hispanic or Latino | 35 (0.7) | 1734 (100.0) | |
| Not Hispanic or Latino | 4139 (82.0) | 0 (0.0) | |
| Unknown | 875 (17.3) | 0 (0.0) | |
IWPC indicates International Warfarin Pharmacogenetics Consortium cohort; ULLA, US Latino and Latin American cohort; SD, standard deviation; IQR, interquartile range; cm, centimeters; kg, kilograms; mg, milligrams.
p values were calculated using a chi-square test for categorical variables, ANOVA for continuous variables, and the Wilcoxon rank sum test for non-normal continuous variables.
CYP2C9 alleles *5, *6, *13, *14 were collapsed into *3 and *11 to *2, consistent with Klein et al.
VKORC1 −1639 G>A (rs9923231); rs2359612, rs9934438, and rs8050894 were used as tagSNPs where rs9923231 was missing.
Native American race was collapsed into “Mixed or Missing.”
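As a rough illustration of the chi-square comparison named in the table notes, the sketch below computes a Pearson chi-square statistic from the VKORC1 genotype counts listed in the table. It assumes the Missing category is treated as its own level, which the source does not state. The function name is hypothetical, and the statistic is compared against a tabulated critical value instead of computing an exact p value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of cohort and genotype
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# VKORC1 genotype counts copied from the table: GG, AG, AA, Missing
vkorc1_counts = [
    [1503, 1806, 1639, 101],  # IWPC
    [729, 778, 217, 10],      # ULLA
]

stat, dof = chi_square_statistic(vkorc1_counts)

# Tabulated chi-square critical value for df = 3 at alpha = 0.001
CRITICAL_DF3_ALPHA_0_001 = 16.27
significant = stat > CRITICAL_DF3_ALPHA_0_001
```

With these counts the statistic is far above the critical value, which is consistent with the reported p < 0.001 for genotype distribution between cohorts.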
Comparison of Warfarin Dose Prediction Algorithms by Median Percentage Predicted within 20% of Actual and Mean Absolute Error (MAE) in the IWPC, ULLA, and Merged cohorts.
| Model | IWPC: within 20% (%) | IWPC: MAE (95% CI) | ULLA (n = 1,734): within 20% (%) | ULLA (n = 1,734): MAE (95% CI) | Merged (n = 7,030): within 20% (%) | Merged (n = 7,030): MAE (95% CI) |
|---|---|---|---|---|---|---|
| IWPC | 45.84 | 8.36 (7.89–8.85) | 47.88 | 8.12 (7.44–8.82) | 46.66 | 8.24 (7.89–8.58) |
| IWPCV | 45.87 | 8.41 (7.91–8.87) | 47.02 | 8.20 (7.52–8.90) | 46.61 | 8.24 (7.90–8.59) |
| IWPC SVR | 45.81 | 8.43 (7.93–8.90) | 46.54 | 8.25 (7.55–8.94) | 46.80 | 8.21 (7.86–8.56) |
| IWPC MARS | 45.575 | 8.44 (7.95–8.91) | 47.50 | 8.17 (7.49–8.88) | 46.56 | 8.27 (7.92–8.61) |
| IWPC BART | 45.45 | 8.45 (7.95–8.93) | 47.31 | 8.15 (7.46–8.84) | 46.28 | 8.25 (7.90–8.60) |
| NLM | 47.43 | 8.25 (7.77–8.74) | 47.79 | 8.11 (7.45–8.79) | 47.78 | 8.13 (7.78–8.47) |
| SVR | 47.33 | 8.29 (7.8–8.785) | 47.41 | 8.22 (7.52–8.93) | 47.61 | 8.11 (7.77–8.46) |
| MARS | 46.70 | 8.33 (7.85–8.81) | 47.31 | 8.20 (7.52–8.88) | 47.18 | 8.18 (7.84–8.53) |
| BART | 46.90 | 8.31 (7.84–8.79) | 46.92 | 8.16 (7.47–8.87) | 47.46 | 8.14 (7.79–8.48) |
IWPC indicates International Warfarin Pharmacogenetics Consortium cohort; ULLA, US Latino and Latin American cohort; Merged, ULLA plus IWPC; CI, Confidence Interval; IWPCV, IWPC variables; IWPC MARS, IWPC variables in a Multivariate Adaptive Regression Splines; IWPC SVR, IWPC variables in a Support Vector Regression; IWPC BART, IWPC variables in a Bayesian Additive Regression Trees; NLM, Novel Linear Model.
Estimates of mean absolute error (MAE) and the percentage of individuals predicted within 20% of their actual dose for each model were based on 100 replicates of resampling testing data.
The IWPC, IWPCV, IWPC SVR, IWPC MARS, and IWPC BART models feature the variables age, height, weight, CYP2C9 diplotype, VKORC1 genotype, race, amiodarone use, and enzyme inducer use.
The NLM, SVR, MARS, and BART models feature the same variables plus warfarin indication, ethnicity, statin use, aspirin use, and history of diabetes.
FIGURE 2. Comparison of Warfarin Dose Prediction Algorithms in the ULLA and Merged cohorts. Proportion of patients predicted within 20% of their actual dose is plotted in the (A) US Latinos and Latin Americans (ULLA) cohort and (B) Merged cohort containing both ULLA and IWPC cohorts. The boxplot visualizes five summary statistics (the median, the 25th and 75th percentiles, and two whiskers at 1.5 × the interquartile range). The points indicate the proportion of patients predicted within 20% at each of the 100 rounds of resampling. Models feature IWPC variables or IWPC variables in addition to new predictors. From left to right, the first five models (IWPC, IWPCV, IWPC_SVR, IWPC_MARS, and IWPC_BART) feature the clinical variables age, height, weight, race, enzyme inducer use, and amiodarone use and the genetic variables CYP2C9 diplotype and VKORC1 −1639 G>A genotype; the next four models (NLM, SVR, MARS, and BART) feature the additional variables gender, ethnicity, statin use, aspirin use, history of diabetes, and warfarin indication; the last model features only the clinical variables from the first set. IWPC indicates International Warfarin Pharmacogenetics Consortium model; Merged, IWPC cohort plus ULLA cohort; IWPCV, IWPC variables; IWPC MARS, IWPC variables in a Multivariate Adaptive Regression Splines; IWPC SVR, IWPC variables in a Support Vector Regression; IWPC BART, IWPC variables in a Bayesian Additive Regression Trees; NLM, Novel Linear Model; Clinical, the IWPC Clinical model.
FIGURE 3. Subgroup Comparisons of Warfarin Dose Prediction Algorithms in the ULLA cohort. Proportion of patients predicted within 20% of their actual dose in the US Latinos and Latin Americans (ULLA) cohort by (A) actual-dose group, (B) race group, and (C) country of enrollment. The boxplot visualizes five summary statistics (the median, the 25th and 75th percentiles, and two whiskers at 1.5 × the interquartile range). The points indicate the proportion of patients predicted within 20% at each of the 100 rounds of resampling. The horizontal line indicates the median percentage predicted within 20% across all participants. From left to right, the first five models (IWPC, IWPCV, IWPC_SVR, IWPC_MARS, and IWPC_BART) feature the clinical variables age, height, weight, race, enzyme inducer use, and amiodarone use and the genetic variables CYP2C9 diplotype and VKORC1 −1639 G>A genotype; the next four models (NLM, SVR, MARS, and BART) feature the additional variables gender, ethnicity, statin use, aspirin use, history of diabetes, and warfarin indication; the last model features only the clinical variables from the first set. IWPC indicates International Warfarin Pharmacogenetics Consortium model; IWPCV, IWPC variables; IWPC MARS, IWPC variables in a Multivariate Adaptive Regression Splines; IWPC SVR, IWPC variables in a Support Vector Regression; IWPC BART, IWPC variables in a Bayesian Additive Regression Trees; NLM, Novel Linear Model; Clinical, the IWPC Clinical model.
Model comparisons by race data in the ULLA cohort (n = 1,734).
| Model | White (n = 1,153): within 20% (%) | White (n = 1,153): MAE (95% CI) | Black (n = 292): within 20% (%) | Black (n = 292): MAE (95% CI) | Mixed or Missing (n = 289): within 20% (%) | Mixed or Missing (n = 289): MAE (95% CI) |
|---|---|---|---|---|---|---|
| IWPC | 50.26 | 7.86 (7.05–8.68) | 41.08 | 8.89 (7.36–10.42) | 45.85 | 8.55 (6.52–10.57) |
| IWPCV | 49.09 | 7.93 (7.1–8.76) | 41.19 | 8.86 (7.4–10.32) | 46.11 | 8.60 (6.56–10.64) |
| IWPC SVR | 48.78 | 7.93 (7.09–8.76) | 39.73 | 9.07 (7.59–10.55) | 45.19 | 8.65 (6.60–10.69) |
| IWPC MARS | 49.64 | 7.88 (7.05–8.71) | 40.48 | 8.99 (7.52–10.46) | 45.54 | 8.60 (6.56–10.64) |
| IWPC BART | 49.62 | 7.80 (6.99–8.62) | 39.04 | 9.10 (7.60–10.60) | 45.60 | 8.67 (6.57–10.77) |
| NLM | 49.21 | 7.83 (7.01–8.65) | 41.56 | 8.89 (7.43–10.35) | 47.50 | 8.49 (6.41–10.57) |
| SVR | 49.08 | 7.92 (7.09–8.75) | 41.37 | 9.07 (7.62–10.52) | 46.66 | 8.55 (6.47–10.63) |
| MARS | 49.48 | 7.87 (7.04–8.71) | 40.51 | 8.97 (7.50–10.44) | 45.37 | 8.66 (6.62–10.70) |
| BART | 49.05 | 7.81 (7.00–8.62) | 39.30 | 9.15 (7.67–10.63) | 46.12 | 8.67 (6.54–10.79) |
| CLINICAL | 39.30 | 9.74 (8.79–10.7) | 33.33 | 10.00 (8.47–11.52) | 35.77 | 10.39 (8.19–12.6) |
ULLA indicates US Latino and Latin American warfarin users cohort; CI, Confidence Interval; IWPC, International Warfarin Pharmacogenetics Consortium model; IWPCV, IWPC variables; IWPC MARS, IWPC variables in a Multivariate Adaptive Regression Splines; IWPC SVR, IWPC variables in a Support Vector Regression; IWPC BART, IWPC variables in a Bayesian Additive Regression Trees; NLM, Novel Linear Model.
Estimates of mean absolute error (MAE) and the percentage of individuals predicted within 20% of their actual dose for each model were based on 100 replicates of resampling 30% testing data.
The IWPC, IWPCV, IWPC SVR, IWPC MARS, and IWPC BART models feature the clinical variables age, height, weight, race, amiodarone use, and enzyme inducer use and the genetic variables CYP2C9 diplotype and VKORC1 genotype.
The NLM, SVR, MARS, and BART models feature the same variables plus warfarin indication, ethnicity, statin use, aspirin use, and history of diabetes.
The CLINICAL model features only the clinical variables (age, height, weight, race, amiodarone use, and enzyme inducer use).
Partial R² values, parameter estimates with standard errors, and p-values of the 50th replicate of models trained in the Merged cohort (n = 7,030).
| Model Variable | IWPC: partial R² | IWPC: estimate ± SE | IWPC: p value | IWPCV: partial R² | IWPCV: estimate ± SE | IWPCV: p value | NLM: partial R² | NLM: estimate ± SE | NLM: p value |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | - | 5.6 ± 0.27 | 1.11 × 10⁻⁹³ | - | 5.02 ± 0.27 | 2.87 × 10⁻⁷⁶ | - | 4.12 ± 0.33 | 4.92 × 10⁻³⁵ |
| Age | 0.12 | −0.26 ± 0.01 | 8.82 × 10⁻¹⁵¹ | 0.12 | −0.24 ± 0.01 | 3.24 × 10⁻¹³³ | 0.09 | −0.21 ± 0.01 | 6.65 × 10⁻⁸⁸ |
| Height | 0.02 | 0.01 ± 0 | 1.73 × 10⁻⁷ | 0.02 | 0.01 ± 0 | 5.17 × 10⁻¹² | 0.02 | 0.01 ± 0 | 1.42 × 10⁻¹⁴ |
| Weight | 0.04 | 0.01 ± 0 | 1.14 × 10⁻⁴¹ | 0.04 | 0.01 ± 0 | 3.27 × 10⁻³⁷ | 0.04 | 0.01 ± 0 | 1.71 × 10⁻³⁸ |
| CYP2C9 diplotype | 0.11 | - | - | 0.11 | - | - | 0.11 | - | - |
| *1/*2 | - | −0.52 ± 0.04 | 4.75 × 10⁻³³ | - | −0.41 ± 0.04 | 2.5 × 10⁻²¹ | - | −0.42 ± 0.04 | 9.25 × 10⁻²³ |
| *1/*3 | - | −0.94 ± 0.05 | 9.18 × 10⁻⁷² | - | −0.85 ± 0.05 | 2.02 × 10⁻⁵⁹ | - | −0.86 ± 0.05 | 1.16 × 10⁻⁶² |
| *2/*2 | - | −1.06 ± 0.14 | 1.49 × 10⁻¹³ | - | −0.97 ± 0.14 | 1.53 × 10⁻¹¹ | - | −0.96 ± 0.14 | 1.22 × 10⁻¹¹ |
| *2/*3 | - | −1.92 ± 0.13 | 1.63 × 10⁻⁴⁹ | - | −1.54 ± 0.13 | 9.94 × 10⁻³³ | - | −1.56 ± 0.13 | 2.76 × 10⁻³⁴ |
| *3/*3 | - | −2.33 ± 0.29 | 2.73 × 10⁻¹⁵ | - | −2.24 ± 0.29 | 3.46 × 10⁻¹⁴ | - | −2.24 ± 0.29 | 1.24 × 10⁻¹⁴ |
| Missing | - | −0.22 ± 0.1 | 0.0293 | - | −0.2 ± 0.1 | 0.047 | - | −0.23 ± 0.1 | 0.0214 |
| VKORC1 −1639 G>A genotype | 0.23 | - | - | 0.23 | - | - | 0.23 | - | - |
| A/G | - | −0.87 ± 0.03 | 1.77 × 10⁻¹³⁹ | - | −0.8 ± 0.03 | 1.2 × 10⁻¹¹⁹ | - | −0.79 ± 0.03 | 2.25 × 10⁻¹²¹ |
| A/A | - | −1.7 ± 0.04 | 2.03 × 10⁻²⁷⁹ | - | −1.62 ± 0.04 | 2.34 × 10⁻²⁵⁶ | - | −1.61 ± 0.04 | 3.87 × 10⁻²⁶⁰ |
| Missing | - | −0.49 ± 0.12 | 2.7 × 10⁻⁵ | - | −0.34 ± 0.12 | 0.00299 | - | −0.38 ± 0.11 | 0.00103 |
| Race | 0.01 | - | - | 0.01 | - | - | 0.02 | - | - |
| Asian | - | −0.11 ± 0.05 | 0.0231 | - | −0.1 ± 0.05 | 0.0423 | - | −0.16 ± 0.05 | 0.00263 |
| Black or African American | - | −0.28 ± 0.05 | 1.75 × 10⁻⁸ | - | −0.16 ± 0.05 | 0.000908 | - | −0.21 ± 0.05 | 2.51 × 10⁻⁵ |
| Mixed or Missing | - | −0.1 ± 0.05 | 0.0457 | - | −0.07 ± 0.05 | 0.152 | - | 0.02 ± 0.06 | 0.71 |
| Enzyme Inducer Use | 0.02 | 1.18 ± 0.13 | 8.37 × 10⁻²¹ | 0.02 | 0.85 ± 0.13 | 1.83 × 10⁻¹¹ | 0.02 | 0.78 ± 0.12 | 2.99 × 10⁻¹⁰ |
| Amiodarone Use | 0.04 | −0.55 ± 0.04 | 2.16 × 10⁻³⁷ | 0.04 | −0.54 ± 0.04 | 1.42 × 10⁻³⁶ | 0.03 | −0.45 ± 0.05 | 2.28 × 10⁻²¹ |
| Ethnicity | - | - | - | - | - | - | 0.01 | - | - |
| Hispanic/Latino | - | - | - | - | - | - | - | −0.07 ± 0.04 | 0.117 |
| Unknown | - | - | - | - | - | - | - | −0.18 ± 0.05 | 0.000133 |
| Gender (female) | - | - | - | - | - | - | 0.01 | 0.11 ± 0.03 | 0.000701 |
| Statin Use | - | - | - | - | - | - | 0.01 | −0.1 ± 0.04 | 0.00794 |
| Aspirin Use | - | - | - | - | - | - | 0.01 | −0.07 ± 0.04 | 0.117 |
| Indication | - | - | - | - | - | - | 0.03 | - | - |
| DVT/PE | - | - | - | - | - | - | - | 0.24 ± 0.05 | 1.5 × 10⁻⁷ |
| TIA | - | - | - | - | - | - | - | −0.03 ± 0.08 | 0.689 |
| Valve | - | - | - | - | - | - | - | 0.34 ± 0.04 | 7.67 × 10⁻¹⁶ |
| Other | - | - | - | - | - | - | - | −0.01 ± 0.04 | 0.873 |
| Diabetes | - | - | - | - | - | - | 0.01 | 0.18 ± 0.04 | 3.28 × 10⁻⁵ |
| Smoking status | - | - | - | - | - | - | 0.01 | 0.27 ± 0.05 | 4.24 × 10⁻⁷ |
| Total R² | 47.03 | | | 47.03 | | | 48.55 | | |
IWPC indicates International Warfarin Pharmacogenetics Consortium model; IWPCV, the same variables as IWPC in a new model; NLM, Novel Linear Model including the additional predictors: statin use, aspirin use, warfarin indication, ethnicity, history of diabetes; SE, Standard Error; DVT, Deep Vein Thrombosis; PE, Pulmonary Embolism; TIA, Transient Ischemic Attack; AFIB, Atrial Fibrillation.
p-values determined by the lm function in R.
CYP2C9 Diplotypes *5, *6, *13, *14 collapsed into *1/*3 and *11 to *1/*2.
VKORC1 −1639 G>A (rs9923231); rs2359612, rs9934438, and rs8050894 were used as proxies where rs9923231 was missing.
Native American race was collapsed into “Mixed or Missing”.
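To show how parameter estimates like those in the IWPC column combine into a dose prediction, here is a minimal sketch. It rests on assumptions this record does not state: that the outcome is the square root of the weekly dose in mg and that age enters in decades (as in the originally published IWPC algorithm), and that *1/*1, G/G, and White are the reference categories. The function and dictionary names are hypothetical, and because the table's estimates are rounded (for example, 0.01 for height and weight), the output is only approximate.

```python
# Rounded estimates copied from the IWPC column of the table above.
# Reference levels (coefficient 0.0) are an assumption: *1/*1, G/G, White.
IWPC_COEF = {
    "intercept": 5.6,
    "age_decades": -0.26,  # assumes age is entered in decades
    "height_cm": 0.01,
    "weight_kg": 0.01,
    "cyp2c9": {"*1/*1": 0.0, "*1/*2": -0.52, "*1/*3": -0.94, "*2/*2": -1.06,
               "*2/*3": -1.92, "*3/*3": -2.33, "Missing": -0.22},
    "vkorc1": {"G/G": 0.0, "A/G": -0.87, "A/A": -1.7, "Missing": -0.49},
    "race": {"White": 0.0, "Asian": -0.11,
             "Black or African American": -0.28, "Mixed or Missing": -0.1},
    "enzyme_inducer": 1.18,
    "amiodarone": -0.55,
}


def predict_weekly_dose(age_years, height_cm, weight_kg, cyp2c9="*1/*1",
                        vkorc1="G/G", race="White", enzyme_inducer=False,
                        amiodarone=False, coef=IWPC_COEF):
    """Approximate weekly dose (mg), assuming the linear model predicts sqrt(weekly dose)."""
    sqrt_dose = (
        coef["intercept"]
        + coef["age_decades"] * (age_years / 10)
        + coef["height_cm"] * height_cm
        + coef["weight_kg"] * weight_kg
        + coef["cyp2c9"][cyp2c9]
        + coef["vkorc1"][vkorc1]
        + coef["race"][race]
        + (coef["enzyme_inducer"] if enzyme_inducer else 0.0)
        + (coef["amiodarone"] if amiodarone else 0.0)
    )
    return sqrt_dose ** 2


# Example: 60-year-old, 170 cm, 75 kg, all reference categories
baseline = predict_weekly_dose(60, 170, 75)
# Same patient carrying the VKORC1 A/A genotype (negative coefficient)
vkorc1_aa = predict_weekly_dose(60, 170, 75, vkorc1="A/A")
```

Under these assumptions, the negative CYP2C9 and VKORC1 estimates translate into lower predicted doses for variant carriers, while the positive enzyme inducer estimate raises the prediction.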