Weijie Ma1, Hongying Li2, Li Dong3, Qin Zhou4, Bo Fu5, Jiang-Long Hou3, Jing Wang6, Wenzhe Qin7, Jin Chen8.
Abstract
Patients requiring low-dose warfarin are more likely to suffer bleeding due to overdose. The goal of this work is to improve the precision of a feedforward neural network model in predicting the low maintenance dose for Chinese patients by changing how the training data are constructed. We built the model from a resampled dataset created by equal stratified sampling (keeping the same number of samples in each of three dose groups, 3639 in total) and performed internal and external validations. Compared with the model trained on the raw dataset of 19,060 eligible cases, we improved the low-dose group's ideal prediction percentage from 0.7% to 9.6% and maintained the overall performance (76.4% vs. 75.6%) in external validation. We further built neural network models on single-dose subsets to investigate whether the subset samples were sufficient and whether the selected factors were appropriate. The training set sizes were 1340 and 1478 for the low- and high-dose subsets; the corresponding ideal prediction percentages were 70.2% and 75.1%. The training set size for the intermediate dose varied and was 1553, 6214, and 12,429; the corresponding ideal prediction percentages were 95.6%, 95.1%, and 95.3%. We conclude that equal stratified sampling can be considered as an alternative approach to training data construction when building drug dosing models in the clinic.
Year: 2021 PMID: 34215839 PMCID: PMC8253817 DOI: 10.1038/s41598-021-93317-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
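The equal stratified sampling described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the record layout (a dict with a `dose` key), the function names, and the handling of boundary doses are assumptions made for illustration, and the toy data are synthetic. The group boundaries (1.875 and 3.125 mg/day) are taken from the sub-dose group table in this record.

```python
import random
from collections import defaultdict

# Dose-group boundaries in mg/day, as labelled in the sub-dose group table.
LOW_MAX = 1.875
HIGH_MIN = 3.125


def dose_group(dose):
    """Assign a maintenance dose (mg/day) to a dose group.

    Assumption: boundary values fall in the low and high groups, because
    the table labels those groups "<= 1.875" and ">= 3.125".
    """
    if dose <= LOW_MAX:
        return "low"
    if dose >= HIGH_MIN:
        return "high"
    return "intermediate"


def equal_stratified_sample(records, per_group, seed=0):
    """Draw the same number of records from every dose group, without replacement.

    `records` is a list of dicts with a hypothetical "dose" key.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[dose_group(rec["dose"])].append(rec)

    sample = []
    for name in ("low", "intermediate", "high"):
        members = strata[name]
        if len(members) < per_group:
            raise ValueError(f"group {name!r} has only {len(members)} records")
        sample.extend(rng.sample(members, per_group))
    rng.shuffle(sample)  # avoid feeding the groups to training in blocks
    return sample


# Synthetic, imbalanced toy data (not the study data).
toy = (
    [{"dose": 1.25} for _ in range(20)]
    + [{"dose": 2.5} for _ in range(200)]
    + [{"dose": 3.75} for _ in range(30)]
)
balanced = equal_stratified_sample(toy, per_group=15)
counts = {
    g: sum(dose_group(r["dose"]) == g for r in balanced)
    for g in ("low", "intermediate", "high")
}
print(counts)  # {'low': 15, 'intermediate': 15, 'high': 15}
```

With three equally sized groups, the 3639 resampled cases reported in the abstract correspond to 1213 cases per group.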
Statistics on the factors in the training, internal validation, and external validation datasets (i.e., Sets A, B, and C).
| Characteristic (unit) | Training set (N = 15,248) | Internal validation set (N = 1906) | P value | External validation set (N = 1906) | P value |
|---|---|---|---|---|---|
| | Mean ± SD/N (%) | Mean ± SD/N (%) | | Mean ± SD/N (%) | |
| Initial dose (mg/day) | 2.93 ± 0.72 | 2.93 ± 0.70 | 1.00 | 2.66 ± 0.52 | < 0.001 |
| Albumin (g/L) | 41.59 ± 4.42 | 41.45 ± 4.44 | 0.19 | 41.29 ± 4.24 | 0.005 |
| Creatinine (μmol/L) | 77.26 ± 19.15 | 76.84 ± 19.02 | 0.37 | 80.81 ± 20.29 | < 0.001 |
| APPT (s) | 31.21 ± 7.65 | 31.55 ± 7.37 | 0.067 | 30.89 ± 5.71 | 0.077 |
| Starting time of anticoagulation (X days after surgery) (days)a | 2 | 2 | 0.26 | 2 | < 0.001 |
| Type of disease | | | 0.24 | | < 0.001 |
| Rheumatic heart disease | 12,664 (83.05) | 1573 (82.53) | | 1563 (82.00) | |
| Degenerative mitral valve disease | 617 (4.05) | 79 (4.14) | | 60 (3.15) | |
| Degenerative aortic valve disease | 970 (6.36) | 133 (6.98) | | 87 (4.56) | |
| Congenital heart disease | 13 (0.09) | 0 | | 0 | |
| Degenerative cardiac conduction system disease | 564 (3.70) | 72 (3.78) | | 87 (4.56) | |
| Ischemic heart disease | 17 (0.11) | 6 (0.31) | | 1 (0.05) | |
| Infective endocarditis | 204 (1.34) | 15 (0.79) | | 39 (2.05) | |
| Secondary valvular heart disease | 89 (0.58) | 12 (0.63) | | 36 (1.89) | |
| Traumatic valvular heart disease | 8 (0.05) | 0 | | 0 | |
| Dilated cardiomyopathy | 3 (0.02) | 0 | | 0 | |
| Hypertrophic cardiomyopathy | 9 (0.06) | 1 (0.05) | | 4 (0.21) | |
| Systemic autoimmune disease | 90 (0.59) | 15 (0.79) | | 29 (1.53) | |
| | | | 0.34 | | < 0.001 |
| Stenosis | 165 (1.08) | 21 (1.11) | | 20 (1.05) | |
| Insufficiency | 7610 (49.91) | 993 (52.10) | | 1087 (57.03) | |
| Stenosis and insufficiency | 137 (0.90) | 17 (0.89) | | 18 (0.94) | |
| | | | 0.12 | | < 0.001 |
| Saturated dose | 2647 (17.36) | 304 (15.95) | | 70 (3.67) | |
| General dose | 12,601 (82.64) | 1602 (84.05) | | 1836 (96.33) | |
APPT, activated partial thromboplastin time; SD, standard deviation; Saturated dose, dose ranging from 5 to 10 mg/day; General dose, dose ranging from 2.5 to 5 mg/day.
a Starting time of anticoagulation (X days after surgery) is shown as the median because of its right-skewed distribution. Continuous variables were analyzed using the independent-samples t test; categorical data were analyzed using the chi-square test. The starting time of anticoagulation was analyzed using the Wilcoxon rank-sum test; the type of disease was analyzed using the Monte Carlo method.
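As a worked example of the chi-square analysis mentioned in the footnote, the sketch below recomputes the comparison for the saturated-dose versus general-dose row from the counts in the table. It assumes, which the record does not state explicitly, that each reported value compares the training set with one validation set and that no continuity correction was applied. The function name is hypothetical, and the p-value uses the closed form for one degree of freedom rather than a statistics library.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# (saturated dose, general dose) counts taken from the table above.
training = (2647, 12601)
internal = (304, 1602)
external = (70, 1836)

stat_int, p_int = chi_square_2x2(*training, *internal)
stat_ext, p_ext = chi_square_2x2(*training, *external)

print(f"training vs. internal: chi2 = {stat_int:.2f}, p = {p_int:.2f}")
print(f"training vs. external: chi2 = {stat_ext:.1f}, p < 0.001: {p_ext < 0.001}")
```

Under these assumptions the internal comparison gives a p-value near 0.12 and the external comparison a p-value far below 0.001, in line with the values reported for that row.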
Figure 1. The structure of the feedforward models in this study. Each model has a single input layer, one hidden layer, and an output layer. The numbers of hidden-layer nodes in the models PNN, SSNN, PNNlow-dose, PNNhigh-dose, PNNmid-dose (N = 12,429), PNNmid-dose (N = 6214), and PNNmid-dose (N = 1533) were 5, 9, 11, 12, 7, 7, and 10, respectively.
Overall predictive accuracy comparison: PNN vs. SSNN.
| Validation set | Model | MAE (mg/day) | Under-predicted, N (%) | Ideal-predicted, N (%) | Over-predicted, N (%) | MSE (mg/day) |
|---|---|---|---|---|---|---|
| Internal | PNN | 0.3250 | 159 (8.3) | 1507 (79.1) | 240 (12.6) | 0.3475 |
| Internal | SSNN | 0.4341 | 185 (9.7) | 1438 (75.4) | 292 (14.9) | 0.4230 |
| External | PNN | 0.3452 | 99 (5.2) | 1456 (76.4) | 351 (18.4) | 0.3933 |
| External | SSNN | 0.4108 | 154 (8.1) | 1441 (75.6) | 311 (16.3) | 0.4085 |
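The metrics in this table can be sketched as follows. The record does not define what counts as an ideal prediction, so the sketch assumes a prediction is ideal when it lies within ±20% of the actual dose; that tolerance, the function name, and the toy doses are all illustrative assumptions, not the authors' definitions or data.

```python
def evaluate(actual, predicted, tolerance=0.20):
    """Return MAE, MSE, and under/ideal/over-prediction percentages.

    Assumption: a prediction is "ideal" if it falls within +/- `tolerance`
    (as a fraction) of the actual dose; the source does not state this rule.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    pairs = list(zip(actual, predicted))
    errors = [p - a for a, p in pairs]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n

    under = sum(p < a * (1 - tolerance) for a, p in pairs)
    over = sum(p > a * (1 + tolerance) for a, p in pairs)
    ideal = n - under - over

    return {
        "mae": mae,
        "mse": mse,
        "under_pct": 100 * under / n,
        "ideal_pct": 100 * ideal / n,
        "over_pct": 100 * over / n,
    }


# Synthetic doses in mg/day (not study data).
actual = [1.25, 2.5, 2.5, 3.75]
predicted = [2.0, 2.5, 2.75, 2.5]
result = evaluate(actual, predicted)
print(result)
# {'mae': 0.5625, 'mse': 0.546875, 'under_pct': 25.0, 'ideal_pct': 50.0, 'over_pct': 25.0}
```

In the toy data the low dose is over-predicted and the high dose is under-predicted, the same direction of error the sub-dose group results show for the models here.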
Predictive accuracy comparison: PNN vs. SSNN on sub-dose groups.
| Validation set | Model | Low-dose group (≤ 1.875 mg/day) | | | Intermediate-dose group (1.875–3.125 mg/day) | | | High-dose group (≥ 3.125 mg/day) | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | Under (%) | Ideal (%) | Over (%) | Under (%) | Ideal (%) | Over (%) | Under (%) | Ideal (%) | Over (%) |
| Internal | PNN | 0 (0.0) | 0 (0.0) | 172 (100.0) | 19 (1.2) | 1462 (94.4) | 68 (4.4) | 140 (75.7) | 45 (24.3) | 0 (0.0) |
| Internal | SSNN | 0 (0.0) | 15 (8.7) | 157 (91.3) | 56 (3.6) | 1372 (88.6) | 121 (7.8) | 129 (69.7) | 51 (27.6) | 5 (2.7) |
| External | PNN | 0 (0.0) | 2 (0.7) | 289 (99.3) | 30 (2.0) | 1437 (94.0) | 62 (4.0) | 69 (80.2) | 17 (19.8) | 0 (0.0) |
| External | SSNN | 1 (0.3) | 28 (9.6) | 262 (90.1) | 88 (5.8) | 1392 (91.0) | 49 (3.2) | 65 (75.6) | 21 (24.4) | 0 (0.0) |
Figure 2. Predictive performance of PNN and SSNN in terms of ideal prediction percentage on the internal and external datasets.
Predictive accuracy of PNN models trained on samples from a single dose range.
| Validation set | Training set (dose range, sample number) | MAE (mg/day) | Under-predicted, N (%) | Ideal-predicted, N (%) | Over-predicted, N (%) | MSE (mg/day) |
|---|---|---|---|---|---|---|
| Internal | Low, N = 1340 | 0.2597 | 18 (10.7) | 112 (66.7) | 38 (22.6) | 0.1051 |
| Internal | Intermediate, N = 1533 | 0.0813 | 12 (0.2) | 6014 (96.8) | 189 (3.0) | 0.0282 |
| Internal | High, N = 1478 | 0.5467 | 22 (11.9) | 136 (73.5) | 27 (14.6) | 0.5317 |
| External | Low, N = 1340 | 0.2500 | 2 (1.2) | 118 (70.2) | 48 (28.6) | 0.0988 |
| External | Intermediate, N = 1533 | 0.0844 | 15 (0.2) | 7424 (95.6) | 329 (4.2) | 0.0327 |
| External | High, N = 1478 | 0.6038 | 18 (9.7) | 139 (75.2) | 28 (15.1) | 0.5996 |
Predictive accuracy of PNN models trained on different numbers of intermediate-dose samples.
| Validation set | Training set size | MAE (mg/day) | Under-predicted, N (%) | Ideal-predicted, N (%) | Over-predicted, N (%) | MSE (mg/day) |
|---|---|---|---|---|---|---|
| Internal | 12,429 | 0.0681 | 0 (0.0) | 1488 (95.8) | 66 (4.2) | 0.0281 |
| Internal | 6214 | 0.0665 | 0 (0.0) | 4497 (96.5) | 164 (3.5) | 0.0256 |
| Internal | 1533 | 0.0813 | 12 (0.2) | 6014 (96.8) | 189 (3.0) | 0.0282 |
| External | 12,429 | 0.0830 | 0 (0.0) | 1481 (95.3) | 73 (4.7) | 0.0310 |
| External | 6214 | 0.0690 | 4 (0.1) | 4433 (95.1) | 224 (4.8) | 0.0305 |
| External | 1533 | 0.0844 | 15 (0.2) | 7424 (95.6) | 329 (4.2) | 0.0327 |