Adrien Coulet, Nigam H Shah, Maxime Wack, Mohammad B Chawki, Nicolas Jay, Michel Dumontier.
Abstract
Prescribing the right drug at the right dose is a central tenet of precision medicine. We examined the use of patients' prior Electronic Health Records to predict a reduction in drug dosage. We focused on drugs that interact with the P450 enzyme family, because their dosage is known to be sensitive and variable. We extracted diagnostic codes, conditions reported in clinical notes, and laboratory orders from Stanford's clinical data warehouse to construct cohorts of patients who either did or did not need a dose change. After feature selection, we trained models to predict which patients will (or will not) require a dose change after being prescribed one of 34 drugs across 23 drug classes. Overall, we can predict a dose reduction (AUC between 0.70 and 0.95) for 23 drugs and 22 drug classes. Several of these drugs are associated with clinical guidelines that recommend dose reduction exclusively in the case of an adverse reaction. In these cases, a reduction in dosage may be considered a surrogate for an adverse reaction, which our system could indirectly help predict and prevent. Our study illustrates the role machine learning may play in guiding the choice of starting dose for drugs associated with response variability.
Year: 2018 PMID: 30349060 PMCID: PMC6197198 DOI: 10.1038/s41598-018-33980-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Example of a phenotype profile. The profile shown comprises the top 10 diagnoses (A), condition mentions (B), and lab tests (C) seen before a tacrolimus prescription in patients who subsequently needed a dose reduction. Each phenotype is associated with a statistically significant p-value (hypergeometric test, p < 0.05, Bonferroni correction for multiple testing), and phenotypes are ordered by the absolute value of the log of the Risk Ratio (on a 0 to 2 scale), shown in the first column. To aid interpretation, the second column gives the number of PubMed articles that mention both the drug and the phenotype (on a 0 to 180 scale). For example, 45 articles mention both tacrolimus and candidiasis. (A) Diagnoses are ICD-9-CM codes associated with patient visits; (B) conditions are phenotypic terms mentioned in the text of clinical notes; (C) lab tests are orders of laboratory tests. Lab codes prefixed with the string “*NO*” indicate a negative relationship between the lab test and the drug dose decrease.
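The statistics named in this caption can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric enrichment test over intervals, a Bonferroni correction that multiplies the p-value by the number of tests, and a natural-log risk ratio. The function names, the toy counts, and the choice of logarithm base are assumptions for illustration and are not taken from the study.

```python
from math import comb, log


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: all intervals, K: intervals carrying the phenotype,
    n: dose-reduction intervals, k: dose-reduction intervals with the phenotype.
    """
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


def risk_ratio(events_with, total_with, events_without, total_without):
    """Risk of dose reduction with the phenotype divided by the risk without it."""
    return (events_with / total_with) / (events_without / total_without)


# Toy counts (hypothetical): 20 intervals, 8 with the phenotype,
# 10 dose reductions, 7 of which carry the phenotype.
p = hypergeom_sf(k=7, N=20, K=8, n=10)
p_adjusted = bonferroni(p, n_tests=3)   # assume 3 phenotypes were tested
rr = risk_ratio(7, 8, 3, 12)            # 7/8 reduced with, 3/12 reduced without
ranking_score = abs(log(rr))            # ordering key, assuming a natural log
significant = p_adjusted < 0.05
```

Under these assumptions a phenotype is kept when `significant` is true, and kept phenotypes are sorted by `ranking_score` in decreasing order.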
Results of the prediction of dose reduction using phenotype profiles.
| Drug or drug set | \|instances\| (10-fold CV) | AUC-ROC (10-fold CV) | F-m (P; R) (10-fold CV) | \|instances\| (hold last year out) | AUC-ROC (hold last year out) | F-m (P; R) (hold last year out) |
|---|---|---|---|---|---|---|
|  | 353 | 0.85 | 0.77 (0.78; 0.77) | 314 | 0.95 |  |
|  | 709 | 0.94 |  | 598 | 0.94 |  |
|  | 292 | 0.88 | 0.80 (0.80; 0.80) | 460 | 0.86 | 0.85 (0.88; 0.86) |
|  | 419 | 0.88 | 0.80 (0.81; 0.80) | 338 | 0.90 | 0.80 (0.80; 0.80) |
|  | 941 | 0.90 | 0.81 (0.82; 0.81) | 866 | 0.82 | 0.74 (0.78; 0.75) |
|  | 2851 | 0.82 | 0.74 (0.74; 0.74) | 2630 | 0.79 | 0.72 (0.72; 0.72) |
|  | 4853 | 0.90 | 0.83 (0.83; 0.83) | 4288 | 0.80 | 0.71 (0.72; 0.71) |
|  | 17302 | 0.86 | 0.78 (0.78; 0.78) | 15366 | 0.76 | 0.70 (0.70; 0.70) |
|  | 21607 | 0.79 | 0.71 (0.71; 0.71) | 19212 | 0.73 | 0.70 (0.70; 0.70) |
|  | 12119 | 0.93 | 0.88 (0.88; 0.88) | 10354 | — | — |
|  | 91267 | 0.70 | 0.64 (0.64; 0.64) | 80614 | — | — |
Two evaluations were performed: 10-fold cross-validation and hold last year out. Only the top 100 conditions mentioned in clinical notes, the top 100 diagnostic codes, and the top 100 lab codes with a significant p-value (hypergeometric test, p < 0.05, Bonferroni correction for multiple testing) are used in the phenotype profiles. Instances are balanced sets of intervals of dose decrease and dose continuation. |instances| is the number of instances in the training set only; for the cross-validation it is averaged over the 10 folds. Drugs or drug sets with an F-measure (F-m) ≥ 0.7 in the hold-out evaluation are reported, along with results for the set of all P450 drugs and for the drugs of ATC class L. L is the first-level ATC class Antineoplastic and immunomodulating agents, and H is Systemic hormonal preparations, excluding sex hormones and insulins. Results for ATC class L are the best in the 10-fold cross-validation, but they cannot be computed for the hold-out validation because the phenotype profiles are empty in the prediction year. Precision (P) and Recall (R) are provided along with the F-measure. Complete results are available in Supplementary material S1.
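The hold-last-year-out setup and the balancing of instances can be sketched as follows. This is an illustration only: the note does not say how the balanced sets were obtained, so random downsampling of the larger class is an assumption, as are the field names `year` and `label` and both function names.

```python
import random


def hold_last_year_out(instances):
    """Split instances into (train, test), where test holds the most recent year."""
    last_year = max(inst["year"] for inst in instances)
    train = [inst for inst in instances if inst["year"] < last_year]
    test = [inst for inst in instances if inst["year"] == last_year]
    return train, test


def balance(instances, seed=0):
    """Downsample the larger class so that both labels are equally represented.

    Assumption: balancing by random downsampling; the source only states
    that the instance sets are balanced.
    """
    rng = random.Random(seed)
    decrease = [i for i in instances if i["label"] == "decrease"]
    continuation = [i for i in instances if i["label"] == "continuation"]
    n = min(len(decrease), len(continuation))
    return rng.sample(decrease, n) + rng.sample(continuation, n)
```

A model would then be trained on `balance(train)` and scored on `balance(test)`, so that the reported |instances| counts training rows only.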
Classifier performance by the type of phenotypic features.
| Drug or drug set | F-measure, top 100 diagnostics | F-measure, top 100 conditions | F-measure, top 100 labs | F-measure, top 300 features |
|---|---|---|---|---|
|  | 0.37 | 0.41 | 0.81 | 0.77 |
|  | 0.41 | 0.43 | 0.90 | 0.89 |
|  | 0.36 | 0.49 | 0.79 | 0.80 |
|  | 0.39 | 0.42 | 0.78 | 0.80 |
|  | 0.38 | 0.43 | 0.82 | 0.81 |
|  | 0.38 | 0.45 | 0.76 | 0.74 |
|  | 0.40 | 0.49 | 0.82 | 0.83 |
|  | 0.37 | 0.48 | 0.78 | 0.78 |
|  | 0.36 | 0.42 | 0.71 | 0.71 |
|  | 0.41 | 0.33 | 0.87 | 0.88 |
|  | 0.36 | 0.45 | 0.64 | 0.64 |
Top 100 lists are ordered by p-value. Performance is computed using 10-fold cross-validation. The top 300 features are the combination of the top 100 features of each type (diagnostic codes, conditions mentioned in clinical notes, and lab orders).
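The construction of the top 100 lists and their combination into the top 300 features can be sketched as below. The tuple layout, the type labels, and the function names are hypothetical; ties in p-value are broken alphabetically here, which is an arbitrary choice.

```python
def top_k_by_p_value(features, k=100):
    """Return, for each feature type, the k feature names with the lowest p-values.

    features: iterable of (name, feature_type, p_value) tuples.
    """
    by_type = {}
    for name, feature_type, p_value in features:
        by_type.setdefault(feature_type, []).append((p_value, name))
    return {
        feature_type: [name for _, name in sorted(items)[:k]]
        for feature_type, items in by_type.items()
    }


def combine(top_lists):
    """Concatenate the per-type lists into a single feature set (up to 3 * k names)."""
    return [name for feature_type in sorted(top_lists) for name in top_lists[feature_type]]
```

With `k=100` and three feature types, `combine(top_k_by_p_value(features))` yields at most 300 features, fewer when a type has fewer than 100 significant entries.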
Figure 2. Overview of our approach to predicting the need for a reduced drug dose at first prescription. (1) Clinical conditions mentioned in clinical notes are identified using terms from medical ontologies; other phenotypic features encoded in Electronic Health Records are extracted directly. (2) Drug dose changes and continuations are detected. (3) Characteristics associated with dose changes are identified to construct phenotype profiles. (4) Phenotype profiles are used to filter the features and build a reduced data set. (5) The resulting matrix is then used to train two binary random forest models: one for predicting dose reductions and one for dose increases. (6) We used two evaluation setups and expert review to determine the performance of the models.
Figure 3. Definition of the dose change intervals and construction of features. The top panel (a) shows three kinds of intervals. Each interval is delimited by two drug prescriptions d1 and d2, prescribed at t1 and t2. d1 and d2 have the same ingredient, and no other drug d with that ingredient is prescribed between t1 and t2. The bottom panels show the construction of phenotypic features and their expansion using ontology hierarchies. (b) The three kinds of features are diagnostic codes (diag), conditions mentioned in clinical notes (p), and laboratory test orders (lab). Diagnostic codes and test orders are available in EHRs, whereas condition mentions result from automated annotation of clinical notes. (c) The diagnostic codes and conditions found in clinical notes are generalized according to ICD-9-CM and SNOMED-CT, respectively. For example, if ICD-9-CM states that diag3 is more general than diag1, then diag3 is also associated with the interval. Generalization is not done for lab tests because too few of them are mapped to an ontology. All features are constructed from EHR data recorded before the first prescription (d1) of each interval.
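The interval definition in panel (a) can be sketched as follows. The sketch assumes that each prescription carries a comparable numeric dose and that features are taken strictly before t1. The record layouts and function names are hypothetical.

```python
from collections import defaultdict


def build_intervals(prescriptions):
    """Pair consecutive prescriptions of the same ingredient for the same patient.

    prescriptions: iterable of (patient_id, ingredient, time, dose) tuples.
    Pairing consecutive events guarantees that no other prescription of the
    ingredient falls between t1 and t2.
    """
    by_key = defaultdict(list)
    for patient, ingredient, time, dose in prescriptions:
        by_key[(patient, ingredient)].append((time, dose))

    intervals = []
    for (patient, ingredient), events in by_key.items():
        events.sort()
        for (t1, dose1), (t2, dose2) in zip(events, events[1:]):
            if dose2 < dose1:
                kind = "reduction"
            elif dose2 > dose1:
                kind = "increase"
            else:
                kind = "continuation"
            intervals.append(
                {"patient": patient, "ingredient": ingredient, "t1": t1, "t2": t2, "kind": kind}
            )
    return intervals


def features_before(records, patient, t1):
    """Collect the codes recorded for a patient strictly before t1.

    records: iterable of (patient_id, time, code) tuples.
    """
    return {code for pid, time, code in records if pid == patient and time < t1}
```

For example, three tacrolimus prescriptions at doses 4, 2, and 2 produce one reduction interval followed by one continuation interval.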
Numbers of intervals of each type and number of their associated phenotypic features.
|  | Dose reduction | Dose increase | Dose continuation | Total |
|---|---|---|---|---|
|  | 50,704 | 60,719 | 176,140 | 287,563 |
|  | 22,571 | 25,381 | 56,902 | 69,308 |
|  | 1,434,606 | 1,623,309 | 4,088,965 | 7,146,880 |
|  | 3,687,022 | 4,193,204 | 10,811,286 | 18,691,512 |
|  | 441,746 | 505,235 | 1,173,403 | 2,120,384 |
|  | 4,110,928 | 4,728,006 | 11,487,221 | 20,326,155 |
|  | 6,773,097 | 7,906,040 | 17,896,716 | 32,575,853 |
Diagnostic codes and condition mentions found in clinical notes are expanded using ICD-9-CM and SNOMED CT, respectively.
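The expansion step can be sketched as a walk up a parent map: every ancestor of a recorded code is added to the feature set. The `parents` dictionary stands in for the ICD-9-CM or SNOMED CT hierarchy, and the codes `diag1`, `diag2`, `diag3`, and `diag_root` are placeholders, not real codes.

```python
def ancestors(code, parents):
    """Return all ancestors of a code, given a child -> list-of-parents mapping."""
    seen = set()
    stack = [code]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand(codes, parents):
    """Add every ancestor of each recorded code to the set of codes."""
    expanded = set(codes)
    for code in codes:
        expanded |= ancestors(code, parents)
    return expanded


# Hypothetical hierarchy: diag3 is more general than diag1 and diag2.
parents = {"diag1": ["diag3"], "diag2": ["diag3"], "diag3": ["diag_root"]}
interval_codes = expand({"diag1"}, parents)
```

Because every ancestor is added, the expansion multiplies the number of feature occurrences, which is why later filtering is needed.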
Reduction of the size of phenotype profiles.
|  | Diagnostics (dose reductions) | Conditions (dose reductions) | Labs (dose reductions) | Diagnostics (dose increases) | Conditions (dose increases) | Labs (dose increases) |
|---|---|---|---|---|---|---|
|  | 213,737 | 92,255 | 106,041 | 67,722 | 23,539 | 24,231 |
|  | 104,414 | 42,225 | 50,572 | 41,796 | 15,122 | 14,961 |
|  | 88,973 | 13,552 | 34,277 | 29,875 | 5,353 | 5,974 |
|  | 31,218 | 3,553 | 24,442 | 27,071 | 5,353 | 5,974 |
|  | 18,256 | 1,947 | — | 11,838 | 1,932 | — |
This table reports the number of features of each type that make up the phenotype profiles at successive filtering steps. The RR and IC filtering steps are based on Risk Ratio (RR) and Information Content (IC). The elim method helps filter the results of the ontology expansion process[31]. It cannot be applied to laboratory test orders, since those are not encoded with ontology codes.
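A minimal sketch of RR and IC filtering follows. It assumes that Information Content is the negative base-2 log of a feature's relative frequency, and that a feature is kept when both its absolute log risk ratio and its IC exceed a threshold. The thresholds, field names, and function names are hypothetical, since the note does not give the cut-offs used, and the elim step is not reproduced here.

```python
from math import log, log2


def information_content(n_annotated, n_total):
    """IC = -log2(relative frequency); rare, specific terms score higher.

    Assumption: a frequency-based definition of Information Content.
    """
    return -log2(n_annotated / n_total)


def filter_profile(features, min_abs_log_rr=0.5, min_ic=1.0):
    """Keep feature names whose |log RR| and IC both reach the thresholds.

    features: iterable of dicts with keys 'name', 'rr', 'count', 'total'.
    The default thresholds are illustrative, not the values used in the study.
    """
    kept = []
    for feature in features:
        if abs(log(feature["rr"])) < min_abs_log_rr:
            continue  # risk ratio too close to 1: weak association
        if information_content(feature["count"], feature["total"]) < min_ic:
            continue  # term too frequent, hence too general
        kept.append(feature["name"])
    return kept
```

Filtering on IC removes very general ancestor terms introduced by the ontology expansion, while filtering on RR removes features with little association to the dose change.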