Jing Zhao, Aron Henriksson, Lars Asker, Henrik Boström.
Abstract
BACKGROUND: The digitization of healthcare data, resulting from the increasingly widespread adoption of electronic health records, has greatly facilitated its analysis by computational methods and thereby enabled large-scale secondary use thereof. This can be exploited to support public health activities such as pharmacovigilance, wherein the safety of drugs is monitored to inform regulatory decisions about sustained use. To that end, electronic health records have emerged as a potentially valuable data source, providing access to longitudinal observations of patient treatment and drug use. A nascent line of research concerns predictive modeling of healthcare data for the automatic detection of adverse drug events, which presents its own set of challenges: it is not yet clear how to represent the heterogeneous data types in a manner conducive to learning high-performing machine learning models.
Year: 2015 PMID: 26606038 PMCID: PMC4660129 DOI: 10.1186/1472-6947-15-S4-S1
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Figure 1. Extracting data for machine learning methods from electronic health records.
The 27 selected ADE related diagnosis codes.
| Code | Description |
|---|---|
| D642 | Secondary sideroblastic anemia due to drugs and toxins |
| E273 | Drug-induced adrenocortical insufficiency |
| F110 | Mental and behavioural disorders (MBDs) due to use of opioids: acute intoxication |
| F112 | MBDs due to use of opioids: dependence syndrome |
| F130 | MBDs due to use of sedatives or hypnotics: acute intoxication |
| F132 | MBDs due to use of sedatives or hypnotics: dependence syndrome |
| F150 | MBDs due to use of other stimulants, including caffeine: acute intoxication |
| F151 | MBDs due to use of other stimulants, including caffeine: harmful use |
| F152 | MBDs due to use of other stimulants, including caffeine: dependence syndrome |
| F190 | MBDs due to multiple drug use: acute intoxication |
| F192 | MBDs due to multiple drug use: dependence syndrome |
| F199 | MBDs due to multiple drug use: unspecified mental and behavioural disorder |
| G240 | Drug-induced dystonia |
| G251 | Drug-induced tremor |
| G444 | Drug-induced headache, not elsewhere classified |
| G620 | Drug-induced polyneuropathy |
| I427 | Cardiomyopathy due to drugs and other external agents |
| I952 | Hypotension due to drugs |
| L270 | Generalized skin eruption due to drugs and medicaments |
| L271 | Localized skin eruption due to drugs and medicaments |
| O355 | Maternal care for (suspected) damage to fetus by drugs |
| T782 | Adverse effects: anaphylactic shock, unspecified |
| T783 | Adverse effects: angioneurotic oedema |
| T784 | Adverse effects: allergy, unspecified |
| T808 | Other complications following infusion, transfusion and therapeutic injection |
| T886 | Anaphylactic shock due to correct drug or medicament properly administered |
| T887 | Unspecified adverse effect of drug or medicament |
Statistical description of 27 datasets.
| Dataset | Instances | Positive class | Codes (no. of features) | Measurements (no. of features) | Combination (no. of features) |
|---|---|---|---|---|---|
| D642 | 3733 | 2.87% | 3999 | 494 | 8262 |
| E273 | 183 | 12% | 912 | 240 | 2935 |
| F110 | 146 | 22.6% | 1051 | 205 | 2958 |
| F112 | 146 | 63.7% | 1054 | 205 | 2963 |
| F130 | 112 | 54.5% | 779 | 142 | 2237 |
| F132 | 112 | 27.7% | 777 | 142 | 2231 |
| F150 | 111 | 14.4% | 476 | 107 | 1543 |
| F151 | 111 | 17.1% | 475 | 107 | 1542 |
| F152 | 111 | 69.4% | 481 | 111 | 1573 |
| F190 | 168 | 31.5% | 869 | 160 | 2454 |
| F192 | 168 | 50% | 865 | 160 | 2447 |
| F199 | 168 | 8.93% | 866 | 160 | 2448 |
| G240 | 68 | 20.6% | 444 | 136 | 1636 |
| G251 | 194 | 6.7% | 1014 | 263 | 3209 |
| G444 | 908 | 2.5% | 1774 | 318 | 4594 |
| G620 | 382 | 6% | 1624 | 280 | 4152 |
| I427 | 448 | 5.1% | 1341 | 299 | 3852 |
| I952 | 483 | 8.3% | 1654 | 333 | 4471 |
| L270 | 435 | 35.9% | 1297 | 325 | 3912 |
| L271 | 434 | 11.1% | 1286 | 325 | 3897 |
| O355 | 237 | 35.4% | 736 | 110 | 1930 |
| T782 | 1203 | 8.5% | 1625 | 319 | 4405 |
| T783 | 1207 | 8.6% | 1627 | 319 | 4408 |
| T784 | 1213 | 60.8% | 1628 | 319 | 4409 |
| T808 | 391 | 87.5% | 1229 | 271 | 3533 |
| T886 | 715 | 6.2% | 2226 | 401 | 5606 |
| T887 | 716 | 61.7% | 2230 | 400 | 5604 |
Figure 2. Concept hierarchies of ATC and ICD-10 codes. C10AA01 is the ATC code for Simvastatin and F25.1 is the ICD-10 code for Schizoaffective disorder.
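The hierarchies in the caption can be illustrated by expanding a code into its more general prefixes. The following is a minimal sketch, not the authors' procedure: the ATC prefix lengths follow the standard five-level ATC structure, while `icd10_ancestors` is a hypothetical simplification that returns only the leading letter, the three-character category, and the full code (ICD-10 blocks such as F20-F29 would need a lookup table). How these prefixes map onto the four code levels evaluated in the tables is not stated in this excerpt.

```python
# Standard ATC structure: cumulative prefix lengths of the five levels.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_ancestors(code: str) -> list[str]:
    """Return the ATC prefixes from the most general level down to the code itself."""
    code = code.strip().upper()
    if len(code) not in ATC_LEVEL_LENGTHS:
        raise ValueError(f"unexpected ATC code length: {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n <= len(code)]


def icd10_ancestors(code: str) -> list[str]:
    """Return leading letter, three-character category, and full code (dot removed).

    Hypothetical simplification: ICD-10 blocks are not derived here.
    """
    compact = code.strip().upper().replace(".", "")
    if len(compact) < 3:
        raise ValueError(f"ICD-10 code too short: {code!r}")
    levels = [compact[0], compact[:3]]
    if len(compact) > 3:
        levels.append(compact)
    return levels


print(atc_ancestors("C10AA01"))
print(icd10_ancestors("F25.1"))
```

Each prefix could then serve as a separate binary or count feature, so that a patient with C10AA01 also activates the more general C10AA, C10A, C10, and C groups.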
Learning algorithms and their default settings.
| Classifier | Description | Notes |
|---|---|---|
| DT | CART decision tree | minimum 1 instance per leaf |
| SVM Poly | Support Vector Machine | polynomial kernel of degree 3 |
| SVM RBF | Support Vector Machine | RBF kernel, gamma = 0.0 |
| LogReg | Logistic Regression | L2 regularization |
| kNN | k nearest neighbors | k = 5 |
| AdaBoost | Adaptive boosting | Decision trees, 50 base estimators |
| Bagging | Bagging using CART tree | 10 base estimators |
| NB | Naïve Bayes | |
| RF | Random forest | 500 trees, inspected features = |
Comparing multiple representations of clinical measurements.
| Representation | Accuracy (rank) | AUC (rank) |
|---|---|---|
| Mean | 80.75 (2.96) | 0.635 (2.74) |
| SD | 80.23 (3.44) | 0.535 (5.33) |
| Slope | 80.54 (3.33) | 0.612 (3.52) |
| Existence | 79.25 (4.48) | 0.604 (4.26) |
| Count | 80.54 (3.63) | 0.633 (2.96) |
| All | 81.41 (2.74) | 0.655 (2.19) |
| P-value | 0.01 | <0.0001 |
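To make the five representations in the table concrete, here is a minimal sketch of how a patient's repeated measurements of one clinical test could be collapsed into mean, SD, slope, existence, and count. The function names, the `(day, value)` input format, the use of an ordinary least-squares slope, the sample standard deviation, and the handling of zero or one measurement are all assumptions made for illustration; the excerpt does not specify these details.

```python
from statistics import mean, stdev


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (0.0 if xs do not vary)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def summarize(measurements):
    """Collapse (day, value) pairs for one patient and one test into five features."""
    count = len(measurements)
    if count == 0:
        # Assumption: absent measurements yield missing numeric features.
        return {"mean": None, "sd": None, "slope": None, "existence": 0, "count": 0}
    days = [day for day, _ in measurements]
    values = [value for _, value in measurements]
    return {
        "mean": mean(values),
        # Assumption: a single measurement has no spread, so SD is set to 0.0.
        "sd": stdev(values) if count > 1 else 0.0,
        "slope": least_squares_slope(days, values),
        "existence": 1,
        "count": count,
    }


features = summarize([(0, 5.0), (10, 6.0), (20, 7.0)])
print(features)
```

The "All" row in the table would then correspond to keeping all five values per test rather than choosing one.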
Comparing different levels of clinical codes.
| Level | Accuracy (rank) | AUC (rank) |
|---|---|---|
| Level 1 | 83.24 (3.37) | 0.731 (3.74) |
| Level 2 | 84.08 (2.78) | 0.742 (3.41) |
| Level 3 | 83.80 (2.93) | 0.757 (2.81) |
| Level 4 | 83.93 (2.67) | 0.763 (2.67) |
| All | 84.47 (2.44) | 0.763 (2.37) |
| P-value | 0.17 | 0.008 |
Comparing random forests using clinical measurements (M), clinical codes (C) and their combination (M+C).
| Dataset | Accuracy, M | Accuracy, C | Accuracy, M+C | AUC, M | AUC, C | AUC, M+C |
|---|---|---|---|---|---|---|
| D642 | 98.79 (3) | 98.95 (2) | 99.03 (1) | 0.961 (3) | 0.980 (2) | 0.994 (1) |
| E273 | 86.98 (3) | 87.51 (1) | 87.51 (1) | 0.691 (3) | 0.706 (2) | 0.741 (1) |
| F110 | 80.45 (2) | 83.14 (1) | 80.38 (3) | 0.676 (3) | 0.824 (1) | 0.798 (2) |
| F112 | 68.48 (2) | 72.73 (1) | 66.30 (3) | 0.672 (3) | 0.803 (1) | 0.752 (2) |
| F130 | 54.97 (3) | 60.61 (1) | 56.89 (2) | 0.573 (3) | 0.666 (1) | 0.646 (2) |
| F132 | 71.33 (1) | 69.47 (3) | 69.47 (2) | 0.558 (3) | 0.686 (1) | 0.616 (2) |
| F150 | 84.02 (3) | 86.85 (1) | 85.85 (2) | 0.706 (3) | 0.901 (1) | 0.885 (2) |
| F151 | 84.68 (1) | 82.03 (2) | 82.03 (2) | 0.502 (3) | 0.619 (1) | 0.535 (2) |
| F152 | 72.82 (3) | 76.30 (1) | 74.95 (2) | 0.733 (3) | 0.838 (1) | 0.826 (2) |
| F190 | 64.78 (3) | 74.58 (1) | 72.88 (2) | 0.608 (3) | 0.805 (1) | 0.782 (2) |
| F192 | 60.07 (3) | 67.33 (1) | 61.05 (2) | 0.660 (3) | 0.730 (1) | 0.682 (2) |
| F199 | 90.04 (3) | 91.61 (1) | 90.98 (2) | 0.568 (3) | 0.577 (2) | 0.700 (1) |
| G240 | 78.33 (3) | 81.31 (1) | 81.31 (1) | 0.596 (3) | 0.622 (2) | 0.639 (1) |
| G251 | 93.34 (1) | 93.29 (2) | 93.29 (2) | 0.328 (3) | 0.719 (1) | 0.523 (2) |
| G444 | 97.47 (3) | 97.51 (1) | 97.51 (1) | 0.479 (3) | 0.631 (2) | 0.666 (1) |
| G620 | 93.47 (3) | 94.26 (1) | 94.26 (1) | 0.509 (3) | 0.765 (1) | 0.756 (2) |
| I427 | 95.77 (3) | 96.57 (2) | 96.80 (1) | 0.713 (3) | 0.895 (1) | 0.891 (2) |
| I952 | 91.92 (1) | 91.63 (3) | 91.84 (2) | 0.517 (3) | 0.552 (1) | 0.542 (2) |
| L270 | 86.65 (1) | 85.20 (3) | 85.70 (2) | 0.909 (2) | 0.908 (3) | 0.915 (1) |
| L271 | 89.17 (2) | 89.84 (1) | 89.10 (3) | 0.784 (3) | 0.800 (2) | 0.802 (1) |
| O355 | 62.00 (3) | 90.96 (2) | 91.43 (1) | 0.642 (3) | 0.962 (1) | 0.956 (2) |
| T782 | 91.02 (3) | 91.90 (2) | 92.09 (1) | 0.695 (3) | 0.712 (2) | 0.717 (1) |
| T783 | 90.39 (3) | 91.27 (1) | 91.18 (2) | 0.774 (3) | 0.845 (2) | 0.862 (1) |
| T784 | 60.44 (3) | 68.63 (2) | 68.82 (1) | 0.611 (3) | 0.732 (2) | 0.753 (1) |
| T808 | 86.45 (3) | 93.88 (1) | 91.59 (2) | 0.857 (3) | 0.953 (2) | 0.962 (1) |
| T886 | 93.57 (3) | 94.05 (1) | 94.05 (1) | 0.629 (3) | 0.655 (2) | 0.656 (1) |
| T887 | 70.65 (2) | 69.24 (3) | 70.94 (1) | 0.721 (2) | 0.720 (3) | 0.754 (1) |
| Average | 81.41 (2.48) | 84.47 (1.56) | 83.6 (1.70) | 0.655 (2.93) | 0.763 (1.56) | 0.754 (1.52) |
| P-value | 0.007 | | | < 0.0001 | | |
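The parenthesized ranks and their averages can be reproduced with a short sketch. It assumes that the best score receives rank 1 and that tied scores share the best rank, which matches rows such as E273 (accuracy ranks 3, 1, 1). The function names are hypothetical, and only four AUC rows from the table are used to keep the example short. A few rows with identical printed values but different ranks (for example F132) suggest the published ranks were computed on unrounded scores.

```python
def competition_ranks(scores):
    """Rank scores so the highest gets 1; tied scores share the best rank."""
    return [1 + sum(other > score for other in scores) for score in scores]


def average_ranks(rows):
    """Average the per-dataset ranks column by column."""
    ranked = [competition_ranks(row) for row in rows]
    return [sum(column) / len(ranked) for column in zip(*ranked)]


# AUC for (M, C, M+C), copied from four rows of the table above.
auc_rows = {
    "D642": (0.961, 0.980, 0.994),
    "F110": (0.676, 0.824, 0.798),
    "L270": (0.909, 0.908, 0.915),
    "T887": (0.721, 0.720, 0.754),
}

for dataset, row in auc_rows.items():
    print(dataset, competition_ranks(row))

print(average_ranks(list(auc_rows.values())))
```

On these four rows the per-dataset ranks agree with the table; the averages differ from the table's "Average" row because that row covers all 27 datasets.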
Figure 3. Averaged accuracy from random forests using clinical measurements (M), clinical codes (C) and their combination (M+C) at each feature selection threshold.
Figure 4. Averaged AUC from random forests using clinical measurements (M), clinical codes (C) and their combination (M+C) at each feature selection threshold.
Figure 5. Proportion of clinical measurements (M) and clinical codes (C) among selected features.
Figure 6. Relative informativeness of the 5 representations of clinical measurements (MN: mean; SD: standard deviation; SL: slope; YN: existence; CN: count) and 4 levels (L1 - L4) of clinical codes based on their information gain scores. Larger area indicates lower informativeness.
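For readers unfamiliar with the information gain score mentioned in the caption, the following minimal sketch computes it for a discrete feature against class labels, using the usual entropy-based definition. The function names are hypothetical, and the sketch assumes features are already discrete; how continuous measurement features were discretized for scoring is not stated in this excerpt.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


labels = [1, 1, 0, 0]
perfect_feature = [1, 1, 0, 0]      # separates the classes completely
useless_feature = [1, 0, 1, 0]      # carries no information about the class

print(information_gain(perfect_feature, labels))
print(information_gain(useless_feature, labels))
```

Ranking features by this score and keeping those above a cutoff is one common way to implement the kind of feature selection threshold referred to in the neighbouring figure captions.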
Figure 7. Accuracy of random forest using clinical measurements (M), clinical codes (C) and their combination (M+C) at each feature selection threshold in each dataset.
Figure 8. AUC of random forest using clinical measurements (M), clinical codes (C) and their combination (M+C) at each feature selection threshold in each dataset.
Figure 9. Accuracy of multiple classifiers using clinical measurements (M), clinical codes (C) and their combination (M+C) at each feature selection threshold.
Figure 10. AUC of multiple classifiers using clinical measurements (M), clinical codes (C) and their combination (M+C) at each feature selection threshold.