Aron Henriksson, Jing Zhao, Hercules Dalianis, Henrik Boström.
Abstract
BACKGROUND: Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question, however, remains how to make best use of a set of diverse representations of clinical events, modeled in an ensemble of semantic spaces, for the purpose of predictive modeling.
Keywords: Adverse drug events; Distributional semantics; Electronic health records; Heterogeneous data; Pharmacovigilance; Random forest
Year: 2016 PMID: 27459846 PMCID: PMC4965720 DOI: 10.1186/s12911-016-0309-0
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
The number of features available to the ensemble and each tree with the three utilization strategies
| Utilization strategy | Ensemble features | Tree features |
|---|---|---|
| FDR | 8000 | |
| RDR-FS | 8000 | |
| RDR-ALL | 8000 | 800 |
Description of datasets
| Dataset | Visits | Words (lemmas): types | Words (lemmas): instances | Diagnoses (ICD-10): types | Diagnoses (ICD-10): instances | Drugs (ATC): types | Drugs (ATC): instances | Measurements: types | Measurements: instances |
|---|---|---|---|---|---|---|---|---|---|
| D64.2 | 416 | 46125 | 2110354 | 536 | 6320 | 364 | 8960 | 304 | 60689 |
| E27.3 | 34 | 9564 | 112789 | 143 | 248 | 157 | 662 | 138 | 3982 |
| F11.0 | 76 | 12200 | 232203 | 180 | 367 | 159 | 687 | 157 | 3920 |
| F11.2 | 308 | 30077 | 904496 | 486 | 1875 | 347 | 4329 | 260 | 23637 |
| F13.0 | 120 | 14764 | 215626 | 232 | 390 | 204 | 1167 | 153 | 6178 |
| F13.2 | 76 | 12507 | 215321 | 220 | 484 | 195 | 922 | 167 | 4621 |
| F15.0 | 32 | 5849 | 39658 | 71 | 148 | 96 | 257 | 105 | 1427 |
| F15.1 | 46 | 9174 | 102697 | 122 | 259 | 142 | 573 | 137 | 4518 |
| F15.2 | 256 | 25179 | 658428 | 394 | 1347 | 295 | 3439 | 209 | 22870 |
| F19.0 | 122 | 15823 | 278873 | 237 | 475 | 214 | 1120 | 227 | 5519 |
| F19.1 | 74 | 12651 | 177644 | 186 | 373 | 186 | 985 | 152 | 4688 |
| F19.2 | 288 | 29291 | 799717 | 492 | 1259 | 326 | 3667 | 262 | 19653 |
| F19.9 | 68 | 13144 | 177749 | 177 | 350 | 178 | 992 | 87 | 3743 |
| G24.0 | 28 | 10017 | 101769 | 76 | 132 | 136 | 599 | 113 | 3551 |
| G62.0 | 20 | 4622 | 35997 | 41 | 71 | 93 | 219 | 56 | 1119 |
| I95.2 | 70 | 11528 | 145432 | 162 | 652 | 177 | 799 | 144 | 5252 |
| L27.0 | 274 | 34504 | 1114979 | 556 | 1619 | 375 | 5324 | 273 | 28451 |
| L27.1 | 78 | 13477 | 234268 | 220 | 545 | 186 | 1260 | 128 | 6088 |
| N14.1 | 28 | 9180 | 82075 | 105 | 387 | 128 | 335 | 99 | 2215 |
| O35.5 | 128 | 10567 | 121849 | 278 | 882 | 223 | 1654 | 125 | 3894 |
| T59.9 | 40 | 5803 | 47694 | 81 | 165 | 104 | 317 | 76 | 1467 |
| T78.2 | 102 | 13341 | 188250 | 208 | 602 | 200 | 1063 | 200 | 5384 |
| T78.3 | 266 | 22659 | 411014 | 393 | 1178 | 282 | 2454 | 208 | 9967 |
| T78.4 | 1520 | 46575 | 1633049 | 926 | 4571 | 463 | 9567 | 370 | 39883 |
| T80.8 | 732 | 39077 | 1655988 | 709 | 5323 | 425 | 9890 | 269 | 35283 |
| T88.6 | 96 | 15137 | 227317 | 240 | 549 | 209 | 1290 | 185 | 6325 |
| T88.7 | 564 | 42794 | 1436333 | 767 | 3303 | 467 | 7263 | 306 | 41793 |
Predictive performance with the three strategies over 27 datasets
| Dataset | Accuracy % (rank): FDR | Accuracy % (rank): RDR-FS | Accuracy % (rank): RDR-ALL | AUC (rank): FDR | AUC (rank): RDR-FS | AUC (rank): RDR-ALL |
|---|---|---|---|---|---|---|
| D642 | 95.19 (1.5) | 93.76 (3.0) | 95.19 (1.5) | 0.974 (1.0) | 0.967 (3.0) | 0.969 (2.0) |
| E273 | 77.50 (3.0) | 80.00 (2.0) | 85.00 (1.0) | 0.923 (2.0) | 0.954 (1.0) | 0.902 (3.0) |
| F110 | 91.25 (3.0) | 92.92 (1.5) | 92.92 (1.5) | 0.956 (3.0) | 0.966 (1.0) | 0.958 (2.0) |
| F112 | 88.27 (3.0) | 88.98 (2.0) | 90.27 (1.0) | 0.950 (2.0) | 0.937 (3.0) | 0.960 (1.0) |
| F130 | 90.83 (2.0) | 90.83 (2.0) | 90.83 (2.0) | 0.958 (2.0) | 0.952 (3.0) | 0.961 (1.0) |
| F132 | 89.58 (1.5) | 86.67 (3.0) | 89.58 (1.5) | 0.938 (2.0) | 0.930 (3.0) | 0.977 (1.0) |
| F150 | 90.00 (2.0) | 90.00 (2.0) | 90.00 (2.0) | 0.876 (2.0) | 0.875 (3.0) | 0.876 (1.0) |
| F151 | 89.17 (2.0) | 90.00 (1.0) | 85.00 (3.0) | 0.990 (1.0) | 0.954 (3.0) | 0.990 (2.0) |
| F152 | 94.97 (2.0) | 94.58 (3.0) | 95.32 (1.0) | 0.979 (1.0) | 0.978 (2.0) | 0.977 (3.0) |
| F190 | 90.83 (2.0) | 90.00 (3.0) | 90.95 (1.0) | 0.958 (2.0) | 0.960 (1.0) | 0.957 (3.0) |
| F191 | 87.50 (2.0) | 83.75 (3.0) | 88.75 (1.0) | 0.961 (3.0) | 0.962 (2.0) | 0.977 (1.0) |
| F192 | 90.21 (1.0) | 89.90 (3.0) | 90.19 (2.0) | 0.942 (3.0) | 0.956 (1.0) | 0.947 (2.0) |
| F199 | 87.08 (2.0) | 82.92 (3.0) | 88.75 (1.0) | 0.959 (1.0) | 0.939 (3.0) | 0.955 (2.0) |
| G240 | 87.50 (3.0) | 90.00 (2.0) | 92.50 (1.0) | 1.000 (1.0) | 0.924 (3.0) | 0.973 (2.0) |
| G620 | 90.00 (1.5) | 85.00 (2.0) | 90.00 (1.5) | 0.900 (2.0) | 0.900 (2.0) | 0.900 (2.0) |
| I952 | 87.50 (2.0) | 85.00 (3.0) | 88.75 (1.0) | 0.932 (2.0) | 0.883 (3.0) | 0.956 (1.0) |
| L270 | 85.05 (1.0) | 83.60 (3.0) | 84.31 (2.0) | 0.917 (2.0) | 0.917 (3.0) | 0.920 (1.0) |
| L271 | 73.33 (3.0) | 74.58 (2.0) | 78.33 (1.0) | 0.798 (3.0) | 0.804 (1.0) | 0.799 (2.0) |
| N141 | 70.00 (2.0) | 67.50 (3.0) | 72.50 (1.0) | 0.800 (3.0) | 0.830 (1.0) | 0.825 (2.0) |
| O355 | 99.17 (2.5) | 99.17 (2.5) | 100.0 (1.0) | 1.000 (2.0) | 1.000 (2.0) | 1.000 (2.0) |
| T599 | 92.50 (2.5) | 97.50 (1.0) | 92.50 (2.5) | 1.000 (2.0) | 1.000 (2.0) | 1.000 (2.0) |
| T782 | 84.17 (3.0) | 85.17 (2.0) | 88.17 (1.0) | 0.925 (2.0) | 0.924 (3.0) | 0.931 (1.0) |
| T783 | 90.00 (2.0) | 88.16 (3.0) | 90.36 (1.0) | 0.951 (2.0) | 0.946 (3.0) | 0.955 (1.0) |
| T784 | 93.16 (2.0) | 92.17 (3.0) | 93.68 (1.0) | 0.981 (2.0) | 0.982 (1.0) | 0.980 (3.0) |
| T808 | 94.93 (1.0) | 93.97 (3.0) | 94.92 (2.0) | 0.982 (1.0) | 0.979 (3.0) | 0.982 (2.0) |
| T886 | 84.50 (2.0) | 84.50 (2.0) | 84.50 (2.0) | 0.914 (1.0) | 0.882 (3.0) | 0.910 (2.0) |
| T887 | 84.03 (1.0) | 83.14 (2.0) | 82.62 (3.0) | 0.896 (1.0) | 0.892 (2.0) | 0.890 (3.0) |
| Mean | 88.08 (2.1) | 87.55 (2.4) | 89.11 (1.5) | 0.939 (1.9) | 0.933 (2.3) | 0.942 (1.9) |
| P-value | 0.0023 | | | 0.2540 | | |
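The ranks in parentheses can be reproduced from the scores in each row. The following is a minimal sketch, assuming that a higher score earns a better (lower) rank and that tied strategies share the average of the positions they occupy, which is consistent with values such as 1.5 and 2.5 in the table. The function name `average_ranks` is illustrative and not taken from the paper.

```python
def average_ranks(scores):
    """Rank scores so the highest gets rank 1; ties share the mean of their positions."""
    # Indices ordered from best (highest) to worst score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend the group while the next score equals the current one.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Mean of the 1-based positions pos+1 .. end+1.
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Accuracy scores (FDR, RDR-FS, RDR-ALL) copied from three rows of the table.
d642 = average_ranks([95.19, 93.76, 95.19])
f130 = average_ranks([90.83, 90.83, 90.83])
e273 = average_ranks([77.50, 80.00, 85.00])
print(d642, f130, e273)
```

The mean rank in the last row of the table is then the average of these per-dataset ranks over the 27 datasets.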
Fig. 1 Ensemble inspection: average tree and ensemble performance with the three strategies. The deltas indicate the amount of diversity in the ensembles
Ensemble inspection: average tree accuracy and diversity with the three strategies
| Strategy | Average tree accuracy: mean score | Average tree accuracy: mean rank | Average tree accuracy: P-value | Diversity: mean score | Diversity: mean rank | Diversity: P-value |
|---|---|---|---|---|---|---|
| FDR | 73.22 | 2.0 | <0.0001 | 0.15 | 1.9 | <0.0001 |
| RDR-FS | 70.99 | 3.0 | | 0.16 | 1.6 | |
| RDR-ALL | 76.02 | 1.0 | | 0.13 | 2.6 | |
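One way to read the "deltas" mentioned in the Fig. 1 caption is as the gap between mean ensemble accuracy and mean tree accuracy. The sketch below makes that assumption explicit; the source does not define the delta, and the helper name `accuracy_gaps` is hypothetical. The numbers are the mean accuracies reported in the predictive performance table and the mean tree accuracies reported in the ensemble inspection table.

```python
# Mean ensemble accuracy and mean tree accuracy, in percent, as reported in the tables.
ensemble_acc = {"FDR": 88.08, "RDR-FS": 87.55, "RDR-ALL": 89.11}
tree_acc = {"FDR": 73.22, "RDR-FS": 70.99, "RDR-ALL": 76.02}


def accuracy_gaps(ensemble, trees):
    """Return ensemble accuracy minus average tree accuracy per strategy.

    Assumption: this gap is what the figure caption calls the delta.
    """
    return {name: round(ensemble[name] - trees[name], 2) for name in ensemble}


gaps = accuracy_gaps(ensemble_acc, tree_acc)
# Print the strategies from the largest gap to the smallest.
for name, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gap:.2f} percentage points")
```

Under this reading, the gaps order the strategies as RDR-FS, FDR, RDR-ALL, the same order as the mean diversity scores of 0.16, 0.15 and 0.13.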
P-values of pairwise differences between the three strategies with respect to average tree accuracy and diversity
| Strategies compared | Average tree accuracy | Diversity |
|---|---|---|
| FDR vs. RDR-FS | 0.00007 | 0.27630 |
| FDR vs. RDR-ALL | 0.00001 | 0.00650 |
| RDR-FS vs. RDR-ALL | <0.0001 | 0.00004 |
Fig. 2 Predictive performance as the size of the semantic space ensemble is varied
Average performance with different semantic space ensemble sizes
| Pool size | Accuracy: mean score | Accuracy: mean rank | Accuracy: P-value | AUC: mean score | AUC: mean rank | AUC: P-value |
|---|---|---|---|---|---|---|
| 1 | 84.80 | 8.7 | <0.0001 | 0.934 | 9.3 | <0.0001 |
| 2 | 86.12 | 6.5 | | 0.945 | 8.0 | |
| 3 | 86.31 | 5.6 | | 0.949 | 7.0 | |
| 4 | 86.43 | 5.2 | | 0.950 | 6.4 | |
| 5 | 86.45 | 4.9 | | 0.951 | 5.7 | |
| 6 | 86.45 | 5.4 | | 0.952 | 4.6 | |
| 7 | 86.43 | 4.7 | | 0.953 | 3.7 | |
| 8 | 86.60 | 4.4 | | 0.953 | 3.3 | |
| 9 | 86.27 | 5.1 | | 0.954 | 3.1 | |
| 10 | 86.98 | 4.4 | | 0.957 | 3.8 | |