Seung Eun Yi, Vinyas Harish, Jahir Gutierrez, Mathieu Ravaut, Kathy Kornas, Tristan Watson, Tomi Poutanen, Marzyeh Ghassemi, Maksims Volkovs, Laura C Rosella.
Abstract
OBJECTIVE: To predict older adults' risk of avoidable hospitalisation related to ambulatory care sensitive conditions (ACSC) using machine learning applied to administrative health data of Ontario, Canada. DESIGN, SETTING AND PARTICIPANTS: A retrospective cohort study was conducted on a large cohort of all residents covered under a single-payer system in Ontario, Canada over a 10-year period (2008-2017). The study included 1.85 million Ontario residents who were between 65 and 74 years old at any time during the study period. DATA SOURCES: Administrative health data from Ontario, Canada obtained from the ICES (formerly known as the Institute for Clinical Evaluative Sciences) Data Repository. MAIN OUTCOME MEASURES: Risk of hospitalisations due to ACSCs 1 year after the observation period.
Keywords: health policy; public health; quality in health care
Year: 2022 PMID: 35365510 PMCID: PMC8977821 DOI: 10.1136/bmjopen-2021-051403
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1. Overview of the data preparation. Panel A shows the patient selection process. For a given patient, only the information recorded while they were between 65 and 74 years old was kept; the rest was discarded. This was done for the whole study period (2008 to 2017, inclusive). Panel B shows the construction of instances for the eligible patients: each instance consisted of a 2-year observation window, a 1-year buffer during which we did not have any information about the patient, and a 3-month target window. Each instance was therefore a summary of the patient's health information. Note that one patient could generate multiple instances. The first instance had an observation window from January 2008 to December 2009, a buffer from January 2010 to December 2010 and a target window from January 2011 to March 2011, all inclusive. The last instance had an observation window from October 2014 to September 2016, a buffer from October 2016 to September 2017 and a target window from October 2017 to December 2017, all inclusive. Instances for the same patient did not share any part of the target window with each other, meaning each patient could have a maximum of 28 instances with non-overlapping target windows. Patients often had fewer, because of the exclusion criteria and because individuals aged out of the 65–74 age group. For model development, as shown in panel C, we split our data based on patients and study period. We trained and validated our prediction model with data between 2008 and 2014, on two different sets of patients. Then, we tested the model on two types of data: data between 2014 and 2018 for the patients who were already used for training and validation, and data of the patients who entered the cohort in 2014 or later. The latter consisted of 'young' patients between 65 and 66 years old at the beginning of the observation period. Finally, panel D shows the types of information that were extracted for each patient.
ACSC, ambulatory care sensitive condition; LHIN, Local Health-Integrated Network.
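The window arithmetic in the Figure 1 caption can be checked with a short sketch. This is an illustration only, not the authors' code: the names `Instance` and `build_instances` are hypothetical, and the 3-month stride between consecutive instances is an assumption inferred from the caption's statement that target windows do not overlap and that at most 28 instances are possible.

```python
from typing import List, NamedTuple, Tuple

YearMonth = Tuple[int, int]  # (year, month), month in 1..12


class Instance(NamedTuple):
    # Each window is (first month, last month), both inclusive.
    observation: Tuple[YearMonth, YearMonth]
    buffer: Tuple[YearMonth, YearMonth]
    target: Tuple[YearMonth, YearMonth]


def to_index(ym: YearMonth) -> int:
    """Map (year, month) to a running month count."""
    year, month = ym
    return year * 12 + (month - 1)


def from_index(index: int) -> YearMonth:
    """Inverse of to_index."""
    return (index // 12, index % 12 + 1)


def build_instances(
    study_start: YearMonth = (2008, 1),
    study_end: YearMonth = (2017, 12),
    observation_months: int = 24,
    buffer_months: int = 12,
    target_months: int = 3,
    stride_months: int = 3,  # assumption: slide by one target window
) -> List[Instance]:
    """Enumerate every instance that fits inside the study period."""
    span = observation_months + buffer_months + target_months
    start = to_index(study_start)
    last_month = to_index(study_end)
    instances = []
    while start + span - 1 <= last_month:
        obs_end = start + observation_months - 1
        buf_end = obs_end + buffer_months
        tgt_end = buf_end + target_months
        instances.append(
            Instance(
                observation=(from_index(start), from_index(obs_end)),
                buffer=(from_index(obs_end + 1), from_index(buf_end)),
                target=(from_index(buf_end + 1), from_index(tgt_end)),
            )
        )
        start += stride_months
    return instances


instances = build_instances()
print(len(instances))
print(instances[0])
print(instances[-1])
```

With the default arguments, the first and last instances match the dates given in the caption, and the count equals the stated maximum of 28. A real patient would contribute fewer instances once the age restriction and exclusion criteria are applied.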
Characteristics of the patients in the study
| Train (total) | Train (positives) | Validation (total) | Validation (positives) | Test (total) | Test (positives) | |
| Cohort | ||||||
| 1 237 507 | 119 728 | 309 380 | 30 224 | 1 375 277 | 80 788 | |
| 16 923 230 | 180 152 | 4 225 605 | 45 571 | 9 862 034 | 104 676 | |
| Demographics | ||||||
| 8 845 443 (52.3%) | 93 333 (51.8%) | 2 213 395 (52.4%) | 23 542 (51.7%) | 5 153 174 (52.3%) | 54 368 (51.9%) | |
| 8 077 787 (47.7%) | 86 819 (48.2%) | 2 012 210 (47.6%) | 22 029 (48.3%) | 4 708 860 (47.7%) | 50 308 (48.1%) | |
| 69.0±2.86 | 69.1±2.88 | 69.0±2.86 | 69.1±2.87 | 69.0±2.81 | 69.1±2.84 | |
| 1 619 193 (9.57%) | 16 686 (9.26%) | 405 211 (9.59%) | 4007 (8.79%) | 1 071 959 (10.9%) | 10 878 (10.4%) | |
| 15 304 037 (90.4%) | 163 466 (90.7%) | 3 820 394 (90.4%) | 41 564 (91.2%) | 8 790 075 (89.1%) | 93 798 (89.6%) | |
| Geography | ||||||
| 2 479 688 (14.7%) | 29 056 (16.1%) | 616 941 (14.6%) | 7069 (15.5%) | 1 340 756 (13.6%) | 15 499 (14.8%) | |
| Income quintile | ||||||
| 2 982 780 (17.6%) | 33 814 (18.8%) | 736 716 (17.4%) | 8612 (18.9%) | 1 842 945 (18.7%) | 20 940 (20.0%) | |
| 3 335 222 (19.7%) | 36 536 (20.3%) | 835 276 (19.8%) | 8847 (19.4%) | 2 007 802 (20.4%) | 21 849 (20.9%) | |
| 3 351 614 (19.8%) | 35 237 (19.6%) | 838 012 (19.8%) | 9043 (19.8%) | 1 997 302 (20.3%) | 21 266 (20.3%) | |
| 3 507 494 (20.7%) | 36 756 (20.4%) | 882 844 (20.9%) | 9421 (20.7%) | 1 913 979 (19.4%) | 19 838 (19.0%) | |
| 3 702 063 (21.9%) | 37 314 (20.7%) | 921 278 (21.8%) | 9521 (20.9%) | 2 084 008 (21.1%) | 20 613 (19.7%) | |
| Education quintile | ||||||
| 3 075 278 (18.2%) | 34 957 (19.4%) | 762 687 (18.0%) | 8779 (19.3%) | 1 706 122 (17.3%) | 19 391 (18.5%) | |
| 3 385 511 (20.0%) | 37 128 (20.6%) | 841 692 (19.9%) | 9481 (20.8%) | 1 937 907 (19.7%) | 21 469 (20.5%) | |
| 3 493 804 (20.6%) | 36 945 (20.5%) | 870 402 (20.6%) | 9190 (20.2%) | 2 030 870 (20.6%) | 21 454 (20.5%) | |
| 3 500 217 (20.7%) | 36 902 (20.5%) | 877 965 (20.8%) | 9299 (20.4%) | 2 084 940 (21.1%) | 21 550 (20.6%) | |
| 3 357 516 (19.8%) | 33 022 (18.3%) | 844 839 (20.0%) | 8485 (18.6%) | 2 034 100 (20.6%) | 20 045 (19.1%) | |
| Comorbidities | ||||||
| 709 309 (4.19%) | 9667 (5.37%) | 177 912 (4.21%) | 2507 (5.50%) | 416 924 (4.23%) | 5696 (5.44%) | |
| 1 530 096 (9.04%) | 18 798 (10.4%) | 382 872 (9.06%) | 4891 (10.7%) | 914 621 (9.27%) | 10 990 (10.5%) | |
| 354 214 (2.09%) | 4455 (2.47%) | 89 539 (2.12%) | 1113 (2.44%) | 225 636 (2.29%) | 2719 (2.60%) | |
| 2 034 989 (12.0%) | 26 649 (14.8%) | 508 111 (12.0%) | 7093 (15.6%) | 1 246 159 (12.6%) | 16 445 (15.7%) | |
| 2 198 985 (13.0%) | 25 044 (13.9%) | 546 554 (12.9%) | 6210 (13.6%) | 1 317 439 (13.4%) | 14 662 (14.0%) | |
| 838 537 (4.95%) | 14 327 (7.95%) | 210 965 (4.99%) | 3672 (8.06%) | 483 842 (4.91%) | 7974 (7.62%) | |
| 133 502 (0.79%) | 1575 (0.87%) | 32 232 (0.76%) | 389 (0.85%) | 88 715 (0.90%) | 1018 (0.97%) | |
| 2 910 066 (17.2%) | 40 751 (22.6%) | 726 976 (17.2%) | 10 396 (22.8%) | 1 726 397 (17.5%) | 23 837 (22.8%) | |
| 3 510 361 (20.7%) | 43 579 (24.2%) | 879 995 (20.8%) | 11 068 (24.3%) | 1 899 361 (19.3%) | 23 359 (22.3%) | |
| 344 161 (2.03%) | 4454 (2.47%) | 86 853 (2.06%) | 1158 (2.54%) | 206 300 (2.09%) | 2722 (2.60%) | |
| 4 394 754 (26.0%) | 52 155 (29.0%) | 1 095 489 (25.9%) | 13 022 (28.6%) | 2 660 592 (27.0%) | 30 929 (29.5%) | |
| 10 494 913 (62.0%) | 116 393 (64.6%) | 2 622 675 (62.1%) | 29 352 (64.4%) | 6 033 618 (61.2%) | 67 044 (64.0%) | |
| 3 410 850 (20.2%) | 40 032 (22.2%) | 851 249 (20.1%) | 10 117 (22.2%) | 2 244 412 (22.8%) | 26 407 (25.2%) | |
| 7 786 874 (46.0%) | 85 908 (47.7%) | 1 943 700 (46.0%) | 21 771 (47.8%) | 4 801 635 (48.7%) | 52 944 (50.6%) | |
| 10 244 732 (60.5%) | 111 020 (61.6%) | 2 553 893 (60.4%) | 28 336 (62.2%) | 6 131 185 (62.2%) | 66 621 (63.6%) | |
| 1 791 824 (10.6%) | 18 620 (10.3%) | 450 339 (10.7%) | 4656 (10.2%) | 1 032 043 (10.5%) | 10 525 (10.1%) | |
| 727 558 (4.30%) | 10 545 (5.85%) | 180 987 (4.28%) | 2698 (5.92%) | 489 046 (4.96%) | 7004 (6.69%) | |
| 965 743 (5.71%) | 12 601 (6.99%) | 239 406 (5.67%) | 3122 (6.85%) | 547 368 (5.55%) | 7026 (6.71%) | |
| 3 (2, 7) | 3 (2, 5) | 3 (2, 4) | 3 (2, 5) | 3 (2, 5) | 3 (2, 5) | |
*History of mental health-related visit, excluding dementia, deliberate self-harm codes and mood disorder codes.
COPD, chronic obstructive pulmonary disease.
Figure 2. (A) Calibration curve. (B) Model evaluation on major subgroups of the population. The incidence rates are shown in blue and the average model prediction in pink. The subgroup sizes are displayed on the x-axis along with the subgroup types. For education and income quintiles, a higher index refers to a higher education level and income, respectively, in the area where a given patient lives. The number of events refers to the number of interactions of any kind that a given patient had with the healthcare system: clinician visits, hospitalisation, ambulatory usage, lab tests and drug prescriptions. ACSC, ambulatory care sensitive condition.
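Both panels of Figure 2 compare observed outcomes with predicted risks. The sketch below shows one common way to compute such comparisons. It is not the authors' implementation: the function names are hypothetical, and the equal-width binning with 10 bins is an assumption, since the caption does not state how the calibration curve was binned.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def calibration_curve(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    n_bins: int = 10,  # assumption: equal-width bins on [0, 1]
) -> List[Tuple[float, float]]:
    """Return (mean predicted risk, observed event rate) per non-empty bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            curve.append((mean_pred, observed))
    return curve


def subgroup_summary(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    groups: Sequence[Hashable],
) -> Dict[Hashable, Dict[str, float]]:
    """Per subgroup: size, incidence rate, and average model prediction."""
    totals: Dict[Hashable, List[float]] = {}
    for y, p, g in zip(y_true, y_prob, groups):
        n, events, pred_sum = totals.get(g, [0, 0, 0.0])
        totals[g] = [n + 1, events + y, pred_sum + p]
    return {
        g: {"n": n, "incidence": events / n, "mean_prediction": pred_sum / n}
        for g, (n, events, pred_sum) in totals.items()
    }


# Toy data, for illustration only.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.1, 0.9, 0.9]
groups = ["a", "a", "b", "b"]
print(calibration_curve(y_true, y_prob, n_bins=2))
print(subgroup_summary(y_true, y_prob, groups))
```

A well-calibrated model gives points close to the diagonal in the first output, and subgroup incidence rates close to the subgroup mean predictions in the second.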
Figure 3. Global feature importance. Shapley values were generated using 50 000 random samples from the test set. Multiple runs using different samples showed the same ordering of feature contribution. ACSC, ambulatory care sensitive condition; COPD, chronic obstructive pulmonary disease.
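As a rough illustration of how per-instance Shapley values can be turned into a global ranking, the sketch below samples rows from a precomputed matrix of Shapley values and ranks features by mean absolute value. The caption does not state which aggregation was used, so the mean absolute value is an assumption (it is a common convention), and `global_importance` and its parameters are hypothetical names. Computing the Shapley values themselves is out of scope here.

```python
import random
from typing import List, Sequence, Tuple


def global_importance(
    shapley_values: Sequence[Sequence[float]],  # one row per instance
    feature_names: Sequence[str],
    sample_size: int = 50_000,
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Rank features by mean |Shapley value| over a random sample of rows."""
    if not shapley_values:
        raise ValueError("shapley_values must contain at least one row")
    rng = random.Random(seed)
    k = min(sample_size, len(shapley_values))
    sampled = rng.sample(list(shapley_values), k)  # without replacement
    totals = [0.0] * len(feature_names)
    for row in sampled:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = [total / k for total in totals]
    return sorted(zip(feature_names, means), key=lambda item: item[1], reverse=True)


# Toy values, for illustration only.
values = [[1.0, -2.0], [-1.0, 2.0]]
ranking = global_importance(values, ["feature_a", "feature_b"])
print(ranking)

# Stability check in the spirit of the caption: compare orderings across seeds.
orders = {
    tuple(name for name, _ in global_importance(values, ["feature_a", "feature_b"], seed=s))
    for s in range(5)
}
print(len(orders) == 1)
```

Repeating the call with different seeds and comparing the resulting orderings mirrors the check described in the caption, where multiple runs on different samples gave the same feature ordering.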
Figure 4. Distribution of ambulatory care sensitive condition (ACSC) risks predicted by the model compared with the actual variation in ACSC incidence rates in the province in 2017. (A) The incidence rates of ACSC by sub-Local Health-Integrated Network (LHIN), normalised by the population size of the corresponding sub-LHIN. (B) Model predictions of ACSC for all patients, mapped by sub-LHIN. The model (in B) captures the normalised distribution of these patients (in A).