Christiaan H Righolt, Geng Zhang, Salaheddin M Mahmud.
Abstract
Characterizing long-term prescription data is challenging due to the time-varying nature of drug use. Conventional approaches summarize time-varying data into categorical variables based on simple measures, such as cumulative dose, while ignoring patterns of use. The loss of information can lead to misclassification and biased estimates of the exposure-outcome association. We introduce a classification method to characterize longitudinal prescription data with an unsupervised machine learning algorithm. We used administrative databases covering virtually all 1.3 million residents of Manitoba and explicitly designed features to describe the average dose, proportion of days covered (PDC), dose change, and dose variability, and clustered the resulting feature space using K-means clustering. We applied this method to metformin use in diabetes patients. We identified 27,786 metformin users and showed that the feature distributions of their metformin use are stable across varying lengths of follow-up and that these distributions have clear interpretations. We found six distinct metformin user groups: patients with intermittent use, decreasing dose, increasing dose, high dose, and two medium dose groups (one with stable dose and one with highly variable use). Patients in the varying and decreasing dose groups had a higher chance of progression of diabetes than other patients. The method presented in this paper allows for characterization of drug use into distinct and clinically relevant groups in a way that cannot be obtained from merely classifying use by quantiles of overall use.
Keywords: Clustering; K-means; drug exposure; machine learning; pharmacoepidemiology
Year: 2020 PMID: 33280248 PMCID: PMC7719192 DOI: 10.1002/prp2.687
Source DB: PubMed Journal: Pharmacol Res Perspect ISSN: 2052-1707
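The abstract names four features (average dose, PDC, dose change, dose variability) but does not give their formulas. The sketch below shows one plausible way to compute them from a single patient's daily dose series. The specific definitions are assumptions made for illustration: dose change is taken as the difference between second-half and first-half mean dose, and dose variability as the standard deviation over covered days. The function name `drug_use_features` and the example patient are hypothetical.

```python
from statistics import mean, pstdev

DDD_GRAMS = 2.0  # defined daily dose for metformin, as stated in the figure captions


def drug_use_features(daily_grams):
    """Summarize one patient's daily supplied dose (grams/day, 0 = no supply).

    The feature definitions here are illustrative assumptions, not the
    paper's published formulas.
    """
    if len(daily_grams) < 2:
        raise ValueError("need at least two days of follow-up")
    doses = [g / DDD_GRAMS for g in daily_grams]  # express doses in DDD units
    half = len(doses) // 2
    covered = [d for d in doses if d > 0]  # days with any drug supply
    return {
        "average_dose": mean(doses),
        "pdc": len(covered) / len(doses),  # proportion of days covered
        "dose_change": mean(doses[half:]) - mean(doses[:half]),
        "dose_variability": pstdev(covered) if covered else 0.0,
    }


# Hypothetical patient: 1 g/day for 5 days, a 2-day gap, then 2 g/day for 3 days.
example = [1.0] * 5 + [0.0] * 2 + [2.0] * 3
features = drug_use_features(example)
print(features)
```

Each patient then becomes one point in a four-dimensional feature space, which is the input a clustering step would operate on.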
Figure 1. Cumulative distribution functions of drug use features according to the length of prescription data since the first dispensed metformin prescription. The defined daily dose (DDD) is 2 g for metformin.
Figure 2. Within‐cluster sum of squares according to the number of clusters.
Figure 3. Average dose of the patients within each cluster over time according to the number of groups (K = 4 to 7) for K‐means clustering. The defined daily dose (DDD) is 2 g for metformin.
Figure 4. Scatter plots of drug use for each individual patient for each pair of features after classification into six clusters. The defined daily dose (DDD) is 2 g for metformin.
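Figures 2 and 3 compare solutions with different numbers of clusters using the within‐cluster sum of squares (WCSS). The following is a minimal sketch of that comparison, using a plain Lloyd's-algorithm K‐means written with the standard library only. It is not the authors' implementation: the toy two-dimensional points, the seed, and the function names are hypothetical, and the sketch does not address whether or how the paper scaled its features.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, wcss)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k sampled data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

results = {k: kmeans(points, k) for k in (1, 2, 3)}
wcss_by_k = {k: r[2] for k, r in results.items()}
labels = results[2][1]
for k, value in wcss_by_k.items():
    print(k, round(value, 3))
```

Plotting WCSS against K and looking for the point where the curve flattens is the usual way to read a figure like Figure 2. Because Lloyd's algorithm only finds a local optimum, real analyses typically repeat the fit from several random starts.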
Number (percentage) of diabetes cases according to metformin use pattern groups with 5 years of follow‐up according to certain socio‐economic and clinical characteristics
| K‐means groups | Intermittent user (N = 4475) | Decreasing dose (N = 1953) | Medium dose (N = 9166) | Varying dose (N = 2422) | Increasing dose (N = 4581) | High dose (N = 5189) |
|---|---|---|---|---|---|---|
| Gender | ||||||
| Male | 2,223 (49.7%) | 1,016 (52.0%) | 4,626 (50.5%) | 1,303 (53.8%) | 2,530 (55.2%) | 2,971 (57.3%) |
| Female | 2,252 (50.3%) | 937 (48.0%) | 4,540 (49.5%) | 1,119 (46.2%) | 2,051 (44.8%) | 2,218 (42.7%) |
| Age at the diagnosis date of diabetes | ||||||
| <45 | 1,488 (33.3%) | 760 (38.9%) | 1,480 (16.1%) | 1,152 (47.6%) | 1,140 (24.9%) | 1,212 (23.4%) |
| 45 ‐ 54 | 1,154 (25.8%) | 546 (28.0%) | 2,457 (26.8%) | 670 (27.7%) | 1,599 (34.9%) | 1,717 (33.1%) |
| 55 ‐ 64 | 940 (21.0%) | 383 (19.6%) | 2,600 (28.4%) | 398 (16.4%) | 1,206 (26.3%) | 1,461 (28.2%) |
| 65+ | 893 (20.0%) | 264 (13.5%) | 2,629 (28.7%) | 202 (8.3%) | 636 (13.9%) | 799 (15.4%) |
| Income quintile | ||||||
| Q1 (lowest) | 1,286 (28.7%) | 589 (30.2%) | 1,930 (21.1%) | 811 (33.5%) | 1,101 (24.0%) | 1,169 (22.5%) |
| Q2 | 975 (21.8%) | 453 (23.2%) | 1,982 (21.6%) | 515 (21.3%) | 1,021 (22.3%) | 1,143 (22.0%) |
| Q3 | 825 (18.4%) | 334 (17.1%) | 1,886 (20.6%) | 380 (15.7%) | 884 (19.3%) | 955 (18.4%) |
| Q4 | 765 (17.1%) | 326 (16.7%) | 1,852 (20.2%) | 412 (17.0%) | 851 (18.6%) | 1,082 (20.9%) |
| Q5 (highest) | 605 (13.5%) | 235 (12.0%) | 1,435 (15.7%) | 290 (12.0%) | 694 (15.1%) | 797 (15.4%) |
| Unknown | 19 (0.4%) | 16 (0.8%) | 81 (0.9%) | 14 (0.6%) | 30 (0.7%) | 43 (0.8%) |
| Residence | ||||||
| Rural | <1,956 (<43.7%) | <883 (<45.2%) | 3,682 (40.2%) | <1,121 (<46.3%) | <1,902 (<41.5%) | 2,250 (43.4%) |
| Urban | 2,517 (56.2%) | 1,067 (54.6%) | 5,461 (59.6%) | 1,301 (53.7%) | 2,675 (58.4%) | 2,919 (56.3%) |
| Unknown | <6 (<0.1%) | <6 (<0.3%) | 23 (0.3%) | <6 (<0.2%) | <6 (<0.1%) | 20 (0.4%) |
| Diabetes progression | 430 (9.6%) | 347 (17.8%) | 953 (10.4%) | 431 (17.8%) | 553 (12.1%) | 762 (14.7%) |
| Insulin use | 217 (4.8%) | 210 (10.8%) | 393 (4.3%) | 289 (11.9%) | 391 (8.5%) | 472 (9.1%) |
| Diabetes complication | 213 (4.8%) | 138 (7.1%) | 560 (6.1%) | 142 (5.9%) | 162 (3.5%) | 290 (5.6%) |
| No. of physician visits during the 5‐year period before diabetes diagnosis | ||||||
| 1 ‐ 11 | 773 (17.3%) | 421 (21.6%) | 1,219 (13.3%) | 477 (19.7%) | 759 (16.6%) | 991 (19.1%) |
| 12 ‐ 24 | 980 (21.9%) | 423 (21.7%) | 1,874 (20.4%) | 611 (25.2%) | 1,053 (23.0%) | 1,178 (22.7%) |
| 25 ‐ 44 | 1,159 (25.9%) | 500 (25.6%) | 2,725 (29.7%) | 646 (26.7%) | 1,263 (27.6%) | 1,382 (26.6%) |
| 45+ | 1,563 (34.9%) | 609 (31.2%) | 3,348 (36.5%) | 688 (28.4%) | 1,506 (32.9%) | 1,638 (31.6%) |
Note: Diabetes progression is defined as insulin use or diabetic complications after initiating metformin.