Joe Alexander¹, Roger A Edwards², Luigi Manca³, Roberto Grugni³, Gianluca Bonfanti³, Birol Emir⁴, Ed Whalen⁴, Steve Watt¹, Marina Brodsky⁵, Bruce Parsons⁶.
Abstract
PURPOSE: Variability in patient treatment responses can be a barrier to effective care. Using available patient databases may improve the prediction of treatment responses. We evaluated machine learning methods to predict novel, individual patient responses to pregabalin for painful diabetic peripheral neuropathy, using an agent-based modeling and simulation platform that integrates real-world observational study (OS) data and randomized clinical trial (RCT) data.
PATIENTS AND METHODS: The best supervised machine learning methods were selected through a literature review and combined in a novel way to align patients with the subgroups that best enable prediction of pregabalin responses. Data were derived from a German OS of pregabalin (N=2642) and nine international RCTs (N=1320). OS and RCT patients were matched using coarsened exact matching, and a hierarchical cluster analysis was then performed. We tested which machine learning methods best aligned candidate patients with the specific clusters that predict their pain scores over time. Each cluster alignment triggers the assignment of a cluster-specific time-series regression with lagged variables as inputs; these regressions are used to simulate "virtual" patients and generate 1000 trajectory variations for a given novel patient.
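The matching step can be illustrated with a minimal sketch of coarsened exact matching (CEM): each covariate is coarsened into bins, patients are grouped by their combined bin pattern (stratum), and only strata containing both OS and RCT patients are kept. The covariates, cut points, and function names below are hypothetical illustrations, not the study's actual matching variables.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical coarsening cut points per covariate (illustrative, not from the study).
BIN_EDGES = {
    "age": [40, 50, 60, 70],      # years
    "baseline_pain": [4, 6, 8],   # 0-10 pain scale
}

def coarsen(patient):
    """Map a patient's covariates to a tuple of bin indices (the CEM stratum)."""
    return tuple(bisect_right(BIN_EDGES[name], patient[name]) for name in sorted(BIN_EDGES))

def cem_match(os_patients, rct_patients):
    """Group patients by stratum and keep only strata containing both OS and RCT patients."""
    strata = defaultdict(lambda: {"OS": [], "RCT": []})
    for p in os_patients:
        strata[coarsen(p)]["OS"].append(p)
    for p in rct_patients:
        strata[coarsen(p)]["RCT"].append(p)
    return {key: g for key, g in strata.items() if g["OS"] and g["RCT"]}

# Hypothetical example patients.
os_patients = [
    {"id": "os1", "age": 55, "baseline_pain": 7},
    {"id": "os2", "age": 82, "baseline_pain": 9},
]
rct_patients = [{"id": "rct1", "age": 58, "baseline_pain": 6.5}]

matched = cem_match(os_patients, rct_patients)
for stratum, groups in matched.items():
    print(stratum, [p["id"] for p in groups["OS"]], [p["id"] for p in groups["RCT"]])
```

In this toy example, os2 falls in a stratum with no RCT counterpart and is dropped. Full CEM implementations also compute stratum weights for downstream estimation; this sketch omits them.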
Keywords: agent-based modeling and simulation; coarsened exact matching; hierarchical cluster analysis; machine learning; time series regressions
Year: 2019 PMID: 31802967 PMCID: PMC6827520 DOI: 10.2147/POR.S214412
Source DB: PubMed Journal: Pragmat Obs Res ISSN: 1179-7266
Figure 1. Simulation steps. Reproduced from Alexander J, Edwards RA, Brodsky M, et al. Using time-series analysis approaches for improved prediction of pain outcomes in subgroups of patients with painful diabetic peripheral neuropathy. PLoS One. 2018;13(12):e0207120. Creative Commons license and disclaimer available from .⁸
Abbreviations: OS, observational study; PDF, probability density function; RCT, randomized controlled trial.
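The simulation step (a cluster-specific time-series regression with lagged inputs, run repeatedly to produce 1000 trajectory variations) can be sketched as follows. This assumes a simple lag-1 model, pain_t = a + b * pain_{t-1} + e_t, with normally distributed residuals e_t; the cluster coefficients, residual SD, 12-week horizon, and clipping to a 0-10 scale are hypothetical and are not the fitted models from the study.

```python
import random
import statistics

# Hypothetical cluster-specific coefficients: pain_t = a + b * pain_{t-1} + N(0, sd) noise.
CLUSTER_MODELS = {
    1: {"a": 0.8, "b": 0.80, "sd": 0.6},
    2: {"a": 1.5, "b": 0.70, "sd": 0.9},
}

def simulate_trajectories(cluster, baseline_pain, n_weeks=12, n_sims=1000, seed=0):
    """Simulate n_sims pain-score trajectories for one patient from a lag-1 regression."""
    model = CLUSTER_MODELS[cluster]
    rng = random.Random(seed)  # seeded for reproducibility
    trajectories = []
    for _ in range(n_sims):
        pain = baseline_pain
        path = [pain]
        for _ in range(n_weeks):
            # The previous week's score is the lagged input; the residual is drawn at random.
            pain = model["a"] + model["b"] * pain + rng.gauss(0.0, model["sd"])
            pain = min(10.0, max(0.0, pain))  # keep scores on the 0-10 pain scale
            path.append(pain)
        trajectories.append(path)
    return trajectories

trajs = simulate_trajectories(cluster=1, baseline_pain=7.0)
final_scores = [t[-1] for t in trajs]
print(round(statistics.mean(final_scores), 2), round(statistics.stdev(final_scores), 2))
```

The spread of the final scores across the simulated trajectories gives an empirical distribution of likely outcomes for that patient, rather than a single point prediction.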
Results for Training Dataset and Testing Dataset for kNN Method Only, SFCM Method Only, and the Ensemble Method
| Dataset | Measure | Method | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Training | Total patients (n) | | 431 | 189 | 437 | 266 | 127 | 316 | 1766 |
| Training | No. of correct classifications | kNN only | 415 | 71 | 232 | 188 | 3 | 141 | 1050 |
| Training | No. of correct classifications | SFCM only | 246 | 127 | 101 | 92 | 45 | 130 | 741 |
| Training | No. of correct classifications | Ensemble method* | 431 | 188 | 432 | 262 | 123 | 301 | 1737 |
| Training | Accuracy ratio | kNN only | 96.3% | 37.6% | 53.1% | 70.7% | 2.4% | 44.6% | 59.5% |
| Training | Accuracy ratio | SFCM only | 57.1% | 67.2% | 23.1% | 34.6% | 35.4% | 41.1% | 42.0% |
| Training | Accuracy ratio | Ensemble method* | 100.0% | 99.5% | 98.9% | 98.5% | 96.9% | 95.3% | 98.4% |
| Testing | Total patients (n) | | 148 | 94 | 189 | 207 | 68 | 170 | 876 |
| Testing | No. of correct classifications | kNN only | 136 | 37 | 111 | 108 | 0 | 55 | 447 |
| Testing | No. of correct classifications | SFCM only | 70 | 69 | 49 | 39 | 17 | 49 | 293 |
| Testing | No. of correct classifications | Ensemble method* | 148 | 92 | 183 | 196 | 64 | 157 | 840 |
| Testing | Accuracy ratio | kNN only | 91.9% | 39.4% | 58.7% | 52.2% | 0.0% | 32.4% | 51.0% |
| Testing | Accuracy ratio | SFCM only | 47.3% | 73.4% | 25.9% | 18.8% | 25.0% | 28.8% | 33.4% |
| Testing | Accuracy ratio | Ensemble method* | 100.0% | 97.9% | 96.8% | 94.7% | 94.1% | 92.4% | 95.9% |
Note: *The Ensemble Method consists of three steps, shown in Figure 3.
Abbreviations: kNN, k-nearest neighbors; SFCM, supervised fuzzy c-means.
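Each accuracy ratio in the table is the number of correct classifications divided by the number of patients in that cluster (or overall). The short check below recomputes the testing-dataset rows from the counts reported in the table; the dictionary layout and function name are illustrative.

```python
# Testing-dataset counts taken from the table (clusters 1-6).
cluster_sizes = [148, 94, 189, 207, 68, 170]
correct = {
    "kNN only": [136, 37, 111, 108, 0, 55],
    "SFCM only": [70, 69, 49, 39, 17, 49],
    "Ensemble method": [148, 92, 183, 196, 64, 157],
}

def accuracy_ratios(correct_counts, sizes):
    """Return per-cluster and overall accuracy (%) rounded to one decimal place."""
    per_cluster = [round(100 * c / n, 1) for c, n in zip(correct_counts, sizes)]
    overall = round(100 * sum(correct_counts) / sum(sizes), 1)
    return per_cluster, overall

results = {method: accuracy_ratios(c, cluster_sizes) for method, c in correct.items()}
for method, (per_cluster, overall) in results.items():
    print(method, per_cluster, overall)
```

Note that the overall ratio is pooled over all patients (447/876 for kNN), not the mean of the six per-cluster ratios, so small clusters such as Cluster 5 weigh less in the total.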
Figure 2. Accuracy results for the kNN method only, SFCM method only, and the ensemble method in (A) training dataset by cluster, (B) testing dataset by cluster, and (C) overall testing and training datasets.
Abbreviations: kNN, k-nearest neighbors; SFCM, supervised fuzzy c-means.
Figure 3. Ensemble method flowchart.
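The three steps of the ensemble method appear only in Figure 3 and are not described in this text, so the sketch below covers just one component compared in the table: a plain k-nearest-neighbors (kNN) assignment of a new patient to a cluster by majority vote among the k closest training patients. The two features, the Euclidean distance, k=3, and the function name are hypothetical choices; SFCM and the ensemble logic are not shown.

```python
import math
from collections import Counter

def knn_assign(train_features, train_clusters, new_patient, k=5):
    """Assign new_patient to the majority cluster among its k nearest training patients."""
    # Sort training patients by Euclidean distance to the new patient.
    neighbors = sorted(
        (math.dist(x, new_patient), c) for x, c in zip(train_features, train_clusters)
    )
    votes = Counter(c for _, c in neighbors[:k])
    # On a tie, most_common keeps first-encountered order, so the closer neighbor's cluster wins.
    return votes.most_common(1)[0][0]

# Hypothetical standardized features (two per patient) and their cluster labels.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_y = [1, 1, 1, 2, 2, 2]

print(knn_assign(train_X, train_y, (0.3, 0.3), k=3))
print(knn_assign(train_X, train_y, (4.9, 5.0), k=3))
```

A single classifier like this can perform very unevenly across clusters, as the kNN-only rows show (for example, 0.0% in testing Cluster 5), which is the kind of gap an ensemble is meant to close.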