Qiong Wang1,2,3, Jenna M Reps3,4, Kristin Feeney Kostka3,5, Patrick B Ryan3,4,6, Yuhui Zou7, Erica A Voss3,4,8, Peter R Rijnbeek3,8, RuiJun Chen3,6,9, Gowtham A Rao3,4, Henry Morgan Stewart3,5, Andrew E Williams3,10, Ross D Williams3,8, Mui Van Zandt3,5, Thomas Falconer3,6, Margarita Fernandez-Chas3,5, Rohit Vashisht3,11, Stephen R Pfohl3,11, Nigam H Shah3,11, Suranga N Kasthurirathne3,12,13, Seng Chan You3,14, Qing Jiang1, Christian Reich3,5, Yi Zhou15.
Year: 2020 PMID: 31910437 PMCID: PMC6946584 DOI: 10.1371/journal.pone.0226718
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Patient-level characteristics across data sources.
| Data Source | Coverage | Data Type | No. of Patients | Female (%) | Male (%) | Start (year) | End (year) |
|---|---|---|---|---|---|---|---|
| | USA | Electronic Health Records | 93,423,000 | 54.0 | 46.0 | 2006 | 2018 |
| | USA | Claims | 142,660,000 | 51.2 | 48.8 | 2000 | 2018 |
| | USA | Claims | 9,964,100 | 55.3 | 44.7 | 2000 | 2018 |
| | USA | Claims | 26,299,000 | 56.8 | 43.2 | 2006 | 2017 |
| | Japan | Claims | 5,550,200 | 53.8 | 46.2 | 2005 | 2018 |
| | Germany | Outpatient Primary Care | 36,078,000 | 56.5 | 43.5 | 1992 | 2018 |
| | USA | Hospital Claims | 88,815,000 | 56.1 | 43.9 | 2007 | 2018 |
| | USA | Claims | 153,008,000 | 50.9 | 49.1 | 2010 | 2018 |
| | USA | Pre-adjudicated Pharmacy and Medical Claims | 654,052,000 | 53.0 | 47.0 | 2010 | 2019 |
| | USA | Electronic Health Records | 3,113,080 | 53.9 | 46.1 | 2000 | 2018 |
| | USA | Electronic Health Record | 19,420,000 | 46.5 | 53.5 | 2005 | 2019 |
Table 1 shows the characteristics of the 11 datasets we studied.
a Dataset used to develop the prediction model
Model performance across OHDSI data network.
| Database | T | O | Incidence (%) | AUROC | AUPRC |
|---|---|---|---|---|---|
| | 621,178 (155,259) | 5,624 (1,406) | 0.91 | Test: 0.75 | Test: 0.04 |
| | 274,384 | 4,836 | 1.76 | 0.76 | 0.07 |
| | 441,939 | 6,772 | 1.53 | 0.72 | 0.05 |
| | 151,876 | 1,629 | 1.07 | 0.75 | 0.04 |
| | 20,181 | 31 | 0.15 | 0.7 | 0.05 |
| | 41,311 | 45 | 0.11 | 0.5 | 0.02 |
| | 191,036 | 2,011 | 1.05 | 0.68 | 0.02 |
| | 556,151 | 9,951 | 1.79 | 0.78 | 0.11 |
| | 4,331,167 | 54,973 | 1.27 | 0.60 | 0.03 |
| | 7,930 | 142 | 1.79 | 0.76 | 0.12 |
| | 55,684 | 387 | 0.69 | 0.69 | 0.03 |
Table 2 shows the model performance across the 11 datasets we studied. T = Target cohort, O = Outcome cohort, AUROC = Area under the receiver operating characteristic curve, and AUPRC = Area under the precision-recall curve.
a Development database
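The Incidence column in Table 2 appears to be the outcome cohort size expressed as a percentage of the target cohort size. The following is a minimal sketch that recomputes it from the T and O counts in the table. The function name `incidence_percent` is hypothetical, rows are identified only by their position because database names are not available in this copy, and for the first row the counts outside the parentheses are used.

```python
# (T, O, reported incidence in %) copied from Table 2, in row order.
# Database names are missing in this copy, so rows are keyed by position only.
TABLE_2 = [
    (621_178, 5_624, 0.91),
    (274_384, 4_836, 1.76),
    (441_939, 6_772, 1.53),
    (151_876, 1_629, 1.07),
    (20_181, 31, 0.15),
    (41_311, 45, 0.11),
    (191_036, 2_011, 1.05),
    (556_151, 9_951, 1.79),
    (4_331_167, 54_973, 1.27),
    (7_930, 142, 1.79),
    (55_684, 387, 0.69),
]


def incidence_percent(target: int, outcome: int) -> float:
    """Outcome count as a percentage of the target cohort (hypothetical helper)."""
    if target <= 0:
        raise ValueError("target cohort must be non-empty")
    return 100.0 * outcome / target


# Recompute the incidence for every row, rounded to two decimals like the table.
recomputed = [round(incidence_percent(t, o), 2) for t, o, _ in TABLE_2]
reported = [inc for _, _, inc in TABLE_2]

if __name__ == "__main__":
    for row, (calc, rep) in enumerate(zip(recomputed, reported), start=1):
        print(f"row {row}: recomputed {calc:.2f}%, reported {rep:.2f}%")
```

Incidence is a useful yardstick when reading the AUPRC column: a model with no discriminative ability has an expected AUPRC roughly equal to the outcome prevalence, so an AUPRC of 0.04 at an incidence near 1% is several times that baseline.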
Fig 1. Model performance across OHDSI network sites.
Fig 1 shows the receiver operating characteristic plot and calibration plot for the internal and external validation.
Patients with risk greater than the 1% population average risk and patients with risk less than the 1% population average risk.
| Predicted risk cut-off | Patients flagged | Risk in flagged group (%) | Risk relative to population |
|---|---|---|---|
| 0.060 | 1% | 10.0 | 10 |
| 0.020 | 6.8% | 4.4 | 4.4 |
| 0.013 | 14% | 3.1 | 3.1 |
| 0.000 | 100% | 1.0 | 1 |
| 0.003 | 10% | 0.025 | 1/40 |
| 0.005 | 38% | 0.12 | 1/8 |
| 0.013 | 86% | 0.47 | 1/2 |
| 0.000 | 100% | 1.0 | 1 |
Table 3 illustrates various cut-off values. Flagging patients whose predicted risk is higher than a chosen cut-off as high risk could be used to identify a subset of patients with a higher-than-average risk. Flagging patients whose predicted risk is lower than a chosen cut-off as low risk could be used to identify a subset of patients with a lower-than-average risk.
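The quantities in Table 3 can be derived from per-patient predicted risks and observed outcomes. The sketch below is illustrative only and is not the authors' code: the names `FlagSummary` and `summarize_cutoff` are hypothetical, the toy data are invented, and treating a predicted risk exactly equal to the cut-off as high risk is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FlagSummary:
    percent_flagged: float          # share of all patients that are flagged
    risk_in_flagged_percent: float  # observed outcome rate among flagged patients
    relative_to_population: float   # flagged-group risk / population average risk


def summarize_cutoff(
    predicted: Sequence[float],
    outcomes: Sequence[int],
    cutoff: float,
    high_risk: bool = True,
) -> FlagSummary:
    """Summarize one cut-off (hypothetical helper, not from the paper).

    high_risk=True flags patients with predicted risk >= cutoff (assumption:
    ties count as high risk); high_risk=False flags those below the cutoff.
    """
    if len(predicted) != len(outcomes) or not predicted:
        raise ValueError("predicted and outcomes must be non-empty and equal length")
    if high_risk:
        flagged = [y for p, y in zip(predicted, outcomes) if p >= cutoff]
    else:
        flagged = [y for p, y in zip(predicted, outcomes) if p < cutoff]
    if not flagged:
        raise ValueError("no patients flagged at this cut-off")
    population_risk = sum(outcomes) / len(outcomes)
    if population_risk == 0:
        raise ValueError("population has no outcomes; relative risk is undefined")
    flagged_risk = sum(flagged) / len(flagged)
    return FlagSummary(
        percent_flagged=100 * len(flagged) / len(predicted),
        risk_in_flagged_percent=100 * flagged_risk,
        relative_to_population=flagged_risk / population_risk,
    )


# Invented toy cohort of ten patients, for illustration only.
toy_predicted = [0.08, 0.05, 0.03, 0.02, 0.015, 0.01, 0.008, 0.005, 0.003, 0.001]
toy_outcomes = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

high = summarize_cutoff(toy_predicted, toy_outcomes, cutoff=0.03, high_risk=True)
low = summarize_cutoff(toy_predicted, toy_outcomes, cutoff=0.03, high_risk=False)

if __name__ == "__main__":
    print(high)
    print(low)
```

Under these assumptions, raising the cut-off shrinks the high-risk group while concentrating outcomes in it, which is the trade-off the first four rows of Table 3 display; the last four rows show the mirror-image trade-off for low-risk flags.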