Melissa D Curtis¹, Sandra D Griffith¹, Melisa Tucker¹, Michael D Taylor², William B Capra², Gillis Carrigan², Ben Holzman¹, Aracelis Z Torres¹, Paul You¹, Brandon Arnieri², Amy P Abernethy¹.
Abstract
OBJECTIVE: To create a high-quality electronic health record (EHR)-derived mortality dataset for retrospective and prospective real-world evidence generation. DATA SOURCES/STUDY …
Keywords: Mortality data; data quality; electronic health records; external validation; oncology
Year: 2018 PMID: 29756355 PMCID: PMC6232402 DOI: 10.1111/1475-6773.12872
Source DB: PubMed Journal: Health Serv Res ISSN: 0017-9124 Impact factor: 3.402
Validation Metrics for Mortality Data
| Flatiron Health composite data | NDI: Deceased | NDI: Alive | |
|---|---|---|---|
| Deceased | True positives (A) | False positives (B) | PPV = A/(A + B) |
| Alive | False negatives (C) | True negatives (D) | NPV = D/(C + D) |
| | Sensitivity = A/(A + C) | Specificity = D/(B + D) | |
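The formulas in the table reduce to simple ratios of the four cell counts. A minimal Python sketch, assuming the counts have already been tallied (the `ConfusionCounts` type and `validation_metrics` function are hypothetical names, not taken from the study):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts: composite data (rows) versus NDI gold standard (columns)."""
    a: int  # true positives: deceased in both composite data and NDI
    b: int  # false positives: deceased in composite data only
    c: int  # false negatives: deceased in NDI only
    d: int  # true negatives: deceased in neither source


def validation_metrics(n: ConfusionCounts) -> dict[str, float]:
    """Return the four validation metrics as proportions between 0 and 1."""
    return {
        "sensitivity": n.a / (n.a + n.c),  # A/(A + C)
        "specificity": n.d / (n.b + n.d),  # D/(B + D)
        "ppv": n.a / (n.a + n.b),          # A/(A + B)
        "npv": n.d / (n.c + n.d),          # D/(C + D)
    }
```

For example, hypothetical counts A = 90, B = 2, C = 10, D = 98 give a sensitivity of 0.90 and a specificity of 0.98.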
For the sensitivity and specificity analyses, each individual was placed into one of the four categories (A, B, C, or D) according to whether the patient's mortality status from the composite death date agreed with that from the NDI. True positives (A) were all individuals with a death date in both the composite dataset and the NDI. False positives (B) were all individuals with a death date in the composite dataset but not in the NDI. False negatives (C) were all individuals without a death date in the composite dataset but with a death date in the NDI. True negatives (D) were all individuals with a death date in neither the composite dataset nor the NDI. Sensitivity was the percentage of NDI deaths that were correctly recorded in the composite dataset, computed as the proportion of true positives among all positives in the NDI gold standard [A/(A + C)]. Specificity was the percentage of individuals without a death date in the NDI who were also not recorded as deceased in the composite dataset, computed as the proportion of true negatives among all negatives in the NDI gold standard [D/(B + D)]. PPV was the percentage of individuals with a death date in the composite dataset who were also recorded as deceased in the NDI gold standard [A/(A + B)]. NPV was the percentage of individuals without a death date in the composite dataset who were also not recorded as deceased in the NDI gold standard [D/(C + D)]. Date agreement was the percentage of composite death dates that exactly matched the corresponding NDI death date; patients with a death in the composite dataset but not in the NDI were counted as disagreements. Date agreement was also calculated allowing for ±15‐day and ±30‐day windows.
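The per-patient categorization and the date-agreement rule can be sketched as follows. This is an illustrative reading of the text, not the study's code: each patient is assumed to be a pair of optional death dates (composite, NDI), and the date-agreement denominator is taken to be every patient with a composite death date (A + B), so that B patients count as disagreements. The function names are hypothetical.

```python
from datetime import date
from typing import Optional

DatePair = tuple[Optional[date], Optional[date]]  # (composite death date, NDI death date)


def classify(composite: Optional[date], ndi: Optional[date]) -> str:
    """Place one patient in category A, B, C, or D (None means no death date)."""
    if composite is not None and ndi is not None:
        return "A"  # true positive
    if composite is not None:
        return "B"  # false positive: composite death not found in NDI
    if ndi is not None:
        return "C"  # false negative: NDI death missed by composite data
    return "D"      # true negative


def date_agreement(pairs: list[DatePair], window_days: int = 0) -> float:
    """Share of composite death dates within +/- window_days of the NDI date.

    Assumed denominator: all patients with a composite death date (A + B);
    B patients have no NDI date and therefore count as disagreements.
    """
    with_composite = [(comp, ndi) for comp, ndi in pairs if comp is not None]
    matches = sum(
        1 for comp, ndi in with_composite
        if ndi is not None and abs((comp - ndi).days) <= window_days
    )
    return matches / len(with_composite)
```

Calling `date_agreement(pairs)`, `date_agreement(pairs, 15)`, and `date_agreement(pairs, 30)` corresponds to the exact, ±15‐day, and ±30‐day columns in the tables.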
Validation Metrics during Each Step in the Development of the Mortality Variable for the Advanced NSCLC Cohort
| Mortality data source | Sensitivity | Specificity | PPV | NPV | Date Agreement (Exact Date) | Date Agreement (±15 days) | Date Agreement (±30 days) |
|---|---|---|---|---|---|---|---|
| Structured EHR only (EHR) | 65.97% (64.84%, 67.09%) | 97.06% (96.49%, 97.63%) | 97.82% (97.40%, 98.24%) | 58.78% (57.50%, 60.07%) | 88.70% (87.72%, 89.67%) | 96.53% (95.99%, 97.07%) | 96.99% (96.49%, 97.49%) |
| SSDI only | 34.73% (33.59%, 35.86%) | 99.06% (98.73%, 99.38%) | 98.66% (98.20%, 99.12%) | 43.15% (42.05%, 44.25%) | 97.28% (96.62%, 97.94%) | 98.45% (97.95%, 98.95%) | 98.49% (98.00%, 98.99%) |
| EHR‐CDD1 | 84.06% (83.19%, 84.93%) | 96.26% (95.63%, 96.90%) | 97.83% (97.45%, 98.20%) | 75.13% (73.85%, 76.42%) | 92.33% (91.62%, 93.04%) | 96.87% (96.41%, 97.32%) | 97.41% (97.00%, 97.83%) |
| EHR‐CDD1 + SSDI | 88.83% (88.08%, 89.58%) | 96.06% (95.40%, 96.71%) | 97.83% (97.46%, 98.19%) | 81.14% (79.93%, 82.35%) | 93.83% (93.21%, 94.45%) | 97.07% (96.64%, 97.49%) | 97.52% (97.13%, 97.91%) |
| EHR‐CDD1‐SSDI + ABS (Final v2.0) | 90.60% (89.90%, 91.29%) | 96.00% (95.34%, 96.66%) | 97.84% (97.48%, 98.20%) | 83.62% (82.46%, 84.78%) | 93.50% (92.87%, 94.13%) | 97.00% (96.57%, 97.42%) | 97.49% (97.10%, 97.88%) |
95% CIs are shown in parentheses for each step of the variable development process. This cohort included patients diagnosed with advanced NSCLC from January 1, 2011, through December 31, 2015 (N = 10,195).
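This excerpt does not state how the 95% CIs were computed. The intervals in the tables look symmetric around the point estimates, which is consistent with a normal-approximation (Wald) interval for a proportion; the sketch below assumes that method, and `wald_ci` is a hypothetical helper, not the authors' code.

```python
import math


def wald_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and normal-approximation (Wald) CI for a proportion."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clip to [0, 1], because the Wald interval can overshoot near 0% or 100%.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

For sensitivity, `successes` would be A and `total` would be A + C; for PPV, A and A + B. For proportions close to 0 or 1, a Wilson score interval is a common, better-calibrated alternative.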
Figure 1. Overall Survival for Advanced NSCLC Determined Using Indicated Mortality Data
Notes. NDI data were used as the benchmark in this study and were assumed to have 100 percent completeness. Patients were excluded from this analysis if their death date fell before the advanced diagnosis date.
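Figure 1 compares overall survival curves built from different mortality sources. Below is a minimal Kaplan-Meier sketch that applies the exclusion rule from the notes. The record layout is hypothetical, and censoring patients without a death date at their last follow-up date is an assumption (a common convention for overall survival), not a detail given in this excerpt.

```python
from datetime import date
from typing import Optional

# Hypothetical record: (advanced diagnosis date, death date or None, last follow-up date)
Patient = tuple[date, Optional[date], date]


def kaplan_meier(patients: list[Patient]) -> list[tuple[int, float]]:
    """Return (days since advanced diagnosis, survival probability) at each death time."""
    times: list[tuple[int, int]] = []  # (follow-up days, 1 = death, 0 = censored)
    for adv_dx, death, last_seen in patients:
        if death is not None:
            if death < adv_dx:
                continue  # excluded per the notes: death precedes advanced diagnosis
            times.append(((death - adv_dx).days, 1))
        else:
            # Assumption: patients without a death date are censored at last follow-up.
            times.append(((last_seen - adv_dx).days, 0))

    curve: list[tuple[int, float]] = []
    survival = 1.0
    for t in sorted({u for u, event in times if event == 1}):
        at_risk = sum(1 for u, _ in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, event in times if u == t and event == 1)
        survival *= 1 - deaths / at_risk  # product-limit step
        curve.append((t, survival))
    return curve
```

Running the same function once per mortality source (for example, with death dates taken from the composite variable versus the NDI) yields curves that can be overlaid as in the figure.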
Figure 2. Sensitivity of advNSCLC Data by Practice
Notes. Data were restricted to practices with ≥100 patients. Boxplots show the median sensitivity, with the lower and upper hinges of each box at the 25th and 75th percentiles (the box spans the interquartile range, IQR); the lower and upper whiskers extend to the most extreme practice sensitivities within 1.5 × IQR of the lower and upper hinges, respectively; and points beyond the whiskers show the remaining practices.
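The per-practice summary behind Figure 2 can be approximated as below. This sketch assumes hypothetical per-practice counts `(n_patients, true_positives, false_negatives)`, applies the ≥100-patient restriction, and computes Tukey-style boxplot statistics. Quartiles use `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics (the same definition as R's default `quantile` type 7); the software actually used for the figure isn't stated, so that choice is an assumption.

```python
import statistics


def practice_sensitivities(counts: dict[str, tuple[int, int, int]],
                           min_patients: int = 100) -> list[float]:
    """counts: practice -> (n_patients, true_positives, false_negatives)."""
    return [tp / (tp + fn) for n, tp, fn in counts.values() if n >= min_patients]


def boxplot_stats(values: list[float]) -> dict[str, object]:
    """Tukey-style boxplot summary: hinges, median, whiskers, outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside),   # most extreme value within the lower fence
        "whisker_high": max(inside),  # most extreme value within the upper fence
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }
```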
Validation Metrics for Different Tumor Types, with Data Shown for the Final Mortality Variable That Comprises Structured EHR Data, CDD1, SSDI, and Abstraction of Unstructured EHR Data
| Tumor type | Sensitivity | Specificity | PPV | NPV | Date Agreement (Exact) | Date Agreement (±15 days) | Date Agreement (±30 days) |
|---|---|---|---|---|---|---|---|
| advNSCLC ( | 89.70% (88.76%, 90.64%) | 97.30% (96.70%, 97.89%) | 97.89% (97.43%, 98.36%) | 87.09% (85.92%, 88.25%) | 93.38% (92.55%, 94.22%) | 96.96% (96.40%, 97.53%) | 97.54% (97.03%, 98.05%) |
| advMelanoma ( | 88.39% (85.97%, 90.81%) | 98.84% (98.16%, 99.52%) | 98.18% (97.12%, 99.25%) | 92.33% (90.69%, 93.97%) | 95.37% (93.66%, 97.09%) | 97.36% (96.06%, 98.65%) | 97.85% (96.68%, 99.02%) |
| mCRC ( | 85.26% (83.97%, 86.54%) | 98.23% (97.84%, 98.62%) | 96.97% (96.30%, 97.63%) | 90.93% (90.12%, 91.75%) | 91.95% (90.85%, 93.05%) | 95.84% (95.05%, 96.63%) | 96.69% (95.99%, 97.40%) |
| mBC ( | 86.95% (85.11%, 88.80%) | 98.49% (98.01%, 98.96%) | 96.70% (95.67%, 97.73%) | 93.68% (92.75%, 94.60%) | 91.40% (89.70%, 93.09%) | 95.83% (94.65%, 97.01%) | 96.26% (95.15%, 97.38%) |
The cohorts here included patients with the respective diagnoses from January 1, 2013, through December 31, 2015, because data for advMelanoma, mCRC, and mBC were available starting on that date. The advNSCLC cohort was restricted to the same date range to enable comparisons across cohorts. 95% CIs are shown.