Yimei Li, Matt Hall, Brian T Fisher, Alix E Seif, Yuan-Shung Huang, Rochelle Bagatell, Kelly D Getz, Todd A Alonzo, Robert B Gerbing, Lillian Sung, Peter C Adamson, Alan Gamis, Richard Aplenc.
Abstract
PURPOSE: Clinical trials data from National Cancer Institute (NCI)-funded cooperative oncology group trials could be enhanced by merging with external data sources. Merging without direct patient identifiers would provide additional patient privacy protections. We sought to develop and validate a matching algorithm that uses only indirect patient identifiers.
Year: 2015 PMID: 26606521 PMCID: PMC4659568 DOI: 10.1371/journal.pone.0143480
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Data elements available in COG and PHIS.
| Variables | COG | PHIS |
|---|---|---|
| Demographics | Yes | Yes |
| Diagnosis | Yes | Yes (Inferred from ICD-9) |
| Risk stratification data | Yes | No |
| Biological specimen | Yes | No |
| Mortality | Yes | Yes (Inpatient) |
| Leukemia relapse status | Yes | Yes (Inferred from ICD-9) |
| Bone marrow transplant status | Yes | Yes (Inferred from ICD-9) |
| Toxicities | Yes (Reported by Data Managers) | Yes (Inferred from ICD-9 and Resource Data) |
| Medications | No | Yes |
| Procedures | No | Yes |
| Blood bank resources | No | Yes |
| Radiology services/procedures | No | Yes |
| Cost data | No | Yes (PHIS adjusted charge and cost-to-charge ratio) |
The stepwise merge algorithm using indirect patient identifiers.
| Matching variables | Step 1 | Step 2 | Step 3 | Step 4 |
|---|---|---|---|---|
| Treatment site | = | = | = | = |
| Gender | = | = | = | = |
| Birth year | = | = | = | = |
| Birth month | = | = | = | = |
| Enrollment/admission year | = | = | +/- 1 | |
| Enrollment/admission month | = | | | |
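
The stepwise logic in the table above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The field names (`site`, `gender`, `birth_year`, `birth_month`, `year`, `month`), the helper `rec`, and the sample records are all hypothetical. The sketch also makes two assumptions that the table does not state: a match counts as "unique" only when it is one-to-one in both directions, and uniquely matched records are removed from both pools before the next step.

```python
from collections import Counter

# Variables compared for equality at every step (per the table above).
BASE = ("site", "gender", "birth_year", "birth_month")

# (exact-match fields, allowed difference in enrollment/admission year or None)
STEPS = [
    (BASE + ("year", "month"), None),  # step 1: year and month must agree
    (BASE + ("year",), None),          # step 2: year must agree, month ignored
    (BASE, 1),                         # step 3: year within +/- 1
    (BASE, None),                      # step 4: year and month ignored
]


def candidates(cog, phis_pool, exact, year_tol):
    """Return ids of PHIS records compatible with one COG record."""
    found = []
    for p in phis_pool:
        if any(cog[k] != p[k] for k in exact):
            continue
        if year_tol is not None and abs(cog["year"] - p["year"]) > year_tol:
            continue
        found.append(p["id"])
    return found


def stepwise_match(cog_records, phis_records):
    """Map COG id -> (PHIS id, step at which the unique match was found)."""
    matches = {}
    cog_left, phis_left = list(cog_records), list(phis_records)
    for step_no, (exact, year_tol) in enumerate(STEPS, start=1):
        cand = {c["id"]: candidates(c, phis_left, exact, year_tol) for c in cog_left}
        # How many COG patients point at each PHIS record in this step.
        hits = Counter(pid for ids in cand.values() for pid in ids)
        for cid, ids in cand.items():
            # Assumed definition of "unique": one-to-one in both directions.
            if len(ids) == 1 and hits[ids[0]] == 1:
                matches[cid] = (ids[0], step_no)
        used = {pid for pid, _ in matches.values()}
        # Assumption: matched records leave both pools before the next step.
        cog_left = [c for c in cog_left if c["id"] not in matches]
        phis_left = [p for p in phis_left if p["id"] not in used]
    return matches


def rec(id_, site, gender, birth_year, birth_month, year, month):
    """Build one hypothetical record (synthetic data, for illustration only)."""
    return dict(id=id_, site=site, gender=gender, birth_year=birth_year,
                birth_month=birth_month, year=year, month=month)


cog = [rec("C1", "A", "F", 2005, 3, 2008, 6),
       rec("C2", "A", "M", 2004, 1, 2009, 2),
       rec("C3", "B", "F", 2006, 7, 2010, 12)]
phis = [rec("P1", "A", "F", 2005, 3, 2008, 6),
        rec("P2", "A", "M", 2004, 1, 2009, 3),
        rec("P3", "B", "F", 2006, 7, 2011, 1)]

result = stepwise_match(cog, phis)
print(result)
```

In this synthetic example, C1 agrees with P1 on every variable and is matched at step 1, C2 differs from P2 only in month and is matched at step 2, and C3 differs from P3 by one year and is matched at step 3. Each step relaxes the date constraint only for records that earlier, stricter steps left unmatched.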
Matching results using the indirect stepwise algorithm, developed on AAML0531.
Note: The number of unique matches from the direct merge method was 383 (92.3%).
| | Step 1 | Step 2 | Step 3 | Step 4 |
|---|---|---|---|---|
| Number of patients available for match | 415 | 211 | 83 | 50 |
| Number of patients with a unique match (%) | 204 (49.2%) | 128 (60.7%) | 33 (39.8%) | 13 (26.0%) |
| Number of patients with no match (%) | 204 (49.2%) | 66 (31.2%) | 16 (19.3%) | 16 (32.0%) |
| Number of patients matched with multiple PHIS records (%) | 5 (1.2%) | 15 (7.0%) | 34 (40.9%) | 21 (42.0%) |
| Number of patients matched with multiple COG records (%) | 2 (0.4%) | 2 (0.9%) | 0 (0%) | 0 (0%) |
| Cumulative number of unique matches (%) | 204 (49.2%) | 332 (80.0%) | 365 (88.0%) | |
| Criterion 1 | | | | |
| Number of unique matches that are concordant with the direct method (%) | 198 (97.1%) | 121 (94.5%) | 33 (100%) | 10 (76.9%) |
| Cumulative number of unique matches that are concordant with the direct method (%) | 198 (52.4%) | 319 (84.4%) | 352 (93.1%) | |
| Criterion 2 | | | | |
| Number of unique matches that are concordant with the direct method (%) | 202 (99.0%) | 122 (95.3%) | 33 (100%) | 12 (92.3%) |
| Cumulative number of unique matches that are concordant with the direct method (%) | 202 (53.4%) | 324 (85.7%) | 357 (94.4%) | |
* Criterion 1 treats a match as discordant if the indirect algorithm yielded a unique match but the direct merge method yielded duplicate matches.
** Criterion 2 treats a match as concordant if the indirect algorithm yielded a unique match, the direct merge method yielded duplicate matches, and the indirect match was one of those duplicates.
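
The difference between the two criteria can be made concrete with a short Python sketch. It is illustrative only: the function name `is_concordant`, the representation of the direct-merge result as a set of PHIS ids, and the sample pairs are hypothetical. The case where the direct merge finds no record at all is not covered by the footnotes; the sketch counts it as discordant, which is an assumption.

```python
def is_concordant(indirect_id, direct_ids, criterion):
    """Classify one unique indirect match against the direct-merge result.

    indirect_id: PHIS id chosen by the indirect algorithm.
    direct_ids:  set of PHIS ids returned by the direct merge (hypothetical form).
    criterion:   1 or 2, as defined in the footnotes above.
    """
    if criterion not in (1, 2):
        raise ValueError("criterion must be 1 or 2")
    if len(direct_ids) == 1:
        # Direct merge is itself unique: concordant only if both agree.
        return indirect_id in direct_ids
    if len(direct_ids) > 1:
        # Direct merge returned duplicates: criterion 1 always calls this
        # discordant; criterion 2 accepts it if the indirect match is among them.
        return criterion == 2 and indirect_id in direct_ids
    # Assumption: no direct match at all is treated as discordant.
    return False


# Synthetic (indirect match, direct-merge result) pairs.
pairs = [
    ("P1", {"P1"}),        # both methods agree on a unique record
    ("P2", {"P2", "P9"}),  # direct merge has duplicates that include P2
    ("P3", {"P4"}),        # the two methods disagree
]

counts = {c: sum(is_concordant(i, d, c) for i, d in pairs) for c in (1, 2)}
print(counts)
```

Only the second pair is classified differently by the two criteria, which is why the Criterion 2 counts in the table are never lower than the Criterion 1 counts.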
Matching results using the indirect stepwise algorithm, validated on AAML1031.
Note: The number of unique matches from the direct merge method was 151 (91.5%).
| | Step 1 | Step 2 | Step 3 | Step 4 |
|---|---|---|---|---|
| Number of patients available for match | 165 | 36 | 14 | 8 |
| Number of patients with a unique match (%) | 129 (78.2%) | 22 (61.1%) | 6 (42.9%) | 0 (0%) |
| Number of patients with no match (%) | 30 (18.2%) | 7 (19.4%) | 0 (0%) | 1 (12.5%) |
| Number of patients matched with multiple PHIS records (%) | 6 (3.6%) | 7 (19.4%) | 8 (57.1%) | 7 (87.5%) |
| Number of patients matched with multiple COG records (%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Cumulative number of unique matches (%) | 129 (78.2%) | 151 (91.5%) | 157 (95.2%) | |
| Criterion 1 | | | | |
| Number of unique matches that are concordant with the direct method (%) | 124 (96.1%) | 22 (100%) | 4 (66.7%) | 0 (NA) |
| Cumulative number of unique matches that are concordant with the direct method (%) | 124 (79.0%) | 146 (93.0%) | 150 (95.5%) | |
| Criterion 2 | | | | |
| Number of unique matches that are concordant with the direct method (%) | 129 (100%) | 22 (100%) | 4 (66.7%) | 0 (NA) |
| Cumulative number of unique matches that are concordant with the direct method (%) | 129 (82.2%) | 151 (96.2%) | 155 (98.7%) | |
* Criterion 1 treats a match as discordant if the indirect algorithm yielded a unique match but the direct merge method yielded duplicate matches.
** Criterion 2 treats a match as concordant if the indirect algorithm yielded a unique match, the direct merge method yielded duplicate matches, and the indirect match was one of those duplicates.