Christian A Klaus, Luis E Carrasco, Daniel W Goldberg, Kevin A Henry, Recinda L Sherman.
Abstract
BACKGROUND: The utility of patient attributes associated with the spatiotemporal analysis of medical records lies not just in their values but also in the strength of association between them. Estimating the extent to which a hierarchy of conditional probability exists between patient attribute associations, such as patient identifying fields, patient and date of diagnosis, and patient and address at diagnosis, is fundamental to estimating the strength of association between patient and geocode, and between patient and enumeration area. We propose a hierarchy for the attribute associations within medical records that enable spatiotemporal relationships. We also present a set of metrics that store attribute association error probability (AAEP), to estimate error probability for all attribute associations upon which certainty in a patient geocode depends.
Year: 2015 PMID: 26370237 PMCID: PMC4570180 DOI: 10.1186/s12942-015-0019-3
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Person, place and time attribute associations in patient medical records
| Descriptive epidemiology concept | Attribute association description | Core attributes from chronic disease record | Questions relevant to probability of error in attribute association | Association |
|---|---|---|---|---|
| Person | Patient identifying fields | Patient names, date of birth, government issued ID | What is the probability that the correct patient was not identified? | 1 |
| Time | Patient: date of diagnosis | Patient, date of diagnosis, diagnostic confirmation | What is the probability that the correct date of diagnosis was not identified? | 2 |
| Place | Patient: address at diagnosis | Patient, date of diagnosis, address, postal code, and postal locality or city | What is the probability that the correct address of patient primary residence was not identified? | 3 |
Patient geocode and enumeration area attribute associations in patient medical records
| Attribute association description | Core attributes from chronic disease record | Questions relevant to probability of error in attribute association | Association |
|---|---|---|---|
| Patient: geocode | Patient identifying fields, date of diagnosis, address, postal code, postal locality and geocode | What is the probability that the wrong set of coordinates (and by extension, residence) was chosen during geocoding for the patient? | 4 |
| Patient: enumeration area (EA) feature | Patient identifying fields, date of diagnosis, address, postal code, postal locality, geocode, county, sub county enumeration area | What is the probability that the wrong enumeration area was assigned to the patient? | 5 |
Fig. 1 Illustration of hierarchical nature of attribute associations related to geocoding of cancer cases
Fig. 2 Summary of hierarchical nature of attribute associations related to geocoding of cancer cases
Centralized databases that disease registry cases are commonly linked to
| Number | Name | Description | Purpose of Linkage |
|---|---|---|---|
| 1 | Admission records (admission table in CCR database) | One or more admission records are consolidated to generate a single tumor level record | Record consolidation |
| 2 | Medicaid | Medicaid claims data, generally for a rolling 5 year period | Casefinding; patient attribute confirm/update |
| 3 | Hospital discharge | Discharge sheets submitted by hospitals to state health authorities | Casefinding |
| 4 | Rapid case ascertainment | Databases of smaller subsets of patients who meet criteria for enrolling in research studies | Casefinding; patient attribute confirm/update |
| 5 | Dept. of Motor Vehicles Driver’s License Data | US state database storing the demographic and other attributes of drivers | Patient attribute confirm/update |
| 6 | Board of Elections’ Voter Registration Data (BOE-VR) | US state or county database storing demographic and other attributes of voters | Patient attribute confirm/update |
| 7 | National Death Index | List of deceased persons, aggregated across US states | Patient attribute confirm/update |
| 8 | State Death Registry | List of deceased persons, aggregated across counties in a state | Casefinding; patient attribute confirm/update |
| 9 | Social Security Death Index (SSDI) | List of deceased persons, aggregated across counties in a state, with US Social Security Numbers | Casefinding; patient attribute confirm/update |
| 10 | Government Postal Address Database | Used to clean addresses, or diagnose address problems | Patient attribute confirm/update |
| 11 | GIS Street Centerlines | Digital linework corresponding to center of US streets, with address and postal code attributes. Used for navigation and emergency response | Geocoding |
| 12 | GIS parcels | Digital parcel polygons corresponding to property ownership, with some site address attributes. Used for tax assessment | Geocoding |
| 13 | GIS address points | Digital points, generally corresponding to a primary residence within a parcel. Used for navigation and emergency response | Geocoding |
| 14 | Census enumeration area polygons | Digital polygons, approximating delineation of census enumeration areas (EA). Used for spatial overlay to assign EA | Assignment of EA to patient record |
| 15 | Census enumeration area table | Table that can be joined to using geocoding reference identifiers, to assign enumeration area | Assignment of EA to patient record |
Fig. 3 Entity spatiotemporal relationship enabling attribute association (ESTREAA) hierarchy categories selected for attribute associations in Tables 1 and 2. White background indicates AAEP evaluation based on record linkage. Grey indicates AAEP propagating from a smaller to larger attribute association. AA: attribute association
Descriptions and examples of entity spatiotemporal relationship enabling attribute association (ESTREAA) hierarchy categories
| ESTREAA hierarchy category | Description/example | #(%) of cases in case study sample |
|---|---|---|
| 1 | No AAEP detected (AAEP = 0), or AAEP is null, in all associations | 12,791 (95.69 %) |
| 2 | AAEP detected in association 4, but contained spatially to enumeration area of interest. An example is a case with a missing house number, on a street wholly contained within the enumeration area of interest | 27 (0.20 %) |
| 3 | For these cases associations 1–4 are free of AAEP, but association 5 is not. Examples include cases where a county boundary intersects a parcel (found in eastern seaboard states of US), or for which there is disagreement about what enumeration area a parcel belongs to, between local and national government agencies, for example, between US counties and the US Census Bureau, regarding the correct county | 0 |
| 4 | AAEP in association 4 causes AAEP in association 5. An example is a case whose address matches to postal code only. The postal code area overlaps with the enumeration area but is not coincident with it | 185 (1.38 %) |
| 5 | AAEP in association 3 propagates into associations 4 and 5. An example is a case with a ‘Multimatch’ Address: patient address contains error in more than one address component and matches to more than one candidate based on which component is edited. Another example: patient owns more than one residence and primary residence cannot be determined | 74 (0.57 %) |
| 6 | AAEP in association 3 propagates into association 4, but uncertainty is contained spatially in one enumeration area of interest so association 5 remains free of AAEP | 0 |
| 7 | AAEP in patient date of diagnosis (association 2) does not impact the choice of address at diagnosis (association 3). Examples include cases for which date of diagnosis is unknown (death certificate only cases) but patient has never changed residence in his/her lifetime. Another example is a clinically diagnosed case for which year of diagnosis is known and month and day is uncertain, but does not affect the choice of patient address at diagnosis | 13 (0.097 %) |
| 8 | Patient date of diagnosis (association 2) is uncertain, but this does not affect association 3. Association 4 has AAEP based on record linkage, but, similar to category 2, it does not propagate into association 5 | 0 |
| 9 | Patient date of diagnosis (association 2) is uncertain, and this affects the choice of address at diagnosis (association 3), which affects the confidence in the geocode, which affects the confidence in the enumeration area. Examples include death certificate only cases where date of diagnosis is unknown | 276 (2.06 %) |
| 10 | Cases where patient is positively identified, but all other associations have AAEP except association 5. An example is a patient whose date of diagnosis is uncertain, which propagates AAEP into associations 3 and 4, however the uncertainty is contained spatially to enumeration area of interest, and so association 5 is free of AAEP | 0 |
| 11 | Cases where patient is not positively identified (association 1), and AAEP from that association propagates into all other associations | 0 |
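The category descriptions in the table above can be read as a lookup from the set of associations that carry AAEP to a category number. The following is a minimal Python sketch under that reading. The function name `estreaa_category` and the lookup table are hypothetical, and the mapping is an interpretation of the table rather than the authors' published procedure. Patterns the table does not describe return `None`.

```python
from typing import Iterable, Optional

# Hypothetical lookup: associations (numbered 1-5) with nonzero AAEP -> category.
# Each entry is one reading of the corresponding row in the table above.
_CATEGORY_BY_AFFECTED = {
    frozenset(): 1,                   # no AAEP detected in any association
    frozenset({4}): 2,                # geocode uncertain, contained in one EA
    frozenset({5}): 3,                # only the enumeration area is uncertain
    frozenset({4, 5}): 4,             # geocode AAEP propagates to EA
    frozenset({3, 4, 5}): 5,          # address AAEP propagates to geocode and EA
    frozenset({3, 4}): 6,             # address and geocode, contained in one EA
    frozenset({2}): 7,                # date uncertain, address unaffected
    frozenset({2, 4}): 8,             # date and geocode, neither propagates
    frozenset({2, 3, 4, 5}): 9,       # date AAEP propagates through to EA
    frozenset({2, 3, 4}): 10,         # as 9, but contained in one EA
    frozenset({1, 2, 3, 4, 5}): 11,   # patient not positively identified
}


def estreaa_category(affected: Iterable[int]) -> Optional[int]:
    """Return the category for the associations that carry AAEP.

    Returns None when the combination is not described in the table.
    """
    key = frozenset(affected)
    if not key <= {1, 2, 3, 4, 5}:
        raise ValueError("association numbers must be between 1 and 5")
    return _CATEGORY_BY_AFFECTED.get(key)
```

For example, `estreaa_category([4, 5])` yields category 4 under this reading, while a combination such as `[2, 5]` is not listed in the table and yields `None`.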
How ESTREAA hierarchy category might be assigned, and AAEP might be estimated for selected circumstances
| Example | Attribute association | Circumstance | How AAEP was estimated |
|---|---|---|---|
| 1 | 1. Patient identifying fields | Patient lacks government issued ID and address, and patient names and date of birth match to three individuals in external data | The patient could be three different persons matched to in various sources. The AAEP is calculated as 1 − (1/3) or 0.666 |
| 2 | 2. Patient-date of diagnosis | Patient diagnosis year known, but month and day unknown | One day out of 365 is chosen, thus the probability of choosing the correct day is 1/365. AAEP = 1 − (1/365) = 0.997 |
| 3 | 3. Patient: date of diagnosis-address | Patient address is missing house number | Patient address matches to the address featuresᵃ of 20 residences on one street. AAEP = 1 − (1/20), or 0.95 |
| 4 | 3. Patient: date of diagnosis-address | Patient address is missing prefix direction. To confirm that address is valid, it is matched to USPS ZIP + 4 database | Patient address matches to 2 addresses in USPS ZIP + 4 database. AAEP = 1 − (1/2) or 0.5 |
| 5 | 3. Patient: date of diagnosis-address | Error suspected in more than one component of patient address (‘multimatch’ address). Patient address can be matched to 12 different address features in geographic reference data depending on which address component(s) are edited | AAEP = 1 − (1/12) or 0.916 |
| 6 | 3. Patient: date of diagnosis-address | Address at diagnosis cannot be geocoded. Patient address history unknown or incomplete. Patient address identified via linkage to external source on patient name and date of birth, and used to match to geographic reference data with one to one match. Date of diagnosis was not spanned by duration of address validity in external data source | AAEP estimated at 0.25 based on best available information about error rate of external data source |
| 7 | 3. Patient: date of diagnosis-address | Patient has PO Box address. Patient address history unknown or incomplete. Patient names and PO Box address match to owner names and mailing addresses of 4 parcels, whose sale dates precede the date of diagnosis | There are 4 possible addresses and only one is chosen. AAEP = 1 − (1/4) = 0.75 |
| 8 | 3. Patient: date of diagnosis-address | Patient year of diagnosis known. Patient day and month of diagnosis are unknown. Patient address history for year of diagnosis is known. During that time patient lived at 3 addresses in sequence for 0.4, 0.1, and 0.5 of the year; the first address is chosen | AAEP = 1 − 0.4 = 0.6 |
| 9 | 4. Patient: date of diagnosis-address-geocode | Patient address matches to a street in geographic reference data with 21 address features that are missing house numbers | Patient address matches to address features of 21 residences on one street. AAEP = 1 − (1/21), or 0.952 |
| 10 | 4. Patient-date of diagnosis-address-geocode | Patient street address could not be matched to street level geographic reference data. Patient postal code matched to postal code area centroid | Postal code encompasses 13,500 address features. AAEP = 1 − (1/13,500) = 0.999 |
| 11 | 5. Patient-date of diagnosis-address-geocode-enumeration area | Patient address lacks a house number. Street to which patient address is geocoded is contained within 1 enumeration area | Because all potential matches are contained within the chosen enumeration area, AAEP = 0 |
| 12 | 5. Patient-date of diagnosis-address-geocode-enumeration area | Patient address lacks a house number; there are 70 address feature matching candidates. The area of uncertainty that contains the potential matches spans 2 enumeration areas. These contain 20 and 50 candidate address features; the latter enumeration area is chosen | AAEP = 1 − (50/70) = 0.285 |
| 13 | 5. Patient-date of diagnosis-address-geocode-enumeration area | Patient street address could not be matched to street level geographic reference data. Patient postal code matched to postal code area centroid. Postal code area spans 4 enumeration areas, which contain 2160, 1620, 1620 and 6750 address features respectively; the last enumeration area is chosen | Postal code encompasses 12,150 address features. AAEP is 1 − (6750/12,150) = 0.444 |
| 14 | 5. Patient-date of diagnosis-address-geocode-enumeration area | Both patient address and postal code are unmatched in geographic reference data. County centroid is assigned as geocode | AAEP is 1 − (1/395,909 address features in county) or 0.999 |
ᵃ Emergency dispatch address features as published by county or city data authors
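The estimates in the table above follow two patterns. When one candidate is chosen from n equally plausible candidates, AAEP = 1 − 1/n. When candidates are grouped (address features per enumeration area, or share of the year spent at each address), AAEP = 1 − (weight of the chosen group / total weight). The Python sketch below illustrates both under those assumptions. The function names are hypothetical and this is not the authors' implementation.

```python
from typing import Sequence


def aaep_equal_candidates(n_candidates: int) -> float:
    """AAEP when one of n equally plausible candidates is chosen."""
    if n_candidates < 1:
        raise ValueError("need at least one candidate")
    return 1.0 - 1.0 / n_candidates


def aaep_weighted(chosen_weight: float, all_weights: Sequence[float]) -> float:
    """AAEP when the chosen group holds chosen_weight of the total weight.

    Weights can be counts of address features per enumeration area, or
    fractions of the year spent at each address.
    """
    total = sum(all_weights)
    if total <= 0 or not 0 <= chosen_weight <= total:
        raise ValueError("weights must be positive and include the chosen group")
    return 1.0 - chosen_weight / total


# Example 3: 20 residences on one street.
street = aaep_equal_candidates(20)
# Example 13: the chosen enumeration area holds 6750 of 12,150 address features.
postal = aaep_weighted(6750, [2160, 1620, 1620, 6750])
# Example 11: all candidates fall in the chosen enumeration area.
contained = aaep_weighted(20, [20])
```

Here `street` evaluates to 0.95 and `postal` to about 0.444, matching examples 3 and 13, and `contained` is 0 as in example 11. For example 12, `aaep_weighted(50, [20, 50])` gives about 0.2857, which the table reports truncated as 0.285.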
Address validation and geocoding success for cases in Wake County, 2008–2012
| Case count | Address validation successᵃ | Batch geocoded | Interactively geocoded | Geocoding success rate (street level) | Geocoding success rate (postal code level) | Geocoding success rate (county level only) |
|---|---|---|---|---|---|---|
| 13,366 | 87.2 % | 95.8 % | 3.6 % | 96.9 % | 1.4 % | 0.5 % |
ᵃ Percentage of addresses matched to the USPS ZIP + 4 database prior to geocoding
Percentages of cases with AAEP in Wake County, 2008–2012, for four NC medical facilities
| Medical Facility | Patient identifying fields (%) | Patient-diagnosis date (%) | Patient address at diagnosis date (%) | Patient address at diagnosis date-geocode (%) | Patient-enumeration area at diagnosis date (US Census Tract, 2010 Census) (%) |
|---|---|---|---|---|---|
| 1 | 0 | 0.9 | <0.1 | 1.08 | 0.9 |
| 2 | 0 | 0.38 | <0.1 | 2.57 | 2.18 |
| 3 | 0 | 0.71 | 0 | 0.71 | 0.57 |
| 4 | 0 | 0.25 | 0.13 | 1.16 | 0.58 |
Percentages of selected patient records (2008–2012) with enumeration area discordance for patient address, by geographic reference data set used for batch geocoding, in Wake County
| | City or county maintained parcel centroids and/or address points, 2009 | Street centerline (state of NC maintained) 2007 | TIGER street centerlines 2010 | City or county maintained parcel centroids and/or address points, 2011 |
|---|---|---|---|---|
| City or county maintained parcel centroids and/or address points, 2009 | | <0.1 % | 0.1 % | <0.1 % |
| Street centerline (state of NC maintained) 2007 | <0.1 % | | 0.6 % | <0.1 % |
| TIGER street centerlines 2010 | 0.1 % | 0.6 % | | <0.1 % |
| City or county maintained parcel centroids and/or address points, 2011 | <0.1 % | <0.1 % | <0.1 % | |