Ellen J Kinnee, Sheila Tripathy, Leah Schinasi, Jessie L C Shmool, Perry E Sheffield, Fernando Holguin, Jane E Clougherty.
Abstract
Although environmental epidemiology studies often rely on geocoding to assign spatial exposure estimates, geocoding methods are not commonly reported, nor are the consequent errors in exposure assignment explored. Geocoding methods differ in accuracy, however, and, given the increasing refinement of available exposure models for air pollution and other exposures, geocoding error may account for an increasingly large proportion of exposure misclassification. We used residential addresses from a reasonably large, dense dataset of asthma emergency department visits from all New York City hospitals (n = 21,183; 26.9 addresses/km²), and geocoded each using three methods (Address Point, Street Segment, Parcel Centroid). We compared missingness and spatial patterning therein, quantified distance and directional errors, and quantified impacts on pollution exposure estimates and assignment to Census areas for sociodemographic characterization. Parcel Centroids had the highest overall missingness rate (38.1%; Address Point = 9.6%, Street Segment = 6.1%), and spatial clustering in missingness was significant for all methods, though its spatial patterns differed. Street Segment geocodes had the largest mean distance error (µ = 29.2 (SD = 26.2) m; vs. µ = 15.9 (SD = 17.7) m for Parcel Centroids), and the strongest spatial patterns therein. We found substantial over- and under-estimation of pollution exposures, with greater error for higher pollutant concentrations, but minimal impact on Census area assignment. Finally, we developed surfaces of spatial patterns in errors in order to identify locations in the study area where exposures may be over-/under-estimated. Our observations provide insights towards refining geocoding methods for epidemiology, and suggest methods for quantifying and interpreting geocoding error with respect to exposure misclassification, towards understanding potential impacts on health effect estimates.
Keywords: exposure misclassification; geocoding error; geographic information systems (GIS); spatial analysis; spatial uncertainty; urban epidemiology
Year: 2020 PMID: 32806682 PMCID: PMC7459468 DOI: 10.3390/ijerph17165845
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Spatial distribution of the 21,183 residential addresses used in this analysis, aggregated to ZIP Code.
Figure 2. Spatial patterns of missingness by 5-digit ZIP Code. Maps of (a) rates of missingness and (b) statistical clusters of high (neighboring ZIP Codes have similarly high rates) and low (neighboring ZIP Codes have similarly low rates) levels of missingness.
Figure 3. Spatial patterns of distance error between methods. Maps of (a) spatial pattern of distance errors (meters) and (b) clusters of high and low distance errors between Street Segment and Address Point geocodes, and between Parcel Centroid and Address Point geocodes. High clusters are points with longer distances between geocoding methods and low clusters are points with shorter distances between geocoding methods.
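As a rough illustration of how a distance error and a directional error between two geocodes of the same address might be computed, the sketch below uses the haversine great-circle distance and the initial compass bearing on a spherical Earth. This is an assumption for illustration only: the record does not state which formula or software routine the authors used, and the function name `distance_and_bearing` is hypothetical.

```python
import math

# Mean Earth radius in metres (spherical approximation; an assumption of this sketch).
EARTH_RADIUS_M = 6_371_008.8


def distance_and_bearing(ref, alt):
    """Return (distance in metres, compass bearing in degrees) from ref to alt.

    ref: (latitude, longitude) of the reference geocode, e.g. an Address Point.
    alt: (latitude, longitude) of the alternative geocode, e.g. a Street Segment.
    Both are in decimal degrees. The bearing is clockwise from north, in [0, 360).
    """
    lat1, lon1 = map(math.radians, ref)
    lat2, lon2 = map(math.radians, alt)
    dlat, dlon = lat2 - lat1, lon2 - lon1

    # Haversine great-circle distance.
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing from the reference point towards the alternative point.
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    bearing = math.degrees(math.atan2(y, x)) % 360.0
    return distance, bearing
```

Calling `distance_and_bearing((40.7128, -74.0060), (40.7130, -74.0057))` with these made-up coordinates would return the offset in metres and the direction of the alternative geocode relative to the reference. For offsets of tens of metres, a projected coordinate system with planar distances would serve equally well.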
Figure 4. Spatial patterns of cardinal directional errors between methods. Maps of (a) Street Segment and (b) Parcel Centroid directional error compared to Address Points. Directional offset rose plots [49] show the number of observations by direction aggregated into 5-degree bins. Histograms show the distribution of compass angles by degree.
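The 5-degree aggregation behind a rose plot can be sketched as a simple binning of compass bearings. The helper below is a minimal, hypothetical example (the name `bin_bearings` and its input are not from the source); it only counts observations per bin and leaves plotting aside.

```python
from collections import Counter


def bin_bearings(bearings_deg, bin_width=5):
    """Count compass bearings per bin.

    Keys of the returned dict are the lower edge of each bin in degrees
    (0, 5, 10, ... for the default width); values are observation counts.
    """
    if 360 % bin_width != 0:
        raise ValueError("bin_width must divide 360 evenly")

    # Start every bin at zero so empty directions still appear in the result.
    counts = Counter({edge: 0 for edge in range(0, 360, bin_width)})
    for b in bearings_deg:
        # Wrap into [0, 360) and snap to the lower bin edge; the final modulo
        # guards against floating-point values that round up to exactly 360.
        edge = int((b % 360) // bin_width) * bin_width % 360
        counts[edge] += 1
    return dict(sorted(counts.items()))
```

With the default width this yields 72 bins, which could be passed to a polar bar chart to draw a rose plot.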
Figure 5. Bland–Altman plots of pollutant exposure misclassification. The x-axis depicts the average concentration estimate, based on the Address Point and alternative geocode; the y-axis depicts the difference in concentration estimate from the Address Point value (“error”) using the alternative geocode. Orange points on maps indicate significant clusters of over-estimates (above 95% Confidence Limit (CL)); green points indicate significant clusters of under-estimates (below 95% CL).
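The quantities plotted in a Bland–Altman comparison can be sketched as follows. This example assumes the conventional limits of mean difference ± 1.96 sample standard deviations; the record does not say exactly how the 95% limits were derived, so treat that choice, and the function name `bland_altman`, as assumptions.

```python
from statistics import mean, stdev


def bland_altman(reference, alternative, z=1.96):
    """Compare paired exposure estimates from two geocoding methods.

    reference:   concentrations at the reference geocodes (e.g. Address Point).
    alternative: concentrations at the alternative geocodes, in the same order.
    Returns the per-pair averages and differences, the mean difference (bias),
    the lower/upper limits, and a flag for each pair.
    """
    if len(reference) != len(alternative):
        raise ValueError("inputs must be paired and of equal length")

    diffs = [a - r for r, a in zip(reference, alternative)]      # y-axis: "error"
    averages = [(a + r) / 2 for r, a in zip(reference, alternative)]  # x-axis

    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    lower, upper = bias - z * sd, bias + z * sd  # assumed 95% limits

    flags = [
        "over" if d > upper else "under" if d < lower else "within"
        for d in diffs
    ]
    return {
        "averages": averages,
        "differences": diffs,
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "flags": flags,
    }
```

Pairs flagged `"over"` or `"under"` correspond to points outside the limits, which is the subset summarized in the table of over- and under-estimated NO2 exposure points.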
Figure 6. Spatial clusters of over- and under-estimates of NO2, by Census Tract poverty rates.
Mean NO2 (ppb) and poverty rate for over- and under-estimated NO2 exposure points *.
| Error Type | Frequency, n (%) | Mean NO2 (ppb) | Mean Percent Below Federal Poverty Level |
|---|---|---|---|
| Street Segment geocodes | 855 (4.0%) | 30.4 ± 4.18 | 25.9 ± 11.5 |
| Street Segment geocodes | 519 (2.5%) | 24.8 ± 3.0 | 26.9 ± 12.8 |
| Parcel Centroid geocodes | 426 (2.0%) | 28.9 ± 4.3 | 28.6 ± 12.2 |
| Parcel Centroid geocodes | 584 (2.8%) | 25.5 ± 3.0 | 30.9 ± 12.6 |
| Total | 2384 (11.3%) | 25.6 ± 3.2 | 28.1 ± 12.3 |
* Over- and under-estimated points include all points that are outside the Bland–Altman Confidence Limits.
Figure 7. Geocoding spatial uncertainty surfaces, generated by interpolation of measured distance errors.
Address locator settings.
| Locator Setting | Address Point | Parcel Centroid | Street Segment |
|---|---|---|---|
| Style 1 | US Address–Single House | US Address–Single House | US Address–Dual Ranges |
| Reference data | NYC Open Data Address Points | NYC Tax Parcel polygons | TeleAtlas StreetMaps Premium 2013 |
| Minimum match score 2 | 85 | 85 | 85 |
| Minimum candidate score 2 | 75 | 75 | 75 |
| Spelling sensitivity 2 | 80 | 80 | 80 |
| Side Offset 2 | 0 | 0 | 20 |
1 https://desktop.arcgis.com/en/arcmap/10.5/manage-data/geocoding/commonly-used-address-locator-styles.htm
2 https://desktop.arcgis.com/en/arcmap/10.5/manage-data/geocoding/geocoding-options-properties.htm#ESRI_SECTION1_69547BEEDFC94FD78DE5A5D93A3CD195