Gina S Lovasi, Jeremy C Weiss, Richard Hoskins, Eric A Whitsel, Kenneth Rice, Craig F Erickson, Bruce M Psaty.
Abstract
BACKGROUND: Geocoding methods vary among spatial epidemiology studies. Errors in the geocoding process and differential match rates may reduce study validity. We compared two geocoding methods using 8,157 Washington State addresses. The multi-stage geocoding method implemented by the state health department used a sequence of local and national reference files. The single-stage method used a single national reference file. For each address geocoded by both methods, we measured the distance between the locations assigned by each method. Area-level characteristics were collected from census data, and modeled as predictors of the discordance between geocoded address coordinates.
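The distance between the two locations assigned to one address can be sketched as follows. The record does not say how the distance was computed, so the haversine (great-circle) formula, the spherical Earth radius, and the function name `discrepancy_distance_m` are assumptions made for illustration only.

```python
import math

# Assumption: spherical Earth with the mean radius in metres.
EARTH_RADIUS_M = 6_371_000.0


def discrepancy_distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two lat/lon points.

    Inputs are decimal degrees. This is an illustrative stand-in for the
    distance between two geocoded locations of the same address.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

For the sub-kilometre distances that dominate this comparison, a planar approximation would give nearly the same result; the haversine form is used here only because it needs no map projection.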
Year: 2007 PMID: 17367520 PMCID: PMC1838410 DOI: 10.1186/1476-072X-6-12
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Area characteristics for geocoded addresses
| | Geocoded by both methods (N = 7,686) | Geocoded by multi-stage method only (N = 372) | Geocoded by single-stage method only (N = 40) |
| Parcel data available, % | 63% | 48% | 55% |
| Local street data available, % | 59% | 66% | 35% |
| Density, median, population/km² | | | |
| County | 111 | 91 | 21 |
| Census tract | 980 | 1177 | 60 |
| Census block group | 1183 | 1280 | 133 |
| Percent poverty, median | | | |
| County | 9% | 11% | 14% |
| Census tract | 9% | 12% | 10% |
| Census block group | 8% | 12% | 10% |
Area characteristics were based on the multi-stage geocoded address coordinates when available (N = 8,058) and the single-stage geocoded address coordinates otherwise (N = 40).
Figure 1. Distance and directional bias between geocoded address coordinates for multi-stage and single-stage geocoding methods: This figure shows one dot for each address geocoded by both methods, with reference circles at 0.5 and 1.0 km. The multi-stage geocoded address coordinates were centered as a reference, and each dot shows the relative position of the single-stage address coordinates for the same address. Dots close to the middle (0,0) represent small discrepancy-distances and high concordance between the two methods. Dots directly above the center had single-stage geocoded address coordinates further north than their multi-stage coordinates. Dots randomly scattered in all directions would indicate no directional bias, whereas an off-center cluster of dots would indicate systematic bias between the two methods. Addresses with discrepancy distances greater than 2 km were not included in this figure (N = 78).
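The centering described in this caption can be sketched as a conversion from two latitude/longitude pairs to an (east, north) offset in metres, with the multi-stage point placed at (0, 0). The local flat-Earth approximation and the name `relative_offset_m` are assumptions for illustration; the record does not describe how the figure was actually produced.

```python
import math

# Assumption: spherical Earth with the mean radius in metres.
EARTH_RADIUS_M = 6_371_000.0


def relative_offset_m(ref_lat, ref_lon, lat, lon):
    """Return the (east, north) offset in metres of (lat, lon) from the reference.

    Uses a local equirectangular approximation, which is adequate for
    offsets of a few kilometres. A positive north value means the point
    lies north of the reference; a positive east value means it lies east.
    """
    north = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    east = (math.radians(lon - ref_lon) * EARTH_RADIUS_M
            * math.cos(math.radians(ref_lat)))
    return east, north


def within_plot_limit(east, north, limit_m=2000.0):
    """True if the offset is no longer than limit_m (2 km in the caption)."""
    return math.hypot(east, north) <= limit_m
```

Plotting every `(east, north)` pair that passes `within_plot_limit` would give a scatter of the kind the caption describes, with an off-center cluster suggesting systematic bias.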
Figure 2. Directional bias between geocoding methods: This figure shows an angular histogram with radial lengths proportional to frequencies of shifts in each direction between the single-stage and multi-stage geocoded address coordinates. A light circle is drawn at the mean frequency for reference.
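The counts behind an angular histogram like this one can be sketched by converting each (east, north) shift to a compass bearing and binning the bearings into equal sectors. The number of sectors, the sector boundaries, and the function names are assumptions for illustration, not details taken from the figure.

```python
import math


def bearing_deg(east, north):
    """Compass bearing in degrees of an offset (0 = north, 90 = east)."""
    return math.degrees(math.atan2(east, north)) % 360.0


def angular_histogram(offsets, n_bins=8):
    """Count shifts per direction sector.

    offsets is an iterable of (east, north) pairs. Sector i covers bearings
    from i * (360 / n_bins) up to, but not including, the next boundary.
    Zero-length shifts have no direction and are skipped.
    """
    counts = [0] * n_bins
    width = 360.0 / n_bins
    for east, north in offsets:
        if east == 0 and north == 0:
            continue
        counts[int(bearing_deg(east, north) // width) % n_bins] += 1
    return counts


def mean_frequency(counts):
    """Mean count per sector, the radius of the reference circle."""
    return sum(counts) / len(counts)
```

Sectors whose counts sit well above `mean_frequency(counts)` would point to a preferred direction of shift between the two methods.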
Distance between locations for the same address assigned by two geocoding methods, by area characteristics
| | N | | | p50 | p90 | p95 | p99 | |
| All addresses geocoded by both methods | 7,686 | 160 (140, 179) | 49 (48, 51) | 54 | 180 | 296 | 2218 | 1.8% |
| Percent poverty | | | | | | | | |
| ≥ 20% | 1,027 | 156 (77, 234) | 28 (25, 30) | 26 | 144 | 258 | 1370 | 1.4% |
| 10 to 19% | 2,248 | 169 (135, 203) | 46 (44, 49) | 48 | 183 | 310 | 3052 | 2.2% |
| < 10% | 4,411 | 156 (132, 180) | 59 (57, 61) | 62 | 187 | 300 | 1873 | 1.7% |
| Density, population/km² | | | | | | | | |
| ≥ 1000 | 3,770 | 106 (83, 129) | 44 (43, 46) | 50 | 134 | 174 | 717 | 0.8% |
| 500 – 999 | 1,504 | 141 (108, 174) | 50 (47, 53) | 55 | 175 | 269 | 1970 | 1.6% |
| 200 – 499 | 931 | 213 (140, 286) | 49 (45, 54) | 52 | 202 | 333 | 5593 | 2.5% |
| < 200 | 1,481 | 281 (218, 345) | 66 (61, 71) | 70 | 401 | 779 | 5548 | 4.1% |
| Reference file | | | | | | | | |
| Local Parcels | 4,644 | 152 (128, 176) | 61 (59, 62) | 63 | 172 | 258 | 1704 | 1.5% |
| Local Roads | 1,075 | 136 (107, 165) | 52 (48, 56) | 53 | 193 | 333 | 1667 | 2.0% |
| TIGER-based | 1,967 | 191 (140, 242) | 30 (28, 32) | 25 | 211 | 414 | 4028 | 2.5% |
Discrepancy-distance indicates distance between multi-stage and single-stage geocoded address coordinates for the same address; p50, p90, p95, and p99 indicate the 50th, 90th, 95th, and 99th percentiles; TIGER indicates Topologically Integrated Geographic Encoding and Referencing system line files, a category that here includes NAVTEQ and Dynamap reference files.
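The percentile summaries defined in this footnote can be sketched for any list of discrepancy-distances. The nearest-rank percentile definition, the 1000 m threshold default, and the names `percentile` and `summarize` are assumptions for illustration; the record does not state which percentile definition the authors used.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100.

    This is one of several common percentile definitions; other
    definitions interpolate between neighbouring values.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(distances_m, threshold_m=1000):
    """Return p50, p90, p95, p99 and the share of distances above threshold_m."""
    summary = {f"p{p}": percentile(distances_m, p) for p in (50, 90, 95, 99)}
    above = sum(1 for d in distances_m if d > threshold_m)
    summary["share_above"] = above / len(distances_m)
    return summary
```

Calling `summarize` once per subgroup (for example per density category) would yield one row of summary values per subgroup.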
* Indicates significance (p < 0.05) for a comparison of discrepancy-distances across subgroups using a linear regression model
Figure 3. Discrepancy-distance distributions by category of (a) density and (b) poverty: This figure shows a smoothed kernel density (similar to a histogram) for discrepancy-distances by category. Large discrepancy-distances indicate disagreement between the single-stage and multi-stage geocoding methods. Density is categorized using the number of residents per square kilometer in the census tract. Poverty is categorized using the percent of residents below the poverty line in the census tract.
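A smoothed kernel density of the kind this caption mentions can be sketched with a Gaussian kernel. The kernel choice, the bandwidth argument, and the name `gaussian_kde` are assumptions for illustration; the caption does not say which kernel, bandwidth, or axis scale the figure used.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x.

    Each sample contributes a Gaussian bump of standard deviation
    `bandwidth`; the bumps are averaged so the estimate integrates to 1.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density
```

Evaluating one such density function per category on a shared grid of distances gives curves that can be overlaid for comparison, as in the figure's two panels.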
Multi-variable regression model of discrepancy-distance
| Two-fold increase | 0.87 (0.85–0.91) |
| Two-fold increase | 0.90 (0.88–0.91) |
| Local Parcels | 1.00 (reference) |
| Local Roads | 0.89 (0.82–0.96) |
| TIGER-based | 0.47 (0.44–0.51) |
Discrepancy-distance indicates distance between multi-stage and single-stage coordinates for the same address; TIGER indicates Topologically Integrated Geographic Encoding and Referencing system line files, a category that here includes NAVTEQ and Dynamap reference files.
* Ratios lower than one indicate that a category or characteristic was associated with a smaller discrepancy-distance (closer agreement between the two geocoding methods)
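One way to read a ratio reported per two-fold increase is to scale it to other fold changes. The sketch below assumes a multiplicative (log-linear) model in which the ratio applies to every doubling of the predictor; that model form and the name `ratio_for_fold_change` are assumptions for illustration and are not confirmed by the table.

```python
import math


def ratio_for_fold_change(ratio_per_doubling, fold_change):
    """Implied ratio of discrepancy-distance for a given fold change.

    Assumes the ratio multiplies once per doubling of the predictor, so a
    four-fold increase applies it twice and a 1.5-fold increase applies
    it log2(1.5) times.
    """
    return ratio_per_doubling ** math.log2(fold_change)
```

Under that assumption, a ratio of 0.87 per doubling implies a ratio of about 0.76 for a four-fold increase, since 0.87 × 0.87 = 0.7569.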
Supplemental geocoding results using satellite images
| | Single-stage method only | Multi-stage method only | Random sample |
| N | 40 | 372 | 1000 (a random sample) |
| n (%) | 24 (60) | 233 (63) | 962 (96) |
| 10th percentile | 17 | . | 10 |
| 25th percentile | 57 | . | 19 |
| 50th percentile (median) | 172 | . | 38 |
| 75th percentile | 871 | . | 83 |
| 90th percentile | 3402 | . | 162 |
| Proportion > 1000, % | 25.0 | . | 1.5 |
| 10th percentile | . | 6 | 6 |
| 25th percentile | . | 8 | 11 |
| 50th percentile (median) | . | 30 | 32 |
| 75th percentile | . | 67 | 65 |
| 90th percentile | . | 153 | 125 |
| Proportion > 1000, % | . | 3.9 | 1.8 |