Eric A Whitsel, P Miguel Quibrera, Richard L Smith, Diane J Catellier, Duanping Liao, Amanda C Henley, Gerardo Heiss.
Abstract
BACKGROUND: Published studies of geocoding accuracy often focus on a single geographic area, address source or vendor, do not adjust accuracy measures for address characteristics, and do not examine effects of inaccuracy on exposure measures. We addressed these issues in a Women's Health Initiative ancillary study, the Environmental Epidemiology of Arrhythmogenesis in WHI.
Year: 2006 PMID: 16857050 PMCID: PMC1557664 DOI: 10.1186/1742-5573-3-8
Source DB: PubMed Journal: Epidemiol Perspect Innov ISSN: 1742-5573
Figure 1. Location of the 3,615 addresses. EPA = United States Environmental Protection Agency Air Quality System monitors. NGS = United States National Geodetic Survey stations. WHI = Women's Health Initiative clinical trial participant residential parcels.
Characteristics of the 3,615 addresses

| Characteristic | Category | n (%) |
|---|---|---|
| Address Source | EPA | 2,522 (70) |
| | WHI | 1,050 (29) |
| | NGS | 43 (1) |
| Address Typeᵃ | Complete | 2,808 (78) |
| | No Street Number | 460 (13) |
| | Intersection | 347 (10) |
| Zip Code | Absent | 2,359 (65) |
| | Present | 1,256 (35) |
| Edit | Unedited | 1,533 (42) |
| | Minor | 1,392 (39) |
| | Major | 690 (19) |
| Densityᵇ | persons/km², mean (SD) | 1,066 (2,645) |
| Original Datumᶜ | NAD83 or WGS84 | 1,615 (45) |
| | Unknown | 1,274 (35) |
| | NAD27 | 726 (20) |

ᵃ Complete = street number, name, city and state present; No Street Number = street name, city and state present; Intersection = crossing street names, city and state present.
ᵇ 33rd and 67th percentiles = 221 and 920 persons/km².
ᶜ Of associated coordinates: NAD83 and NAD27 = North American Datum of 1983 and 1927; WGS84 = World Geodetic System of 1984.
Characteristics of the four vendors
| A | Yes | 40 ft | Yes | 2002 | 2004 | Yes | 4×/yr | WGS84 | No |
| B | No | 5 ft | Yes | 2002 | 2004 | Yes | 4×/yr | NAD83 | No |
| C | Yes | 50 ft | No | 2002 | 2004 | Yes | 6×/yr | NAD83 | Yes |
| D | No | 0 ft | No | 2002 | 2003 | No | 2×/yr | NAD83 | No |
ᵃ Of assigned coordinates: NAD83 = North American Datum of 1983. WGS84 = World Geodetic System of 1984.
ᵇ After initial processing by geocoding software.
CASS = Address standardization certified by the United States Postal Service National Customer Support Center Certification Program, Coding Accuracy Support System. TIGER = Topologically Integrated Geographic Encoding and Referencing (TIGER/Line®) file. USPS = United States Postal Service files, e.g., the city-state, ZIP+4® and ZIPMove products.
Accuracy of geocodes assigned by the four vendors
| A | 98% | 79% | 20% | 77% | 85% | 99% | 1809 (8790) |
| B | 82% | 78% | 4% | 83% | 88% | 99% | 748 (4611) |
| C | 81% | 77% | 4% | 81% | 87% | 99% | 704 (4418) |
| D | 30% | 30% | 0% | 97% | 98% | 100% | 228 (884) |
ᵃ Due to rounding, may differ from the sum of the street- and centroid-type match rates.
ᵇ Geographic or delivery-weighted center of a statistical tabulation area, e.g., U.S. Census tract.
ᶜ Spherical distance in meters between criterion standard and vendor-assigned coordinates (mean [standard deviation]).
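Footnote c defines ρ as the spherical distance between the criterion standard and vendor-assigned coordinates, summarized as mean (standard deviation). The source does not give the formula or the Earth radius it used. The sketch below assumes the haversine great-circle formula on a sphere with the IUGG mean radius (6,371,008.8 m), and it assumes both coordinates share a datum. The helper names `spherical_distance_m` and `rho_summary` are hypothetical.

```python
import math
import statistics

# Assumption: spherical Earth with the IUGG mean radius, in meters.
# The source does not say which radius (if any) was used.
EARTH_RADIUS_M = 6_371_008.8


def spherical_distance_m(lat1, lon1, lat2, lon2, radius_m=EARTH_RADIUS_M):
    """Great-circle (haversine) distance in meters between two points
    given in decimal degrees. Both points are assumed to share a datum."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against floating-point values slightly above 1.
    return 2 * radius_m * math.asin(math.sqrt(min(1.0, h)))


def rho_summary(pairs):
    """Mean and sample standard deviation of rho, as in the 'mean (SD)' cells.

    `pairs` holds ((lat, lon) criterion, (lat, lon) vendor) tuples;
    at least two pairs are needed for the standard deviation.
    """
    rhos = [spherical_distance_m(*crit, *vend) for crit, vend in pairs]
    return statistics.mean(rhos), statistics.stdev(rhos)
```

With this radius, one degree of latitude along a meridian comes out at about 111.2 km. A datum mismatch (for example, NAD27 versus NAD83 coordinates) would appear as additional ρ that the formula itself cannot detect.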
Figure 2. Distribution of the spherical distance in meters (ρ) between criterion standard and vendor-assigned coordinates. Column I: Scatterplots in which Xs and center points represent vendor-assigned and criterion standard coordinates, respectively. Columns II and III: Normalized frequency histograms before (II) and after (III) log-transformation. Columns I and II exclude outlying values to allow equal cross-vendor scaling of axes in meters. n = sample size. sd = standard deviation.
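Columns II and III of Figure 2 compare normalized histograms of ρ before and after log-transformation. Below is a minimal standard-library sketch of that comparison. The base-10 logarithm, the bin widths and the +1 m offset (which keeps ρ = 0 finite) are assumptions, and the sample distances are illustrative only, not study data. `normalized_histogram` and `log_transform` are hypothetical names.

```python
import math
from collections import Counter


def normalized_histogram(values, bin_width):
    """Relative frequency per bin, keyed by each bin's lower edge.

    Bins are left-closed, [k*w, (k+1)*w); the frequencies sum to 1.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return {k * bin_width: c / n for k, c in sorted(counts.items())}


def log_transform(rhos, offset_m=1.0):
    """log10(rho + offset); the offset is a hypothetical choice so rho = 0 is defined."""
    return [math.log10(r + offset_m) for r in rhos]


# Illustrative, right-skewed distances in meters (not study data).
rhos = [12, 40, 85, 150, 230, 410, 900, 2_500, 9_000, 48_000]
raw_hist = normalized_histogram(rhos, bin_width=5_000)
log_hist = normalized_histogram(log_transform(rhos), bin_width=0.5)
```

On the raw scale, 80% of these values land in the first 5 km bin. After the transform they spread across several bins, which is the contrast the two histogram columns are meant to show.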
Overall match rate, census tract concordance and ρ, by address and match characteristics

| Characteristic | Category | Overall match rate | Census tract concordance | ρᵃ |
|---|---|---|---|---|
| Address Source | EPA | 62% | 47% | 1,619 (7,904) |
| | NGS | 88% | 72% | 1,125 (3,711) |
| | WHI | 98% | 97% | 159 (409) |
| Address Type | No Street Number | 28% | 8% | 5,111 (6,150) |
| | Intersection | 60% | 43% | 1,259 (6,270) |
| | Complete | 82% | 73% | 793 (6,063) |
| Zip Code | Absent | 60% | 45% | 1,609 (8,205) |
| | Present | 96% | 92% | 376 (1,634) |
| Edit | Major | 59% | 45% | 2,622 (10,029) |
| | Minor | 70% | 58% | 828 (3,833) |
| | Unedited | 81% | 73% | 688 (5,877) |
| Densityᵇ (persons/km²) | Rural, 0–221 | 65% | 54% | 2,069 (8,280) |
| | Suburban, 222–920 | 79% | 71% | 566 (6,172) |
| | Urban, > 920 | 74% | 60% | 485 (2,319) |
| Datumᶜ | Unknown | 60% | 43% | 1,600 (8,612) |
| | NAD27 | 64% | 51% | 1,475 (6,619) |
| | NAD83 or WGS84 | 87% | 81% | 590 (3,961) |
| Match Type | Centroid | 100% | 34% | 5,331 (9,207) |
| | Street | 100% | 90% | 607 (5,577) |

ᵃ Spherical distance in meters between criterion standard and vendor-assigned coordinates (mean [standard deviation]).
ᵇ Stratified at the 33rd and 67th percentiles.
ᶜ Original datum of coordinates. NAD27 and NAD83 = North American Datum of 1927 and 1983. WGS84 = World Geodetic System of 1984.
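Population density is stratified at its 33rd and 67th percentiles, which are 221 and 920 persons/km². The sketch below shows that assignment. It assumes that a value equal to a cut point belongs to the lower stratum, consistent with the 0–221 and 222–920 ranges. It uses `statistics.quantiles` to estimate cut points from a sample, and that function's default interpolation method may not match the study's percentile definition. `density_stratum` and `tertile_cut_points` are hypothetical helper names.

```python
import statistics

# Cut points reported in the source (persons/km^2).
P33, P67 = 221.0, 920.0


def density_stratum(persons_per_km2, p33=P33, p67=P67):
    """Map a population density to the Rural/Suburban/Urban strata.

    Assumption: a value equal to a cut point goes to the lower stratum.
    """
    if persons_per_km2 < 0:
        raise ValueError("density cannot be negative")
    if persons_per_km2 <= p33:
        return "Rural"
    if persons_per_km2 <= p67:
        return "Suburban"
    return "Urban"


def tertile_cut_points(densities):
    """Estimate the 33rd/67th percentile cut points from a sample.

    statistics.quantiles(n=3) returns the two tertile boundaries; its
    default 'exclusive' method is an assumption, not the study's method.
    """
    p33, p67 = statistics.quantiles(densities, n=3)
    return p33, p67
```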
Spherical distance in meters (ρ) between criterion standard and vendor-assigned coordinates (mean [standard deviation]), by match type and vendor
| Street | A | 293 (564) | 272 (476) | 280 (492) | NA |
| B | 287 (545) | 262 (438) | 268 (447) | NA | |
| C | 288 (551) | 266 (456) | 275 (471) | NA | |
| Centroid | A | 6,375 (10,437) | 6,194 (9,473) | 5,630 (8,576) | 5,497 (8,345) |
| B | 4,854 (27,279) | 3,663 (15,948) | 4,230 (18,730) | 4,303 (19,185) | |
| C | 5,524 (34,703) | 3,298 (13,068) | 3,900 (15,943) | 4,210 (17,638) | |
ᵃ Adjusted for address source, type, zip code, edit, population density (persons/km²) and datum.
ᵇ Also adjusted for within-address correlation of ρ.
ᶜ Additionally adjusted for among-vendor heteroscedasticity of ρ (see methods). NA = not applicable.
Odds ratios (95% confidence intervals) for overall address match and census tract concordance, by vendor
| A | 12 (9, 15) | 66 (47, 93) | 0.8 (0.7, 0.9) | 1.0 (0.9, 1.2) |
| B | 1.1 (0.9, 1.2) | 1.1 (0.9, 1.3) | 1.1 (0.9, 1.2) | 1.1 (0.9, 1.3) |
| C | 1.0 | 1.0 | 1.0 | 1.0 |
ᵃ Adjusted for address source, type, zip code, edit, population density, and datum.
ᵇ Also adjusted for match type.
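These odds ratios are model-based, and the source does not say how their intervals were computed. Suppose, as an assumption, that they are Wald intervals on the log-odds scale. Then a coefficient β with standard error SE maps to OR = exp(β) with limits exp(β ± 1.96·SE), and the SE can be approximately recovered from a published interval. Both helper names below are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wald_or_ci(beta, se, z=Z_95):
    """Odds ratio and Wald CI from a log-odds coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_or_ci(lower, upper, z=Z_95):
    """Approximate SE of log(OR) from a 95% CI that is symmetric on the log scale.

    Rounding of the published limits caps the precision of this estimate.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Vendor A, first column of the table: 12 (9, 15).
se_a = se_from_or_ci(9, 15)
```

For vendor A's 12 (9, 15), this gives an SE of about 0.13. The geometric midpoint of that interval, √(9 × 15) ≈ 11.6, differs from 12 only through rounding.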
Effect of mean ρᵃ on classification of distance to the nearest highwayᵇ, exposure misclassification ratesᶜ and census tract concordanceᵈ

| Match Type | Mean ρ (m) | Exposed (< 100 m) | False + | False – | Total error | Census tract concordance |
|---|---|---|---|---|---|---|
| Street | 0 | 27% | 0% | 0% | 0% | 100% |
| | 150 | 29% | 8% | 6% | 15% | 90% |
| | 300 | 26% | 11% | 11% | 22% | 82% |
| | 600 | 27% | 15% | 14% | 29% | 70% |
| Centroid | 0 | 32% | 0% | 0% | 0% | 100% |
| | 2,500 | 19% | 9% | 22% | 31% | 66% |
| | 5,000 | 16% | 9% | 25% | 33% | 55% |
| | 10,000 | 14% | 8% | 26% | 34% | 42% |

ᵃ Spherical distance in meters between criterion standard and vendor-assigned coordinates. Standard deviation of ρ = 500 and 15,000 meters for street- and centroid-type matches, respectively.
ᵇ Interstate, U.S., or state highway or major traffic thoroughfare.
ᶜ False + indicates misclassification of the unexposed (≥ 100 m) as exposed (< 100 m). False – indicates misclassification of the exposed as unexposed. The sum of false + and – error rates may not equal the total error rate due to rounding.
ᵈ Percent of census tracts matching those in the datasets without positional error (ρ = 0). Based on a 5% random sample of street-type address matches (n = 2,608) and a census of centroid-type address matches (n = 2,671) in The Environmental Epidemiology of Arrhythmogenesis in WHI, 1999–2002.
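The rows of this table hang together if the exposure column and the error rates share one denominator (all addresses). Under that reading, at each mean ρ the percentage classified < 100 m equals the ρ = 0 percentage plus false + minus false –, and the total error equals false + plus false –, each within rounding. This reading is an inference from the numbers, not a statement in the source. The check below applies it to the published rows; `check_row` is a hypothetical name.

```python
# Rows transcribed from the table:
# (match type, mean rho in m, % exposed, % false +, % false -, % total error)
ROWS = [
    ("Street", 150, 29, 8, 6, 15),
    ("Street", 300, 26, 11, 11, 22),
    ("Street", 600, 27, 15, 14, 29),
    ("Centroid", 2_500, 19, 9, 22, 31),
    ("Centroid", 5_000, 16, 9, 25, 33),
    ("Centroid", 10_000, 14, 8, 26, 34),
]
# % classified < 100 m with no positional error (rho = 0).
BASELINE = {"Street": 27, "Centroid": 32}


def check_row(match_type, exposed, false_pos, false_neg, total, tol=1):
    """True if the row satisfies both identities within `tol` percentage points."""
    predicted_exposed = BASELINE[match_type] + false_pos - false_neg
    return (abs(predicted_exposed - exposed) <= tol
            and abs((false_pos + false_neg) - total) <= tol)


results = {(m, rho): check_row(m, e, fp, fn, t) for m, rho, e, fp, fn, t in ROWS}
```

Every published row passes with a 1-point tolerance. That supports, but does not prove, reading the third column as the percentage of addresses classified as exposed.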
Cell counts from a hypothetical case-control study of the association between distance to the nearest highway and coronary heart disease mortality

| Distance to nearest highway | Cases | Controls |
|---|---|---|
| < 100 m | a* = 88 | b* = 108 |
| ≥ 100 m | c* = 137 | d* = 294 |

OR* = (a* × d*) ÷ (b* × c*) = 1.8
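The odds ratio follows directly from the cell counts. The minimal sketch below assumes the usual layout, in which a and b are exposed cases and controls. It adds a Woolf (log-based) 95% confidence interval, which the source does not report. With the displayed integer counts, (88 × 294) ÷ (108 × 137) ≈ 1.749. The printed 1.8 may reflect unrounded underlying counts, which the source does not show. `odds_ratio_woolf` is a hypothetical name.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def odds_ratio_woolf(a, b, c, d, z=Z_95):
    """OR = (a*d)/(b*c) with a Woolf 95% CI.

    Assumed layout: a, b = exposed (< 100 m) cases and controls;
    c, d = unexposed (>= 100 m) cases and controls. All cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


or_star, lower, upper = odds_ratio_woolf(88, 108, 137, 294)
```

For these counts, the Woolf interval runs from roughly 1.2 to 2.5.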