Frank C Curriero, Martin Kulldorff, Francis P Boscoe, Ann C Klassen.
Abstract
BACKGROUND: The importance of geography as a source of variation in health research continues to receive sustained attention in the literature. Including geographic information in such research often begins with placing data on a map, which presupposes some knowledge of location. Precise spatial information is conventionally obtained through geocoding, the geographic information system (GIS) process of translating mailing address information into coordinates on a map. Geocoding has limitations, however: there is always a percentage of addresses that cannot be converted successfully (nongeocodable). This raises concerns about bias, since the traditional practice has been to exclude nongeocoded records from analysis. METHODOLOGY/PRINCIPAL
Year: 2010 PMID: 20161766 PMCID: PMC2818716 DOI: 10.1371/journal.pone.0008998
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The 1990 Census geography for Maryland zip code 21237.
Shown are the 13 Census tracts and 2 counties (Baltimore County and Baltimore City) associated with zip code 21237. Census tract 24005450100 is highlighted with one of its block groups (240054501003) and one of its blocks (24005450100315) identified.
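The nesting of tract, block group, and block in the caption can be read directly off the identifiers. The sketch below assumes the usual concatenated layout (2-digit state, 3-digit county, 6-digit tract, 3-digit block whose first digit is the block group); the helper names `split_geoid` and `is_nested_in` are illustrative and not taken from the paper.

```python
def split_geoid(geoid: str) -> dict:
    """Split a Census identifier into its nested components.

    Assumed layout: state(2) + county(3) + tract(6) + block(3),
    where the first block digit is the block group.
    """
    parts = {"state": geoid[:2], "county": geoid[2:5]}
    if len(geoid) >= 11:
        parts["tract"] = geoid[5:11]
    if len(geoid) >= 12:
        parts["block_group"] = geoid[11]
    if len(geoid) >= 14:
        parts["block"] = geoid[11:14]
    return parts


def is_nested_in(child: str, parent: str) -> bool:
    """A unit lies inside another when its identifier extends the parent's."""
    return child.startswith(parent)


# Identifiers quoted in the Figure 1 caption.
tract = "24005450100"
block_group = "240054501003"
block = "24005450100315"

block_parts = split_geoid(block)
nested = is_nested_in(block, block_group) and is_nested_in(block_group, tract)
print(block_parts, nested)
```

Under this layout the highlighted block sits in block group 3 of tract 450100 in county 005 of state 24.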
Figure 2. Maryland county level percent nongeocodes and urban/rural status.
Maryland county-level percent of nongeocoded case records (the ratio of nongeocoded cases to the total of nongeocoded plus geocoded cases), based on the geocoded 1992–1997 Maryland prostate cancer data subset. The inset map shows the 1990 US Census county-level urban/rural categorization, from the most urban (Baltimore Region) to the most rural (Eastern Shore).
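The percentage mapped in Figure 2 is a simple proportion. A minimal sketch of that calculation follows; the function name `percent_nongeocoded` and the counts used in the example are hypothetical and are not values from the paper.

```python
def percent_nongeocoded(nongeocoded: int, geocoded: int) -> float:
    """Percent of case records that could not be geocoded."""
    total = nongeocoded + geocoded
    if total == 0:
        raise ValueError("no case records for this county")
    return 100.0 * nongeocoded / total


# Hypothetical counts for illustration only.
example_percent = percent_nongeocoded(nongeocoded=25, geocoded=75)
print(example_percent)
```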
Imputation results for Strategy 3 at the county level stratified by urban/rural geographic region.
| Geographic Region | True Cases | Geocoded Cases | Nongeocoded Cases | Imputed Total | 95% Imputation Interval | Ratio (Imputed/True) |
|---|---|---|---|---|---|---|
| Baltimore Region | | | | | | |
| | 468 | 246 | 222 | 472.2 | (464, 480) | 1.01 |
| | 475 | 288 | 187 | 472.5 | (444, 481) | 0.99 |
| | 583 | 341 | 242 | 586.4 | (580, 592) | 1.01 |
| | 1344 | 735 | 609 | 1340.2 | (1333, 1347) | 1.00 |
| | 2788 | 2001 | 787 | 2824.9 | (2806, 2843) | 1.01 |
| | 3081 | 2032 | 1049 | 3044.6 | (3025, 3064) | 0.99 |
| Suburban Washington | | | | | | |
| | 388 | 187 | 201 | 389.7 | (383, 397) | 1.00 |
| | 1880 | 1241 | 639 | 1879.8 | (1875, 1885) | 1.00 |
| | 2366 | 1579 | 787 | 2363.9 | (2359, 2368) | 1.00 |
| Southern Maryland | | | | | | |
| | 62 | 16 | 46 | 63.3 | (61, 65) | 1.02 |
| | 117 | 54 | 63 | 115.9 | (113, 118) | 0.99 |
| | 185 | 71 | 114 | 184.7 | (182, 187) | 1.00 |
| Western Maryland | | | | | | |
| | 38 | 6 | 32 | 38.0 | (36, 41) | 1.00 |
| | 279 | 154 | 125 | 279.0 | (276, 281) | 1.00 |
| | 356 | 222 | 134 | 355.2 | (352, 358) | 1.00 |
| Eastern Shore | | | | | | |
| | 42 | 12 | 30 | 37.3 | (33, 41) | 0.89 |
| | 51 | 3 | 48 | 49.9 | (47, 52) | 0.98 |
| | 88 | 20 | 68 | 88.0 | (87, 88) | 1.00 |
| | 97 | 49 | 48 | 101.8 | (98, 106) | 1.05 |
| | 139 | 96 | 43 | 139.0 | (139, 139) | 1.00 |
| | 158 | 89 | 69 | 158.0 | (158, 158) | 1.00 |
| | 160 | 38 | 122 | 160.0 | (160, 160) | 1.00 |
| | 185 | 79 | 106 | 186.9 | (184, 189) | 1.01 |
| | 195 | 90 | 105 | 195.0 | (195, 195) | 1.00 |
Presented for each of the 24 Maryland counties are the true total case enumerations, the corresponding totals for records labeled as geocoded and as nongeocoded, and an imputed total with a 95% multiple imputation interval computed using Strategy 3. Units for all results are number of cases. Imputation intervals are starred when they contained the true total. The ratio of the imputed number of cases to the true number of cases is also listed. Results are based on the geocoded Maryland prostate cancer data, split into experimental geocodes and experimental nongeocodes subsets for evaluation.
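The columns of this table are related by simple arithmetic: the true total is the sum of the geocoded and nongeocoded counts, and the last column is the imputed total divided by the true total. The sketch below checks both relations on three rows copied from the table; the tuple layout and variable names are illustrative choices, not the authors' code.

```python
# (true, geocoded, nongeocoded, imputed) for three rows of the county table.
rows = [
    (468, 246, 222, 472.2),
    (42, 12, 30, 37.3),
    (97, 49, 48, 101.8),
]

ratios = []
for true_total, geocoded, nongeocoded, imputed in rows:
    # Each record in the evaluation split is either geocoded or nongeocoded.
    assert geocoded + nongeocoded == true_total
    # Ratio of imputed to true cases, rounded as in the table.
    ratios.append(round(imputed / true_total, 2))

print(ratios)  # [1.01, 0.89, 1.05]
```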
Results for Imputation Strategies 1, 2, and 3 at the Census county, tract, and block group level.
| Spatial Scale | Strategy 1 | Strategy 2 | Strategy 3 |
|---|---|---|---|
| County | | | |
| % Covered | 25.0% | 83.3% | 83.3% |
| Avg Interval Width | 15.9 | 10.0 | 9.4 |
| Tract | | | |
| % Covered | 80.0% | 86.2% | 90.5% |
| Avg Interval Width | 7.5 | 6.9 | 6.7 |
| Block Group | | | |
| % Covered | 93.7% | 94.0% | 95.8% |
| Avg Interval Width | 4.1 | 3.9 | 3.8 |
Presented are the percentage of times the 95% multiple imputation intervals contained the true total case enumerations and the width of the imputation-based intervals (in units of number of cases), averaged across the 24 Maryland counties, 1151 tracts, and 3670 block groups. Results are based on the geocoded Maryland prostate cancer data, split into experimental geocodes and experimental nongeocodes subsets for evaluation.
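The two summary measures can be sketched as follows, using the 24 county-level Strategy 3 rows listed earlier as (true total, lower bound, upper bound). Two assumptions apply: intervals are treated as closed, and the endpoints are the rounded values printed in the table, so the average width computed here need not match the reported figure exactly. Function and variable names are illustrative.

```python
# (true, lower, upper) for the 24 counties, Strategy 3, as printed.
county_rows = [
    (468, 464, 480), (475, 444, 481), (583, 580, 592),
    (1344, 1333, 1347), (2788, 2806, 2843), (3081, 3025, 3064),
    (388, 383, 397), (1880, 1875, 1885), (2366, 2359, 2368),
    (62, 61, 65), (117, 113, 118), (185, 182, 187),
    (38, 36, 41), (279, 276, 281), (356, 352, 358),
    (42, 33, 41), (51, 47, 52), (88, 87, 88),
    (97, 98, 106), (139, 139, 139), (158, 158, 158),
    (160, 160, 160), (185, 184, 189), (195, 195, 195),
]


def percent_covered(rows):
    """Percent of intervals containing the true total (closed intervals assumed)."""
    hits = sum(1 for true_total, lo, hi in rows if lo <= true_total <= hi)
    return 100.0 * hits / len(rows)


def average_width(rows):
    """Mean interval width, in number of cases."""
    return sum(hi - lo for _, lo, hi in rows) / len(rows)


coverage = percent_covered(county_rows)
mean_width = average_width(county_rows)
print(round(coverage, 1))  # 83.3
print(mean_width)
```

With these rows, 20 of the 24 intervals contain the true total, which agrees with the 83.3% reported for Strategy 3 at the county level.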
Results for Imputation Strategies 1, 2, and 3 at the Census tract level stratified by urban/rural geographic region.
| Geographic Region | Strategy 1 | Strategy 2 | Strategy 3 |
|---|---|---|---|
| Baltimore Region | | | |
| % Covered | 80.9% | 86.3% | 92.5% |
| Avg Width | 7.9 | 7.4 | 7.2 |
| Suburban Washington | | | |
| % Covered | 82.5% | 84.9% | 90.1% |
| Avg Width | 7.1 | 6.7 | 6.5 |
| Southern Maryland | | | |
| % Covered | 84.8% | 93.5% | 91.3% |
| Avg Width | 6.5 | 5.5 | 5.0 |
| Western Maryland | | | |
| % Covered | 78.7% | 83.6% | 85.2% |
| Avg Width | 7.4 | 6.6 | 6.5 |
| Eastern Shore | | | |
| % Covered | 65.7% | 89.2% | 85.2% |
| Avg Width | 7.0 | 5.9 | 5.5 |
Presented are the percentage of times the 95% multiple imputation intervals contained the true total case enumerations and the width of the imputation-based intervals (in units of number of cases), both averaged across the tracts within each region. Results are based on the geocoded Maryland prostate cancer data, split into experimental geocodes and experimental nongeocodes subsets for evaluation.