Yolanda J McDonald, Michael Schwind, Daniel W Goldberg, Amanda Lampley, Cosette M Wheeler.
Abstract
Geocoding is the science and process of assigning geographical coordinates (i.e., latitude and longitude) to a postal address. The quality of the geocode can vary dramatically depending on several variables, including incorrect input address data, missing address components, and spelling mistakes. A dataset with a considerable number of geocoding inaccuracies can potentially result in an imprecise analysis and invalid conclusions. There has been little quantitative analysis of the amount of effort (i.e., time) required to perform geocoding correction, or of how such correction could improve geocode quality type. This study used a low-cost and easy-to-implement method to improve the geocode quality type of an input database (i.e., addresses to be matched) through manual geocode intervention. It assessed the amount of effort needed to manually correct inaccurate geocodes, reported the resulting match rate improvement between the original and the corrected geocodes, and documented the corresponding spatial shift by geocode quality type resulting from the corrections. Findings demonstrated that manual intervention of geocoding resulted in a 90% improvement of geocode quality type, took 42 hours to process, and produced spatial shifts ranging from 0.02 to 151,368 m. This study provides evidence to inform research teams considering the application of manual geocoding intervention that it is a low-cost and relatively easy process to execute.
Year: 2017 PMID: 28555477 PMCID: PMC5978681 DOI: 10.4081/gh.2017.526
Source DB: PubMed Journal: Geospat Health ISSN: 1827-1987 Impact factor: 1.212
Figure 1. Manual geocode correction tool interface.
Figure 2. Prompt for new accuracy description.
Geocode quality types and descriptions ranked from most to least accurate and geocode quality types of the original and corrected dataset.
| Quality type | Description | Original quality type (N=784), n | Original quality type, % | Corrected quality type (N=784), n | Corrected quality type, % |
|---|---|---|---|---|---|
| Building centroid | Matched to the centroid of the building | 0 | 0.00 | 638 | 81.38 |
| Exact parcel centroid point | Matched to the centroid of the parcel | 194 | 24.75 | 44 | 5.61 |
| Address range interpolation | Uses information about the address number ranges to estimate the position of a numbered address | 386 | 49.23 | 79 | 10.08 |
| Street centroid | Matched to the centroid of the street | 0 | 0.00 | 18 | 2.29 |
| USPS zip centroid | Matched to the zip code area centroid | 204 | 26.02 | 4 | 0.51 |
| City centroid | Matched to the centroid of the city | 0 | 0.00 | 1 | 0.13 |
| State centroid | Matched to the centroid of the state | 0 | 0.00 | 0 | 0.00 |
USPS, United States Postal Service.
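The counts in the table above can be compared directly to see how records moved between quality types. The following is a minimal sketch, not the authors' code: the counts are copied from the table, while the dictionary and function names are illustrative assumptions.

```python
# Counts per geocode quality type, copied from the table above and
# ordered from most to least accurate (names below are illustrative).
original = {
    "Building centroid": 0,
    "Exact parcel centroid point": 194,
    "Address range interpolation": 386,
    "Street centroid": 0,
    "USPS zip centroid": 204,
    "City centroid": 0,
    "State centroid": 0,
}
corrected = {
    "Building centroid": 638,
    "Exact parcel centroid point": 44,
    "Address range interpolation": 79,
    "Street centroid": 18,
    "USPS zip centroid": 4,
    "City centroid": 1,
    "State centroid": 0,
}


def net_change(before, after):
    """Return the change in record count for each quality type."""
    return {quality: after[quality] - before[quality] for quality in before}


changes = net_change(original, corrected)

if __name__ == "__main__":
    for quality, delta in changes.items():
        print(f"{quality:30s} {original[quality]:4d} -> {corrected[quality]:4d} ({delta:+d})")
```

Because every record keeps exactly one quality type before and after correction, the net changes sum to zero; the gain at the building centroid level is offset by losses at the parcel, address range, and zip centroid levels.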
Geocode quality types of the original and corrected dataset and spatial shift improvement by each geocode quality type correction.
| Old geocode quality type | New geocode quality type | N (total=703) | % | Spatial shift (m): mean | Spatial shift (m): median | Spatial shift (m): IQR (Q1, Q3) | Spatial shift (m): minimum | Spatial shift (m): maximum |
|---|---|---|---|---|---|---|---|---|
| Address range interpolation | Building centroid | 323 | 45.95 | 355.22 | 105.88 | (54.21, 221.96) | 3.49 | 33936.56 |
| Address range interpolation | Exact parcel centroid | 10 | 1.42 | 253.77 | 72.32 | (42.75, 130.22) | 7.04 | 1904.97 |
| Exact parcel centroid | Building centroid | 171 | 24.32 | 116.62 | 11.66 | (2.29, 27.25) | 0.02 | 8260.35 |
| USPS zip centroid | Building centroid | 143 | 20.34 | 5070.82 | 3094.47 | (1446.09, 5455.60) | 191.04 | 54717.53 |
| USPS zip centroid | Exact parcel centroid | 14 | 1.99 | 9903.80 | 5669.26 | (3036.69, 11614.65) | 871.14 | 41691.95 |
| USPS zip centroid | Address range interpolation | 29 | 4.13 | 6581.60 | 3405.08 | (858.99, 12227.95) | 114.31 | 23920.18 |
| USPS zip centroid | Street centroid | 13 | 1.85 | 22956.72 | 11708.03 | (3959.76, 20884.24) | 1734.06 | 151367.94 |
| All corrections | | 703 | | 1963.18 | 113.81 | (24.64, 940.39) | 0.02 | 151367.94 |
USPS, United States Postal Service.
Geocode quality type change of N≥5;
IQR, interquartile range.
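A spatial shift of this kind is the distance between the original and the corrected coordinates of the same address. The sketch below shows one way to compute it and the summary statistics reported in the table. It rests on assumptions not stated in this record: great-circle (haversine) distance on a spherical Earth, Python's default quartile convention, and made-up coordinate pairs. The study may have used a different distance measure or quartile method.

```python
import math
import statistics

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical model, an assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def summarize_shifts(shifts):
    """Summary statistics matching the table columns (needs at least two values)."""
    q1, _, q3 = statistics.quantiles(shifts, n=4)  # default 'exclusive' quartile method
    return {
        "n": len(shifts),
        "mean": statistics.mean(shifts),
        "median": statistics.median(shifts),
        "q1": q1,
        "q3": q3,
        "min": min(shifts),
        "max": max(shifts),
    }


# Hypothetical (original, corrected) coordinate pairs, for illustration only.
pairs = [
    ((35.0000, -106.0000), (35.0010, -106.0000)),
    ((35.1000, -106.5000), (35.1002, -106.5003)),
    ((34.9000, -106.2000), (34.9500, -106.2500)),
]

shifts = [haversine_m(*old, *new) for old, new in pairs]

if __name__ == "__main__":
    for key, value in summarize_shifts(shifts).items():
        print(f"{key}: {value:.2f}")
```

Grouping the shifts by the pair of old and new quality types before calling `summarize_shifts` would give one row per correction type, as in the table. The median and IQR are more informative than the mean here because a few very large shifts pull the mean far above the typical value.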
Figure 3. Spatial shift from original geocode to corrected geocode.