Paul A Zandbergen, Joseph W Green.
Abstract
BACKGROUND: The widespread availability of powerful tools in commercial geographic information system (GIS) software has made address geocoding a widely employed technique in spatial epidemiologic studies.
Keywords: bias; children; data quality; error sources; geocoding; geographic information systems (GIS); positional accuracy; schools; vehicle emissions
Year: 2007 PMID: 17805429 PMCID: PMC1964899 DOI: 10.1289/ehp.9668
Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 9.031
Figure 1. Locations of schools and major road network in Orange County, Florida (major roads with AADT ≥ 25,000 vehicles per day).
Figure 2. Cumulative distribution functions of positional error in geocoded locations of schools in Orange County, Florida (n = 126).
Summary statistics for the positional error (in meters) of geocoded locations of schools (n = 126) in Orange County, Florida, using four different techniques.
| Statistics | Centerlines | TIGER | Firm A | Firm B |
|---|---|---|---|---|
| Mean | 219 | 351 | 300 | 461 |
| Median | 155 | 178 | 153 | 151 |
| Standard deviation | 272 | 604 | 602 | 2,330 |
| Minimum | 50 | 49 | 48 | 39 |
| Maximum | 2,302 | 4,379 | 5,565 | 2,596 |
| 90th percentile | 211 | 271 | 238 | 218 |
| 95th percentile | 227 | 302 | 255 | 237 |
| 95% RMSE | 196 | 306 | 235 | 210 |
95% RMSE is the root mean square error of the error distribution after removing 5% outliers. It is more common to use the 100% RMSE, but for non-normally distributed data the removal of 5% outliers before determining the RMSE value produces a more robust accuracy statistic.
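The trimmed statistic described in this note can be illustrated with a minimal sketch. It assumes the "5% outliers" are simply the largest 5% of positional errors, which the text does not state explicitly. The function name `trimmed_rmse` and its `drop_fraction` parameter are hypothetical names introduced here for illustration.

```python
import math


def trimmed_rmse(errors, drop_fraction=0.05):
    """RMSE of positional errors after discarding the largest ones.

    Assumption: outliers are the largest `drop_fraction` share of the
    errors; the source does not say how outliers were identified.
    """
    if not errors:
        raise ValueError("errors must be non-empty")
    ordered = sorted(abs(e) for e in errors)
    # Number of values to discard from the upper tail.
    n_drop = int(round(len(ordered) * drop_fraction))
    # Always keep at least one value.
    kept = ordered[: max(1, len(ordered) - n_drop)]
    return math.sqrt(sum(e * e for e in kept) / len(kept))


# Usage: one extreme value dominates the 100% RMSE but not the 95% RMSE.
sample = [3.0] * 19 + [1000.0]
rmse_95 = trimmed_rmse(sample)
rmse_100 = trimmed_rmse(sample, drop_fraction=0.0)
```

With `drop_fraction=0.0` the function reduces to the ordinary (100%) RMSE, so the two statistics can be compared on the same data.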
Figure 3. Cumulative distribution functions of distance to major roads of school locations in Orange County, Florida (n = 126) mapped using five different techniques.
Bias and error in determining schools (n = 126) at risk based on proximity to major roads in Orange County, Florida.
Column groups: "School buildings" through "Confirmed negatives" give the no. of schools within the buffer zone; "Prevalence (%)" through "Specificity (%)" are measures of agreement.

| Geocoding type, buffer radius (m) | School buildings | Street geocoding | Confirmed positives | False negatives | False positives | Confirmed negatives | Prevalence (%) | False negatives (%) | False positives (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Street centerlines | |||||||||||
| 50 | 1 | 3 | 0 | 1 | 3 | 122 | 0.79 | 0.79 | 2.38 | 0.00 | 97.60 |
| 100 | 3 | 5 | 1 | 2 | 4 | 119 | 2.38 | 1.59 | 3.17 | 33.33 | 96.75 |
| 150 | 6 | 9 | 4 | 2 | 5 | 115 | 4.76 | 1.59 | 3.97 | 66.67 | 95.83 |
| 250 | 17 | 20 | 12 | 5 | 8 | 101 | 13.49 | 3.97 | 6.35 | 70.59 | 92.66 |
| 500 | 44 | 44 | 42 | 2 | 2 | 80 | 34.92 | 1.59 | 1.59 | 95.45 | 97.56 |
| 1,000 | 69 | 71 | 66 | 3 | 5 | 52 | 54.76 | 2.38 | 3.97 | 95.65 | 91.23 |
| TIGER roads | |||||||||||
| 50 | 1 | 9 | 0 | 1 | 9 | 116 | 0.79 | 0.79 | 7.14 | 0.00 | 92.80 |
| 100 | 3 | 15 | 1 | 2 | 14 | 109 | 2.38 | 1.59 | 11.11 | 33.33 | 88.62 |
| 150 | 6 | 20 | 4 | 2 | 16 | 104 | 4.76 | 1.59 | 12.70 | 66.67 | 86.67 |
| 250 | 17 | 29 | 13 | 4 | 16 | 93 | 13.49 | 3.17 | 12.70 | 76.47 | 85.32 |
| 500 | 44 | 46 | 40 | 4 | 6 | 76 | 34.92 | 3.17 | 4.76 | 90.91 | 92.68 |
| 1,000 | 69 | 70 | 67 | 2 | 3 | 54 | 54.76 | 1.59 | 2.38 | 97.10 | 94.74 |
| Commercial Firm A | |||||||||||
| 50 | 1 | 5 | 0 | 1 | 5 | 120 | 0.79 | 0.79 | 3.97 | 0.00 | 96.00 |
| 100 | 3 | 11 | 1 | 2 | 10 | 113 | 2.38 | 1.59 | 7.94 | 33.33 | 91.87 |
| 150 | 6 | 14 | 4 | 2 | 10 | 110 | 4.76 | 1.59 | 7.94 | 66.67 | 91.67 |
| 250 | 17 | 23 | 11 | 6 | 12 | 97 | 13.49 | 4.76 | 9.52 | 64.71 | 88.99 |
| 500 | 44 | 46 | 39 | 5 | 7 | 75 | 34.92 | 3.97 | 5.56 | 88.64 | 91.46 |
| 1,000 | 69 | 72 | 66 | 3 | 6 | 51 | 54.76 | 2.38 | 4.76 | 95.65 | 89.47 |
| Commercial Firm B | |||||||||||
| 50 | 1 | 3 | 0 | 1 | 3 | 122 | 0.79 | 0.79 | 2.38 | 0.00 | 97.60 |
| 100 | 3 | 8 | 2 | 1 | 6 | 117 | 2.38 | 0.79 | 4.76 | 66.67 | 95.12 |
| 150 | 6 | 12 | 5 | 1 | 7 | 113 | 4.76 | 0.79 | 5.56 | 83.33 | 94.17 |
| 250 | 17 | 22 | 10 | 7 | 12 | 97 | 13.49 | 5.56 | 9.52 | 58.82 | 88.99 |
| 500 | 44 | 48 | 43 | 1 | 5 | 77 | 34.92 | 0.79 | 3.97 | 97.73 | 93.90 |
| 1,000 | 69 | 72 | 67 | 2 | 5 | 52 | 54.76 | 1.59 | 3.97 | 97.10 | 91.23 |
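Prevalence (%): number of schools located within the buffer radius as a percentage of all schools within the study area (n = 126).
False negatives (%): number of false negatives as a percentage of all schools within the study area (n = 126).
False positives (%): number of false positives as a percentage of all schools within the study area (n = 126).
Sensitivity (%): number of confirmed positives as a percentage of the schools whose buildings lie within the buffer (confirmed positives plus false negatives), as the tabulated values imply; for example, 12/17 = 70.59% for street centerlines at 250 m.
Specificity (%): number of confirmed negatives as a percentage of the schools whose buildings lie outside the buffer (confirmed negatives plus false positives), as the tabulated values imply; for example, 101/109 = 92.66% for street centerlines at 250 m.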
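The measures of agreement in the table can be reproduced from the four counts in each row. The sketch below uses the standard definitions of sensitivity and specificity, which match the tabulated values; the function name `agreement_measures` and the dictionary keys are hypothetical and not taken from the paper.

```python
def agreement_measures(cp, fn, fp, cn):
    """Percent measures from confirmed positives (cp), false negatives (fn),
    false positives (fp), and confirmed negatives (cn)."""
    total = cp + fn + fp + cn
    if total == 0 or (cp + fn) == 0 or (cn + fp) == 0:
        raise ValueError("counts must give non-empty positive and negative groups")
    return {
        # Schools whose buildings are inside the buffer, as a share of all schools.
        "prevalence": 100 * (cp + fn) / total,
        "false_negatives_pct": 100 * fn / total,
        "false_positives_pct": 100 * fp / total,
        # Share of truly exposed schools that street geocoding also places inside.
        "sensitivity": 100 * cp / (cp + fn),
        # Share of truly unexposed schools that street geocoding also places outside.
        "specificity": 100 * cn / (cn + fp),
    }


# Usage: street centerlines, 250 m buffer (counts taken from the table row).
row = agreement_measures(cp=12, fn=5, fp=8, cn=101)
rounded = {name: round(value, 2) for name, value in row.items()}
```

For this row the rounded results agree with the table: prevalence 13.49, false negatives 3.97, false positives 6.35, sensitivity 70.59, and specificity 92.66.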
Figure 4. Examples of positional error in geocoding of school locations in Orange County, Florida. (A) Effect of driveway. (B) Misplacement along street segment. (C) Effect of parcel size. (D) Combined effects.