Daniel W Goldberg, Morven Ballard, James H Boyd, Narelle Mullan, Carol Garfield, Diana Rosman, Anna M Ferrante, James B Semmens.
Abstract
BACKGROUND: Geocoding, the process of converting textual information describing a location into one or more digital geographic representations, is a routine task performed at large organizations and government agencies across the globe. In a health context, this task is often a fundamental first step performed prior to all other operations in a spatially-based health study. As such, the quality of the geocoding system used within these agencies is of paramount concern to the agency (the producer) and to the researchers or policy-makers who wish to use these data (the consumers). However, geocoding systems are continually evolving, with new products regularly coming onto the market. Agencies must develop and use criteria across a number of axes when faced with decisions about building, buying, or maintaining any particular geocoding system. To date, published criteria have focused on one or more aspects of geocode quality without taking a holistic view of a geocoding system's role within a large organization. The primary purpose of this study is to develop and test an evaluation framework to assist a large organization in determining which geocoding systems will meet its operational needs.
Year: 2013 PMID: 24207169 PMCID: PMC3834528 DOI: 10.1186/1476-072X-12-50
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Geocoding system quality metrics
| Percentage of all records capable of being geocoded | |
| Geographic levels of geocode match – building level, parcel level, street centroid level, postcode level, etc. and percentages of matchable geocodes at each level | |
| Frequency distribution of match scores for matchable geocodes | |
| Frequency distribution of distances between matchable geocodes and ground truth locations | |
| Frequency distribution of distances between the same geocode produced by multiple geocoding systems. | |
| Frequency distribution of administrative unit concordance between the same geocode produced by multiple geocoding systems. |
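The first and fourth metrics above (share of records geocoded, and distance from each geocode to its ground truth location) can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names, the bin edges, and the convention that an unmatched record is represented by `None` are all assumptions made for illustration.

```python
import math
from collections import Counter


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_rate(results):
    """Percentage of records for which a geocode was returned (None = no match)."""
    matched = sum(1 for r in results if r is not None)
    return 100.0 * matched / len(results)


def distance_histogram(results, truth, bin_edges_m=(10, 50, 100, 500, 1000)):
    """Count matched geocodes per distance-to-truth bin (assumed upper edges, metres)."""
    counts = Counter()
    for got, want in zip(results, truth):
        if got is None:
            continue  # unmatched records contribute to match_rate only
        d = haversine_m(got[0], got[1], want[0], want[1])
        label = next((f"<= {e} m" for e in bin_edges_m if d <= e),
                     f"> {bin_edges_m[-1]} m")
        counts[label] += 1
    return dict(counts)
```

The same histogram function could be applied to pairs of outputs from two geocoding systems, rather than to a system and ground truth, to approximate the fifth metric.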
Geocoding system operational capabilities metrics
| User defined reference layers | Is it possible to use any available reference data | |
| Specialized address parsing | Add in support for new street types, named places | |
| Specialized matching algorithms | Consider neighboring areas for matches | |
| Customized feature hierarchies | Hierarchy based on organizational policy | |
| Operating system support | Windows, Unix/Linux/Solaris | |
| System/Workflow Integration | Into tools and systems used by the organization (e.g., SAS wrappers) | |
| Varying operational modes | Batch/interactive/manual | |
| Desktop Version | Standalone product for highly sensitive data | |
| In-house Server Version | Internal server for multiple users within agency firewall | |
| API Version | Vendor or custom-written code for off-site processing | |
| Spatial confidence values | Descriptions of the region size (geographic area) that a geocode output is known to fall within | |
| Input address/matched address concordance | Descriptions of which attributes of the input address were incorrect, incomplete, partially matched or not used in the matching process | |
| Automatic batch geocoding | The ability to process a data file of records using a single process | |
| Interactive review | The ability to perform manual review for non-matched records to attempt to determine a correct output geocode | |
| Alias tables | The ability to incorporate tables of named places, common synonyms for street address attributes | |
| Weighted centroids | The ability to bias the output location of a geocode based on a known distribution of a characteristic of interest such as the distribution of population or specific subsets of a population in an area |
Geocoding system flexibility metrics
| Does the user have the ability to include his/her own custom reference data layers? Example – including one’s own parcel layer for a locality if it is available. | |
| Does the user have the ability to include his/her own custom address parsing rules? Example – including a parsing approach where the “St.” in “St. Patrick” is converted to “Saint” to provide higher match rates given a reference data source that has the term listed as “Saint”. | |
| Does the user have the ability to include his/her own custom matching rules? Example – Inspecting nearby postal codes for similarly named streets and providing a higher matching score for candidate match features that are found in adjacent postal codes and lower match scores for candidate match features found in non-adjacent postal codes. | |
| Does the user have the ability to include his/her own custom ordering of reference layers? Example – Adding the ability to search first in postal codes then in municipalities in urban regions (where postal codes are small and municipalities are big) and municipalities first then postal codes in rural regions (where municipalities are small and postal codes are large). |
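The custom parsing example above (reading the "St." in "St. Patrick" as "Saint") can be sketched as an ordered list of user-supplied rewrite rules. This is only an illustration under stated assumptions: `CUSTOM_RULES` and `normalize` are hypothetical names, and the "followed by a capitalised word" heuristic is an assumption, not a rule taken from any of the evaluated systems.

```python
import re

# Hypothetical user-supplied rules: (pattern, replacement), applied in order.
CUSTOM_RULES = [
    # "St" or "St." followed by another capitalised word is read as "Saint".
    (re.compile(r"\bSt\.?\s+(?=[A-Z])"), "Saint "),
    # Any remaining standalone "St" or "St." is read as the street type.
    (re.compile(r"\bSt\b\.?"), "Street"),
]


def normalize(address, rules=CUSTOM_RULES):
    """Apply each rewrite rule in turn and return the normalised address."""
    for pattern, replacement in rules:
        address = pattern.sub(replacement, address)
    return address
```

Rule order matters here: the "Saint" rule has to run before the generic street-type rule. The heuristic is deliberately naive and would misread an input such as "Main St Perth", which is the kind of case a configurable rule set lets an organization correct for its own data.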
Geocoding system integration metrics
| Does the system work on the operating system used by the organization? Example – Windows, Linux, Unix. | |
| Can the system be integrated into existing systems and workflows used by the organization? Example – A system that can be wrapped as a SAS component so it can be integrated into automated SAS data processing workflows already used by the organization. | |
| Does the system have the ability to geocode records in batch? Example – Uploading a large data set to a server and running the geocoding process over the whole file. | |
| Does the system have the ability to allow a user to interactively geocode records? Example – Displaying an interface that allows a user to geocode one record at a time. | |
| Does the system have the ability to allow a user to interactively geocode records that do not process correctly in batch mode? Example – Displaying an interface that lists records that did not match in batch processing and allows the user to research, correct, and re-geocode individual records one-by-one. |
Geocoding system interface metrics
| Does the system work on a desktop computer? | |
| Does the system work on a server? | |
| Does the system provide an API for which custom programs can be developed? |
Geocoding system cost metrics
| Price of licensing the software | |
| Additional costs for licensing reference data | |
| Cost of support contract | |
| Cost of full time equivalent support (in-house) | |
| Cost of full time equivalent development (in-house) | |
| Cost of full time equivalent maintenance (in-house) | |
| Cost of full time equivalent training (in-house) | |
| Full time equivalent specialization required (in-house) |
Geocoding system metadata metrics
| Does the system output spatial confidence intervals with each geocoded location? Example – Returning a buffer around the location within which the true geocode is known to be located | |
| Does the system return an indication of the similarity between the input address requested and the address of the geographic reference feature matched? Example – Providing a list of the input address attributes that matched or did not match the address attributes associated with the geographic reference feature used for interpolation |
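As a rough illustration of the second metadata metric, the sketch below compares the attributes of an input address with those of the matched reference feature and labels each one. The field names, the status labels, and the case-insensitive comparison are assumptions for illustration; they do not describe the output format of any evaluated system.

```python
def address_concordance(input_attrs, matched_attrs):
    """Label each address attribute by how it fared in the match.

    Both arguments are dicts mapping a field name to a string value,
    with a missing key or None meaning the attribute is absent.
    """
    report = {}
    for field in sorted(set(input_attrs) | set(matched_attrs)):
        given = input_attrs.get(field)
        found = matched_attrs.get(field)
        if given is None:
            report[field] = "not supplied"   # present only on the reference feature
        elif found is None:
            report[field] = "not used"       # supplied but absent from the match
        elif given.strip().lower() == found.strip().lower():
            report[field] = "matched"
        else:
            report[field] = "mismatched"
    return report
```

A report of this shape lets a reviewer see at a glance whether, for example, a match was accepted despite a conflicting postcode.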
Geocoding system capability metrics
| Does the system provide the ability to process a database of address records in batch mode? Example – Running the geocoding system over a database of records in a text file. | |
| Does the system provide an interface that allows a user to review address records that do not match on a case-by-case basis? Example – Providing a graphical user interface (GUI) that allows a user to review geocoded results, make corrections and re-geocode. | |
| Does the system provide the ability to add address alias tables into the geocoding process? Example – Providing the user with a capability to include the coordinates of named places, such as nursing homes, caravan parks, or prisons. | |
| Does the system allow for the use of weighting schemes to bias the placement of centroid-level output? Example – Including a population density layer that moves the output of a postcode-level geocode closer to the location within the postcode that has the highest level of population density. |
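The weighted-centroid idea in the last row can be sketched as a weighted mean of sub-area points. This is a simplified illustration, not a description of how any evaluated system places centroids: the function name and input layout are hypothetical, and averaging latitude and longitude directly is an approximation that is only reasonable over small areas such as a single postcode.

```python
def weighted_centroid(points):
    """Return the (lat, lon) weighted mean of (lat, lon, weight) tuples.

    Weights might be population counts for blocks inside a postcode
    (an assumed input; any non-negative weighting would work).
    """
    points = list(points)
    total = sum(w for _, _, w in points)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lat = sum(la * w for la, _, w in points) / total
    lon = sum(lo * w for _, lo, w in points) / total
    return lat, lon
```

With equal weights this reduces to the ordinary mean of the points; as one block's weight grows, the output moves toward that block.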
Operational capabilities results
| Yes | Yes | Yes | Yes | No | |
| Yes | Yes | Yes | Yes | No | |
| Yes | No | No | Yes | Yes | |
| Yes | Yes* | Yes* | Yes | Yes | |
| | | | | | |
| Yes (Unix) | No (Windows) | No (Windows) | Yes (Windows, Unix, Linux) | Windows | |
| Yes | No | No | No | No | |
| Yes | Yes | Yes | Yes | Yes | |
| Yes | No | No | Yes | No | |
| Yes | Yes | Yes | Yes | No | |
| Yes | Yes | Yes | Yes | Yes | |
| Yes | Yes | Yes | Yes | Yes | |
| Yes | No | No | Yes | No | |
| No | Yes | Yes | Yes | No | |
| | | | | | |
| Yes | Yes | Yes | Yes | Yes | |
| No | No | No | Yes | Yes | |
| Yes | Yes | Yes | Available (not by default) | No | |
| Yes | No | No | Yes | No | |
| Yes | Yes | Yes | Yes | Yes | |
| | | | | | |
| Yes | Yes | Yes | Yes | Yes | |
| Yes | Yes | Yes | Yes | Yes | |
| Yes | Yes | Yes | Yes | No | |
| No | No | No | Yes | No | |
| Yes | No | No | Yes | Yes |
* Only if street centroids, suburb and postcode reference data are available.
Reference data set support and setup time
| No | Yes – 20 mins | Yes – 3 weeks | |
| Yes – 1 day | Yes – < 5 mins | Yes – < 5 mins | |
| No | Yes – < 5 mins | Yes – < 5 mins | |
| Yes – < 5 mins | Yes – < 5 mins | No | |
| Yes – < 5 mins | No | No |
Processing time by geocoding system, reference data set, and input data set
| PSA | <2 m | <2 m | <2 m | <2 m | - | |
| PSA+ | <2 m | <2 m | <2 m | - | - | |
| | G-NAF | - | <2 m | - | <2 m | <2 m |
| PSA | 45 m | 19 m | 13 m | 17 m | - | |
| PSA+ | 39 m | 19 m | 12 m | - | - | |
| | G-NAF | - | 24 m | - | 34 m | 13 m |
| PSA | 55 m | 16 m | 16 m | 25 m | - | |
| PSA+ | 2 h 25 m | 22 m | 17 m | - | - | |
| G-NAF | - | 19 m | - | 30 m | 23 m |
Input data A (Gold standard) match type and match rate summary (n = 2203 records)
| | |||||||
| - | - | - | - | - | - | ||
| | 1875 | 85.1 | 303 | 13.8 | 2178 | 98.9 | |
| | 1875 | 85.1 | 303 | 13.8 | 2178 | 98.9 | |
| 1765 | 80.1 | 67 | 3.0 | 1832 | 83.2 | ||
| | 1624 | 73.7 | 77 | 3.5 | 1701 | 77.2 | |
| | 1624 | 73.7 | 77 | 3.5 | 1701 | 77.2 | |
| - | - | - | - | - | - | ||
| | 1696 | 77.0 | 21 | 1.0 | 1717 | 77.9 | |
| | 1696 | 77.0 | 21 | 1.0 | 1717 | 77.9 | |
| 1959 | 88.9 | 236 | 10.7 | 2195 | 99.6 | ||
| | 1938 | 88.0 | 257 | 11.7 | 2195 | 99.6 | |
| | - | - | - | - | - | - | |
| 1991 | 90.4 | 212 | 9.6 | 2203 | 100.0 | ||
| | - | - | - | - | - | - | |
| - | - | - | - | - | - | ||
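The column labels of this table did not survive extraction, so the following reading is an assumption: each populated row appears to hold three (count, percentage) pairs, where the third count is the sum of the first two and every percentage is taken over the n = 2203 input records. A short Python check of that arithmetic, using a hypothetical helper name:

```python
def summarize(first_count, second_count, n_records):
    """Rebuild a row as (count, %, count, %, total, total %) under the assumed layout."""
    total = first_count + second_count

    def pct(k):
        # Percentage of all input records, rounded to one decimal place.
        return round(100.0 * k / n_records, 1)

    return (first_count, pct(first_count),
            second_count, pct(second_count),
            total, pct(total))
```

Under this reading, `summarize(1875, 303, 2203)` and `summarize(1765, 67, 2203)` reproduce the corresponding rows above, which supports the interpretation but does not identify which match types the two count columns represent.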
Input data B (Administrative) match type and match rate summary (n = 1364058 records)
| | |||||||
| - | - | - | - | - | - | ||
| | 1306310 | 95.8 | 55907 | 4.1 | 1362217 | 99.9 | |
| | 1313046 | 96.3 | 49805 | 3.7 | 1362851 | 99.9 | |
| 1136220 | 83.3 | 36915 | 2.7 | 1173135 | 86.0 | ||
| | 1165034 | 85.4 | 58664 | 4.3 | 1223698 | 89.7 | |
| | 1165034 | 85.4 | 58664 | 4.3 | 1223698 | 89.7 | |
| - | - | - | - | - | - | ||
| | 1219245 | 89.4 | 21932 | 1.6 | 1241177 | 91.0 | |
| | 1219245 | 89.4 | 21932 | 1.6 | 1241177 | 91.0 | |
| 1318281 | 96.6 | 43825 | 3.2 | 1362106 | 99.9 | ||
| | 1325911 | 97.2 | 35442 | 2.6 | 1361353 | 99.8 | |
| | - | - | - | - | - | - | |
| 1329627 | 97.5 | 34431 | 2.5 | 1364058 | 100.0 | ||
| | - | - | - | - | - | - | |
| - | - | - | - | - | - | ||
Input data C (Health) match type and match rate summary (n = 998066 records)
| | |||||||
| - | - | - | - | - | - | ||
| | 712645 | 71.4 | 149309 | 15.0 | 861954 | 86.4 | |
| | 724326 | 72.6 | 145595 | 14.6 | 869921 | 87.2 | |
| 446182 | 44.7 | 101049 | 10.1 | 547231 | 54.8 | ||
| | 486188 | 48.7 | 78508 | 7.9 | 564696 | 56.6 | |
| | 486188 | 48.7 | 78508 | 7.9 | 564696 | 56.6 | |
| - | - | - | - | - | - | ||
| | 440062 | 44.1 | 27806 | 2.8 | 467868 | 46.9 | |
| | 440062 | 44.1 | 27806 | 2.8 | 467868 | 46.9 | |
| 734518 | 73.6 | 211175 | 21.2 | 945693 | 94.8 | ||
| | 725115 | 72.7 | 217965 | 21.8 | 943080 | 94.5 | |
| | - | - | - | - | - | - | |
| 716241 | 71.8 | 271326 | 27.2 | 987567 | 98.9 | ||
| | - | - | - | - | - | - | |
| - | - | - | - | - | - | ||