What's missing in geographical parsing?
Milan Gritta, Mohammad Taher Pilehvar, Nut Limsopatham, Nigel Collier.
Abstract
Geographical data can be obtained by converting place names from free-format text into geographical coordinates. The ability to geo-locate events in textual reports represents a valuable source of information in many real-world applications such as emergency response, real-time social media geographical event analysis, understanding location instructions in auto-response systems and more. However, geoparsing is still widely regarded as a challenge because of domain language diversity, place name ambiguity, metonymic language and limited leveraging of context, as we show in our analysis. Results to date, whilst promising, have been obtained on laboratory data and, unlike in wider NLP, are often not cross-compared. In this study, we evaluate and analyse the performance of a number of leading geoparsers on a number of corpora and highlight the challenges in detail. We also publish an automatically geotagged Wikipedia corpus to alleviate the dearth of (open-source) corpora in this domain.
Keywords: Geocoding; Geoparsing; Geotagging; NED; NEL; NER; NLP
Year: 2017 PMID: 31258456 PMCID: PMC6560650 DOI: 10.1007/s10579-017-9385-8
Source DB: PubMed Journal: Lang Resour Eval ISSN: 1574-020X Impact factor: 1.358
Fig. 1 The geoparsing pipeline comprises two main stages: geotagging and geocoding. Geotagging retrieves only literal toponyms (ideally filtering out metonymic occurrences) from text and generates candidate coordinates for each. Geocoding then leverages the surrounding context to choose the correct coordinates and link the mention to an entry in a geographical knowledge base.
Fig. 2 A sample article from WikToR (shortened).
Fig. 3 An article from Wikipedia about Ottawa. All necessary disambiguation information is provided in the first sentence of each article, which is included in every WikToR sample.
Fig. 4 A typical power-law distribution of geocoding errors in km from the gold location. Most errors are relatively small but increase very fast for approximately the last 20% of locations. This is a typical pattern observed across different systems and datasets.
Fig. 5 A visual illustration of how to calculate the AUC; lower scores are better. Figure 4 shows the same error data in its original form (before applying the natural logarithm).
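As a rough illustration of the AUC and A@161 metrics behind Figs. 4 and 5, here is a minimal sketch. It assumes errors are log-scaled with ln(1 + error) and approximates the area under the sorted error curve by the mean log-error, normalised by the maximum possible error (roughly half the Earth's circumference); the authors' exact normalisation may differ.

```python
import math

def geocoding_auc(errors_km, max_error_km=20039):
    # Apply the natural logarithm to each error (per Fig. 5) and take a
    # normalised area under the sorted error curve, approximated here by
    # the mean log-error: 0 is perfect, 1 means every prediction is off
    # by ~20,039 km (half the Earth's circumference). Lower is better.
    log_errors = [math.log1p(e) for e in sorted(errors_km)]
    return sum(log_errors) / (len(log_errors) * math.log1p(max_error_km))

def accuracy_at_161(errors_km):
    # A@161: fraction of predictions within 161 km (100 miles) of gold.
    return sum(1 for e in errors_km if e <= 161) / len(errors_km)
```

With perfect predictions `geocoding_auc` returns 0.0, and with every prediction maximally wrong it returns 1.0, which matches the "lower is better" reading of the tables below.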
Geotagging performance on LGL
| LGL | Precision | Recall | F-score |
|---|---|---|---|
| GeoTxt | 0.80 | 0.59 | 0.68 (…) |
| Edinburgh | 0.71 | 0.55 | 0.62 (…) |
| Yahoo! | 0.64 | 0.55 | 0.59 (…) |
| CLAVIN | … | 0.44 | 0.57 (…) |
| Topocluster** | … | … | … |
The bold values indicate the best performance for that metric out of all tested systems
Numbers in brackets are improved scores for inexact matches such as geotagging “Helmand” instead of “Helmand Province” or vice versa
** Inexact scores not available due to the system’s non-standard output
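The inexact-match footnotes above can be made concrete with a short sketch. This is a simplified illustration, not the authors' scorer: inexact mode accepts one span containing the other (e.g. "Helmand" vs. "Helmand Province"), and precision/recall/F-score are computed over greedily matched toponym lists.

```python
def is_match(predicted, gold, inexact=False):
    # Exact mode requires identical strings; inexact mode also accepts
    # one span containing the other, e.g. "Helmand" vs "Helmand Province".
    p, g = predicted.strip().lower(), gold.strip().lower()
    return p == g or (inexact and (p in g or g in p))

def precision_recall_f1(predicted, gold, inexact=False):
    # Greedily pair each predicted toponym with an unused gold annotation.
    gold_left = list(gold)
    tp = 0
    for p in predicted:
        hit = next((g for g in gold_left if is_match(p, g, inexact)), None)
        if hit is not None:
            tp += 1
            gold_left.remove(hit)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

For predictions ["Helmand", "Kabul"] against gold ["Helmand Province", "Kabul", "Kandahar"], exact matching gives P = 0.5 while inexact matching gives P = 1.0, which is why the bracketed scores in the table are higher.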
Geotagging performance on WikToR
| WikToR | Accuracy | Accuracy (inexact) |
|---|---|---|
| GeoTxt | 0.51 | … |
| Edinburgh | … | … |
| Yahoo! | 0.40 | … |
| CLAVIN | 0.21 | … |
| Topocluster | 0.54 | (**) |
The bold values indicate the best performance for that metric out of all tested systems
Inexact example: geotagging the “City of London” instead of only “London” and vice versa
** Not available due to the system’s nonstandard output
Geocoding results on LGL
| LGL | AUC | Med | Mean | AUCE | A@161 |
|---|---|---|---|---|---|
| GeoTxt | 0.29 | 0.05 | 2.9 | 0.21 | 0.68 |
| Edinburgh | … | … | 1.10 | 0.22 | … |
| Yahoo! | 0.34 | 3.20 | 3.3 | 0.35 | 0.72 |
| CLAVIN | 0.26 | … | … | … | 0.71 |
| Topocluster | 0.38 | 3.20 | 3.8 | 0.36 | 0.63 |
The bold values indicate the best performance for that metric out of all tested systems
Lowest scores are best (except A@161). All figures except A@161 are natural-log (base e) values, so differences between geoparsers grow rapidly when converted back to raw distances
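To make the log-scaled numbers concrete, the Med and Mean scores can be converted back to kilometres. This is a sketch assuming the tables report ln of the error in km (Fig. 5 notes the natural logarithm is applied); if ln(1 + km) was used, the figures shift slightly.

```python
import math

def ln_score_to_km(score):
    # Invert the natural-log scaling applied to geocoding errors.
    return math.exp(score)

# Yahoo!'s median of 3.20 on LGL is roughly e^3.2, i.e. about 25 km,
# whereas a median of 7.9 on WikToR is about e^7.9, i.e. roughly 2700 km,
# so small score gaps correspond to large distance gaps.
```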
Geocoding results for WikToR
| WikToR | AUC | Med | Mean | AUCE | A@161 |
|---|---|---|---|---|---|
| GeoTxt | 0.70 | 7.9 | 6.9 | 0.71 | 0.18 |
| Edinburgh | 0.53 | 6.4 | 5.3 | 0.58 | 0.42 |
| Yahoo! | … | … | … | … | … |
| CLAVIN | 0.70 | 7.8 | 6.9 | 0.69 | 0.16 |
| Topocluster | 0.63 | 7.3 | 6.2 | 0.66 | 0.26 |
The bold values indicate the best performance for that metric out of all tested systems
Lowest scores are best (except A@161). All figures except A@161 are natural-log (base e) values, so differences between geoparsers grow rapidly when converted back to raw distances