Alan M MacEachren, Michael S Stryker, Ian J Turton, Scott Pezanowski.
Abstract
BACKGROUND: The volume of health science publications is escalating rapidly. Keeping up with developments is therefore becoming harder, as is the task of finding important cross-domain connections. When geographic location is a relevant component of research reported in publications, these tasks are more difficult because standard search and indexing facilities have limited or no ability to identify geographic foci in documents. This paper introduces HEALTH GeoJunction, a web application that supports researchers in the task of quickly finding scientific publications that are relevant geographically and temporally as well as thematically.
Year: 2010 PMID: 20482806 PMCID: PMC2889882 DOI: 10.1186/1476-072X-9-23
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Figure 1. The primary components in HEALTH GeoJunction are depicted. As indicated, they are organized as a client-server application supported by a spatially-enabled relational database.
Figure 2. HEALTH GeoJunction initial view. This screen capture shows the default view in GeoJunction (after the user has zoomed in on Southeast Asia on the map) for the sample of documents retrieved from PubMed using a query that identified documents about avian flu and related concepts. The map view represents the frequencies of documents that have been determined to be from or about each country with split graduated circles (orange/left side representing documents that are about that country or a place within it and gray/right side representing documents from an organization in that country). Below the map is a double timeline, showing a histogram of document frequency for the entire time period represented by documents (below) and for the currently selected time period (above). In this case, the time period selected is February-November, 2007. The tag cloud at the upper right represents term frequency in the full selected data set. The view shows the two-term pair option. The default order is by frequency. Users have the option to select the more common alphabetical order. The lower tag cloud represents the results of current place, time, and concept filtering. Gray terms have not changed their rank order, green terms have higher rank frequency in the selection than in the overall document set (with bright green representing those terms that are not in the top 100 in the full set), and purple represents terms that dropped in rank. The default color choices support users with color vision deficiencies; custom choices can be set by the user. The space between tag clouds represents currently applied place, time, and concept filters. The bottom right tabbed window provides access to a table of documents, pop-up abstracts, geographic footprints (that display on the map), and a browser that shows the document in PubMed.
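The color coding of the lower tag cloud described in this caption can be illustrated with a minimal Python sketch. It is not the application's code: the function names, the tie-breaking rule, and the sample counts are assumptions made only for illustration, and the caption's "top 100" cutoff is exposed as a `top_n` parameter.

```python
def rank_terms(counts):
    """Map each term to its 1-based rank, most frequent first.

    Ties are broken alphabetically (an assumption; the caption does not say).
    """
    ordered = sorted(counts, key=lambda term: (-counts[term], term))
    return {term: position + 1 for position, term in enumerate(ordered)}


def classify_tags(full_counts, selected_counts, top_n=100):
    """Assign a color to every term in the filtered selection.

    gray         -> same rank as in the full document set
    green        -> higher rank in the selection than in the full set
    bright green -> term is not among the top_n terms of the full set
    purple       -> lower rank in the selection than in the full set
    """
    full_rank = rank_terms(full_counts)
    selected_rank = rank_terms(selected_counts)
    colors = {}
    for term, rank in selected_rank.items():
        base = full_rank.get(term)
        if base is None or base > top_n:
            colors[term] = "bright green"
        elif rank < base:
            colors[term] = "green"
        elif rank > base:
            colors[term] = "purple"
        else:
            colors[term] = "gray"
    return colors


# Hypothetical term counts, not taken from the paper.
full = {"influenza": 50, "poultry": 30, "h5n1": 20, "vaccine": 10}
selected = {"influenza": 9, "h5n1": 8, "poultry": 2}
print(classify_tags(full, selected))
```

Comparing ranks rather than raw counts keeps the classification meaningful when the filtered set is much smaller than the full corpus.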
Figure 3. GeoJunction view with place and concept filtering. This figure shows the result after the user clicked on the "about" side of the Thailand symbol on the map, filtering the result to only those judged (based on MeSH or GeoJunction feature extraction tools) to be about Thailand, and the user clicked on "disease outbreak" in the top tag cloud to further filter to the subset of documents about Thailand that are also about disease outbreaks. Not surprisingly, H5N1 virus has moved up to be the third most frequent term in this set of documents (now including only 23 of the original set).
Figure 4. Geographic footprints. This figure represents subsequent exploration in which the user has identified two papers of interest. The geographic footprints of both are depicted on the map and the abstract for one is highlighted.
Figure 5. Place, time, concept filter control. This figure shows a detailed view of the facet-based filtering control in which users see the place, time, and concept filters that they have applied. Users are able to selectively remove any of those filters.
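The facet-based filtering in this caption, where place, time, and concept filters combine and any one of them can be removed, might be sketched as follows. This is a hedged illustration only: the `Doc` record, the `apply_filters` helper, and the sample documents are hypothetical and do not reflect the application's actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical document record used only for this sketch."""
    doc_id: str
    about: set        # countries the document is judged to be about
    published: date
    terms: set        # concept tags attached to the document


def apply_filters(docs, filters):
    """Keep documents that satisfy every active filter (logical AND)."""
    return [d for d in docs if all(test(d) for test in filters.values())]


docs = [
    Doc("doc-1", {"Thailand"}, date(2007, 3, 1), {"disease outbreak", "h5n1"}),
    Doc("doc-2", {"Thailand"}, date(2007, 6, 1), {"vaccine"}),
    Doc("doc-3", {"Vietnam"}, date(2007, 5, 1), {"disease outbreak"}),
    Doc("doc-4", {"Thailand"}, date(2006, 1, 1), {"disease outbreak"}),
]

# One entry per facet, keyed by facet name so each can be removed on its own.
filters = {
    "place": lambda d: "Thailand" in d.about,
    "time": lambda d: date(2007, 2, 1) <= d.published <= date(2007, 11, 30),
    "concept": lambda d: "disease outbreak" in d.terms,
}

print([d.doc_id for d in apply_filters(docs, filters)])   # all three facets
filters.pop("concept")                                     # user removes one facet
print([d.doc_id for d in apply_filters(docs, filters)])   # place and time only
```

Keying the active filters by facet name mirrors the control shown in the figure: removing a filter is a single deletion, and the remaining filters are simply re-applied.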
Visual interactions to support analyst tasks by facet dimension.
| Dimension | Task | Visual Interaction |
|---|---|---|
| Time | Filter documents by time range | Size and drag date range slider widgets in timeline |
| Time | Identify article count by month | Mouse-over timeline bar chart |
| Time | Filter documents by predefined time interval | Click corresponding interval button above timeline and then place time slider |
| Concept | Filter documents by keyword from corpus | Click tagcloud keyword |
| Concept | Filter documents by keyword in selected set | Click 'filtered' tagcloud keyword |
| Concept | Compare frequency of keyword in corpus versus selected set | Click color assignment widget to classify keyword frequency change by color |
| Concept | Highlight tag frequency above document count threshold | Dynamically highlight tags in tag cloud for full corpus or current selection using respective slider |
| Concept | Retrieve documents by alternative tagging scheme | Select tag format for tag clouds from drop-down list |
| Concept | Retrieve documents by low frequency tag | Select sort tag cloud by count from drop-down list, scroll and click tags |
| Concept | Retrieve documents by specific keyword tags | Select sort tag cloud alphabetically from drop-down list, scroll and click tags |
| Place | Select documents about a country | Click 'about' country graduated symbol |
| Place | Select documents from a country | Click 'from' country graduated symbol |
| Place | Select documents about a country or adjacent countries | Shift-click a country graduated symbol |
| Place | Select documents by place within geographic hierarchy | Tab from map to geographic tree view and click place names |
| Place | Retrieve summary of documents by country ('about' or 'from') and time range | Mouse-over country graduated symbol on map |
| Place | Inspect geographic distribution of documents for query parameters | Map supports standard pan by dragging, map scale slider and buttons for pan by fixed interval |
| Results | Preview document in result set | Mouse-over icon in result set table |
| Results | View map representation of places referenced in document | Select globe icon in result set table |
| Results | Retrieve similar documents | Click 'more-like-this' icon in result set table |
| Results | Inspect retrieved document | Select row in result set table and then document details tab |
| All | Remove place or tag constraint | Clear to remove facet-like stub for criterion |
Figure 6. Document processing components. The component steps in extracting and geocoding geographic information found in MeSH headings as well as in the title and abstract are delineated here. The approach relies on the OpenCalais named entity extractor to identify geographic references in free text and on the GeoNames database of place names in the world to find geographic entity matches.
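The matching step in this caption, in which extracted place names are looked up in a gazetteer, can be sketched in Python under strong simplifying assumptions. The sketch calls neither OpenCalais nor GeoNames: the candidate names stand in for an entity extractor's output, `GAZETTEER` is a tiny hypothetical stand-in for a place-name database, and the coordinates are rough illustrative values.

```python
# Hypothetical in-memory gazetteer: name -> (country code, latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "thailand": ("TH", 15.0, 101.0),
    "bangkok": ("TH", 13.75, 100.5),
    "vietnam": ("VN", 16.0, 108.0),
}


def geocode(names, gazetteer=GAZETTEER):
    """Match candidate place names against the gazetteer.

    Returns (matched, unmatched). Names are compared case-insensitively
    and duplicates are reported only once.
    """
    matched, unmatched, seen = [], [], set()
    for name in names:
        key = name.strip().lower()
        if key in seen:
            continue
        seen.add(key)
        hit = gazetteer.get(key)
        if hit is None:
            unmatched.append(name)
        else:
            country, lat, lon = hit
            matched.append(
                {"name": name, "country": country, "lat": lat, "lon": lon}
            )
    return matched, unmatched


# Candidates as they might come from subject headings plus free-text extraction.
candidates = ["Thailand", "Bangkok", "thailand", "Atlantis"]
matched, unmatched = geocode(candidates)
print(matched)
print(unmatched)
```

A real gazetteer lookup would also need to resolve ambiguous names (several places sharing one name), which this exact-match sketch deliberately ignores.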
Comparison of precision, recall and F-measure.
| Service | Precision | Recall | F-measure |
|---|---|---|---|
| Calais | 0.99 | 0.63 | 0.76 |
| Metacarta | 0.52 | 0.83 | 0.60 |
| Alchemy | 0.75 | 0.39 | 0.49 |
| Tagthenet | 0.47 | 0.26 | 0.31 |
| Zemanta | 0.19 | 0.24 | 0.20 |
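For reference, the F-measure is conventionally the harmonic mean of precision and recall. The Python sketch below uses that standard definition; whether the table's scores were computed in exactly this way is an assumption. Values recomputed from the rounded precision and recall columns need not reproduce the reported F-measure column exactly, for example if the reported scores were averaged over documents.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta=1.0 gives the balanced F1 score. Returns 0.0 when both
    inputs are zero to avoid dividing by zero.
    """
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall pairs copied from the table above.
scores = {
    "Calais": (0.99, 0.63),
    "Metacarta": (0.52, 0.83),
    "Alchemy": (0.75, 0.39),
    "Tagthenet": (0.47, 0.26),
    "Zemanta": (0.19, 0.24),
}
for service, (p, r) in scores.items():
    print(f"{service}: {f_measure(p, r):.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a service with very high precision but modest recall still scores well below its precision.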