Simon I Hay, Katherine E Battle, David M Pigott, David L Smith, Catherine L Moyes, Samir Bhatt, John S Brownstein, Nigel Collier, Monica F Myers, Dylan B George, Peter W Gething.
Abstract
The primary aim of this review was to evaluate the state of knowledge of the geographical distribution of all infectious diseases of clinical significance to humans. A systematic review was conducted to enumerate cartographic progress, with respect to both the data available for mapping and the methods currently applied. The results helped define the minimum information requirements for mapping infectious disease occurrence and a quantitative framework for assessing mapping opportunities for all infectious diseases. This revealed that, of the 355 infectious diseases identified, 174 (49%) have a strong rationale for mapping and, of these, only 7 (4%) had been comprehensively mapped. A variety of ambitions, such as quantifying the global burden of infectious disease, international biosurveillance, assessing the likelihood of infectious disease outbreaks and exploring the propensity for infectious disease evolution and emergence, are limited by these omissions. An overview of the factors hindering progress in disease cartography is provided. It is argued that rapid improvement in the landscape of infectious disease mapping can be made by embracing non-conventional data sources, by automating geo-positioning and mapping procedures through machine learning and information technology, respectively, and by harnessing the labour of the volunteer 'cognitive surplus' through crowdsourcing.
Year: 2013 PMID: 23382431 PMCID: PMC3679597 DOI: 10.1098/rstb.2012.0250
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.237
Figure 1. A schematic overview of a niche/occurrence mapping process (for example, boosted regression trees (BRT)) that uses pseudo-absence data guided by expert opinion. Consensus-based definitive extent layers of infectious disease occurrence at the national level (a) are combined with accurately geo-positioned occurrence (presence) locations (b) to generate pseudo-absence data (c). The presence (b) and pseudo-absence (c) data are then used in the BRT analyses, alongside a suite of environmental covariates (d), to predict the probability of occurrence of the target disease (e).
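The pseudo-absence step (c) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the raster is reduced to a dictionary of grid cells keyed to countries, the consensus extent layer (a) is a hypothetical `country_status` mapping with the values `"present"` and `"absent"`, and pseudo-absences are drawn uniformly from cells in "absent" countries that hold no presence record. The published procedure may weight or stratify the sampling differently, and the helper name `generate_pseudo_absences` is hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (row, col) index of a raster grid cell


def generate_pseudo_absences(
    cell_country: Dict[Cell, str],
    country_status: Dict[str, str],
    presence_cells: Set[Cell],
    n: int,
    seed: int = 0,
) -> List[Cell]:
    """Hypothetical helper: sample n distinct pseudo-absence cells.

    Assumption: candidates are cells in countries that the consensus
    extent layer marks "absent" and that contain no presence record.
    """
    candidates = sorted(
        cell
        for cell, country in cell_country.items()
        if country_status.get(country) == "absent" and cell not in presence_cells
    )
    if n > len(candidates):
        raise ValueError(f"asked for {n} pseudo-absences, only {len(candidates)} candidates")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.sample(candidates, n)  # uniform sampling without replacement


# Toy 4 x 4 grid: columns 0-1 lie in country "A", columns 2-3 in country "B".
cells = {(r, c): ("A" if c < 2 else "B") for r in range(4) for c in range(4)}
status = {"A": "present", "B": "absent"}  # hypothetical consensus layer (a)
presences = {(0, 0), (1, 1), (3, 0)}      # geo-positioned occurrences (b)

pseudo = generate_pseudo_absences(cells, status, presences, n=5)  # step (c)

# Presence rows are labelled 1 and pseudo-absence rows 0; environmental
# covariates (d) would be attached to each cell before fitting a BRT model.
training = [(cell, 1) for cell in sorted(presences)] + [(cell, 0) for cell in pseudo]
```

Restricting candidates to consensus-absent countries keeps pseudo-absences away from places where the disease is thought to occur. The BRT fit itself could then be delegated to an existing implementation, for example the `gbm.step` function in R's `dismo` package.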
Figure 2. A schematic of the disease classification process. The classification system assigns each disease to one of five options: (1) do not map; (2) map observed occurrence; (3) map the maximum potential range of reservoirs or vectors; (4) niche/occurrence mapping with BRT; and (5) model-based geostatistical (MBG) endemicity maps.
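The five outcomes can be represented as an enumeration with a decision function, as in the Python sketch below. The caption does not state the decision criteria, so the four yes/no fields of `DiseaseProfile` and the order in which they are tested (richest applicable method first) are hypothetical choices made purely for illustration; only the five outcome categories come from the figure.

```python
from dataclasses import dataclass
from enum import Enum


class MappingOption(Enum):
    DO_NOT_MAP = 1
    MAP_OBSERVED_OCCURRENCE = 2
    MAP_MAX_POTENTIAL_RANGE = 3  # range of the reservoir or vectors
    NICHE_OCCURRENCE_BRT = 4     # niche/occurrence mapping with BRT
    MBG_ENDEMICITY = 5           # model-based geostatistical endemicity maps


@dataclass
class DiseaseProfile:
    # Hypothetical yes/no criteria, chosen for illustration only.
    has_mapping_rationale: bool
    has_prevalence_surveys: bool
    has_enough_occurrence_points: bool
    has_mappable_reservoir_or_vector: bool


def classify(d: DiseaseProfile) -> MappingOption:
    """Assign one of the five options, trying the most data-demanding
    method first (an assumed ordering, not the published decision tree)."""
    if not d.has_mapping_rationale:
        return MappingOption.DO_NOT_MAP
    if d.has_prevalence_surveys:
        return MappingOption.MBG_ENDEMICITY
    if d.has_enough_occurrence_points:
        return MappingOption.NICHE_OCCURRENCE_BRT
    if d.has_mappable_reservoir_or_vector:
        return MappingOption.MAP_MAX_POTENTIAL_RANGE
    return MappingOption.MAP_OBSERVED_OCCURRENCE


example = DiseaseProfile(
    has_mapping_rationale=True,
    has_prevalence_surveys=False,
    has_enough_occurrence_points=True,
    has_mappable_reservoir_or_vector=True,
)
option = classify(example)  # falls through to niche/occurrence mapping
```

Encoding the outcomes as an `Enum` keeps the category set closed, so any change to the (assumed) criteria only touches `classify`.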
Number of clinically important infectious diseases, and the subset of those with a rationale for mapping, by transmission category (see §2).
| transmission category | clinically significant diseases (n = 355) | diseases with a rationale for mapping (n = 174) |
|---|---|---|
| animal contact | 20 | 9 |
| blood/body fluid contact | 14 | 5 |
| direct contact | 23 | 7 |
| endogenous^a | 35 | 0 |
| food/water-borne | 82 | 36 |
| respiratory | 39 | 9 |
| sexual contact | 11 | 2 |
| soil contact | 21 | 14 |
| unknown | 11 | 4 |
| vector-borne | 88 | 80 |
| water contact | 11 | 8 |
^a Endogenous infections are those caused by previously inapparent or dormant pathogens arising from the typical commensal microbial flora of humans.
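As a consistency check, the column totals of the table can be recomputed; they should match the 355 diseases and 174 with a rationale for mapping quoted in the abstract. The short Python script below (the dictionary is simply a transcription of the table) also ranks transmission categories by the share of their diseases that have a mapping rationale.

```python
# (clinically significant diseases, diseases with a rationale for mapping),
# transcribed from the table above.
counts = {
    "animal contact": (20, 9),
    "blood/body fluid contact": (14, 5),
    "direct contact": (23, 7),
    "endogenous": (35, 0),
    "food/water-borne": (82, 36),
    "respiratory": (39, 9),
    "sexual contact": (11, 2),
    "soil contact": (21, 14),
    "unknown": (11, 4),
    "vector-borne": (88, 80),
    "water contact": (11, 8),
}

total_clinical = sum(clinical for clinical, _ in counts.values())
total_mappable = sum(mappable for _, mappable in counts.values())

# Share of each category's diseases that have a rationale for mapping.
share = {name: mappable / clinical for name, (clinical, mappable) in counts.items()}

for name in sorted(share, key=share.get, reverse=True):
    clinical, mappable = counts[name]
    print(f"{name:<26} {mappable:>3}/{clinical:<3} {share[name]:6.1%}")
print(f"{'total':<26} {total_mappable:>3}/{total_clinical:<3} {total_mappable / total_clinical:6.1%}")
```

The columns sum to 355 and 174 (49.0%). Vector-borne diseases have the highest share (80 of 88, about 91%) and account for 80 of the 174 mappable diseases, whereas none of the 35 endogenous infections has a rationale for mapping.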
Figure 3. Radial plots for all diseases with a rationale for mapping, ordered clockwise by metascore (white line). A white line running from the centre to the edge of the circle would indicate a perfect metascore. (a) All diseases (n = 174 of 355); (b) viral diseases (n = 62 of 101); (c) parasitic diseases (n = 61 of 96); (d) bacterial diseases (n = 36 of 128); (e) fungal (n = 9 of 17), protoctistan (n = 2 of 2) and unknown-pathogen diseases (n = 4 of 10). One algal disease, which did not have a rationale for mapping, is not shown.
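The panel counts in the caption can be tallied in the same way, which also shows how unevenly the mapping rationale is spread across pathogen groups. The Python sketch below uses only the numbers given in the caption; the group labels are an ad hoc transcription.

```python
# (diseases with a rationale for mapping, diseases identified) per pathogen
# group, transcribed from the Figure 3 caption.
groups = {
    "viral": (62, 101),
    "parasitic": (61, 96),
    "bacterial": (36, 128),
    "fungal": (9, 17),
    "protoctistan": (2, 2),
    "unknown pathogen": (4, 10),
    "algal": (0, 1),  # mentioned in the caption but not plotted
}

mappable = sum(m for m, _ in groups.values())
identified = sum(n for _, n in groups.values())
share = {name: m / n for name, (m, n) in groups.items()}

# Among groups with at least one mappable disease, which is least often mappable?
least = min((name for name, (m, _) in groups.items() if m > 0), key=share.get)
```

Including the single algal disease, the groups sum to 174 of 355. Bacterial diseases are the least often mappable group (36 of 128, about 28%), against roughly 61% of viral and 64% of parasitic diseases.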
The cartographically relevant holdings of the National Center for Biotechnology Information PubMed and GenBank systems. The searches were conducted on 4 November 2011 and 1 March 2012, respectively.
| system | PubMed | GenBank |
|---|---|---|
| start year | 1946 | 1982 |
| frequency of updates | daily | daily |
| number of species catalogued | >250 000 | >250 000 |
| approximate number of entries | 21 million | 340 million |
| number of clinically relevant diseases for which data are available | 168 | 155 |
| occurrence point sources for mapping | 526 564 | 672 327 |
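Record counts of this kind can be retrieved programmatically through NCBI's Entrez E-utilities, whose `esearch` endpoint returns the number of records matching a query. The Python sketch below only builds request URLs and parses a JSON response, so it runs offline. The search terms are hypothetical, since the queries behind the table are not reproduced here, and GenBank nucleotide records are assumed to be searched through the `nuccore` database. Extracting geo-positioned occurrence points would still require processing each returned record.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 0) -> str:
    """Build an esearch URL; retmax=0 asks only for the hit count, not IDs."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(payload: str) -> int:
    """Extract the total hit count from an esearch JSON response."""
    return int(json.loads(payload)["esearchresult"]["count"])


def count_records(db: str, term: str) -> int:
    """Query NCBI over the network (not called in the offline example)."""
    with urlopen(build_esearch_url(db, term)) as response:
        return parse_count(response.read().decode("utf-8"))


# Hypothetical queries; the review's actual search strings are not given here.
pubmed_url = build_esearch_url("pubmed", "dengue[Title/Abstract]")
genbank_url = build_esearch_url("nuccore", "Dengue virus[Organism]")
```

Anyone calling `count_records` in a loop should respect NCBI's published request-rate limits for E-utilities.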
Geo-positioned occurrence data archived by the HealthMap and BioCaster online disease outbreak reporting systems. HealthMap uses automated text processing to classify and position alerts, which are then confirmed by a human analyst [25]. BioCaster uses automated text processing, through a multilingual ontology, to classify and position alerts [26]. Totals were assembled from data provided by HealthMap on 23 November 2011 and by BioCaster on 24 February 2012.
| system | HealthMap | BioCaster |
|---|---|---|
| start year | 2006 | 2006 |
| approximate posts per day | 300 | 100 |
| number of languages | 10 (J. S. Brownstein 2012, personal communication) | 11 |
| number of diseases tagged | 245 (J. S. Brownstein 2011, personal communication) | 230 (N. Collier 2012, personal communication) |
| number of clinically relevant diseases for which data are available | 84 of 245 | 99 of 230 (N. Collier 2012, personal communication) |
| total occurrence points | 337 105 (J. S. Brownstein 2011, personal communication) | 189 361 (N. Collier 2012, personal communication) |
| occurrence point sources for mapping | 66 284 (J. S. Brownstein 2011, personal communication) | 140 038 (N. Collier 2012, personal communication) |
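The two systems differ less in raw volume than in how much of that volume is usable. The ratios below are simple arithmetic on the table, computed in Python; the dictionary keys are an ad hoc transcription of the table rows.

```python
# Figures transcribed from the HealthMap/BioCaster table above.
systems = {
    "HealthMap": {"tagged": 245, "relevant": 84, "points": 337_105, "usable": 66_284},
    "BioCaster": {"tagged": 230, "relevant": 99, "points": 189_361, "usable": 140_038},
}

# Share of tagged diseases that are clinically relevant.
relevant_share = {name: s["relevant"] / s["tagged"] for name, s in systems.items()}
# Share of archived occurrence points that serve as point sources for mapping.
usable_share = {name: s["usable"] / s["points"] for name, s in systems.items()}

for name in systems:
    print(f"{name}: {relevant_share[name]:.1%} of tagged diseases clinically relevant, "
          f"{usable_share[name]:.1%} of occurrence points usable for mapping")
```

On these figures, about 20% of HealthMap's archived occurrence points, against about 74% of BioCaster's, qualified as point sources for mapping, so total archive size alone is a poor guide to mapping yield.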