Maya Wardeh1, Claire Risley2, Marie Kirsty McIntyre1, Christian Setzkorn1, Matthew Baylis3.
Abstract
Interactions between species, particularly where one is likely to be a pathogen of the other, as well as the geographical distribution of species, have been systematically extracted from various web-based, free-access sources and assembled with the accompanying evidence into a single database. The database attempts to answer questions such as: what are all the pathogens of a host, what are all the hosts of a pathogen, what are all the countries where a pathogen was found, and what are all the pathogens found in a country? Two datasets were extracted from the database, focussing on species interactions and species distribution, based on evidence published between 1950 and 2012. The quality of their evidence was checked and verified against well-known alternative datasets of pathogens infecting humans, domestic animals and wild mammals. The presented datasets provide a valuable resource for researchers of infectious diseases of humans and animals, including zoonoses.
Year: 2015 PMID: 26401317 PMCID: PMC4570150 DOI: 10.1038/sdata.2015.49
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Overview of the methods of identifying species-species and species-location interactions.
The first panel lists the resources used, colour coded. H refers to host tags and C to country tags in the sequence metadata. PMID is the PubMed unique identifier used to retrieve papers. The second panel explains how the evidence bases are interrogated to extract species (cargo)-species (carrier) interactions. The species of the sequenced organism (i.e., the cargo) is first identified using the taxonomic tree; the host tag in the sequence metadata is then disambiguated using the taxonomic tree to identify the carrier species. Lists of PMIDs obtained for cargo and carrier species are intersected to provide additional evidence for the interactions extracted from the sequence metadata and to identify new relationships between cargo and carrier species that were not discovered from the sequence metadata. The third panel illustrates how species-location interactions are extracted from the evidence base. First, sequenced organisms and location information are extracted from the sequence metadata. The species of each sequenced organism is then identified using the taxonomic tree. The location data (L) is split into country (C) and region (R) strings. Both are then disambiguated using data gathered from GeoNames to obtain the country and region where the species was found. GeoNames is also used to interrogate PubMed for papers about each country and region in the database. These are then intersected with species publications, and the shared set is used as evidence for the species being found in a given location.
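The PMID intersection step described in the caption can be illustrated with a minimal sketch. It assumes each species already has a list of PubMed identifiers; the function name `shared_evidence` and the PMID values are hypothetical and do not come from the database itself.

```python
def shared_evidence(cargo_pmids, carrier_pmids):
    """Return the PMIDs that appear in both the cargo and the carrier lists."""
    return set(cargo_pmids) & set(carrier_pmids)


# Hypothetical PMID lists, invented for illustration only.
cargo_pmids = [101, 102, 103, 104]
carrier_pmids = [103, 104, 105]

evidence = shared_evidence(cargo_pmids, carrier_pmids)
print(sorted(evidence))  # [103, 104]
```

A non-empty intersection would count as supporting evidence for the cargo-carrier pair; the same operation applies to the species-location case, with location PMIDs in place of carrier PMIDs.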
Figure 2. Example illustrating the information extracted from sequence metadata (sequence ID=158668169).
Figure 3. Shared pathogens between vertebrate species in Data Citation 1.
Each node represents a vertebrate species. The size of a node is proportional to the number of unique pathogen species found to interact with it. An edge between two nodes indicates that they share at least one possible pathogen species. The weight (thickness) of an edge is proportional to the number of possible pathogen species shared between the two nodes. The location of each node is determined by the sizes of all nodes in the graph and the weights of the edges linking that node to other nodes.
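The node sizes and edge weights described in the caption can be derived from a host-to-pathogen mapping. The following is a minimal sketch under that assumption; the function name `shared_pathogen_graph` and the host and pathogen labels are hypothetical placeholders, not entries from Data Citation 1, and the layout step is not covered.

```python
from itertools import combinations


def shared_pathogen_graph(pathogens_by_host):
    """Compute node sizes and edge weights for a shared-pathogen network.

    pathogens_by_host maps each host to the set of its pathogen species.
    Node size is the number of unique pathogens of a host; edge weight is
    the number of pathogens shared by a pair of hosts.
    """
    node_size = {host: len(p) for host, p in pathogens_by_host.items()}
    edge_weight = {}
    for a, b in combinations(sorted(pathogens_by_host), 2):
        shared = pathogens_by_host[a] & pathogens_by_host[b]
        if shared:  # only link hosts that share at least one pathogen
            edge_weight[(a, b)] = len(shared)
    return node_size, edge_weight


# Hypothetical example data, for illustration only.
hosts = {
    "host_A": {"pathogen_1", "pathogen_2", "pathogen_3"},
    "host_B": {"pathogen_2", "pathogen_3"},
    "host_C": {"pathogen_4"},
}
sizes, weights = shared_pathogen_graph(hosts)
print(sizes)    # {'host_A': 3, 'host_B': 2, 'host_C': 1}
print(weights)  # {('host_A', 'host_B'): 2}
```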
Summary of the species-species dataset (Data Citation 1)
Columns are categories of cargoes; rows are categories of carriers.

| Carrier |  |  | bacteria |  |  | fungi | helminth |  |  |  | protozoa |  |  | virus | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| algae | 0 | 0 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 38 |
| amphibian | 0 | 1 | 16 | 0 | 3 | 24 | 111 | 0 | 0 | 4 | 9 | 0 | 0 | 23 | 191 |
| arthropod | 5 | 294 | 1110 | 2 | 0 | 645 | 77 | 0 | 1 | 11 | 196 | 12 | 0 | 335 | 2688 |
| aves | 0 | 524 | 136 | 0 | 0 | 6 | 125 | 0 | 0 | 0 | 135 | 0 | 0 | 400 | 1326 |
| bacteria | 0 | 0 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 84 | 103 |
| bryozoa | 0 | 0 | 0 | 0 | 5 | 32 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 37 |
| cnidaria | 0 | 15 | 46 | 0 | 0 | 27 | 0 | 0 | 0 | 1 | 16 | 0 | 0 | 0 | 105 |
| domestic | 1 | 102 | 1215 | 0 | 0 | 227 | 556 | 0 | 0 | 12 | 458 | 0 | 0 | 563 | 3134 |
| fish | 0 | 47 | 588 | 0 | 297 | 82 | 1302 | 1 | 0 | 13 | 66 | 3 | 0 | 148 | 2547 |
| fungi | 0 | 0 | 73 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 23 | 99 |
| helminth | 0 | 0 | 91 | 0 | 0 | 12 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 7 | 112 |
| human | 1 | 19 | 878 | 0 | 0 | 311 | 148 | 0 | 0 | 0 | 70 | 0 | 0 | 204 | 1631 |
| other mammals | 0 | 110 | 298 | 0 | 1 | 114 | 412 | 0 | 1 | 7 | 202 | 0 | 0 | 406 | 1551 |
| mollusca | 1 | 29 | 138 | 0 | 0 | 2 | 109 | 0 | 0 | 8 | 39 | 0 | 0 | 7 | 333 |
| plants | 0 | 1506 | 2977 | 0 | 0 | 694 | 293 | 0 | 0 | 2 | 19 | 0 | 93 | 1597 | 7181 |
| others | 0 | 1 | 23 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 4 | 0 | 0 | 13 | 44 |
| porifera | 0 | 2 | 50 | 0 | 5 | 18 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 76 |
| primate | 0 | 11 | 26 | 0 | 0 | 19 | 36 | 0 | 0 | 0 | 104 | 0 | 0 | 231 | 427 |
| protozoa | 5 | 0 | 17 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 3 | 34 |
| reptile | 0 | 13 | 54 | 0 | 3 | 11 | 72 | 0 | 0 | 0 | 64 | 4 | 0 | 14 | 235 |
| rodent | 0 | 88 | 157 | 0 | 0 | 10 | 127 | 0 | 0 | 0 | 66 | 0 | 0 | 130 | 578 |
| segmented worm | 0 | 0 | 23 | 0 | 12 | 0 | 5 | 0 | 0 | 2 | 3 | 0 | 0 | 0 | 45 |
| Total | 13 | 2762 | 7963 | 2 | 326 | 2238 | 3375 | 1 | 4 | 71 | 1458 | 19 | 93 | 4190 | 22515 |
Comparison between Taylor et al.[5] and Data Citation 1 for human cargoes
V=Viruses, B=Bacteria, F=Fungi, H=Helminths, P=Protozoa.

|  | V | B | F | H | P | Total |
|---|---|---|---|---|---|---|
| in [5] | 217 | 539 | 312 | 287 | 60 | 1415 |
| in Data Citation 1 | 204 | 878 | 311 | 148 | 70 | 1611 |
| in [5] and Data Citation 1 | 147 | 414 | 176 | 134 | 48 | 919 |
| % [5] share with Data Citation 1 | 67.74 | 76.81 | 56.41 | 46.69 | 80.00 | 64.95 |
| % Data Citation 1 share with [5] | 72.06 | 47.15 | 56.59 | 90.54 | 68.57 | 57.05 |
| Total Unique | 274 | 1003 | 447 | 301 | 82 | 2107 |
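The derived rows of the comparison can be reproduced from the three counts in each column. The sketch below assumes that each share is the overlap divided by the size of the respective source and that the unique total follows inclusion-exclusion; the function name `overlap_summary` is hypothetical.

```python
def overlap_summary(n_ref, n_db, n_both):
    """Return (% of reference shared, % of database shared, total unique).

    n_ref  -- number of species in the reference dataset
    n_db   -- number of species in Data Citation 1
    n_both -- number of species present in both
    """
    pct_ref_shared = round(100 * n_both / n_ref, 2)
    pct_db_shared = round(100 * n_both / n_db, 2)
    total_unique = n_ref + n_db - n_both  # inclusion-exclusion
    return pct_ref_shared, pct_db_shared, total_unique


# Virus column of the comparison with [5]: 217 in [5], 204 in Data Citation 1, 147 in both.
print(overlap_summary(217, 204, 147))  # (67.74, 72.06, 274)
```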
Comparison between Cleaveland et al.[3] and Data Citation 1 for cargoes of domestic mammals
V=Viruses, B=Bacteria, F=Fungi, H=Helminths, P=Protozoa.

|  | V | B | F | H | P | Total |
|---|---|---|---|---|---|---|
| in [3] | 147 | 228 | 88 | 349 | 103 | 915 |
| in Data Citation 1 | 179 | 395 | 78 | 245 | 141 | 1038 |
| in [3] and Data Citation 1 | 118 | 188 | 38 | 197 | 73 | 614 |
| % [3] share with Data Citation 1 | 80.27 | 82.46 | 43.18 | 56.45 | 70.87 | 67.10 |
| % Data Citation 1 share with [3] | 65.92 | 47.59 | 48.72 | 80.41 | 51.77 | 59.15 |
| Total Unique | 208 | 435 | 128 | 397 | 171 | 1339 |
Comparison between GMPD and Data Citation 1 for wild mammals-cargoes interactions
V=Viruses, B=Bacteria, F=Fungi, H=Helminths, P=Protozoa, A=Arthropod.

|  | V | B | F | H | P | A | Total |
|---|---|---|---|---|---|---|---|
| in GMPD | 177 | 61 | 5 | 332 | 104 | 127 | 806 |
| in Data Citation 1 | 486 | 395 | 51 | 383 | 284 | 95 | 1694 |
| in GMPD and Data Citation 1 | 127 | 37 | 5 | 218 | 78 | 51 | 516 |
| % GMPD share with Data Citation 1 | 71.75 | 60.66 | 100.00 | 65.66 | 75.00 | 40.16 | 64.02 |
| % Data Citation 1 share with GMPD | 26.13 | 9.37 | 9.80 | 56.92 | 27.46 | 53.68 | 30.46 |
| Total Unique | 536 | 419 | 51 | 497 | 310 | 171 | 1984 |