Hans Christian Wittich1, Marco Seeland2, Jana Wäldchen3, Michael Rzanny3, Patrick Mäder4.
Abstract
BACKGROUND: Predicting a list of plant taxa most likely to be observed at a given geographical location and time is useful for many scenarios in biodiversity informatics. Since efficient plant species identification is impeded mainly by the large number of possible candidate species, providing a shortlist of likely candidates can significantly expedite the task. Whereas species distribution models rely heavily on geo-referenced occurrence data, such information remains largely unused in plant taxa identification tools.
Keywords: Classification; Location-based; Occurrence prediction; Plant distribution; Plant identification; Recommender system; Spatio-temporal context
Year: 2018 PMID: 29843588 PMCID: PMC5975699 DOI: 10.1186/s12859-018-2201-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Characteristics of the FLORKART dataset – (a) spatial density of occurrence records per grid cell across all taxa; (b) average distance to nearest neighbor occurrence per taxon, average over all taxa marked by red line; (c) frequency distribution of occurrence records per grid cell
Fig. 2 Characteristics of the GBIF dataset – (a) spatial density of occurrence records per grid cell across all taxa; (b) record distribution per month of observation; (c) average distance to nearest neighbor occurrence per taxon; (d) frequency distribution of occurrence records per grid cell; (e) record distribution per year of observation
Fig. 3 Characteristics of the Flickr test data – (a) spatial density of occurrence records per grid cell across all taxa; (b) record distribution per month of observation; (c) average distance to nearest neighbor occurrence per taxon; (d) frequency distribution of occurrence records per grid cell; (e) record distribution per year of observation
Fig. 4 Grid section for a single taxon including area and point occurrences with different extents and uncertainties, respectively. The circle shows the sampling radius around the test position (red cross) being queried. The opacity of a tile is proportional to the taxon’s likelihood of being encountered there
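The radius sampling described in the Fig. 4 caption can be illustrated with a minimal sketch. It assumes point records given as (taxon, latitude, longitude) tuples and uses the haversine great-circle distance; the function names, the record layout, and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def records_within_radius(records, lat, lon, radius_km):
    """Keep the (taxon, lat, lon) records no farther than radius_km from the query position."""
    return [r for r in records if haversine_km(lat, lon, r[1], r[2]) <= radius_km]


# Hypothetical records for illustration only (not FLORKART or GBIF data).
records = [
    ("Taxon A", 50.980, 11.030),
    ("Taxon B", 50.990, 11.040),
    ("Taxon C", 51.500, 12.000),
]
nearby = records_within_radius(records, 50.980, 11.030, 5.0)
```

With a 5 km radius, the two records close to the query position are kept and the distant one is dropped. Grid-based area occurrences would need a tile-overlap test instead, which this sketch does not cover.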
Results of ranked taxon retrieval solely using FLORKART grid-based presence-absence data sampled at the exact location and aggregated for increasing radii around Flickr test observations
| Radius [km] | | | | | | | |
|---|---|---|---|---|---|---|---|
| Retrieval at exact location | |||||||
| 0 | 82.31 | 3.38 | 64.13 | 1.11 | 307 | 680 | 4.54 |
| S1: Relative frequency of occurrence records | |||||||
| 1 | 85.40 | 2.42 | 68.15 | 0.79 | 300 | 787 | 3.79 |
| 5 | 92.35 | 4.62 | 74.94 | 1.52 | 237 | 1115 | 2.59 |
| 10 | 94.47 | 4.78 | 74.42 | 1.35 | 234 | 1286 | 2.23 |
| 20 | 96.14 | 5.65 | 72.39 | 1.81 | 237 | 1477 | 1.92 |
| S2: Weighted relative frequency of occurrence records | |||||||
| 1 | 85.40 | 2.62 | 68.73 | 0.95 | 287 | 787 | 3.79 |
| 5 | 92.35 | 4.36 | 74.80 | 1.56 | 240 | 1115 | 2.59 |
| 10 | 94.47 | 4.74 | 74.70 | 1.55 | 232 | 1286 | 2.23 |
| 20 | 96.14 | 5.71 | 73.88 | 1.78 | 233 | 1477 | 1.92 |
| S3: Minimum spatial distance to records’ tile centers | |||||||
| 1 | 85.40 | 4.00 | 63.28 | 1.14 | 330 | 787 | 3.79 |
| 5 | 92.35 | 2.85 | 64.58 | 1.01 | 357 | 1115 | 2.59 |
| 10 | 94.47 | 2.13 | 64.25 | 0.80 | 375 | 1286 | 2.23 |
| 20 | 96.14 | 2.52 | 64.23 | 0.82 | 379 | 1477 | 1.92 |
| S4: Average spatial distance to records’ tile centers | |||||||
| 1 | 85.40 | 2.06 | 60.00 | 0.65 | 380 | 787 | 3.79 |
| 5 | 92.35 | 0.46 | 52.91 | 0.37 | 470 | 1115 | 2.59 |
| 10 | 94.47 | 0.68 | 46.32 | 0.37 | 520 | 1286 | 2.23 |
| 20 | 96.14 | 0.81 | 37.00 | 0.37 | 615 | 1477 | 1.92 |
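As an illustration of the simplest strategy named in the table (S1, relative frequency of occurrence records), the following sketch ranks taxa by their share of the records sampled around a query position. The exact normalization used in the paper is not given in this excerpt, so the scoring below is an assumption, and the function name and sample data are hypothetical.

```python
from collections import Counter


def rank_by_relative_frequency(taxa_in_radius):
    """Rank taxa by their share of all sampled records, most frequent first.

    taxa_in_radius: one taxon name per occurrence record found in the search radius.
    Returns a list of (taxon, relative_frequency) pairs.
    """
    counts = Counter(taxa_in_radius)
    total = sum(counts.values())
    scored = [(taxon, n / total) for taxon, n in counts.items()]
    # Sort by descending frequency; break ties alphabetically for a stable result.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Hypothetical sample: six records from three taxa.
sampled = ["Taxon A", "Taxon B", "Taxon A", "Taxon C", "Taxon A", "Taxon B"]
ranking = rank_by_relative_frequency(sampled)
```

Here "Taxon A" accounts for half of the sampled records and is ranked first. The weighted variant (S2) would replace the raw counts with per-record weights, whose definition is not part of this excerpt.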
Results of ranked taxon retrieval solely using GBIF point-based occurrence records sampled at the exact location and aggregated for increasing radii around Flickr test observations
| Radius [km] | | | | | | | |
|---|---|---|---|---|---|---|---|
| S1: Relative frequency of occurrence records | |||||||
| 0 | 36.36 | 19.90 | 36.36 | 6.61 | 17 | 73 | 262.00 |
| 1 | 43.40 | 16.43 | 43.40 | 5.06 | 36 | 142 | 218.72 |
| 5 | 59.72 | 11.28 | 58.04 | 3.45 | 89 | 337 | 91.61 |
| 10 | 73.15 | 12.05 | 69.45 | 3.41 | 111 | 504 | 16.60 |
| 20 | 85.51 | 11.12 | 77.68 | 2.71 | 133 | 752 | 5.36 |
| S2: Weighted relative frequency of occurrence records | |||||||
| 1 | 43.40 | 18.15 | 43.38 | 5.54 | 30 | 142 | 218.72 |
| 5 | 59.72 | 14.31 | 58.54 | 4.30 | 70 | 337 | 91.61 |
| 10 | 73.15 | 13.61 | 70.52 | 4.05 | 89 | 504 | 16.60 |
| 20 | 85.51 | 14.98 | 79.73 | 3.77 | 108 | 752 | 5.36 |
| S3: Minimum spatial distance to records’ tile centers | |||||||
| 1 | 43.40 | 12.84 | 43.44 | 3.46 | 51 | 142 | 218.72 |
| 5 | 59.72 | 14.87 | 58.46 | 4.12 | 66 | 337 | 91.61 |
| 10 | 73.15 | 16.00 | 71.09 | 4.59 | 77 | 504 | 16.60 |
| 20 | 85.51 | 16.46 | 80.62 | 4.54 | 92 | 752 | 5.36 |
| S4: Average spatial distance to records’ tile centers | |||||||
| 1 | 43.40 | 14.51 | 43.39 | 4.63 | 55 | 142 | 218.72 |
| 5 | 59.72 | 12.91 | 58.50 | 3.99 | 76 | 337 | 91.61 |
| 10 | 73.15 | 10.48 | 70.69 | 2.97 | 110 | 504 | 16.60 |
| 20 | 85.51 | 9.68 | 78.83 | 2.83 | 136 | 752 | 5.36 |
| S5: Temporal distance to months with recorded occurrences | |||||||
| 0 | 36.35 | 23.12 | 36.35 | 7.36 | 13 | 73 | 261.10 |
| 1 | 43.39 | 19.81 | 43.39 | 5.81 | 24 | 141 | 218.97 |
| 5 | 59.71 | 12.47 | 58.84 | 3.60 | 77 | 337 | 91.78 |
| 10 | 73.15 | 11.21 | 69.95 | 3.08 | 108 | 503 | 16.67 |
| 20 | 85.50 | 7.25 | 77.88 | 1.96 | 168 | 751 | 5.37 |
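Strategy S5 ranks taxa by temporal distance to months with recorded occurrences. One plausible reading, sketched below, treats months as points on a circle so that December and January are one month apart, and scores each taxon by its smallest distance to the query month. This is an assumption for illustration; the paper's actual definition is not reproduced in this excerpt, and all names and data are hypothetical.

```python
def month_distance(a, b):
    """Circular distance in months between two month numbers (1 to 12)."""
    d = abs(a - b) % 12
    return min(d, 12 - d)


def temporal_distance(query_month, recorded_months):
    """Smallest circular distance from the query month to any recorded month."""
    return min(month_distance(query_month, m) for m in recorded_months)


def rank_by_temporal_distance(query_month, months_by_taxon):
    """Order taxa by increasing temporal distance; ties are broken alphabetically."""
    return sorted(
        months_by_taxon,
        key=lambda taxon: (temporal_distance(query_month, months_by_taxon[taxon]), taxon),
    )


# Hypothetical months of recorded occurrences per taxon.
months_by_taxon = {"Taxon A": [6, 7], "Taxon B": [12], "Taxon C": [1, 2]}
ranking = rank_by_temporal_distance(1, months_by_taxon)
```

For a January query, the taxon recorded in January comes first, followed by the one recorded in December, and the taxon recorded only in summer comes last.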
Results of ranked taxon retrieval using FLORKART presence-absence data in combination with GBIF point-based occurrence records sampled at the exact location and aggregated for increasing radii around Flickr test observations
| Radius [km] | | | | | | | |
|---|---|---|---|---|---|---|---|
| S1: Relative frequency of occurrence records | |||||||
| 0 | 86.62 | 20.89 | 74.99 | 7.20 | 121 | 692 | 4.41 |
| 1 | 89.51 | 16.99 | 79.66 | 5.38 | 135 | 810 | 3.67 |
| 5 | 94.10 | 11.62 | 83.88 | 3.67 | 155 | 1142 | 2.54 |
| 10 | 95.98 | 12.19 | 83.03 | 3.55 | 160 | 1320 | 2.18 |
| 20 | 97.40 | 11.02 | 80.09 | 2.77 | 165 | 1525 | 1.86 |
| S2: Weighted relative frequency of occurrence records | |||||||
| 1 | 89.51 | 19.67 | 80.35 | 6.00 | 116 | 810 | 3.67 |
| 5 | 94.10 | 15.38 | 84.33 | 4.60 | 131 | 1142 | 2.54 |
| 10 | 95.98 | 15.16 | 84.38 | 4.28 | 127 | 1320 | 2.18 |
| 20 | 97.40 | 15.00 | 84.08 | 3.83 | 128 | 1525 | 1.86 |
| S3: Minimum spatial distance to records’ tile centers | |||||||
| 1 | 89.51 | 2.48 | 68.07 | 0.94 | 330 | 810 | 3.67 |
| 5 | 94.10 | 3.25 | 66.14 | 1.05 | 364 | 1142 | 2.54 |
| 10 | 95.98 | 1.90 | 65.72 | 0.84 | 378 | 1320 | 2.18 |
| 20 | 97.40 | 2.82 | 67.13 | 1.04 | 359 | 1525 | 1.86 |
| S4: Average spatial distance to records’ tile centers | |||||||
| 1 | 89.51 | 3.70 | 63.18 | 1.81 | 374 | 810 | 3.67 |
| 5 | 94.10 | 1.09 | 52.77 | 0.66 | 478 | 1142 | 2.54 |
| 10 | 95.98 | 0.76 | 45.51 | 0.43 | 529 | 1320 | 2.18 |
| 20 | 97.40 | 1.05 | 36.70 | 0.42 | 624 | 1525 | 1.86 |
| S5: Temporal distance to months with recorded occurrences | |||||||
| 0 | 36.35 | 23.15 | 36.35 | 7.37 | 13 | 73 | 261.10 |
| 1 | 43.39 | 19.86 | 43.39 | 5.76 | 25 | 141 | 218.97 |
| 5 | 59.71 | 12.52 | 58.82 | 3.60 | 77 | 337 | 91.78 |
| 10 | 73.15 | 11.06 | 69.95 | 3.04 | 108 | 503 | 16.67 |
| 20 | 85.50 | 7.22 | 77.87 | 1.98 | 167 | 751 | 5.37 |
| S2+S5: Combined weighted relative frequency and temporal distance | |||||||
| 0 | 86.62 | 23.98 | 75.78 | 8.85 | 133 | 692 | 4.41 |
| 1 | 89.51 | 22.09 | 79.65 | 7.51 | 119 | 810 | 3.67 |
| 5 | 94.10 | 17.92 | 84.49 | 5.69 | 118 | 1142 | 2.54 |
| 10 | 95.98 | 18.14 | 85.25 | 5.12 | 112 | 1320 | 2.18 |
| 20 | 97.40 | 17.14 | 85.52 | 4.61 | 115 | 1525 | 1.86 |
Fig. 5 Relative and cumulative frequency per rank of correct taxon for recommending Flickr test records from FLORKART and GBIF datasets, using a search radius of 5 km and six different ranking strategies. The dashed vertical lines mark the median of each distribution
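The rank distributions summarized in Fig. 5 can be condensed into a few generic retrieval statistics. The sketch below computes overall recall, the fraction of test records whose correct taxon appears within the top k positions, and the median rank among retrieved records. These are standard quantities chosen for illustration; whether they match the paper's table columns is not established by this excerpt, and the sample ranks are hypothetical.

```python
from statistics import median


def rank_summary(ranks, k):
    """Summarize 1-based ranks of the correct taxon; None means it was not retrieved."""
    found = [r for r in ranks if r is not None]
    return {
        # Share of test records whose correct taxon appears anywhere in the list.
        "recall": len(found) / len(ranks),
        # Share of test records whose correct taxon is within the first k positions.
        "top_k": sum(1 for r in found if r <= k) / len(ranks),
        # Median rank, computed over retrieved records only (an assumption).
        "median_rank": median(found),
    }


# Hypothetical ranks for eight test records.
ranks = [1, 3, None, 12, 40, 2, 7, None]
summary = rank_summary(ranks, k=10)
```

Plotting the cumulative share of records against increasing k gives a curve of the kind shown in the figure.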
Results of ranked taxon retrieval in selected regions using FLORKART areal data combined with 10-fold cross-validation on GBIF point data
| Region | | | | | | | |
|---|---|---|---|---|---|---|---|
| (a) Schorfheide-Chorin | 99.95 | 56.39 | 99.86 | 17.42 | 17 | 943 | 2.95 |
| (b) Hainich-Dün | 99.72 | 48.16 | 99.59 | 13.08 | 22 | 1058 | 2.65 |
| (c) Schwäbische Alb | 99.95 | 38.03 | 99.83 | 10.47 | 33 | 935 | 2.98 |
Influence of grid resolution on evaluation metrics for S2+S5 and r=10 km
| × Quarter MTB tile | Avg. area [km²] | Runtime | RAM [GB] | [%] | [%] | [%] | [%] | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 131.49 | 1.0 × | 0.5 | 96.45 | 16.14 | 84.00 | 4.92 | 140 | 1,349 | 2.12 |
| 1 | 32.87 | 1.1 × | 0.7 | 95.79 | 16.60 | 84.91 | 5.36 | 126 | 1,285 | 2.24 |
| 1/16 | 2.05 | 4.9 × | 5.7 | 96.20 | 17.85 | 85.13 | 5.36 | 114 | 1,331 | 2.16 |
| 1/64 | 0.51 | 15.4 × | 21.0 | 95.93 | 18.24 | 85.26 | 5.21 | 116 | 1,327 | 2.17 |
| 1/100 | 0.33 | 20.5 × | 33.2 | 95.98 | 18.19 | 85.24 | 5.14 | 112 | 1,320 | 2.18 |
| 1/144 | 0.23 | 29.6 × | 47.0 | 95.97 | 18.22 | 85.23 | 5.04 | 115 | 1,323 | 2.17 |
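The average tile areas in the table scale directly with the subdivision factor. The short check below, assuming the areas are in square kilometres, takes the tabulated quarter-tile area of 32.87 as the base value and reproduces the other rows by multiplication; the variable names are hypothetical.

```python
QUARTER_TILE_AREA_KM2 = 32.87  # tabulated average area of one quarter MTB tile

# Tile size expressed as a multiple of one quarter MTB tile, as in the first column.
fractions = {
    "4": 4.0,
    "1": 1.0,
    "1/16": 1 / 16,
    "1/64": 1 / 64,
    "1/100": 1 / 100,
    "1/144": 1 / 144,
}

# Expected average area per tile, rounded to two decimals like the table.
areas = {label: round(QUARTER_TILE_AREA_KM2 * f, 2) for label, f in fractions.items()}
```

The computed values agree with the table to within rounding, for example 2.05 for 1/16 and 0.23 for 1/144, while the full tile comes out as 131.48 against the tabulated 131.49.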