Linda See, Alexis Comber, Carl Salk, Steffen Fritz, Marijn van der Velde, Christoph Perger, Christian Schill, Ian McCallum, Florian Kraxner, Michael Obersteiner.
Abstract
There is currently a lack of in-situ environmental data for the calibration and validation of remotely sensed products and for the development and verification of models. Crowdsourcing is increasingly being seen as one potentially powerful way of increasing the supply of in-situ data but there are a number of concerns over the subsequent use of the data, in particular over data quality. This paper examined crowdsourced data from the Geo-Wiki crowdsourcing tool for land cover validation to determine whether there were significant differences in quality between the answers provided by experts and non-experts in the domain of remote sensing and therefore the extent to which crowdsourced data describing human impact and land cover can be used in further scientific research. The results showed that there was little difference between experts and non-experts in identifying human impact although results varied by land cover while experts were better than non-experts in identifying the land cover type. This suggests the need to create training materials with more examples in those areas where difficulties in identification were encountered, and to offer some method for contributors to reflect on the information they contribute, perhaps by feeding back the evaluations of their contributed data or by making additional training materials available. Accuracies were also found to be higher when the volunteers were more consistent in their responses at a given location and when they indicated higher confidence, which suggests that these additional pieces of information could be used in the development of robust measures of quality in the future.
Year: 2013 PMID: 23936126 PMCID: PMC3729953 DOI: 10.1371/journal.pone.0069958
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The spectrum of human impact.
| Human Impact | Description |
| 0% | No evidence of any human activity visible |
| 1 to 50% | Some visible evidence of human activities such as tracks/roads; evidence of managed forests; some evidence of deforestation; some scattered human dwellings, some scattered agricultural fields; some evidence of grazing |
| 51% to 80% | Increasing density of agriculture from subsistence on the lower end to intensive, commercial agriculture with large field sizes on the upper end |
| 81% to 99% | Urban areas with decreasing amounts of green space and increasing density of housing |
| 100% | A built up urban area with no green space, typically the business district of a city |
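The bands in the table above can be expressed as a simple lookup. The following is a minimal sketch, assuming human impact is recorded as an integer percentage; the function name `human_impact_band` is hypothetical and does not come from Geo-Wiki.

```python
def human_impact_band(percent: int) -> str:
    """Return the band of the human impact spectrum for an integer score.

    Assumption: scores are whole percentages from 0 to 100, so the bands
    0, 1-50, 51-80, 81-99 and 100 cover every possible value.
    """
    if not 0 <= percent <= 100:
        raise ValueError("human impact must be between 0 and 100")
    if percent == 0:
        return "0%"
    if percent <= 50:
        return "1 to 50%"
    if percent <= 80:
        return "51% to 80%"
    if percent <= 99:
        return "81% to 99%"
    return "100%"
```

For example, a score of 65 falls in the "51% to 80%" band, which the table associates with increasing density of agriculture.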
Figure 1. Number of pixels classified per day by the volunteers.
These are daily totals from the start of the competition on day 1 to its end at just over 50 days; they show a clear acceleration as the competition progressed.
Figure 2. Global distribution of pixels collected by the volunteers.
The distribution is shown by (a) human impact and (b) land cover type.
Figure 3. Median response time of the volunteers.
The response time is in seconds measured from the start of the competition until the end at just over 50 days.
A confusion matrix for the comparison of controls with responses from the crowd.
| | Class 1 (control) | Class 2 (control) | … | Class n (control) |
| Class 1 (volunteer) | | | | |
| Class 2 (volunteer) | | | | |
| … | | | | |
| Class n (volunteer) | | | | |
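A matrix of this shape can be tallied directly from paired labels. The sketch below is illustrative only: the function names and the toy labels are hypothetical, and rows are taken to be volunteer classes and columns control classes, following the layout above.

```python
from collections import Counter


def confusion_matrix(volunteer, control, classes):
    """Count label pairs; rows are volunteer classes, columns control classes."""
    if len(volunteer) != len(control):
        raise ValueError("volunteer and control must have the same length")
    counts = Counter(zip(volunteer, control))
    return [[counts[(v, c)] for c in classes] for v in classes]


def overall_accuracy(matrix):
    """Share of responses on the diagonal, i.e. where volunteer and control agree."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix contains no observations")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Toy labels, invented for illustration (not data from the study).
volunteer_labels = [1, 1, 2, 2, 3]
control_labels = [1, 2, 2, 2, 3]
matrix = confusion_matrix(volunteer_labels, control_labels, classes=[1, 2, 3])
accuracy = overall_accuracy(matrix)
```

With these toy labels, four of the five responses agree with the control, so `accuracy` is 0.8.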
Regression analysis for the model Y = a+bX, where Y is the degree of human impact from the control data and X is the degree of human impact from the participants.
| | Estimate | Std. Error | t value | Pr(>|t|) |
| a | 11.300 | 0.363 | 31.16 | 0.000 |
| b | 0.699 | 0.006 | 122.43 | 0.000 |
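A simple linear regression of control scores on participant scores can be fitted with ordinary least squares. The sketch below uses the closed-form estimates for an intercept `a` and slope `b`; it assumes the model has the form Y = a + bX, and the function name `fit_line` and the synthetic points are hypothetical rather than taken from the paper.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Synthetic points generated from the tabulated estimates, assuming the
# first row is the intercept and the second the slope. Illustration only.
participant = [0, 25, 50, 75, 100]
control = [11.3 + 0.699 * p for p in participant]
a, b = fit_line(participant, control)
```

Because the synthetic points lie exactly on a line, the fit recovers the intercept and slope used to generate them; real volunteer data would scatter around the line.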
Extending the regression to include an indicator of expertise, with one regression coefficient for this indicator and one for participant human impact scores.
| | Estimate | Std. Error | t value | Pr(>|t|) |
| | 9.009 | 0.432 | 20.85 | 0.000 |
| | 0.705 | 0.006 | 123.49 | 0.000 |
| | 4.251 | 0.442 | 9.62 | 0.000 |
The regression analysis of predicting the degree of human impact by expert and non-expert groups, when the regression is split into 2 simultaneous models.
| | Estimate | Std. Error | t value | Pr(>|t|) |
| | 7.960 | 0.527 | 15.12 | 0.000 |
| | 14.200 | 0.494 | 28.74 | 0.000 |
| | 0.725 | 0.008 | 91.06 | 0.000 |
| | 0.685 | 0.008 | 83.61 | 0.000 |
Figure 4. The distribution of human impact by land cover class.
The distribution is shown for (a) the control pixels and (b) the volunteers, where the latter show a much wider range of answers.
Regression analysis for the degree of human impact.
| | Estimate | Std. Error | t value | Pr(>|t|) |
| | 7.264 | 0.343 | 21.16 | 0.000 |
| | 4.284 | 0.520 | 8.24 | 0.000 |
| | 6.567 | 0.504 | 13.03 | 0.000 |
| | 73.669 | 0.857 | 86.01 | 0.000 |
| | 36.046 | 0.485 | 74.32 | 0.000 |
| | 0.220 | 0.012 | 18.52 | 0.000 |
| | 0.089 | 0.021 | 4.34 | 0.000 |
| | 0.366 | 0.015 | 24.62 | 0.000 |
| | 0.098 | 0.010 | 10.06 | 0.000 |
| | 0.273 | 0.008 | 33.58 | 0.000 |
Figure 5. The relationship between the volunteer responses and the controls for human impact by land cover type.
The lines show the coefficient slopes when each control land cover class is evaluated in turn. Note that the data points have had a small random noise component added to allow their density to be visualised.
Consistency of response to degree of human impact.
| Disaggregation | Category | Average HI (%) | Median HI (%) | Std Dev (%) |
| All | All points | 9.60 | 0.00 | 17.43 |
| Expertise | Experts | 10.90 | 5.00 | 18.50 |
| | Non-experts | 7.95 | 0.00 | 15.82 |
| Land cover consistency | Agree on land cover between points | 7.20 | 0.00 | 14.55 |
| | Disagree on land cover between points | 17.25 | 10.00 | 22.80 |
| Confidence | Sure | 7.92 | 0.00 | 15.68 |
| | Sure+Quite sure | 9.13 | 0.00 | 16.93 |
| | Quite sure+Less sure+Unsure | 22.08 | 15.00 | 23.65 |
| | Less sure+Unsure | 25.92 | 15.00 | 25.16 |
Regression analysis for the model Y = a+bX, where X is response time and Y is human impact.
| | Estimate | Std. Error | t value | Pr(>|t|) |
| a | 12.9915 | 1.0706 | 12.135 | 0.000 |
| b | 1.4110 | 0.6157 | 2.291 | 0.022 |
Accuracy of land cover (in %) based on comparison of volunteer response with three sets of controls.
| Dataset used | No allowance for confusion between classes | | |
| | C1 | C2 | C3 |
| Full dataset | 66.4 | 66.5 | 76.2 |
| Confidence rating of sure | 69.4 | 69.3 | 78.9 |
| Experts | 69.2 | 72.3 | 84.6 |
| Non-experts | 62.4 | 61.9 | 65.9 |
User’s and producer’s accuracies for the five main land cover types and for different subsets of the data including confidence and expertise.
| Data set | Land cover type | No confusion | | | | | |
| | | User’s accuracy | | | Producer’s accuracy | | |
| | | C1 | C2 | C3 | C1 | C2 | C3 |
| Full | 1 | 75.9 | 77.4 | 43.6 | 67.1 | 69.6 | 100.0 |
| | 2 | 52.1 | 46.5 | 0.0 | 61.7 | 67.2 | N/A |
| | 3 | 45.1 | 56.3 | 6.0 | 51.3 | 56.3 | 30.0 |
| | 4 | 78.9 | 88.8 | 95.2 | 74.2 | 72.8 | 76.0 |
| | 5 | 71.5 | 68.8 | 64.6 | 62.2 | 60.7 | 76.4 |
| Sure | 1 | 78.7 | 82.4 | 53.1 | 68.0 | 70.2 | 100.0 |
| | 2 | 50.8 | 48.6 | 0.0 | 64.4 | 71.2 | N/A |
| | 3 | 43.9 | 52.4 | 10.7 | 47.7 | 53.7 | 50.0 |
| | 4 | 81.0 | 89.6 | 95.2 | 76.5 | 75.0 | 78.7 |
| | 5 | 72.4 | 68.2 | 63.7 | 66.8 | 65.8 | 78.8 |
| Expert | 1 | 78.4 | 83.5 | 52.6 | 73.0 | 68.8 | 100.0 |
| | 2 | 54.8 | 45.7 | 0.0 | 63.8 | 65.1 | N/A |
| | 3 | 50.9 | 65.6 | 7.1 | 52.4 | 65.2 | 33.3 |
| | 4 | 77.1 | 90.5 | 95.5 | 82.6 | 80.5 | 86.5 |
| | 5 | 76.5 | 75.7 | 78.1 | 59.3 | 71.8 | 80.2 |
| Non-expert | 1 | 71.9 | 73.6 | 35.0 | 58.6 | 70.2 | 100.0 |
| | 2 | 48.5 | 47.2 | 0.0 | 58.9 | 68.8 | N/A |
| | 3 | 38.0 | 48.7 | 5.6 | 49.5 | 48.9 | 28.6 |
| | 4 | 82.8 | 87.0 | 94.6 | 61.2 | 66.3 | 63.0 |
| | 5 | 66.1 | 62.4 | 52.5 | 66.3 | 51.6 | 71.8 |
1 = Tree cover; 2 = Shrub cover; 3 = Herbaceous vegetation/Grassland; 4 = Cultivated and managed; 5 = Mosaic of cultivated and managed/natural vegetation.
Consistency of response in choosing the land cover type.
| Disaggregation | Category | Consistent | Percentage |
| None | Full dataset | Y | 76.1 |
| | | N | 23.9 |
| Expertise | Expert | Y | 75.7 |
| | | N | 24.3 |
| | Non-Expert | Y | 76.7 |
| | | N | 23.3 |
| Confidence | Sure | Y | 77.6 |
| | | N | 22.4 |
| | Quite sure+Less sure+Unsure | Y | 76.4 |
| | | N | 23.6 |
| | Less sure+Unsure | Y | 66.7 |
| | | N | 33.3 |
Logistic regression analysis for the model Logit (P) = a+bX where X is the log of the response time and P is the probability that the land cover is correctly identified.
| | Estimate | Std. Error | t value | Pr(>|t|) |
| a | 1.46573 | 0.13955 | 10.504 | 0.000 |
| b | −0.40005 | 0.07957 | −5.027 | 0.000 |
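To see what the fitted logistic model implies, the logit can be converted back to a probability with the inverse logit, P = 1 / (1 + exp(−(a + bX))). The sketch below assumes the first tabulated estimate is the intercept a and the second the slope b, and it takes the already logged response time as input, because the base of the logarithm is not stated here. The function name is hypothetical.

```python
import math

# Estimates from the table above; their assignment to a and b is an assumption.
A = 1.46573
B = -0.40005


def probability_correct(log_response_time, a=A, b=B):
    """Inverse logit of a + b*X, where X is the log of the response time."""
    logit = a + b * log_response_time
    return 1.0 / (1.0 + math.exp(-logit))


# The negative slope means the predicted probability of a correct land cover
# answer falls as the logged response time increases.
p_at_zero = probability_correct(0.0)
p_at_one = probability_correct(1.0)
```

Under these assumptions `p_at_zero` is a little above 0.81 and `p_at_one` a little above 0.74, so slower responses are associated with a lower chance of identifying the land cover correctly.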