Literature DB >> 36215274

I'm fine with collecting data: Engagement profiles differ depending on scientific activities in an online community of a citizen science project.

Till Bruckermann^1,2, Hannah Greving³, Milena Stillfried⁴, Anke Schumann⁴, Miriam Brandt⁴, Ute Harms².

Abstract

Digital technologies facilitate collaboration between citizens and scientists in citizen science (CS) projects. Besides the facilitation of data transmission and access, digital technologies promote novel formats for education in CS by including citizens in the process of collecting, analyzing, and discussing data. It is usually assumed that citizens profit more from CS the more they participate in the different steps of the scientific process. However, it has so far not been analyzed whether citizens actually engage in these steps. Therefore, we investigated citizens' actual engagement in different scientific steps online (i.e., data collection and data analysis) in two field studies of a CS project. We then compared them with other CS projects. We analyzed behavioral engagement patterns of N = 273 participants with activity logs and cluster analyses. Opportunities to engage in different steps of the scientific process increased participants' overall commitment compared to contributory CS projects. Yet, despite their increased commitment, participants' engagement was only more active for data collection but not for data analysis. We discuss how participants' perceived role as data collectors influenced their actual engagement in the scientific steps. To conclude, citizens may need support to change their role from data collectors to data inquirers.

Entities: Chemical

Mesh：
Citizen Science
Humans

Year: 2022 PMID： 36215274 PMCID： PMC9551629 DOI： 10.1371/journal.pone.0275785

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.752

Introduction

Citizen science (CS), the scientific collaboration between volunteer citizens and full-time scientists, is becoming increasingly popular. A vast number of CS projects allow citizens to participate directly in scientific activity (e.g., Zooniverse projects: [1]). Simultaneously, CS has become an invaluable research tool in many disciplines [2]. Without citizens’ help in data collection, many research projects would not be possible. Citizens also benefit from engaging in CS projects. By participating in the whole scientific process, citizens have the opportunity to increase their knowledge about science and the processes of science [3,4]. Even though the benefits of CS for citizens seem to be clear at first sight, little is known about whether citizens are actually interested in learning about the scientific process or actually use the opportunity to learn more about it. While there has been some research on citizens’ engagement in CS projects [2,5-8], no analysis has been done so far on the extent to which citizens engage with the different tasks of the scientific process [9]. Therefore, we investigated to what extent citizens actually engaged in two different steps of the scientific process (i.e., data collection and data analysis) in a CS project that combined field activities with activities in an online community. Conducting this investigation provided some highly relevant insights about citizens’ actual engagement patterns against the background of the ideal picture of CS projects.

Citizen science and the digital age

Citizen science has a long tradition and, in its current form, was established at the beginning of the 20th century with the Christmas Bird Count project [10]. On the one hand, CS projects provide scientists with the opportunity to collect larger datasets in time and space than would be possible without volunteer contributions. Citizens’ contributions to CS projects also have a high financial value in the scientific research of many disciplines [2]. Without this kind of voluntary participation, many research projects would not be financially feasible. On the other hand, CS projects give citizens the possibility to learn about new topics and science. This means that citizens can gain topical knowledge they may otherwise not have access to by engaging in such projects. Furthermore, they can acquire scientific knowledge, as engaging in CS projects allows them to gain insights into the whole scientific process (i.e., formulating research questions, designing a study, collecting relevant data, analyzing the data, and interpreting and discussing the results). In summary, both scientists and citizens benefit from their joint engagement in CS projects. Nowadays, digital devices facilitate collaboration between citizens and scientists. Data can be more easily collected and transmitted via mobile devices such as smartphones or tablets. At the same time, databases are constantly available online for both scientists and citizens who can easily interact with them. Besides the facilitation of data collection, transmission, and access [11], the digital age also brings new aspects to CS projects. First, digital technologies and software enable the formation of online CS communities [11] which collaboratively develop and create knowledge through interactions of citizens or between scientists and citizens [12,13]. Second, computer-supported collaborations of citizens and scientists promote novel formats for education in CS [14] by including citizens in the process of gaining evidence from data and discussing their results online. For example, new statistical tools can enable citizens themselves to analyze data and discuss the results online [15]. At the same time, technologies also have to be designed in such a way as to facilitate data inspection, analysis, and discussion [16]. Such a scaffolding design provides assistance during the employment of tools for inquiry and helps non-experts to accomplish tasks that otherwise would be too difficult [16,17]. One can also enter a workspace that helps citizens to structure relevant data and to organize questions and results (artifacts) from data analysis [17]. Thus, digital devices, software, and tools can all facilitate citizens’ engagement in CS and enable them to participate in the process of scientific research.

Citizens’ engagement in the scientific process and its challenges

Engaging citizens in the whole scientific process of CS projects has been shown to be beneficial for citizens’ knowledge about science [18,19]. In order to specify which type of engagement or participation would be most beneficial for a person who wants to volunteer, several frameworks have been proposed which describe different levels of participation [19,20]. The frameworks classify participation according to the level of citizens’ inclusion in the different steps of the scientific process: The mere collection of data in a CS project is regarded as the lowest level of participation (contributory projects: [19]); collecting data, data analysis, and the interpretation of results in a CS project is seen as an intermediate level (collaborative projects: [19]), because some projects allow for data analyses that go beyond species identification activities [21] and may include statistical data analyses (e.g., [22]); and being involved in the whole research process, from generating research questions to drawing conclusions, is regarded as the highest level (co-created projects: [19]; see also ‘extreme CS’: [20]). Co-created projects are known to increase the cognitive engagement of citizens [20]. Other participation models also suggest that higher degrees of participation (i.e., developing explanations and discussing results) lead to higher engagement in the project [4,19]. In science education, the model-of-data theory on reasoning suggests that authentic inquiry tasks stimulate reasoning processes on higher cognitive levels [23,24]. Hence, including citizens in scientific activities that require reasoning and drawing conclusions from data promotes cognitive processes and, thus, increases their scientific knowledge. However, even though citizens should be objectively able to participate in the whole scientific process in CS projects [25], it is unclear whether citizens equally engage in all levels of scientific activities or whether they are even interested in doing so [4,9,20]. First, CS projects often take place in informal learning settings [26] which provide citizens with different opportunities to learn from different scientific activities [27]. In this case, their choice of activities may not correspond to the ideal engagement pattern of the highest level [9,19]. Second, the spreading of CS in various disciplines [28] has led to CS projects offering discipline-specific activities for citizens to participate in [21]. For example, in the biological sciences, CS projects often engage participants in observation and monitoring activities [21], while in the health sciences, CS projects may involve their participants in more gamified activities such as digitally folding, and modifying proteins and their structures [29]. Such discipline-specific activities of CS projects might promote that participants’ engagement differs between projects from different disciplines [1]. Third, the engagement is often described only in terms of data contribution [2,6], but does not take into account any engagement in further activities of the scientific process [9]. Recent research has extended these analyses to facilitating participants’ own investigations that included analyzing data, they collected themselves, on an online platform [5]. Results showed that participants stayed connected to the project for a longer time than participants in other CS projects (i.e., Galaxy Zoo and Milky Way: [6]) in which they mostly worked on scientists’ data. Third, some theoretical work [20] and empirical findings [9] also question whether citizens indeed participate equally in different phases of the scientific process, for example, in both the data collection and the data analysis phase. Thus, it remains unclear whether the mere provision of opportunities to engage in different steps of the scientific process is sufficient to actually engage citizens in these steps. Therefore, we investigated actual behavioral patterns of citizens in the online community of a CS project across these two different steps of the scientific process. With this investigation, we aimed to increase the understanding of citizens’ actual scientific behavior, which would have important implications for designing new CS projects.

The current research

The objective of the research presented here was to analyze the engagement patterns and profiles of participants in the online community of a CS project. Before doing this, we analyzed in a first step the engagement patterns across the whole project and compared them with engagement patterns of other projects. This comparison allowed us to relate the CS project we investigated to three other CS projects. In the next step, we then analyzed participants’ specific engagement patterns within our CS project (i.e., comparing data collection and data analysis) by means of log file analyses and cluster analyses. We focused on these forms of analyses, because previous analyses may have been confounded: Most analyses so far have focused on contributory or participatory CS projects without analyzing participants’ engagement in the data collection and data analysis phases separately [5,6]. Obviously, the design for participation may change behavioral patterns and may manifest itself in different engagement profiles. We suggest that tracking citizens’ actual behavioral patterns through log file analyses is beneficial, as these analyses allow for the adoption of a design which fits citizens’ needs and abilities [30]. Our analyses are based on previous research [5,6] and provide a comprehensive analysis of the same engagement metrics used in these previous studies. We expand on these studies in two ways: We provide (1) a between-projects comparison of engagement metrics for CS online communities focused on data contribution or higher levels of participation, and (2) a within-project comparison of citizens’ engagement with scientific activities either focused on data collection or data analysis. Based on our considerations that the facilitation of participants’ own investigations enhances their commitment and that participants’ engagement with higher-level scientific tasks (i.e., data analysis) is lower, we stated the following hypotheses: Participants engage for longer periods and more regularly in our collaborative CS project than participants in other contributory projects (Hypothesis 1). Participants in our CS project engage more actively in the data collection phase than the data analysis phase (Hypothesis 2).

Method and materials

Participants

Citizens from a metropolitan city in the East of Germany applied for participation in two different field studies of an urban ecology CS project (for more details see Project and procedure). Citizens were selected for participation based on where they lived in the city. This method was chosen because the project design strived for an equal distribution of participants across the whole city in both field studies. Those citizens who were selected as participants had access to the CS project via an online platform. On this platform, all of the participants’ activities were tracked in log files. Participants could, however, withdraw from tracking by either choosing to opt-out or to use anonymous browsing. Across both field studies, data of 382 participants was available, among which 77 participants withdraw from data tracking. 32 participants with fewer than two active days were excluded for methodological reasons [6]. Thus, across both field studies, the log file data of N = 273 participants were ultimately analyzed (N1 = 141, 74 female, 64 male, Mage = 52.88, SD = 12.91; N2 = 132, 74 female, 56 male, 1 diverse, Mage = 53.85, SD = 11.56; four participants did not provide demographic information). Participants had given their written informed consent of participation in a research study and an ethics committee approved the questionnaire (LEK 2018/062).

Project and procedure

The study reported here was part of an interdisciplinary research project on knowledge transfer in CS dealing with urban ecology in a metropolitan city in the East of Germany. Our study focused on two field studies of one CS project within this larger research project. Both field studies of the CS project were concerned with terrestrial wildlife in one of the cities and were called ‘Wildlife Researchers’. For the sake of clarity, we refer to them as ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’. They were conducted in fall, 2018, and spring, 2019, respectively. In the field studies, participants set up camera traps in their gardens for one month to capture photographs of wildlife moving in front of the camera during the day and at night. Apart from the installment and maintenance of the camera traps, all project activities took place on an online platform. We used this platform to track participants’ activities in the CS project. On the online platform, participants formed an online community to share data and discuss analyses with the other participants of the respective field study of the CS project. In this community, they shared the wildlife photographs from their camera trap in a database. The photographs were used to investigate the distribution of wildlife species across the city as a function of different urbanization factors (i.e., biotic and abiotic factors). On the platform, participants could also analyze data and discuss their results and other topics in the forum. In addition, they had access to technical and methodological information about camera traps in ecological research, as well as scientific information about terrestrial mammals and urban ecology. Fig 1 illustrates the structure of the online platform and participants’ tasks within the online community: During the data collection phase, participants (1) uploaded camera trap photographs, (2) learned how to identify the wildlife species in the photographs from a tutorial, and (3) identified the species. Participants also (4) identified species in pictures from other participants in order to validate other participants’ identification of species. These tasks of the data collection phase correspond to a task typology that clearly distinguishes observations, species identification, and data entries from data analysis [21] (e.g., statistical data analysis: [22]), although species identification could be seen as some kind of initial data analysis [31].

Fig 1

Flow chart of scientific activities and resources (rectangular boxes) as well as data and artifacts (oval boxes) in the project.

During the data analysis phase, participants (5) analyzed their individual data on the species’ occurrence in their own garden and (6) aggregated data of all participants in the respective field study. Participants were guided through the analyses in a structured way using the scientific process, from the research question and hypothesis to the selection of variables and predictions, on to descriptive and numeric statistics, and finally, to interpretation. From their analyses, participants obtained artifacts such as graphical representations and verbal descriptions of their hypotheses, predictions, and explanations of data. These artifacts were (7) made available to be shared in posts or comments in the forum for discussions with other participants and scientists. Thus, participants’ tasks within the data collection were characteristic for contributory CS (simple collection and identification of photographs), while their tasks during the data analysis were characteristic for collaborative projects (analysis and interpretation of data: [19]).

Data analysis and measures

Participants’ activities (such as picture contributions, species identification, species validation, data analyses, forum posts) and their logins to the platform were saved in log files using Matomo (v3.9.1). These log files facilitated the analysis of behavioral engagement patterns and different user profiles (e.g., in the context of CS: [5]; for a review see [32,33]). Therefore, in a first step, we calculated engagement metrics in both field studies separately, based on suggestions from previous research. We then compared them to three previous CS projects, for which data on the same engagement metrics were available: Weather-it [5], Milky Way, and Galaxy Zoo [6]. Milky Way and Galaxy Zoo are contributory CS projects, while Weather-it is a collaborative project and facilitates the collection and analysis of participants’ own data like our CS project did. We also used these metrics to compare participants’ engagement in the data collection and data analysis phases in both field studies. In a second step, we identified user profiles based on the engagement metrics in the first field study of our CS project through cluster analysis, and we compared them for the data collection and data analysis phases separately. An analysis of transitions showed the flow of participants between the data collection and data analysis phases. The user profiles of the second field study were not calculated, due to a lower number of participants that did not suffice for further cluster analyses.

Engagement metrics

We based the calculation of engagement metrics from log files data on previous suggestions for online communities in CS [5,6]. Calculations accounted for active days, when citizens actively engaged in activities on the platform (i.e., uploading photographs, taking the tutorial, identifying and validating species, analyzing data, posting and commenting in the forum), and lurking days, when citizens did not engage in any of these activities but instead browsed through the content of the platform. Days without login and lurking days summed up to the days elapsed between two active days. The difference between the very first and the final, last day of login defined the total days a participant was linked to the project, while the time between the very first day of login and the day the project ended defined the potential days a participant could be linked. Based on this typology of activities, we calculated engagement metrics in accordance with earlier studies [5,6], which stipulated that participants had to be active for at least two days in order to calculate the engagement metrics. All other participants were only visitors per definition. We established definitions for the following metrics and indicate below the abbreviation for the metric and its score range, if applicable, in brackets: The ratio of active and total days a participant was linked to the project was the activity ratio (a [0; 1]). The ratio of lurking and potential days a participant could be linked to the project was the lurking ratio (l [0; 1]). The mean hours of a participant’s active contribution during the active days was the daily devoted time (d [0; 24]). We also counted the number of days between each of a participant’s active days and then calculated the standard deviation of these days in between. Thus, we estimated how regularly a participant actively contributed to the project which was the variation in periodicity (v). Finally, the ratio of total and potential days for a participant in the project was the relative activity duration (r [0; 1]).

Clustering

Apart from the question of how participants engaged in the different CS projects, we also investigated whether there were different engagement profiles based on the engagement metrics. Therefore, we performed cluster analyses based on previous reporting guidelines [34]. As the clustering based on engagement metrics was only possible for participants with more than two active days [5,6], we considered N = 131 participants for clustering, n1 = 121 participants in the data collection phase, and n2 = 36 participants in the data analysis phase in the first field study. Due to a lower number of participants, we did not conduct a cluster analysis in the second field study. The flow of participants between the emerging clusters of the data collection and the data analysis phase in the first field study was plotted in a Sankey diagram and analyzed for deviances in tests of symmetry [35,36] with further post hoc comparisons [37] in R (v3.6.1). The clustering procedure was similar to procedures in previous work [5,6]: Prior to clustering, we normalized each engagement metric to an interval of [0; 1] so that all engagement metrics had equal weight in the clustering process [38]. We used SPSS (v24) for cluster analysis and identified engagement profiles by clustering the engagement metrics on the basis of the minimal within-groups sum of squares in each cluster. Subsequent application of hierarchical agglomerative clustering and k-means clustering optimized the results [39]. First, the hierarchical agglomerative clustering algorithm revealed the number of possible clusters. Afterward, the number of possible clusters was used as a specification for the k-means clustering procedure. The centers from hierarchical clustering were the initial centroids in the k-means clustering procedure, to reduce noise and iteration time [39]. As a method for determining the final number of clusters (‘stopping rules’), we inspected the dendrograms and the within-sum of squares. Dendrograms displayed the increases in dissimilarity within each cluster by squared Euclidean distances during the merging procedure. The within-groups sum of squares decreased with the number of identified clusters (being zero, when each case is its own cluster; [40]). Hence, the quality of identified clusters was inspected based on the within-groups sum of squares representing homogeneity in the engagement metrics of cluster members. Repeating the clustering for the whole sample of the first field study and two subsamples of the data collection and analysis phases provided validity for the clusters we obtained [34].

Results

In a first step, we compared both field studies (i.e., the first field study, ‘Wildlife Researchers 1’, and the second field study, ‘Wildlife Researchers 2’) with three previously conducted CS projects (i.e., Weather-it: [5]; Milky Way and Galaxy Zoo: [6]). We also compared the data collection and data analysis phases in both ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’. In both cases, we based our comparisons on the calculated engagement metrics. Means and standard deviations of all engagement metrics are presented in Table 1.

Table 1

Engagement metrics for two field studies of our CS project and three other CS projects (between projects) and, separately, for the data collection and data analysis phases in the two field studies of our CS project (within ‘Wildlife Researchers 1’, within ‘Wildlife Researchers 2’).

	Between projects					Within Wildlife Res. 1		Within Wildlife Res. 2
	Wildlife Res. 1	Wildlife Res. 2	Weather-it	Milky Way	Galaxy Zoo	data collect.	data analysis	data collect.	data analysis
	(N₁ = 141)	(N₂ = 132)	(N = 77)	(N = 6,093)	(N = 23,547)	(n = 61)		(n = 33)
	M (SD)	M (SD)	M (SD)	M (SD)	M (SD)	M₁ (SD₁)	M₂ (SD₂)	M₁ (SD₁)	M₂ (SD₂)
a	.20 (.12)	.18 (.15)	.32*** (.35)	.40*** (.40)	.33*** (.38)	.36*** (.19)	.16 (.10)	.31*** (.19)	.12 (.07)
d	0.78 (0.60)	0.76 (0.50)	—	0.44*** (0.54)	0.32*** (0.40)	0.82 (0.56)	0.70 (0.53)	0.77*** (0.31)	0.44 (0.37)
r	.67 (.20)	.71 (.28)	.43*** (.44)	.20*** (.30)	.23*** (.29)	.85*** (.12)	.59 (.22)	.89** (.13)	.80 (.13)
v	5.32 (4.33)	5.26 (4.05)	5.11 (5.36)	18.27*** (43.31)	25.23*** (49.16)	2.33 (1.92)	5.15** (4.76)	1.86 (1.18)	3.98⁺ (3.73)
l	.20 (.16)	.26 (.18)	.35*** (.39)	—	—	.11 (.15)	.31*** (.21)	.27 (.15)	.46*** (.24)

a = activity ratio; d = daily devoted time; r = relative activity duration; v = variation in periodicity; l = lurking ratio.

p-values refer to the between projects comparison with ‘Wildlife Researchers 1’. +p < .1 **p < .01; ***p < .001.

a = activity ratio; d = daily devoted time; r = relative activity duration; v = variation in periodicity; l = lurking ratio. p-values refer to the between projects comparison with ‘Wildlife Researchers 1’. +p < .1 **p < .01; ***p < .001. In a second step, we first calculated engagement profiles through cluster analysis for ‘Wildlife Researchers 1’. We next conducted separate cluster analyses for the data collection and data analysis phase in order to differentiate engagement profiles and analyzed the transitions of participants between data collection and data analysis.

Engagement metrics

We hypothesized that participants would engage for longer periods and more regularly in the two field studies of our collaborative CS project (i.e., ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’) than participants in other contributory projects (see Hypothesis 1). Hence, we compared the engagement metrics (i.e., the activity ratio [a], daily devoted time [d], relative activity duration [r], variation in periodicity [v], and lurking ratio [l]) of ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’ with the engagement metrics of three previously conducted CS projects (i.e., Weather-it: [5]; Milky Way and Galaxy Zoo: [6]). To accomplish this comparison, we calculated t-tests, effect sizes (Cohens’ d), and 95% confidence intervals.

Comparison with other projects

The analyses for the category of relative activity duration (r) revealed that participants stayed more days in the online community of ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’ than participants in the CS project Weather-it, twildlife 1(140) = 14.623, p < .001, d = 0.78, 95% CI [0.50; 1.07], twildlife 2(131) = 11.560, p < .001, d = 0.81, 95% CI [0.54; 1.10]; than participants in the project Milky Way, twildlife 1(140) = 28.498, p < .001, d = 1.58, 95% CI [1.41; 1.75], twildlife 2(131) = 21.143, p < .001, d = 1.70, 95% CI [1.53; 1.88]; and than participants in the project Galaxy Zoo, twildlife 1(140) = 26.688, p < .001, d = 1.52, 95% CI [1.35; 1.69], twildlife 2(131) = 19.893, p < .001, d = 1.66, 95% CI [1.49; 1.83] (see Table 1). These findings supported Hypothesis 1. With reference to the variation in periodicity, participants contributed more regularly (v) in ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’ than participants in the project Milky Way, twildlife 1(130) = −34.262, p < .001, d = –0.30, 95% CI [–0.48; –0.13], twildlife 2(120) = −35.323, p < .001, d = –0.30, 95% CI [–0.48; –0.12], and than participants in the project Galaxy Zoo, twildlife 1(130) = −52.678, p < .001, d = –0.41, 95% CI [–0.58; –0.23], twildlife 2(120) = −54.225, p < .001, d = –‍0.41, 95% CI [–0.59; –0.23]. Yet, ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’ did not differ from the project Weather-it, twildlife 1(130) = 0.561, p = .576, twildlife 2(120) = 0.417, p = .677 (see Table 1). These results supported Hypothesis 1. Participants devoted more time on active days (d) within the online community of ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’ than participants in the project Milky Way, twildlife 1(140) = 6.707, p < .001, d = 0.63, 95% CI [0.46; 0.80], twildlife 2(131) = 7.464, p < .001, d = 0.59, 95% CI [0.42; 0.77], and than participants in the project Galaxy Zoo, twildlife 1(140) = 9.102, p < .001, d = 1.15, 95% CI [0.98; 1.31], twildlife 2(131) = 10.235, p < .001, d = 1.10, 95% CI [0.93; 1.27] (see Table 1). There was no data available on daily devoted time for the project Weather-it. However, participants in the online community of ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’ showed a lower activity ratio (a) than participants in the project Weather-it, twildlife 1(140) = −11.934, p < .001, d = –0.52, 95% CI [–0.81; –0.24], twildlife 2(131) = −10.707, p < .001, d = –0.58, 95% CI [–0.86; –0.29], than participants in the project Milky Way, twildlife 1(140) = −19.682, p < .001, d = –0.51, 95% CI [–0.67; –0.34], twildlife 2(131) = −16.852, p < .001, d = –0.56, 95% CI [–0.73;–‍0.38], and than participants in the project Galaxy Zoo, twildlife 1(140) = −12.902, p < .001, d = –0.34, 95% CI [–‍0.51; –0.18], twildlife 2(131) = −11.475, p < .001, d = –0.40, 95% CI [–0.57; –0.23] (see Table 1). Finally, participants in the online community of ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’ lurked (l) less on the online platform than participants in the project Weather-it, twildlife 1(140) = −11.154, p < .001, d = –0.57, 95% CI [–0.85; –0.28], twildlife 2(131) = −5.866, p < .001, d = –‍0.33, 95% CI [–0.61; –0.04] (see Table 1). There was no data available for the lurking ratio in the Milky Way and Galaxy Zoo projects.

Comparisons of phases within the project

We also hypothesized that participants in the two field studies of our CS project (i.e., ‘Wildlife Researchers 1’ and ‘Wildlife Researchers 2’) would engage more actively in the data collection phase than in the data analysis phase (see Hypothesis 2). To test this hypothesis, we compared the engagement metrics of the data collection and the data analysis phases within both field studies. For this comparison, we again calculated t-tests, effect sizes (Cohens’ d), and 95% confidence intervals. The analyses showed that participants’ activity ratio (a) was higher for the data collection than for the data analysis phase in both field studies, twildlife 1(60) = 7.27, p < .001, d = 0.76, 95% CI [0.39; 1.13], twildlife 2(32) = 6.41, p < .001, d = 0.92, 95% CI [0.42; 1.43]. The lurking ratio (l) was higher for the data analysis than for the data collection phase in both field studies, twildlife 1(60) = −7.41, p < .001, d = −0.95, 95% CI [−1.32; −0.57], twildlife 2(32) = −4.51, p < .001, d = −1.08, 95% CI [−1.60; −0.57] (see Table 1). These results supported Hypothesis 2. In the category of relative activity duration (r), participants stayed linked to the data collection phase for a longer period of time than to the data analysis phase in both field studies, twildlife 1(60) = 8.87, p < .001, d = 1.69, 95% CI [1.28; 2.10], twildlife 2(32) = 3.04, p = .005, d = 0.55, 95% CI [0.05; 1.05]. In terms of the variation in periodicity (v), participants contributed more regularly during the data collection than during the data analysis phase in both field studies, twildlife 1(32) = −3.74, p = .001, d = −1.36, 95% CI [−1.89; −0.82], twildlife 2(11) = −1.86, p = .090, d = −1.29, 95% CI [−2.17; −0.41]. Finally, participants’ daily devoted time (d) was significantly higher for the data collection than for the data analysis phase in ‘Wildlife Researchers 2’, twildlife 2(32) = 4.30, p < .001, d = 0.82, 95% CI [0.32; 1.32], but not in ‘Wildlife Researchers 1’, twildlife 1(60) = 1.43, p = .159, d = 0.18, 95% CI [−0.18; 0.54] (see Table 1). These findings were partly in line with Hypothesis 2.

Engagement profiles

In the next step, we identified engagement profiles based on participants’ engagement metrics in ‘Wildlife Researchers 1’ with a cluster analysis for N = 131 participants (i.e., 93%) with more than two active days. This number of participants is much higher than in other projects where less than half of the participants had more than two active days [5]. We can therefore conclude that our participants were active and engaged. We exploratively investigated whether the engagement profiles in our CS project were similar to other collaborative projects such as the project ‘Weather-it’ (see also Hypothesis 1). The optimal number of clusters from the clustering procedure was estimated by its dendrogram and cross-validated by the within-groups sum of squares, which is interpreted as a scree-plot in factor analysis. The inspection of the dendrogram for ‘Wildlife Researchers 1’ indicated two, three, four, five or eight possible clusters, when the dendrogram was broken at five cut-off values of the Euclidean distances (‘20’, ‘14’, ‘11’, ‘9’, ‘4’) and the intersections between horizontal and vertical lines were counted (see S1 File). For cross-validation of the possible clusters, we plotted the change in slope of the within-groups sum of squares: Inconsistencies in the increase for the step from five to four clusters indicated that dissimilar clusters had been merged. Hence, five clusters were the minimal number of clusters (see S2 File). Following this reasoning, we identified five different engagement profiles which were similar to previously identified engagement profiles. This was done with respect to the kinds of profiles, but not with respect to the absolute values of the engagement metrics. Fig 2 represents the standardized engagement metrics for each identified cluster.

Fig 2

Engagement profiles for participants.

Engagement profiles for participants.

Engagement profiles (A) within the ‘Wildlife Researchers 1’ project (N = 131) and the two phases of (B) data collection (n = 121) and (C) data analysis (n = 36) for participants with more than two active days are based on the standardized engagement metrics for clustering [0; 1]. The most prevalent profile was the loyal (lo) engagement profile (nlo = 62). Participants with this profile stayed almost until the end of the project (high relative activity duration, r = 0.79), returned regularly to the community (low variation in periodicity, v = 0.19), and actively contributed to the community at least one day out of five days (moderate activity ratio, a = 0.23, see Fig 2, A1). Participants in the hardworking (ha) engagement profile (nha = 5) outperformed other participants in their activity ratio and daily devoted time. They spent almost 2 hours and 50 minutes during an active day on the platform (high daily devoted time, d = 0.78) and contributed to the CS project every third day (high activity ratio, a = 0.29). However, their relative activity duration within the project was shorter (r = 0.58) than that of participants in the loyal profile (see Fig 2, A2), indicating that participants in this hardworking profile left the project earlier after the work was done. The persistent (pe) engagement profile (npe = 7) was characterized by the highest variation in periodicity (v = 0.79), as participants with this profile participated irregularly. Although persistent participants visited the platform only sporadically, they stayed connected to the project almost until the end of the project, like the participants with a loyal engagement profile (high relative activity duration, r = 0.80). Their very low activity ratio (a = 0.04) indicated low contributions to the CS project (see Fig 2, A3). Participants in the lurker (lu) engagement profile (nlu = 28) were connected to the project for a long period of time (high relative activity duration, r = 0.78). Yet, on more than half of their visiting days, they viewed camera trap pictures or read discussions in the forum (high lurking ratio, l = 0.57) without actively contributing to the CS project (low activity ratio, a = 0.11, see Fig 2, A4). Participants in the moderate (mo) engagement profile (nmo = 29) had a balanced engagement across all engagement metrics with no defining key characteristic. They participated for a shorter amount of time (moderate relative activity duration, r = 0.43) at a moderate to high activity level (moderate to high activity ratio, a = 0.25, see Fig 2, A5). However, the number of participants in this engagement profile was as high as the number of participants in the lurker engagement profile.

Engagement profiles for data collection and data analysis

Besides the identification of engagement profiles across the whole project, we further analyzed how engagement profiles differed and changed between the data collection and data analysis phases. Hence, we conducted separate cluster analyses based on engagement metrics of participants with more than two active days for the data collection (n1 = 121) and data analysis phase (n2 = 36). We exploratively investigated whether engagement profiles differed between the data collection and data analysis phases (see also Hypothesis 2). The sheer numbers of participants already indicated that in the data collection phase, many more participants had more than two active days than participants in the data analysis phase. Again, we inspected the dendrograms and the within-groups sum of squares in both clustering procedures for the data collection and the data analysis phases. The inspection of the dendrogram revealed two to seven or nine possible clusters for the data collection phase. Cross-validation by the within-groups sum of squares showed that within-cluster dissimilarity was still small for six clusters. For the data analysis phase, two to six possible clusters were identified. Cross-validation by the within-groups sum of squares indicated that within-cluster dissimilarity was still small for six clusters (see S1 and S2 Files). For the data collection phase, we identified the same five engagement profiles as for the whole project. Participants in the loyal engagement profile (nlo1 = 52) were well connected to the CS project (high relative activity duration, r = 0.90) and returned regularly (low variation in periodicity, v = 0.19, see Fig 2, B1). Participants with a hardworking engagement profile (nha1 = 4) devoted a lot of time (high daily devoted time, d = 0.78) on many active days (high activity ration, a = 0.62) and stayed well connected to the CS project (high relative activity duration, r = 0.80, see Fig 2, B2). Participants in the persistent engagement profile (npe1 = 7) participated irregularly (high variation in periodicity, v = 0.75) but were still well connected to the CS project (high relative activity duration, r = 0.97, see Fig 2, B3). Participants in the lurking engagement profile (nlu1 = 15) were also well connected to the project (high relative activity duration, r = 0.91) but mostly browsed the content of the platform passively (high lurking ratio, l = 0.46, see Fig 2, B4). Participants in the moderate engagement profile (nmo1 = 8) were moderately connected to the CS project (moderate relative activity duration, r = 0.45) and also contributed actively on a moderate level (moderate activity ratio, a = 0.22, see Fig 2, B5). Next to these five engagement profiles, we identified a sixth engagement profile which was characteristic for the data collection phase: Participants with a contributor (co) engagement profile (nco1 = 35) were especially active during the upload, identification, and validation of camera trap pictures (high activity ratio, a = 0.46). Although they did not devote much time on active days (low daily devoted time, d = 0.12), they were well connected to the CS project (high relative activity duration, r = 0.83, see Fig 2, B6). Besides the loyal participants, participants with the contributor engagement profile were the most common group during the data collection phase. For the data analysis phase, we also identified the same engagement profiles as for the whole project. Participants in the loyal engagement profile (nlo2 = 9) still stayed well connected during the data analysis phase (high relative activity duration, r = 0.78), but used some of their time for passively browsing the content of the platform (moderate lurking ratio, l = 0.20, see Fig 2, C1). The participant in the hardworking engagement profile (nha2 = 1) devoted a lot of time to data analysis (high daily devoted time, d = 1.00) and actively contributed (high activity ratio, a = 0.31, see Fig 2, C2). Participants in the persistent engagement profile (npe2 = 2) were still well connected to the CS project and the data analysis phase (high relative activity duration, r = 0.98). But they visited the platform irregularly (high variation in periodicity, v = 0.86) and used their time on the platform for browsing (high lurking ratio, l = 0.56, see Fig 2, C3). Participants in the lurking engagement profile (nlu2 = 5) mostly browsed the content of the platform without actively contributing (high lurking ratio, l = 0.89) but were still well connected to the platform and the CS project (high relative activity duration, r = 0.92, see Fig 2, C4). Finally, participants in the moderate engagement profile (nmo2 = 12) were moderately connected to the CS project (moderate relative activity duration, r = 0.49), returned regularly (low variation in periodicity, v = 0.12), and mostly used their time for passively browsing the content of the platform (moderate lurking ratio, l = 0.34, see Fig 2, C5). In addition to these five profiles, we identified a sixth engagement profile characteristic of the data analysis phase: Participants with a lasting (la) engagement profile (nla1 = 7) stayed for a relatively long time (high relative activity duration, r = 0.83) during the data analysis phase and, hence, extended their participation in the project. They devoted a lot of time per day to the CS project (high daily devoted time, d = 0.42), yet mostly for browsing content on the platform (moderate lurking ratio, l = 0.44). They also returned irregularly (moderate variation in periodicity, v = 0.28, see Fig 2, C6). Overall, the number of participants within every profile in the data analysis phase was much lower than the number of participants in the data collection phase. Although the five engagement profiles that we found across the field study were present for both the data collection and data analysis phase, the sixth profiles in each phase differed. Together with the decrease in the number of participants, these differences indicated that the data collection and data analysis phases produced qualitatively and quantitatively different engagement profiles.

Transitions between engagement profiles

Lastly, we analyzed the transitions of participants between the engagement profiles from the data collection towards the data analysis phase. To do this, we plotted the number of participants in the engagement profiles for both phases in a Sankey diagram (see Fig 3). Then, we compared the distribution of participants in the engagement profiles between the data collection and data analysis phase in a nominal symmetry test with Χ2 statistics. Transitions of participants between the engagement profiles in the data collection and data analysis phases were tested by further post hoc tests, which provided p-values but no test statistics [37]. We exploratively investigated whether the participant numbers in the engagement profiles differed for the data collection and data analysis phase, as data analysis allowed for higher levels of participation, but also required higher levels of cognitive engagement. We based these considerations on previous findings regarding participants’ reasons to participate in CS projects, and on the number of participants with more than two active days in the data collection and data analysis phases.

Fig 3

Sankey diagram of participants’ transitions.

Sankey diagram of participants’ transitions.

Transitions occur between engagement profiles (N = 141; ‘Wildlife Researchers 1’) between the data collection (left) and the data analysis phase (right), with all ps < .05 (dark grey) [Interactive version online in S3 File]. The engagement profiles of the data collection and data analysis phases each had different numbers of participants allocated to them, Χ2 = 75.34, p < .001. Overall, the majority of participants (n = 88) changed their profile from data collection to data analysis and in most cases adopted a visitor engagement profile (n = 105 participants with less than two active days were visitors per definition). To be more specific, the most prevalent engagement profile during data collection was the loyal engagement profile, with 52 participants. However, from these participants, 41 participants became visitors during the data analysis phase, p < .001, as they reduced their active contribution to less than two active days. Only 5 participants continued their engagement in the moderate profile, p = .063. Thus, there was a high dropout rate within the loyal engagement profile, and loyal contribution was limited to the data collection phase. The 35 participants in the contributor engagement profile actively participated almost half of all the days during data collection. Of these participants, 20 participants dropped out of this profile during the data analysis phase, p < .001, and turned into visitors with no more than two active days. The other contributing participants stayed active for more than two days during the data analysis phase and were distributed across different engagement profiles, with the majority participating almost until the end (i.e., loyal engagement: nco1;lo2 = 5), p = .063. With respect to the other profiles, the picture was the same, only the numbers were smaller. Of the 7 participants in the persistent engagement profile in the data collection phase, 6 participants became visitors in the data analysis phase, p = .031. Of the 15 participants in the lurking engagement profile in the data collection phase, 10 participants became visitors in the data analysis phase, p = .002. Finally, all 8 participants in the moderate engagement profile in the data collection phase were also visitors during the data analysis phase, p = .039. To sum up, the results of post hoc comparisons for each engagement profile indicated that participants with actively contributing profiles (i.e., loyal and contributing) decreased their activities after the data collection phase and participated in more passive engagement profiles. Nevertheless, participants in the contributing and the loyal engagement profiles were still more likely to actively contribute in the data analysis phase than participants in any other profile.

Discussion

The increasing popularity of CS has led to the development of a vast number of projects engaging thousands of citizens. In this respect, it is important to know how participants actually participate in CS projects in order to provide relevant insights for researchers and practitioners in CS to aid them in preparing and conducting future CS projects. To provide such insights, we analyzed participants’ actual engagement patterns in two field studies of a collaborative wildlife urban ecology CS project, compared them with other contributory and collaborative CS projects, and tested whether participants’ engagement differed for the two phases of the field studies (i.e., data collection and data analysis). The findings of our research indicate that participants were engaged for longer periods of time, returned more frequently, and devoted more time to our collaborative project than participants in more contributory projects. Our findings also revealed that participants contributed more actively during the data collection than during the data analysis phase. This difference was similarly apparent in participants’ transitions from being loyal or contributing participants during data collection to being mostly lurking or visiting participants during data analysis. Thus, this research provides evidence that collaborative CS projects enhance participants’ commitment to the project, but that it is especially data collection that promotes the most active contribution.

Engagement metrics in collaborative CS online communities

First, our results correspond to previous findings regarding collaborative projects (i.e., Weather-it). In collaborative projects, participants often stay for a longer period of time, contribute more regularly, and devote more daily hours to the research activities in the online community than in contributory projects. This is the case because collaborative CS projects open the research process to facilitate participants’ own investigations [5]. In contributory CS projects (i.e., Milky Way and Galaxy Zoo: [2,6]), however, participants’ contributions are limited to collecting and processing data for others (i.e., the scientists). These projects often struggle with sporadic engagement of participants. In the case of our research presented here, providing participants with the opportunity to perform tasks beyond collecting data seemed to increase their allocated time (i.e., daily devoted time) over a longer period of time (i.e., relative activity duration) on a more regular basis (i.e., variation of periodicity), while their active engagement (i.e., activity ratio) did not seem to increase in general. These findings are in line with previous research, which has suggested that offering participants a choice of tasks in CS projects encourages participant contribution [2,25].

Engagement metrics differ between data collection and data analysis

Second, our results expand upon previous research studies on collaborative CS projects regarding participants’ engagement in different phases of the project [5,7]. Our results demonstrated that participants’ active contribution (i.e., activity ratio) was much higher in the data collection phase than in the data analysis phase, while the reverse was true for their passive behavior (i.e., lurking ratio). Hence, our results provide indications that the lower activity ratio that was often found in collaborative CS projects [5,6] may have resulted from the fact that participants’ active contributions were skewed towards data collection. Based on these results, our study disentangled confounds in previous research on collaborative CS projects [5]. Opening the research process to collaborative CS projects did not enhance any active engagement besides the higher commitment of participants. Contrary to the intentions of collaborative CS projects [19], a differentiated analysis of the activities for data collection and data analysis showed that participants’ engagement did not correspond to the goal of higher levels of participation [9,41,42].

Participants’ engagement profiles for data collection and data analysis

Third, our research corresponds to previous findings on engagement profiles in contributory and collaborative CS projects. The clustering of engagement metrics in our CS project revealed five profiles that were also found in earlier studies (i.e., the loyal, hardworking, persistent, lurker, and moderate engagement profiles; 5,6). Besides these five profiles, we found two additional engagement profiles when separately clustering the engagement metrics for the data collection and data analysis phases. The contributor engagement profile was exclusive to the data collection phase and reflected the high activity of participants in uploading, identifying, and validating the camera trap photographs. The lasting engagement profile [6] was only present in the data analysis phase, representing participants who contributed irregularly but remained connected to the project. Thus, together with the contributor profile, the engagement profiles in the data collection phase were characterized by high numbers of participants and active contributions. In contrast, together with the lasting profile, the engagement profiles in the data analysis phase were characterized by low numbers of participants and more passive contributions. Previous research has indicated that designing CS projects for higher levels of participation in the research process [19] does not necessarily lead to higher engagement [9,41,42]. This is in line with our findings, which may show that participants’ motivation and knowledge constrained their engagement when dealing with data analysis tasks (e.g., doing statistics: [9]; cf. [43]). These previous and our current findings provide evidence that participants’ contributions were much more frequent and more active in the data collection than in the data analysis phase. One explanation might be that participants in CS projects find their knowledge insufficient for gaining evidence from data. Therefore, they may not see the engagement in data analysis and dealing with research evidence as realistic behavioral actions [44].

Supporting the transition from collecting data to analyzing data

Last, our results for the transition of participants from the different engagement profiles in the data collection and the data analysis phases both correspond to previous findings on participants’ motivation to engage [9,42] and simultaneously shed new light on the behavioral engagement patterns in CS projects [5,6]. The engagement profiles in the two phases differed qualitatively (i.e., characteristics of the profiles) and quantitatively (i.e., number of participants in the profiles). Many more participants actively contributed during data collection (i.e., in the loyal, hardworking, and contributor profiles) than during data analysis (i.e., in the visitor and lasting profiles). This finding may be evidence that some participants were probably fine with collecting large data sets for scientists [2,9]. Despite the low activity during data analysis, those participants who actively engaged on a high level in data collection (i.e., participants in the contributor and loyal profiles) were more likely to also engage in the data analysis phase. We suppose as a post hoc explanation for why highly active participants also engaged in data analysis that those participants felt ownership (i.e., an individuals’ feeling of possessing a concrete or abstract object or entity; [45]) for the data, that they themselves collected and contributed. This could well be the case, as research on project ownership has indicated that high project ownership promotes deeper involvement in problem solving in the inquiry process [46]. Other possible explanations for why highly active participants in the data collection also engaged in data analysis could be that those participants were in general more motivated to participate in all phases [47] or even held a specific set of knowledge and skills that helped them to analyze the data as well [48].

Implications

For online communities in CS projects, our results have theoretical, methodological, and practical implications. By expanding on previous theoretical considerations ([4,19]; cf. [9]), our results suggest that simply offering opportunities for higher levels of participation in CS may not correspond to participants’ preferred level of engagement with research tasks such as data analyses. Although participants’ commitment to collaborative CS projects offering deeper involvement was higher than in contributory projects [5], further research is necessary to find ways to engage participants in higher-level tasks more actively. Such research could, for instance, define the boundary conditions of engagement more clearly [9] and offer trial activities (such as tutorials) that motivate and facilitate participants’ willingness to get involved in cognitive tasks of higher order [25]. Our analyses also revealed differing engagement metrics and profiles across the two phases of the CS projects. These findings point to a possible methodological confound in previous research when tracking participants’ behavior in online CS communities [5,6]. For an unbiased comparison of behavioral engagement patterns between contributory and collaborative CS projects, the actual engagement has to be tracked in, for example, log files. Moreover, close care should be taken that the engagement patterns are tracked separately for the different tasks. Only in this way will a differentiated picture of actual engagement in CS projects be possible. Regarding practical implications for online communities in CS projects, supporting citizens with scaffolding tools for data analysis (i.e., digital tools that facilitate data inspection through visualizations and automated testing procedures) do not seem to suffice. Previous suggestions on the design of online platforms for CS [25] need to be refined on the basis of participants’ individual motives and their resulting engagement. Mechanisms to support specific engagement profiles [5] need to consider participants’ engagement depending on the different tasks of data collection and analysis. Post hoc, we suggest that how citizens’ perceive their roles (i.e., as data collectors) may hinder them from taking on a new role requiring cognitive involvement with data analyses. Hence, besides the support with scaffolding tools within the inquiry process, citizens may also need encouragement to change their role from data collector to data inquirer. Targeting the most active participants during data collection for encouragement could also facilitate the transition of participants to data analysis. Citizens who have already contributed with high engagement during data collection may possibly be more likely to engage with data analysis, as they have already invested a lot of time and effort which in turn may increase their project ownership. Further research should investigate in detail whether participants’ already high investment as a data collector may be an important lever to facilitate their transition from the data collector to the data inquirer role (see also [49]).

Strength, limitations, and future research

This research used two field studies of an existing collaborative CS project involving a sample of citizens representative of CS projects [43]. With the tracking data of the log files, we provided data on citizens’ actual engagement behavior in the online community of the CS project. Thus, this research presents important findings of actual engagement patterns in collaborative CS projects that previous research has only sparsely investigated and that is relevant for the CS community. At the same time, we need to discuss some limitations of our research. Effects of the chronological order of data collection and analysis and their temporal restriction have potentially limited our findings. Participants first had to collect data on wildlife before they were able to analyze it. Although the order is valid from a scientific point of view (i.e., data has to be collected before being analyzed), analyses of contribution patterns in CS projects indicate that participants’ activities rapidly decrease over time [2]. Hence, future research should examine the effects of chronological order on participants’ engagement by providing opportunities for analyzing data from the beginning of the project. One possibility as an example would be to provide data from previous CS projects. If participants take advantage of such an opportunity, a mere change in the chronological order of data collection and analysis in earlier stages could support the change of roles from data collectors to data inquirers in CS projects. Furthermore, engagement with data collection and analysis was time-restricted, so that more resource availability could have motivated more participant engagement. Even though comparable CS projects limited the facilitation of participants’ engagement in time [7], future studies should investigate the availability of resources as an influencing factor in CS. A second limitation concerns the sample size in our field studies. Especially, the sample sizes for the engagement profiles in the data analysis phase were smaller than the sample sizes of those profiles in the data collection phase. While this difference may limit the conclusion about participants’ engagement in the data analysis, the quantitative and qualitative changes between the profiles also highlight the differences in engagement between the data collection and the data analysis phase. In general, the number of participants was limited to the camera traps available in the CS project. Therefore, we did not estimate the sample size a priori to account for the frequent dropout of participants in CS projects. Hence, due to the lower number of participants in the second field study, we were unable to perform further cluster analysis. Payment of the participants could, on the one hand, have assured their continued participation. On the other hand, participants’ voluntary participation allowed us to observe engagement patterns unbiased in terms of monetary reward or social desirability, because many CS projects require such voluntary engagement of citizens (e.g., White House memo on CS; [50]). In future research on online CS communities, the sample size could be increased by adopting research methodology to include participants’ own data collection tools or by paying the participants. Although the suggested engagement metrics [5,6] were useful to track participants’ engagement and led to comparable results, they remain on the level of quantifying behavioral engagement. To go a step further, the quality of participants’ activity in terms of relevant contributions to the community’s goals should be accounted for as well [32]. Our analysis of different scientific tasks in the online CS community qualified participants’ behavior in terms of levels of participation [19]. However, further research should consider the motivation behind participants’ contributions. Therefore, in our future research, we would expand our own analyses to include participants’ motivations.

Conclusion

In sum, this research provides evidence that the degree of participants’ engagement in the online community of our collaborative CS project depended on the scientific tasks of data collection and data analysis. Our results demonstrated that participants’ engagement did not necessarily reflect the intended level of participation in scientific research, but corresponded to previously identified motivations of participants to engage in CS projects. Hence, simply increasing the opportunities for participation in scientific tasks from contributory towards collaborative CS projects does not necessarily lead to enhanced engagement. Future research should explore which design factors in CS projects would in fact facilitate participants’ transition from data collectors to data inquirers, from more straightforward to more complicated tasks in terms of cognitive involvement. (PDF) Click here for additional data file. (PDF) Click here for additional data file. (PDF) Click here for additional data file.

Engagement metrics of ‘Wildlife Researchers 1’.

(SAV) Click here for additional data file.

Engagement metrics of ‘Wildlife Researchers 2’.

(SAV) Click here for additional data file.

Engagement profiles of ‘Wildlife Researchers 1’.

(SAV) Click here for additional data file. 2 Sep 2022

PONE-D-22-09613

I’m fine with collecting data: Engagement profiles differ depending on scientific activities in an online community of a citizen science project PLOS ONE Dear Dr. Bruckermann, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. The manuscript has been evaluated by three reviewers, and their comments are available below. The reviewers have raised a number of concerns that need attention. They request additional information and discussion on how citizen engagement profiles can differ in different types of research to provide context to interpret the results with. They also request additional clarity on the definitions of data collection and data analysis, and note concerns with the small sample sizes for the data analysis profiles and that not all claims may be supported. Could you please revise the manuscript to carefully address the concerns raised?

Please submit your revised manuscript by Oct 17 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Alice Coles-Aldridge Editorial Office PLOS ONE Journal requirements: When submitting your revision, we need you to address these additional requirements. 1.Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Partly Reviewer #3: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: I Don't Know ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Thank you for exploring the extent to which citizens are engaged in different tasks of the scientific process. Many times, citizen involvement in research is based around tokenism, and does not result in meaningful participation of citizens in the work. Revision suggestions: Adding some discussion (2-3 lines, maybe a paragraph) around how citizen engagement profiles can differ in different types of research (e.g., health sciences vs. zoological sciences etc) in the background could provide the reader with some context to interpret the results with. Reviewer #2: The paper builds on an important issue under examination in the citizen science field-- how to deepen participant-engagement to benefit the citizens and the science. One concern I have is the authors' definition of "data collecting" . Here, it is described as the act of setting out camera traps and identifying/labeling the images captured by the cameras. That actually could be considered data analysis. They are analyzing the data collected by camera traps. The act of making and sharing the observations (by going into nature to take and upload the pictures, for example), is what I expected to see in this paper. Perhaps the authors can address this in the paper a bit? There are thousands of citizen science projects that rely on people to collect and share data and typically that means that the participants make and share observations...not retrieve, analyze and label the data. Again, that seems to be a form of data analysis. Reviewer #3: Thank you for conducting this work examining engagement patterns across projects. In general, I think the paper is very strong and is need of minor revisions. This statement is unclear, please explain: The standard deviation of days between pairs of sequential active days was the variation in periodicity (v). The sample sizes for the data analysis profiles are so small - can we really say anything about these profiles? Unsure that this claim is warranted: "We suppose as a post-hoc explanation that ownership (i.e., an individuals’ feeling of possessing a concrete or abstract object or entity; [38]) for the data, that they themselves collected and contributed, might explain why highly active participants also engaged in data analysis." There may be many other factors at play besides ownership. It would be important for you to include alternate explanations (e.g., not interested/motivated to do analysis, feeling "uncredentialed," lack of educational support for analytical skills. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 8 Sep 2022 Dear Alice Coles-Aldridge, dear Reviewers, thank you very much for your positive and helpful feedback and for investing your time and effort in reviewing our manuscript. We appreciate your constructive and valuable feedback that has helped us very much in improving the paper on its way to publication. We have considered all of your suggestions, and we are convinced that we have made appropriate improvements. In this letter, we address all of your comments from the reviews and explain our revisions in detail. To ensure the transparency of our revisions, we have responded to each comment separately and explain our responses to each comment below. In some cases, we have also pasted the new or changed paragraphs in quotation marks and indented them. For a full account of our changes, please refer to the revised manuscript in which we have highlighted the changes in blue color. Comments from the Editor and Reviewers: Editor: Comment 1: Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Response 1: Thank you very much for this positive feedback. We are happy to resubmit a revised version of our manuscript. Comment 2: The manuscript has been evaluated by three reviewers, and their comments are available below. The reviewers have raised a number of concerns that need attention. They request additional information and discussion on how citizen engagement profiles can differ in different types of research to provide context to interpret the results with. They also request additional clarity on the definitions of data collection and data analysis, and note concerns with the small sample sizes for the data analysis profiles and that not all claims may be supported. Could you please revise the manuscript to carefully address the concerns raised? Response 2: These are valid points from the reviewers. And yes, of course, we have carefully revised the manuscript and have addressed all of the reviewers’ comments accordingly (for our detailed responses, see below). Reviewer #1: Comment 3: Thank you for exploring the extent to which citizens are engaged in different tasks of the scientific process. Many times, citizen involvement in research is based around tokenism, and does not result in meaningful participation of citizens in the work. Revision suggestions: Adding some discussion (2-3 lines, maybe a paragraph) around how citizen engagement profiles can differ in different types of research (e.g., health sciences vs. zoological sciences etc) in the background could provide the reader with some context to interpret the results with. Response 3: We thank Reviewer 1 very much for the positive feedback regarding our submission. To provide context for the interpretation of our results, we have now discussed how participants’ engagement potentially differs between projects from specific disciplines in the Introduction section (p. 6). “However, even though citizens should be objectively able to participate in the whole scientific process in CS projects [25], it is unclear whether citizens equally engage in all levels of scientific activities or whether they are even interested in doing so [4,9,20]. First, CS projects often take place in informal learning settings [26] which provide citizens with different opportunities to learn from different scientific activities [27]. In this case, their choice of activities may not correspond to the ideal engagement pattern of the highest level [9,19]. Second, the spreading of CS in various disciplines [28] has led to CS projects offering discipline-specific activities for citizens to participate in [21]. For example, in the biological sciences, CS projects often engage participants in observation and monitoring activities [21], while in the health sciences, CS projects may involve their participants in more gamified activities such as digitally folding, and modifying proteins and their structures [29]. Such discipline-specific activities of CS projects might promote that participants’ engagement differs between projects from different disciplines [1].” Reviewer #2: Comment 4: The paper builds on an important issue under examination in the citizen science field-- how to deepen participant-engagement to benefit the citizens and the science. One concern I have is the authors' definition of "data collecting" . Here, it is described as the act of setting out camera traps and identifying/labeling the images captured by the cameras. That actually could be considered data analysis. They are analyzing the data collected by camera traps. The act of making and sharing the observations (by going into nature to take and upload the pictures, for example), is what I expected to see in this paper. Perhaps the authors can address this in the paper a bit? There are thousands of citizen science projects that rely on people to collect and share data and typically that means that the participants make and share observations...not retrieve, analyze and label the data. Again, that seems to be a form of data analysis. Response 4: We thank the Reviewer very much for the positive feedback and for bringing this point to our attention. We regret that we have not been clear enough about the reasons why we included species identification in the data collection phase and not in the data analysis phase. To address this point, we have explained our considerations in the Introduction section and the Methods section. “The mere collection of data in a CS project is regarded as the lowest level of participation (contributory projects: [19]); collecting data, data analysis, and the interpretation of results in a CS project is seen as an intermediate level (collaborative projects: [19]), because some projects allow for data analyses that go beyond species identification activities [21] and may include statistical data analyses (e.g., [22]); and being involved in the whole research process, from generating research questions to drawing conclusions, is regarded as the highest level (co-created projects: [19]; see also ‘extreme CS’: [20]).” (p. 5) “During the data collection phase, participants (1) uploaded camera trap photographs, (2) learned how to identify the wildlife species in the photographs from a tutorial, and (3) identified the species. Participants also (4) identified species in pictures from other participants in order to validate other participants’ identification of species. These tasks of the data collection phase correspond to a task typology that clearly distinguishes observations, species identification, and data entries from data analysis [21] (e.g., statistical data analysis: [22]), although species identification could be seen as some kind of initial data analysis [31].” (p. 9) Reviewer #3: Comment 5: Thank you for conducting this work examining engagement patterns across projects. In general, I think the paper is very strong and is need of minor revisions. Response 5: We thank the Reviewer very much for the positive feedback. Comment 6: This statement is unclear, please explain: The standard deviation of days between pairs of sequential active days was the variation in periodicity (v). Response 6: To be clearer about the calculation of the variation in periodicity, we have rephrased the definition accordingly. “We also counted the number of days between each of a participant’s active days and then calculated the standard deviation of these days in between. Thus, we estimated how regularly a participant actively contributed to the project which was the variation in periodicity (v).” (p. 12) Comment 7: The sample sizes for the data analysis profiles are so small - can we really say anything about these profiles? Response 7: We can understand the Reviewer’s comment. We believe that the strength of our research approach is that we followed all participants of the Wildlife Researcher projects across the two phases of the project, that is, the data collection phase and the data analysis phase. While we agree that the number of the participants in the data analysis profiles is smaller than in the data collection profiles, we also think that it is eminent for our research to show how many participants stayed active after data collection and thus were included in the data analysis profiles. Still, to better highlight the limitations that come with smaller sample sizes, we have added a discussion of the sample sizes for the data analysis profiles to the Limitations section. “A second limitation concerns the sample size in our field studies. Especially, the sample sizes for the engagement profiles in the data analysis phase were smaller than the sample sizes of those profiles in the data collection phase. While this difference may limit the conclusion about participants’ engagement in the data analysis, the quantitative and qualitative changes between the profiles also highlight the differences in engagement between the data collection and the data analysis phase. In general, the number of participants was limited to the camera traps available in the CS project.” (p. 32) Comment 8: Unsure that this claim is warranted: "We suppose as a post-hoc explanation that ownership (i.e., an individuals’ feeling of possessing a concrete or abstract object or entity; [38]) for the data, that they themselves collected and contributed, might explain why highly active participants also engaged in data analysis." There may be many other factors at play besides ownership. It would be important for you to include alternate explanations (e.g., not interested/motivated to do analysis, feeling "uncredentialed," lack of educational support for analytical skills. Response 8: This is a valid point. To make up for it, we have added further considerations on alternate post-hoc explanations such as that those participants who were highly engaged were overall more motivated to participate in both phases. Furthermore, those participants who were highly engaged in the data collection possibly held a specific set of knowledge and skills that also helped them to engage in the data analysis more strongly. “We suppose as a post hoc explanation for why highly active participants also engaged in data analysis that those participants felt ownership (i.e., an individuals’ feeling of possessing a concrete or abstract object or entity; [45]) for the data, that they themselves collected and contributed. This could well be the case, as research on project ownership has indicated that high project ownership promotes deeper involvement in problem solving in the inquiry process [46]. Other possible explanations for why highly active participants in the data collection also engaged in data analysis could be that those participants were in general more motivated to participate in all phases [47] or even held a specific set of knowledge and skills that helped them to analyze the data as well [48].” (p. 29) Thank you very much again for your positive feedback. We hope that we have revised the manuscript in accordance with what you had in mind. The revisions have improved the manuscript and we hope you agree with us that it is ready for publication. All the best and kind regards, The authors Submitted filename: Response to the Editors and Reviewers_final.docx Click here for additional data file. 26 Sep 2022 I’m fine with collecting data: Engagement profiles differ depending on scientific activities in an online community of a citizen science project PONE-D-22-09613R1 Dear Dr. Bruckermann, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Andrea Fronzetti Colladon, Ph.D. Academic Editor PLOS ONE Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #2: All comments have been addressed Reviewer #3: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: I Don't Know Reviewer #2: I Don't Know Reviewer #3: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Thank you for addressing the comments provided in the first round of revisions. I think this is a strong manuscript and discuss citizen science in an interesting light. Reviewer #2: (No Response) Reviewer #3: Thank you for the revision. You have adequately addressed all of my comments and I have no further recommendations. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No ********** 28 Sep 2022 PONE-D-22-09613R1 I’m fine with collecting data: Engagement profiles differ depending on scientific activities in an online community of a citizen science project Dear Dr. Bruckermann: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Prof. Andrea Fronzetti Colladon Academic Editor PLOS ONE

10 in total

1. The use and reporting of cluster analysis in health psychology: a review.

Authors: Jane Clatworthy; Deanna Buick; Matthew Hankins; John Weinman; Robert Horne
Journal: Br J Health Psychol Date: 2005-09

2. A new dawn for citizen science.

Authors: Jonathan Silvertown
Journal: Trends Ecol Evol Date: 2009-07-06 Impact factor: 17.712

3. Crowd science user contribution patterns and their implications.

Authors: Henry Sauermann; Chiara Franzoni
Journal: Proc Natl Acad Sci U S A Date: 2015-01-05 Impact factor: 11.205

4. Citizen science. Next steps for citizen science.

Authors: Rick Bonney; Jennifer L Shirk; Tina B Phillips; Andrea Wiggins; Heidi L Ballard; Abraham J Miller-Rushing; Julia K Parrish
Journal: Science Date: 2014-03-28 Impact factor: 47.728

5. Science education. Convergence between science and environmental education.

Authors: Arjen E J Wals; Michael Brody; Justin Dillon; Robert B Stevenson
Journal: Science Date: 2014-05-09 Impact factor: 47.728

6. Crystal structure of a monomeric retroviral protease solved by protein folding game players.

Authors: Firas Khatib; Frank DiMaio; Seth Cooper; Maciej Kazmierczyk; Miroslaw Gilski; Szymon Krzywda; Helena Zabranska; Iva Pichova; James Thompson; Zoran Popović; Mariusz Jaskolski; David Baker
Journal: Nat Struct Mol Biol Date: 2011-09-18 Impact factor: 15.369

Review 10. Innovations in Camera Trapping Technology and Approaches: The Integration of Citizen Science and Artificial Intelligence.

Authors: Siân E Green; Jonathan P Rees; Philip A Stephens; Russell A Hill; Anthony J Giordano
Journal: Animals (Basel) Date: 2020-01-14 Impact factor: 2.752

10 in total