Research data management in health and biomedical citizen science: practices and prospects.

Ann Borda, Kathleen Gray, Yuqing Fu.

Abstract

BACKGROUND: Public engagement in health and biomedical research is being influenced by the paradigm of citizen science. However, conventional health and biomedical research relies on sophisticated research data management tools and methods. Considering these, what contribution can citizen science make in this field of research? How can it follow research protocols and produce reliable results?
OBJECTIVE: The aim of this article is to analyze research data management practices in existing biomedical citizen science studies, so as to provide insights for members of the public and of the research community considering this approach to research.
METHODS: A scoping review was conducted on this topic to determine data management characteristics of health and biomedical citizen science research. From this review and related web searching, we chose five online platforms and a specific research project associated with each, to understand their research data management approaches and enablers.
RESULTS: Health and biomedical citizen science platforms and projects are diverse in terms of types of work with data and data management activities that in themselves may have scientific merit. However, consistent approaches in the use of research data management models or practices seem lacking, or at least are not prevalent in the review.
CONCLUSIONS: There is potential for important data collection and analysis activities to be opaque or irreproducible in health and biomedical citizen science initiatives without the implementation of a research data management model that is transparent and accessible to team members and to external audiences. This situation might be improved with participatory development of standards that can be applied to diverse projects and platforms, across the research data life cycle.
© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association.

Keywords:  citizen science; crowdsourcing; participatory health; research data management; self-quantification

Year:  2019        PMID: 32607493      PMCID: PMC7309241          DOI: 10.1093/jamiaopen/ooz052

Source DB:  PubMed          Journal:  JAMIA Open        ISSN: 2574-2531


BACKGROUND

Citizen science

Public engagement in health and biomedical research is being influenced by the paradigm of “citizen science,” that is, active participation in research teams by members of the general public with no formal training in the field of research concerned. Different labels have been used to describe citizen science and related forms of public participation in scientific research, in which citizens may contribute actively to science with their intellectual effort, their specialized knowledge, or their tools and resources. The online Oxford English Dictionary defines citizen science in the context of data activity: The collection and analysis of data relating to the natural world by members of the general public, typically as part of a collaborative project with professional scientists. Citizen science dates back over a century in some fields of research, for example natural history, where the US Audubon Society’s Christmas Bird Count began in 1900. The potential of citizens and experts working together to improve the development of science and policy gained prominence in such publications as Alan Irwin’s “Citizen Science: A Study of People, Expertise and Sustainable Development.” Citizen science activity has dramatically increased in the 21st century, influenced by societal and technological changes and participatory democracy. Critically, it has enabled the large-scale collection and processing of scientific data and widespread dissemination of scientific knowledge and discoveries, notably in environmental sciences, ecology, and astronomy. This is further supported by Internet access and connected communication devices, which have been pivotal in making science more accessible to more people and which have, in turn, shaped motivational factors for participation. The concept and practice of citizen science have also become increasingly formalized through the establishment of citizen science organizations worldwide.
Citizen science projects can be differentiated according to the extent of responsibilities that citizens undertake for research activities, such as defining research questions, collecting and analyzing data, and interpreting and disseminating results. Other strategies classify citizen science projects according to their approach, such as investigative, online, or educational, or they can take further account of digital and distributed participation levels of “ownership.” Other new forms of citizen science participation are also arising, such as “serious games” developed by professional scientists in which participants are engaged in problem-solving.

Data quality and management

Data management practices and tools in citizen science projects can take many forms, but it is largely recognized that, like any scientific project, public participation in research requires an understanding and implementation of appropriate data management and stewardship guidelines. The FAIR Data Principles (findable, accessible, interoperable, and reusable; www.go-fair.org/fair-principles) are among the guiding standards acknowledged as good data practice. Ensuring data quality (for example, data of high accuracy, consistency, and completeness to serve an intended purpose) remains a pervasive issue in data management. The challenge is primarily due to the fact that a lay public is generally considered to be untrained in scientific data management or research integrity, or may be prone to systematic errors which can impact data quality in processing and analysis tasks, for instance. Data quality can also be highly context dependent (ie, “fitness for use”). The application of data quality methods in citizen science is most often dependent on the types of contributory work being carried out. These contributions can take the form of data collection (manual or automated) or data processing (eg, classification, coding, and annotation). For example, data quality in data collection may be ensured by providing training or close supervision where feasible in a fieldwork context, and by cross-checking for consistency with existing literature and/or with expert observations. One survey has examined mechanisms for data quality assurance and validation in the data collection process to inform a framework to apply to the design of citizen science activity, taking into account the potential for error at both participant and protocol levels.
A high level of quality assurance is often associated with the use of “crowdsourcing,” a type of citizen science approach in which multiple people carry out the same work or task, such as contributing to peer review or replication of an analysis (for example, image identification). Such an approach is desirable across the sciences for validation, accuracy, and reducing bias. Whereas data quality mechanisms in the data collection and processing activities of citizen science have been fairly well documented, the application of research data management practices to a whole project data lifecycle is much less in evidence. Published examples include the citizenscience.gov toolkit, which contains general guidelines on data management for US federal agencies engaged in citizen science, and the citizen science initiative DataONE (www.dataone.org) guidelines, which use a data lifecycle approach specific to that project’s goals and context. The largest data management study to date was conducted by the European Commission's Joint Research Centre in 2015. Of the 121 projects that responded, 84% represented areas of environmental research, and they ranged from local neighborhood to multinational levels, showing the potential for citizen science to generate different scales of data. Over 60% of projects were reported to have data management plans in place, but there is a notable gap in understanding what these plans comprise. Data re-use licensing was a particular concern across the respondents. Some suggest that open data and open standards could better ensure reuse of project results. As an overarching statement, it has been said that “the evidence that some citizen science programs produce high quality data of immediate use to science … does not translate into the conclusion that all citizen science programs can.” These are among the points which have led to the focus of the present paper.

Citizen science in health and biomedicine

Citizen science is advocated increasingly in health and biomedical research. The emerging connections between biotechnology and citizenship have added steadily to the bioscience research and policy discourse for two decades, for example, Rabinow’s coining of “biosociality” in 1996, Petryna’s account of biological citizens after Chernobyl, and Rose and Novas’ “biological citizenship.” Reflecting the rise of consumer-oriented Internet platforms and connected tools for personal health data management, the idea of the so-called “participatory biocitizen” was proposed in 2012 as a way to realize personalized medicine by sharing life-logging and self-quantification data through social media channels. Such advocacy has translated slowly into peer-reviewed research outputs, not least hampered by differing definitional boundaries of what constitutes health and biomedical citizen science in the literature, and for the purposes of policy-making. Benefits of citizen science in health and biomedical research are said to be reciprocal. Professional scientists may gain increased research capacity (eg, the Cochrane Crowd initiative) and more accurate findings from incorporating lay people’s local knowledge or patients’ expertise. For public participants, the benefits may be improved scientific literacy, more empowered communities, more engaged policy and decision-making, and more opportunity for knowledge co-production in some areas of biomedical science, such as heritability.

Research data management

Proponents of citizen science in health and biomedicine argue that “If millions of biocitizens were streaming data to the cloud, they would build the most powerful dataset for preventive and precision medicine the world has ever known.” However, there is a large leap from volumes of data to standards of evidence in health. The credibility of conventional health and biomedical research relies on rigorous research data management tools and methods, such as the Harvard Biomedical Data Management and Lifecycle Model (datamanagement.hms.harvard.edu), which comprises cyclical steps of data creation, analysis, distribution (eg, sharing), retention, storage, and secondary use. Indeed, the improvement of research data management is a major focus of biomedical informatics. This article therefore examines the requirements for data management, and the approaches taken to it, in health and biomedical citizen science, to identify key considerations for engaged and intending citizens and biomedical researchers, and to pave the way for informatics advances that can improve this kind of research.

METHODS

To explore the prevalence and extent of use of research data management in health and biomedical citizen science, we conducted two studies. First, a scoping review was undertaken encompassing literature reviews, conceptual articles, book chapters, and reports. A manual search conducted by the authors generated 75 papers. All items were read and relevant terms for a database search were extracted. This resulted in the following terms: citizen science, crowdsourcing, crowdfunding, participatory health, personalised medicine, and self-quantification. We further refined this result following a recent taxonomy of health research crowdsourcing behaviors, using the search strings: [participat* OR crowd* OR *citizen* OR self*] AND [scienc* OR *omic* OR *quantif*] AND [*medic* OR health* OR wellness OR “well-being” OR wellbeing]. Five online databases (PubMed, Scopus, ACM Digital Library, IEEE Xplore, and Web of Science) were searched. The search was restricted to English language publications from 2012, when the term “participatory biocitizen” appeared in the literature, to the end of 2018. By screening Title and Abstract fields, we excluded papers that addressed only nonhuman health research, such as veterinary and ecological sciences, and papers where there was no evidence of human health related outcomes. We further excluded papers addressing only low-level public participation as defined in general classifications of public participation in scientific studies, for example, where citizens had no other involvement in a research project beyond completing a questionnaire or being interviewed, or beyond having their pre-existing online social media data mined and analyzed by researchers. We additionally excluded papers focusing on microwork for monetary incentives. We retrieved the full text of all accessible papers and manually searched reference lists for additional works that our search may have missed. A total of 146 papers comprised our final result set.
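The Boolean search string quoted above can be read as three OR-groups joined by AND. As a rough illustration only, the following Python sketch applies that logic to a title or abstract. The wildcard-to-regex translation, the function names `term_to_regex` and `matches_search`, and the sample titles are assumptions made here for illustration; each bibliographic database implements truncation and phrase matching in its own way, so this does not reproduce the searches actually run.

```python
import re

# Term groups copied from the search strings quoted in the Methods text.
SEARCH_GROUPS = [
    ["participat*", "crowd*", "*citizen*", "self*"],
    ["scienc*", "*omic*", "*quantif*"],
    ["*medic*", "health*", "wellness", "well-being", "wellbeing"],
]


def term_to_regex(term):
    """Translate one wildcard term into a case-insensitive word regex.

    Hypothetical helper: '*' is treated as 'any run of word characters',
    which only approximates database truncation operators.
    """
    prefix = r"\w*" if term.startswith("*") else ""
    suffix = r"\w*" if term.endswith("*") else ""
    core = re.escape(term.strip("*"))
    return re.compile(r"\b" + prefix + core + suffix + r"\b", re.IGNORECASE)


COMPILED_GROUPS = [[term_to_regex(t) for t in group] for group in SEARCH_GROUPS]


def matches_search(text):
    """Return True if every AND-group has at least one OR-term in the text."""
    return all(any(rx.search(text) for rx in group) for group in COMPILED_GROUPS)


if __name__ == "__main__":
    # Illustrative titles, not taken from the review's result set.
    for title in [
        "Citizen science for public health surveillance",
        "Bird migration counts by volunteers",
    ]:
        print(title, "->", matches_search(title))
```

A broad wildcard such as *omic* also matches unrelated words (for example, "economic"), which is one reason the manual Title and Abstract screening described above remains necessary.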
References #40–#186 represent the literature review results, and are further tabled below. We analyzed the content of these papers to identify key characteristics of the research projects described in them that may shape their research data management practices. Second, through the scoping review and additional Internet searching, we identified five indicative online platforms for health and biomedical citizen science and we selected projects supported by each platform. Indicative platforms had specific mentions in the literature review papers and in associated internet searches of health and biomedical citizen science projects supported by these platforms. The online accessibility of the platforms and of their hosted projects in terms of web site information available to participants was another consideration. We used deductive thematic analysis of information from the web sites to describe the research data management facilities afforded by these platforms and projects. Finally, we synthesized the findings from these two studies to identify strengths, weaknesses, opportunities and threats of research data management in health and biomedical citizen science. Methods and findings from the platforms and projects review study are further discussed following the literature review.

RESULTS AND DISCUSSION

Literature review

The terms “citizen science” and related participatory health research were most commonly used in connection with public health research; crowdsourcing, with cancer and genomic research and/or rare disease identification; and self-quantification, with genomic and personalized medicine research. The most prevalent descriptors of health and biomedical citizen science were used to group the result sets as listed in Table 1. In some papers, several descriptors were in use to describe health and biomedical citizen science approaches.
Table 1. Descriptors of health and biomedical citizen science papers extracted from title/abstract

Descriptor(s) | Paper references* (repeating numbers represent papers with more than one descriptor)
Citizen science | 42, 43, 44, 46, 55, 62, 63, 64, 66, 70, 75, 77, 78, 80, 81–84, 85, 88, 90, 93, 94, 95, 98, 102, 103, 107, 108, 110, 113, 114, 117, 118, 119, 120, 121, 131, 136, 138, 143, 145, 146, 147, 148, 150, 154, 156, 158, 167, 168, 169, 184, 186
Crowdsourcing | 40, 41, 44, 52, 53, 54, 62, 68, 72, 77, 81, 86, 90, 96, 99, 100, 103, 111, 115, 116, 119, 122, 123, 124, 125, 130, 134, 135, 137, 140, 142, 144, 146, 149, 157, 158, 159, 160, 161, 164, 166, 168, 172, 173, 174, 177
Participatory health | 40, 41, 45, 47, 48, 51, 61, 69, 76, 78, 84, 85, 87, 89, 93, 104, 105, 106, 107, 109, 110, 112, 114, 119, 128, 132, 138, 139, 141, 145, 148, 155, 162, 165, 177, 181, 182, 183, 185
Self-quantification | 40, 41, 56, 57, 58, 59, 60, 65, 67, 71, 73, 74, 79, 82, 83, 91, 92, 97, 101, 119, 126, 127, 129, 133, 151, 152, 153, 161, 163, 170, 171, 175, 176, 177, 178, 179, 180

*Articles and reports are referenced by number and listed by this number with full citation in the References.

The availability and use of suitable MeSH terms was a limiting factor on our review. The MeSH term “Community-Based Participatory Research” (scoped as a “collaborative process of research involving researchers and community representatives”) was associated with only a few relevant papers. Community-based participatory research (CBPR) is often used for environmental health investigations and/or as an approach in which researchers work closely with the local community in developing and implementing research likely to be of concern to members of that community. In regard to conceptual or methodological characteristics described, papers showed no strong pattern. For instance, papers reported on research reporting protocols, human-computer interactions, platform designs, and social empowerment policies. Also reported were works in progress, simulated patient-led n = 1 trials, comparative effectiveness research, and feasibility studies. One conclusive clinical trial was reported, and several papers reported on citizen science approaches to clinical trial development. The open public and open knowledge philosophy of some citizen science initiatives means that their descriptions may be published outside the scientific literature. We found 16 literature review papers. Most of these review papers focused on crowdsourced health and biomedical research, including gamification, while others covered citizen science and participatory health, and self-quantification.
Platforms cited in the review papers include Zooniverse, PatientsLikeMe, and 23andMe, and initiatives such as Cancer Research UK’s Trailblazer, Mark2Cure, uBiome, DIYbio, and the serious game Foldit. None of the literature had a primary focus on research data management, although aspects of data management, such as data collection and processing, are described in examples further in the sections below.

Participatory roles

We found that the participatory roles of health and biomedical citizen scientists could be categorized in three main ways. The first is their number and spread, that is, whether public engagement in a project involves a sole person, a community in a localized setting, or a global effort. The literature we reviewed reflected at least 10 studies of each size, but about half of the papers gave no indication of the number of citizen scientists intended or involved. Participant characteristics were variably or minimally reported, and logistics of participation were not reported consistently. The second is their status in the project as producers, instigators, and/or mobilisers (eg, whether they are named as the chief investigators, whether this is a patient-driven investigation, or whether this is a school curriculum project). We found that most studies were instigated by professionally trained researchers; the roles of citizens and patients as co-producers were not fully described in most citizen science and crowdsourced research designs. Community-scaled participatory research projects more often provided information on respective roles and responsibilities across research activities. The need to ensure ownership and control over local knowledge is highly relevant in situations of CBPR, and may inform wider citizen science practices. The literature on self-quantifiers suggests these individuals have high levels of activation that may motivate them to independently mobilize citizen scientists and/or approaches. These approaches are typically outside of the instigation of organized health professionals or scientific organizations, as in the case of biohackers. The third is the functions citizens carry out in the project, such as crowdfunding of resources, donation of data from self-tracking, self-reporting in participatory health, disease surveillance, and crowdsourcing of data analysis, among others.
Crowdsourcing in particular was most commonly applied to the field of biomedical research and supporting analytical tasks, for example, image processing, sequence alignment, or molecular folding. Crowdsourced data processing could further involve both lay people and those knowledgeable in a discipline, particularly where complex tasks (eg, forms of annotation or relational tasks) or problem-solving are applied. Serious games and crowdsourced systematic literature review platforms were among these identifiable activities. Health and biomedical citizen science further targeted health conditions where participants undertook self-reporting using mobile apps and wearable sensors. This included widespread conditions, such as cancer and mental illness, and specific conditions, such as Crohn’s disease and colitis. Additionally, research aimed to investigate a wide range of health conditions which can be informed by personal genomic data that people may generate through direct-to-consumer testing services. Not all citizen science research used digital tools or online platforms (for example, Science café settings), but such technologies have widened the possibilities greatly.

Data management

Our analysis found three characteristics that were most likely to shape the research data management needs of such projects: research aims and objectives; participant roles and functions; and research platforms and tools used. Aims and objectives of public-health-oriented citizen science, for instance, included health service access; health literacy improvement; workplace health issues; environmental factors in human health; and identifying research priorities to optimize the design and delivery of patient-centered health services. As outlined elsewhere in the paper, aspects of data management were closely associated with types of participatory roles and functions. Data collection and data processing were pervasive activities across the literature. Other specialized tasks included surveillance, monitoring, and problem-solving. Among self-quantifying citizens, wearable devices were used commonly for personal data collection, for example, Fitbit® and other fitness trackers and sleep monitoring devices, while other apps were designed for manual logging of personal health data, such as mood state or dietary intake and weight management. In the present literature review, the citizen science platform Zooniverse (www.zooniverse.org) was the most cited generalizable platform supporting citizen science projects. Also prominent were platforms developed to support online communities of interest, for example, around personal genomics, the health social media platform PatientsLikeMe (www.patientslikeme.com), and the direct-to-consumer genomic service 23andMe (www.23andme.com). Table 2 provides a summary of literature review citations by platform and associated descriptors.
Table 2. Platforms cited in the literature review and associated descriptors

Platform | Literature review citations | Associated descriptor(s)
Zooniverse (www.zooniverse.org) | 42, 43, 46, 94, 95, 119, 148, 149, 156, 158, 168, 184, 186 | Citizen science: 42, 43, 46, 94, 95, 119, 148, 156, 168, 184, 186; Crowdsourcing: 119, 149, 158, 168
PatientsLikeMe (www.patientslikeme.com) | 40, 41, 42, 48, 52, 53, 56, 57, 58, 76, 94, 122, 133, 136, 140, 151, 158, 159, 160, 161, 175, 177 | Crowdsourcing: 40, 41, 52, 53, 122, 140, 158–159, 161, 177; Self-quantification: 56–58, 133, 151, 160, 161, 175, 177; Citizen science: 42, 94, 136; Participatory health: 48
23andMe (www.23andme.com) | 40, 41, 42, 43, 53, 57, 59, 90, 94, 119, 133, 140, 159, 160, 161, 165, 166, 175 | Crowdsourcing: 40, 41, 53, 119, 140, 159, 160, 161, 166; Self-quantification: 57, 59, 133, 175; Citizen science: 42, 43, 90, 94, 119
Scistarter (https://scistarter.com/) | 108, 148, 159 | Citizen science: 108, 148; Crowdsourcing: 159
Citscibio (https://citscibio.org/) | 123, 158 | Crowdsourcing: 123, 158

Other communities of practice were supported through wearable device manufacturers’ platforms and gaming platforms, to cite a few. Mainstream social media like Facebook, Twitter, and WhatsApp also were used for participant networking, data sharing, and task management. Crowdsourcing and citizen science activities were most associated with quantitatively measurable tasks, whereas participatory health research (especially CBPR) was primarily qualitative in design, and quantified self activity comprised mixed methods approaches to the objective biometric data and the subjective experience of the impact of these data. Data quality and validation aspects of data collection and analysis were a focus in a number of papers, including articles reporting on the effect of training on accuracy, how various characteristics of participants affected their accuracy, and the effect of aspects of project design on accuracy. The comparison of participants with a reference group or experts featured in a few studies. Data sharing of personal health and genomic data and associated ethical and legal challenges (eg, consent, re-use, and exploitation) were considered across a number of papers. These overarching challenges appear across a spectrum of projects and platforms, particularly those associated with self-reported data and specimen data contributions, such as 23andMe, PatientsLikeMe, and the American Gut Project. In the European Commission survey on data management, for example, only 10 projects (8%) out of those surveyed asked their participants to sign an explicit informed consent form.
Those studies based in Europe touched on the potential impact of digital regulations, such as the General Data Protection Regulation (GDPR), on aspects of consent. Equally, there were considerations of the complexities of data governance and ownership, and possible approaches to these issues, such as the establishment of a participant-driven data commons.

Platforms and projects

Method

At the conclusion of the scoping review, we selected five online platforms and projects that would allow us to consider the range of responsibilities and resources for research data management that are found in health and biomedical citizen science at present. The platforms were chosen based on their frequency as identified examples in the literature review results; the prevalence of health and biomedical projects supported through these platforms; and the use of public participatory approaches in the research process. Of the selected platforms, CitSciBio, SciStarter, and Zooniverse are not-for-profit and largely publicly funded. PatientsLikeMe and 23andMe are examples of for-profit crowdsourced patient platforms. We extracted information from the web pages of the platforms and projects (Supplementary Appendix 1 lists the source URLs), from related research institution websites, and from relevant publications, where available. This extraction task was undertaken in August 2018. We summarized and compared their research data management environments and approaches, following the Harvard Biomedical Data Management Lifecycle model comprising: (1) data creation; (2) data analysis; (3) data distribution; (4) evaluation for data retention; (5) organisation for long-term storage; and (6) archiving for research audit and for secondary use. The Harvard Biomedical Data Management Lifecycle model was selected for its use in health and biomedical research data management contexts, and applicability across different types of data activities that were captured in the literature review.
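The six lifecycle stages named above lend themselves to a simple checklist. The Python sketch below encodes them as an enumeration and reports which stages a project's public documentation does not cover. The names `LifecycleStage` and `undocumented_stages` and the example project are hypothetical; they are not part of the Harvard model or of the instruments used in this study.

```python
from enum import Enum


class LifecycleStage(Enum):
    """Lifecycle stages as listed in the text, in order."""
    DATA_CREATION = "Data creation"
    DATA_ANALYSIS = "Data analysis"
    DATA_DISTRIBUTION = "Data distribution"
    EVALUATION_FOR_RETENTION = "Evaluation for data retention"
    ORGANISATION_FOR_STORAGE = "Organisation for long-term storage"
    ARCHIVING = "Archiving for research audit and for secondary use"


def undocumented_stages(documented):
    """Return, in lifecycle order, the stages missing from `documented`."""
    return [stage for stage in LifecycleStage if stage not in documented]


if __name__ == "__main__":
    # Hypothetical project whose website describes only collection and analysis.
    example = {LifecycleStage.DATA_CREATION, LifecycleStage.DATA_ANALYSIS}
    for stage in undocumented_stages(example):
        print("not documented:", stage.value)
```

Recording coverage per stage in this way makes it straightforward to compare platforms side by side, which is the kind of comparison the summary in this section aims at.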

Results and discussion

Platforms in health and biomedical citizen science have been in operation for over a decade. For example, both PatientsLikeMe and 23andMe were launched in 2006, Zooniverse in 2007, SciStarter in 2014, and CitSciBio in 2016. Analysis of these platforms enabled us to illustrate the current span and reach of health and biomedical citizen science, as shown in Table 3. The authors felt it particularly critical to include commercial for-profit as well as not-for-profit platforms. Biomedical research in general spans both kinds of organizations as well as government-owned organizations, and not without controversies related to its orientation. PatientsLikeMe and 23andMe have attracted both popular and critical attention for their scientific activities, and we wanted to examine them through the lens of responsible research data management.
Table 3. Overview of selected health and biomedical citizen science platforms and associated projects

Platform | Project

23andMe

https://www.23andme.com/en-int/

This direct-to-consumer genetic testing company was founded in 2006. It now has more than 3 million customers and over 5 million genetic results stored in its databases. Users of this service buy a test kit, place their tissue sample in a collector, and send it back to the lab. Several weeks later, they receive their genetic results online through their 23andMe account portal. Customers have the option to receive health analysis results and/or ancestry results. 23andMe has partnerships with biomedical R&D institutions to study the relationships between DNA and some health conditions. Over 80% of its clients give explicit consent for 23andMe to use their data in this way. In 2012, 23andMe acquired the crowdsourced patient advocacy startup “CureTogether.” It now has ongoing projects in Parkinson’s disease, inflammatory bowel disease, lupus, fertility, and major depressive disorders [175].

Lupus

https://www.23andme.com/lupus/

23andMe has collaborated with the pharmaceutical company Pfizer and the Lupus Research Institute to study lupus and DNA. This study aimed to recruit 5000 eligible American lupus patients, and the goal was achieved in 17 months [71]. Participants could be either existing clients of 23andMe or other lupus patients, who would be provided with a free 23andMe genetic test. Participants also needed to complete regular questionnaires and submit their medical history and physician contact details to the research team.

CitSciBio

https://citscibio.org/

CitSciBio, the Biomedical Citizen Science Hub, is sponsored by the Division of Cancer Biology and the Division of Cancer Control and Population Sciences at the National Cancer Institute, at the National Institutes of Health, USA. Launched in 2016, it is an open platform for the general public to find and add relevant resources, projects, and events. According to the website, volunteer coordinators have started over 700 projects that have contributed nearly 1 million measurements for analysis to answer local, regional and/or global questions.

Citizen science programs provided by Citscibio are devolved from the platform itself; the platform provides only brief descriptions and web links.

CitSciBio also collaborates with the citizen science data management working group of the DataONE (Data Observation Networked Earth) program (https://www.dataone.org/) to facilitate data sharing and stewardship [33].

Mark2Cure

https://mark2cure.org/

This project was designed by Dr Ginger Tsueng at the Su Lab at The Scripps Research Institute in California. Anyone who can read English can help research, annotate, and extract crucial information from the biomedical literature of the online biomedical database PubMed. Currently, Mark2Cure studies focus on a rare disease condition, N-glycanase 1 deficiency. Volunteers do a tutorial to understand the interface and the tasks, then mark up the literature on how this disease relates to other conditions or treatments. Volunteers contributed over one million annotations by 2018.98

PatientsLikeMe

https://www.patientslikeme.com/

This online community for patients to exchange information is a private enterprise founded by lay people (Jamie Heywood, Ben Heywood, and Jeff Cole) in 2004. It launched its first online community in 2006, initially to enable amyotrophic lateral sclerosis (ALS) patients to pool information. The philosophy of PatientsLikeMe is that the more information is shared by patients, the more possible it is to conduct high-impact research. In 2011, the platform expanded to all patients and all health conditions; by 2018, over half a million patients had reported on thousands of conditions, treatments and symptoms.134

Amyotrophic lateral sclerosis (ALS)

https://www.patientslikeme.com/conditions/als

12 000 ALS patients are registered on this platform to share their experience. Some reported experimenting with lithium carbonate to slow the progression of ALS. Data from 596 patients in all (149 in the treatment group, and 447 in the control group) subsequently underwent two analyses by trained researchers: an intent-to-treat analysis of 149 patients who reported taking lithium for at least 2 months (but may have discontinued taking the drug or died within 12 months), and an analysis of the subset of 78 patients who stayed on lithium for a full 12 months or died within that period.183

Scistarter

https://scistarter.com/

This is a multidiscipline citizen science platform with over 2700 available citizen science projects and events begun in 2014. Scistarter has collaborations with the US National Science Foundation, Arizona State University's Center for Engagement and Training in Science and Society, NASA, Girl Scouts of America, and others. Volunteers cannot directly access and contribute on the platform but are directed to websites of each separate project. Scistarter lists projects that offer citizen science opportunities, educates the general public about citizen science, helps people to find projects relevant to their location, age and available devices, and records the contributions of citizen scientists.76

“Tell-us!”

https://scistarter.com/project/19871-Tell-us-about-injuries

This research project is supported by the Ludwig Boltzmann Gesellschaft (Society) of Austria. A crowdsourcing program invites patients to share their experiences with healthcare experts, to help the latter to set research priorities. The first “Tell-us!” study, focusing on mental health issues, launched in 2015 and finished in 2016. The second study, underway in 2018, brings together traumatologists and people who have experiences of injury.189

Zooniverse

https://www.zooniverse.org/

This was established and is hosted by a group of professional scientists from Oxford University (UK), the Adler Planetarium in Chicago (USA), and other academic institutions. The first project, launched in 2007, was Galaxy Zoo; volunteers are still working to distinguish images of distant galaxies. It claims to be the world’s largest people-powered research platform for volunteers and professionals to work together on scientific studies. In 2018, 88 projects from 11 disciplines were available on the platform. Researchers interested in using the platform are supported by the Zooniverse team in developing a pilot that is beta-tested for ease of use, task suitability, and functionality prior to full release to public citizen scientists.101

Bash the bug

https://www.zooniverse.org/projects/mrniaboc/bash-the-bug

Dr Philip W Fowler, in the Modernising Medical Microbiology Group at Oxford University, is the lead researcher. To date over 8000 volunteers have classified images of samples of tuberculosis (TB) surviving in different dosages of antibiotics. The target is to have every image classified by at least 15 volunteers. The project is part of a global project, the Comprehensive Predictive Resistance for Tuberculosis International Consortium (CRyPTIC), which aims to achieve better, faster and more targeted treatment of multidrug-resistant TB via genetic resistance prediction.

Overview of selected health and biomedical citizen science platforms and associated projects

The operating models of platforms vary (for instance, sourcing opportunities to collect and analyze data, pooling personal health data, providing access to health data from other sources, sharing data with peers, or on-selling data for third-party research and development), and this influences the way they support collaboration among members of the public and professional researchers. Some run formal research programs where they are data custodians and where people agree to donate their own data for relevant studies, while others have a less direct relationship to research projects and monitor citizens’ work with research data in less detail.
Almost all platforms state that they collect standard website user data, such as visitors’ site visit and navigation information, and some use cookies to record the time and duration of each visit. At this meta-level, the platforms are purposely built to support basic data management needs for web analytics by their host organization. Information about the research data management aspects of these platforms and their representative projects is summarized following Table 3, in terms of data creation, data analysis, data distribution, evaluation, long-term storage, and reuse. The web pages consulted for each platform and project are listed in Supplementary Appendix 1.

Data creation

Data may be generated by participants, in the form of personal biometric measurements that are self-reported or tested independently (eg, 23andMe), or they may be narrative observations and perceptions about their health (eg, Tell-us!), or they may be data about a family member or another person they care for (eg, in rare childhood conditions). Datasets may also be compiled by professional researchers for citizen scientists to act upon (as in Mark2Cure), or they may be openly available for a research team to work with (as with increasing access to governments’ aggregated health datasets). Research data creation in health and biomedical citizen science is diverse, and data quality management processes that would ensure rigor at this stage in the data lifecycle must vary accordingly. Although specific data formats have been proposed for some categories of data, data quality assurance is not clearly addressed on the websites; some quality assurance details are published externally, such as the biocuration aspects of Mark2Cure. Patient advocacy stories by donor participants were identifiable in several of the examples, such as 23andMe, PatientsLikeMe and ALS, and Mark2Cure, among others.

Data analysis

In citizen science projects where nonprofessionals undertake data analysis, this work may be done as a form of recreation (as in gamified projects), or in conjunction with health literacy education and training (eg, CitSciBio), or by expert patients or high-performing athletes (eg, within health self-quantification groups). A citizen scientist’s role may be limited to very basic annotation or contextualization of data for subsequent analysis by professionals (eg, Lupus), or they may use analytical tools that have been set up and provided for a project by specialists, or they may need to be familiar with and have access to their own tools, such as overlapping sensor web technologies. Research data analysis may use emergent and contingent designs (as, for instance, when a project is instigated by a desperate patient, eg, Mark2Cure) or may be designed to follow a strict protocol from the outset (eg, Bash the Bug). Accordingly, the need for individual citizen scientists to understand and use scientific research methods varies. Nevertheless, it is expected that a research endeavor in health and biomedical science fundamentally will be replicable and reproducible, and will stand up to critical appraisal in the professional science community. It is not apparent whether or how the reporting conditions that are standard for this in the health sciences are met in many of the examples we have reviewed.
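As an illustration of what a strict, reproducible analysis rule can look like when each item is classified by many volunteers, the following minimal Python sketch aggregates volunteer labels by simple majority and withholds a result until a minimum number of classifications has been reached. The threshold of 15 mirrors the target stated for Bash the Bug in Table 3; the majority-vote rule, the function name, and the sample labels are assumptions made for illustration and are not taken from any platform's documentation.

```python
from collections import Counter
from typing import Optional

# Target per image reported for Bash the Bug; reused here as an assumption.
MIN_CLASSIFICATIONS = 15


def consensus_label(labels: list[str], min_count: int = MIN_CLASSIFICATIONS) -> Optional[str]:
    """Return the majority label, or None if there are too few votes or a tie."""
    if len(labels) < min_count:
        return None  # not enough independent classifications yet
    ranked = Counter(labels).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top two labels: leave for expert review
    return ranked[0][0]


# Hypothetical volunteer labels for one image.
votes = ["growth"] * 11 + ["no growth"] * 4
print(consensus_label(votes))       # 15 votes with a clear majority
print(consensus_label(votes[:10]))  # only 10 votes, below the threshold
```

Writing the rule down in this form is one way to make the aggregation step auditable: the threshold and the tie-handling behavior are explicit and can be reported alongside the results.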

Data distribution and use

Some health and biomedical citizen science aspires to change the paradigm for distributing and sharing research data. A strong expression of this can be found in the US Precision Medicine Initiative, now known as All of Us: “The data sharing community […] entails an infrastructure for the assemblage of multiple types of biomedical data which will be managed by a Data and Research Centre. Access can be given to researchers, ranging from community colleges up to top healthcare research institutes and industries, but also for citizen scientists, who can propose studies using this information.[…].” Expressions of this paradigm are very modest or even negligible in most of the papers and platforms we reviewed, and provisions vary widely from one platform to another. CitSciBio uses Hubzero® (hubzero.org), an open source software platform originally developed for scientific researchers, for data storage and management. According to 23andMe, each consumer decides how their information is used and with whom it is shared; consumers can opt out of providing individual identifiable data for research at any time, and if they do so, their data will not be used for further research. Registered members of PatientsLikeMe own the data that they upload, and they can choose to keep their data on the platform; or delete their data while keeping their account open; or voluntarily close their account but agree to the data being hosted for 3 years; or delete their account and permanently delete all information about them.
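The consent and retention provisions described above can be expressed as an explicit data model. The following Python sketch is a hypothetical combination of the two: a revocable research opt-in of the kind 23andMe describes and the four retention choices PatientsLikeMe describes. The class names, field names, and the eligibility rule are assumptions for illustration; in particular, whether data hosted after an account is closed remains available for new research is not stated in the source and is assumed here.

```python
from dataclasses import dataclass
from enum import Enum


class RetentionChoice(Enum):
    KEEP_DATA = "keep data on the platform"
    DELETE_DATA_KEEP_ACCOUNT = "delete data, keep account open"
    CLOSE_ACCOUNT_HOST_3_YEARS = "close account, data hosted for 3 years"
    DELETE_ACCOUNT_AND_DATA = "delete account and all information"


@dataclass
class Member:
    member_id: str
    research_consent: bool  # opt-in status, revocable at any time
    retention: RetentionChoice


def eligible_for_new_research(members: list[Member]) -> list[str]:
    """IDs whose data may enter a new study: consent given and data still held."""
    removed = {
        RetentionChoice.DELETE_DATA_KEEP_ACCOUNT,
        RetentionChoice.DELETE_ACCOUNT_AND_DATA,
    }
    return [
        m.member_id
        for m in members
        if m.research_consent and m.retention not in removed
    ]


# Hypothetical members covering each case.
members = [
    Member("m1", True, RetentionChoice.KEEP_DATA),
    Member("m2", False, RetentionChoice.KEEP_DATA),  # opted out
    Member("m3", True, RetentionChoice.DELETE_ACCOUNT_AND_DATA),
    Member("m4", True, RetentionChoice.CLOSE_ACCOUNT_HOST_3_YEARS),  # assumed eligible
]
print(eligible_for_new_research(members))
```

Making the choices enumerable in this way is one route to the transparency discussed in this section, since the rule that decides which records enter a study can be published and inspected.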

Data retention and subsequent use

Health and biomedical citizen science platforms in general are opaque about the retention of data for audit or re-analysis. The direct-to-consumer personal genomic testing platform 23andMe potentially holds the most sensitive personal data of the case studies and is the most explicit about data retention and reuse, followed by PatientsLikeMe. Both platforms state that they support the sharing of personal health data for research. Scistarter provides less information about its data storage and data management infrastructure on its website, but some details are available in a recent paper. CitSciBio and Zooniverse data management protocols are not stated on specific pages in relation to their respective platforms, and this may be due to the project specificity of data protocols in each case. Zooniverse has, for example, published a set of success criteria for projects, which includes data-related processes. Some projects involve research data of a different nature than personal health data (eg, Mark2Cure).

In 23andMe, personal and registration information is stored separately from any genetic information to reduce the likelihood that a consumer could be identified; if a consumer opts in to the 23andMe research program and completes a survey, their genetic information is de-identified and stored with their survey response data in a separate research environment. Zooniverse, too, makes clear that it maintains separate data storage of personal information, users’ contributions, and project data on various servers. 23andMe uses software, hardware and physical security measures to protect the computers where consumer data are stored, and robust authentication methods for access to its systems. 23andMe’s privacy measures include avoiding data re-identifiability, limiting personal data accessibility, and detecting threats and vulnerabilities in real time. PatientsLikeMe servers are located in the United States and a range of data security measures is described; however, no further information is provided about methods of data storage.

23andMe states that it does not sell, lease or rent individual-level information to any third party without explicit consent, but that it does share de-identified aggregated data with third parties. It is noted in this respect that there is controversy over the nontransparent way in which 23andMe, for example, has applied for patents based on participant-supplied genomic and phenotypic data. PatientsLikeMe is the only platform that explains how it profits commercially from data supplied by the community. “What happens to research data in citizen science projects over the long term?” is a question very few people seem to ask, and few would be able to answer, based on the information that other platforms provide to the public. None of the five platforms or associated projects fully explains or accounts for the entire research data lifecycle in a way suggestive of data management practices that might more confidently assure the quality of the research across the data lifecycle. This gap in available information further suggests that not all data processes are presently transparent or open to the public or potential participants.
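The separation of registration data from research data described in this section can be sketched in a few lines of Python. This is a minimal illustration of the general pattern (identifying data in one store, de-identified research records in another, joined only through a random identifier), not a description of how 23andMe or Zooniverse implement it. The class, method, and field names and the sample values are hypothetical.

```python
import secrets


class SeparatedStore:
    """Keeps identifying data and research data in separate mappings."""

    def __init__(self) -> None:
        self._identity: dict[str, dict] = {}  # account ID -> name, email
        self._link: dict[str, str] = {}       # account ID -> random research ID
        self._research: dict[str, dict] = {}  # research ID -> de-identified record

    def register(self, account_id: str, name: str, email: str) -> None:
        self._identity[account_id] = {"name": name, "email": email}

    def opt_in(self, account_id: str, genotype: str, survey: dict) -> str:
        """Copy genetic and survey data into the research store under a random ID."""
        if account_id not in self._identity:
            raise KeyError(f"unknown account: {account_id}")
        research_id = self._link.setdefault(account_id, secrets.token_hex(8))
        self._research[research_id] = {"genotype": genotype, "survey": dict(survey)}
        return research_id

    def research_view(self) -> list[dict]:
        """Records as an analyst would see them: no names, emails, or account IDs."""
        return [dict(record) for record in self._research.values()]


# Hypothetical usage with made-up values.
store = SeparatedStore()
store.register("acct-1", "Jane Doe", "jane@example.org")
research_id = store.opt_in("acct-1", "rs123:AG", {"lupus_diagnosis": True})
print(store.research_view())
```

In this sketch the link table is the only place where the two stores meet, so restricting access to it is what limits re-identification through the platform's own records. Removing direct identifiers does not by itself rule out re-identification from the genetic data.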

LIMITATIONS

First, this scoping review was not intended to provide a comprehensive or systematic review of the topic, but rather a representative one. As such, our review of the literature was not designed to generate effect sizes or aggregate metrics. Nevertheless, given the context of identifying and exploring research data management practices across the breadth of citizen science, participatory health, crowdsourcing, and self-quantification, the review methodology that we followed was deemed an appropriate lens. This has inevitably taken into account that there are differing definitional boundaries of what constitutes health and biomedical citizen science in the literature. As stated in Fiske et al, it is challenging to develop a meaningful typology of citizen science initiatives in biomedicine. Our analysis of data management processes of the selected platforms was also constrained by limited access to mechanisms by which we could request more information than was publicly available on platform web sites. We further recognize that our analysis of research data management practices is only one avenue to provide insights into the opportunities and challenges of citizen science, particularly in relation to the for-profit platforms, PatientsLikeMe and 23andMe, both of which are undergoing increasing scrutiny by bioethicists, for example. Other limitations can be directly associated with the challenges of undertaking such a multidimensional topic. The authors acknowledge that there is much on this topic which requires a fuller investigation, not least what the implications are for each of the separate and diverse approaches of citizen science, participatory health, crowdsourcing, and self-quantification. The educative aspects of data management practices, in relation to the instigators of a health and biomedical citizen science project and the participants themselves, are a further consideration which was not scoped in the present study.

CONCLUSION

Health and biomedical citizen science platforms reflect diverse approaches to data management among both participants and researchers, as well as specific values associated with types of data activities, such as data sharing and re-use beyond the lifetime of a project. The lack of standard definitional boundaries of what constitutes health and biomedical citizen science contributes to this diversity of approach. These overall relationships are critical to understand in terms of what specific initiatives seek to achieve, whom they benefit, and what typologies or frameworks might be specifically aligned to participant roles and responsibilities in the data management lifecycle processes of health and biomedical citizen science. This must include the capacity to ensure that data are “fit for purpose” and that appropriate levels of data governance are supported. Future work requires further considered approaches to a comparative research data management focus, which could be additionally supported through a public interest consensus about minimum standards that can be applied throughout the research data lifecycle. Without such standards, many important data collection, analysis, and subsequent reuse activities may be opaque, conflicted, or untraceable. On a practitioner level, the situation could be more widely improved with a concerted program of participatory development, producing clearer standards and reusable resources for health and biomedical citizen science research data management. Such a project might take a contributory, collaborative or co-created citizen science approach. Without a directed initiative in this context, health and biomedical citizen science is unlikely to derive optimal shared value, benefits, or sustained use of the enabling platforms, resources, and collective knowledge of its stakeholders.

AUTHOR CONTRIBUTIONS

AB and KG made substantial contributions to the study design, conception and drafting of the manuscript. YF implemented the methods and carried out the qualitative analysis. AB made critical revisions for final approval of the version to be published.

CONFLICT OF INTEREST STATEMENT

None declared.