Ethical challenges of big data in public health.

Effy Vayena, Marcel Salathé, Lawrence C Madoff, John S Brownstein

Year: 2015 PMID: 25664461 PMCID: PMC4321985 DOI: 10.1371/journal.pcbi.1003904

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

Introduction

Digital epidemiology, also referred to as digital disease detection (DDD), is motivated by the same objectives as traditional epidemiology. However, DDD focuses on electronic data sources that emerged with the advent of information technology [1-3]. It draws on developments such as the widespread availability of Internet access, the explosive growth in mobile devices, and online sharing platforms, which constantly generate vast amounts of data containing health-related information, even though these data are not always collected with public health as an objective. Furthermore, this novel approach builds on the idea that information relevant to public health is now increasingly generated directly by the population through their use of online services, without their necessarily having engaged with the health care system [4, 5]. By utilizing global real-time data, DDD promises accelerated disease outbreak detection, and examples of this enhanced timeliness in detection have already been reported in the literature. The most recent example is the 2014 Ebola virus outbreak in West Africa [6], where reports of the emerging outbreak were detected by digital surveillance channels in advance of official reports. Furthermore, information gleaned from the various datasets can be used for several epidemiological purposes beyond early detection of disease outbreaks [7, 8], such as the assessment of health behavior and attitudes [4] and pharmacovigilance [9]. This is a nascent field that is developing rapidly [10]. While changes in the ways in which epidemiologic information is obtained, analyzed, and disseminated are likely to result in great social benefits, it is important to recognize and anticipate potential risks and unintended consequences. In this article we identify some of the key ethical challenges associated with DDD activities and outline a framework for addressing them.
We argue that it is important to engage with these questions while the field is at an early stage of evolution in order to make ethical awareness integral to its development.

The Context in Which DDD Operates

DDD operates at the intersection of personal information, public health, and information technologies, and increasingly within the so-called big data environment. Big data lacks a widely accepted definition. The term has, nevertheless, acquired substantial rhetorical power. We use it here in the sense of very large, complex, and versatile sets of data that are constantly evolving in terms of format and velocity [11]. This dynamic environment generates various ethical challenges that relate not only to the value of health for individuals and societies, but also to individual rights and other moral requirements. In order to spell out these challenges and possible ways of meeting them, it is necessary to take into account the distinctive nature of DDD and the broader context in which it operates. Generally, these distinct features are linked to the methods by which data are generated, the purposes for which they are collected and stored, the kind of information that is inferred by their analysis, and eventually how that information is translated into practice [12]. More specifically, some of these relevant features include those outlined below—namely, the steady growth of digital data, the multifaceted character of big data, and ethical oversight and governance.

The steady growth of digital data

The amount of data that is generated from activities facilitated by the Internet and mobile technologies is unprecedented. The global number of mobile-cellular subscriptions is close to the world’s population figures, with a total penetration rate of 96%. The mobile-cellular penetration rate in developing countries is 89%, and about 40% of the world’s population is connected to the Internet [13]. Of the world’s online population, 82% uses social media and networks [14]. More than 40,000 health apps are available, and a new higher-level Internet domain name, “health”, is about to be released [15, 16]. Not surprisingly, personal data have recently been described as a new asset class with the potential to, among other things, transform health care and global public health [17].

The multifaceted character of big data

Big data cannot be readily grouped into clearly demarcated functional categories. Depending on how it is queried and combined with other datasets, a given dataset can traverse categories in unpredictable ways. For example, health data can now be extracted from our purchases of everyday goods, our social media exchanges, and our web searches. New data analytics constantly change the kinds of outcomes that become possible. They go beyond early identification of outbreaks and disease patterns to include predictions of an outbreak’s trajectory or likelihood of recurrence [18, 19]. These new possibilities make good data governance, which ensures the ethical use of such data, all the more complex.

Ethical oversight and governance

Public health surveillance and public health research are governed by national and international legislation and guidelines. However, many of these norms were developed in response to very different historical conditions, including technologies that have now been superseded [20]. Such mechanisms may not be appropriate or effective in addressing the new ethical challenges posed by DDD, nor the questions that will be raised if DDD is effectively integrated into standard public health systems. Health research utilizing social media data and other online datasets has already exerted pressure on existing research governance procedures [21].

Ethical Challenges

Against this background we have identified three clusters of ethical challenges facing DDD that require consideration (Table 1).

Table 1. Mapping the ethical issues in digital disease detection.

Categories	Ethical Challenges	Specific Examples	Values
Context sensitivity	Differentiating between commercial versus public health uses of data	Is identification permitted? Is consent required for DDD uses? If so, has consent been obtained? Can it be revoked?	Privacy and contextual integrity
	User agreements, terms of service, participatory epidemiology	Are users protected in all contexts irrespective of privacy laws that differ according to jurisdiction?	Transparency
	Global health issues	Are privately collected data open to global public health uses?	Global justice
Nexus of ethics and methodology	Robust methodology: algorithm validation, algorithm recalibration, noise filtering, and feedback mechanisms	False identification of outbreaks and inaccurate predictions of outbreak trajectory	Risk of harm
		Pressure to mobilize public health resources in light of rapidly spreading unvalidated predictions	Fair use of resources
	Data provenance	Awareness about public health uses of personal data (in aggregated form)	Trust, transparency, accountability
Legitimacy requirements	Best practice standards	Is there a shared code of practice amongst all those working on DDD?	Trustworthiness
	Monitoring bodies (policies for ongoing monitoring and action plans for correction of false results)	Is there a mechanism for quick response to inaccuracies about outbreaks?	Trust, transparency, accountability
	Paced integration of DDD to standard surveillance systems	Are there mechanisms for redressing harms caused by DDD activities?	Justice
	Communication to the public (prevent hype)	Management of expectations	Common good

A. Context sensitivity

At the crux of the debate on the ethics of big data lies a familiar, but formidably complex, question: how can big data be utilized for the common good whilst respecting individual rights and liberties, such as the right to privacy? What are the acceptable trade-offs between individual rights and the common good, and how do we determine the thresholds for such trade-offs? These ethical concerns and the tensions between them are not new to public health research and practice, but now they must be addressed in a new context, with the result that appropriate standards may vary according to the type of big data activity in question. It is clear that the context of DDD differs in significant ways from other types of big data activity concerned with health. DDD has a public health function, aiming ultimately to improve health at the population level. Public health is a common good from which all individuals benefit and one that is essential to human development and prosperity. There is a clear contrast here with forms of corporate activity that may use the exact same data (e.g., social networking data), but for other purposes, such as advertising. The former aims at fostering a public good (health); the latter at generating a corporate profit. Such differences have important ethical implications. A context-sensitive understanding of ethical obligations may reveal that some data uses that may not be acceptable within corporate activity (e.g., user profiling and data sharing with third parties) may be permissible for public health purposes. Furthermore, societal obligations to foster the common good of public health may generate duties on corporate data collectors to make data available for use in DDD. Pursuing this line of thought, it is arguable that privacy considerations that apply in standard public health practice will have to be creatively extended and adapted to the case of DDD.
This will result in new standards that relate to data from a diverse range of sources, e.g., self-tracking, citizen scientists, social networks, volunteers, or other participatory contexts [22, 23]. Such new standards are urgently needed, especially as greater convergence of datasets becomes possible. An illustration of global activity on this front is the United Nations Global Pulse project [24]. This project explores the concept of data philanthropy whereby public–private partnerships are formed to share data for the public good. Such so-called data commons, operating on the basis of clear rules about privacy and codes of conduct, can profoundly affect disease surveillance and public health research more generally. Another dimension of context relates to global justice. Historically, new health tools have been predominantly used to improve the health of inhabitants of the better-off parts of the world. DDD projects that access global data are often less costly than traditional public health approaches. They could thus offer a potential breakthrough in early disease detection that would benefit communities throughout the world [25, 26]. However, this potential brings moral obligations in its train. These obligations require not only efforts to detect diseases in poorer parts of the world but also measures to ensure that the way data are collected and processed respects the rights and interests of people from these diverse regions and communities. This raises difficult questions of cultural relativity, such as whether standards of privacy can take different forms in relation to different cultures or whether some minimal core of uniform standards is also justified.

B. Nexus of ethics and methodology

Robust scientific methodology involves the validation of algorithms, an understanding of confounding, filtering systems for noisy data, managing biases, the selection of appropriate data streams, and so on. Some have expressed skepticism about the role that DDD can play in public health practice given its early state of development [27]. In 2013, when Google Flu Trends overestimated flu prevalence levels in the US, further concerns were raised about the sensitivity of this methodology to the digital environments created by users’ behavior—for example, different uses of search terms [28] from those used to develop the initial algorithm or the distorting influence of searches arising from media coverage of the flu [29, 30]. Methodological robustness is an ethical, not just a scientific, requirement. This is not only because limited resources are wasted on producing defective results or because trust in scientific findings is undermined by misleading or inaccurate findings. There is a further risk of harm to individuals, businesses, or communities if they are falsely identified as affected by an infectious disease. The harm can take many forms, including financial losses, such as a tourist region being falsely identified as the location of a disease outbreak; stigmatization of particular communities, which may adversely affect individual members; and even the infringement of individual freedoms, such as the freedom of movement of an individual falsely identified as a carrier of a particular disease. The issue of data provenance comes within the remit of ethically sound methodology. Currently published DDD studies and other initiatives have mostly used data that are in the public domain (e.g., Twitter) or that have been contributed by individuals with their explicit consent for use in disease surveillance (flunearyou.org). 
While in principle data in the public domain are open to being used for public health purposes, what constitutes public domain on the Internet is the subject of lively debate [31]. Especially in the context of data derived from social network interactions, it remains unclear whether users understand in what ways their data can be used and who may access them [32]. Any DDD project will inevitably have to navigate this uncertain environment and so must exercise diligence about data provenance and exhibit transparency about its uses.

C. Bootstrapping legitimacy

Legitimacy concerns the extent to which DDD is actually ethically justified in imposing the compliance burdens that it does and also the extent to which it is perceived to be ethically justified. In recent years the concept of “global health security” has been mobilized by international organizations, nongovernmental organizations, and national governments to strengthen the legitimacy of systems of disease surveillance both nationally and globally. The idea of human security has been expanded to include health (protection from infectious diseases and other health hazards), augmenting state responsibilities to provide appropriate safeguards. The revised International Health Regulations [33], which set out a global legal framework for disease detection and response, are premised on the understanding that in our globalized world diseases spread rapidly and therefore on the need for the timely notification of any public health threat of potentially international significance. They also recognize the importance of information gathering from various sources, including unofficial or informal ones, whilst also requiring that the validity of such information be verified [34]. This creates a legitimate space for DDD activities because they are precisely responses to both the need for accelerated detection and the global nature of the spread of disease. However, even if ethical arguments already justify the DDD enterprise, they only serve as a starting point. DDD will have to build its own legitimacy over time as an integral part of its approach. This means that the issues under categories A and B have to be constantly engaged with through processes that bootstrap DDD’s legitimacy, so that it is continuously self-generating and enhanced over time. So, for example, it is not enough simply to appeal to the great contribution that DDD stands to make to the common good of public health.
It is important that this contribution is made in certain ways rather than others, through transparent procedures that are worthy of engendering trust among those individuals whose data are used in DDD. Current regulatory and ethical oversight mechanisms are ill-equipped to address the entire spectrum of DDD-type activities. The distinction between public health and public health research has long been considered a problematic one, and this is even more evident in the DDD context. Consider an analogy with participant-led biomedical research, a growing movement of people collecting data about themselves and conducting various forms of research in large groups. Either such activities fall through the cracks of the existing oversight mechanisms or else, if they do not, those mechanisms impose inappropriate burdens upon them [35, 36]. Participatory approaches to disease surveillance confront similar challenges. Individuals report on disease symptoms on online platforms (e.g., flunearyou.org), which enables them to contribute to the common good of disease surveillance and often to receive feedback about disease prevalence in their area [37]. This active participation potentially empowers individuals and democratizes the process of scientific discovery. However, data (personally identifiable information, geolocation, etc.) that are collected for DDD purposes need to be governed in ways that minimize the risk of harm to participants. For example, if individuals take personal risks in order to report events of public health importance (e.g., a farmer who reports avian flu at the risk of losing his flock), those risks should be mitigated by appropriate policies (e.g., compensation) that acknowledge the societal contribution and the local/personal costs.
For the purposes of ensuring its legitimacy, DDD must develop internal mechanisms such as its own best-practice standards, including monitoring boards with the concrete mandate to ensure that risks and costs to individuals and communities are proportional to benefits. Such boards should also be empowered to negotiate compensation schemes for harms that have been suffered. As in standard public health practice, individuals may be adversely affected by a practice that aims to secure the health of the population. However, this laudable goal does not remove the obligation to respect individual rights and dignity in its pursuit. Neither of these standards is to be equated with an automatic insistence on individual consent. Instead, they consist of distinct individual entitlements, of the sort set out in the Universal Declaration of Human Rights, and the inherent value in all human beings, which underlies them.

Conclusions

The emergence of DDD promises tangible global public health benefits, but these are accompanied by significant ethical challenges. While some of the challenges are inherent to public health practice and are only accentuated by the use of digital tools, others are specific to this approach and largely unprecedented. They span a wide spectrum, ranging from risks to individual rights, such as privacy and concerns about autonomy, to individuals’ obligations to contribute to the common good and the demands of transparency and trust. We have grouped these concerns under the headings of context sensitivity, nexus of ethics and methodology, and bootstrapping legitimacy. It is vital that engagement with these challenges comes to be seen as part of the development of DDD itself, not as some extrinsic constraint. We intend this paper to be a contribution to the development of a more comprehensive and concrete ethical framework for DDD, one that will enable DDD to find an ethical pathway to realizing its great potential for public health.
