
Where is the human in the data? A guide to ethical data use.

Angela Ballantyne

Abstract

Being asked to write about the ethics of big data is a bit like being asked to write about the ethics of life. Big data is now integral to so many aspects of our daily lives: communication, social interaction, medicine, access to government services, shopping, and navigation. Given this diversity, there is no one-size-fits-all framework for how to ethically manage your data. With that in mind, I present seven ethical values for responsible data use.


Year: 2018    PMID: 29961892    PMCID: PMC6047448    DOI: 10.1093/gigascience/giy076

Source DB: PubMed    Journal: GigaScience    ISSN: 2047-217X    Impact factor: 6.524


Introduction

Data is ubiquitous because it is so useful. This means that many different parties—data subjects and sources, associated communities, researchers, governments, and businesses—will have competing interests in relation to the data. Just as we make trade-offs in our daily life (to walk or to drive to work? doughnut vs. salad for lunch?), we need to make trade-offs about competing interests in relation to data. I am talking about interests, rather than rights. Note that many parties who don't have legal rights to control access to and use of data may nonetheless have compelling interests in the data. Responsible data use requires attention to these broad interests. Facebook's recent troubles highlight this. Even if Facebook was legally entitled to share users' data with Cambridge Analytica, Facebook massively underestimated users' interests and expectations in relation to privacy, control, and appropriate use. In areas of rapid progress, such as data science, practice can quickly outstrip the legal framework. Data use may be within the parameters of the law (e.g., data protection or privacy regulation) but may nonetheless be unethical and/or outside the social licence. We should be aiming to align the social licence, ethics, and the law to ensure that data use is publicly acceptable, normatively justified, and legal. Where there is misalignment of the law, ethics, and the social licence, data users need to tread carefully.

Ethical deliberation

Following is a list of ethical values, also depicted in Fig. 1, that can help identify who has an interest in the data and where these interests might clash; help data holders to articulate the ethical trade-offs that need to be made; and guide deliberation about responsible data use. The values often clash—maximizing data security will conflict with maximizing social value through broader data use. Under different circumstances, priority will appropriately be given to different values. This process is about making informed, explicit, and justifiable trade-offs, rather than following a set of prescribed rules.
Figure 1: An infographic summarizing the ethical values.

Social value

Data is in demand because it has value. Data can contribute to knowledge and innovation, drive efficiency, reduce harm from ineffective or poorly targeted services, and reduce costs. Open data is important to drive the advancement of scientific knowledge, preserve datasets, test and verify conclusions, refine algorithms, and safeguard against misconduct.

Harm minimization

Data collection, storage, and use should be designed to minimize and manage risks of harm. Harms can be physical, economic, psychological, or reputational and can be experienced by individuals, communities, or organizations. Anonymization (pseudonymization and de-identification) has been the cornerstone of protecting individual data subjects from harm. However, anonymization is failing in the era of big data, where there are hundreds of thousands of data points for a single individual [1]. Data scientists have proven repeatedly that they can re-identify individuals in supposedly anonymous datasets [2]. Furthermore, anonymization and de-identification do little to protect communities from harm. Data analytics and artificial intelligence (AI) are increasingly used to characterize the behavior of communities and inform the delivery of services. Data can be used to stigmatize or discriminate.
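The re-identification risk described above can be sketched as a simple linkage attack: a "de-identified" dataset is joined against a public one on shared quasi-identifiers. The records, names, and field choices below are entirely hypothetical, invented for illustration; real attacks, such as the credit-card metadata study cited in [2], exploit far richer data.

```python
# Toy linkage attack (hypothetical data): direct identifiers were removed
# from the health records, but quasi-identifiers (ZIP, birth year, sex)
# remain and also appear in a public dataset such as a voter roll.

health_records = [
    {"zip": "10001", "birth_year": 1985, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "10002", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
]

voter_roll = [
    {"name": "Alice", "zip": "10001", "birth_year": 1985, "sex": "F"},
    {"name": "Bob",   "zip": "10002", "birth_year": 1990, "sex": "M"},
]

def reidentify(records, public):
    """Join the two datasets on quasi-identifiers to recover identities."""
    key = lambda r: (r["zip"], r["birth_year"], r["sex"])
    index = {key(p): p["name"] for p in public}
    return {index[key(r)]: r["diagnosis"] for r in records if key(r) in index}

print(reidentify(health_records, voter_roll))
# → {'Alice': 'diabetes', 'Bob': 'asthma'}
```

The point of the sketch is that removing names is not anonymization: any combination of attributes that is unique to an individual, and also present in another dataset, acts as an identifier.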

Control

Control refers to the capacity for data subjects to be autonomous and self-determining. Were data subjects asked for their consent at the point of data collection? To what degree will data subjects’ preferences determine how the data is used? Is this a secondary use of the data that differs from the original consent? Is the data use novel and original or is it likely to be consistent with the expectations of data subjects? Various models of consent have been proposed for data, including broad consent, dynamic consent [3], and meta-consent [4]. However, much data use (especially linking and secondary uses) occurs without consent. In these cases, data users need to be safe stewards of the data. Transparency, engagement, and accountability are especially important for data used without consent [5].

Justice

Justice concerns the equitable treatment of those with an interest in the data activities, including the fair distribution of any benefits and burdens arising from the collection, storage, use, linkage, and sharing of data. The term “benefit sharing” was first used in relation to non-human genetic resources in the Convention on Biological Diversity adopted at the Earth Summit in Rio de Janeiro, Brazil, in 1992. Benefit sharing requires that the advantages/profits derived from the data are shared fairly among the data providers and the community from which the data originates. Recent data advocacy, especially in relation to indigenous data, has moved away from “benefit sharing” toward “power sharing,” arguing that data subjects and communities should have decision-making capacity in relation to data governance and use [6].

Trustworthiness

Trustworthiness is the property of being worthy of trust. It can apply to individuals, organizations, and institutions but also relates to data quality, systems of knowledge production, scientific integrity, and professional standards [7]. When judging trustworthiness, we look for truthfulness, reliability, and consistency but also goodwill. A robust data ecosystem requires a high level of trust. A breach of trust can affect not only the agents involved but an entire profession or institution. The dispute between Arizona State University and members of the Havasupai Indian tribe over the use of genetic samples for research left a legacy of mistrust and fear of exploitation [8]. As Tuhiwai Smith famously argued, "'Research' is probably one of the dirtiest words in the indigenous world's vocabulary" [9]. Trust, when lost, can take significant efforts to rebuild [10].

Transparency

Transparency is openness and accessibility in decision making and actions. When the data activity occurs without the data subjects' consent and is justified on the grounds of "social value," the arguments in favor of transparency and openness are especially compelling. Transparency helps to demonstrate respect for data subjects and trustworthiness, and it underpins public engagement and accountability. Full transparency would include a public description of the data activity, purpose and justification, anticipated social value, harm-mitigation strategies, public engagement strategies, level of security and encryption, research results, and the coding/algorithms. When launching a €1.5 billion initiative in AI in April 2018, France's President Macron announced that anyone receiving AI funding from the government will be required to make their algorithms open and transparent.

Accountability

Accountability refers to holding data users and custodians responsible for the consequences of their decisions and actions. Data regulation is increasingly focused on accountability. A significant innovation in the EU General Data Protection Regulation (GDPR) (which came into force in May 2018) is the introduction of “accountability” (Article 5(2)) to the list of principles relating to personal data. Under the GDPR, organizations will need to be more intentional about their data collection and use and maintain open lines of communication with data subjects.

Conclusion

Given these competing values, there will be multiple different "ethical" solutions to data management. The task is to identify the ethical issues, reason through how to balance conflicting demands, articulate the trade-offs, and justify the conclusions. Do this as publicly and transparently as possible, and make time to revise and re-assess. We use data to tell stories, to make sense of the world. This means telling stories about people and how they live. Data has the appealing veneer of scientific objectivity, but the process of telling stories is never ethically neutral. Our starting point should be to ask: Where is the human in the data? What would this data use look like from the data subjects' perspective?

Abbreviations

AI: artificial intelligence; GDPR: General Data Protection Regulation.

Competing interests

The author declares that she has no competing interests.
References

1.  The Havasupai Indian tribe case: lessons for research involving stored biologic samples.

Authors:  Michelle M Mello; Leslie E Wolf
Journal:  N Engl J Med       Date:  2010-06-09       Impact factor: 91.245

2.  Identity and privacy. Unique in the shopping mall: on the reidentifiability of credit card metadata.

Authors:  Yves-Alexandre de Montjoye; Laura Radaelli; Vivek Kumar Singh; Alex Sandy Pentland
Journal:  Science       Date:  2015-01-30       Impact factor: 47.728

3.  Consent and the ethical duty to participate in health data research.

Authors:  Angela Ballantyne; G Owen Schaefer
Journal:  J Med Ethics       Date:  2018-01-22       Impact factor: 2.903

4.  The social licence for research: why care.data ran into trouble.

Authors:  Pam Carter; Graeme T Laurie; Mary Dixon-Woods
Journal:  J Med Ethics       Date:  2015-01-23       Impact factor: 2.903

5.  Moving from trust to trustworthiness: Experiences of public engagement in the Scottish Health Informatics Programme.

Authors:  Mhairi Aitken; Sarah Cunningham-Burley; Claudia Pagliari
Journal:  Sci Public Policy       Date:  2016-05-11

6.  Dynamic consent: a patient interface for twenty-first century research networks.

Authors:  Jane Kaye; Edgar A Whitley; David Lund; Michael Morrison; Harriet Teare; Karen Melham
Journal:  Eur J Hum Genet       Date:  2014-05-07       Impact factor: 4.246

7.  Meta Consent - A Flexible Solution to the Problem of Secondary Use of Health Data.

Authors:  Thomas Ploug; Søren Holm
Journal:  Bioethics       Date:  2016-09-15       Impact factor: 1.898

