Raphaël Chevrier, Vasiliki Foufi, Christophe Gaudet-Blavignac, Arnaud Robert, Christian Lovis.
Abstract
BACKGROUND: The secondary use of health data is central to biomedical research in the era of data science and precision medicine. National and international initiatives, such as the Global Open Findable, Accessible, Interoperable, and Reusable (GO FAIR) initiative, are supporting this approach in different ways (eg, making the sharing of research data mandatory or improving the legal and ethical frameworks). Preserving patients' privacy is crucial in this context. De-identification and anonymization are the two most common terms used to refer to the technical approaches that protect privacy and facilitate the secondary use of health data. However, it is difficult to find a consensus on the definitions of the concepts or on the reliability of the techniques used to apply them. A comprehensive review is needed to better understand the domain, its capabilities, its challenges, and the ratio of risk between the data subjects' privacy on one side, and the benefit of scientific advances on the other.
Keywords: anonymisation; anonymization; confidentiality; data protection; de-identification; deidentification; privacy; pseudonymization; scoping review; secondary use
Year: 2019 PMID: 31152528 PMCID: PMC6658290 DOI: 10.2196/13484
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Categories of information used to collect quantitative and qualitative data from the reviewed articles.
| Type of data | Categories of information |
| Quantitative | Journal; Year of publication; Author(s); Authors’ backgrounds; Authors’ places of work; Presence of the terms “de-identification” and “anonymization”; Definitions of the terms “de-identification” and “anonymization”; Meanings given to the terms “de-identification” and “anonymization” |
| Qualitative | Purposes of de-identification and anonymization; Limitations of the privacy-enhancing techniques; Ethical or legal considerations; Suggestions and recommendations; Data utility and information loss; Data sharing in biomedical research; Types of data subjected to anonymization or de-identification; Public opinion on privacy-enhancing techniques and health data sharing |
Figure 1. Architecture and breakdown of the search query with the number of records at each level. [ti]: Title; [tiab]: Title/Abstract.
Figure 2. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram for the scoping review process (ie, screening, eligibility, and inclusion).
Characteristics of the 60 articles included in the review and of the journals where they were published.
| Characteristics | Count (N=60), n (%) |
| Year of publication | |
| 2008-2009 | 5 (8) |
| 2010-2011 | 11 (18) |
| 2012-2013 | 10 (17) |
| 2014-2015 | 11 (18) |
| 2016-2017 | 23 (38) |
| Journal field | |
| Biomedical informatics | 32 (53) |
| Engineering | 8 (13) |
| Public health, methodology, and epidemiology | 6 (10) |
| Bioethics and law & health policies | 5 (8) |
| Medicine: biomedical sciences | 5 (8) |
| Medicine: clinical | 4 (7) |
Presence of definitions for the terms de-identification or anonymization in the reviewed articles.
| Terms with definitions | Count (N=60), n (%) |
| De-identification | 26 (43) |
| Anonymization | 12 (20) |
| Both | 9 (15) |
| None | 31 (52) |
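The four rows above overlap, so they do not sum to 60 directly. The following minimal sketch checks the arithmetic under the assumption (not stated explicitly in the table) that the "Both" articles are also counted in the "De-identification" and "Anonymization" rows; the variable and function names are illustrative only.

```python
# Counts taken from the table above (N = 60 reviewed articles).
N = 60
defines_deid = 26   # articles defining "de-identification"
defines_anon = 12   # articles defining "anonymization"
defines_both = 9    # assumed to be included in both counts above
defines_none = 31   # articles defining neither term

# Inclusion-exclusion: articles that define at least one of the two terms.
at_least_one = defines_deid + defines_anon - defines_both

# Under the stated assumption, this should recover the full sample.
total = at_least_one + defines_none


def pct(n, total=N):
    """Percentage of the sample, rounded to the nearest integer."""
    return round(100 * n / total)


print(at_least_one, total)
print([pct(n) for n in (defines_deid, defines_anon, defines_both, defines_none)])
```

Under this reading, 29 articles define at least one term and 31 define neither, which accounts for all 60 articles.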
Examples of attempts to define the terms de-identification or anonymization.
| Terms | Definitions |
| De-identification | “For clinical data to be considered de-identified, the HIPAA ‘Safe Harbor’ technique requires 18 data elements (called PHI: Protected Health Information) to be removed...de-identification only means that explicit identifiers are hidden or removed.” |
| Anonymization | “The anonymization consists in removing the patients’ names from the records: unfortunately, other pieces of information enable to identify the patients.” |
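Both quoted definitions describe removing explicit identifiers while other fields stay in the record. The sketch below illustrates that idea only; the field names and the sample record are invented for illustration, and the identifier set is a hypothetical placeholder, not the list of 18 HIPAA Safe Harbor elements.

```python
# Hypothetical set of explicit identifiers (NOT the HIPAA Safe Harbor list).
EXPLICIT_IDENTIFIERS = {"name", "phone", "email", "record_number"}


def remove_explicit_identifiers(record: dict) -> dict:
    """Return a copy of the record without the explicit identifier fields."""
    return {k: v for k, v in record.items() if k not in EXPLICIT_IDENTIFIERS}


# Made-up record, for illustration only.
record = {
    "name": "Jane Doe",
    "phone": "555-0100",
    "birth_year": 1950,
    "postal_prefix": "120",
    "diagnosis": "asthma",
}

stripped = remove_explicit_identifiers(record)
print(stripped)
```

The remaining fields (here a birth year, a postal prefix, and a diagnosis) are exactly the kind of "other pieces of information" that the second quotation warns may still allow a patient to be identified when combined.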
Researchers’ understanding of de-identification and anonymization as similar or different concepts.
| Use of the terms in the articles | Count (N=60), n (%) |
| Only use or discuss one concept | 19 (32) |
| De-identification and anonymization are two different concepts | 19 (32) |
| De-identification and anonymization are used interchangeably | 19 (32) |
| Ambiguous with regard to the meaning of both terms | 3 (5) |
Background points awarded to the authors of the reviewed articles. The authors are separated by authorship position: first, second, and last.
| Research field | First author (N=92), n (%) | Second author (N=72), n (%) | Last author (N=84), n (%) | Total count (N=248), n (%) |
| Computer science | 36 (14) | 26 (10) | 29 (12) | 91 (36.7) |
| Biomedical informatics | 16 (6) | 15 (6) | 16 (6) | 47 (19.0) |
| Medicine (MD^a) | 13 (5) | 9 (4) | 16 (6) | 38 (15.3) |
| Epidemiology and statistics | 6 (2) | 3 (1) | 7 (3) | 16 (6.5) |
| Mathematics and biomathematics | 6 (2) | 5 (2) | 5 (2) | 16 (6.5) |
| Law | 3 (1) | 3 (1) | 2 (1) | 8 (3.2) |
| Psychology | 2 (1) | 3 (1) | 2 (1) | 7 (2.8) |
| Linguistics | 2 (1) | 0 (0) | 2 (1) | 4 (1.6) |
| Project management | 1 (0) | 1 (0) | 1 (0) | 3 (1.2) |
| Bioethics and humanities | 1 (0) | 2 (1) | 0 (0) | 3 (1.2) |
| Public health | 1 (0) | 0 (0) | 1 (0) | 2 (0.8) |
| Neuroscience | 2 (1) | 0 (0) | 0 (0) | 2 (0.8) |
| Behavioral economy | 0 (0) | 2 (1) | 0 (0) | 2 (0.8) |
| Journalism | 1 (0) | 1 (0) | 0 (0) | 2 (0.8) |
| Biology and microbiology | 1 (0) | 0 (0) | 1 (0) | 2 (0.8) |
| Physics | 1 (0) | 1 (0) | 0 (0) | 2 (0.8) |
| Health care administration | 0 (0) | 0 (0) | 1 (0) | 1 (0.4) |
| Ecology and evolution | 0 (0) | 1 (0) | 0 (0) | 1 (0.4) |
| Business (MBA^b) | 0 (0) | 0 (0) | 1 (0) | 1 (0.4) |
^a MD: Doctor of Medicine.
^b MBA: Master of Business Administration.
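The percentages in the background table appear to be computed relative to the overall total of 248 points rather than to each column's own N (for example, 26 of 248 is about 10%, whereas 26 of 72 would be about 36%). The sketch below reproduces the "Total count" column for the first three rows under that assumption; the helper name `row_summary` is illustrative.

```python
# (first author, second author, last author) points, copied from the table.
points = {
    "Computer science": (36, 26, 29),
    "Biomedical informatics": (16, 15, 16),
    "Medicine (MD)": (13, 9, 16),
}

GRAND_TOTAL = 248  # total points across all fields and author positions


def row_summary(first, second, last, grand_total=GRAND_TOTAL):
    """Return the row total and its share of all points, to one decimal."""
    total = first + second + last
    return total, round(100 * total / grand_total, 1)


for field, counts in points.items():
    print(field, row_summary(*counts))
```

The same function can be applied to the remaining rows of the table to confirm that the row totals add up to 248.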
Figure 3. Representation of the 60 publications according to the date of publication, the number of articles per year, and the authors’ locations. The size of the discs used on the graph represents each country’s contribution in number of articles over the studied period (10 years). The exact count is shown between brackets next to each country’s name.