Xing Song, Lemuel R Waitman, Yong Hu, Bo Luo, Fengjun Li, Mei Liu.
Abstract
Artificial intelligence-enabled analysis of medical big data has the potential to revolutionize medical practice, from diagnosing and predicting complex diseases to making recommendations and resource allocation decisions in an evidence-based manner. However, big data comes with big disclosure risks. To preserve privacy, excessive data anonymization is often necessary, leading to significant loss of data utility. In this paper, we develop a systematic data scrubbing procedure for large datasets in which the key variables for re-identification risk assessment are uncertain. We then assess the trade-off between anonymizing electronic health record data for sharing in support of open science and the performance of machine learning models that use the data for early acute kidney injury risk prediction. Results demonstrate that our proposed data scrubbing procedure can maintain good feature diversity and moderate data utility, but they raise concerns regarding its impact on knowledge discovery capability. ©2020 AMIA - All rights reserved.
Keywords: Acute Kidney Injury; Data Anonymization; Data utility; Medical Big Data; Re-identification risk
Year: 2020; PMID: 32477684; PMCID: PMC7233037
Source DB: PubMed; Journal: AMIA Jt Summits Transl Sci Proc