| Literature DB >> 35372185 |
Sara Jordan, Clara Fontaine, Rachele Hendricks-Sturrup.
Abstract
Privacy protection for health data is more than simply stripping datasets of specific identifiers. Privacy protection increasingly means the application of privacy-enhancing technologies (PETs), also known as privacy engineering. Demands for the application of PETs are not yet met with ease of use or even understanding. This paper provides a scope of the current peer-reviewed evidence regarding the practical use or adoption of various PETs for managing health data privacy. We describe the state of knowledge of PETs for the use and exchange of health data specifically and build a practical perspective on the steps needed to improve the standardization of the application of PETs for diverse uses of health data.
Keywords: electronic health records; genomic; health data; machine learning; privacy; privacy engineering
Year: 2022 PMID: 35372185 PMCID: PMC8967420 DOI: 10.3389/fpubh.2022.814163
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
A survey of the peer-reviewed literature on use of PETs for healthcare data.
| Search terms | | | | | | |
|---|---|---|---|---|---|---|
| AND ("electronic health record" OR "electronic medical record" OR "EHR" OR "EMR" OR "PHR") | IEEE: 0 ACM: 0 PubMed: 0 | IEEE: 2 | IEEE: 6 ACM: 8 PubMed: 9 | IEEE: 27 | IEEE: 7 ACM: 80 PubMed: 5 | IEEE: 27 |
| AND ("direct to consumer genetic testing" OR "consumer genetic testing" OR "ancestry testing" OR "genetic testing" OR "personalized medicine") | IEEE: 0 ACM: 0 PubMed: 0 | IEEE: 4 | IEEE: 0 ACM: 2 PubMed: 3 | IEEE: 4 | IEEE: 1 ACM: 31 PubMed: 0 | IEEE: 9 |
| AND ("medical" OR "health") AND ("direct to consumer artificial intelligence" OR "consumer artificial intelligence" OR "artificial intelligence" OR "machine learning") | IEEE: 0 ACM: 0 PubMed: 0 | IEEE: 2 | IEEE: 3 ACM: 7 PubMed: 17 | IEEE: 4 | IEEE: 3 ACM: 36 PubMed: 4 | IEEE: 87 |
| AND ("medicine" OR "medical" OR "health") AND ("mobile app" OR "mobile application" OR "mobile") | IEEE: 5 ACM: 0 PubMed: 0 | IEEE: 1 | IEEE: 28 ACM: 82 PubMed: 4 | IEEE: 3 | IEEE: 15 ACM: 274 PubMed: 4 | IEEE: 11 |
| AND ("medicine" OR "health" OR "medical") AND ("x-ray" OR "imaging" OR "CT" OR "MRI" OR "PET") | IEEE: 0 ACM: 0 | IEEE: 8 | IEEE: 0 ACM: 11 PubMed: 72 | IEEE: 58 | IEEE: 0 ACM: 89 PubMed: 1 | IEEE: 0 |
Typology of federated learning.
| | Data | Model |
|---|---|---|
| Decentralized | Data remains on users' devices or on facility servers, and models are sent to those devices or servers for training; weights are sent back to the model server in raw or aggregate form. This may also be called cross-silo federated learning. | Model components are partitioned and sent to a sample of devices, which train the model partitions on device and return weights to the model server directly or in aggregate form. |
| Centralized | Data is centralized, then partitioned for sharing out to others to boost their data and local model training. This may also be called data center distributed learning. | Data from decentralized devices is ingested into a central location for model training, and new models are sent back to disaggregated locations after training on the centralized data and servers. |
Figure 1. Federated multi-task learning topology. (A) Cloud-based distributed learning; (B) Centralized federated learning; (C) Decentralized federated learning; (D) Centralized communication topology with decentralized parameter-exchanging topology. Adapted from He et al. (48).
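The decentralized-data topology above can be made concrete with a minimal sketch of federated averaging: each client trains on its own data behind its firewall, and only model weights (never raw records) travel to the server, which averages them. The linear model, function names, and toy data below are illustrative assumptions, not from the surveyed paper.

```python
import numpy as np

def local_train(X, y, w, lr=0.1, epochs=50):
    """One client's local gradient-descent update for a linear model.
    Raw data (X, y) never leaves the client; only the weights are returned."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_round(client_data, w_global):
    """Server broadcasts w_global, then averages the returned local weights,
    weighting each client by its number of samples (as in FedAvg)."""
    local_weights = [local_train(X, y, w_global.copy()) for X, y in client_data]
    sizes = np.array([len(y) for _, y in client_data], dtype=float)
    return np.average(local_weights, axis=0, weights=sizes)

# Three hypothetical institutions, each holding its own private records.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(10):  # ten communication rounds
    w = federated_round(clients, w)
```

After a few rounds the averaged weights approach the weights a centralized model would learn, even though no institution ever shared its records; in practice the privacy of the exchanged weights themselves is one of the open issues the table above flags.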
Key characteristics and considerations for each PET.
| PET | Description | Primary use case | Advantages | Limitations | Considerations |
|---|---|---|---|---|---|
| Differential Privacy | Adds noise to a dataset to reduce an adversary's ability to tell whether an individual is part of the dataset. Some variations improve data utility at the cost of weaker privacy protection | Publishing or sharing data to satisfy research needs | Provides measurable privacy guarantees | Privacy-utility tradeoff | Comparable and consistent reporting between DP variations of the types and granularity of at-risk private information |
| Homomorphic Encryption | Encryption scheme that enables private computation over encrypted sensitive data. Partial, somewhat, and fully homomorphic encryption | Third-party computation | Provides a high level of privacy. Compatible with most data types | Inefficient, expensive, and complex | Explore more diverse and lightweight variations of HE, especially for resource-constrained environments. Analyze performance-privacy tradeoffs carefully |
| Zero-knowledge proofs | Verification of sensitive data between collaborators without explicitly transferring data | Identity and attribute verification | No direct transfer of sensitive health data. Space-, power-, and computationally efficient | Poorly characterized and infrequently discussed in health data research | Explore practical applications with health data and characterize performance and privacy |
| Federated learning | Collaborative ML modeling while keeping training data local to data owners. Decentralized or centralized for both data and model | Collaborative ML with theoretically any type of algorithm or data | Enables ML training with more diverse data. Reduced computational load for institutions or devices. Private data never moves beyond the firewalls of institutions or devices. Provides a high level of data sovereignty to owners | Hard to establish a true baseline of privacy across the learning system | Identify when a federated approach is the best choice for the specific reason of protecting data privacy. Address challenges of interoperability. Consistently characterize the tradeoffs between privacy, utility, and performance across different FL approaches to aid decision-making |
| Multi-party computation | Computation across multiple encrypted data sources while ensuring no party learns the private data of another. Includes secret sharing, garbled circuits, oblivious transfer | Collaborative inference | Strong privacy protections for all participating parties. No need for a trusted third party. High accuracy and precision | Communication and computational complexity are too high to use reasonably at scale and in resource-constrained environments | Develop more practical SMC solutions for resource-constrained environments and computations at scale |
| Synthetic Data | Synthesizing data to use instead of or in addition to real health data | Supports rapid development and benchmarking of ML algorithms | It may be the most effective way to maximize privacy. Increasingly easy and cost-efficient to implement | Limited methods to generate realistic data | Develop diverse methods to generate realistic synthetic data of all data types |
| Digital twinning | Virtual representations of what has been manufactured | A virtual counterpart to persons or hospitals to test tools like ML models | A real-time simulated environment without risk of exposing private data | Application in healthcare is primarily theoretical | Develop practical applications of digital twins in healthcare. Characterize privacy protections |
PET(s), privacy-enhancing technology(ies); DP, differential privacy; HE, homomorphic encryption; ML, machine learning; FL, federated learning; SMC, secure multiparty computation.
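The first row of the table, differential privacy, can be illustrated with a hedged sketch of the Laplace mechanism applied to a count query over toy patient records; the dataset, function name, and parameter choices below are illustrative assumptions, not drawn from the surveyed literature.

```python
import numpy as np

def laplace_count(values, predicate, epsilon, rng=None):
    """Release a noisy count. A count query has L1 sensitivity 1 (adding or
    removing one record changes it by at most 1), so Laplace noise with
    scale 1/epsilon yields epsilon-differential privacy. Smaller epsilon
    means more noise and stronger privacy: the privacy-utility tradeoff."""
    rng = rng or np.random.default_rng(0)
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 51, 62, 45, 70, 29, 58]   # toy "patient" ages (illustrative only)
exact = sum(a >= 50 for a in ages)    # exact count: 4 patients aged 50+
noisy = laplace_count(ages, lambda a: a >= 50, epsilon=1.0)
# The released value stays close to the exact count, yet the presence or
# absence of any single patient cannot be confidently inferred from it.
```

This is the "measurable privacy guarantee" the table refers to: epsilon quantifies how much the released distribution can differ between neighboring datasets, which is what makes consistent reporting across DP variations feasible.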