Fold-stratified cross-validation for unbiased and privacy-preserving federated learning.

Romain Bey¹, Romain Goussault², François Grolleau¹, Mehdi Benchoufi¹, Raphaël Porcher¹.

Abstract

OBJECTIVE: We introduce fold-stratified cross-validation, a validation methodology that is compatible with privacy-preserving federated learning and that prevents data leakage caused by duplicates of electronic health records (EHRs).
MATERIALS AND METHODS: Fold-stratified cross-validation complements cross-validation with an initial stratification of EHRs in folds containing patients with similar characteristics, thus ensuring that duplicates of a record are jointly present either in training or in validation folds. Monte Carlo simulations are performed to investigate the properties of fold-stratified cross-validation in the case of a model data analysis using both synthetic data and MIMIC-III (Medical Information Mart for Intensive Care-III) medical records.
RESULTS: In situations in which duplicated EHRs could induce overoptimistic estimations of accuracy, applying fold-stratified cross-validation prevented this bias, while not requiring full deduplication. However, a pessimistic bias might appear if the covariate used for the stratification was strongly associated with the outcome.
DISCUSSION: Although fold-stratified cross-validation presents low computational overhead, to be efficient it requires the preliminary identification of a covariate that is both shared by duplicated records and weakly associated with the outcome. When available, the hash of a personal identifier or a patient's date of birth provides such a covariate. On the contrary, pseudonymization interferes with fold-stratified cross-validation, as it may break the equality of the stratifying covariate among duplicates.
CONCLUSION: Fold-stratified cross-validation is an easy-to-implement methodology that prevents data leakage when a model is trained on distributed EHRs that contain duplicates, while preserving privacy.
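
The fold assignment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `fold_of` and `fold_stratified_splits`, the `birth_date` field, the toy records, and the choice of SHA-256 are all assumptions made for illustration. The idea shown is that each site derives the fold index locally from a covariate shared by duplicates, so copies of a record end up in the same fold without any site having to compare its records with another site's.

```python
import hashlib
from collections import defaultdict


def fold_of(covariate: str, n_folds: int) -> int:
    """Map a stratifying covariate (e.g. a date of birth) to a fold index.

    SHA-256 is an illustrative choice: Python's built-in hash() is salted
    per process, so different sites would disagree on the fold.
    """
    digest = hashlib.sha256(covariate.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_folds


def fold_stratified_splits(records, key, n_folds):
    """Yield (fold, train_indices, validation_indices) for each fold."""
    folds = defaultdict(list)
    for i, rec in enumerate(records):
        folds[fold_of(rec[key], n_folds)].append(i)
    for k in range(n_folds):
        val = folds.get(k, [])
        train = [i for j in range(n_folds) if j != k for i in folds.get(j, [])]
        yield k, train, val


# Hypothetical toy records: sites A and B each hold a copy of one patient.
records = [
    {"site": "A", "birth_date": "1950-03-14"},
    {"site": "B", "birth_date": "1950-03-14"},  # duplicate of the record above
    {"site": "A", "birth_date": "1987-11-02"},
    {"site": "B", "birth_date": "1962-07-30"},
]

for k, train, val in fold_stratified_splits(records, "birth_date", n_folds=3):
    # Records 0 and 1 are always on the same side of the split.
    assert (0 in train) == (1 in train)
```

Distinct patients who happen to share the covariate value are also grouped together. As the abstract notes, this is harmless only when the covariate is weakly associated with the outcome.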

Keywords: data leakage; duplicated electronic health records; federated learning; privacy; validation

Year: 2020; PMID: 32620945; PMCID: PMC7647321; DOI: 10.1093/jamia/ocaa096

Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497
