Privacy preserving validation for multiomic prediction models.

Talal Ahmed, Mark A Carty, Stephane Wenric, Jonathan R Dry, Ameen A Salahudeen, Aly A Khan, Eric Lefkofsky, Martin C Stumpe, Raphael Pelossof.

Abstract

Reproducibility of results obtained using ribonucleic acid (RNA) data across labs remains a major hurdle in cancer research. Often, molecular predictors trained on one dataset cannot be applied to another due to differences in RNA library preparation and quantification, which inhibits the validation of predictors across labs. While current RNA correction algorithms reduce these differences, they require simultaneous access to patient-level data from all datasets, which means a predictor cannot be shared without also sharing its training data. Here, we describe SpinAdapt, an unsupervised RNA correction algorithm that enables the transfer of molecular models without requiring access to patient-level data. It computes data corrections only via aggregate statistics of each dataset, thereby maintaining patient data privacy. Despite an inherent trade-off between privacy and performance, SpinAdapt outperforms current correction methods, such as Seurat and ComBat, on publicly available cancer studies, including TCGA and ICGC. Furthermore, SpinAdapt can correct new samples, thereby enabling unbiased evaluation on validation cohorts. We expect this novel correction paradigm to enhance research reproducibility and to preserve patient privacy.
© The Author(s) 2022. Published by Oxford University Press.
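
The abstract's central idea, correcting one dataset toward another using only aggregate statistics, can be illustrated with a deliberately simplified sketch. This is not the SpinAdapt algorithm, whose details the abstract does not give. It assumes plain per-gene mean and standard-deviation matching, and the function names `summarize` and `correct` are hypothetical. The point is only that the two sites exchange summary statistics and never patient-level rows.

```python
from statistics import mean, pstdev


def summarize(samples):
    """Return per-gene (mean, std) pairs: the only data a site would share.

    `samples` is a list of rows, one row per patient, one column per gene.
    """
    genes = zip(*samples)  # transpose to one tuple of values per gene
    return [(mean(g), pstdev(g)) for g in genes]


def correct(samples, source_stats, target_stats):
    """Shift and rescale source samples to match the target's per-gene moments.

    Assumption: simple moment matching stands in for the real correction.
    """
    corrected = []
    for row in samples:
        new_row = []
        for x, (mu_s, sd_s), (mu_t, sd_t) in zip(row, source_stats, target_stats):
            # Leave the scale unchanged for genes with no variance at the source.
            scale = sd_t / sd_s if sd_s > 0 else 1.0
            new_row.append((x - mu_s) * scale + mu_t)
        corrected.append(new_row)
    return corrected


# Toy data: 3 patients x 2 genes per site (made-up numbers).
source = [[1.0, 10.0], [3.0, 14.0], [5.0, 18.0]]
target = [[2.0, 0.0], [4.0, 1.0], [6.0, 2.0]]

# The target site publishes only its summary statistics.
target_stats = summarize(target)
source_stats = summarize(source)

corrected = correct(source, source_stats, target_stats)

# A held-out sample can be corrected later with the same fixed statistics.
held_out = correct([[3.0, 14.0]], source_stats, target_stats)
```

Because the correction depends only on fixed summary statistics, it can be applied unchanged to samples that were not used to compute those statistics. This mirrors, in a simplified way, the abstract's claim about correcting new samples for validation cohorts.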

Keywords:  machine learning; model validation; privacy; reproducibility; transcriptomics; translational research

Year: 2022    PMID: 35388408    PMCID: PMC9116386    DOI: 10.1093/bib/bbac110

Source DB: PubMed    Journal: Brief Bioinform    ISSN: 1467-5463    Impact factor: 13.994

