Olga Zolotareva, Reza Nasirigerdeh, Julian Matschinske, Reihaneh Torkzadehmahani, Mohammad Bakhtiari, Tobias Frisch, Julian Späth, David B Blumenthal, Amir Abbasinejad, Paolo Tieri, Georgios Kaissis, Daniel Rückert, Nina K Wenke, Markus List, Jan Baumbach.
Abstract
Aggregating transcriptomics data across hospitals can increase the sensitivity and robustness of differential expression analyses, yielding deeper clinical insights. Because data exchange is often restricted by privacy legislation, meta-analyses are frequently employed to pool local results. However, their accuracy can drop when class labels are unevenly distributed among cohorts. Flimma (https://exbio.wzw.tum.de/flimma/) addresses this issue by implementing the state-of-the-art limma voom workflow in a federated manner, i.e., patient data never leaves its source site. Flimma results are identical to those generated by limma voom on aggregated datasets, even in imbalanced scenarios where meta-analysis approaches fail.
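Why can a federated analysis match the pooled one exactly? One step where this is easy to see is the per-gene linear model. For ordinary least squares, the normal-equation terms X^T X and X^T y of the pooled data are simply the sums of the per-site terms, so summing them reproduces the pooled fit up to floating-point rounding. The sketch below shows this for a single gene in plain Python. It is a simplification: it ignores voom precision weights and limma's empirical Bayes step, and all function names are hypothetical rather than Flimma's API.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def local_stats(X: Matrix, y: Vector) -> Tuple[Matrix, Vector]:
    """Run at one site: compute X^T X and X^T y from its private rows."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return xtx, xty


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (M[i][n] - tail) / M[i][i]
    return beta


def federated_fit(sites: List[Tuple[Matrix, Vector]]) -> Vector:
    """Coordinator: add the per-site summaries, then solve once.

    In a real deployment each site would call local_stats itself and send
    only the p x p and p x 1 summaries; here that is simulated in one process.
    """
    summaries = [local_stats(X, y) for X, y in sites]
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return solve(xtx, xty)


# Two hypothetical sites; design columns are (intercept, case indicator).
site_a = ([[1, 0], [1, 0], [1, 1]], [1.0, 1.2, 3.0])
site_b = ([[1, 1], [1, 1]], [2.8, 3.2])
print(federated_fit([site_a, site_b]))  # approximately [1.1, 1.9]
```

The coefficients equal the group mean of controls (1.1) and the case-minus-control difference (1.9), exactly what a fit on the concatenated rows gives.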
Keywords: Differential expression analysis; Federated learning; Meta-analysis; Privacy of biomedical data
Year: 2021 PMID: 34906207 PMCID: PMC8670124 DOI: 10.1186/s13059-021-02553-2
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Gene expression analysis in multi-center studies. Bold arrows show the exchange of raw data; dashed arrows show the exchange of model parameters or summary statistics. Grey areas highlight different physical locations.
Characteristics of the three scenarios for the TCGA-BRCA dataset. The distributions of age and tumor stage were balanced.
| Scenario | Cohort 1 size | Cohort 2 size | Cohort 3 size | Basal frequency, cohort 1 | Basal frequency, cohort 2 | Basal frequency, cohort 3 | LumA frequency, cohort 1 | LumA frequency, cohort 2 | LumA frequency, cohort 3 |
|---|---|---|---|---|---|---|---|---|---|
| No imbalance | 283 | 283 | 284 | 0.20 | 0.20 | 0.20 | 0.57 | 0.57 | 0.58 |
| Mild imbalance | 121 | 242 | 487 | 0.10 | 0.30 | 0.17 | 0.40 | 0.50 | 0.66 |
| Strong imbalance | 65 | 196 | 589 | 0.25 | 0.50 | 0.09 | 0.14 | 0.50 | 0.65 |
Characteristics of the three scenarios for the GTEx skin dataset. The proportions of samples from male and female individuals were similar in all cohorts (between 30% and 34% of samples came from females in all scenarios).
| Scenario | Cohort 1 size | Cohort 2 size | Cohort 3 size | Sun-exposed fraction, cohort 1 | Sun-exposed fraction, cohort 2 | Sun-exposed fraction, cohort 3 | Mean ischemic time (min), cohort 1 | Mean ischemic time (min), cohort 2 | Mean ischemic time (min), cohort 3 |
|---|---|---|---|---|---|---|---|---|---|
| No imbalance | 425 | 425 | 427 | 0.53 | 0.53 | 0.53 | 629 | 636 | 636 |
| Mild imbalance | 181 | 363 | 733 | 0.4 | 0.65 | 0.51 | 490 | 620 | 676 |
| Strong imbalance | 97 | 293 | 887 | 0.8 | 0.4 | 0.54 | 347 | 646 | 661 |
Fig. 2 Comparison of negative log-transformed p-values computed by Flimma and the meta-analysis methods (y-axis) with those obtained by limma on the aggregated dataset (x-axis) in the three scenarios on (A) the TCGA-BRCA and (B) the GTEx skin datasets. The Pearson correlation coefficient (r), Spearman correlation coefficient (ρ), and root-mean-square error (RMSE) for each method are reported in the legend.
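The three agreement measures in the Fig. 2 legend can be computed from paired p-value vectors as sketched below, in plain Python. The sketch assumes base-10 logarithms (the caption says only "log-transformed") and uses average ranks for ties in Spearman's ρ; the p-values and function names are hypothetical.

```python
import math
from typing import List, Sequence


def neg_log10(pvalues: Sequence[float]) -> List[float]:
    """Transform p-values to -log10(p); base 10 is an assumption here."""
    return [-math.log10(p) for p in pvalues]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square error between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical p-values for four genes.
reference = neg_log10([1e-10, 1e-5, 0.01, 0.5])  # limma on pooled data
method = neg_log10([1e-9, 1e-5, 0.02, 0.4])      # a method under comparison
print(pearson(reference, method), spearman(reference, method), rmse(reference, method))
```

Because both vectors here rank the genes identically, ρ is 1 even though the RMSE is clearly nonzero; a method that reproduces the pooled analysis exactly would also have r = 1 and RMSE = 0.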
F1 score, number of false positives (FP), and number of false negatives (FN) obtained on the TCGA-BRCA dataset in the three scenarios. Values corresponding to the best performance over all methods are italicized. All calculated performance measures are reported in Additional file 2: Table S2.
| Method | F1, balanced | F1, mildly imbalanced | F1, strongly imbalanced | FP, balanced | FP, mildly imbalanced | FP, strongly imbalanced | FN, balanced | FN, mildly imbalanced | FN, strongly imbalanced |
|---|---|---|---|---|---|---|---|---|---|
| Flimma | | | | | | | | | |
| Fisher | | 0.92 | 0.93 | 14 | 248 | 192 | 8 | 290 | 265 |
| Stouffer | | 0.92 | 0.93 | 14 | 245 | 189 | 9 | 290 | 265 |
| REM | | 0.97 | 0.95 | 12 | 80 | 121 | 17 | 119 | 215 |
| RankProd | | 0.92 | 0.93 | 14 | 243 | 193 | 12 | 295 | 274 |
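FP and FN in these tables count disagreements with a reference gene list, presumably the genes called differentially expressed by limma on the aggregated data. A minimal sketch of the scoring follows, assuming a gene is called DE when its adjusted p-value falls below 0.05; that cutoff, and any fold-change filter the benchmark may also apply, are assumptions, and the gene names are hypothetical.

```python
from typing import Dict, Set


def de_genes(adj_pvalues: Dict[str, float], alpha: float = 0.05) -> Set[str]:
    """Genes called differentially expressed at an assumed cutoff alpha."""
    return {gene for gene, p in adj_pvalues.items() if p < alpha}


def confusion_scores(reference: Set[str], predicted: Set[str]) -> Dict[str, float]:
    """Score a method's DE calls against the reference (pooled limma) calls."""
    tp = len(reference & predicted)
    fp = len(predicted - reference)  # called DE by the method only
    fn = len(reference - predicted)  # missed by the method
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Equivalent to 2TP / (2TP + FP + FN).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}


reference = de_genes({"g1": 0.001, "g2": 0.01, "g3": 0.02, "g4": 0.03, "g5": 0.4})
predicted = de_genes({"g1": 0.002, "g2": 0.03, "g3": 0.01, "g4": 0.2, "g5": 0.04})
print(confusion_scores(reference, predicted))
```

Here the method misses g4 and adds g5, giving TP = 3, FP = 1, FN = 1 and F1 = 0.75.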
F1 score, number of false positives (FP), and number of false negatives (FN) obtained on the GTEx skin dataset in the three scenarios. Values corresponding to the best performance over all methods are italicized. All calculated performance measures are reported in Additional file 3: Table S3.
| Method | F1, balanced | F1, mildly imbalanced | F1, strongly imbalanced | FP, balanced | FP, mildly imbalanced | FP, strongly imbalanced | FN, balanced | FN, mildly imbalanced | FN, strongly imbalanced |
|---|---|---|---|---|---|---|---|---|---|
| Flimma | | | | | | | | | |
| Fisher | 0.99 | 0.91 | 0.83 | 4 | 32 | 67 | | 18 | 33 |
| Stouffer | 0.99 | 0.91 | 0.83 | 4 | 32 | 67 | | 18 | 33 |
| REM | 0.99 | 0.95 | 0.94 | 4 | 15 | 21 | 2 | 14 | 12 |
| RankProd | 0.99 | 0.91 | 0.83 | 4 | 32 | 67 | | 18 | 33 |
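The Fisher and Stouffer rows refer to classical p-value combination methods applied to per-cohort results. Their textbook forms for one gene are sketched below. The benchmark's exact variants (for example, Stouffer weights proportional to the square root of cohort size, or handling of effect direction) may differ, and REM and RankProd are not sketched. Fisher's tail probability uses the closed-form chi-square survival function, which exists because the degrees of freedom, 2k, are always even.

```python
import math
from statistics import NormalDist
from typing import Sequence


def fisher(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i), chi-square with 2k df under H0."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def stouffer(pvalues: Sequence[float]) -> float:
    """Unweighted Stouffer's method on one-sided p-values."""
    nd = NormalDist()
    # z_i = Phi^{-1}(1 - p_i) = -Phi^{-1}(p_i); work with -z to avoid 1 - p.
    neg_z = sum(nd.inv_cdf(p) for p in pvalues) / math.sqrt(len(pvalues))
    return nd.cdf(neg_z)  # equals 1 - Phi(z)


# One gene's p-values from three hypothetical cohorts.
cohort_p = [0.04, 0.10, 0.03]
print(fisher(cohort_p), stouffer(cohort_p))
```

Both methods only see one p-value per cohort, not the class composition behind it, which is one intuition for why they degrade when class labels are unevenly distributed across cohorts.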
Fig. 3 Dependence of the F1 score on the number of top-ranked genes considered to be differentially expressed. Genes were ranked by decreasing negative log-transformed p-value, and the number of top-ranked genes varied from 20 to 3500 for the TCGA-BRCA dataset (A) and from 20 to 300 for the GTEx skin dataset (B), in steps of 5.
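The Fig. 3 curves can be read as follows: for each k, take the k genes with the smallest p-values from the reference analysis and from the method under test, and compute F1 between the two lists. The sketch below assumes that both lists are cut at the same k and that ties keep input order; the helper names and data are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def top_k(pvalues: Dict[str, float], k: int) -> Set[str]:
    """The k genes with the largest -log p, i.e. the smallest p-values."""
    ranked = sorted(pvalues, key=pvalues.__getitem__)  # ties keep dict order
    return set(ranked[:k])


def f1(reference: Set[str], predicted: Set[str]) -> float:
    """F1 = 2TP / (|reference| + |predicted|) = 2TP / (2TP + FP + FN)."""
    denom = len(reference) + len(predicted)
    return 2 * len(reference & predicted) / denom if denom else 0.0


def f1_curve(ref_p: Dict[str, float], test_p: Dict[str, float],
             ks: Iterable[int]) -> List[Tuple[int, float]]:
    """F1 between the top-k lists of two analyses for each k in ks."""
    return [(k, f1(top_k(ref_p, k), top_k(test_p, k))) for k in ks]


ref_p = {"a": 1e-8, "b": 1e-6, "c": 1e-3, "d": 0.5}
test_p = {"a": 1e-7, "b": 1e-2, "c": 1e-4, "d": 0.6}
print(f1_curve(ref_p, test_p, range(1, 5)))
# For the TCGA-BRCA panel the caption's grid would be range(20, 3501, 5).
```

Swapping genes b and c between the two rankings drops F1 to 0.5 at k = 2, while every other cutoff still yields identical sets.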
RMSE, precision, and recall obtained by Flimma and the meta-analysis tools on the TCGA-BRCA dataset split by tissue source site into 3, 5, 7, 10, or 14 cohorts
| Number of cohorts | 3 | 5 | 7 | 10 | 14 |
|---|---|---|---|---|---|
| RMSE | |||||
| Flimma | |||||
| Fisher | 0.94 | 1.82 | 2.53 | 3.86 | 5.37 |
| Stouffer | 1.47 | 2.21 | 2.87 | 4.26 | 5.68 |
| REM | 2.73 | 3.68 | 4.75 | 7.21 | 8.50 |
| RankProd | 5.16 | 8.19 | 11.32 | 18.92 | 23.50 |
| Precision | |||||
| Flimma | |||||
| Fisher | 0.85 | 0.88 | 0.90 | 0.93 | 0.95 |
| Stouffer | 0.85 | 0.88 | 0.91 | 0.93 | 0.95 |
| REM | 0.93 | 0.94 | 0.95 | 0.97 | 0.97 |
| RankProd | 0.92 | 0.87 | 0.90 | 0.93 | 0.95 |
| Recall | |||||
| Flimma | |||||
| Fisher | 0.92 | 0.95 | 0.95 | 0.96 | 0.97 |
| Stouffer | 0.89 | 0.93 | 0.94 | 0.96 | 0.97 |
| REM | 0.93 | 0.96 | 0.97 | 0.98 | 0.98 |
| RankProd | 0.87 | 0.96 | 0.96 | 0.96 | 0.97 |
Values corresponding to the best performance over all methods are italicized
Fig. 4 PCA projections of samples from three GEO cohorts (A, B) and the TCGA-BRCA cohorts (C, D), computed and plotted with the proBatch R package [99] and colored by cohort (A, C) or cancer subtype (B, D).
Fig. 5 Comparison of the results obtained by Flimma on uncorrected GEO data with the results of limma voom after batch-effect removal with ComBat-Seq.
Fig. 6 Scheme of Flimma. M denotes local intermediate parameters, N denotes local noise, and K is the total number of participants. Addition and subtraction may be ordinary or modular; see the “Masking scheme” section for details.
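Fig. 6 describes additive masking: each participant adds local noise N to its intermediate parameters M before sharing, and the noise is removed again from the aggregate so that only the sum of the M values is revealed. One way to realize this with modular arithmetic is sketched below. How the summed noise reaches the server is an assumption here (a hypothetical helper party that sees only noise values); the “Masking scheme” section of the paper specifies the actual protocol. Real-valued parameters would first need a fixed-point integer encoding, which is omitted.

```python
import secrets
from typing import List, Tuple

MODULUS = 2 ** 64  # assumed ring size; values must be encoded as ints below it


def mask(value: int) -> Tuple[int, int]:
    """Participant k: return (M_k + N_k mod q) for the server, N_k for the helper."""
    noise = secrets.randbelow(MODULUS)
    return (value + noise) % MODULUS, noise


def helper_noise_sum(noises: List[int]) -> int:
    """Hypothetical helper party: sees only noise, reports sum of N_k mod q."""
    return sum(noises) % MODULUS


def server_unmask(noisy_values: List[int], noise_sum: int) -> int:
    """Server: sum of (M_k + N_k) minus sum of N_k equals sum of M_k (mod q)."""
    return (sum(noisy_values) - noise_sum) % MODULUS


local_params = [5, 7, 11]  # hypothetical integer-encoded M_k for K = 3
masked = [mask(m) for m in local_params]
total = server_unmask([v for v, _ in masked],
                      helper_noise_sum([n for _, n in masked]))
print(total)  # 23
```

With modular arithmetic each masked value is uniformly distributed on its own, and the cancellation is exact; with ordinary floating-point addition, subtracting large noise terms can introduce rounding error, which is one reason to offer both variants.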
Fig. 7 Scheme of the Flimma workflow. Steps reimplemented in a federated fashion are shown in blue. The names of the functions used in the limma voom workflow are shown to the right of the flowchart.
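In the standard limma voom workflow, raw counts are first converted to log2 counts per million by `voom` before precision weights are estimated and the model is fitted with `lmFit` and moderated with `eBayes`. The log-CPM transform below uses only one sample's own counts, so plausibly it needs no communication, whereas steps that pool information across samples (such as fitting voom's mean-variance trend) are the ones a federated version has to coordinate. The sketch follows voom's offsets but assumes raw library sizes with no normalization factors; the function name is hypothetical.

```python
import math
from typing import List


def voom_log_cpm(counts: List[int]) -> List[float]:
    """log2 counts per million for one sample, with voom's offsets.

    Computes log2((count + 0.5) / (library_size + 1) * 1e6), where the library
    size is the sample's raw total count (normalization factors are ignored).
    """
    lib_size = sum(counts)
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]


# Each site can transform its own samples without sharing anything.
sample = [0, 10, 999_989]  # hypothetical counts, library size 999_999
print(voom_log_cpm(sample))
```

The 0.5 offset keeps zero counts finite (here log2(0.5) = -1), and the +1 on the library size keeps the offset-adjusted CPM from exceeding one million.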