
Robust Aggregation for Federated Learning by Minimum γ-Divergence Estimation.

Cen-Jhih Li¹, Pin-Han Huang², Yi-Ting Ma¹, Hung Hung³, Su-Yun Huang¹.

Abstract

Federated learning is a framework for multiple devices or institutions, called local clients, to collaboratively train a global model without sharing their data. For federated learning with a central server, an aggregation algorithm integrates the model information sent from local clients to update the parameters of a global model. The sample mean is the simplest and most commonly used aggregation method. However, it is not robust for data with outliers or under the Byzantine problem, where Byzantine clients send malicious messages to interfere with the learning process. Some robust aggregation methods have been introduced in the literature, including the marginal median, geometric median and trimmed mean. In this article, we propose an alternative robust aggregation method, named γ-mean, which is the minimum divergence estimator based on a robust density power divergence. The γ-mean aggregation mitigates the influence of Byzantine clients by assigning them smaller weights. This weighting scheme is data-driven and controlled by the γ value. Robustness from the viewpoint of the influence function is discussed and some numerical results are presented.


Keywords: byzantine problem; density power divergence; federated learning; influence function; robustness; γ-divergence

Year: 2022 PMID: 35626569 PMCID: PMC9141408 DOI: 10.3390/e24050686

Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738

1. Introduction

Federated learning (FL), a distributed learning paradigm, was proposed by Google in 2016 [1] for distributing the optimization of model parameters over multiple machines. The distributed framework allows the data to be stored on local devices without being shared. These machines, called clients, collaboratively train a global model. In this article, we consider the case in which a central server works as a coordinator and aggregates the gradient information sent by the local clients to update the global model through an iterative process [2]. In real-world applications, clients such as hospitals and clinical institutions hold highly sensitive and private data, such as electronic healthcare records and medical images [3].

However, FL still encounters several challenges in a real-world setting. In its training process, some unreliable clients may produce outlying data information or even send malicious values [4], which leads to a biased result or even a failure of the training process. These arbitrary clients are called Byzantine clients. With a properly designed aggregation scheme, the effect of Byzantine clients can be reduced or excluded. Several papers have made significant contributions to developing Byzantine-resilient methods with specific use of robust aggregators, including the marginal median [5], geometric median [6], and trimmed mean [7]. Another aggregation framework is the Byzantine-robust stochastic aggregation method [8], which adds a regularization term to the objective function to handle the Byzantine and heterogeneous data problem.

In this article, we consider the case of FL that consists of a central server and m clients. Assume that there is an α-fraction of Byzantine clients. Our goal is to minimize the following objective function:

$$F(\theta) = \sum_{i=1}^{m} w_i \, \mathbb{E}_{z \sim \mathcal{D}_i}\left[ \ell_i(\theta; z) \right], \qquad (1)$$

where $\theta$ is the global model parameter, $\ell_i$ is the loss function of client i with local data $z$ having an unknown distribution $\mathcal{D}_i$, and $w_i$ are the prior weights assigned to local clients.
At the t-th round of FL iterative updates, the server broadcasts the current parameter value $\theta_t$ to local clients. Normal clients faithfully compute an estimate of the local gradients and send the gradient information back to the server. In contrast, Byzantine clients send arbitrary erroneous messages to obstruct the optimization process. The central server aggregates the gradient information from clients and updates the model parameter from $\theta_t$ to $\theta_{t+1}$. At the end, the server should output an estimate of the optimal $\theta$, which aims to minimize the objective loss (1). In this work we consider uniform client weights, i.e., $w_i = 1/m$. The main reason is that, under the Byzantine problem, the local sample sizes claimed by Byzantine clients might not be reliable.

The main contributions of this work are listed below.

We propose the γ-mean as a robust aggregator in federated learning. This robust aggregator mitigates the influence of Byzantine clients by assigning them smaller weights. The weighting scheme is data-driven and controlled by the tuning parameter γ.

We discuss robustness from the influence function point of view. The benefits of adopting the γ-mean can be seen from its influence function in comparison with other robust alternatives such as the marginal median, geometric median and trimmed mean.

The robustness of the γ-mean is then verified through a simulation study and real data experiments. The γ-mean based aggregation outperforms other robust aggregators such as the marginal median, geometric median and trimmed mean.

The rest of the article is organized as follows. In Section 2, we review some related work in robust federated learning. In Section 3, we propose our main method for robust FL aggregation and provide some theoretical viewpoints through the influence function. In Section 4, we conduct an extensive simulation study. In Section 5, we provide some further numerical examples using MNIST, fashion MNIST and chest X-ray images (pneumonia). In Section 6, we make some concluding remarks.

2. Related Work

McMahan et al. introduced the FedAvg algorithm [9], which uses the mean as its aggregator. Sample mean aggregation is a very popular FL framework [9,10]. However, the sample mean is known to be vulnerable to outliers and heavy-tailed distributions. For instance, Byzantine clients [7,8,11] may send extreme values that strongly influence the sample mean. There are some robust alternatives to the mean aggregator, such as the marginal median, geometric median and trimmed mean. In contrast to the mean, the marginal median is a relatively robust aggregator that computes coordinate-wise medians. Another Byzantine-resilient aggregator is the geometric median, which computes the geometric median of the client messages $x_1, \dots, x_m$, given by $\arg\min_{y} \sum_{i=1}^{m} \|x_i - y\|_2$ (see Weiszfeld's algorithm [12] for computing the solution). The trimmed mean aggregator computes coordinate-wise trimmed means by removing a β-fraction of the data points from each of the two tails [11]. Although both the marginal median and the geometric median are robust to outliers, they are not efficient, as they use only the most centrally lying data for inference. As for the trimmed mean, it first requires a proper selection of β. Second, it often does not work well when outliers lie on one side of a coordinate rather than on both sides. To strike a better balance between robustness and efficiency, we propose another robust aggregator based on a robust density power divergence, namely the γ-divergence.
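To make the baseline aggregators in this section concrete, the following is a minimal NumPy sketch, not the implementation used in the cited works. The function names, the stopping rule and the `eps` guard in the Weiszfeld iteration are assumptions made for illustration; each row of `X` stands for the message sent by one client.

```python
import numpy as np

def mean_agg(X):
    """Sample mean over clients (rows of X)."""
    return X.mean(axis=0)

def marginal_median(X):
    """Coordinate-wise median over clients."""
    return np.median(X, axis=0)

def trimmed_mean(X, beta):
    """Coordinate-wise trimmed mean: drop floor(beta * m) points from each tail."""
    m = X.shape[0]
    k = int(np.floor(beta * m))
    X_sorted = np.sort(X, axis=0)          # sort every coordinate separately
    return X_sorted[k:m - k].mean(axis=0)

def geometric_median(X, n_iter=100, tol=1e-8, eps=1e-12):
    """Weiszfeld iteration for argmin_y sum_i ||x_i - y||_2.

    eps (an assumption) guards against division by zero when the
    iterate coincides with a data point.
    """
    y = X.mean(axis=0)
    for _ in range(n_iter):
        dist = np.linalg.norm(X - y, axis=1)
        w = 1.0 / np.maximum(dist, eps)    # inverse-distance weights
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

# Nine honest clients send zeros, one Byzantine client sends a large value.
X = np.vstack([np.zeros((9, 2)), np.full((1, 2), 100.0)])
print(mean_agg(X), marginal_median(X), trimmed_mean(X, 0.1), geometric_median(X))
```

In this toy example the single extreme client drags the sample mean away from zero, while the three robust aggregators stay at or very near zero.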

3. Proposed Aggregator and Its Robustness

In this section, we introduce an aggregator, called “γ-mean”, which is based on the minimum γ-divergence estimation. Two versions of the implementation algorithm are given, and some robustness viewpoints based on the influence function are provided.

3.1. Minimum γ-Divergence Estimation

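The γ-divergence [13,14], which is also known as the density power divergence of type zero [15], is a divergence that is robust against outliers, with a tuning parameter $\gamma > 0$. The γ-divergence between the data distribution with probability density function g and the model distribution $f_\theta$ (indexed by $\theta$) is defined by

$$D_\gamma(g, f_\theta) = \frac{1}{\gamma(\gamma+1)} \left\{ \|g\|_{\gamma+1} - \int \left( \frac{f_\theta(x)}{\|f_\theta\|_{\gamma+1}} \right)^{\gamma} g(x)\,dx \right\},$$

where $\|f\|_{\gamma+1} = \left( \int f^{\gamma+1}(x)\,dx \right)^{1/(\gamma+1)}$. In the limiting case, it reduces to the Kullback-Leibler divergence, i.e., $\lim_{\gamma \to 0} D_\gamma(g, f_\theta) = \int g(x) \ln\{ g(x)/f_\theta(x) \}\,dx$. At the population level, the minimum γ-divergence estimation of $\theta$ is given by

$$\theta_\gamma = \arg\min_{\theta} D_\gamma(g, f_\theta) = \arg\max_{\theta} \int \left( \frac{f_\theta(x)}{\|f_\theta\|_{\gamma+1}} \right)^{\gamma} g(x)\,dx. \qquad (2)$$

At the sample level with empirical data $\{x_i\}_{i=1}^{m}$, the data distribution is replaced by the empirical distribution to get the estimate $\hat\theta_\gamma$:

$$\hat\theta_\gamma = \arg\max_{\theta} \frac{1}{m} \sum_{i=1}^{m} \left( \frac{f_\theta(x_i)}{\|f_\theta\|_{\gamma+1}} \right)^{\gamma}. \qquad (3)$$

In this article, we use a Gaussian working model, i.e., $f_\theta = N(\mu, \Sigma)$ with $\theta = (\mu, \Sigma)$. Suppose g takes a contaminated form: $g = (1-\alpha) f_\theta + \alpha h$, where h is the probability density function of the contamination distribution. It is assumed that $\int f_\theta^{\gamma}(x)\, h(x)\,dx$ is small for a certain $\gamma > 0$, so that the contamination has little effect on the learning process. By taking derivatives with respect to $(\mu, \Sigma)$ and setting them to zero, the estimates have to satisfy the following estimating equations:

$$\sum_{i=1}^{m} w_i (x_i - \mu) = 0, \qquad \sum_{i=1}^{m} w_i \left\{ (x_i - \mu)(x_i - \mu)^\top - \frac{\Sigma}{1+\gamma} \right\} = 0,$$

where $w_i = \exp\left\{ -\frac{\gamma}{2} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) \right\}$. The solution pair, $(\hat\mu_\gamma, \hat\Sigma_\gamma)$, has to satisfy the following stationary equations:

$$\mu = \frac{\sum_{i=1}^{m} w_i x_i}{\sum_{i=1}^{m} w_i}, \qquad (4)$$

$$\Sigma = (1+\gamma)\, \frac{\sum_{i=1}^{m} w_i (x_i - \mu)(x_i - \mu)^\top}{\sum_{i=1}^{m} w_i}. \qquad (5)$$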

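3.2. Robust Aggregation by γ-Mean

In view of the stationary Equations (4) and (5), we use a fixed-point iteration. Two algorithms are provided below. Algorithm 1 is the usual γ-mean, which uses $N(\mu, \Sigma)$ as the working model, and Algorithm 2 is the simple γ-mean, which adopts $N(\mu, \sigma^2 I)$ as the working model. Note that, in the former case, when the sample size is not large enough to give a stable estimate of the covariance inverse, we will use $\{\mathrm{diag}(\hat\Sigma_\gamma)\}^{-1}$ instead, where $\hat\Sigma_\gamma$ is the minimum γ-divergence estimator for the covariance and $\mathrm{diag}(\hat\Sigma_\gamma)$ denotes the diagonal matrix consisting of the diagonal elements of $\hat\Sigma_\gamma$. Also note that, in the latter case of the simple γ-mean, $\sigma^2$ can be merged into the hyperparameter γ. Thus, we will simply use the standard Gaussian $N(\mu, I)$ as the working model.

Algorithm 1: γ-mean
Input: Gradient information $x_1, \dots, x_m$ and the maximum number of iterations S
Output: $\hat\mu_\gamma = \mu^{(s)}$ at the final iteration s
Start with initials $\mu^{(0)}$ and $\Sigma^{(0)}$.
for s = 0, 1, 2, … (while s < S and iterations not yet converged) do
    for i = 1, …, m do
        Calculate $w_i^{(s)} = \exp\{ -\frac{\gamma}{2} (x_i - \mu^{(s)})^\top (\Sigma^{(s)})^{-1} (x_i - \mu^{(s)}) \}$ for the ith local client.
    end for
    Denote $\tilde w_i^{(s)} = w_i^{(s)} / \sum_{j=1}^{m} w_j^{(s)}$.
    $\mu^{(s+1)} = \sum_{i=1}^{m} \tilde w_i^{(s)} x_i$, $\Sigma^{(s+1)} = (1+\gamma) \sum_{i=1}^{m} \tilde w_i^{(s)} (x_i - \mu^{(s+1)})(x_i - \mu^{(s+1)})^\top$.
end for

Algorithm 2: simple γ-mean
Input: Gradient information $x_1, \dots, x_m$ and the maximum number of iterations S
Output: $\hat\mu_\gamma = \mu^{(s)}$ at the final iteration s
Start with initial $\mu^{(0)}$.
for s = 0, 1, 2, … (while s < S and iterations not yet converged) do
    for i = 1, …, m do
        Calculate $w_i^{(s)} = \exp\{ -\frac{\gamma}{2} \|x_i - \mu^{(s)}\|_2^2 \}$ for the ith local client.
    end for
    Denote $\tilde w_i^{(s)} = w_i^{(s)} / \sum_{j=1}^{m} w_j^{(s)}$.
    $\mu^{(s+1)} = \sum_{i=1}^{m} \tilde w_i^{(s)} x_i$.
end for

For an extremely large dimension p, such as in a deep neural network model, the simple γ-mean algorithm will be more feasible than the usual γ-mean owing to its better numerical stability.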
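The two fixed-point iterations can be sketched in NumPy as follows. This is an illustrative reading of the algorithms rather than the authors' code: the starting values (marginal median, and a diagonal covariance built from the median absolute deviation), the convergence tolerance, and the helper name `_normalised_weights` are all assumptions. The weights are computed on the log scale and shifted by their maximum, which leaves the normalised weights unchanged but avoids every weight underflowing to zero in high dimensions.

```python
import numpy as np

def _normalised_weights(d2, gamma):
    """Weights proportional to exp(-gamma/2 * d2), normalised to sum to one."""
    logw = -0.5 * gamma * d2
    w = np.exp(logw - logw.max())      # the shift cancels after normalisation
    return w / w.sum()

def simple_gamma_mean(X, gamma, max_iter=100, tol=1e-8):
    """Simple gamma-mean with N(mu, I) as working model."""
    mu = np.median(X, axis=0)          # assumed robust starting value
    for _ in range(max_iter):
        w = _normalised_weights(((X - mu) ** 2).sum(axis=1), gamma)
        mu_new = w @ X                 # weighted average of client messages
        converged = np.linalg.norm(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu, w

def gamma_mean(X, gamma, max_iter=100, tol=1e-8):
    """Gamma-mean with N(mu, Sigma) as working model."""
    mu = np.median(X, axis=0)
    mad = np.median(np.abs(X - mu), axis=0)
    Sigma = np.diag((1.4826 * mad) ** 2 + 1e-12)   # assumed robust start
    for _ in range(max_iter):
        R = X - mu
        # squared Mahalanobis distance of every client message
        d2 = np.einsum("ij,ij->i", R, np.linalg.solve(Sigma, R.T).T)
        w = _normalised_weights(d2, gamma)
        mu_new = w @ X
        R_new = X - mu_new
        Sigma = (1.0 + gamma) * (w[:, None] * R_new).T @ R_new
        converged = np.linalg.norm(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu, Sigma, w

rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 3))
byzantine = rng.standard_normal((20, 3)) + 10.0    # shifted, hypothetical attack
X = np.vstack([clean, byzantine])
mu_simple, w_simple = simple_gamma_mean(X, gamma=0.5)
mu_full, Sigma_full, w_full = gamma_mean(X, gamma=0.5)
print(X.mean(axis=0), mu_simple, mu_full)
print("total weight on Byzantine clients:", w_simple[-20:].sum())
```

In this toy run the shifted clients receive essentially zero weight, so both estimates stay near the origin while the plain mean is pulled toward the attack.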

3.3. Robustness

3.3.1. Influence Function

The influence function is a tool for evaluating the change in an estimator caused by a small perturbation of the data distribution. An estimator with a lower influence provides better resistance against outliers. Therefore, we analyze the robustness of different estimators by showing their influence functions. Let $x_1, \dots, x_m$ be sampled from G and T be the statistical functional for estimation. The robustness of $T(G)$ can be evaluated by the influence function of T:

$$\mathrm{IF}(x; T, G) = \lim_{\epsilon \to 0} \frac{T\big((1-\epsilon) G + \epsilon \delta_x\big) - T(G)}{\epsilon},$$

where $\delta_x$ is the point mass at $x$. In other words, the influence function of T is the Gâteaux derivative of $T$ with respect to G along the direction $\delta_x - G$. In this paper, we consider evaluating the influence function at the Gaussian working model with $G = N(\mu, \Sigma)$ and seeing the deviant effect when a point perturbation is added. The influence functions of M-estimators have been well studied in [16]; they are related to the inverse of the Hessian matrix and the first-order derivative. The estimator based on the γ-divergence is also an M-estimator. The derivation of its influence function can be found in [14], and it is given by

$$\mathrm{IF}(x; \mu_\gamma, G) = (1+\gamma)^{p/2+1}\, (x - \mu_\gamma) \exp\left\{ -\frac{\gamma}{2} (x - \mu_\gamma)^\top \Sigma_\gamma^{-1} (x - \mu_\gamma) \right\},$$

where $\mu_\gamma$ and $\Sigma_\gamma$ are the minimum γ-divergence estimators for the location and covariance matrix of the working model $N(\mu, \Sigma)$, respectively. The influence function of the γ-mean, $\mathrm{IF}(x; \mu_\gamma, G)$, is similar to the influence function of the sample mean, which is given by $x - \mu$, except for an additional multiplicative weight $\exp\{ -\frac{\gamma}{2} (x - \mu_\gamma)^\top \Sigma_\gamma^{-1} (x - \mu_\gamma) \}$. This weight depends on the Mahalanobis distance between the perturbing point $x$ and $\mu_\gamma$. As a result, when $x$ is an outlier far from the target mean of the data distribution, this weighting factor produces a down-weighting effect and prevents the outlier from biasing the estimator.
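The down-weighting effect described above can be checked numerically. The sketch below evaluates the influence function of the sample mean, $x - \mu$, next to a γ-mean influence function of the form "constant times $(x - \mu)$ times the exponential Mahalanobis weight". The constant $(1+\gamma)^{p/2+1}$ used here is an assumption: it is what the standard M-estimator formula yields at an exactly Gaussian model, and it only rescales the curve without changing its shape.

```python
import numpy as np

def if_mean(x, mu):
    """Influence function of the sample mean: unbounded in x."""
    return x - mu

def if_gamma_mean(x, mu, Sigma, gamma):
    """Influence function of the gamma-mean at the Gaussian model N(mu, Sigma).

    The factor (1 + gamma) ** (p / 2 + 1) is an assumed normalising constant.
    """
    p = mu.shape[0]
    r = x - mu
    d2 = r @ np.linalg.solve(Sigma, r)          # squared Mahalanobis distance
    return (1.0 + gamma) ** (p / 2 + 1) * np.exp(-0.5 * gamma * d2) * r

mu, Sigma = np.zeros(2), np.eye(2)
for t in (1.0, 3.0, 10.0):
    x = np.array([t, t])
    print(t,
          np.linalg.norm(if_mean(x, mu)),
          np.linalg.norm(if_gamma_mean(x, mu, Sigma, gamma=0.5)))
```

The norm of the sample-mean influence grows linearly with the distance of the perturbing point, whereas the γ-mean influence rises at first and then decays toward zero (a redescending shape). Setting `gamma=0.0` recovers the sample-mean influence function.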

3.3.2. Comparison with Other Aggregators

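In this subsection, we present influence functions for several robust aggregation methods, including the marginal median, geometric median and trimmed mean. Below, $x$ denotes the perturbing point and $x^{(i)}$ its i-th component.

Geometric median $T_{\mathrm{geo}}$: The influence function of $T_{\mathrm{geo}}$, with $\mu_{\mathrm{geo}} = T_{\mathrm{geo}}(G)$, is given by [17]

$$\mathrm{IF}(x; T_{\mathrm{geo}}, G) = \left[ \mathbb{E}\left\{ \frac{1}{\|X - \mu_{\mathrm{geo}}\|} \left( I - \frac{(X - \mu_{\mathrm{geo}})(X - \mu_{\mathrm{geo}})^\top}{\|X - \mu_{\mathrm{geo}}\|^2} \right) \right\} \right]^{-1} \frac{x - \mu_{\mathrm{geo}}}{\|x - \mu_{\mathrm{geo}}\|},$$

where the expectation is taken with respect to the data distribution G.

Trimmed mean $T_{\mathrm{trim}}$: The trimmed mean is an L-estimator, and its influence function is derived in [18]. For the marginal trimmed mean, we derive the influence function coordinate-wise, since the coordinates are treated independently of each other. Let $q_i(\cdot)$ be the quantile function of the marginal data distribution $G_i$ on the i-th component. The i-th component of the influence function of $T_{\mathrm{trim}}$ is given by

$$\mathrm{IF}_i(x; T_{\mathrm{trim}}, G) = \frac{W_i(x^{(i)}) - C_i}{1 - 2\beta},$$

where $W_i(u) = \min\{ \max\{ u, q_i(\beta) \}, q_i(1-\beta) \}$, and

$$C_i = \int_{\beta}^{1-\beta} q_i(t)\,dt + \beta\, q_i(\beta) + \beta\, q_i(1-\beta).$$

It is obvious that an influential effect, $\{q_i(\beta) - C_i\}/(1-2\beta)$ or $\{q_i(1-\beta) - C_i\}/(1-2\beta)$, still remains if $x^{(i)}$ is outside the trimmed range.

Marginal median $T_{\mathrm{med}}$: The influence function of the median is also derived in [18], and we use the same techniques as for the trimmed mean to derive the influence function of the marginal median. Its i-th component is given by

$$\mathrm{IF}_i(x; T_{\mathrm{med}}, G) = \frac{\mathrm{sign}(x^{(i)} - m_i)}{2\, g_i(m_i)},$$

where $m_i$ is the marginal median and $g_i$ is the density function of the marginal distribution $G_i$.

The influence functions given above reveal that different robust aggregators have different abilities to resist outliers. The trimmed mean can still be influenced by outliers outside the trimmed range. While the marginal median and the geometric median are fairly robust, they may lose too much efficiency because they use only the most centrally lying data points. The γ-mean provides an adjustable trade-off between robustness and efficiency through the hyper-parameter γ.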
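A one-dimensional comparison illustrates the differences. The sketch below evaluates the three influence functions at a standard normal distribution, a simplifying assumption chosen here because symmetry makes the centring constant of the trimmed mean equal to zero and the median equal to zero. The γ-mean curve uses the assumed constant $(1+\gamma)^{3/2}$, which is the p = 1 case of the normalisation obtained from the M-estimator formula at a Gaussian model.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def if_trimmed_mean(x, beta):
    """IF of the beta-trimmed mean at N(0, 1); centring constant is 0 by symmetry."""
    lo, hi = STD_NORMAL.inv_cdf(beta), STD_NORMAL.inv_cdf(1.0 - beta)
    return min(max(x, lo), hi) / (1.0 - 2.0 * beta)

def if_median(x):
    """IF of the median at N(0, 1): sign(x) / (2 * density at the median)."""
    sign = (x > 0) - (x < 0)
    return sign / (2.0 * STD_NORMAL.pdf(0.0))

def if_gamma_mean(x, gamma):
    """IF of the one-dimensional gamma-mean at N(0, 1), with assumed constant."""
    return (1.0 + gamma) ** 1.5 * x * math.exp(-0.5 * gamma * x * x)

for x in (0.5, 2.0, 5.0, 50.0):
    print(x, if_trimmed_mean(x, 0.1), if_median(x), if_gamma_mean(x, 0.5))
```

For points far in the tail, the trimmed mean and the median keep a constant, non-zero influence, while the influence of the γ-mean vanishes.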

4. Simulation Study

In this section, we evaluate our robust γ-mean aggregator and compare it, through simulation, with existing aggregators including the mean, marginal median, geometric median and trimmed mean. For computing the geometric median, Weiszfeld’s algorithm [12] is used. We investigate the behavior and robustness of the γ-mean and the other aggregators as estimators of the location parameter under the multivariate Gaussian distribution and the multivariate t-distribution, with and without Byzantine interference. We build several simulation scenarios: testing the aggregators as the dimension p grows, under different fractions of Byzantine attacks (i.e., α values), and for various settings of γ (which controls the degree of robustness).

All our simulation experiments were run on an Nvidia DGX A100 server. We used one A100 GPU card, 16 GB RAM, and 16 cores of AMD-EPYC-7742 CPUs. A personal computer with a moderate GPU card and CPU can also carry out the simulation experiments, but with a longer run time. For the real data examples in Section 5, a personal computer will not be sufficient.

Scenario 1. We focus on the behavior of the aggregators as p increases from 20 to 1000. The other settings are the number of clients m, the Byzantine fractions α = 0 and α = 0.1, and the robustness hyper-parameter γ.

Scenario 2. We focus on the behavior of the aggregators as the contamination fraction α increases from 0. The dimension p, the number of clients m and the hyper-parameter γ are held fixed.

Scenario 3. We focus on the effect of γ, whose value is set through a constant c ranging from 0.5 to 4, while p ranges from 1 to 1000.

Scenario 4. After comparing the γ-mean with the other aggregators, we compare the two versions of our proposal, the γ-mean and the simple γ-mean, with p ranging from 1 to 1000.

4.1. Simulation Settings

For regular clients, gradient vectors are generated from the standard Gaussian distribution and from the t-distribution with 5 degrees of freedom. For Byzantine clients, gradient vectors are generated from the same Gaussian and t distributions, except with a location shift along $\mathbf{1}_p$, where $\mathbf{1}_p$ is the p-vector with one in all entries. Each experimental scenario is implemented with 100 replicate runs.
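A data generator along these lines might look as follows. It is a sketch under stated assumptions: the function name `simulate_gradients`, the shift magnitude passed as `shift`, and the choice to place the Byzantine clients in the first rows are hypothetical, and the multivariate t draws use the usual Gaussian-over-chi-square construction with identity scale matrix.

```python
import numpy as np

def simulate_gradients(m, p, alpha, shift, dist="gaussian", df=5, rng=None):
    """Return an (m, p) array of client messages and a Byzantine indicator.

    Regular clients: standard Gaussian or multivariate t with `df` degrees
    of freedom. Byzantine clients: same distribution plus shift * 1_p.
    """
    rng = np.random.default_rng() if rng is None else rng
    Z = rng.standard_normal((m, p))
    if dist == "gaussian":
        X = Z
    elif dist == "t":
        # multivariate t: one chi-square mixing variable per client (row)
        scale = np.sqrt(rng.chisquare(df, size=m) / df)
        X = Z / scale[:, None]
    else:
        raise ValueError("dist must be 'gaussian' or 't'")
    n_byz = int(round(alpha * m))
    is_byz = np.zeros(m, dtype=bool)
    is_byz[:n_byz] = True                 # assumed placement of attackers
    X[is_byz] += shift * np.ones(p)       # location shift along the ones vector
    return X, is_byz

X, is_byz = simulate_gradients(m=100, p=20, alpha=0.1, shift=5.0,
                               rng=np.random.default_rng(1))
print(X.shape, is_byz.sum(), X[is_byz].mean(), X[~is_byz].mean())
```

Repeating the call over 100 seeds and applying each aggregator to `X` gives one set of replicate estimates of the true location, which is the zero vector in this setup.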

4.2. Results

To compare the performance of different aggregators, the mean squared error (MSE) to the true target value is used as a performance metric. MSE can be further decomposed into the squared bias and variance.
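The decomposition can be verified directly on replicate estimates. In the sketch below, which is illustrative and uses assumed names, each row of `estimates` is the aggregator output from one replicate run. The variance is computed with divisor R (the number of replicates), so that the identity MSE = squared bias + variance holds exactly. Whether the paper additionally normalises by the dimension p is not stated, so no such scaling is applied.

```python
import numpy as np

def mse_decomposition(estimates, target):
    """Return (mse, squared_bias, variance) over replicate estimates.

    estimates: (R, p) array, one row per replicate.
    target:    (p,) true location.
    """
    estimates = np.asarray(estimates, dtype=float)
    mean_est = estimates.mean(axis=0)
    sq_bias = np.sum((mean_est - target) ** 2)
    variance = np.mean(np.sum((estimates - mean_est) ** 2, axis=1))
    mse = np.mean(np.sum((estimates - target) ** 2, axis=1))
    return mse, sq_bias, variance

rng = np.random.default_rng(0)
est = 0.3 + 0.1 * rng.standard_normal((100, 5))   # biased, noisy toy estimates
mse, sq_bias, variance = mse_decomposition(est, target=np.zeros(5))
print(mse, sq_bias + variance)
```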

4.2.1. Scenario 1

The results are shown in Figure 1. Without contamination (Figure 1a), all aggregators have MSE close to zero. For the Gaussian case, the MSE curves of the mean, geometric median and γ-mean almost coincide. These three curves have the lowest MSE values, followed by the trimmed mean and the marginal median. For the t-distribution, the γ-mean has the lowest MSE, followed by the geometric median, trimmed mean, mean and then marginal median. With 10% contamination, the mean aggregator fails to perform well and has large MSE values, verifying that it is not a robust aggregator. Due to its large MSE values, the result of the mean aggregator is omitted from Figure 1b. From Figure 1b we can see that the γ-mean performs the best among the five aggregators in terms of MSE. In addition, the variance of the γ-mean remains low and stable under both the Gaussian and t distributions.

Figure 1

Comparison of different aggregators across different dimensions p. (a) Case α = 0 (no Byzantine client). (b) Case α = 0.1 (10% Byzantine clients).

4.2.2. Scenario 2

As Figure 2 shows, the squared bias is close to zero when α = 0 and increases as α grows for all aggregators. However, the rates of increase differ considerably. The mean aggregator rises fastest, while the γ-mean rises slowest. The trimmed mean grows slowly for small α; for larger α, however, its rate of increase is about the same as that of the mean, owing to the fixed trimming percentage from both tails. The trends in MSE for the different aggregators have similar patterns; the only difference is in scale, with the γ-mean having the lowest MSE values. Also note that the variance of the marginal median increases much faster than that of the other aggregators as α increases, followed by the geometric median, γ-mean, trimmed mean and then the mean. However, the trimmed mean and the mean have a dominant bias that leads to high MSE, even though they have small variance. As the squared bias is large for the mean and trimmed mean, the bias curves for the γ-mean, geometric median and marginal median in the 2nd row of Figure 2 appear to coincide. To give a better view of these three bias curves, zoomed-in views are provided in the 3rd row of Figure 2, where we can clearly see that the γ-mean has the lowest bias values.

Figure 2

Comparison of different aggregators across different α values with fixed p. The two subplots in the 3rd row are the zoomed-in views of subplots in the 2nd row.

4.2.3. Scenario 3

We conduct further experiments on the γ-mean to see the effect of different γ values, indexed by the constant c. Results are presented in Figure 3. The squared bias is negligible, indicating that the γ-mean is quite robust across different γ values. The main source of MSE is the variance. Larger γ values lead to larger stochastic variances, indicating that larger γ values result in lower estimation efficiency.

Figure 3

Effect of different γ values across different dimensions p.

4.2.4. Scenario 4

Under the Gaussian case, the MSE, bias and variance curves for the γ-mean and the simple γ-mean coincide over the displayed range of p (Figure 4, left panel). Under the t-distribution, these curves are nearly indistinguishable (Figure 4, right panel). When p is small, the MSE values of the simple γ-mean are significantly larger than those of the γ-mean. However, we do not display the results for small p, since their relatively large MSE values would make the MSE curves of the γ-mean and the simple γ-mean appear to lie on the x-axis for larger p.

Figure 4

Comparison of the γ-mean versus the simple γ-mean.

5. Real Data Examples

5.1. Datasets

MNIST [19]. The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in fixed-size, grayscale, 28 × 28 images.

Fashion MNIST [20]. Fashion-MNIST is a dataset of Zalando’s article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28 × 28 grayscale image, associated with a label from 10 types of clothing, such as shoes, t-shirts, dresses, sandals, sneakers and more.

Chest X-ray images (pneumonia) [21]. The dataset contains 5856 X-ray images and 2 classes (pneumonia and normal). The 5856 images consist of 5232 training images (which we further split into 90% for model training and 10% for model validation to implement early stopping) and 624 testing images. Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients aged one to five years from Guangzhou Women and Children’s Medical Center, Guangzhou.

5.2. Experimental Setting

All the data examples were run on an Nvidia DGX A100 server. We used one A100 GPU card, 64 GB RAM, and 16 cores of AMD-EPYC-7742 CPUs. The experimental setting is described below.

For MNIST and fashion MNIST, two of the m clients are Byzantine. For chest X-ray images, one of the m clients is Byzantine. Byzantine clients return random values from Gaussian (5, 1). γ is set to 0.5. This setting is different from the setting in the simulation. The main reason is that there is a complicated relationship between the γ value and the neural network model adopted, involving the dimensionality, gradient size, learning rate, etc. We have not yet fully understood this relationship, which might govern the selection of γ, and we leave it for future study. The trimmed mean uses a fixed trimming fraction β.

To allow for imbalanced client sizes, we obtain the sample size of each client through the following steps [11].
1. Sample a parameter vector for the Dirichlet distribution.
2. Sample a probability vector q from the Dirichlet distribution with that parameter vector. The entries of q sum to 1 by the property of the Dirichlet distribution.
3. Obtain the sample sizes of the clients from a multinomial distribution with probability vector q, where n is the total sample size and a minimum sample size is guaranteed for each client.

In each round of parameter update, the server broadcasts the current parameter values to the m clients. Each normal client computes the local gradient and local parameter update, and then returns the change in parameters after iterating k epochs using local data, where k is chosen separately for MNIST and fashion MNIST and for chest X-ray images. Byzantine clients instead return random values from the Gaussian distribution described above.

Some further implementation details are given below. We run 1000 rounds of FL for MNIST and fashion MNIST, and 100 rounds for chest X-ray images. We apply stochastic gradient descent (SGD) with a cosine decay learning rate (decay over rounds), where the decay step is 1000 for MNIST and fashion MNIST, and 100 for chest X-ray images.
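The sampling of imbalanced client sizes can be sketched as below. The specific hyper-parameters are not given in this section, so everything concrete here is a hypothetical choice: the log-normal draw for the Dirichlet parameter vector, the argument name `n_min` for the guaranteed minimum, and the way the minimum is enforced (each client first receives `n_min` samples and the remainder is split multinomially).

```python
import numpy as np

def sample_client_sizes(n, m, n_min, rng=None):
    """Split n samples among m clients with at least n_min samples each."""
    rng = np.random.default_rng() if rng is None else rng
    if m * n_min > n:
        raise ValueError("n is too small to guarantee n_min samples per client")
    # Step 1 (hypothetical choice): positive Dirichlet parameters
    concentration = rng.lognormal(mean=0.0, sigma=1.0, size=m)
    # Step 2: probability vector whose entries sum to one
    q = rng.dirichlet(concentration)
    # Step 3: distribute the samples that remain after the guaranteed minimum
    extra = rng.multinomial(n - m * n_min, q)
    return n_min + extra

sizes = sample_client_sizes(n=60000, m=10, n_min=100,
                            rng=np.random.default_rng(0))
print(sizes, sizes.sum())
```

A more dispersed Dirichlet parameter vector produces more unequal client sizes, which is one way to probe how an aggregator with uniform client weights behaves under imbalance.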
In each local epoch on MNIST and fashion MNIST, SGD goes through only 10% of the local data to save computing time. This leads to some fluctuations in the early stage of training, but the training process is much faster than going through all local data. The initial learning rates are 0.1 and 0.5 for MNIST and fashion MNIST, respectively. Within the local iterations, a fixed decay constant is applied in each epoch. We apply gradient clipping to avoid exploding gradients on MNIST and fashion MNIST: if the 2-norm of the aggregated gradient is larger than 1, the vector is rescaled to have norm 1. To handle the imbalanced class sizes, we use weighted cross-entropy as the loss function in the chest X-ray example (pneumonia: 0.35, normal: 1.0), where the chest X-ray training dataset contains 3883 pneumonia cases and 1349 normal cases. In addition to the classification accuracy, we also use ‘sensitivity’ (also known as ‘recall’ and ‘true positive rate’) and ‘precision’ as our evaluation metrics. In particular, correctly predicting pneumonia is more important than correctly predicting the normal case.
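Two of the details above are easy to illustrate. The first function rescales a vector whose 2-norm exceeds a threshold, as in the clipping rule for the aggregated gradient. The second computes class weights relative to the rarest class; this inverse-frequency rule is an assumption about how weights such as 0.35 and 1.0 could be obtained, since the text states the weights but not how they were chosen. Both function names are hypothetical.

```python
import numpy as np

def clip_by_norm(g, max_norm=1.0):
    """Rescale g to norm max_norm if its 2-norm exceeds max_norm."""
    norm = np.linalg.norm(g)
    if norm <= max_norm:
        return g
    return g * (max_norm / norm)

def inverse_frequency_weights(counts):
    """Class weights relative to the rarest class (which gets weight 1.0)."""
    counts = np.asarray(counts, dtype=float)
    return counts.min() / counts

g = np.array([3.0, 4.0])                                # norm 5
print(clip_by_norm(g))                                  # rescaled to norm 1
print(inverse_frequency_weights([3883, 1349]))          # pneumonia, normal
```

With the class counts of the chest X-ray training set, the ratio 1349/3883 is about 0.347, close to the stated pneumonia weight of 0.35.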

5.3. Models

We use a simple CNN model (Figure A1 in Appendix A) for MNIST and fashion MNIST. For chest X-ray images (pneumonia), we use a pretrained ResNet50 (Figure A2 in Appendix A), which we further connect to 3 dense-block layers.

Figure A1

Model used in MNIST and fashion MNIST.

Figure A2

Model used in chest X-ray images.

5.4. Results

Results for the trimmed mean are not reported here, as it does not perform well owing to its reliance on the specification of β and its non-robustness to non-symmetric outliers (note that the trimming procedure is symmetric in the two tails of each coordinate).

5.4.1. MNIST

Figure 5 shows the results of FL using different aggregators. In the case of no attack, the simple γ-mean performs as well as the mean aggregator, with 96% classification accuracy on the testing data. In the Byzantine case, the simple γ-mean performs the best, with a testing accuracy of 96%, followed by the two median-based methods (geometric median and marginal median), while the mean aggregator fails. In addition, the marginal median shows some fluctuations in the early stage of training.

Figure 5

Testing process and comparison of testing accuracy for different aggregators on MNIST.

5.4.2. Fashion MNIST

Figure 6 shows the results for fashion MNIST. The conclusion is similar to that for MNIST: the simple γ-mean performs the best. Note that the fluctuations in the training process of the marginal median look worse than those in Figure 5. Although we have applied gradient clipping, there are still some sudden surges in the loss function values, and they lead to sudden drops in classification accuracy.

Figure 6

Testing process and comparison of testing accuracy for different aggregators on fashion MNIST.

5.4.3. Chest X-ray Images (Pneumonia)

We train the model on the training dataset without any partitions (i.e., a single machine with the full training data) as a baseline. In this single-machine case, the maximum number of training epochs is 1000, and the training process stops early if the evaluation metric (loss + accuracy + precision + sensitivity on validation data) does not improve in 10 consecutive epochs. For the FL case, the number of FL rounds is 100, and the maximum number of training epochs for each client is 20 at every FL round. At each round, the clients stop early if the evaluation metric (loss + accuracy + precision + sensitivity on validation data) does not improve in 5 consecutive epochs. The simple γ-mean algorithm uses the marginal median as its initial value. All other settings are the same as those listed in Section 5.2. The results are shown in Figure 7.

Figure 7

Testing process and comparison of testing accuracy for different aggregators on chest X-ray images. The mean and geometric median aggregators cannot tolerate the Byzantine attack, and both methods broke down during model training. Thus, no results are reported for these two methods.

We further report the following testing data results in Table 1: true positive (TP, which is predicted positive and actually positive), true negative (TN, which is predicted negative and actually negative), false positive (FP, which is predicted positive but actually negative), false negative (FN, which is predicted negative but actually positive), precision (Prec, which is TP/(TP + FP)), sensitivity (Sens, which is TP/(TP + FN), i.e., one minus the type-II error) and classification accuracy (Acc). The mean and geometric median aggregators cannot tolerate the Byzantine (Byz) attack, and both methods broke down during model training. Thus, no results are reported for these two methods.

Table 1

Pneumonia prediction on test data.

Byz	Aggregator	TN	FN	FP	TP	Prec	Sens (Type II Error)	Acc
	single machine	156	23	78	367	0.8247	0.9410 (0.0590)	0.8381
No	mean	212	103	22	287	0.9288	0.7359 (0.2641)	0.7997
No	marginal median	190	63	44	327	0.8814	0.8385 (0.1615)	0.8285
No	simple γ-mean	126	8	108	382	0.7796	0.9795 (0.0205)	0.8141
No	GeoMed	177	30	57	360	0.8633	0.9231 (0.0769)	0.8606
Yes	mean †	–	–	–	–	–	–	–
Yes	marginal median	228	271	6	119	0.9520	0.3051 (0.6949)	0.5561
Yes	simple γ-mean	140	11	94	379	0.8013	0.9718 (0.0282)	0.8317
Yes	GeoMed †	–	–	–	–	–	–	–

† The symbol “–” indicates that the model broke down during training.
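The derived columns of Table 1 follow from the four confusion-matrix counts. The short sketch below, whose function name is illustrative, recomputes them for one row so that the definitions of precision, sensitivity, type-II error and accuracy can be checked against the table.

```python
def classification_metrics(tn, fn, fp, tp):
    """Precision, sensitivity, type-II error and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "type_ii_error": 1.0 - sensitivity,
        "accuracy": accuracy,
    }

# Row "simple gamma-mean" under Byzantine attack: TN=140, FN=11, FP=94, TP=379
metrics = classification_metrics(tn=140, fn=11, fp=94, tp=379)
print({name: round(value, 4) for name, value in metrics.items()})
```

The rounded values reproduce the entries 0.8013, 0.9718 (0.0282) and 0.8317 in that row.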

6. Concluding Remarks

Our major contribution in this article is to propose the γ-mean aggregation in federated learning, which is robust against outliers and Byzantine attacks. We have provided some theoretical discussion of influence functions and carried out numerical studies to justify our proposal. In both the simulation and real data experiments, γ-mean based aggregation in general outperforms existing robust aggregators such as the marginal median, geometric median and trimmed mean. In addition, the simple γ-mean does not require complex computation. It can be easily computed by fixed-point iteration and works well in extremely high-dimensional cases such as deep neural network models. The γ-mean (as well as the simple γ-mean) takes the form of a weighted average, where γ controls the trade-off between robustness and estimation efficiency. The selection of the γ value is important, and a data-driven selection procedure is needed, which will be pursued in our future study.

In our numerical examples (simulation and real data), we have tested the γ-mean’s ability to guard against Byzantine attacks. However, when it comes to federated learning in real-world applications, adversarial attacks are also an important issue for model robustness. These attacks create adversarial fake examples, which even humans might not be able to distinguish from genuine ones. Moreover, adversarial examples aim to confuse the model and cause wrong predictions. How to modify the baseline γ-mean algorithm to handle adversarial attacks is an interesting yet challenging problem, which we defer to a future study.
