Peng Xiao, Samuel Cheng, Vladimir Stankovic, Dejan Vukobratovic.
Abstract
Federated learning is a decentralized form of deep learning that trains a shared model on data distributed across clients (such as mobile phones and wearable devices), preserving data privacy by never exposing raw data to the data center (server). After each client computes new model parameters via stochastic gradient descent (SGD) on its own local data, these locally computed parameters are aggregated to produce an updated global model. Many current state-of-the-art studies aggregate the client-computed parameters by averaging them, but none theoretically explains why averaging parameters is a good approach. In this paper, we treat each client-computed parameter vector as a random vector, owing to the stochastic nature of SGD, and estimate the mutual information between two clients' computed parameters at different training phases, using two methods in two learning tasks. The results confirm the correlation between different clients and show an increasing trend of mutual information over training iterations. However, when we further compute the distance between the client-computed parameters, we find that the parameters become more correlated without getting closer. This phenomenon suggests that averaging parameters may not be the optimal way to aggregate trained parameters.
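As a concrete illustration of the aggregation step the abstract critiques, below is a minimal sketch of parameter averaging (FedAvg-style) over flattened client parameter vectors. The function name, the equal-weight averaging, and the toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def average_parameters(client_params):
    """Aggregate client-computed parameters by element-wise averaging.

    client_params: list of 1-D numpy arrays, one flattened parameter
    vector per client (all clients assumed to share the same model size).
    """
    # Stack into a (num_clients, num_params) matrix, then average over clients.
    return np.mean(np.stack(client_params), axis=0)

# Toy example: three clients, each holding a 4-parameter model.
clients = [np.array([1.0, 2.0, 3.0, 4.0]),
           np.array([2.0, 2.0, 2.0, 2.0]),
           np.array([3.0, 2.0, 1.0, 0.0])]
global_params = average_parameters(clients)
# → array([2., 2., 2., 2.])
```

Real federated averaging schemes often weight each client by its local dataset size; the unweighted mean above is the simplest case.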
Keywords: averaging; correlation; decentralized learning; federated learning; mutual information
Year: 2020 PMID: 33286088 PMCID: PMC7516771 DOI: 10.3390/e22030314
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. (a) The MI in the signal variation prediction task calculated by the first method (Section 3.1). (b) The MI in the signal variation prediction task calculated by the second method (Section 3.2). (c) The MSE variation with the training iterations.
Figure 2. (a) The MI in the digit recognition task calculated by the first method (Section 3.1). (b) The MI in the digit recognition task calculated by the second method (Section 3.2). (c) The accuracy variation with the training iterations.
Figure 3. The distance metrics in the signal variation prediction task: (a) Euclidean distance. (b) Manhattan distance. (c) Chebyshev distance.
Figure 4. The distance metrics in the digit recognition task: (a) Euclidean distance. (b) Manhattan distance. (c) Chebyshev distance.
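The three distance metrics reported in Figures 3 and 4 can be sketched on flattened parameter vectors as follows; the function name and example vectors are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pairwise_distances(w_a, w_b):
    """Distances between two clients' flattened parameter vectors."""
    diff = w_a - w_b
    return {
        "euclidean": float(np.linalg.norm(diff, ord=2)),   # L2 norm
        "manhattan": float(np.linalg.norm(diff, ord=1)),   # L1 norm
        "chebyshev": float(np.max(np.abs(diff))),          # L-infinity norm
    }

# Toy example: compare one client's parameters against the origin.
w1 = np.array([0.0, 3.0, -4.0])
w2 = np.array([0.0, 0.0, 0.0])
d = pairwise_distances(w1, w2)
# → {'euclidean': 5.0, 'manhattan': 7.0, 'chebyshev': 4.0}
```

In the paper's setting these distances would be tracked over training iterations for pairs of client-computed parameter vectors, which is what makes the observed "more correlated but not closer" trend measurable.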