Zhaoping Xiong, Ziqiang Cheng, Xinyuan Lin, Chi Xu, Xiaohong Liu, Dingyan Wang, Xiaomin Luo, Yong Zhang, Hualiang Jiang, Nan Qiao, Mingyue Zheng.
Abstract
Artificial intelligence (AI) models usually require large amounts of high-quality training data, in striking contrast to the small and biased datasets available to current drug discovery pipelines. Federated learning has been proposed to use distributed data from different sources without leaking sensitive information. This emerging decentralized machine learning paradigm is expected to dramatically improve the success rate of AI-powered drug discovery. Here, we simulated the federated learning process with property and activity datasets from different sources, in which overlapping molecules show high or low biases in their recorded values. Beyond the benefit of gaining more data, we also demonstrated that federated training has a regularization effect that makes it superior to centralized training on pooled datasets with high biases. Moreover, we compared how different client network architectures and coordinator aggregation algorithms affect the performance of federated learning; personalized federated learning showed promising results. Our work demonstrates the applicability of federated learning in predicting drug-related properties and highlights its promising role in addressing the small and biased data dilemma in drug discovery.
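The abstract describes clients that train locally and a coordinator that aggregates their models. As a rough illustration only, the sketch below shows one common aggregation scheme, federated averaging (FedAvg), on a toy linear model. It is not the authors' implementation and does not reproduce FedAMP or any personalized method; the function names (`local_update`, `fedavg`, `federated_round`), the linear model, and the toy data are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one client


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.1) -> List[float]:
    """One full-batch gradient step on mean squared error for y ~ w*x + b.

    Hypothetical stand-in for a client's local training; raw data never
    leaves this function, only the updated parameters are returned.
    """
    w, b = weights
    grad_w = grad_b = 0.0
    for x, y in data:
        err = (w * x + b) - y
        grad_w += 2.0 * err * x
        grad_b += 2.0 * err
    n = len(data)
    return [w - lr * grad_w / n, b - lr * grad_b / n]


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Coordinator step: average client parameters, weighted by dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for params, size in zip(client_weights, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share one parameter shape")
        for i, value in enumerate(params):
            aggregated[i] += value * size / total
    return aggregated


def federated_round(global_weights: Sequence[float],
                    client_datasets: Sequence[Sequence[Sample]],
                    lr: float = 0.1) -> List[float]:
    """One communication round: broadcast, local training, aggregation."""
    updates = [local_update(global_weights, d, lr) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    return fedavg(updates, sizes)


# Toy usage: two clients whose private samples all follow y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (1.0, 3.0)],
]
model = [0.0, 0.0]
for _ in range(500):
    model = federated_round(model, clients)
```

In this sketch each client takes a single full-batch step per round, so the size-weighted average coincides with a gradient step on the pooled data. With several local steps or with biased, non-IID client data, the two can diverge, which is the setting the abstract is concerned with.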
Keywords: FedAMP; Non-IID data; drug discovery; federated learning
Year: 2021 PMID: 34319533 DOI: 10.1007/s11427-021-1946-0
Source DB: PubMed Journal: Sci China Life Sci ISSN: 1674-7305 Impact factor: 6.038