Wei Zhu, Jiebo Luo, Andrew D White.
Abstract
Chemistry research incurs high material and computational costs to conduct experiments. Institutions are interested in differing classes of molecules, creating heterogeneous data that cannot be easily joined by conventional methods. This work introduces federated heterogeneous molecular learning. Federated learning allows end users to build a global model collaboratively while keeping their training data isolated. We first simulate a heterogeneous federated-learning benchmark (FedChem) by jointly performing scaffold splitting and latent Dirichlet allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules across clients. We then propose a method to alleviate the problem: Federated Learning by Instance reweighTing (FLIT(+)). FLIT(+) can align local training across clients. Experiments conducted on FedChem validate the advantages of this method. This work should enable a new type of collaboration for improving artificial intelligence (AI) in chemistry that mitigates concerns about sharing valuable chemical data.
Keywords: federated learning; graph neural network; molecular property prediction
Year: 2022 PMID: 35755872 PMCID: PMC9214329 DOI: 10.1016/j.patter.2022.100521
Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
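The abstract describes simulating heterogeneous clients by combining scaffold splitting with latent Dirichlet allocation. The sketch below illustrates one way such a partition could be simulated and is not the authors' implementation. It assumes each molecule already carries a group label (for example a scaffold cluster, computed elsewhere); the names `dirichlet_partition` and `group_ids` are hypothetical. For every group, client proportions are drawn from a symmetric Dirichlet distribution with concentration `alpha`, so a small `alpha` tends to concentrate each group on a few clients.

```python
import random
from collections import defaultdict


def dirichlet_partition(group_ids, num_clients, alpha, seed=0):
    """Split sample indices across clients, one group at a time.

    group_ids[i] is an assumed, precomputed group label for sample i
    (e.g. a scaffold cluster). This is an illustrative sketch only.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, group in enumerate(group_ids):
        by_group[group].append(idx)

    clients = [[] for _ in range(num_clients)]
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)
        # A Dirichlet sample is a set of independent Gamma(alpha, 1)
        # draws divided by their sum.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        if total == 0.0:
            # Guard against numerical underflow for very small alpha:
            # give the whole group to one randomly chosen client.
            draws = [0.0] * num_clients
            draws[rng.randrange(num_clients)] = 1.0
            total = 1.0
        start, cumulative = 0, 0.0
        for client in range(num_clients):
            cumulative += draws[client] / total
            if client == num_clients - 1:
                end = len(members)  # last client takes the remainder
            else:
                end = round(cumulative * len(members))
            clients[client].extend(members[start:end])
            start = end
    return clients


# Usage: 100 samples in 5 hypothetical groups, 4 clients, strong skew.
parts = dirichlet_partition([i % 5 for i in range(100)], 4, alpha=0.1, seed=1)
```

Every sample index ends up on exactly one client; only the share of each group per client changes with `alpha`.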
Figure 1. Heterogeneous federated molecular learning where three institutions focus on different types of molecules
The server has no access to training data.
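FedAvg, one of the baselines compared in this work, shows how a server can build a global model without seeing any training data: clients send only model parameters, and the server averages them weighted by local dataset size. The following is a minimal sketch of that aggregation step, assuming parameters are flattened into plain lists of floats; the function name `fedavg` and its arguments are illustrative.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   number of local training samples on each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Usage: two clients holding 1 and 3 samples respectively.
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
# global_weights == [2.5, 3.5]
```

In a full training loop this step would alternate with local training on each client, one exchange per communication round.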
Figure 2. Illustration of the motivation for FLIT
We consider two clients, A and B, whose local data do not share the same distribution as the global data. Local models trained on biased local data will overfit the majority groups of data and underfit the others. FLIT measures each sample's prediction confidence and puts more weight on uncertain samples. As a result, the local data distribution is better aligned with the global one, and the trained local models are more consistent with each other.
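The idea of putting more weight on uncertain samples can be illustrated with a focal-loss-style weighting. The exact FLIT weighting is not given in this section, so the sketch below is an assumption made for illustration: the weight form `(1 - p) ** gamma`, the parameter `gamma`, and the function names are all hypothetical choices rather than the authors' method. Here `p_true` holds the model's predicted probability for the correct class of each sample.

```python
import math


def confidence_weights(p_true, gamma=2.0):
    """Return per-sample weights that grow as confidence drops.

    Uses the focal-loss form (1 - p) ** gamma, normalized to mean 1.
    This form is an illustrative assumption, not the FLIT formula.
    """
    raw = [(1.0 - p) ** gamma for p in p_true]
    mean = sum(raw) / len(raw)
    if mean == 0.0:
        # All samples predicted with full confidence: fall back to uniform.
        return [1.0] * len(raw)
    return [r / mean for r in raw]


def reweighted_loss(p_true, gamma=2.0):
    """Mean negative log-likelihood with confidence-based sample weights."""
    weights = confidence_weights(p_true, gamma)
    losses = [-math.log(max(p, 1e-12)) for p in p_true]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Usage: the second sample is less confident, so it gets the larger weight.
weights = confidence_weights([0.9, 0.5])
```

With `gamma=0` every weight equals 1 and the loss reduces to the ordinary unweighted mean, which makes the effect of the reweighting easy to compare.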
Statistics of datasets
| Dataset | #Compounds | #tasks | Task type | Metric |
|---|---|---|---|---|
| FreeSolv | 642 | 1 | Reg. | RMSE |
| Lipophilicity | 4,200 | 1 | Reg. | RMSE |
| ESOL | 1,128 | 1 | Reg. | RMSE |
| QM9 | 133,885 | 12 | Reg. | MAE |
| Tox21 | 7,831 | 12 | Cls. | ROC-AUC |
| SIDER | 1,427 | 27 | Cls. | ROC-AUC |
| ClinTox | 1,478 | 2 | Cls. | ROC-AUC |
| BBBP | 2,039 | 1 | Cls. | ROC-AUC |
| BACE | 1,213 | 1 | Cls. | ROC-AUC |
Reg., regression; Cls., classification; RMSE, root-mean-square error; MAE, mean absolute error; ROC-AUC, receiver operating characteristic-area under the curve.
Performance for federated molecular regression
| Dataset | α | MolNet (centralized) | FedChem (centralized) | FedAvg | FedProx | MOON | FedFocal (ours) | FedVAT (ours) | FLIT (ours) | FLIT+ (ours) |
|---|---|---|---|---|---|---|---|---|---|---|
| FreeSolv | 0.1 | 1.40 | 1.430 | 1.771 | 1.693 | 1.376 | 1.686 | 1.371 | 1.634 | 1.228 |
| | 0.5 | | | 1.445 | 1.376 | 1.423 | 1.322 | 1.299 | 1.366 | 1.127 |
| | 1 | | | 1.223 | 1.216 | 1.469 | 1.294 | 1.150 | 1.277 | 1.061 |
| Lipophilicity | 0.1 | 0.655 | 0.6290 | 0.6361 | 0.6403 | 0.6426 | 0.6403 | 0.6556 | 0.6563 | 0.6392 |
| | 0.5 | | | 0.6306 | 0.6365 | 0.6339 | 0.6351 | 0.6333 | 0.6368 | 0.6270 |
| | 1 | | | 0.6505 | 0.6474 | 0.6442 | 0.6461 | 0.6488 | 0.6443 | 0.6403 |
| ESOL | 0.1 | 0.97 | 0.6570 | 0.8016 | 0.7702 | 0.7537 | 0.8022 | 0.7776 | 0.7788 | 0.7642 |
| | 0.5 | | | 0.7524 | 0.7382 | 0.7258 | 0.7708 | 0.7243 | 0.7426 | 0.7119 |
| | 1 | | | 0.7056 | 0.6828 | 0.6751 | 0.6822 | 0.7253 | 0.6705 | 0.6998 |
| QM9 | 0.1 | 0.0479 | 0.0890 | 0.5889 | 0.6036 | 0.5817 | 0.6164 | 0.5606 | 0.5713 | 0.5356 |
| | 0.5 | | | 0.5906 | 0.5751 | 0.5707 | 0.6059 | 0.5656 | 0.5658 | 0.5222 |
| | 1 | | | 0.5786 | 0.5691 | 0.5808 | 0.5822 | 0.5602 | 0.5621 | 0.5282 |
Lower numbers are better (RMSE for FreeSolv, Lipophilicity, and ESOL; MAE for QM9).
Results were obtained with centralized training.
Results were retrieved from Klicpera et al. with a separate SchNet for each task.
Results were obtained by a single multitask network. A smaller α in LDA generates a more extreme heterogeneous scenario. FedFocal and FedVAT are proposed in this paper as variants of FLIT(+).
Best federated-learning results.
Performance for federated molecular classification
| Dataset | α | MolNet (centralized) | FedChem (centralized) | FedAvg | FedProx | MOON | FedFocal (ours) | FedVAT (ours) | FLIT (ours) | FLIT+ (ours) |
|---|---|---|---|---|---|---|---|---|---|---|
| Tox21 | 0.1 | 0.829 | 0.8182 | 0.7705 | 0.7732 | 0.7331 | 0.7696 | 0.7733 | 0.7711 | 0.7802 |
| | 0.5 | | | 0.7811 | 0.7774 | 0.7461 | 0.7812 | 0.7787 | 0.7825 | 0.7870 |
| | 1 | | | 0.7770 | 0.7775 | 0.7457 | 0.7881 | 0.7706 | 0.7748 | 0.7806 |
| SIDER | 0.1 | 0.638 | 0.6260 | 0.6029 | 0.6056 | 0.5885 | 0.6016 | 0.6027 | 0.6035 | 0.6038 |
| | 0.5 | | | 0.6011 | 0.5931 | 0.5966 | 0.6086 | 0.5981 | 0.6096 | 0.6146 |
| | 1 | | | 0.6011 | 0.6023 | 0.5901 | 0.6003 | 0.6053 | 0.6072 | 0.6174 |
| ClinTox | 0.1 | 0.832 | 0.8903 | 0.7491 | 0.7540 | 0.7892 | 0.7789 | 0.7581 | 0.7761 | 0.7775 |
| | 0.5 | | | 0.7521 | 0.7423 | 0.7917 | 0.7770 | 0.7614 | 0.7888 | 0.7852 |
| | 1 | | | 0.7784 | 0.7791 | 0.8001 | 0.8036 | 0.7743 | 0.7849 | 0.7993 |
| BBBP | 0.1 | 0.690 | 0.8674 | 0.8361 | 0.8610 | 0.8737 | 0.8550 | 0.8673 | 0.8666 | 0.8663 |
| | 0.5 | | | 0.8594 | 0.8879 | 0.8865 | 0.8726 | 0.8641 | 0.8671 | 0.8774 |
| | 1 | | | 0.8453 | 0.8557 | 0.8487 | 0.8378 | 0.8386 | 0.8515 | 0.8515 |
| BACE | 0.1 | 0.806 | 0.8834 | 0.8203 | 0.8328 | 0.8373 | 0.8253 | 0.8166 | 0.8242 | 0.8467 |
| | 0.5 | | | 0.8212 | 0.8398 | 0.8285 | 0.8332 | 0.8417 | 0.8516 | 0.8667 |
| | 1 | | | 0.8486 | 0.8408 | 0.8561 | 0.8497 | 0.8578 | 0.8497 | 0.8561 |
Higher numbers are better (ROC-AUC for all classification datasets).
Results were obtained with centralized training.
Best federated-learning results.
Figure 3. Performance of baseline and our methods with varying communication rounds
Asterisk (∗) denotes results obtained with centralized training. Our method has a strong advantage when only a few communication rounds are used.
Figure 4. Performance of baseline and our methods with different numbers of clients
See Figure 3 for the color legend. Small-scale local training data reduce federated-learning performance for all methods.