Yuen Ler Chow, Shantanu Singh, Anne E Carpenter, Gregory P Way.
Abstract
A variational autoencoder (VAE) is a machine learning algorithm useful for generating a compressed and interpretable latent space. Such representations have been generated from various biomedical data types and can be used to produce realistic-looking simulated data. However, standard vanilla VAEs suffer from entangled and uninformative latent spaces, a limitation that can be mitigated by other VAE variants such as β-VAE and MMD-VAE. In this project, we evaluated the ability of VAEs to learn cell morphology characteristics derived from cell images. We trained and evaluated three VAE variants (vanilla VAE, β-VAE, and MMD-VAE) on cell morphology readouts and explored the generative capacity of each model to predict compound polypharmacology (the interactions of a drug with more than one target) using an approach called latent space arithmetic (LSA). To test the generalizability of the strategy, we also trained these VAEs using gene expression data of the same compound perturbations and found that gene expression provides complementary information. We found that the β-VAE and MMD-VAE disentangle morphology signals and reveal a more interpretable latent space. We reliably simulated morphology and gene expression readouts from certain compounds, thereby predicting cell states perturbed with compounds of known polypharmacology. Inferring cell state for specific drug mechanisms could aid researchers in developing and identifying targeted therapeutics and categorizing off-target effects in the future.
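The core idea of latent space arithmetic can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the latent vectors below are invented placeholders standing in for encoder outputs, and the exact combination used in the paper may differ. One common formulation adds the latent codes of two single-target compounds and subtracts the untreated-control code so the shared baseline is not counted twice:

```python
import numpy as np

# Hypothetical latent embeddings (a trained VAE encoder would produce these);
# the values here are illustrative only.
z_compound_a = np.array([1.0, 0.5, -0.2])  # compound perturbing target A
z_compound_b = np.array([-0.3, 0.8, 0.1])  # compound perturbing target B
z_control = np.array([0.1, 0.1, 0.0])      # untreated (e.g. DMSO) control

# Latent space arithmetic: combine both single-target signals and remove
# the duplicated baseline contribution.
z_predicted = z_compound_a + z_compound_b - z_control
# z_predicted would then be passed through the decoder to simulate the
# profile of a polypharmacology compound hitting both targets.
```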
Year: 2022 PMID: 35213530 PMCID: PMC8906577 DOI: 10.1371/journal.pcbi.1009888
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Mean squared error (MSE) and earthmoving distance measuring each VAE's ability to reconstruct Cell Painting and L1000 profiles.
We compare these values with results derived from shuffled models. The earthmoving distance is calculated by taking the mean of the earthmoving distance of each sample. We report the 5th–95th percentile range of the earthmoving distance in parentheses (0.05 lowest, 0.95 highest). Note that because our models required normalizing Cell Painting and L1000 input data differently (see Methods), the metrics cannot be compared across data modalities.
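The summary statistics in the table (per-sample mean plus the 5th–95th percentile range) can be sketched as follows. This is an assumption-laden illustration, not the paper's code: it uses the fact that for two 1-D samples of equal size with uniform weights, the earth mover's (Wasserstein-1) distance reduces to the mean absolute difference of sorted values, and it substitutes random arrays for real profiles and VAE reconstructions.

```python
import numpy as np

def emd_1d(a, b):
    # For equal-size 1-D samples with uniform weights, the earth mover's
    # distance is the mean absolute difference of the sorted values.
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

rng = np.random.default_rng(0)
profiles = rng.normal(size=(100, 50))  # stand-in for input profiles
reconstructions = profiles + rng.normal(scale=0.1, size=(100, 50))  # stand-in VAE output

# Distance between each profile and its reconstruction, then the
# summary statistics reported in the table.
per_sample = np.array([emd_1d(x, r) for x, r in zip(profiles, reconstructions)])
mean_emd = per_sample.mean()
lo, hi = np.percentile(per_sample, [5, 95])  # the (0.05, 0.95) range in parentheses
```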
| Dataset | VAE | MSE | MSE (Shuffled) | Earthmoving | Earthmoving (Shuffled) |
|---|---|---|---|---|---|
| Cell Painting level 5 | Vanilla | 0.00387 | 0.0088 | 0.016 (0.006, 0.040) | 0.024 (0.0096, 0.060) |
| Cell Painting level 5 | Beta | 0.00272 | 0.0088 | 0.012 (0.005, 0.028) | 0.024 (0.01, 0.06) |
| Cell Painting level 5 | MMD | 0.00435 | 0.0088 | 0.016 (0.005, 0.04) | 0.024 (0.01, 0.06) |
| Cell Painting level 4 | Vanilla | 0.00145 | 0.0014 | 0.006 (0.004, 0.009) | 0.0053 (0.004, 0.008) |
| Cell Painting level 4 | Beta | 0.00091 | 0.0014 | 0.0045 (0.002, 0.013) | 0.0051 (0.004, 0.008) |
| Cell Painting level 4 | MMD | 0.00075 | 0.0014 | 0.0039 (0.0022, 0.01) | 0.0051 (0.004, 0.008) |
| L1000 | Vanilla | 0.85 | 1.85 | 0.249 (0.14, 0.36) | 0.61 (0.28, 1.94) |
| L1000 | Beta | 1.23 | 2.10 | 0.445 (0.23, 0.73) | 0.67 (0.29, 2.25) |
| L1000 | MMD | 1.27 | 2.05 | 0.475 (0.25, 0.79) | 0.64 (0.29, 2.027) |
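The "(Shuffled)" columns above come from models evaluated on shuffled data. The exact shuffling scheme is described in the paper's Methods; as a hedged sketch, one common choice is to permute each feature column independently, which preserves per-feature marginal distributions while destroying sample-feature associations, so a well-trained model should reconstruct real data with a clearly lower MSE than this baseline:

```python
import numpy as np

def mse(a, b):
    return np.mean((a - b) ** 2)

rng = np.random.default_rng(1)
profiles = rng.normal(size=(200, 30))  # stand-in morphology profiles
reconstructions = profiles + rng.normal(scale=0.05, size=(200, 30))  # stand-in VAE output

# Hypothetical shuffled baseline: permute each feature column independently.
shuffled = np.column_stack([rng.permutation(col) for col in profiles.T])

real_mse = mse(profiles, reconstructions)
baseline_mse = mse(profiles, shuffled)  # expected to be much larger than real_mse
```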
Hyperparameter combinations of the top-performing models for each dataset.
| Dataset | latent_dim | learning_rate | encoder_batch_norm | batch_size | val_loss |
|---|---|---|---|---|---|
| Cell Painting level 5 | 10 | 0.001 | True | 96 | 2.24 |
| Cell Painting level 4 | 90 | 0.0001 | True | 32 | 0.74 |
| L1000 | 65 | 0.001 | True | 512 | 1363.45 |