Synthetic observations from deep generative models and binary omics data with limited sample size.

Jens Nußberger¹, Frederic Boesel¹, Stefan Lenz¹, Harald Binder¹, Moritz Hess¹.

Abstract

Deep generative models can be trained to represent the joint distribution of data, such as measurements of single nucleotide polymorphisms (SNPs) from several individuals. Synthetic observations are then obtained by drawing from this distribution. This has been shown to be useful for several tasks, such as noise removal, imputation, better understanding of underlying patterns, or even exchange of data under privacy constraints. Yet it is still unclear how well these approaches work with limited sample size. We investigate such settings specifically for binary data, as is relevant for SNP measurements, and evaluate three frequently employed generative modeling approaches: variational autoencoders (VAEs), deep Boltzmann machines (DBMs) and generative adversarial networks (GANs). This includes conditional approaches, for example modeling gene expression conditional on SNPs. Recovery of pairwise odds ratios (ORs) is the primary performance criterion. For simulated as well as real SNP data, we observe that DBMs can generally recover structure for up to 300 variables, with a tendency to overestimate ORs when not carefully tuned. VAEs generally get the direction and relative strength of pairwise relations right, but considerably underestimate the ORs. GANs provide stable results only with larger sample sizes and strong pairwise relations in the data. Taken together, DBMs and VAEs (in contrast to GANs) appear to be well suited for binary omics data, even at rather small sample sizes. This opens the way for many potential applications where synthetic observations from omics data might be useful.
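To make the primary performance criterion concrete, the following is a minimal sketch of how pairwise odds ratios could be estimated from a binary data matrix and compared between real and synthetic observations. It is an illustration only, not the authors' evaluation code: the function names `pairwise_log_odds_ratios` and `mean_abs_log_or_gap`, the 0.5 pseudocount (a Haldane-Anscombe style correction for empty cells), and the mean absolute difference as a summary are all assumptions that do not appear in the abstract.

```python
import numpy as np


def pairwise_log_odds_ratios(x, pseudocount=0.5):
    """Log odds ratio for every pair of columns of a binary matrix.

    x: array of shape (n_samples, n_variables) with entries 0 or 1.
    pseudocount: added to each 2x2 cell count to avoid division by zero
                 (assumed correction, not taken from the paper).
    Only off-diagonal entries are meaningful.
    """
    x = np.asarray(x, dtype=float)
    not_x = 1.0 - x
    n11 = x.T @ x          # both variables equal 1
    n00 = not_x.T @ not_x  # both variables equal 0
    n10 = x.T @ not_x      # first is 1, second is 0
    n01 = not_x.T @ x      # first is 0, second is 1
    c = pseudocount
    return np.log((n11 + c) * (n00 + c)) - np.log((n10 + c) * (n01 + c))


def mean_abs_log_or_gap(real, synthetic, pseudocount=0.5):
    """Mean absolute difference of pairwise log ORs (upper triangle only)."""
    real = np.asarray(real)
    synthetic = np.asarray(synthetic)
    upper = np.triu_indices(real.shape[1], k=1)
    log_or_real = pairwise_log_odds_ratios(real, pseudocount)[upper]
    log_or_synth = pairwise_log_odds_ratios(synthetic, pseudocount)[upper]
    return float(np.mean(np.abs(log_or_real - log_or_synth)))


# Toy example: six individuals, two binary variables (hypothetical data).
toy = np.array([[1, 1], [1, 1], [0, 0], [0, 0], [1, 0], [0, 1]])
print(pairwise_log_odds_ratios(toy)[0, 1])
print(mean_abs_log_or_gap(toy, toy))
```

In such a comparison, synthetic log ORs that are systematically closer to zero than the real ones would correspond to the underestimation reported for VAEs, and systematically more extreme values to the overestimation reported for untuned DBMs.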

Keywords: SNP data; benchmarking; data privacy; generative models; synthetic data

Year: 2021 PMID: 33003196 DOI: 10.1093/bib/bbaa226

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Cited

3 in total

1. Synthetic single cell RNA sequencing data from small pilot studies using deep generative models.

2. Deep generative models in DataSHIELD.

3. Interpretable generative deep learning: an illustration with single cell gene expression data. (Review)