Genomic data imputation with variational auto-encoders.

Yeping Lina Qiu, Hong Zheng, Olivier Gevaert.

Abstract

BACKGROUND: As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets, and it is difficult to modify these algorithms to handle certain missing-not-at-random cases.
RESULTS: In this work, we use a deep-learning framework based on the variational auto-encoder (VAE) for genomic missing value imputation and demonstrate its effectiveness in transcriptome and methylome data analysis. We show that in the vast majority of our testing scenarios, VAE achieves performance similar to or better than the most widely used imputation standards, while having a computational advantage at evaluation time. When dealing with data missing not at random (e.g., few values are missing), we develop simple yet effective methodologies to leverage the prior knowledge about missing data. Furthermore, we investigate the effect of varying latent space regularization strength in VAE on imputation performance and, in this context, show why VAE has a better imputation capacity than a regular deterministic auto-encoder.
CONCLUSIONS: We describe a deep learning imputation framework for transcriptome and methylome data using a VAE and show that it can be a preferable alternative to traditional methods for data imputation, especially in the setting of large-scale data and certain missing-not-at-random scenarios.
© The Author(s) 2020. Published by Oxford University Press.
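
The abstract does not spell out the training objective or the evaluation metric. As a minimal sketch, assuming the standard VAE objective with a weight `beta` on the latent regularization (KL) term and a root-mean-square error computed only on held-out entries, the two quantities could look like the following. The function names `beta_vae_loss` and `masked_rmse` and the plain-list data layout are hypothetical and are not taken from the paper.

```python
import math

def beta_vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Assumed objective: reconstruction error + beta * KL (illustrative only).

    x, x_hat    : lists of samples (each a list of feature values)
    mu, log_var : per-sample latent means and log-variances
    beta        : regularization strength; a very small beta lets the model
                  behave more like a deterministic auto-encoder
    """
    n = len(x)
    # Sum of squared errors per sample, averaged over the batch
    recon = sum(
        sum((a - b) ** 2 for a, b in zip(row, row_hat))
        for row, row_hat in zip(x, x_hat)
    ) / n
    # Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I),
    # averaged over the batch
    kl = sum(
        -0.5 * sum(1.0 + lv - m ** 2 - math.exp(lv) for m, lv in zip(m_row, lv_row))
        for m_row, lv_row in zip(mu, log_var)
    ) / n
    return recon + beta * kl

def masked_rmse(x_true, x_imputed, missing_mask):
    """RMSE restricted to entries flagged as missing (True in missing_mask)."""
    sq_errors = [
        (t - p) ** 2
        for t_row, p_row, m_row in zip(x_true, x_imputed, missing_mask)
        for t, p, m in zip(t_row, p_row, m_row)
        if m
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

In this sketch, a larger `beta` pulls the latent distribution more strongly toward the standard normal prior, which is one common way to vary "latent space regularization strength"; whether the paper uses exactly this parameterization is an assumption.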

Keywords:  deep learning; imputation; variational auto-encoder

Year: 2020; PMID: 32761097; PMCID: PMC7407276; DOI: 10.1093/gigascience/giaa082

Source DB: PubMed; Journal: Gigascience; ISSN: 2047-217X; Impact factor: 6.524

