Imputing missing RNA-sequencing data from DNA methylation by using a transfer learning-based neural network.

Xiang Zhou, Hua Chai, Huiying Zhao, Ching-Hsing Luo, Yuedong Yang.

Abstract

BACKGROUND: Gene expression plays a key intermediate role in linking molecular features at the DNA level to phenotype. However, owing to various experimental limitations, RNA-seq data are missing for many samples for which high-quality DNA methylation data exist. Because DNA methylation is an important epigenetic modification that regulates gene expression, it can be used to predict RNA-seq data. Many methods have been developed for this purpose. A common limitation of these methods is that they mainly focus on a single cancer dataset and do not fully utilize information from large pan-cancer datasets.
RESULTS: Here, we developed TDimpute, a novel method that imputes missing gene expression data from DNA methylation data through a transfer learning-based neural network. The pan-cancer dataset from The Cancer Genome Atlas (TCGA) was used to train a general model, which was then fine-tuned on the specific cancer dataset. In tests on 16 cancer datasets, our method significantly outperformed other state-of-the-art methods in imputation accuracy, with a 7-11% improvement under different missing rates. The imputed gene expression was further shown to be useful for downstream analyses, including the identification of both methylation-driving and prognosis-related genes, clustering analysis, and survival analysis on the TCGA dataset. More importantly, an independent test on the Wilms tumor dataset from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project indicated that our method is useful for general purposes.
CONCLUSIONS: TDimpute is an effective method for RNA-seq imputation with limited training samples.
© The Author(s) 2020. Published by Oxford University Press.
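
The pretrain-then-fine-tune idea described in RESULTS can be illustrated with a toy sketch. This is not the TDimpute implementation: a single linear layer trained by full-batch gradient descent stands in for the neural network, the data are simulated, and all names and sizes (N_CPG, N_GENE, simulate, train, the learning rates and epoch counts) are hypothetical choices made only for illustration.

```python
import random

random.seed(0)
N_CPG, N_GENE = 4, 2  # toy feature/output sizes (hypothetical, not from the paper)

def predict(W, b, x):
    """Linear map from methylation features x to expression values."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def mse(W, b, X, Y):
    """Mean squared error over all samples and genes."""
    sq = sum((p - y) ** 2
             for x, ys in zip(X, Y)
             for p, y in zip(predict(W, b, x), ys))
    return sq / (len(X) * N_GENE)

def train(W, b, X, Y, lr, epochs):
    """Full-batch gradient descent on MSE; returns updated copies of W and b."""
    W, b = [row[:] for row in W], b[:]
    scale = 2.0 / (len(X) * N_GENE)
    for _ in range(epochs):
        gW = [[0.0] * N_CPG for _ in range(N_GENE)]
        gb = [0.0] * N_GENE
        for x, ys in zip(X, Y):
            for j, (p, y) in enumerate(zip(predict(W, b, x), ys)):
                err = scale * (p - y)
                gb[j] += err
                for i in range(N_CPG):
                    gW[j][i] += err * x[i]
        for j in range(N_GENE):
            b[j] -= lr * gb[j]
            for i in range(N_CPG):
                W[j][i] -= lr * gW[j][i]
    return W, b

def simulate(true_W, n, noise=0.05):
    """Simulated methylation values in [0, 1] and noisy linear expression."""
    X = [[random.random() for _ in range(N_CPG)] for _ in range(n)]
    Y = [[v + random.gauss(0, noise)
          for v in predict(true_W, [0.0] * N_GENE, x)] for x in X]
    return X, Y

# Assumption: the target cancer shares most of its methylation-expression
# relationship with the pan-cancer data, up to a small shift.
shared_W = [[random.uniform(-1, 1) for _ in range(N_CPG)] for _ in range(N_GENE)]
target_W = [[w + random.uniform(-0.2, 0.2) for w in row] for row in shared_W]

pan_X, pan_Y = simulate(shared_W, 200)  # large "pan-cancer" set
tgt_X, tgt_Y = simulate(target_W, 20)   # small "specific cancer" set

# Step 1: train a general model on the large dataset.
W0, b0 = [[0.0] * N_CPG for _ in range(N_GENE)], [0.0] * N_GENE
loss_init = mse(W0, b0, pan_X, pan_Y)
W_pre, b_pre = train(W0, b0, pan_X, pan_Y, lr=0.1, epochs=200)
loss_pre = mse(W_pre, b_pre, pan_X, pan_Y)

# Step 2: fine-tune the pretrained weights on the small target dataset.
loss_tgt_before = mse(W_pre, b_pre, tgt_X, tgt_Y)
W_ft, b_ft = train(W_pre, b_pre, tgt_X, tgt_Y, lr=0.05, epochs=50)
loss_tgt_after = mse(W_ft, b_ft, tgt_X, tgt_Y)

print("pan-cancer MSE:", loss_init, "->", loss_pre)
print("target MSE before/after fine-tuning:", loss_tgt_before, "->", loss_tgt_after)
```

The point of the sketch is only the two-stage flow: fine-tuning starts from the weights learned on the large dataset rather than from scratch, so the small target dataset only has to correct the difference. Held-out evaluation at different missing rates, as reported in the abstract, is not reproduced here.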

Keywords:  DNA methylation; RNA-seq imputation; neural network; transfer learning

Year:  2020        PMID: 32649756      PMCID: PMC7350980          DOI: 10.1093/gigascience/giaa076

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


