Imputing missing RNA-sequencing data from DNA methylation by using a transfer learning-based neural network.

Xiang Zhou¹, Hua Chai¹, Huiying Zhao², Ching-Hsing Luo¹, Yuedong Yang¹,³.

Abstract

BACKGROUND: Gene expression plays a key intermediate role in linking molecular features at the DNA level to phenotype. However, owing to various experimental limitations, RNA-seq data are missing for many samples even though high-quality DNA methylation data are available. Because DNA methylation is an important epigenetic modification that regulates gene expression, it can be used to predict RNA-seq data. Many methods have been developed for this purpose. A common limitation of these methods is that they focus mainly on a single cancer dataset and do not fully utilize information from large pan-cancer datasets.
RESULTS: Here, we developed TDimpute, a novel method that imputes missing gene expression data from DNA methylation data through a transfer learning-based neural network. In this method, the pan-cancer dataset from The Cancer Genome Atlas (TCGA) was used to train a general model, which was then fine-tuned on the specific cancer dataset. In tests on 16 cancer datasets, our method significantly outperformed other state-of-the-art methods in imputation accuracy, with a 7-11% improvement under different missing rates. The imputed gene expression was further shown to be useful for downstream analyses, including the identification of both methylation-driving and prognosis-related genes, clustering analysis, and survival analysis on the TCGA dataset. More importantly, an independent test on the Wilms tumor dataset from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project indicated that our method is generally applicable.
CONCLUSIONS: TDimpute is an effective method for RNA-seq imputation with limited training samples.
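
The two-stage idea described in RESULTS (train a general model on pan-cancer data, then fine-tune it on one cancer type) can be illustrated with a toy sketch. This is not the TDimpute implementation: a plain linear model stands in for the neural network, the data are synthetic, and every name here (`predict`, `train`, `make_data`, `N_CPG`, `N_GENE`) is hypothetical and chosen only for illustration.

```python
import random

N_CPG, N_GENE = 4, 3  # toy sizes (assumption); real data have far more features


def predict(weights, x):
    """Linear map from methylation values (plus a bias input) to expression."""
    xb = x + [1.0]  # append the bias input
    return [sum(w * v for w, v in zip(row, xb)) for row in weights]


def mse(weights, X, Y):
    """Mean squared error over all samples and genes."""
    total, count = 0.0, 0
    for x, y in zip(X, Y):
        for p, t in zip(predict(weights, x), y):
            total += (p - t) ** 2
            count += 1
    return total / count


def train(weights, X, Y, lr=0.05, epochs=100):
    """Full-batch gradient descent on MSE; returns a new weight matrix."""
    w = [row[:] for row in weights]  # copy so the starting weights stay untouched
    n = len(X)
    for _ in range(epochs):
        grad = [[0.0] * len(w[0]) for _ in w]
        for x, y in zip(X, Y):
            xb = x + [1.0]
            for g, (p, t) in enumerate(zip(predict(w, x), y)):
                for j, v in enumerate(xb):
                    grad[g][j] += 2.0 * (p - t) * v / n
        for g in range(len(w)):
            for j in range(len(w[g])):
                w[g][j] -= lr * grad[g][j]
    return w


def make_data(rng, n, true_w, noise=0.05):
    """Synthetic samples: methylation in [0, 1], expression = linear map + noise."""
    X = [[rng.random() for _ in range(N_CPG)] for _ in range(n)]
    Y = [[p + rng.gauss(0.0, noise) for p in predict(true_w, x)] for x in X]
    return X, Y


rng = random.Random(0)
# Synthetic "pan-cancer" relationship and a slightly shifted "target cancer" one.
pan_w = [[rng.uniform(-1, 1) for _ in range(N_CPG + 1)] for _ in range(N_GENE)]
target_w = [[w + rng.uniform(-0.1, 0.1) for w in row] for row in pan_w]
X_pan, Y_pan = make_data(rng, 200, pan_w)     # large general dataset
X_tgt, Y_tgt = make_data(rng, 10, target_w)   # few target-cancer samples

# Stage 1: train a general model on the large dataset.
init = [[0.0] * (N_CPG + 1) for _ in range(N_GENE)]
general = train(init, X_pan, Y_pan)

# Stage 2: fine-tune the general model on the small target dataset.
loss_before = mse(general, X_tgt, Y_tgt)
finetuned = train(general, X_tgt, Y_tgt, epochs=50)
loss_after = mse(finetuned, X_tgt, Y_tgt)

print("target-set MSE before fine-tuning:", loss_before)
print("target-set MSE after fine-tuning: ", loss_after)
```

The point of the sketch is only the control flow: the fine-tuning stage starts from the weights learned in the first stage rather than from a fresh initialization. A real evaluation would measure error on held-out samples at several missing rates, which this toy example does not do.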

Keywords: DNA methylation; RNA-seq imputation; neural network; transfer learning

Year: 2020 PMID: 32649756 PMCID: PMC7350980 DOI: 10.1093/gigascience/giaa076

Source DB: PubMed Journal: GigaScience ISSN: 2047-217X Impact factor: 6.524

25 in total

1. Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene expression profiles and functional modules.

Authors: Dong Wang; Yingli Lv; Zheng Guo; Xia Li; Yanhui Li; Jing Zhu; Da Yang; Jianzhen Xu; Chenguang Wang; Shaoqi Rao; Baofeng Yang
Journal: Bioinformatics Date: 2006-06-29 Impact factor: 6.937

2. Toil enables reproducible, open source, big biomedical data analyses.

Authors: John Vivian; Arjun Arkal Rao; Frank Austin Nothaft; Christopher Ketchum; Joel Armstrong; Adam Novak; Jacob Pfeil; Jake Narkizian; Alden D Deran; Audrey Musselman-Brown; Hannes Schmidt; Peter Amstutz; Brian Craft; Mary Goldman; Kate Rosenbloom; Melissa Cline; Brian O'Connor; Megan Hanna; Chet Birger; W James Kent; David A Patterson; Anthony D Joseph; Jingchun Zhu; Sasha Zaranek; Gad Getz; David Haussler; Benedict Paten
Journal: Nat Biotechnol Date: 2017-04-11 Impact factor: 54.908

3. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.

Authors: Katherine A Hoadley; Christina Yau; Denise M Wolf; Andrew D Cherniack; David Tamborero; Sam Ng; Max D M Leiserson; Beifang Niu; Michael D McLellan; Vladislav Uzunangelov; Jiashan Zhang; Cyriac Kandoth; Rehan Akbani; Hui Shen; Larsson Omberg; Andy Chu; Adam A Margolin; Laura J Van't Veer; Nuria Lopez-Bigas; Peter W Laird; Benjamin J Raphael; Li Ding; A Gordon Robertson; Lauren A Byers; Gordon B Mills; John N Weinstein; Carter Van Waes; Zhong Chen; Eric A Collisson; Christopher C Benz; Charles M Perou; Joshua M Stuart
Journal: Cell Date: 2014-08-07 Impact factor: 41.582

4. Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework.

Authors: Valentin Voillet; Philippe Besse; Laurence Liaubet; Magali San Cristobal; Ignacio González
Journal: BMC Bioinformatics Date: 2016-10-03 Impact factor: 3.169

5. Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models.

Authors: Safoora Yousefi; Fatemeh Amrollahi; Mohamed Amgad; Chengliang Dong; Joshua E Lewis; Congzheng Song; David A Gutman; Sameer H Halani; Jose Enrique Velazquez Vega; Daniel J Brat; Lee A D Cooper
Journal: Sci Rep Date: 2017-09-15 Impact factor: 4.379

6. EWAS: epigenome-wide association study software 2.0.

Authors: Jing Xu; Linna Zhao; Di Liu; Simeng Hu; Xiuling Song; Jin Li; Hongchao Lv; Lian Duan; Mingming Zhang; Qinghua Jiang; Guiyou Liu; Shuilin Jin; Mingzhi Liao; Meng Zhang; Rennan Feng; Fanwu Kong; Liangde Xu; Yongshuai Jiang
Journal: Bioinformatics Date: 2018-08-01 Impact factor: 6.937

7. Single-cell RNA-seq denoising using a deep count autoencoder.

Authors: Gökcen Eraslan; Lukas M Simon; Maria Mircea; Nikola S Mueller; Fabian J Theis
Journal: Nat Commun Date: 2019-01-23 Impact factor: 14.919

8. Predicting gene expression using DNA methylation in three human populations.

Authors: Huan Zhong; Soyeon Kim; Degui Zhi; Xiangqin Cui
Journal: PeerJ Date: 2019-05-01 Impact factor: 2.984

9. A statistical framework for cross-tissue transcriptome-wide association analysis.

Authors: Yiming Hu; Mo Li; Qiongshi Lu; Haoyi Weng; Jiawei Wang; Seyedeh M Zekavat; Zhaolong Yu; Boyang Li; Jianlei Gu; Sydney Muchnik; Yu Shi; Brian W Kunkle; Shubhabrata Mukherjee; Pradeep Natarajan; Adam Naj; Amanda Kuzma; Yi Zhao; Paul K Crane; Hui Lu; Hongyu Zhao
Journal: Nat Genet Date: 2019-02-25 Impact factor: 38.330

10. Module Analysis Captures Pancancer Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response.

Authors: Magali Champion; Kevin Brennan; Tom Croonenborghs; Andrew J Gentles; Nathalie Pochet; Olivier Gevaert
Journal: EBioMedicine Date: 2017-12-01 Impact factor: 8.143

6 in total

1. A roadmap for multi-omics data integration using deep learning. (Review)

2. scIMC: a platform for benchmarking comparison and visualization analysis of scRNA-seq data imputation methods.

3. An Adaptive Transfer-Learning-Based Deep Cox Neural Network for Hepatocellular Carcinoma Prognosis Prediction.

4. Multimodal Dimension Reduction and Subtype Classification of Head and Neck Squamous Cell Tumors.

5. Completing Single-Cell DNA Methylome Profiles via Transfer Learning Together With KL-Divergence.

6. Integration of Multimodal Data from Disparate Sources for Identifying Disease Subtypes.