
Predicting sites of epitranscriptome modifications using unsupervised representation learning based on generative adversarial networks.

Sirajul Salekin, Milad Mostavi, Yu-Chiao Chiu, Yidong Chen, Jianqiu Michelle Zhang, Yufei Huang.

Abstract

Epitranscriptomics is an exciting area that studies the different types of modifications in transcripts, and predicting such modification sites from the transcript sequence is of significant interest. However, the scarcity of positive sites for most modifications poses critical challenges for training robust algorithms. To circumvent this problem, we propose MR-GAN, a generative adversarial network (GAN) based model that is trained in an unsupervised fashion on entire pre-mRNA sequences to learn a low-dimensional embedding of transcriptomic sequences. MR-GAN was then applied to extract embeddings of the sequences in a training dataset we created for eight epitranscriptome modifications, including m6A, m1A, m1G, m2G, m5C, m5U, 2'-O-Me, pseudouridine (Ψ), and dihydrouridine (D), for which positive samples are very limited. Prediction models were trained on the embeddings extracted by MR-GAN. We compared the prediction performance against one-hot encoding of the training sequences and against SRAMP, a state-of-the-art m6A site prediction algorithm, and demonstrated that the learned embeddings outperform one-hot encoding by a significant margin, with up to 15% improvement. Using MR-GAN, we also investigated the sequence motifs for each modification type and uncovered known motifs as well as new motifs that could not be found from the sequences directly. The results demonstrate that transcriptome features extracted using unsupervised learning can lead to high precision in predicting multiple types of epitranscriptome modifications, even when the data size is small and extremely imbalanced.
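
For readers unfamiliar with the one-hot baseline that the abstract compares against, the following is a minimal illustrative sketch, not the authors' code. The function name `one_hot_encode`, the `ACGU` column order, the all-zero row for unknown symbols, and the T-to-U conversion are all assumptions made here for illustration; the paper may encode sequences differently.

```python
# Illustrative sketch only; not taken from the MR-GAN paper.
ALPHABET = "ACGU"  # assumed column order for the four RNA nucleotides


def one_hot_encode(seq):
    """Encode a nucleotide string as a list of 4-element rows.

    Each row has a single 1 in the column of its nucleotide.
    Assumptions: DNA-style T is read as U, and any other
    symbol (for example N) becomes an all-zero row.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


# Example: a five-nucleotide window (hypothetical input).
encoded = one_hot_encode("GGACU")
```

Each position is encoded independently of its neighbours, so this representation carries no information learned from other transcripts; the abstract contrasts this with embeddings learned without labels from entire pre-mRNA sequences.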


Keywords:  N6-methyladenosine (m6A); RNA modification site prediction; epitranscriptome; generative adversarial networks (GAN); methylated RNA immunoprecipitation sequencing (MeRIP-Seq); unsupervised representation learning

Year:  2020        PMID: 33274189      PMCID: PMC7710330          DOI: 10.3389/fphy.2020.00196

Source DB:  PubMed          Journal:  Front Phys        ISSN: 2296-424X


