Travis S Johnson (1,2), Sihong Li (1), Eric Franz (3), Zhi Huang (4,2), Shuyu Dan Li (5), Moray J Campbell (6), Kun Huang (2,7), Yan Zhang (1,8).
1. Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, USA.
2. Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN 46202, USA.
3. Ohio Supercomputer Center, 1224 Kinnear Road, Columbus, OH 43212, USA.
4. School of Electrical and Computer Engineering, Purdue University, 465 Northwestern Avenue, West Lafayette, IN 47907, USA.
5. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA.
6. Division of Pharmaceutics and Pharmaceutical Chemistry, College of Pharmacy, The Ohio State University, 500 West 12th Avenue, Columbus, OH 43210, USA.
7. Regenstrief Institute, Indiana University, 1101 West 10th Street, Indianapolis, IN 46262, USA.
8. The Ohio State University Comprehensive Cancer Center (OSUCCC - James), 460 West 10th Avenue, Columbus, OH 43210, USA.
Abstract
BACKGROUND: Long regarded as "relics" of evolution, pseudogenes have only recently attracted medical interest for their regulatory roles in cancer. These regulatory roles are often a direct by-product of their close sequence homology to protein-coding genes. Novel pseudogene-gene (PGG) functional associations can be identified by integrating biomedical data such as sequence homology, functional pathways, gene expression, pseudogene expression, and microRNA expression. However, not all of this information has been integrated, and almost all previous pseudogene studies relied on 1:1 pseudogene-parent gene relationships without leveraging other homologous genes/pseudogenes.
RESULTS: We produce PGG families that expand beyond the current 1:1 paradigm. First, we construct expansive PGG databases by (i) CUDAlign graphics processing unit (GPU) accelerated local alignment of all pseudogenes to gene families (totaling 1.6 billion individual local alignments and >40,000 GPU hours) and (ii) BLAST-based assignment of pseudogenes to gene families. Second, we create an open-source web application (PseudoFuN [Pseudogene Functional Networks]) to search for integrative functional relationships of sequence homology, microRNA expression, gene expression, pseudogene expression, and gene ontology. We produce four "flavors" of CUDAlign-based databases (>462,000,000 PGG pairwise alignments and 133,770 PGG families) that can be queried and downloaded using PseudoFuN. These databases are consistent with previous 1:1 PGG annotation and are also much more powerful, including millions of de novo PGG associations. For example, we find multiple known (e.g., miR-20a-PTEN-PTENP1) and novel (e.g., miR-375-SOX15-PPP4R1L) microRNA-gene-pseudogene associations in prostate cancer. PseudoFuN provides a "one stop shop" for identifying and visualizing thousands of potential regulatory relationships related to pseudogenes in The Cancer Genome Atlas cancers.
CONCLUSIONS: Bioinformaticians and oncologists alike can use a simple-to-use online tool to explore thousands of new PGG associations in the context of microRNA-gene-pseudogene co-expression and differential expression.
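The idea of assigning a pseudogene to gene families by local alignment, rather than to a single parent gene, can be illustrated with a minimal sketch. It is not the CUDAlign or BLAST pipeline described above: the scoring parameters, the `min_score` cutoff, and the names `smith_waterman_score`, `assign_to_families`, `FAM_A`, `FAM_B`, `GENE1`, and `GENE2` are all hypothetical and chosen only for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not those of the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores are floored at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def assign_to_families(pseudogene_seq, families, min_score):
    """Return (family, score) pairs, best first, for every family that has
    at least one member gene scoring >= min_score against the pseudogene."""
    hits = []
    for name, members in families.items():
        score = max(smith_waterman_score(pseudogene_seq, seq)
                    for seq in members.values())
        if score >= min_score:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: -hit[1])


# Toy data: hypothetical family and gene names with made-up sequences.
families = {
    "FAM_A": {"GENE1": "GGACGTGG"},
    "FAM_B": {"GENE2": "CCCCCCCC"},
}
hits = assign_to_families("TTACGTTT", families, min_score=6)
print(hits)
```

Because every family above the cutoff is returned, a pseudogene can be linked to several homologous genes at once, which is the sense in which such families extend beyond a 1:1 pseudogene-parent gene mapping.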