
Pinstripe: a suite of programs for integrating transcriptomic and proteomic datasets identifies novel proteins and improves differentiation of protein-coding and non-coding genes.

Dennis K Gascoigne, Seth W Cheetham, Pierre B Cattenoz, Michael B Clark, Paulo P Amaral, Ryan J Taft, Dagmar Wilhelm, Marcel E Dinger, John S Mattick.

Abstract

MOTIVATION: Comparing transcriptomic data with proteomic data to identify protein-coding sequences is a long-standing challenge in molecular biology, one that is exacerbated by the increasing size of high-throughput datasets. To address this challenge, and thereby to improve the quality of genome annotation and understanding of genome biology, we have developed an integrated suite of programs, called Pinstripe. We demonstrate its application, utility and discovery power using transcriptomic and proteomic data from publicly available datasets.
RESULTS: To demonstrate the efficacy of Pinstripe for large-scale analysis, we applied Pinstripe's reverse peptide mapping pipeline to a transcript library including de novo assembled transcriptomes from the human Illumina Body Atlas (IBA2) and GENCODE v10 gene annotations, and the EBI Proteomics Identifications Database (PRIDE) peptide database. This analysis identified 736 canonical open reading frames (ORFs) supported by three or more PRIDE peptide fragments that are positioned outside any known coding DNA sequence (CDS). Because of the unfiltered nature of the PRIDE database and high probability of false discovery, we further refined this list using independent evidence for translation, including the presence of a Kozak sequence or functional domains, synonymous/non-synonymous substitution ratios and ORF length. Using this integrative approach, we observed evidence of translation from a previously unknown let7e primary transcript, the archetypical lncRNA H19, and a homolog of RD3. Reciprocally, by exclusion of transcripts with mapped peptides or significant ORFs (>80 codons), we identify 32 187 loci with RNAs longer than 2000 nt that are unlikely to encode proteins.
AVAILABILITY AND IMPLEMENTATION: Pinstripe (pinstripe.matticklab.com) is freely available as source code or a Mono binary. Pinstripe is written in C# and runs under the Mono framework on Linux or Mac OS X, and under both Mono and .NET on Windows.
CONTACT: m.dinger@garvan.org.au or j.mattick@garvan.org.au
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
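The ORF-support and non-coding exclusion criteria described in the RESULTS paragraph can be illustrated with a short Python sketch. This is not Pinstripe's implementation (Pinstripe is written in C#), and every function name below is hypothetical. Only the thresholds (three or more peptides, ORFs over 80 codons, RNAs longer than 2000 nt) come from the abstract. The sketch also assumes forward-strand transcripts, ATG-initiated ORFs, exact substring matching of peptides against translated frames, and ORF length counted in amino acids excluding the stop codon. Restricting ORFs to regions outside known CDS, and the Kozak, domain and substitution-ratio refinements, are not modeled.

```python
from dataclasses import dataclass

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [x + y + z for x in _BASES for y in _BASES for z in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO))


def translate(seq: str, frame: int) -> str:
    """Translate one forward reading frame; stops become '*', ambiguous codons 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


@dataclass
class Orf:
    frame: int
    start: int    # nucleotide offset of the initiating ATG
    protein: str  # translated residues, stop codon excluded (assumption)


def find_orfs(seq: str) -> list[Orf]:
    """Longest ATG-initiated ORF per stop codon in each forward frame.

    ORFs that run off the 3' end without a stop are kept, so a truncated
    transcript is not mistaken for a non-coding one.
    """
    orfs = []
    for frame in range(3):
        prot = translate(seq, frame)
        i = 0
        while i < len(prot):
            if prot[i] == "M":
                stop = prot.find("*", i)
                end = len(prot) if stop == -1 else stop
                orfs.append(Orf(frame, frame + 3 * i, prot[i:end]))
                i = end + 1  # skip downstream in-frame ATGs sharing this stop
            else:
                i += 1
    return orfs


def mapped_peptides(seq: str, peptides: list[str]) -> set[str]:
    """Peptides found verbatim in any forward-frame translation (hypothetical matcher)."""
    frames = [translate(seq, f) for f in range(3)]
    return {p for p in peptides if any(p in fr for fr in frames)}


def supported_orfs(seq: str, peptides: list[str], min_peptides: int = 3) -> list[Orf]:
    """ORFs containing at least `min_peptides` distinct peptides."""
    unique = set(peptides)
    return [o for o in find_orfs(seq)
            if sum(p in o.protein for p in unique) >= min_peptides]


def likely_noncoding(seq: str, peptides: list[str],
                     min_length: int = 2000, max_orf_codons: int = 80) -> bool:
    """Long transcript with no ORF over `max_orf_codons` and no mapped peptide."""
    if len(seq) <= min_length:
        return False
    if any(len(o.protein) > max_orf_codons for o in find_orfs(seq)):
        return False
    return not mapped_peptides(seq, peptides)
```

In this sketch, `supported_orfs` corresponds to the "three or more peptide fragments" criterion and `likely_noncoding` to the reciprocal exclusion step. Further evidence filters such as those listed in the abstract would be applied to their outputs.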

Year: 2012; PMID: 23044541; DOI: 10.1093/bioinformatics/bts582

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937
