Proteogenomic database construction driven from large scale RNA-seq data.

Sunghee Woo, Seong Won Cha, Gennifer Merrihew, Yupeng He, Natalie Castellana, Clark Guest, Michael MacCoss, Vineet Bafna.

Abstract

The advent of inexpensive RNA-seq and other deep sequencing technologies for RNA promises to radically improve genomic annotation by providing information on transcribed regions and splicing events in a variety of cellular conditions. Using MS-based proteogenomics, many of these events can be confirmed directly at the protein level. However, integrating large amounts of redundant RNA-seq data with mass spectrometry data is a challenging problem. Our paper addresses this by constructing a compact database that contains all useful information expressed in RNA-seq reads. Applying our method to cumulative C. elegans data reduced 496.2 GB of aligned RNA-seq SAM files to a 410 MB splice graph database written in FASTA format. This corresponds to roughly 1000× compression of data size, without loss of sensitivity. We performed a proteogenomics study on the custom data set with a completely automated pipeline and identified a total of 4044 novel events, including 215 novel genes, 808 novel exons, 12 alternative splicings, 618 gene-boundary corrections, 245 exon-boundary changes, 938 frame shifts, 1166 reverse strands, and 42 translated UTRs. Our results highlight the usefulness of integrating transcript and proteomic data for improved genome annotations.
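
The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the reported numbers, not part of the authors' pipeline; the variable names are illustrative, and the use of decimal units (1 GB = 1000 MB) is an assumption, since the abstract does not state which convention it uses.

```python
# Sizes quoted in the abstract.
sam_size_gb = 496.2    # aligned RNA-seq SAM files
fasta_size_mb = 410.0  # splice graph database in FASTA format

# Assumption: decimal units (1 GB = 1000 MB); the abstract does not specify.
compression_ratio = sam_size_gb * 1000 / fasta_size_mb

# Novel event categories and counts as listed in the abstract.
novel_events = {
    "novel genes": 215,
    "novel exons": 808,
    "alternative splicings": 12,
    "gene-boundary corrections": 618,
    "exon-boundary changes": 245,
    "frame shifts": 938,
    "reverse strands": 1166,
    "translated UTRs": 42,
}
total_events = sum(novel_events.values())

print(f"compression ratio: about {compression_ratio:.0f}x")
print(f"total novel events: {total_events}")
```

Under that unit assumption the ratio comes out near 1210, which is consistent with the abstract's rounded 1000× figure, and the eight categories sum to the reported total of 4044.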

Year: 2013   PMID: 23802565   PMCID: PMC4034692   DOI: 10.1021/pr400294c

Source DB: PubMed   Journal: J Proteome Res   ISSN: 1535-3893   Impact factor: 4.466


