Discovery and mass spectrometric analysis of novel splice-junction peptides using RNA-Seq.

Gloria M Sheynkman, Michael R Shortreed, Brian L Frey, Lloyd M Smith.

Abstract

Human proteomic databases required for MS peptide identification are frequently updated and carefully curated, yet are still incomplete because it has been challenging to acquire every protein sequence from the diverse assemblage of proteoforms expressed in every tissue and cell type. In particular, alternative splicing has been shown to be a major source of this cell-specific proteomic variation. Many new alternative splice forms have been detected at the transcript level using next-generation sequencing methods, especially RNA-Seq, but it is not known how many of these transcripts are being translated. Leveraging the unprecedented capabilities of next-generation sequencing methods, we collected RNA-Seq and proteomics data from the same cell population (Jurkat cells) and created a bioinformatics pipeline that builds customized databases for the discovery of novel splice-junction peptides. Eighty million paired-end Illumina reads and ∼500,000 tandem mass spectra were used to identify 12,873 transcripts (19,320 including isoforms) and 6,810 proteins. We developed a bioinformatics workflow to retrieve high-confidence, novel splice-junction sequences from the RNA data, translate these sequences into the corresponding polypeptide sequences, and create a customized splice-junction database for MS searching. Based on the RefSeq gene models, we detected 136,123 annotated and 144,818 unannotated transcript junctions. Of those, 24,834 unannotated junctions passed various quality filters (e.g., minimum read depth), and these entries were translated into 33,589 polypeptide sequences and used for database searching. We discovered 57 splice-junction peptides not present in the UniProt-TrEMBL proteomic database, comprising an array of different splicing events, including skipped exons, alternative donors and acceptors, and noncanonical transcriptional start sites. To our knowledge, this is the first example of using sample-specific RNA-Seq data to create a splice-junction database and discover new peptides resulting from alternative splicing.
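The workflow summarized in the abstract (filter junctions by read depth, translate the junction-spanning nucleotide sequence, write the results to a searchable database) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The record fields (`id`, `reads`, `upstream`, `downstream`), the `min_reads=5` threshold, the `min_len=6` peptide cutoff, and the three-frame translation strategy are all assumptions made for illustration; the abstract states only that a minimum read depth was among the quality filters.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate codon by codon; codons with unknown bases become 'X'."""
    nt = nt.upper()
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3)
    )


def junction_peptides(upstream, downstream, min_len=6):
    """Three-frame translation of the exon-exon flanks (hypothetical helper).

    Keeps only stop-free segments that cover nucleotides on both
    sides of the junction and have at least min_len residues.
    """
    seq = upstream + downstream
    junction = len(upstream)  # nucleotide index where the downstream exon begins
    peptides = []
    for frame in range(3):
        protein = translate(seq[frame:])
        start = 0  # residue index of the current segment within this frame
        for segment in protein.split("*"):
            nt_start = frame + 3 * start
            nt_end = nt_start + 3 * len(segment)
            if nt_start < junction < nt_end and len(segment) >= min_len:
                peptides.append((frame, segment))
            start += len(segment) + 1  # skip the stop codon as well
    return peptides


def build_database(junctions, min_reads=5):
    """Return FASTA text for junctions passing the assumed read-depth filter."""
    entries = []
    for j in junctions:
        if j["reads"] < min_reads:
            continue  # junction supported by too few reads
        for frame, pep in junction_peptides(j["upstream"], j["downstream"]):
            entries.append(f">{j['id']}|frame{frame}\n{pep}")
    return "\n".join(entries)


# Toy input: two made-up junctions, one below the read-depth threshold.
example = [
    {"id": "J1", "reads": 12, "upstream": "ATGGCCAAGGTT", "downstream": "GAGCTGTACTAA"},
    {"id": "J2", "reads": 2, "upstream": "ATGGCCAAGGTT", "downstream": "GAGCTGTACTAA"},
]
fasta = build_database(example)
```

Translating in several frames yields more than one candidate polypeptide per junction, which is one plausible reason a junction count and a polypeptide count can differ; the abstract does not say how the 33,589 sequences were derived from the 24,834 junctions.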

Year: 2013 | PMID: 23629695 | PMCID: PMC3734590 | DOI: 10.1074/mcp.O113.028142

Source DB: PubMed | Journal: Mol Cell Proteomics | ISSN: 1535-9476 | Impact factor: 5.911


