
Template proteogenomics: sequencing whole proteins using an imperfect database.

Natalie E Castellana, Victoria Pham, David Arnott, Jennie R Lill, Vineet Bafna.

Abstract

Database search algorithms are the primary workhorses for the identification of tandem mass spectra. However, these methods can only identify spectra whose peptides are present in the database, which prevents the identification of peptides from mutated or alternatively spliced sequences. A variety of methods has been developed to search a spectrum against a sequence while allowing for variations. Some tools determine the sequence of the homologous protein in a related species but do not report the peptide in the target organism. Other tools consider variations, including modifications and mutations, in reconstructing the target sequence. However, these tools will not work if the template (homologous peptide) is missing from the database, and they do not attempt to reconstruct the entire target protein sequence. De novo identification of peptide sequences is another possibility, because it does not require a protein database. However, the lack of a database reduces accuracy. We present a novel proteogenomic approach, GenoMS, that draws on the strengths of both database and de novo peptide identification methods. Protein sequence templates (i.e., proteins or genomic sequences that are similar to the target protein) are identified using the database search tool InsPecT. The templates are then used to recruit, align, and de novo sequence regions of the target protein that have diverged from the database or are missing. We used GenoMS to reconstruct the full sequence of an antibody from spectra acquired from multiple digests using different proteases. Antibodies are a prime example of proteins that confound standard database identification techniques: mature antibody genes result from large-scale genome rearrangements with flexible fusion boundaries and from somatic hypermutation. Using GenoMS, we automatically reconstruct the complete sequences of two immunoglobulin chains with accuracy greater than 98% using a diverged protein database. Using the genome as the template, we achieve accuracy exceeding 97%.
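The abstract only outlines how templates are used to recruit and align sequence evidence. The Python sketch below illustrates the general idea of template-guided reconstruction under strong simplifying assumptions: peptides are plain amino-acid strings that have already been sequenced (no spectra, masses, or modifications), placement on the template is ungapped, and the consensus is a simple per-position majority vote. The function names, the `max_mismatches` threshold, and the example sequences are hypothetical; this is not GenoMS's actual algorithm, which works from spectra and also de novo sequences regions missing from the template. The per-residue identity computed by `accuracy` is one plausible way to score a reconstruction; the abstract does not state the paper's exact metric.

```python
from collections import Counter


def best_ungapped_offset(template: str, peptide: str) -> tuple[int, int]:
    """Return (offset, mismatches) for the ungapped placement of `peptide`
    on `template` with the fewest mismatching residues.
    Returns (-1, len(peptide) + 1) if the peptide is longer than the template."""
    best = (-1, len(peptide) + 1)
    for offset in range(len(template) - len(peptide) + 1):
        window = template[offset:offset + len(peptide)]
        mismatches = sum(a != b for a, b in zip(window, peptide))
        if mismatches < best[1]:
            best = (offset, mismatches)
    return best


def reconstruct(template: str, peptides: list[str], max_mismatches: int = 2) -> str:
    """Toy template-guided reconstruction (hypothetical, not GenoMS):
    recruit peptides that fit the template with few mismatches, then take a
    per-position majority vote. Uncovered positions keep the template residue."""
    votes = [Counter() for _ in template]
    for pep in peptides:
        offset, mism = best_ungapped_offset(template, pep)
        if offset < 0 or mism > max_mismatches:
            continue  # too long or too diverged to recruit to this template
        for i, aa in enumerate(pep):
            votes[offset + i][aa] += 1
    return "".join(
        v.most_common(1)[0][0] if v else t for v, t in zip(votes, template)
    )


def accuracy(predicted: str, truth: str) -> float:
    """Fraction of positions where the reconstruction matches the true sequence."""
    if len(predicted) != len(truth):
        raise ValueError("sequences must have equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


if __name__ == "__main__":
    template = "ACDEFGHIKLMNPQ"  # hypothetical homologous template
    truth = "ACDEWGHIKLMNPQ"     # hypothetical target: one substitution (F -> W)
    peptides = ["ACDEW", "DEWGHIK", "KLMNPQ"]  # hypothetical sequenced peptides
    rebuilt = reconstruct(template, peptides)
    print(rebuilt)                              # ACDEWGHIKLMNPQ
    print(round(accuracy(template, truth), 3))  # 0.929
    print(accuracy(rebuilt, truth))             # 1.0
```

Ungapped placement keeps the sketch short. Real antibody reconstruction must also cope with insertions and deletions relative to the template (the flexible fusion boundaries mentioned above), which would call for a gapped alignment such as Smith-Waterman instead of the Hamming-style scan used here.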

Year:  2010        PMID: 20164058      PMCID: PMC2877985          DOI: 10.1074/mcp.M900504-MCP200

Source DB:  PubMed          Journal:  Mol Cell Proteomics        ISSN: 1535-9476            Impact factor:   5.911


Cited by: 19 articles (first 10 listed)

1.  Combining phage display with de novo protein sequencing for reverse engineering of monoclonal antibodies.

Authors:  Keith W Rickert; Luba Grinberg; Robert M Woods; Susan Wilson; Michael A Bowen; Manuel Baca
Journal:  MAbs       Date:  2016       Impact factor: 5.857

2.  Proteogenomics to discover the full coding content of genomes: a computational perspective. (Review)

Authors:  Natalie Castellana; Vineet Bafna
Journal:  J Proteomics       Date:  2010-07-08       Impact factor: 4.044

3.  Shotgun protein sequencing with meta-contig assembly.

Authors:  Adrian Guthals; Karl R Clauser; Nuno Bandeira
Journal:  Mol Cell Proteomics       Date:  2012-07-13       Impact factor: 5.911

4.  Discovery and mass spectrometric analysis of novel splice-junction peptides using RNA-Seq.

Authors:  Gloria M Sheynkman; Michael R Shortreed; Brian L Frey; Lloyd M Smith
Journal:  Mol Cell Proteomics       Date:  2013-04-29       Impact factor: 5.911

5.  The spectral networks paradigm in high throughput mass spectrometry. (Review)

Authors:  Adrian Guthals; Jeramie D Watrous; Pieter C Dorrestein; Nuno Bandeira
Journal:  Mol Biosyst       Date:  2012-10

6.  Resurrection of a clinical antibody: template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin-α antibody.

Authors:  Natalie E Castellana; Krista McCutcheon; Victoria C Pham; Kristin Harden; Allen Nguyen; Judy Young; Camellia Adams; Kurt Schroeder; David Arnott; Vineet Bafna; Jane L Grogan; Jennie R Lill
Journal:  Proteomics       Date:  2011-01-05       Impact factor: 3.984

7.  Automated Antibody De Novo Sequencing and Its Utility in Biopharmaceutical Discovery.

Authors:  K Ilker Sen; Wilfred H Tang; Shruti Nayak; Yong J Kil; Marshall Bern; Berk Ozoglu; Beatrix Ueberheide; Darryl Davis; Christopher Becker
Journal:  J Am Soc Mass Spectrom       Date:  2017-01-19       Impact factor: 3.109

8.  Peppy: proteogenomic search software.

Authors:  Brian A Risk; Wendy J Spitzer; Morgan C Giddings
Journal:  J Proteome Res       Date:  2013-05-06       Impact factor: 4.466

9.  Sequencing-grade de novo analysis of MS/MS triplets (CID/HCD/ETD) from overlapping peptides.

Authors:  Adrian Guthals; Karl R Clauser; Ari M Frank; Nuno Bandeira
Journal:  J Proteome Res       Date:  2013-05-30       Impact factor: 4.466

10.  Top-down analysis of protein samples by de novo sequencing techniques.

Authors:  Kira Vyatkina; Si Wu; Lennard J M Dekker; Martijn M VanDuijn; Xiaowen Liu; Nikola Tolić; Theo M Luider; Ljiljana Paša-Tolić; Pavel A Pevzner
Journal:  Bioinformatics       Date:  2016-05-14       Impact factor: 6.937
