Javier A Alfaro, Alexandr Ignatchenko, Vladimir Ignatchenko, Ankit Sinha, Paul C Boutros, Thomas Kislinger.
Abstract
BACKGROUND: Onco-proteogenomics aims to understand how changes in a cancer's genome influence its proteome. One challenge in integrating these molecular data is the identification of aberrant protein products from mass-spectrometry (MS) datasets, as traditional proteomic analyses only identify proteins from a reference sequence database.
Keywords: Integrative –omics; Mass-spectrometry-based mutant detection; Personalized proteomics; Protein mutant detection; Protein search databases; Proteoforms; Proteogenomics; Proteomics
Year: 2017 PMID: 28716134 PMCID: PMC5514513 DOI: 10.1186/s13073-017-0454-9
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 The detectable tryptic space of reference and variant human proteins. a Distribution of 2.9 million reference proteome tryptic peptides (length 6–35 amino acids; including two possible trypsin missed cleavages) derived from four commonly used reference proteomes. Counts are represented using a log10 scale. Panels (b–f) use the prostate cancer cell-line PC-3 as an example. b Distribution of the 35,445 variant peptides that are also contained within at least one reference proteome. Y-axis covariate depicts the source of the variant. Color gradient indicates the percentage of the 35,446 variants that overlap with each reference using a log10 scale. c Numbers of protein variants in the nine major database variants used to search PC-3 proteomics data. Counts are in a log10 scale. d Total number of exome-seq derived variant peptides and their membership in other databases. Counts are in a log10 scale. e Total number of RNA-seq derived variant peptides and their membership in other databases. Counts are in a log10 scale. f Total number of peptides derived from various community-based databases and their redundancy with each other. Counts are in a log10 scale
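The tryptic peptide space described in Fig. 1a can be illustrated with a minimal in silico digestion sketch. It uses the 6–35 amino-acid length window and the limit of two missed cleavages stated in the caption. The cleavage rule (cut after K or R unless the next residue is P), the function names, and the toy sequences are assumptions made for illustration and are not taken from the study's pipeline.

```python
def cleavage_fragments(protein):
    """Split a protein at assumed trypsin sites: after K or R, not before P."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def tryptic_peptides(protein, max_missed=2, min_len=6, max_len=35):
    """Return the set of peptides with up to max_missed missed cleavages."""
    fragments = cleavage_fragments(protein)
    peptides = set()
    for i in range(len(fragments)):
        # Joining k+1 consecutive fragments corresponds to k missed cleavages.
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


# Toy sequences (hypothetical): the variant carries one A>V substitution.
reference = "MAAAAAKGGGGGGRPLLLLLLKTTTTTTR"
variant = "MAAVAAKGGGGGGRPLLLLLLKTTTTTTR"

ref_peptides = tryptic_peptides(reference)
var_peptides = tryptic_peptides(variant)

# Peptides that exist only because of the variant.
variant_only = var_peptides - ref_peptides
# Peptides from the variant protein that are also in the reference.
shared = var_peptides & ref_peptides
```

In this toy case the set difference isolates the peptides that span the substituted residue, while the intersection holds peptides from the variant protein that a reference-only search could already identify.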
Fig. 2 Detection of variant proteins within the nine deep proteomes. a Numbers of unique variant peptides identified in tiers 1–4 using MS data from the nine deep proteomes. b Unique variant peptides identified within the prostate cancer cell-line PC-3 across tiers 1–4 (log10 scale). c Heatmaps depicting the percent contribution of each database towards the total number of peptides identified for that tier in PC-3. The number of peptides overlapping each database pair is provided as well. Color scale is in log10. d Total number of spectra, peptides, and unique mutations identified by tier. e Summary of peptides identified within the nine deep proteomes within sample-specific databases or within community-based databases (tiers 1–4). f Percentile score distribution summary by algorithm and tier. X-axis ranges from high-scoring peptides (0th percentile) to lower-scoring peptides (100th percentile). A similar figure using original e-value scores is depicted in Additional file 1: Figure S6. The distribution of peptide scores from a search against a standard UniProt database is shown in black. g Increasing the stringency of identifying a peptide influences the percentage of peptides present in community-based databases between tiers 1 and 2 more than moving to subsequent tiers. h When compared, tier 2 peptides tend to be higher ranked by 12% than tier 1 peptides; this improvement in peptide rank drops off quickly from tier 2 to tier 3 (4%) and tier 3 to tier 4 (1%)
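The percentile scale in Fig. 2f can be pictured with a short sketch that converts e-values into percentile ranks, with the best-scoring peptide at the 0th percentile and the worst at the 100th. The exact ranking formula used in the study is not given here, so the linear rank scaling, the tie handling, and the function name below are assumptions for illustration only.

```python
def percentile_ranks(e_values):
    """Map e-values to percentiles: lowest e-value -> 0, highest -> 100.

    Assumed scheme: linear scaling of the sorted rank. Tied e-values
    receive consecutive ranks in input order (the sort is stable).
    """
    n = len(e_values)
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: e_values[i])
    ranks = [0.0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = 100.0 * rank / (n - 1)
    return ranks


# Hypothetical e-values for three peptide-spectrum matches.
scores = percentile_ranks([1e-3, 1e-10, 0.5])
```

Placing scores from different search algorithms on a common percentile scale like this is one way to compare their distributions side by side, which is the kind of comparison the panel presents.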
Fig. 3 Identification of cancer-related variant peptides. a Genome coverage of potentially detectable proteogenomic peptides (6–35 amino acids) within the generated search databases (bottom). Variant proteins identified at tier 2 within 59 shallow and nine deep proteomes have been summarized in black and gray, respectively (top). Black dots correspond to the locations of COSMIC cancer census genes and orange dots indicate those detected at tier 2. b Variants identified were assessed using the Drug Gene Interaction database [43] to identify variants that might be targetable or affect related pathways. Counts relate to the number of variant peptides identified in each category for tier 2 peptides. Only categories significantly enriched at p < 0.01 are depicted. c Variant peptides detected for CTNNB1. Mutation locations are depicted in orange. Reference peptides identified for the same protein are shown in blue, with an alignment describing the peptides detected. Bar plots illustrate the variants that were present in genomics for this gene (top) and all mutations present in community-based databases (bottom). d A tier 2 peptide identified for CTNNB1 showing clear coverage of y and b ions
Fig. 4 Identification of fusion peptides. We identified several fusions of FUS to CREB3L2, of which there are 101 reported in the COSMIC database. a Of these 101 fusions, four were repeatedly identified across six cell-lines. b MS2 spectrum for one fusion peptide is displayed