Pratik D Jagtap, James E Johnson, Getiria Onsongo, Fredrik W Sadler, Kevin Murray, Yuanbo Wang, Gloria M Sheynkman, Sricharan Bandhakavi, Lloyd M Smith, Timothy J Griffin.
Abstract
Proteogenomics combines large-scale genomic and transcriptomic data with mass-spectrometry-based proteomic data to discover novel protein sequence variants and improve genome annotation. In contrast with conventional proteomic applications, proteogenomic analysis requires a number of additional data processing steps. Ideally, these required steps would be integrated and automated via a single software platform offering accessibility for wet-bench researchers as well as flexibility for user-specific customization and integration of new software tools as they emerge. Toward this end, we have extended the Galaxy bioinformatics framework to facilitate proteogenomic analysis. Using analysis of whole human saliva as an example, we demonstrate Galaxy's flexibility through the creation of a modular workflow incorporating both established and customized software tools that improve depth and quality of proteogenomic results. Our customized Galaxy-based software includes automated, batch-mode BLASTP searching and a Peptide Sequence Match Evaluator tool, both useful for evaluating the veracity of putative novel peptide identifications. Our complex workflow (approximately 140 steps) can be easily shared using built-in Galaxy functions, enabling its use and customization by others. Our results provide a blueprint for the establishment of the Galaxy framework as an ideal solution for the emerging field of proteogenomics.
Keywords: customized database generation; peptide corresponding to a novel proteoform; peptide-spectral match evaluation; proteogenomics; salivary proteins; workflows
Year: 2014 PMID: 25301683 PMCID: PMC4261978 DOI: 10.1021/pr500812t
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 2. Overview of components of the Peptide Spectrum Match Evaluation tool. Screenshot of the PSME tool within Galaxy-P showing (a) user interface for setting parameters for PSM evaluation, (b) tabular format output from the PSME tool, (c) HTML output from the PSME tool, and (d) interactive spectral annotation that can be used to visualize PSMs before further evaluation.
Figure 3. Screenshot of a peptide corresponding to a novel proteoform within the Integrative Genomics Viewer. View is a zoomed-in screenshot of chromosome 12, which shows the orientation of expression, amino acid sequences within three frames of translation, reference files in the tracks, and the amino acid sequence of the identified peptide corresponding to a novel proteoform.
Figure 1. Overview of modules and subworkflows comprising the Galaxy-based proteogenomic analysis workflow.
Summary of Genomic Organization of Peptides Corresponding to Novel Proteoforms
| genomic rearrangements | peptides | chromosome location(s) |
|---|---|---|
| alternate frame | 26 | 1, 3, 5, 7, 8, 9, 11, 12, 14, 16, and 19 |
| untranslated region | 15 | 2, 4, 6, 7, 8, 11, 12, 13, 14, and 19 |
| pseudogenes | 6 | 1, 3, 6, 14, 19, and X |
| intronic region | 2 | 12 and 16 |
| novel exon junctions | 2 | 15 and 17 |
| antisense | 1 | 8 |
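As a quick check on the table above, the peptide counts can be tallied per category. This is only a minimal sketch: the counts are copied from the table, and it assumes each peptide is counted in exactly one category, which the table itself does not state.

```python
# Rows copied from the table above: (category, number of peptides).
# Assumption: categories are mutually exclusive, so counts can be summed.
rows = [
    ("alternate frame", 26),
    ("untranslated region", 15),
    ("pseudogenes", 6),
    ("intronic region", 2),
    ("novel exon junctions", 2),
    ("antisense", 1),
]

# Total number of peptides across all categories.
total = sum(count for _, count in rows)

# Fraction of the total contributed by each category.
shares = {category: count / total for category, count in rows}

for category, count in rows:
    print(f"{category:22s} {count:3d}  {shares[category]:.1%}")
print(f"{'total':22s} {total:3d}")
```

Under that assumption, alternate-frame peptides account for half of the tabulated identifications.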
Figure 4. Representation of organization of identified peptides corresponding to a novel proteoform from PRB1 and PRB2 genes on chromosome 12. View is a zoomed-in screenshot of chromosome 12, which shows the orientation of expression, amino acid sequences within three frames of translation, reference files in the tracks, and amino acid sequence of the identified peptide corresponding to a novel proteoform. The red arrows indicate the direction and amino acid sequence (from amino-terminal to carboxy-terminal) of the identified peptides. A red asterisk indicates a stop codon in the normal coding frame. Block arrows in red indicate multiple distinct peptides identified during the proteogenomic analysis.
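The genome-browser views described in the captions display amino acid sequences in three frames of translation. The idea can be sketched as follows; this is an illustrative example rather than the tooling used in the study. The function name `translate_three_frames` and the input sequence are hypothetical, only the forward strand is handled, and the standard genetic code is assumed.

```python
# Standard genetic code, laid out in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid ('*' marks a stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_three_frames(dna):
    """Translate the forward strand of `dna` in reading frames 0, 1, and 2."""
    dna = dna.upper()
    frames = []
    for offset in range(3):
        # Step through complete codons only; a trailing partial codon is dropped.
        codons = (dna[p:p + 3] for p in range(offset, len(dna) - 2, 3))
        # Codons with unexpected characters (e.g. 'N') are shown as 'X'.
        frames.append("".join(CODON_TABLE.get(codon, "X") for codon in codons))
    return frames


# Made-up nine-base sequence, used only to show the frame shift.
frames = translate_three_frames("ATGGCCTAA")
for offset, protein in enumerate(frames):
    print(f"frame {offset}: {protein}")
```

Shifting the start position by one or two bases yields an entirely different amino acid sequence, which is why a stop codon in the normal coding frame (the asterisk in Figure 4) need not appear in the other two frames.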