Christoph N Schlaffner, Georg J Pirklbauer, Andreas Bender, Jyoti S Choudhary.
Abstract
Current tools for visualization and integration of proteomics with other omics datasets are inadequate for large-scale studies and capture only basic sequence identity information. Furthermore, the frequent reformatting of annotations for reference genomes required by these tools is known to be highly error prone. We developed PoGo for mapping peptides identified through mass spectrometry to overcome these limitations. PoGo reduced runtime and memory usage by 85% and 20%, respectively, and exhibited overall superior performance over other tools on benchmarking with large-scale human tissue and cancer phosphoproteome datasets comprising ∼3 million peptides. In addition, extended functionality enables representation of single-nucleotide variants, post-translational modifications, and quantitative features. PoGo has been integrated into established frameworks such as the PRIDE tool suite and OpenMS, and is also available as a standalone tool with a user-friendly graphical interface. With the rapid increase of quantitative high-resolution datasets capturing proteomes and global modifications to complement orthogonal genomics platforms, PoGo provides a central utility enabling large-scale visualization and interpretation of transomics datasets.
Keywords: annotation; genome browser; genomics; large-scale; mapping; open-source software; proteogenomics; proteomics; track hubs; visualization
Year: 2017 PMID: 28837811 PMCID: PMC5571441 DOI: 10.1016/j.cels.2017.07.007
Source DB: PubMed Journal: Cell Syst ISSN: 2405-4712 Impact factor: 10.304
Figure 1. Schema of the PoGo Algorithm for Mapping Peptides through Proteins to Genomic Loci
(A) Transcript annotation (GTF) and translated sequences (FASTA) form the reference input for PoGo. Standardized proteomics formats are converted into a proprietary tab-separated format with minimal peptide information. All four output formats of PoGo contain the genomic alignment supplemented with specifications for uniqueness of mappings, quantitative information, and post-translational modifications.
(B) Annotated protein coding transcripts in GTF format and respective translated protein sequences in FASTA format are integrated by PoGo through intermediate coordinates (turquoise), representing the exonic structure of the transcript within the protein.
(C) Peptides, identified through searching mass spectrometry data against the protein sequence database, are mapped against the proteins (see also Figure S4). The position within the proteins then allows retrieval of overlapping coding exons and enables the calculation of the exact genomic coordinates.
(D) Example mappings of PoGo for the overlapping repeat peptide VPEPGCTKVPEPGCTK in a genome browser (0 mismatches). Application of PoGo allowing for up to two mismatches results in identification of two additional repeats (1 and 2 mismatches, red boxes; see also Figures S1, S5, and S6). The additional mappings of the initial peptide sequence were validated through peptides of the exact sequence identified in the same mass spectrometry experiment (validation). Leucine (L) and isoleucine (I) are substituted through their common single-letter code “J.”
(E) Comparison of different peptide-to-genome mapping tools with regard to reference sequence type, integration into frameworks, and support of online and offline genome browsers (blue). Additional features (orange) indicate the superior performance of PoGo over other tools.
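
The mapping steps described in panels (B) to (D) can be illustrated with a minimal Python sketch. It is not PoGo's implementation: the function names `find_peptide` and `peptide_to_genomic_blocks`, the naive linear scan, and the assumption that translation starts at the first coding base (phase 0) are all choices made here for illustration. The first function locates a peptide in a protein while treating I and L as the same residue "J" and tolerating a given number of mismatches; the second converts the position within the protein into genomic blocks using the coding exons of the transcript.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 1-based inclusive genomic start/end of a coding exon


def find_peptide(protein: str, peptide: str, max_mismatches: int = 0) -> List[int]:
    """Return 0-based protein positions where the peptide matches.

    I and L are both rewritten to J before comparison (illustrative naive scan).
    """
    to_j = str.maketrans("IL", "JJ")
    prot, pep = protein.translate(to_j), peptide.translate(to_j)
    hits = []
    for i in range(len(prot) - len(pep) + 1):
        window = prot[i:i + len(pep)]
        mismatches = sum(a != b for a, b in zip(window, pep))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def peptide_to_genomic_blocks(
    cds_exons: List[Exon], strand: str, pep_start: int, pep_len: int
) -> List[Exon]:
    """Map a peptide (0-based start in the protein) to genomic blocks.

    Assumption: the first coding exon starts in phase 0.
    """
    # Walk exons in transcription order (descending coordinates on '-').
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    # Peptide span in coding-sequence nucleotides, 0-based half-open.
    nt_start = pep_start * 3
    nt_end = nt_start + pep_len * 3
    blocks = []
    offset = 0  # coding nucleotides consumed by previous exons
    for g_start, g_end in ordered:
        length = g_end - g_start + 1
        lo = max(nt_start, offset)
        hi = min(nt_end, offset + length)
        if lo < hi:  # peptide overlaps this exon
            if strand == "+":
                blocks.append((g_start + lo - offset, g_start + hi - offset - 1))
            else:
                blocks.append((g_end - (hi - offset) + 1, g_end - (lo - offset)))
        offset += length
    return sorted(blocks)
```

A peptide that spans an exon boundary yields one block per exon, which is what allows a genome browser to draw it as a spliced feature. The coordinates used to try the functions out can be arbitrary; no values from the paper are implied.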
Figure 2. Visualization in the Integrative Genomics Viewer of Different PoGo Output Formats for the Peptide IADPEHDHTGFLTEYVATR within the MAPK3 Gene
Genomic coordinates are shown at the top as the x axis. GENCODE (v20) annotations of transcripts are indicated in blue.
(A) In addition to the genomic location of the peptide, the GTF format also holds other information, such as the gene name and gene identifier, while the BED output visualizes uniqueness of the mapping across the genome. Here, the red color indicates unique mapping to a single transcript of MAPK3.
(B) Genomic loci of post-translational modifications within a peptide (here, phosphorylation, identified by brackets in the sequence) are depicted by thick blocks spanning from the first to the last modification site. The red color in this output format indicates the presence of phosphorylation (see also Table S1).
(C) View of log2-fold changes mapped for the example peptide to the genomic location across 69 ovarian cancer samples (y axis). High values are shown in red while blue indicates low log2 ratios (see also Figure S7).
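
To make the BED output in panel (A) more concrete, the following sketch writes a single BED12 line for a peptide mapped to one or more genomic blocks. The function name `bed12_line`, the score of 1000, and the grey color for non-unique mappings are assumptions made here; only the use of red for a mapping unique to a single transcript is taken from the caption. The exact fields PoGo emits may differ.

```python
from typing import List, Tuple


def bed12_line(
    chrom: str,
    blocks: List[Tuple[int, int]],  # 1-based inclusive genomic blocks
    strand: str,
    name: str,
    unique: bool,
) -> str:
    """Format one tab-separated BED12 line for a mapped peptide."""
    blocks = sorted(blocks)
    chrom_start = blocks[0][0] - 1  # BED uses 0-based, half-open coordinates
    chrom_end = blocks[-1][1]
    # Red for a unique mapping (as in the caption); grey is a placeholder.
    rgb = "255,0,0" if unique else "128,128,128"
    sizes = ",".join(str(end - start + 1) for start, end in blocks)
    starts = ",".join(str(start - 1 - chrom_start) for start, _ in blocks)
    fields = [
        chrom, chrom_start, chrom_end, name, 1000, strand,
        chrom_start, chrom_end, rgb, len(blocks), sizes, starts,
    ]
    return "\t".join(str(f) for f in fields)
```

The `itemRgb` column (ninth field) is what a genome browser reads to color the feature, so uniqueness can be encoded without any additional track.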
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Ensembl Human Genome Primary Assembly, release 76 | ||
| GENCODE, release 20 | ||
| Reanalysis of draft human proteome maps | PRIDE: | |
| Phosphopeptide summary, Ovarian Cancer, CPTAC, Phase 2 | ||
| Draft human proteome maps track hubs | This paper | |
| PoGo website | This paper | |
| PoGo | This paper | |
| PoGo GUI | This paper | |
| FileConverter | This paper | |
| Track-Hub Generator | This paper | |
| Perl 5.16.2 | The Perl Programming Language | |
| R 3.3.1 | The R project | |
| GNU C++ compiler (gcc) 6.2.0 | GNU Compiler Collection | |
| Microsoft C/C++ Optimizing Compiler 18.00.31101 | Visual Studio Express 2013 | |
| PGx | ||
| iPiG | ||
| fetchChromSizes.sh | UCSC Genome Bioinformatics | |
| bedToBigBed | UCSC Genome Bioinformatics | |
| Integrative Genomics Viewer (IGV) v2.3.68 | ||
| UCSC Genome Browser | UCSC Genome Bioinformatics | |
| Ensembl Genome Browser | Ensembl Archives | |
| BioDalliance Genome Browser | GENCODE | |