Husen M Umer, Enrique Audain, Yafeng Zhu, Julianus Pfeuffer, Timo Sachsenberg, Janne Lehtiö, Rui Branca, Yasset Perez-Riverol.
Abstract
SUMMARY: We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools generate protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs, and other non-canonical transcripts, such as those produced by alternative splicing events. They also support exonic out-of-frame translation of otherwise canonical protein-coding mRNAs. Moreover, the tools generate variant protein sequences from multiple sources of genomic variants, including COSMIC, cBioPortal, gnomAD, and mutations detected by sequencing patient samples. pypgatk and pgdb provide multiple functions for database handling, including optimized target/decoy generation with the DecoyPyrat algorithm. Finally, we reanalyzed six public datasets in PRIDE by generating cell-type-specific databases for 65 cell lines with the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides that amount to more than 5% of the total number of peptides identified.
AVAILABILITY: The software is freely available. pypgatk: https://github.com/bigbio/py-pgatk/; pgdb: https://nf-co.re/pgdb.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
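The three-frame translation mentioned in the summary can be illustrated with a minimal pure-Python sketch. This is not the pypgatk implementation: the function names `translate` and `three_frame_translation` are hypothetical, and the sketch assumes the standard genetic code, forward-strand reading frames 0, 1 and 2, and `X` for any codon containing an ambiguous base.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str, frame: int = 0) -> str:
    """Translate a nucleotide sequence starting at offset `frame` (0, 1 or 2).

    Stop codons are reported as '*'; codons with unknown bases become 'X'.
    Trailing bases that do not form a complete codon are ignored.
    """
    seq = sequence.upper().replace("U", "T")
    residues = []
    for start in range(frame, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[start:start + 3], "X"))
    return "".join(residues)


def three_frame_translation(sequence: str) -> list[str]:
    """Return the translations of the three forward reading frames."""
    return [translate(sequence, frame) for frame in range(3)]


if __name__ == "__main__":
    # Toy transcript, chosen only for illustration.
    print(three_frame_translation("ATGGCCTGA"))  # ['MA*', 'WP', 'GL']
```

In a database-building setting, each frame would typically be split at stop codons and short fragments discarded before the sequences are written to FASTA. Those steps are left out here because the abstract does not specify them.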
Year: 2021 PMID: 34904638 PMCID: PMC8825679 DOI: 10.1093/bioinformatics/btab838
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (a) pypgatk and pgdb components used to generate ENSEMBL-based proteogenomics databases. (b) Reanalyzed datasets (four human and two mouse) and the number of canonical, non-canonical, variant and mutated peptides identified using cell-type-specific proteogenomics databases.
Number of peptides identified per class
| Species | Class | #PSMs | #Peptide sequences | #Novel peptides |
|---|---|---|---|---|
| Homo sapiens | Canonical | 4 125 497 | 322 967 | NA |
| Homo sapiens | Non-canonical | 315 085 | 74 001 | 43 501 |
| Homo sapiens | Mutated | 16 518 | 5544 | 786 |
| Mus musculus | Canonical | 1 159 049 | 105 338 | NA |
| Mus musculus | Variant | 4630 | 1928 | 374 |
| Mus musculus | Mutated | 2883 | 913 | 166 |
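To relate the table to the abstract's statement that non-canonical or cryptic peptides exceed 5% of all identified peptides, the following sketch computes, per species, the share of peptide sequences that fall outside the canonical class. The counts are copied from the rows listed above. The choice of denominator (all peptide sequences listed for a species) and the helper name `non_canonical_share` are assumptions, so the authors' own calculation may differ.

```python
# Peptide-sequence counts copied from the "#Peptide sequences" column above.
PEPTIDE_SEQUENCES = {
    "Homo sapiens": {"Canonical": 322_967, "Non-canonical": 74_001, "Mutated": 5_544},
    "Mus musculus": {"Canonical": 105_338, "Variant": 1_928, "Mutated": 913},
}


def non_canonical_share(counts: dict[str, int]) -> float:
    """Fraction of peptide sequences not in the 'Canonical' class.

    Assumption: the denominator is the sum of all classes listed for the species.
    """
    total = sum(counts.values())
    return (total - counts["Canonical"]) / total


if __name__ == "__main__":
    for species, counts in PEPTIDE_SEQUENCES.items():
        print(f"{species}: {non_canonical_share(counts):.1%}")
```

Under this assumption, both the human and the pooled figures are above the 5% threshold quoted in the abstract, while the mouse figure alone is below it.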