Ekaterina Poverennaya, Olga Kiseleva, Ekaterina Ilgisonis, Svetlana Novikova, Arthur Kopylov, Yuri Ivanov, Alexei Kononikhin, Mikhail Gorshkov, Nikolay Kushlinskii, Alexander Archakov, Elena Ponomarenko.
Abstract
Despite the direct and indirect efforts of the proteomic community, the fraction of blind spots on the protein map is still significant. Almost 11% of human genes encode missing proteins, whose existence is still in doubt. Proteomics appears to have reached a stage at which the identification of every novel protein demands more attention and curiosity, including an expansion to unusual types of biomaterials and/or conditions. It seems that we have exhausted the current conventional approaches to the discovery of missing proteins and may need to investigate alternatives. Here, we present an approach to deciphering missing proteins based on non-standard methodological solutions and encompassing diverse MS/MS data obtained for rare types of biological samples by members of the Russian proteomic community over the last five years. These data were re-analyzed in a uniform manner by three search engines that are part of the SearchGUI package. The study resulted in the identification of two missing and five uncertain proteins detected with two peptides. Moreover, 149 proteins were detected with a single proteotypic peptide. Finally, we analyzed gene expression levels to suggest feasible targets for further validation of the missing and uncertain protein observations, validation that would fully meet the requirements of the international consortium. The MS data are available on the ProteomeXchange platform (PXD014300).
Keywords: Chromosome-Centric Human Proteome Project (C-HPP); human proteome; mass spectrometry; missing proteins; neXtProt; proteotypic peptide; uncertain proteins
Year: 2020 PMID: 32456206 PMCID: PMC7356824 DOI: 10.3390/proteomes8020012
Source DB: PubMed Journal: Proteomes ISSN: 2227-7382
Figure 1. Dynamics of the number of entries in neXtProt (2011–2019): (a) the blue color indicates the total number of entries (number of protein-coding genes); (b) the red color indicates the number of missing-protein entries (PE2+PE3+PE4); and (c) the green, purple and blue colors indicate the number of uncertain (PE5), new and deleted entries, respectively.
Distribution of the PE2, PE3, PE4, and PE5 proteins among TCGA entries.
| Category of biomaterial where the gene of interest was observed | Total number | Missing proteins (PE2) | Missing proteins (PE3) | Missing proteins (PE4) | Uncertain proteins (PE5) |
|---|---|---|---|---|---|
| All biomaterials | 9542 | 311 | 10 | 6 | 79 |
| Part of biomaterials | 3074 | 429 | 41 | 8 | 56 |
| Normal or tumor biomaterials * | 161 | 46 | 15 | 0 | 3 |
| Only normal | 274 | 55 | 36 | 3 | 4 |
| Only cancer | 58 | 12 | 8 | 0 | 4 |
| Total | 13,109 | 853 | 110 | 17 | 146 |
* This category included genes specifically observed in the normal or tumor states of different types of biomaterial, but not in both states of the chosen tissue.
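The totals row of the table can be cross-checked with a few lines of Python. The sketch below is only an illustration built from the counts transcribed above; the helper names `column_sums` and `missing_share` are hypothetical, and treating "Total number" as the denominator for the share of missing proteins (PE2+PE3+PE4) in each category is an assumption, not a calculation reported by the authors.

```python
# Counts transcribed from the table above, per category:
# (total number, PE2, PE3, PE4, PE5)
rows = {
    "All biomaterials": (9542, 311, 10, 6, 79),
    "Part of biomaterials": (3074, 429, 41, 8, 56),
    "Normal or tumor biomaterials": (161, 46, 15, 0, 3),
    "Only normal": (274, 55, 36, 3, 4),
    "Only cancer": (58, 12, 8, 0, 4),
}

# The "Total" row as reported in the table.
reported_total = (13109, 853, 110, 17, 146)


def column_sums(table):
    """Sum each column across all category rows."""
    return tuple(sum(column) for column in zip(*table.values()))


def missing_share(row):
    """Fraction of a category's genes that encode missing proteins.

    Assumption: 'Total number' counts every gene in the category,
    so it can serve as the denominator for PE2 + PE3 + PE4.
    """
    total, pe2, pe3, pe4, _pe5 = row
    return (pe2 + pe3 + pe4) / total


if __name__ == "__main__":
    print("Totals consistent:", column_sums(rows) == reported_total)
    for name, row in rows.items():
        print(f"{name}: {missing_share(row):.1%} missing")
```

Under that assumption, the per-category shares make it easier to see in which expression categories missing proteins are over-represented relative to the table as a whole.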
Figure 2. Venn diagrams: intersection of proteins cleaved by different proteases with (a) no proteotypic peptides at all, and (b) one unique peptide. (c) Histograms of the frequencies of the detection of proteotypic peptides, according to GPMdb. The "no peptides" group corresponds to proteins without even theoretically unique peptides, "0" means that there is no experimental evidence of theoretical proteotypic peptides, and other numbers (1, 5, 10, etc.) mean that this number of proteotypic peptides was detected in a number of cases, illustrated by the height of the corresponding column.
List of missing and uncertain proteins identified with two peptides.
| # | AC | Gene | Number of samples | Unique detectable tryptic peptides (theoretical) | Unique detectable tryptic peptides (observed) | Unique detectable tryptic peptides (observed) |
|---|---|---|---|---|---|---|
| Missing proteins | ||||||
| 1 | P22532 | SPRR2D | 10 | 1 | 1 | 1/0 |
| 2 | A0A087WSY6 | IGKV3D-15 | 3 | 1 | 1 | 1/0 |
| Uncertain proteins | ||||||
| 3 | Q58FF3 | HSP90B2P | 1 | 10 | 3 | 1/2 |
| 4 | Q58FG1 | HSP90AA4P | 1 | 14 | 13 | 7/5 |
| 5 | Q9BYX7 | POTEKP | 3 | 8 | 8 | 5/5 |
| 6 | Q9BZK3 | NACA4P | 1 | 5 | 5 | 4/4 |
| 7 | Q9H853 | TUBA4B | 35 | 9 | 4 | 2/4 |
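As a rough illustration of how the peptide counts in this table could be compared, the sketch below ranks the seven proteins by the ratio of observed to theoretically detectable unique tryptic peptides. It uses the first "Observed" column only; the record does not say what distinguishes the two "Observed" columns, so that choice is an assumption, and the `Protein` type and `coverage` helper are hypothetical names introduced here.

```python
from collections import namedtuple

# One record per table row; 'observed' is taken from the first "Observed" column.
Protein = namedtuple("Protein", "accession gene theoretical observed")

proteins = [
    Protein("P22532", "SPRR2D", 1, 1),
    Protein("A0A087WSY6", "IGKV3D-15", 1, 1),
    Protein("Q58FF3", "HSP90B2P", 10, 3),
    Protein("Q58FG1", "HSP90AA4P", 14, 13),
    Protein("Q9BYX7", "POTEKP", 8, 8),
    Protein("Q9BZK3", "NACA4P", 5, 5),
    Protein("Q9H853", "TUBA4B", 9, 4),
]


def coverage(protein):
    """Observed unique tryptic peptides as a fraction of the theoretical count."""
    return protein.observed / protein.theoretical


# Lowest ratio first: proteins with the most theoretical peptides still unobserved.
ranked = sorted(proteins, key=coverage)

if __name__ == "__main__":
    for p in ranked:
        print(f"{p.gene} ({p.accession}): {p.observed}/{p.theoretical} = {coverage(p):.2f}")
```

A low ratio only indicates that several theoretical peptides are not listed as observed in that column; it says nothing by itself about the strength of the evidence for the protein.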
Figure 3. Mass spectra of the proteotypic peptide characteristic of the Q8NG97 protein, detected in four biosamples for the first time.
Figure 4. Mass spectra of the detected proteotypic peptides for (a) Q96HZ4-2 and (b) Q96HZ4-3.