Nikolaus Fortelny, Paul Pavlidis, Christopher M Overall.
Abstract
Almost all regulatory processes in biology ultimately lead to or originate from modifications of protein function. However, it is unclear to what extent each mechanism of regulation actually affects proteins and thus phenotypes. We assessed the extent of N-terminal protein truncation in a global analysis of N-terminomics data and found that most proteins have N-terminally truncated proteoforms. Because N-terminomics analyses do not identify the process that generated the identified N-termini, we compared the identified termini with the three events that generate N-termini: protein cleavage, alternative translation, and alternative splicing. Of these, we sought to identify the most likely cause of N-terminal protein truncations in the human proteome. We found that protease cleavage and alternative protein translation are the likely causes of most shortened proteoforms. However, the vast majority (about 90%) of N-termini remain unexplained by any of these processes identified to date, revealing large gaps in our knowledge of protein termini and their genesis. Further analysis and annotation of terminomics data are required, to which end we have created the TopFIND database, a major systematic annotation effort for protein termini. We outline the new features in version 3.0 of the updated database and the new bioinformatics tools available, and encourage submission of generated data to fill current knowledge gaps.
Keywords: Alternative translation; Protease cleavage; Protease web; Systems biology; TAILS; Terminomics; TopFIND
Year: 2015 PMID: 26010509 PMCID: PMC4736679 DOI: 10.1002/pmic.201500043
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984
Figure 1. (A) Overlap between observed (blue) and inferred human N‐termini in TopFIND. Inferred N‐termini include those that are predicted from knowledge of sites of cleavage, alternative splicing, or alternative translation. N‐termini are counted by position, i.e., an N‐terminus identified multiple times in different experiments at the same position is counted only once. (B) Position of the 7409 observed and internal N‐termini clusters in proteins relative to the full protein length. (C) Overlap between inferred and observed (blue) N‐termini clusters in TopFIND. The arrow points to the systematic breakdown of the 933 observed and inferred N‐termini clusters to the biological processes generating those N‐termini.
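The counting rule in the Figure 1 caption (an N‐terminus seen at the same position in several experiments is counted once) can be illustrated with a minimal sketch. The protein labels, positions, and experiment names below are hypothetical placeholders, not TopFIND records, and the function is an illustration of the rule rather than the authors' actual pipeline.

```python
# Hypothetical observations: (protein label, N-terminus position, experiment label).
# None of these values come from TopFIND; they only illustrate the counting rule.
observations = [
    ("protein_A", 1, "experiment_1"),
    ("protein_A", 45, "experiment_1"),
    ("protein_A", 45, "experiment_2"),  # same position observed again elsewhere
    ("protein_B", 12, "experiment_2"),
]


def count_termini_by_position(obs):
    """Count each (protein, position) pair once, however often it was observed."""
    unique_termini = {(protein, position) for protein, position, _ in obs}
    return len(unique_termini)


unique_count = count_termini_by_position(observations)
print(unique_count)
```

Using a set of (protein, position) pairs discards the experiment label, so repeated identifications collapse to a single terminus.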
Internal N‐termini (position > 3) observed and explained in the individual datasets used in this study
| Reference | Source | Method | Alternative translation | Alternative splicing | Cleavage | Total |
|---|---|---|---|---|---|---|
| Crawford 2013 | Cell lines | Subtiligase | 9 | 35 | 426 | 7402 |
| Lange 2014 | Erythrocytes | N‐TAILS | 1 | 5 | 16 | 763 |
| Mahrus 2008 | Cell lines | Subtiligase | 4 | 6 | 103 | 1228 |
| Van Damme 2010 | Cell lines | COFRADIC | 0 | 0 | 0 | 0 |
| Wildes 2010 | Blood plasma | Subtiligase | 0 | 0 | 41 | 532 |
| Bienvenut 2012 | Cell lines | SCX | 1 | 2 | 0 | 38 |
Figure 2. Fraction of explained internal N‐termini in the individual datasets analyzed and the processes identified in each dataset.
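As a rough illustration of how the fractions summarized in Figure 2 relate to the table above, the sketch below divides the explained counts by the total for each dataset. It assumes that each explained N‐terminus is attributed to exactly one process, which the table does not confirm; if a terminus can be explained by more than one process, the summed value is an upper bound. The variable and function names are illustrative only.

```python
# Counts copied from the table of internal N-termini (position > 3):
# (alternative translation, alternative splicing, cleavage, total).
datasets = {
    "Crawford 2013": (9, 35, 426, 7402),
    "Lange 2014": (1, 5, 16, 763),
    "Mahrus 2008": (4, 6, 103, 1228),
    "Van Damme 2010": (0, 0, 0, 0),
    "Wildes 2010": (0, 0, 41, 532),
    "Bienvenut 2012": (1, 2, 0, 38),
}


def explained_fraction(translation, splicing, cleavage, total):
    """Return explained/total, or None when the dataset has no internal N-termini.

    Assumption: the three process counts do not overlap, so they can be summed.
    """
    if total == 0:
        return None
    return (translation + splicing + cleavage) / total


fractions = {name: explained_fraction(*counts) for name, counts in datasets.items()}

for name, fraction in fractions.items():
    label = "n/a" if fraction is None else f"{fraction:.1%}"
    print(f"{name}: {label}")
```

Returning `None` for a dataset with a total of zero avoids a division by zero and keeps that row distinguishable from a dataset in which nothing was explained.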