Richard D LeDuc, Eric W Deutsch, Pierre-Alain Binz, Ryan T Fellers, Anthony J Cesnik, Joshua A Klein, Tim Van Den Bossche, Ralf Gabriels, Arshika Yalavarthi, Yasset Perez-Riverol, Jeremy Carver, Wout Bittremieux, Shin Kawano, Benjamin Pullman, Nuno Bandeira, Neil L Kelleher, Paul M Thomas, Juan Antonio Vizcaíno.
Abstract
It is important for the proteomics community to have a standardized manner to represent all possible variations of a protein or peptide primary sequence, including natural, chemically induced, and artifactual modifications. The Human Proteome Organization Proteomics Standards Initiative in collaboration with several members of the Consortium for Top-Down Proteomics (CTDP) has developed a standard notation called ProForma 2.0, which is a substantial extension of the original ProForma notation developed by the CTDP. ProForma 2.0 aims to unify the representation of proteoforms and peptidoforms. ProForma 2.0 supports use cases needed for bottom-up and middle-/top-down proteomics approaches and allows the encoding of highly modified proteins and peptides using a human- and machine-readable string. ProForma 2.0 can be used to represent protein modifications in a specified or ambiguous location, designated by mass shifts, chemical formulas, or controlled vocabulary terms, including cross-links (natural and chemical) and atomic isotopes. Notational conventions are based on public controlled vocabularies and ontologies. The most up-to-date full specification document and information about software implementations are available at http://psidev.info/proforma.
Keywords: FAIR; ProForma; data standards; file formats; mass spectrometry; peptidoform; proteoform; top-down proteomics
Year: 2022 PMID: 35290070 PMCID: PMC7612572 DOI: 10.1021/acs.jproteome.1c00771
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. Representation of the same N-terminal segment (sharing the same amino acid sequence) of two hypothetical proteoforms using ProForma 2.0: the unmodified proteoform (top part of the figure) and one containing different protein modifications (lower part of the figure). The text coloration is only included here to improve clarity. The purple tag encodes the existence of an unlocalized phosphorylation event somewhere on the proteoform. The keyword “Phospho” is from Unimod and can be used without additional clarification. The brown tag is a reference to an N-terminal modification using the term “Acetyl” from Unimod. A 174.3-Da mass shift on the arginine is also indicated.
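The two prefix tags described in the caption (an unlocalized modification and an N-terminal modification) can be separated from the residue sequence with a few lines of Python. The following is only a minimal sketch, not a reference implementation: the function name `split_prefix_tags` is hypothetical, the code assumes that an unlocalized group (`[...]?`) comes before an N-terminal tag (`[...]-`), and it ignores every other ProForma 2.0 feature. The combined string in the last usage line is an illustrative assumption and is not the string shown in the figure.

```python
import re

# One or more bracketed tags followed by "?" (unlocalized modifications).
_UNLOCALIZED = re.compile(r"((?:\[[^\[\]]*\])+)\?")
# A single bracketed tag followed by "-" (N-terminal modification).
_N_TERM = re.compile(r"\[([^\[\]]*)\]-")
# The content of any single bracketed tag.
_TAG = re.compile(r"\[([^\[\]]*)\]")


def split_prefix_tags(text):
    """Return (unlocalized_tags, n_terminal_tag, remaining_sequence).

    Hypothetical helper: handles only the two prefix forms named above.
    """
    unlocalized, n_term, pos = [], None, 0
    match = _UNLOCALIZED.match(text, pos)
    if match:
        unlocalized = _TAG.findall(match.group(1))
        pos = match.end()
    match = _N_TERM.match(text, pos)
    if match:
        n_term = match.group(1)
        pos = match.end()
    return unlocalized, n_term, text[pos:]


# Strings taken from the examples table of the paper.
print(split_prefix_tags("[Phospho]?EMEVTSESPEK"))
print(split_prefix_tags("[iTRAQ4plex]-EMEVNESPEK"))
# Illustrative combination (an assumption, not the figure's actual string).
print(split_prefix_tags("[Phospho]?[Acetyl]-MTR[+174.3]K"))
```

The remaining sequence is returned untouched, so residue-level tags such as the mass shift on the arginine stay in place for a later parsing step.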
Comparison of the supported features of ProForma 1.0 and 2.0.
| Feature | ProForma 1.0 | ProForma 2.0 |
|---|---|---|
| Protein modifications designated by CV/ontology names and accession numbers | ✓ | ✓ |
| Representation of glycan composition | ✓ | ✓ |
|  | ✓ | ✓ |
| Delta mass notation for modifications | ✓ | ✓ |
| Information tag | ✓ | ✓ |
| Joint representation of experimental data and interpretation | ✓ | ✓ |
|
| Limited | ✓ |
|
| Limited | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
|
| X | ✓ |
Examples of ProForma 2.0 notations demonstrating the various features of the specification. For each feature listed in the first column, there is a representative example in the second column showing the encoding. The “Section” column provides the location in the PSI specification document where the feature is explained in detail (Supplementary Document 1).
| Feature | Example | Section |
|---|---|---|
| CV/ontology modification names | EM[Oxidation]EVEES[Phospho]PEK | 4.2.1 |
| CV/ontology protein modification accession numbers | EM[MOD:00719]EVEES[MOD:00046]PEK | 4.2.2 |
| Cross-link within the same peptide | EMEVTK[XLMOD:02001#XL1]SESPEK[#XL1] | 4.2.3.1 |
| Inter-chain cross-links | SEK[XLMOD:02001#XL1]UENCE//EMEVTK[#XL1]SESPEK | 4.2.3.2 |
| Disulfide linkages | EVTSEKC[MOD:00034#XL1]LEMSC[#XL1]EFD | 4.2.3.3 |
| Branched peptides | ETFGD[MOD:00093#BRANCH]//R[#BRANCH]ATER | 4.2.4 |
| Glycans using the GNO ontology as CV | NEEYN[GNO:G59626AS]K | 4.2.5 |
| Delta mass notation for modifications | EM[+15.9949]EVEES[+79.9663]PEK | 4.2.6 |
| Specifying a gap of known mass | RTAAX[+367.0537]WT | 4.2.7 |
| Support for elemental formulas | SEQUEN[Formula:C12H20O2]CE | 4.2.8 |
| Glycan composition | SEQUEN[Glycan:HexNAc1Hex2]CE | 4.2.9 |
|  | [iTRAQ4plex]-EMEVNESPEK | 4.3.1 |
| Labile modifications | {Glycan:Hex}EMEVNESPEK | 4.3.2 |
| Unknown modification position | [Phospho]?EMEVTSESPEK | 4.4.1 |
| Possible set of modification positions | EMEVT[#g1]S[#g1]ES[Phospho#g1]PEK | 4.4.2 |
| Ranges of positions for the modifications | PROT(ESFRMS)[+19.0523]ISK | 4.4.3 |
| Modification position preference and localization scores | EMEVT[#g1(0.01)]S[#g1(0.09)]ES[Phospho#g1(0.90)]PEK | 4.4.4 |
| Scoring for ranges of positions for a modification | PROT(ESFRMS)[+19.0523#g1(0.01)]ISK[#g1(0.99)] | 4.4.5 |
| Isotopes | <13C>ATPEILTVNSIGQLK | 4.6.1 |
| Fixed protein modifications | <[MOD:01090]@C>ATPEILTCNSIGCLK | 4.6.2 |
| Ambiguity in the order of the amino acid sequence | (?DQ)NGTWEMESNENFEGYMK | 4.7 |
| Information tag | ELVIS[Phospho\|INFO:newly discovered]K | 4.8 |
| Joint representation of experimental data and interpretation | ELVIS[Phospho\|Obs:+79.978]K | 4.9 |
| Representation of ion charges | EMEVEESPEK/2 | 7.1 |
| Multiple peptidoforms assigned to chimeric spectra | EMEVEESPEK/2+ELVISLIVER/3 | 7.1 |
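To make the simplest rows of the table concrete, the sketch below reads residue-level tags, sums delta-mass tags, and splits the charge and chimeric notations. It is a minimal illustration under stated assumptions, not the specification's grammar: the function names (`parse_residue_tags`, `total_delta_mass`, `split_chimeric`) are hypothetical, only single-letter residues followed by `[...]` tags are accepted, and prefixes, ranges, global modifications, and cross-linked chains (`//`) are deliberately out of scope.

```python
import re

# One uppercase residue letter followed by zero or more bracketed tags.
_RESIDUE = re.compile(r"([A-Z])((?:\[[^\[\]]*\])*)")
_TAG = re.compile(r"\[([^\[\]]*)\]")
_DELTA = re.compile(r"[+-]\d+(?:\.\d+)?")
_CHARGED = re.compile(r"(.*?)/(\d+)")


def parse_residue_tags(text):
    """Split a simple ProForma string into (residue, [tags]) pairs."""
    pos, parsed = 0, []
    while pos < len(text):
        match = _RESIDUE.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {pos}: {text[pos:]!r}")
        parsed.append((match.group(1), _TAG.findall(match.group(2))))
        pos = match.end()
    return parsed


def total_delta_mass(parsed):
    """Sum every tag written as a signed number (delta mass notation)."""
    return sum(
        float(tag)
        for _, tags in parsed
        for tag in tags
        if _DELTA.fullmatch(tag)
    )


def split_chimeric(text):
    """Split 'PEPTIDE/2+OTHER/3' into [(peptidoform, charge_or_None), ...].

    A '+' inside square brackets belongs to a mass tag and is not a separator.
    """
    parts, depth, start = [], 0, 0
    for index, char in enumerate(text):
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        elif char == "+" and depth == 0:
            parts.append(text[start:index])
            start = index + 1
    parts.append(text[start:])
    result = []
    for part in parts:
        match = _CHARGED.fullmatch(part)
        result.append((match.group(1), int(match.group(2))) if match else (part, None))
    return result


print(parse_residue_tags("EM[Oxidation]EVEES[Phospho]PEK"))
print(round(total_delta_mass(parse_residue_tags("EM[+15.9949]EVEES[+79.9663]PEK")), 4))
print(split_chimeric("EMEVEESPEK/2+ELVISLIVER/3"))
```

Tracking bracket depth, rather than splitting on every `+`, keeps delta-mass tags such as `[+15.9949]` intact when several peptidoforms are joined in one string.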