Jeonghun Yeom1,2, Shinyeong Ju1,3, YunJin Choi4, Eunok Paek4, Cheolju Lee5,6.
Abstract
Various forms of protein (proteoforms) are gene
Year: 2017 PMID: 28747677 PMCID: PMC5529458 DOI: 10.1038/s41598-017-06314-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Outline of the Nrich method and the N-terminome discovery scheme. Proteins are labeled with D6-acetic anhydride or propionic anhydride to distinguish endogenous N-terminal acetylation from artificial N-terminal acetylation/propionylation. After Filter Aided Sample Preparation (FASP) and digestion with trypsin or GluC endoprotease, internal peptides are depleted using amine-reactive NHS beads. The enriched N-terminal peptides (red circle) are divided into 6 fractions by high-pH reversed-phase fractionation, and all 6 fractions are subjected to LC-MS/MS analysis. The tandem MS spectra are initially searched against the UniProtKB database with the MS-GF+ and Comet search engines. Unidentified spectra are then searched for more diverse modifications using the modification-specific search engine MODi. Spectra that remain unidentified after the UniProtKB search with all three search engines are put through the same search workflow, with the conventional database replaced by a customized novel database, NtermDB. All identifications from the UniProtKB database are named “N-termini”, while the novel identifications from NtermDB are named “Novel N-termini.”
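The cascaded search order described in the Figure 1 caption can be sketched as follows. This is only an illustration of the control flow, not the authors' pipeline: the `cascade_search` function, the `SearchFn` callable type, and the choice to take the union of the two standard engines' identifications are all assumptions, and the engines are represented by placeholder callables rather than real MS-GF+, Comet, or MODi invocations.

```python
from typing import Callable, Dict, Iterable, Sequence, Set

# Hypothetical interface: a search takes spectrum IDs and a database name
# and returns the subset of spectrum IDs it could identify.
SearchFn = Callable[[Set[str], str], Set[str]]


def cascade_search(
    spectra: Iterable[str],
    standard_engines: Sequence[SearchFn],
    modification_engine: SearchFn,
    databases: Sequence[str] = ("UniProtKB", "NtermDB"),
) -> Dict[str, Set[str]]:
    """Run the same two-step search against each database in turn.

    Only spectra left unidentified by one database are passed to the next.
    """
    remaining = set(spectra)
    results: Dict[str, Set[str]] = {}
    for db in databases:
        # Step 1: standard engines (assumption: identifications are pooled).
        identified: Set[str] = set()
        for engine in standard_engines:
            identified |= engine(remaining, db)
        # Step 2: modification-specific search on what step 1 missed.
        identified |= modification_engine(remaining - identified, db)
        results[db] = identified
        # Carry only the still-unidentified spectra to the next database.
        remaining -= identified
    return results
```

In this sketch, the identifications returned under `"UniProtKB"` correspond to the caption's “N-termini” and those under `"NtermDB"` to the “Novel N-termini.”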
Figure 2. Properties of the discovered N-terminome. (A) Number of PSMs identified in each set of Nrich experiments. In all cases, the trypsin-treated N-terminome has more PSMs than the GluC-treated N-terminome. Among the N-blocking reagents, propionic anhydride (PA) showed an efficiency more than 10% higher than that of D6-acetic anhydride (D6). In terms of PSM counts, however, D6-treated samples had higher values than PA-treated samples. (B) Proportions of PSMs for endogenously acetylated N-termini (blue) and endogenously free N-termini (orange). About 44% of PSMs corresponded to acetylated N-termini. (C) Venn diagram of discovered protein N-termini according to experimental setup. (D) Acetylation status of the discovered protein N-termini. The degree of acetylation was calculated from the number of PSMs. (E) Protein N-termini discovered in common between each pair of experimental setups, and the correlation of their degree of acetylation.
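The caption states only that the degree of acetylation is based on PSM counts. A formula consistent with the percentages reported in the NAT substrate table is given below, where the symbols $n_{\mathrm{Ac}}$ and $n_{\mathrm{free}}$ (PSM counts for the acetylated and free forms of an N-terminus) are notation introduced here, not taken from the source:

$$
\text{Acetylated}\,(\%) = 100 \times \frac{n_{\mathrm{Ac}}}{n_{\mathrm{Ac}} + n_{\mathrm{free}}}
$$

For example, 52358 acetylated and 9212 free PSMs give $100 \times 52358 / 61570 \approx 85.0\%$, which matches the NatA row for substrate A.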
Figure 3. Classification of discovered N-termini and their positions along the protein sequences. (A) Classification of protein N-termini. dbTIS: UniProtKB-annotated translation initiation site; Non-terminal residue: protein N-termini starting at the first residue of the UniProtKB entry when that residue is not methionine; Propeptide/Signal/Transit: protein N-termini arising after removal of a pro-, signal-, or transit-peptide; putative aTIS: putative alternative translation initiation site. (B) Number of protein N-termini identified according to their positions along the protein sequences.
Figure 4. Features of protein N-termini. (A) Distribution of acetylated and free N-termini according to the type of N-terminus. The data are presented as percentages, and the exact numbers of protein N-termini are shown within the bars. (B) Amino acid frequency at the second residue of acetylated or free protein N-termini. ‘The second residue’ means the residue next to the initiator methionine. (C) Amino acid frequency in the flanking regions of signal peptides and transit peptides. The protein sequence logos were generated using the iceLogo software package with correction for natural amino acid abundance. The red arrows indicate observed cleavage sites.
Classification of dbTIS¹ protein N-termini according to the substrate type of N-α-terminal acetyltransferases in human (NATs).
| Type of NAT | Substrates | # of protein N-termini, acetylated | # of protein N-termini, free | # of PSMs², acetylated | # of PSMs², free | Acetylated (%) |
|---|---|---|---|---|---|---|
| NatA | A | 875 | 67 | 52358 | 9212 | 85.0 |
| | C | 7 | 1 | 935 | 20 | 97.9 |
| | G | 49 | 60 | 932 | 6441 | 12.6 |
| | S | 299 | 11 | 12967 | 951 | 93.2 |
| | T | 63 | 9 | 4375 | 1147 | 79.2 |
| | V | 34 | 63 | 1706 | 6570 | 20.6 |
| | D | 1 | 0 | 4942 | 257 | 95.1 |
| | E | 1 | 0 | 3861 | 301 | 92.8 |
| | Total | 1329 | 211 | 82076 | 24899 | 76.7 |
| NatB | MD | 138 | 1 | 9206 | 567 | 94.2 |
| | ME | 175 | 4 | 8877 | 811 | 91.6 |
| | MN | 78 | 5 | 2623 | 230 | 91.9 |
| | Total | 391 | 10 | 20706 | 1608 | 92.8 |
| NatD | S | 3 | 0 | 432 | 166 | 72.2 |
| NatC/E/F | MI | 15 | 6 | 1053 | 870 | 54.8 |
| | ML | 64 | 14 | 1525 | 2063 | 42.5 |
| | MF | 37 | 9 | 1351 | 293 | 82.2 |
| | MW | 6 | 4 | 40 | 103 | 28.0 |
| | MK | 41 | 49 | 762 | 8743 | 8.0 |
| | MA | 44 | 28 | 357 | 473 | 43.0 |
| | MM | 16 | 1 | 3051 | 145 | 95.5 |
| | MV | 40 | 16 | 3680 | 1599 | 69.7 |
| | Total | 263 | 127 | 11819 | 14289 | 45.3 |
| Others – iMet³ removed | E | 9 | 7 | 30 | 122 | 19.7 |
| | I | 1 | 0 | 1 | 0 | 100.0 |
| | K | 1 | 1 | 20 | 1 | 95.2 |
| | M | 7 | 5 | 755 | 146 | 83.8 |
| | N | 4 | 0 | 5 | 1 | 83.3 |
| | P | 41 | 129 | 4826 | 21029 | 18.7 |
| | Q | 1 | 3 | 1 | 3 | 25.0 |
| | D | 0 | 2 | 0 | 5 | 0 |
| | R | 0 | 1 | 0 | 5 | 0 |
| | Y | 0 | 1 | 0 | 1 | 0 |
| | Total | 64 | 149 | 5638 | 21313 | 20.9 |
| Others – iMet retained | MC | 0 | 1 | 0 | 3 | 0 |
| | MG | 8 | 4 | 27 | 92 | 22.7 |
| | MH | 6 | 2 | 245 | 125 | 66.2 |
| | MP | 19 | 10 | 620 | 954 | 39.4 |
| | MQ | 34 | 4 | 2013 | 3942 | 33.8 |
| | MR | 11 | 15 | 96 | 263 | 26.7 |
| | MS | 20 | 15 | 145 | 91 | 61.4 |
| | MT | 21 | 4 | 674 | 266 | 71.7 |
| | MY | 3 | 6 | 19 | 12 | 61.3 |
| | Total | 122 | 61 | 3839 | 5748 | 40.0 |

¹dbTIS: UniProtKB-annotated translation initiation sites; ²PSM: peptide spectrum match; ³iMet: initiator Met.
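The “Total” rows in the table appear to pool the raw counts of each group before computing the percentage, rather than averaging the per-substrate percentages. The sketch below checks this for the NatB group using the values from the table. The names `NATB_ROWS`, `pooled_total`, and `acetylated_percent` are illustrative, and returning 0 when there are no PSMs at all is an assumption.

```python
from typing import Dict, Tuple

# (acetylated N-termini, free N-termini, acetylated PSMs, free PSMs)
Counts = Tuple[int, int, int, int]

# NatB substrate rows copied from the table.
NATB_ROWS: Dict[str, Counts] = {
    "MD": (138, 1, 9206, 567),
    "ME": (175, 4, 8877, 811),
    "MN": (78, 5, 2623, 230),
}


def pooled_total(rows: Dict[str, Counts]) -> Counts:
    """Sum each count column across all substrates of one NAT group."""
    return tuple(sum(column) for column in zip(*rows.values()))


def acetylated_percent(acetylated_psms: int, free_psms: int) -> float:
    """Percentage of PSMs that are acetylated (0.0 if there are no PSMs)."""
    total = acetylated_psms + free_psms
    return 100.0 * acetylated_psms / total if total else 0.0


totals = pooled_total(NATB_ROWS)
print(totals, round(acetylated_percent(totals[2], totals[3]), 1))
```

Because the percentage is computed from pooled PSM counts, substrates with many PSMs dominate the group value.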
Figure 5. Putative alternative translational initiation sites. (A) iceLogo diagrams comparing amino acid occurrences at dbTIS and putative aTIS. The amino acid frequencies after any methionine (either iMet or internal Met) in the human Swiss-Prot database (release 2015. 1) were used for background correction. The sequences start immediately after the methionine. (B) Nucleotide sequences in the flanking region of the initiator methionine. The central ATG is the codon for the initiator methionine of dbTIS (left) and putative aTIS (right). (C) Design of NtermDB, which allows a search for novel protein N-termini within the upstream UTR. Orange blocks represent UTR regions, and green blocks represent coding sequence (CDS) regions. A novel protein N-terminus was assumed to start at a start codon (“ATG”) or a pseudo start codon in the same frame as the matching CDS. We chose the farthest upstream (pseudo) start site and translated the transcript model in silico. See methods for more details. (D) Codon usage in the identified novel N-termini. Nucleotide sequences corresponding to the first residue of the identified 5′-UTR peptides are presented. (E) Number of PSMs for acetylated or free N-terminal 5′-UTR peptides starting with non-start codons.
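The NtermDB design in panel (C) can be illustrated with a minimal sketch. It is not the authors' implementation (the caption defers details to the methods). The set of pseudo start codons, stopping the upstream walk at an in-frame stop codon, and translating the first codon literally are all assumptions made here, as are the function names.

```python
from itertools import product
from typing import Optional

# Standard genetic code, built in T/C/A/G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: the source does not list its pseudo start codons; these
# near-cognate codons are placeholders for illustration only.
START_CODONS = frozenset({"ATG", "CTG", "GTG", "TTG"})


def farthest_upstream_start(utr5: str, starts=START_CODONS) -> Optional[int]:
    """Index in utr5 of the farthest upstream start codon in the CDS frame.

    Walks upstream from the CDS boundary one codon at a time and stops at
    the first in-frame stop codon (assumption). Returns None if none found.
    """
    best = None
    for i in range(len(utr5) - 3, -1, -3):
        codon = utr5[i:i + 3]
        if CODON_TABLE.get(codon) == "*":
            break
        if codon in starts:
            best = i
    return best


def translate(seq: str) -> str:
    """Translate codon by codon until a stop codon; unknown codons give X."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def novel_nterm_protein(utr5: str, cds: str) -> Optional[str]:
    """N-terminally extended protein, or None if no upstream start exists."""
    start = farthest_upstream_start(utr5)
    if start is None:
        return None
    # The first codon is translated as written (assumption), not forced to Met.
    return translate(utr5[start:] + cds)
```

For a toy 5′-UTR `"TAGCTGAAAATGGCC"` followed by the CDS `"ATGAAATAA"`, the walk passes the in-frame ATG, reaches CTG, and stops at the upstream TAG, so the extended sequence starts at the CTG.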
Figure 6. Characterization of protein N-termini in the unknown processing category. (A) Amino acid distributions at the P1 and P1′ positions of protein N-termini identified at residues 3–65 (blue) and >65 (red) along the protein sequences. (B) Number of proteins according to the proportion of dbTIS PSMs. An x-axis value of ‘1’ means that all PSMs matched dbTIS, ‘0’ means that the protein was identified only with PSMs from the unknown processing category, and intermediate values mean that the protein was identified with both types of PSMs. (C) Distribution of acetylated and free protein N-termini belonging to the unknown processing category.