P Boomathi Pandeswari, Varatharajan Sabareesh.
Abstract
Owing to the rapid growth in the elucidation of genome sequences of various organisms, deducing proteome sequences has become imperative for an improved understanding of biological processes. Because the traditional Edman method was unsuitable for high-throughput sequencing and for N-terminally modified proteins, mass spectrometry (MS) based methods, relying mainly on the soft ionization modes electrospray ionization and matrix-assisted laser desorption/ionization, began to gain significance. MS-based methods were adaptable to high-throughput studies and applicable to sequencing N-terminally blocked proteins and peptides as well. Consequently, over the last decade a new discipline called 'proteomics' has emerged, which encompasses the attributes necessary for high-throughput identification of proteins. 'Proteomics' may also be regarded as an offshoot of the classic field of 'biochemistry'. Many protein sequencing and proteomic investigations were successfully accomplished through MS-dependent sequence elucidation of 'short' proteolytic peptides (typically 7–20 amino acid residues), which is called the 'shotgun' or 'bottom-up (BU)' approach. While the BU approach continues as a workhorse for proteomics and protein sequencing, attempts to sequence intact proteins without proteolysis, called the 'top-down (TD)' approach, began because of ambiguities in the BU approach, e.g., the protein inference problem, identification of proteoforms and the discovery of posttranslational modifications (PTMs). The high-throughput TD approach (TD proteomics) is still in its infancy. Nevertheless, TD characterization of purified intact proteins has been useful for detecting PTMs. In the hope of overcoming the pitfalls of the BU and TD strategies, another concept called the 'middle-down (MD)' approach was put forward. Similar to BU, the MD approach also involves proteolysis, but in a restricted manner, to produce 'longer' proteolytic peptides than those usually obtained in BU studies, thereby providing better sequence coverage. In this regard, special proteases (OmpT, Sap9, IdeS) have been used, which cleave proteins to produce longer proteolytic peptides. By reviewing the ample evidence currently available in the literature, which is predominantly on PTM characterization of histones and antibodies, we highlight herein the salient features of the MD approach. Consequently, we are inclined to claim that the MD concept might find widespread future applications in various research areas, such as clinical research and biopharmaceuticals (including PTM analysis), and even in general or routine characterization of proteins, including therapeutic proteins, rather than being limited to the analysis of histones or antibodies. This journal is © The Royal Society of Chemistry.
Year: 2019 PMID: 35521579 PMCID: PMC9059502 DOI: 10.1039/c8ra07200k
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036
Scheme 1. Overview of the central dogma of molecular biology in (a) eukaryotic and (b) prokaryotic biological systems. This scheme highlights the importance of, and the need for, proteomics research in order to correlate protein sequence information with the RNA and DNA sequences. In eukaryotic systems, proteomics is essential for elucidating posttranslational modifications (PTMs), e.g., P: phosphorylation; Ac: acetylation; sugar: glycosylation; OH: hydroxylation.
Scheme 2. Illustration of the fundamental criteria of the three different approaches for the analysis of proteins or proteomes.
Fig. 1. LC-ESI mass spectra of tryptic peptides: (a) Residue No. [40–61] (22 a.a. residues long); (b) Residue No. [67–85] (19 a.a. residues long) and (c) Residue No. [105–124] (20 a.a. residues long) from carbamidomethylated RNase A (Bovine pancreas). These data were acquired on an ESI-Q/TOF mass spectrometer (6540 Ultra High Definition Accurate-Mass Q-TOF LC/MS attached to 1290 Infinity LC; Agilent Technologies). Note: C* refers to carbamidomethyl cysteine.
Fig. 2. LC-ESI mass spectra of GluC digested peptides: (a) Residue No. [10–49] (40 a.a. residues long) and (b) Residue No. [50–86] (37 a.a. residues long) from carbamidomethylated RNase A (Bovine pancreas). These data were acquired on an ESI-Q/TOF mass spectrometer (6540 Ultra High Definition Accurate-Mass Q-TOF LC/MS attached to 1290 Infinity LC; Agilent Technologies). Note: C* refers to carbamidomethyl cysteine.
Fig. 3. In silico proteolysis of 15 representative proteins (see Table S1, supplementary information†) using four different proteases: (1) trypsin, (2) LysC, (3) GluC and (4) AspN. Comparison of the populations of proteolytic peptides of different lengths obtained from ‘complete proteolysis’ (0 missed cleavages, 0-MC) and ‘limited proteolysis’ (1 missed cleavage, 1-MC). Based on their length (number of amino acid residues, a.a.r), the peptides have been classified into five categories: (1) 5–15 a.a.r, (2) 16–25 a.a.r, (3) 26–35 a.a.r, (4) 36–45 a.a.r and (5) 46–55 a.a.r.
Fig. 4. In silico proteolysis of 15 representative proteins (see Table S1, supplementary information†) using four different proteases: (1) trypsin, (2) LysC, (3) GluC and (4) AspN. Distribution of the populations of proteolytic peptides of different lengths obtained from limited proteolysis with 2 missed cleavages (2-MC). Based on their length (number of amino acid residues, a.a.r), the peptides have been classified into five categories: (1) 5–15 a.a.r, (2) 16–25 a.a.r, (3) 26–35 a.a.r, (4) 36–45 a.a.r and (5) 46–55 a.a.r. (Compare with Fig. 3; see also Table 3.)
Scheme 3. An overview of the different types of separation techniques that have been widely used for the analysis of proteins and for various kinds of proteomic analyses, including studies based on the MD approach.
Fig. 5. Distribution of isotope peaks corresponding to the charge states z = +3, +4, +5 and +6, observed in the ESI mass spectrum of a GluC proteolytic peptide (residues [50–86], 37 a.a. residues long), zoomed in from the spectrum shown in Fig. 2b.
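The relation between charge state, m/z and isotope-peak spacing behind Fig. 5 can be illustrated with a short calculation. The sketch below is only an illustration, not the authors' procedure: it assumes standard monoisotopic residue masses (not taken from the source), purely protonated ions, and carbamidomethylation of every cysteine, and it applies them to the GluC peptide [50–86] listed in the digest table. The function names `monoisotopic_mass` and `mz` are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values; an assumption, not from the source).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # H2O added for the free N- and C-termini
PROTON = 1.007276           # mass of the charge carrier in positive-mode ESI
CARBAMIDOMETHYL = 57.02146  # mass shift of an alkylated cysteine (C*)
ISOTOPE_SPACING = 1.00335   # approximate 13C - 12C mass difference


def monoisotopic_mass(seq, alkylated_cys=True):
    """Neutral monoisotopic mass of a linear peptide."""
    mass = sum(RESIDUE[aa] for aa in seq) + WATER
    if alkylated_cys:
        mass += CARBAMIDOMETHYL * seq.count("C")
    return mass


def mz(mass, z):
    """m/z of the [M + zH]^z+ ion."""
    return (mass + z * PROTON) / z


# GluC peptide [50-86] of RNase A, written without the C* markers.
PEPTIDE = "SLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRE"
M = monoisotopic_mass(PEPTIDE)

for z in (3, 4, 5, 6):
    # Adjacent isotope peaks are separated by roughly 1.00335 / z on the m/z axis.
    print(f"z = +{z}: monoisotopic m/z {mz(M, z):.4f}, "
          f"isotope spacing {ISOTOPE_SPACING / z:.4f}")
```

The shrinking spacing between isotope peaks (about 1/z) is what allows the charge state to be read directly from a resolved isotope cluster. For a peptide of this size the monoisotopic peak is not necessarily the most intense peak of the cluster.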
Scheme 4. Representative fragmentation pattern of a hexapeptide under CID and ECD/ETD MS/MS, with the molecular structures of the different types of product/fragment ions that result. In MALDI, the charge state (n+) of the peptide is n = 1, whereas in ESI, n ≥ 1; higher values of n are possible in ESI, depending on the length of the peptide/protein and the nature of its amino acid residues. ECD and ETD MS/MS are not possible with MALDI when n = 1.
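As a numerical companion to Scheme 4, the sketch below computes singly protonated fragment ions for a hexapeptide. It relies on general mass spectrometry conventions rather than on anything stated in the source: CID is assumed to give mainly b and y ions, ECD/ETD mainly c and z• (radical z) ions, with c = b + NH3 and z• = y − NH3 + H. The example peptide ETAAAK is the tryptic peptide [2–7] from the digest table; the residue masses are standard monoisotopic values and `fragment_ions` is a hypothetical helper name.

```python
# Monoisotopic masses in Da (standard values; an assumption, not from the source).
RESIDUE = {"E": 129.04259, "T": 101.04768, "A": 71.03711, "K": 128.09496}
WATER = 18.01056
PROTON = 1.007276
AMMONIA = 17.02655
HYDROGEN = 1.007825


def fragment_ions(seq):
    """Singly charged b/y (CID-type) and c/z-dot (ECD/ETD-type) ions."""
    masses = [RESIDUE[aa] for aa in seq]
    rows = []
    for i in range(1, len(seq)):
        b = sum(masses[:i]) + PROTON           # N-terminal fragment, i residues
        y = sum(masses[-i:]) + WATER + PROTON  # C-terminal fragment, i residues
        rows.append({
            "i": i,
            "b": b,
            "y": y,
            "c": b + AMMONIA,             # N-C(alpha) cleavage, N-terminal side
            "z": y - AMMONIA + HYDROGEN,  # radical z ion, C-terminal side
        })
    return rows


HEXAPEPTIDE = "ETAAAK"
for row in fragment_ions(HEXAPEPTIDE):
    print("  ".join(f"{key}{row['i']}={row[key]:.4f}" for key in ("b", "y", "c", "z")))
```

A useful sanity check is complementarity: for a peptide of n residues, b_i + y_(n−i) equals the neutral peptide mass plus two protons, which is how complementary ion pairs are recognised in MS/MS spectra. Multiply charged fragments, which are common in ESI, are deliberately left out of this sketch.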
Scheme 5. Venn diagram representation of the various computational tools used for each of the three different proteomic approaches. The tools shown within the category of the TD approach are available at https://www.topdownproteomics.org/resources/software/.
Scheme 6. Flow chart depicting the various details and methods followed for each of the three approaches shown in Scheme 2, for the analysis of proteins or proteomes.
| S. no. | Proteases | Site specificity of proteolysis |
|---|---|---|
| 1. | GluC | C-terminal side of E (C-terminal side of D) |
| 2. | AspN | N-terminal side of D |
| 3. | LysC | C-terminal side of K |
| 4. | OmpT | Between two consecutive dibasic residues: R↓R, K↓K, R↓K, K↓R |
| 5. | Sap9 | C-terminal side of R, K, KR, RR, RK & KK |
| 6. | IdeS | Between two consecutive glycine residues at the hinge region for immunoglobulin G |
| 7. | Trypsin (restricted/limited proteolysis) | C-terminal side of K & R |
| 8. | Neprosin | C-terminal side of P & A |
| 9. | GingisKHAN™ | Upper hinge region of human immunoglobulin G1 |
E: glutamic acid (Glu); D: aspartic acid (Asp); K: lysine (Lys); M: methionine (Met); W: tryptophan (Trp); R: arginine (Arg); P: proline (Pro); A: alanine (Ala); C: cysteine (Cys).
GluC: from Staphylococcus aureus; it can also cleave peptide bonds on the C-terminal side of Asp within a particular pH range (pH 4–6).
GluC (ref. 63, 64, 71, 73, 74–79, 81, 82, 84, 85, 91, 173, 184, and 185).
AspN (ref. 64, 75, 83, 84, 86, 92, 185).
LysC (ref. 84, 86, 172 and 259).
OmpT (ref. 94 and 98).
Sap9 (ref. 71, 95, and 96).
IdeS (ref. 70, 97, 99, 101–105 and 133).
CNBr: cyanogen bromide; CNBr also cleaves on the C-terminal side of tryptophan.
BNPS-skatole: 2-(2′-nitrophenylsulfonyl)-3-methyl-3-bromoindolenine.
NTCB: 2-nitro-5-thiocyanobenzoic acid.
| S. no. | Chemicals | Site specificity of cleavage |
|---|---|---|
| 1. | CNBr | C-terminal side of M |
| 2. | BNPS-skatole | C-terminal side of W |
| 3. | NTCB | N-terminal side of C |
| 4. | | C-terminal side of W |
| 5. | Acid hydrolysis (formic acid) | C-terminal side of D |
| Lys-C digest | Trypsin digest |
|---|---|
| 1. [2–7]: ETAAAK | 1. [2–7]: ETAAAK |
| 2. [8–31]: FERQHMDSSTSAASSSNYC*NQMMK | 2. [11–31]: QHMDSSTSAASSSNYC*NQMMK |
| 3. [32–37]: SRNLTK | 3. [40–61]: C*KPVNTFVHESLADVQAVC*SQK |
| 4. [38–61]: DRC*KPVNTFVHESLADVQAVC*SQK | 4. [67–85]: NGQTNC*YQSYSTMSITDC*R |
| 5. [67–91]: NGQTNC*YQSYSTMSITDC*RETGSSK | 5. [86–91]: ETGSSK |
| 6. [92–98]: YPNC*AYK | 6. [92–98]: YPNC*AYK |
| 7. [99–104]: TTQANK | 7. [99–104]: TTQANK |
| 8. [105–124]: HIIVAC*EGNPYVPVHFDASV | 8. [105–124]: HIIVAC*EGNPYVPVHFDASV |
Peptides of length ≤ 5 are not shown.
C* refers to carbamidomethyl cysteine
| Glu-C digest |
|---|
| 1. [1–9]: KETAAAKFE |
| 2. [10–49]: RQHMDSSTSAASSSNYC*NQMMKSRNLTKDRC*KPVNTFVHE |
| 3. [50–86]: SLADVQAVC*SQKNVAC*KNGQTNC*YQSYSTMSITDC*RE |
| 4. [87–111]: TGSSKYPNC*AYKTTQANKHIIVAC*E |
| 5. [112–124]: GNPYVPVHFDASV |
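The Lys-C and trypsin peptide lists above, and the length classes used in Fig. 3 and 4, can be approximated with a simple in silico digestion. The following sketch rests on stated assumptions and is not the tool used by the authors: the RNase A sequence is assembled from the peptides listed in the digest tables (C* written as C), cleavage is taken to occur on the C-terminal side of the listed residues, and a bond followed by proline is assumed to be resistant, since the listed peptides keep the K–P bond at residues 41–42 intact. The names `cleavage_sites`, `digest` and `length_histogram` are hypothetical.

```python
from collections import Counter

# Bovine RNase A (124 residues), assembled from the peptides in the digest tables.
RNASE_A = (
    "KETAAAKFE"
    "RQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHE"
    "SLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRE"
    "TGSSKYPNCAYKTTQANKHIIVACE"
    "GNPYVPVHFDASV"
)

# Length classes (number of amino acid residues) used in Fig. 3 and 4.
BINS = [(5, 15), (16, 25), (26, 35), (36, 45), (46, 55)]


def cleavage_sites(seq, residues, skip_before_proline=True):
    """0-based indices at which the chain is cut (after seq[i - 1])."""
    return [
        i for i in range(1, len(seq))
        if seq[i - 1] in residues
        and not (skip_before_proline and seq[i] == "P")
    ]


def digest(seq, residues, max_missed=0, skip_before_proline=True):
    """Return (start, end, peptide) tuples with 1-based residue numbers.

    max_missed = 0 is complete proteolysis (0-MC); 1 or 2 additionally
    yields peptides spanning one or two uncleaved sites (1-MC, 2-MC).
    """
    bounds = [0] + cleavage_sites(seq, residues, skip_before_proline) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 1 + max_missed, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptides.append((bounds[i] + 1, bounds[j], seq[bounds[i]:bounds[j]]))
    return peptides


def length_histogram(peptides):
    """Count peptides per length class."""
    counts = Counter()
    for _, _, pep in peptides:
        for low, high in BINS:
            if low <= len(pep) <= high:
                counts[f"{low}-{high}"] += 1
    return counts


for name, residues in (("Trypsin", "KR"), ("LysC", "K")):
    for missed in (0, 1, 2):
        hist = length_histogram(digest(RNASE_A, residues, max_missed=missed))
        print(name, f"{missed}-MC", dict(hist))
```

With `max_missed=0` and peptides of five or fewer residues discarded, this rule set gives the same residue ranges as the Lys-C and trypsin columns. The Glu-C list is not reproduced by an equally simple rule, because peptide [1–9] retains an uncleaved Glu at position 2.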
Comparison of results obtained from Trypsin and LysC

| Species | Trypsin 0-MC | Trypsin 1-MC | Trypsin 2-MC | LysC 0-MC | LysC 1-MC | LysC 2-MC |
|---|---|---|---|---|---|---|
| Human | 0.0818 | 0.1681 | 0.6595 | 0.2361 | 0.6697 | 1.6434 |
| | 0.0931 | 0.2878 | 1.1659 | 0.3081 | 0.9296 | 3.1395 |
| Archaea | 0.0866 | 0.1619 | 0.3676 | 0.1501 | 0.2834 | 0.5869 |
| Plant | 0.0796 | 0.2651 | 0.7771 | 0.2051 | 0.64 | 1.8247 |
| Yeast | 0.0744 | 0.2074 | 0.4549 | 0.1648 | 0.5592 | 0.9005 |
Comparison of results obtained from GluC and AspN

| Species | GluC 0-MC | GluC 1-MC | GluC 2-MC | AspN 0-MC | AspN 1-MC | AspN 2-MC |
|---|---|---|---|---|---|---|
| Human | 0.25 | 0.9395 | 1.1826 | 0.4379 | 0.9006 | 2.0512 |
| | 0.2454 | 0.8209 | 2.0348 | 0.3555 | 0.8571 | 2.6415 |
| Archaea | 0.1265 | 0.3722 | 0.8299 | 0.2654 | 0.6608 | 2.4769 |
| Plant | 0.3063 | 0.8056 | 2.1647 | 0.3318 | 0.7453 | 2.7162 |
| Yeast | 0.2266 | 0.6421 | 0.9512 | 0.1895 | 0.6716 | 1.8490 |
| S. no | Chromatography | LTQ-Orbitrap MS/MS: CID or HCD | LTQ-Orbitrap MS/MS: ETD | Samples |
|---|---|---|---|---|
| 1 | RPLC (C8) | CID | — | Ubiquitin |
| 2 | RPLC | — | ETD | HeLa S3 cells |
| 3 | RPLC (C18) | CID | — | MCF 7 breast cells |
| 4 | Gel-filtration | — | ETD | Apomyoglobin, BSA, RHD3 |
| 5 | RPLC (C18) | CID | — | RPMI 8226 myeloma cells |
| 6 | RPLC (Phenyl) | — | — | Monoclonal antibody |
| 7 | RPLC | — | ETD | Mouse embryonic stem cells (ESC) |
| 8 | RPLC (C4) | — | ETD | Monoclonal antibodies |
| 9 | RPLC (C18)-WCX-HILIC | — | ETD | Mouse ESC |
| 10 | RPLC | — | ETD | Histones |
| 11 | SEC | — | — | Mustard allergen |
| 12 | RPLC (C18)-WCX-HILIC | — | ETD | |
| 13 | RPLC (C18) | CID | ETD | HeLa S3 cells |
| 14 | WCX-HILIC | — | ETD | Human hepatocytes |
| 15 | WCX-HILIC | — | ETD | |
| 16 | RPLC (C5) | — | ETD | Monoclonal antibodies |
| 17 | RPLC (C8) | CID/HCD | ETD | Monoclonal antibodies |
| 18 | RPLC (C8) | HCD | — | Model proteins ( |
Offline chromatographic methods used for fraction collection prior to LC-MS.
RPLC (C18) used as smaller trap column prior to WCX-HILIC.
UVPD.
Studies on histones.
Supplemented by collision-induced dissociation (CID) using a collision energy of 10%.
Q-FTICR.
| S. no | Chromatography | Orbitrap Fusion MS/MS: CID or HCD | Orbitrap Fusion MS/MS: ETD | Samples |
|---|---|---|---|---|
| 1 | WCX-HILIC | HCD | EThcD | Murine erythroleukemia cells |
| 2 | RPLC (C18)-WCX-HILIC | — | ETD | HeLa S3 cells |
| 3 | RPLC (C18) | HCD | EThcD | Monoclonal antibodies |
| 4 | RPLC (C4) | — | ETD | Monoclonal antibodies |
| 5 | RPLC (C18) | HCD | EThcD | HeLa cells |
| 6 | RPLC (polymer) | — | ETD | Ubiquitin |
| 7 | RPLC (polymer) | HCD | — | Ubiquitin, myoglobin, carbonic anhydrase |
| 8 | RPLC (C18)-WCX-HILIC | — | ETD | HeLa cells |
| S. no | Chromatography | MALDI: MS/MS methods | Samples |
|---|---|---|---|
| 1 | RPLC (C8) | ISD | Monoclonal antibody |
| 2 | RPLC (C4) | ISD | Monoclonal antibodies |
| 3 | CZE (offline only) | ISD | Monoclonal antibody |
| S. no | Chromatography | LTQ-FT-ICR: MS/MS methods | Samples |
|---|---|---|---|
| 1 | RPLC | ECD | Rat brain tissues |
| 2 | RPLC | ECD | Ten rat tissues |
| 3 | RPLC | ECD & CAD | Cardiac myosin binding protein C |
| 4 | RPLC (C18) | CID & ETD | Human serum peptides |
| 5 | RPLC (C18) | ECD | Calf thymus, HeLa, Jurkat, MCF-7 |
| 6 | Chip based nano-ESI | ECD | Ubiquitin |
| 7 | RPLC | ECD | Murine erythroleukemia cells |
| 8 | SEC | ECD | Cardiac myosin heavy chain |