
Assessment of Sample Preparation Bias in Mass Spectrometry-Based Proteomics.

Frank Klont¹, Linda Bras², Justina C Wolters³, Sara Ongay¹, Rainer Bischoff¹, Gyorgy B Halmos², Péter Horvatovich¹.

Abstract

For mass spectrometry-based proteomics, the selected sample preparation strategy is a key determinant for information that will be obtained. However, the corresponding selection is often not based on a fit-for-purpose evaluation. Here we report a comparison of in-gel (IGD), in-solution (ISD), on-filter (OFD), and on-pellet digestion (OPD) workflows on the basis of targeted (QconCAT-multiple reaction monitoring (MRM) method for mitochondrial proteins) and discovery proteomics (data-dependent acquisition, DDA) analyses using three different human head and neck tissues (i.e., nasal polyps, parotid gland, and palatine tonsils). Our study reveals differences between the sample preparation methods, for example, with respect to protein and peptide losses, quantification variability, protocol-induced methionine oxidation, and asparagine/glutamine deamidation as well as identification of cysteine-containing peptides. However, none of the methods performed best for all types of tissues, which argues against the existence of a universal sample preparation method for proteome analysis.

Substances:
Peptides
Proteins

Year: 2018 PMID: 29608294 PMCID: PMC5906755 DOI: 10.1021/acs.analchem.8b00600

Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986

Mass spectrometry (MS)-based proteomics is a powerful technological platform for studying proteins in various biological contexts and has a prominent role in identifying and elucidating (patho)physiological processes.[1,2] Using strategies ranging from detecting proteins in their intact form (“top-down” proteomics) to analyzing proteins by means of peptides released through proteolysis (“bottom-up” proteomics), this platform has opened up and expanded opportunities to study proteins, for example, by profiling proteomes, characterizing proteins, quantifying proteins, and by studying protein–protein interactions.[3] As a result of ongoing advances, proteomics has become a tool capable of delivering answers to key biological questions, and its role in basic and applied science will likely expand in the coming decade(s).[2,4]

Sample preparation strategies for bottom-up proteomics experiments encompass a protein digestion procedure using proteolytic enzymes (e.g., trypsin, endoproteinase LysC) in order to release peptides which can then be analyzed by liquid chromatography–mass spectrometry (LC-MS).[3] In simpler protocols, proteins are digested directly, though digestion is often preceded by a protein denaturation procedure (e.g., disulfide bond reduction and subsequent cysteine alkylation) to enhance digestion efficiency.[5,6] With such an approach, often referred to as “in-solution digestion” (ISD), any compound present in a sample or added during sample preparation will be injected into the LC-MS instrument.[7] Since researchers often use chemicals that are not compatible with digestion and/or LC-MS detection (e.g., detergents, chaotropes) to improve the performance of their workflow,[7−11] several contaminant removal procedures have been devised which are mostly based on protein precipitation and gel- or centrifugal filter-aided sample cleanup.[7,12−16] All of these different methods have specific advantages yet also exhibit (protocol-specific) biases.[5,8−11,17,18] The selection of sample preparation methods thereby influences the subset of proteins that can be reliably identified and/or quantified by LC-MS and thus is a determining factor for the potential outcomes of a proteomics experiment.

When designing a proteomics experiment, previously published projects on the same type of starting material (and with comparable aims) may form the basis of rational sample preparation method selection. However, such studies are not readily available for every type of material and experiment. Proteomics is, for example, an upcoming research line in head and neck cancer,[19,20] and currently only a few studies can be referred to for assessing the applicability of sample preparation methods. Admittedly, most head and neck tissues are (lympho)epithelial tissues sharing structural features to some extent, yet basing workflow selection-related decisions on such an assumption may be risky.

Here we describe a comparison of in-gel digestion, in-solution digestion, on-filter digestion, and on-pellet digestion sample preparation methodologies that are commonly used in LC-MS-based proteomics. For this study, we selected three human tissues originating from the head and neck area (i.e., nasal polyps, parotid gland, and palatine tonsils) thereby aiming to cover the diversity of (solid) tissues that can be encountered within a medical discipline, in this case otorhinolaryngology. The methods were compared on the basis of their performance in discovery proteomics experiments as well as in targeted proteomics on the basis of a QconCAT (quantification concatamers) multiple reaction monitoring method targeting a set of mitochondrial proteins.[21] Methods were compared on the basis of peptide and protein losses, precision of quantification, discovery potential, and the distribution of selected physicochemical properties (e.g., size, charge characteristics, and hydrophobicity) of identified proteins and peptides. In addition, we compared distributions of physicochemical properties for detected proteins and peptides to corresponding distributions of potentially present proteins (as predicted from the human proteome) and peptides (as predicted from the identified proteins in the specific tissues) thereby aiming to identify (protocol-specific) biases. With our work, we aim to assess sample preparation bias in proteomics experiments, to support the rationale of selecting sample preparation methods based on a fit-for-purpose evaluation, and to provide leads for expanding the detection capabilities of mass spectrometry-based proteomics workflows.

Experimental Section

Detailed descriptions of the materials and methods used for this study are included in the Supporting Information, whereas concise descriptions of the materials and methods are presented below.

Tissue Samples

Three different otolaryngeal tissues (i.e., nasal polyps, parotid gland, and palatine tonsil, see Table S-1 in the Supporting Information) were obtained separately from three patients who underwent head and neck surgery at the University Medical Center Groningen. Immediately after resection, tissues were sliced into pieces of approximately 30 mm³, snap frozen in liquid nitrogen, and stored at −80 °C until further processing. The study could be carried out under section 7:467 of the Dutch Civil Code as patients gave permission to use the tissues which were regarded as residual materials after surgery and which furthermore cannot be traced back to the patients.

Tissue Homogenization and Protein Extraction

Tissue was pulverized using a CryoMill cryogenic grinder and suspended in 0.1% RapiGest in 50 mM ammonium bicarbonate (ABC) or sodium dodecyl sulfate (SDS)/urea lysis buffer (2% SDS, 8 M urea and 100 mM β-mercapto-ethanol in 50 mM Tris/HCl buffer, pH 7.6) at a final tissue concentration of 30 mg/mL. The suspensions were vortex-mixed for 5 min and subjected to three freeze/thaw cycles. Upon another 5 min of vortex-mixing and pelleting debris via centrifugation (10 min; 14 000g), final lysates were collected. Protein concentration was determined using the micro bicinchoninic acid (BCA) assay, and lysates were stored at −80 °C until analysis.

In-Solution Digestion (ISD)

A volume of RapiGest protein extract corresponding to 20 μg of total protein was diluted to 40 μL with ABC. Proteins were reduced in 10 mM dithiothreitol (DTT) (30 min; 60 °C) and alkylated in the dark in 20 mM iodoacetamide (IAM) (30 min; 25 °C). After quenching unreacted IAM with a 0.5 molar excess of DTT (30 min; 25 °C), trypsin was added in a final proteinase-to-protein ratio of 1:20, and the proteins were digested overnight (37 °C). Digestion was stopped and RapiGest was hydrolyzed through addition of formic acid (FA) in Milli-Q water (H2O), and the final peptide mixture was obtained after pelleting debris via centrifugation (10 min; 14 000g).

On-Pellet Digestion (OPD)

SDS/urea protein extract containing 20 μg of protein was diluted to 25 μL with ABC, and proteins were precipitated through addition of 50 μL of ice-cold 100% acetone and two 50 μL aliquots of ice-cold 85% acetone followed by centrifugation (5 min; 4 °C; 14 000g). The supernatant was removed, and the precipitation step was repeated. After removing the supernatant of the second precipitation step, the pellet was left to dry by air. Subsequently, proteins were solubilized via pretrypsination in 25 μL of ABC with a final proteinase-to-protein ratio of 1:50 (4 h; 37 °C). Proteins were reduced with 10 mM DTT and were alkylated in the dark with 20 mM IAM. After quenching unreacted IAM with DTT, trypsin was added in a final proteinase-to-protein ratio of 1:20, and the proteins were digested overnight. Digestion was stopped through addition of FA, and the final peptide mixture was obtained after pelleting debris.

In-Gel Digestion (IGD)

The in-gel digestion protocol was based on the “In-Gel Digestion and Sample Cleanup” protocol, as described previously in Wolters et al.[21] Briefly, SDS/urea protein extract containing 20 μg of protein was diluted to 15 μL with ABC, mixed with 5 μL of NuPAGE LDS Sample Buffer 4×, and the sample was boiled for 2 min. After the sample was cooled to room temperature, it was loaded onto a NuPAGE 4–12% Bis-Tris Protein Gel, and electrophoresis was carried out at 100 V for only 5 min. Proteins were localized by staining the gel with Bio-Safe Coomassie Blue G-250 stain overnight, and unbound dye was washed away with repeated washes with H2O. The stained protein band was excised, sliced in 2 × 2 mm pieces, and destained via repeated washes with 30% acetonitrile (ACN) in ABC (15 min; 25 °C). Gel pieces were dehydrated upon washing with 50% ACN in ABC (15 min; 25 °C) and 100% ACN (5 min; 25 °C) followed by drying in an oven at 37 °C. Next, proteins were reduced in 10 mM DTT and, after discarding the DTT solution, alkylated in the dark in 20 mM IAM. Remaining IAM was discarded, and the gel pieces were dehydrated as described above. Subsequently, gel pieces were reswollen on ice following dropwise addition of 25 μL ABC containing trypsin in a final proteinase-to-protein ratio of 1:20, and the proteins were digested overnight. After digestion, the residual liquid was collected and remaining peptides were extracted in 25 μL of 5% FA in 75% ACN (20 min; 25 °C). After combining the two volumes, peptides were dried in a CentriVap vacuum concentrator (Labconco) at 45 °C, and the residue was reconstituted in 0.1% FA to obtain the final peptide mixture.

On-Filter Digestion (OFD)

For on-filter digestion, the SDS/urea protein extract was processed according to the “FASP II” protocol, as described previously by Wisniewski et al.,[15] with minor modifications. Briefly, an amount of SDS/urea protein extract corresponding to 20 μg of protein was diluted with urea solution (8 M urea in 0.1 M Tris/HCl, pH 8.5) to 200 μL and was loaded onto a Microcon Ultracel YM-30 filtration device. After centrifugation (15 min; 14 000g), the concentrate was diluted with 200 μL of urea solution and was centrifuged again. Next, 100 μL of 50 mM IAM in urea solution was added to the concentrate, the sample was mixed briefly (1 min; 25 °C), and proteins were alkylated in the dark. After centrifugation, the concentrate was diluted with 100 μL of urea solution and was centrifuged again. This step was repeated twice. Subsequently, the concentrate was diluted with 100 μL of ABC and was centrifuged. After this second wash step was repeated twice, 40 μL of ABC containing trypsin in a final proteinase-to-protein ratio of 1:20 was added to the filter, the sample was mixed briefly, and proteins were digested overnight in a wet chamber. Peptides were collected by centrifuging the filter unit followed by an additional elution (centrifugation) step with 50 μL ABC. After combining the two volumes, peptides were dried in a CentriVap vacuum concentrator (Labconco) at 45 °C, and the residue was reconstituted in 0.1% FA to obtain the final peptide mixture.

Targeted LC-MS/MS Analysis

Targeted proteomics analyses were performed using a TSQ Vantage Triple Quadrupole mass spectrometer using multiple reaction monitoring (MRM) transitions and settings that have been described previously.[21] Peptide separation was achieved with an UltiMate 3000 RSLC UHPLC system on a 50 cm Acclaim PepMap RSLC C18 analytical column (2 μm, 100 Å, 75 μm i.d. × 500 mm) which was kept at 40 °C. For targeted analyses, the final peptide mixtures were spiked with predigested QconCAT (quantification concatamers; designed to target a set of mitochondrial proteins, details have been described previously)[21] at a level of 1.25 ng per μg of total protein. A sample volume corresponding to 1 μg of total protein (based on the micro BCA assay) was loaded onto an Acclaim PepMap100 C18 trap column (5 μm, 100 Å, 300 μm i.d. × 5 mm) using μL-pickup with 0.1% FA in H2O at 20 μL/min. Subsequently, peptides were separated on the analytical column using a 100 min linear gradient from 3 to 60% eluent B (0.1% FA in ACN) in eluent A (0.1% FA in H2O) at 200 nL/min.

Shotgun LC-MS/MS Analysis

Shotgun proteomics analyses were performed using an UltiMate 3000 RSLC UHPLC system connected to an Orbitrap Q Exactive Plus mass spectrometer operating in the data-dependent acquisition (DDA) mode. A sample volume corresponding to 1 μg of total protein (based on the micro BCA assay) was injected onto an Acclaim PepMap100 C18 trap column (vide supra) using μL-pickup with 0.1% FA in H2O at 20 μL/min. Peptides were separated on a 50 cm Acclaim PepMap RSLC C18 analytical column (vide supra) which was kept at 40 °C, using a 117 min linear gradient from 3 to 40% eluent B (0.1% FA in ACN) in eluent A (0.1% FA in H2O) at a flow rate of 200 nL/min. For DDA, survey scans from 300 to 1650 m/z were acquired at a resolution of 70 000 (at 200 m/z) with an AGC target value of 3 × 10⁶ and a maximum ion injection time of 50 ms. From the survey scan, a maximum number of 12 of the most abundant precursor ions with a charge state of 2+ to 6+ were selected for higher energy collisional dissociation (HCD) fragment analysis between 200 and 2000 m/z at a resolution of 17 500 (at 200 m/z) with an AGC target value of 5 × 10⁴, a maximum ion injection time of 50 ms, a normalized collision energy of 28%, an isolation window of 1.6 m/z, an underfill ratio of 1%, an intensity threshold of 1 × 10⁴, and the dynamic exclusion parameter set at 20 s.

Data Processing

Raw data for the targeted proteomics analyses were processed using the Skyline software and were furthermore analyzed using Microsoft Excel (more details on processing of targeted proteomics data have been published previously).[21] Shotgun proteomics data were processed using PEAKS Studio software,[22] and a detailed overview of applied PEAKS search criteria is included in Method S-8 (Supporting Information). Label-free quantification using ion counts was performed on the basis of the results of the principal PEAKS search followed by further filtering and processing of the data using an in-house developed script in R and RStudio. With respect to peptide quantification, peptide areas were summed for all peptides with the same primary amino acid sequence after removing PTMs and independently of the charge states. For protein quantification, areas of peptides belonging to the same protein group were summed, yet only if they were unique for the corresponding protein group. For both peptide and protein quantification, DDA data were scaled by median scale normalization.[23]
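The peptide-level aggregation and normalization described above can be illustrated with a minimal Python sketch. It is not the in-house R script used in the study. The modification notation (`M(+15.99)`), the function names, and the choice of scaling every run to the mean of the run medians are assumptions made for illustration; ref 23 should be consulted for the exact normalization procedure.

```python
import re
from collections import defaultdict
from statistics import median


def strip_modifications(peptide):
    """Remove bracketed modification tags, e.g. 'AM(+15.99)K' -> 'AMK' (assumed notation)."""
    return re.sub(r"\([^)]*\)", "", peptide)


def sum_peptide_areas(features):
    """Sum areas over charge states and modified forms of the same primary sequence.

    features: iterable of (annotated_sequence, charge, area) tuples for one LC-MS run.
    """
    totals = defaultdict(float)
    for sequence, _charge, area in features:
        totals[strip_modifications(sequence)] += area
    return dict(totals)


def median_scale_normalize(runs):
    """Scale each run so that all runs share the same median area.

    runs: {run_name: {peptide: area}}. The common target (mean of the run
    medians) is an assumption of this sketch.
    """
    medians = {name: median(areas.values()) for name, areas in runs.items()}
    target = sum(medians.values()) / len(medians)
    return {
        name: {pep: area * target / medians[name] for pep, area in areas.items()}
        for name, areas in runs.items()
    }
```

Protein-level values would follow the same pattern, summing only peptides that are unique to one protein group before normalization.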

Bioinformatics Analysis

Data analysis and visualization were performed using R, RStudio, Microsoft Excel, and GraphPad Prism. For evaluation of the physicochemical properties of proteins and peptides, the R “Peptides” and “ggplot2” packages were employed for, respectively, calculating and visualizing corresponding data.

Results

Relative Losses of Peptides and Proteins

Method-induced losses were evaluated on the basis of peptides and proteins that were quantified in all 20 replicates (four methods, five replicates per method) per tissue. Average levels were calculated for each method, the highest observed average level was set to 100%, and the other three average levels were related to the highest average level, which gave the relative average peptide and protein levels (see Figure 1). For the QconCAT-multiple reaction monitoring (MRM) experiments, digested QconCATs (with 13C/15N-labeled arginines and lysines) were added in fixed amounts to the samples prior to LC-MS analysis to compare peptide losses (yet also methodological variation) for the different methods.
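The calculation of relative average levels for a single peptide or protein can be sketched as follows. The function name and the replicate values in the usage example are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def relative_average_levels(replicates_by_method):
    """Express each method's average level as a percentage of the highest average.

    replicates_by_method: {method: [level in replicate 1, ..., level in replicate n]}
    for one peptide or protein quantified in all replicates.
    """
    averages = {method: mean(levels) for method, levels in replicates_by_method.items()}
    highest = max(averages.values())
    return {method: 100.0 * avg / highest for method, avg in averages.items()}


# Hypothetical replicate levels (arbitrary units) for one peptide
example = {"IGD": [30, 30], "ISD": [100, 100], "OFD": [80, 90], "OPD": [70, 70]}
levels = relative_average_levels(example)
```

Here the method with the highest average is assigned 100% by construction, so every other value reads directly as the remaining fraction after method-induced losses.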

Figure 1

Assessment of method-induced losses of peptides as quantified by (a) MRM and (b) DDA and (c) proteins as quantified by DDA for the different tissues and the pooled samples. For visualization purposes, levels are expressed as a percentage of the highest observed average level for each peptide. For every tissue and for pooled sample analysis, statistically significant differences (p < 0.05, two-tailed Wilcoxon rank-sum test; performed on the absolute average levels) were found between all methods, unless specified otherwise in the figure. Corresponding descriptive statistics are presented in Table S-2 (Supporting Information).

For all tissues, the largest losses were observed for IGD with (median relative average) peptide and protein levels of 27–40% as shown in Figure 1. This figure furthermore shows that the smallest losses were typically observed for ISD, with the exception of the palatine tonsil MRM experiment and all experiments targeting the parotid gland. For the latter tissue, OFD yielded the highest peptide and protein levels (together with OPD), and this method furthermore gave similar (DDA) or higher (MRM) peptide levels for palatine tonsils compared to ISD. However, OFD’s protein losses for the latter tissue and also the losses of peptides (both DDA and MRM) and proteins for nasal polyps were considerably larger compared to ISD, as demonstrated by the 16% (MRM) and 9% (DDA) lower peptide levels as well as the 27% lower protein levels for this tissue. Moreover, Figure 1 shows that OPD featured losses comparable to those of OFD for nasal polyps and parotid gland (15–29% and 3–6% for OPD versus 16–27% and 2–7% for OFD), yet OPD performed less well in the experiments targeting the palatine tonsils with OPD’s levels being around two-thirds of the corresponding levels for ISD and OFD. In summary, IGD’s peptide and protein levels were around three times lower compared to the other three methods. ISD and OFD generally performed best in terms of peptide and protein losses, although both methods featured markedly increased losses in case of one of the three tissues (i.e., parotid gland for ISD and nasal polyps for OFD). Conversely, OPD gave the highest peptide and protein levels for one of the three tissues (i.e., parotid gland) whereas considerable losses were observed for the other two.

Precision of Peptide and Protein Quantification

To assess methodological precision, peptides and proteins that were quantified in all 20 replicates (four methods, five replicates per method) per tissue were included. Relative standard deviations (RSDs) were calculated using the five replicates per method, and data were visualized in beeswarm plots (MRM experiments) or RSD relative frequency polygon plots (discovery proteomics experiments) (see Figure 2). For the QconCAT-MRM experiments, digested QconCATs were added in a fixed amount to the samples before LC-MS analysis (as described in the section above), and for the discovery proteomics experiments, data were normalized following median scale normalization.[23] Plots for the non-normalized data are shown in Figure S-1 (Supporting Information).
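As a brief illustration of the precision metric, the sketch below computes the RSD of one peptide across replicates and the median RSD across peptides. Using the sample standard deviation (n − 1 denominator) is an assumption, and the function names are hypothetical.

```python
from statistics import mean, median, stdev


def rsd_percent(levels):
    """Relative standard deviation (%) of replicate levels (sample SD, n - 1)."""
    return 100.0 * stdev(levels) / mean(levels)


def median_rsd(levels_by_peptide):
    """Median RSD (%) over all peptides quantified with one method.

    levels_by_peptide: {peptide: [level in replicate 1, ..., level in replicate n]}
    """
    return median([rsd_percent(levels) for levels in levels_by_peptide.values()])
```

Applying `median_rsd` once to repeated injections of a pooled sample and once to independently prepared replicates gives the two quantities compared in this section: variability of the LC-MS system alone and variability including sample preparation.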

Figure 2

Assessment of methodological precision of peptide (as measured by (a) MRM and (b) DDA) and (c) protein (as measured by DDA) quantification for the different tissues and for the pooled samples. For every tissue and for pooled sample analysis, statistically significant differences (p < 0.05, two-tailed Wilcoxon rank-sum test) were found between all methods, unless specified otherwise in the figure. Discovery proteomics data were normalized by median scale normalization, though plots for non-normalized data are included in Figure S-1 (Supporting Information). Descriptive statistics for the data in this figure are presented in Table S-3 (Supporting Information).

In the targeted proteomics experiments, variability introduced by the LC-MS system itself, as determined by five repeated injections of a pooled sample, was similarly low for all four methods (median RSDs ranging from 2.3% to 3.3%) as shown in Figure 2a. Variability due to the upstream sample preparation steps was furthermore consistently low for IGD and OFD with (median) RSDs of 8–10% and 6–9%, respectively. ISD exhibited similar RSDs, with the exception of the nasal polyps experiment, for which an RSD of 12% was observed. RSDs around 12% were also observed for OPD in the parotid gland and palatine tonsil samples, yet an approximately two times higher RSD (25%) was found for nasal polyps. Thereby, OPD featured rather moderate precision of peptide quantification in the MRM experiments, whereas good precision in all three tissues was observed for IGD and OFD and good precision in two out of the three tissues for ISD. For the discovery proteomics analyses, variability introduced by the LC-MS system was higher compared to the MRM measurements with (median) peptide RSDs of 5.7–9.5% (see Figure 2b) and protein RSDs of 14.5–18.9% (see Figure 2c). For peptide quantification, additional variability, as introduced by the sample preparation methods, led to minor RSD increases (2–5%) in all experiments, except for ISD in the nasal polyps experiment for which an RSD increment of 7% was observed. Corresponding variability for protein quantification also revealed minor RSD increases for ISD, OFD, and OPD (3–6%, 0–4%, and 2–2%, respectively) whereas slightly higher increases (6–9%) were observed for IGD. In terms of overall variability, Figure 2b shows that precision for peptide quantification was rather comparable for the four methods, and only IGD in the parotid gland experiment gave considerably higher RSDs compared to the other three methods. Moreover, Figure 2c shows that protein quantification (based on the sum of the areas of unique peptides belonging to the same protein group) was generally less precise than peptide quantification, and IGD furthermore featured the highest RSDs for all tissues. With respect to these increases, it should, however, be noted that (for any approach) RSDs increased with decreasing protein and peptide quantities (see Figure S-2 in the Supporting Information). The larger losses for IGD should thus be considered as an (at least partial) explanation for the greater methodological imprecision observed for IGD. On a final note, precision data for the discovery proteomics experiments were influenced to various degrees by the median scale normalization procedure (see Figure S-1 and Tables S-3 and S-4 in the Supporting Information). In case of ISD and OFD, relative standard deviations were rather unaffected by this normalization procedure, though this procedure led to some improvements in methodological precision for OPD and even larger improvements for IGD.

Discovery Potential

The total number and the overlap of identifications were assessed for peptides (see Figure 3a) and proteins (see Figure 3b) that were identified in at least three of the five replicates for the different tissues. Requiring identification in at least four or in all five replicates resulted in, respectively, around 20% and 40% fewer peptide identifications as well as 15% and 30% fewer protein identifications (see Figures S-3 and S-4 in the Supporting Information).
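The replicate filter applied before comparing identifications can be sketched with plain Python sets. The function name and the input layout (one set of identified sequences per replicate) are assumptions for illustration.

```python
from collections import Counter


def identified_in_at_least(replicates, min_replicates):
    """Return the identifications observed in at least `min_replicates` replicates.

    replicates: list of sets, one set of peptide (or protein) identifiers per replicate.
    """
    counts = Counter(identifier for replicate in replicates for identifier in set(replicate))
    return {identifier for identifier, n in counts.items() if n >= min_replicates}
```

With one filtered set per method, the regions of a Venn diagram follow from ordinary set operations, for example `isd & opd` for the overlap of two methods (hypothetical variable names) and `isd - opd` for identifications found only with the first.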

Figure 3

Discovery potential of the different sample preparation approaches. Venn diagrams of (a) peptides and (b) proteins identified in at least three out of the five replicates per sample preparation method for the different tissues. Venn diagrams displaying the distribution of peptides and proteins identified in at least four out of five and five out of five replicates for the different tissues as well as those identified in the pooled samples are shown in Figures S-3–S-5 (Supporting Information). Percentage of peptides identified in the pooled samples containing (c) 0, 1, and 2 or 3 missed cleavages; (d) oxidized methionine residues (relative to the number of methionine-carrying peptides); (e) deamidated asparagine and/or glutamine residues (relative to the number of asparagine- and/or glutamine-carrying peptides); and (f) carbamidomethylated (CAM) cysteine residues (relative to the total number of peptides).

The highest numbers of peptides were identified for ISD and OPD, whereas 10–20% fewer peptide identifications were observed for IGD and OFD. Most identified proteins were observed for ISD and OPD in nasal polyps and parotid gland, though 10% fewer identifications for OPD were observed in palatine tonsils. Furthermore, the 10–20% fewer peptide identifications for IGD and OFD corresponded to 5–10% fewer proteins identified for OFD and notably to 20–30% fewer protein identifications for IGD. The latter observation should be evaluated in the context of IGD’s peptide and protein losses and the approximately three times lower peptide and protein levels observed for IGD compared to the other three methods (see Figure 1); however, tripling the injection volume for IGD yielded only modest increases in peptide and protein identifications of 11% and 12%, respectively (see Figure S-6 in the Supporting Information).

To zoom in further on the qualitative performance of the methods, trypsin digestion efficiency and the abundance of selected post-translational modifications (PTMs) and/or sample preparation artifacts were assessed. The proportion of peptides displaying zero missed cleavages was 95%, 89%, 93%, and 94% for IGD, ISD, OFD, and OPD, respectively (see Figure 3c). For ISD, 10% of the peptides contained one missed cleavage as compared to 5–6% for the other methods, and only one percent (or less) of the peptides exhibited two or more missed cleavages. Moreover, methionine-containing peptides were more frequently oxidized (see Figure 3d) and asparagine- and/or glutamine-containing peptides more frequently deamidated (see Figure 3e) in IGD compared to ISD, OFD, and OPD (31% versus 4–8% and 17% versus 7–10%, respectively). Other modifications were assessed as well (see Figure S-7 in the Supporting Information) revealing considerable overalkylation in all samples (up to 2.4% for OFD and 3.1% for OPD), lysine and N-terminal carbamylation of around 1% in IGD, and protein N-terminal acetylation of 0.7–1.1% for the studied methods. The degree and extent of cysteine carbamidomethylation was studied more closely due to the absence of a distinct reduction step prior to thiol alkylation in the original (and also in newer versions of the) filter-aided sample preparation (FASP) protocol, which forms the basis of the applied OFD protocol. For all methods, cysteine carbamidomethylation was rather complete (see Figure S-8A in the Supporting Information), yet only 8% of the peptides identified for OFD contained cysteine residues compared to 15% for IGD and 14% for both ISD and OPD (see Figure 3f). The occurrence of the other 19 amino acids was evaluated as well (see Figures S-8B and S-8C in the Supporting Information), though relevant differences were only observed for cysteine in case of the OFD approach.
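To make the missed-cleavage metric concrete, the following sketch counts internal tryptic sites in identified peptide sequences. It assumes the common rule that trypsin cleaves C-terminal to lysine or arginine unless the next residue is proline; whether the search software used in the study applies the proline exception is not stated, and the function names are hypothetical.

```python
from collections import Counter


def count_missed_cleavages(peptide):
    """Count internal K/R residues not followed by P (assumed trypsin rule).

    The C-terminal residue is excluded because cleavage there produced the peptide.
    """
    return sum(
        1
        for i in range(len(peptide) - 1)
        if peptide[i] in "KR" and peptide[i + 1] != "P"
    )


def missed_cleavage_percentages(peptides):
    """Percentage of peptides with 0, 1, or 2 or more missed cleavages."""
    counts = Counter(min(count_missed_cleavages(p), 2) for p in peptides)
    total = len(peptides)
    return {label: 100.0 * counts[k] / total for k, label in ((0, "0"), (1, "1"), (2, "2+"))}
```

The same counting pattern extends to the modification rates reported here, where the denominator is restricted to peptides carrying the relevant residue (for example, methionine-containing peptides for oxidation).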

Peptide and Protein Characteristics

The distributions of peptides and proteins according to their molecular weight (MW), isoelectric point (pI), and hydrophobicity (as expressed by the grand average of hydropathy (GRAVY) scale using the method of Kyte and Doolittle[24]) were evaluated for all sample preparation methods. For proteins, distributions according to the three physicochemical characteristics were rather similar (see Figure 4); however, for IGD, the MW distribution features a modest shift toward larger proteins (see Figure 4a), and the proportion of acidic proteins (pI ± 5) appears to be lower compared with other approaches (see Figure 4b). In comparison with the expected distributions based on all proteins present in the human reference proteome (i.e., UniProtKB Homo sapiens UP000005640, canonical with 70 956 entries; represented by the straight lines in Figure 4), relatively fewer small and basic proteins were detected by the different methods (see Figure 4a,b). Furthermore, the distributions of GRAVY scores for observed proteins were slightly narrower compared to the corresponding distribution of all proteins present in the reference proteome (see Figure 4c).
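For reference, the GRAVY score is the mean Kyte-Doolittle hydropathy value over all residues of a sequence. The study used the R “Peptides” package for this calculation; the Python function below is only an equivalent sketch, with a hypothetical function name, restricted to the 20 standard amino acids.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: sum of residue values divided by sequence length.

    Raises KeyError for non-standard residue codes and ValueError for an empty sequence.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)
```

Positive scores indicate more hydrophobic sequences and negative scores more hydrophilic ones, which is the convention used when interpreting shifts in the GRAVY distributions.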

Figure 4

Distribution of identified proteins according to (a) molecular weight, (b) pI, and (c) hydrophobicity (GRAVY) based on proteins identified in three out of five replicates for the pooled samples. Graphs include (colored) lines for the different methods as well as lines for the theoretical distributions of all proteins present in the human reference proteome (straight line) and the distributions of all proteins detected in any of the pooled samples (dashed line). Corresponding plots for the different tissues are shown in Figures S-9–S-11 (Supporting Information).

Regarding the physicochemical properties of the detected peptides, corresponding distributions were also rather comparable for the different methods (see Figure 5). However, relatively more acidic peptides (pI ± 4) were observed for OFD (see Figure 5b) and the MW distribution for IGD featured a minor shift toward smaller peptides (see Figure 5a). Differences were also observed when comparing the distributions of the four methods to those of in silico predicted tryptic peptides derived from all proteins present in the above-mentioned reference proteome (straight black lines in Figure 5) and undetected (in silico predicted tryptic) peptides from the proteins that were actually detected in the specific tissue samples (dash-dot lines in Figure 5). Notably, the MW distributions of peptides for the four methods were narrower and shifted toward larger peptides (see Figure 5a), and the GRAVY distributions featured modest shifts toward positive scores (more hydrophobic peptides) compared with the undetected peptides (see Figure 5c). In addition, the peptide pI distributions for all four methods indicate an underrepresentation of peptides with a pI around 8.5 (see Figure 5b), which thus include peptides having their lowest solubility around the pH value of the digestion buffer used in this study (i.e., 50 mM ammonium bicarbonate, pH ± 8.3).
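The in silico prediction of tryptic peptides used as a reference can be approximated by the sketch below. It assumes full cleavage after lysine or arginine except before proline, no missed cleavages, and a minimum length of five residues; the study's actual prediction settings beyond the minimum length are not specified here, and the function name is hypothetical.

```python
def tryptic_peptides(protein, min_length=5):
    """Predict fully cleaved tryptic peptides of at least `min_length` residues.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        at_end = i == len(protein) - 1
        # The end of the protein always closes the current peptide
        if at_end or (residue in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return [peptide for peptide in peptides if len(peptide) >= min_length]
```

Subtracting the set of experimentally identified sequences from the predicted set for each detected protein yields the "undetected" peptides whose property distributions serve as the comparison.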

Figure 5

Distribution of identified peptides according to (a) molecular weight, (b) pI, and (c) hydrophobicity (GRAVY) based on peptides identified in three out of five replicates for the pooled samples. Graphs include (colored) lines for the different methods as well as lines for the theoretical distributions of peptides derived from all proteins present in the human reference proteome (straight line), distributions of all peptides detected in any of the pooled samples (dashed line), and theoretical distributions of undetected peptides (at least five amino acids in length) derived from all proteins detected in any of the pooled samples (dash-dot line). Corresponding plots for the different tissues are shown in the Figures S-12–S-14 (Supporting Information).

Discussion

Various sample preparation methods have been described for bottom-up proteomics experiments targeting (solid) tissues, and a wide range of modifications to these methods can also be found in the literature.[7,12,13] The most straightforward methods involve direct (in-solution) digestion of proteins without distinct procedures to remove contaminants including detergents, chaotropes, lipids, and nucleic acids.[7,9,10] In our study, we show that such an in-solution digestion (ISD) approach is a good option for quantitative proteomics, featuring limited losses and good precision for peptide and protein quantification on the basis of simple and highly automatable workflows. ISD furthermore gave the highest numbers of identified peptides and proteins in the discovery proteomics experiments and, compared with the other methods, did not exhibit a bias regarding amino acid composition or physicochemical properties of identified peptides and proteins. However, direct digestion approaches require samples that are sufficiently “clean”, and we did observe column contamination leading to carryover and shifting retention times, which was particularly an issue for the targeted (timed MRM) experiments. In addition, we observed increased proportions of miscleaved peptides in the ISD samples, which can likely be attributed to their lower degree of purity.[25] Moreover, chemicals used in ISD workflows need to be compatible with proteolytic digestion as well as LC-MS detection. For example, detergents that are often used in proteomics workflows to solubilize proteins (e.g., SDS, NP-40, and CHAPS) are not compatible with mass spectrometric detection.[7−11] MS-compatible alternatives do exist (e.g., PPS Silent Surfactant, ProteaseMAX, Invitrosol, and RapiGest SF, which was used in our study), yet noncompatible detergents are still the most commonly used, which requires appropriate procedures to remove these compounds prior to LC-MS analysis.[26,27]
Common methods for detergent removal are based on precipitating proteins with acid (e.g., trichloroacetic acid) or organic solvents (e.g., acetone, which was used in our study for the on-pellet digestion method) while keeping detergents in solution, or on trapping proteins in gels or on centrifugal filters, allowing the separation of proteins from contaminants.[7,12−16] These approaches lead to cleaner samples compared to ISD, which we also observed in our study, as the corresponding samples did not lead to noticeable carryover or retention time shifts. These approaches are, however, prone to induce considerable protein losses, which we found to be most relevant for the in-gel digestion (IGD) method, a rather labor-intensive method featuring many steps during which losses may occur. Despite these losses, IGD enabled efficient contaminant removal and detection of considerable numbers of proteins and peptides. Good precision was furthermore achieved in both targeted and discovery experiments. However, precise (label-free) quantification in the discovery experiments required (median scale) normalization of the data, which was likely due to the lower amounts of material that were eventually analyzed by LC-MS.
The on-pellet digestion (OPD) method is comparable to ISD with regard to its simplicity and high-throughput capabilities, and also with regard to its performance for the nasal polyp and parotid gland samples in terms of the numbers of identifications, losses, and precision of quantification. However, median scale normalization of the data was also required for OPD to enable precise quantification in the discovery experiments. In the palatine tonsil experiments, losses were considerably larger for OPD and relatively fewer proteins were identified. OPD’s reduced performance for this tissue highlights that a single method may not perform optimally for every type of tissue and that the outcome of a comparative study of sample preparation methods depends greatly on the selected tissue(s).
One of the most widely used sample preparation methods in present-day proteomics research is the “FASP” method, which relies on an on-filter sample cleanup and protein digestion protocol and furthermore features considerable high-throughput capabilities.[15,28] In our study, we tested on-filter digestion (OFD) on the basis of the original “FASP II” protocol,[15] which showed limited losses (comparable with ISD), good precision in both targeted and discovery proteomics experiments, and high numbers of identified peptides and proteins, which were only somewhat lower compared to ISD and OPD. With respect to these identifications, we observed a significant (negative) bias for OFD regarding the identification of cysteine-containing peptides. Even though our tissue lysates did contain a reducing agent, the absence of a distinct reduction step in the OFD protocol prior to thiol alkylation may have led to this bias. This artifact likely affected the numbers of identifications negatively, and it is thus advisable to assess the recovery of cysteine-containing peptides when using OFD or to consider including a distinct reduction step in the protocol.
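
The median scale normalization that was needed for IGD and OPD in the discovery experiments can be sketched as follows. This is a minimal illustration which assumes that each sample is rescaled so that its median peptide intensity matches the median of all sample medians; the exact procedure used in the study is not specified here, and the function name `median_scale_normalize` and the example intensities are hypothetical.

```python
from statistics import median


def median_scale_normalize(samples):
    """Rescale each sample so that all samples share the same median intensity.

    samples maps a sample name to a list of peptide intensities, with None
    marking a missing value. Assumption: the common target is the median of
    the per-sample medians.
    """
    sample_medians = {
        name: median(v for v in values if v is not None)
        for name, values in samples.items()
    }
    target = median(sample_medians.values())
    return {
        name: [
            None if v is None else v * target / sample_medians[name]
            for v in values
        ]
        for name, values in samples.items()
    }


# Hypothetical intensities: replicate "b" was loaded at half the amount of "a".
raw = {
    "a": [200.0, 400.0, 600.0],
    "b": [100.0, 200.0, 300.0],
    "c": [150.0, None, 450.0, 300.0],
}
normalized = median_scale_normalize(raw)
for name, values in normalized.items():
    print(name, values)
```

Scaling by a single factor per sample corrects for differences in the total amount of material analyzed but leaves the relative intensities within a sample unchanged, which is why it suits workflows where losses vary between replicates.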

Conclusions

Every method has its specific advantages and challenges (e.g., the absence of a sample cleanup procedure in the ISD protocol, the relatively large losses for IGD, the rather varying losses for OPD, and the risk of losing cysteine-containing peptides with OFD, as observed in our study). For all methods, numerous alternative protocols exist in the literature which address these and other challenges, thereby resulting in optimized protocols, often for specific applications. With our study, we could not possibly cover the full range of available methods and variants, nor could we draw any hard, general conclusions regarding the performances of the four methods included in our study. In fact, our study shows that a method’s performance depends on the type of sample being studied. The outcomes of our comparative study could have been different if only one of the three tissues had been included, and likely also if three other tissues had been included. It may furthermore be speculated that if a different detection principle (e.g., data-independent acquisition, DIA) had been employed for our study, other differences, nuances, or outcomes could have been revealed. Nonetheless, our data do show the relevance of selecting the most suitable protocol for an experiment based on a fit-for-purpose evaluation rather than just using the same method for every type of sample. In addition, we show that peptides and proteins detected with the four methods share similar distributions of physicochemical characteristics, which in turn are considerably different from those of potentially present proteins (as predicted from the human proteome) and peptides (as predicted from the identified proteins). Accordingly, efforts to improve the detection capabilities of proteomics workflows, for example by improving the detectability of currently undetected peptides, are needed to increase the potential of proteomics research.

27 in total

1. Optimization of mass spectrometry-compatible surfactants for shotgun proteomics.

Authors: Emily I Chen; Daniel Cociorva; Jeremy L Norris; John R Yates
Journal: J Proteome Res Date: 2007-05-27 Impact factor: 4.466

2. Development and evaluation of normalization methods for label-free relative quantification of endogenous peptides.

Authors: Kim Kultima; Anna Nilsson; Birger Scholz; Uwe L Rossbach; Maria Fälth; Per E Andrén
Journal: Mol Cell Proteomics Date: 2009-07-12 Impact factor: 5.911

3. Filter-Aided Sample Preparation: The Versatile and Efficient Method for Proteomic Analysis.

Authors: J R Wiśniewski
Journal: Methods Enzymol Date: 2016-10-12 Impact factor: 1.600

Review 4. Protein analysis by shotgun/bottom-up proteomics.

Authors: Yaoyang Zhang; Bryan R Fonslow; Bing Shan; Moon-Chang Baek; John R Yates
Journal: Chem Rev Date: 2013-02-26 Impact factor: 60.622

5. A simple method for displaying the hydropathic character of a protein.

Authors: J Kyte; R F Doolittle
Journal: J Mol Biol Date: 1982-05-05 Impact factor: 5.469

6. Comparative study of workflows optimized for in-gel, in-solution, and on-filter proteolysis in the analysis of plasma membrane proteins.

Authors: Waeowalee Choksawangkarn; Nathan Edwards; Yan Wang; Peter Gutierrez; Catherine Fenselau
Journal: J Proteome Res Date: 2012-04-13 Impact factor: 4.466

Review 7. Protein biomarker discovery for head and neck cancer.

Authors: Tieneke B M Schaaij-Visser; Ruud H Brakenhoff; C René Leemans; Albert J R Heck; Monique Slijper
Journal: J Proteomics Date: 2010-02-04 Impact factor: 4.044

8. Comparison of bottom-up proteomic approaches for LC-MS analysis of complex proteomes.

Authors: Leigh A Weston; Kerry M Bauer; Amanda B Hummon
Journal: Anal Methods Date: 2013-09-21 Impact factor: 2.896

9. Optimization and comparison of bottom-up proteomic sample preparation for early-stage Xenopus laevis embryos.

Authors: Elizabeth H Peuchen; Liangliang Sun; Norman J Dovichi
Journal: Anal Bioanal Chem Date: 2016-04-30 Impact factor: 4.142

10. Critical comparison of sample preparation strategies for shotgun proteomic analysis of formalin-fixed, paraffin-embedded samples: insights from liver tissue.

Authors: Alessandro Tanca; Marcello Abbondio; Salvatore Pisanu; Daniela Pagnozzi; Sergio Uzzau; Maria Filippa Addis
Journal: Clin Proteomics Date: 2014-07-08 Impact factor: 3.988
