Frank Klont¹, Linda Bras², Justina C Wolters³, Sara Ongay¹, Rainer Bischoff¹, Gyorgy B Halmos², Péter Horvatovich¹. 1. Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands. 2. Department of Otorhinolaryngology, Head and Neck Surgery, University of Groningen, University Medical Center Groningen, Hanzeplein 1, 9713 GZ Groningen, The Netherlands. 3. Department of Pediatrics, University Medical Center Groningen (UMCG), University of Groningen, 9713 GZ Groningen, The Netherlands.
Abstract
For mass spectrometry-based proteomics, the selected sample preparation strategy is a key determinant for information that will be obtained. However, the corresponding selection is often not based on a fit-for-purpose evaluation. Here we report a comparison of in-gel (IGD), in-solution (ISD), on-filter (OFD), and on-pellet digestion (OPD) workflows on the basis of targeted (QconCAT-multiple reaction monitoring (MRM) method for mitochondrial proteins) and discovery proteomics (data-dependent acquisition, DDA) analyses using three different human head and neck tissues (i.e., nasal polyps, parotid gland, and palatine tonsils). Our study reveals differences between the sample preparation methods, for example, with respect to protein and peptide losses, quantification variability, protocol-induced methionine oxidation, and asparagine/glutamine deamidation as well as identification of cysteine-containing peptides. However, none of the methods performed best for all types of tissues, which argues against the existence of a universal sample preparation method for proteome analysis.
Mass spectrometry (MS)-based proteomics is a powerful technological platform for studying proteins
in various biological contexts and has a prominent role in identifying
and elucidating (patho)physiological processes.[1,2] Using
strategies ranging from detecting proteins in their intact form (“top-down”
proteomics) to analyzing proteins by means of peptides released through
proteolysis (“bottom-up” proteomics), this platform
has opened up and expanded opportunities to study proteins, for example,
by profiling proteomes, characterizing proteins, quantifying proteins,
and by studying protein–protein interactions.[3] As a result of ongoing advances, proteomics has become
a tool capable of delivering answers to key biological questions,
and its role in basic and applied science will likely expand in the
coming decade(s).[2,4]
Sample preparation strategies
for bottom-up proteomics experiments
encompass a protein digestion procedure using proteolytic enzymes
(e.g., trypsin, endoproteinase LysC) in order to release peptides
which can then be analyzed by liquid chromatography–mass spectrometry
(LC-MS).[3] In simpler protocols, proteins
are digested directly, though digestion is often preceded by a protein
denaturation procedure (e.g., disulfide bond reduction and subsequent
cysteine alkylation) to enhance digestion efficiency.[5,6] With such an approach, often referred to as “in-solution
digestion” (ISD), any compound present in a sample or added
during sample preparation will be injected into the LC-MS instrument.[7] Since researchers often use chemicals that are
not compatible with digestion and/or LC-MS detection (e.g., detergents,
chaotropes) to improve the performance of their workflow,[7−11] several contaminant removal procedures have been devised which are
mostly based on protein precipitation and gel- or centrifugal filter-aided
sample cleanup.[7,12−16] All of these different methods have specific advantages
yet also exhibit (protocol-specific) biases.[5,8−11,17,18] The selection of sample preparation methods thereby influences the
subset of proteins that can be reliably identified and/or quantified
by LC-MS and thus is a determining factor for the potential outcomes
of a proteomics experiment.
When designing a proteomics experiment,
previously published projects
on the same type of starting material (and with comparable aims) may
form the basis of rational sample preparation method selection. However,
such studies are not readily available for any type of material and
experiment. Proteomics is, for example, an emerging research line in
head and neck cancer,[19,20] and currently only a few studies
can be referred to for assessing the applicability of sample preparation
methods. Admittedly, most head and neck tissues are (lympho)epithelial
tissues sharing structural features to some extent, yet basing workflow
selection-related decisions on such an assumption may be risky.
Here we describe a comparison of in-gel digestion, in-solution
digestion, on-filter digestion, and on-pellet digestion sample preparation
methodologies that are commonly used in LC-MS-based proteomics. For
this study, we selected three human tissues originating from the head
and neck area (i.e., nasal polyps, parotid gland, and palatine tonsils)
thereby aiming to cover the diversity of (solid) tissues that can
be encountered within a medical discipline, in this case otorhinolaryngology.
The methods were compared on the basis of their performance in discovery
proteomics experiments as well as in targeted proteomics on the basis
of a QconCAT (quantification concatamers) multiple reaction monitoring
method targeting a set of mitochondrial proteins.[21] Methods were compared on the basis of peptide and protein
losses, precision of quantification, discovery potential, and the
distribution of selected physicochemical properties (e.g., size, charge
characteristics, and hydrophobicity) of identified proteins and peptides.
In addition, we compared distributions of physicochemical properties
for detected proteins and peptides to corresponding distributions
of potentially present proteins (as predicted from the human proteome)
and peptides (as predicted from the identified proteins in the specific
tissues) thereby aiming to identify (protocol-specific) biases. With
our work, we aim to assess sample preparation bias in proteomics experiments,
to support the rationale of selecting sample preparation methods based
on a fit-for-purpose evaluation, and to provide leads for expanding
the detection capabilities of mass spectrometry-based proteomics workflows.
Experimental Section
Detailed descriptions of the materials and methods
used for this
study are included in the Supporting Information, whereas concise descriptions of the materials and methods are presented
below.
Tissue Samples
Three different otolaryngeal tissues
(i.e., nasal polyps, parotid gland, and palatine tonsil, see Table S-1 in the Supporting Information) were
obtained separately from three patients who underwent head and neck
surgery at the University Medical Center Groningen. Immediately after
resection, tissues were sliced into pieces of approximately 30 mm³, snap frozen in liquid nitrogen, and stored at
−80 °C until further processing. The study could be carried
out under section 7:467 of the Dutch Civil Code as patients gave permission
to use the tissues which were regarded as residual materials after
surgery and which furthermore cannot be traced back to the patients.
Tissue Homogenization and Protein Extraction
Tissue
was pulverized using a CryoMill cryogenic grinder and suspended in
0.1% RapiGest in 50 mM ammonium bicarbonate (ABC) or sodium dodecyl
sulfate (SDS)/urea lysis buffer (2% SDS, 8 M urea and 100 mM β-mercapto-ethanol
in 50 mM Tris/HCl buffer, pH 7.6) at a final tissue concentration
of 30 mg/mL. The suspensions were vortex-mixed for 5 min and subjected
to three freeze/thaw cycles. Upon another 5 min of vortex-mixing and
pelleting debris via centrifugation (10 min; 14 000 × g), final lysates were collected. Protein concentration
was determined using the micro bicinchoninic acid (BCA) assay, and
lysates were stored at −80 °C until analysis.
In-Solution Digestion (ISD)
A volume of RapiGest protein
extract corresponding to 20 μg of total protein was diluted
to 40 μL with ABC. Proteins were reduced in 10 mM dithiothreitol
(DTT) (30 min; 60 °C) and alkylated in the dark in 20 mM iodoacetamide
(IAM) (30 min; 25 °C). After quenching unreacted IAM with a 0.5
molar excess of DTT (30 min; 25 °C), trypsin was added in a final
proteinase-to-protein ratio of 1:20, and the proteins were digested
overnight (37 °C). Digestion was stopped and RapiGest was hydrolyzed
through addition of formic acid (FA) in Milli-Q water (H₂O), and the final peptide mixture was obtained after pelleting debris
via centrifugation (10 min; 14 000 × g).
On-Pellet Digestion (OPD)
SDS/urea protein extract
containing 20 μg of protein was diluted to 25 μL with
ABC, and proteins were precipitated through addition of 50 μL
of ice-cold 100% acetone and two 50 μL aliquots of ice-cold
85% acetone followed by centrifugation (5 min; 4 °C; 14 000 × g). The supernatant was removed, and the precipitation step
was repeated. After removing the supernatant of the second precipitation
step, the pellet was left to dry by air. Subsequently, proteins were
solubilized via pretrypsination in 25 μL of ABC with a final
proteinase-to-protein ratio of 1:50 (4 h; 37 °C). Proteins were
reduced with 10 mM DTT and were alkylated in the dark with 20 mM IAM.
After quenching unreacted IAM with DTT, trypsin was added in a final
proteinase-to-protein ratio of 1:20, and the proteins were digested
overnight. Digestion was stopped through addition of FA, and the final
peptide mixture was obtained after pelleting debris.
In-Gel Digestion (IGD)
The in-gel digestion protocol
was based on the “In-Gel Digestion and Sample Cleanup”
protocol, as described previously in Wolters et al.[21] Briefly, SDS/urea protein extract containing 20 μg
of protein was diluted to 15 μL with ABC, mixed with 5 μL
of NuPAGE LDS Sample Buffer 4×, and the sample was boiled for
2 min. After the sample was cooled to room temperature, it was loaded
onto a NuPAGE 4–12% Bis-Tris Protein Gel, and electrophoresis
was carried out at 100 V for only 5 min. Proteins were localized by
staining the gel with Bio-Safe Coomassie Blue G-250 stain overnight,
and unbound dye was removed by repeated washes with H₂O. The stained protein band was excised, sliced into 2 × 2 mm
pieces, and destained via repeated washes with 30% acetonitrile (ACN)
in ABC (15 min; 25 °C). Gel pieces were dehydrated upon washing
with 50% ACN in ABC (15 min; 25 °C) and 100% ACN (5 min; 25 °C)
followed by drying in an oven at 37 °C. Next, proteins were reduced
in 10 mM DTT and, after discarding the DTT solution, alkylated in
the dark in 20 mM IAM. Remaining IAM was discarded, and the gel pieces
were dehydrated as described above. Subsequently, gel pieces were
reswollen on ice following dropwise addition of 25 μL ABC containing
trypsin in a final proteinase-to-protein ratio of 1:20, and the proteins
were digested overnight. After digestion, the residual liquid was
collected and remaining peptides were extracted in 25 μL of
5% FA in 75% ACN (20 min; 25 °C). After combining the two volumes,
peptides were dried in a CentriVap vacuum concentrator (Labconco)
at 45 °C, and the residue was reconstituted in 0.1% FA to obtain
the final peptide mixture.
On-Filter Digestion (OFD)
For on-filter
digestion,
the SDS/urea protein extract was processed according to the “FASP
II” protocol, as described previously by Wisniewski et al.,[15] with minor modifications. Briefly, an amount
of SDS/urea protein extract corresponding to 20 μg of protein
was diluted with urea solution (8 M urea in 0.1 M Tris/HCl, pH 8.5)
to 200 μL and was loaded onto a Microcon Ultracel YM-30 filtration
device. After centrifugation (15 min; 14 000 × g), the concentrate was diluted with 200 μL of urea solution
and was centrifuged again. Next, 100 μL of 50 mM IAM in urea
solution was added to the concentrate, the sample was mixed briefly
(1 min; 25 °C), and proteins were alkylated in the dark. After
centrifugation, the concentrate was diluted with 100 μL of urea
solution and was centrifuged again. This step was repeated twice.
Subsequently, the concentrate was diluted with 100 μL of ABC
and was centrifuged. After this second wash step was repeated twice,
40 μL of ABC containing trypsin in a final proteinase-to-protein
ratio of 1:20 was added to the filter, the sample was mixed briefly,
and proteins were digested overnight in a wet chamber. Peptides were
collected by centrifuging the filter unit followed by an additional
elution (centrifugation) step with 50 μL ABC. After combining
the two volumes, peptides were dried in a CentriVap vacuum concentrator
(Labconco) at 45 °C, and the residue was reconstituted in 0.1%
FA to obtain the final peptide mixture.
Targeted LC-MS/MS Analysis
Targeted proteomics analyses
were performed on a TSQ Vantage Triple Quadrupole mass spectrometer
using multiple reaction monitoring (MRM) transitions and settings
that have been described previously.[21] Peptide
separation was achieved with an UltiMate 3000 RSLC UHPLC system on
a 50 cm Acclaim PepMap RSLC C18 analytical column (2 μm, 100
Å, 75 μm i.d. × 500 mm) which was kept at 40 °C.
For targeted analyses, the final peptide mixtures were spiked with
predigested QconCAT (quantification concatamers; designed to target
a set of mitochondrial proteins, details have been described previously)[21] at a level of 1.25 ng per μg of total
protein. A sample volume corresponding to 1 μg of total protein
(based on the micro BCA assay) was loaded onto an Acclaim PepMap100
C18 trap column (5 μm, 100 Å, 300 μm i.d. ×
5 mm) using μL-pickup with 0.1% FA in H₂O at 20 μL/min.
Subsequently, peptides were separated on the analytical column using
a 100 min linear gradient from 3 to 60% eluent B (0.1% FA in ACN)
in eluent A (0.1% FA in H₂O) at 200 nL/min.
Shotgun LC-MS/MS Analysis
Shotgun proteomics analyses
were performed using an UltiMate 3000 RSLC UHPLC system connected
to an Orbitrap Q Exactive Plus mass spectrometer operating in the
data-dependent acquisition (DDA) mode. A sample volume corresponding
to 1 μg of total protein (based on the micro BCA assay) was
injected onto an Acclaim PepMap100 C18 trap column (vide supra) using
μL-pickup with 0.1% FA in H₂O at 20 μL/min.
Peptides were separated on a 50 cm Acclaim PepMap RSLC C18 analytical
column (vide supra) which was kept at 40 °C, using a 117 min
linear gradient from 3 to 40% eluent B (0.1% FA in ACN) in eluent
A (0.1% FA in H₂O) at a flow rate of 200 nL/min. For DDA,
survey scans from 300 to 1650 m/z were acquired at a resolution of 70 000 (at 200 m/z) with an AGC target value of 3 × 10⁶ and a maximum ion injection time of 50 ms. From the survey
scan, a maximum number of 12 of the most abundant precursor ions with
a charge state of 2+ to 6+ were selected for
higher energy collisional dissociation (HCD) fragment analysis between
200 and 2000 m/z at a resolution
of 17 500 (at 200 m/z) with
an AGC target value of 5 × 10⁴, a maximum ion injection
time of 50 ms, a normalized collision energy of 28%, an isolation
window of 1.6 m/z, an underfill
ratio of 1%, an intensity threshold of 1 × 10⁴, and
the dynamic exclusion parameter set at 20 s.
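The precursor selection logic described above can be illustrated with a minimal Python sketch. It is not the instrument's acquisition software: the function name `select_precursors`, the peak representation, and the m/z matching tolerance `mz_tol` are assumptions made for illustration. Only the limits (12 precursors, charge states 2+ to 6+, intensity threshold of 1 × 10⁴, 20 s dynamic exclusion) are taken from the text.

```python
def select_precursors(peaks, now_s, exclusion_list, top_n=12,
                      min_intensity=1e4, charge_range=(2, 6),
                      exclusion_s=20.0, mz_tol=0.01):
    """Pick up to top_n precursors from one survey scan (illustrative only).

    peaks: list of (mz, charge, intensity) tuples from the survey scan.
    exclusion_list: list of (mz, expiry_time_s) tuples, updated in place.
    mz_tol: hypothetical m/z window used to match excluded precursors.
    """
    # Forget exclusion entries whose time window has passed.
    exclusion_list[:] = [(mz, t) for mz, t in exclusion_list if t > now_s]

    def is_excluded(mz):
        return any(abs(mz - ex_mz) <= mz_tol for ex_mz, _ in exclusion_list)

    # Keep peaks with an allowed charge state, sufficient intensity,
    # and no active dynamic exclusion.
    candidates = [
        p for p in peaks
        if charge_range[0] <= p[1] <= charge_range[1]
        and p[2] >= min_intensity
        and not is_excluded(p[0])
    ]
    # Most abundant first, then truncate to the top N.
    candidates.sort(key=lambda p: p[2], reverse=True)
    selected = candidates[:top_n]

    # Selected precursors are excluded for the next exclusion_s seconds.
    for mz, _charge, _intensity in selected:
        exclusion_list.append((mz, now_s + exclusion_s))
    return selected
```

Calling the function once per survey scan with a shared `exclusion_list` shows why, with a 20 s exclusion window, an abundant precursor is not fragmented again in the scans that immediately follow.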
Data Processing
Raw data for the targeted proteomics
analyses were processed using the Skyline software and were furthermore
analyzed using Microsoft Excel (more details on processing of targeted
proteomics data have been published previously).[21] Shotgun proteomics data were processed using PEAKS Studio
software,[22] and a detailed overview of
applied PEAKS search criteria is included in Method S-8 (Supporting Information). Label-free quantification
using ion counts was performed on the basis of the results of the
principal PEAKS search followed by further filtering and processing
of the data using an in-house developed script in R and RStudio.
With respect to peptide quantification, peptide areas were summed
for all peptides with the same primary amino acid sequence after removing
PTMs and independently of the charge states. For protein quantification,
areas of peptides belonging to the same protein group were summed,
yet only if they were unique for the corresponding protein group.
For both peptide and protein quantification, DDA data were scaled by
median scale normalization.[23]
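The peptide-level summation and the normalization step can be sketched as follows. This is a Python illustration under stated assumptions, not the in-house R script: the function names are hypothetical, the parenthesized modification annotation (e.g., `(+15.99)`) is an assumed export format, and median scale normalization is implemented in one common variant (scaling every run so that its median equals the mean of all run medians), which may differ in detail from the cited procedure.

```python
import re
from collections import defaultdict
from statistics import mean, median


def strip_modifications(annotated_sequence):
    """Remove parenthesized modification annotations (assumed format)."""
    return re.sub(r"\([^)]*\)", "", annotated_sequence)


def sum_peptide_areas(features):
    """Sum areas per primary sequence, ignoring PTMs and charge states.

    features: iterable of (annotated_sequence, charge, area) tuples.
    """
    totals = defaultdict(float)
    for annotated_sequence, _charge, area in features:
        totals[strip_modifications(annotated_sequence)] += area
    return dict(totals)


def median_scale_normalize(runs):
    """Scale each run so that its median equals the mean of all run medians.

    runs: dict mapping run name -> dict mapping peptide -> area.
    """
    medians = {name: median(areas.values()) for name, areas in runs.items()}
    target = mean(medians.values())
    return {
        name: {pep: area * target / medians[name] for pep, area in areas.items()}
        for name, areas in runs.items()
    }
```

Scaling by a single factor per run preserves the relative abundances within a run and removes only global differences between runs, such as differences in injected amount.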
Bioinformatics Analysis
Data analysis and visualization
were performed using R, RStudio, Microsoft Excel, and GraphPad Prism.
For evaluation of the physicochemical properties of proteins and peptides,
the R “Peptides” and “ggplot2” packages
were employed for, respectively, calculating and visualizing corresponding
data.
Results
Relative Losses of Peptides and Proteins
Method-induced
losses were evaluated on the basis of peptides and proteins that were
quantified in all 20 replicates (four methods, five replicates per
method) per tissue. Average levels were calculated for each method,
the highest observed average level was set to 100%, and the other
three average levels were related to the highest average level, which
gave the relative average peptide and protein levels (see Figure 1). For the QconCAT-multiple
reaction monitoring (MRM) experiments, digested QconCATs (with ¹³C/¹⁵N-labeled arginines and lysines) were added
in fixed amounts to the samples prior to LC-MS analysis to compare
peptide losses (yet also methodological variation) for the different
methods.
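The calculation of relative average levels described above can be written out as a short Python sketch. The function name and the input layout are assumptions made for illustration, and the numbers used in any call are hypothetical rather than measured values.

```python
from statistics import mean


def relative_average_levels(replicate_levels):
    """Express each method's average level as a percentage of the highest one.

    replicate_levels: dict mapping method name -> list of replicate levels
    for a single peptide or protein within one tissue.
    """
    averages = {method: mean(levels) for method, levels in replicate_levels.items()}
    highest = max(averages.values())
    # The best-performing method is 100% by construction.
    return {method: 100.0 * avg / highest for method, avg in averages.items()}
```

Applying this function to every peptide or protein and taking the median of the resulting percentages per method would give a median relative average level of the kind reported per tissue.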
Figure 1
Assessment of method-induced losses of peptides as quantified by
(a) MRM and (b) DDA and (c) proteins as quantified by DDA for the
different tissues and the pooled samples. For visualization purposes,
levels are expressed as percentage of the highest observed average
level for each peptide. For every tissue and for pooled sample analysis,
statistically significant differences (p < 0.05,
two-tailed Wilcoxon rank-sum test; performed on the absolute average
levels) were found between all methods, unless specified otherwise
in the figure. Corresponding descriptive statistics are presented
in Table S-2 (Supporting Information).
For all tissues, the largest losses
were observed for IGD with
(median relative average) peptide and protein levels of 27–40%
as shown in Figure 1. This figure furthermore shows that the smallest losses were typically
observed for ISD, with the exception of the palatine tonsil MRM experiment
and all experiments targeting the parotid gland. For the latter tissue,
OFD yielded the highest peptide and protein levels (together with
OPD), and this method furthermore gave similar (DDA) or higher (MRM)
peptide levels for palatine tonsils compared to ISD. However, OFD’s
protein losses for the latter tissue and also the losses of peptides
(both DDA and MRM) and proteins for nasal polyps were considerably
larger compared to ISD, as demonstrated by the 16% (MRM) and 9% (DDA)
lower peptide levels as well as the 27% lower protein levels for this
tissue. Moreover, Figure 1 shows that OPD featured losses comparable to those of OFD
for nasal polyps and parotid gland (15–29% and 3–6%
for OPD versus 16–27% and 2–7% for OFD), yet OPD performed
less well in the experiments targeting the palatine tonsils with OPD’s
levels being around two-thirds of the corresponding levels for ISD
and OFD.
In summary, IGD’s peptide and protein levels
were around
three times lower compared to the other three methods. ISD and OFD
generally performed best in terms of peptide and protein losses, although
both methods featured markedly increased losses in case of one of
the three tissues (i.e., parotid gland for ISD and nasal polyps for
OFD). Conversely, OPD gave the highest peptide and protein levels
for one of the three tissues (i.e., parotid gland) whereas considerable
losses were observed for the other two.
Precision of Peptide and Protein Quantification
To
assess methodological precision, peptides and proteins that were quantified
in all 20 replicates (four methods, five replicates per method) per
tissue were included. Relative standard deviations (RSDs) were calculated
using the five replicates per method, and data were visualized in
beeswarm plots (MRM experiments) or RSD relative frequency polygon
plots (discovery proteomics experiments) (see Figure 2). For the QconCAT-MRM experiments, digested
QconCATs were added in a fixed amount to the samples before LC-MS
analysis (as described in the section above), and for the discovery
proteomics experiments, data were normalized following median scale
normalization.[23] Plots for the non-normalized
data are shown in Figure S-1 (Supporting Information).
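As a minimal sketch of the precision metric, the relative standard deviation of one peptide or protein across the replicates of a method can be computed as below. The function names are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, median, stdev


def rsd_percent(replicate_values):
    """Relative standard deviation (%) across replicates of one analyte."""
    return 100.0 * stdev(replicate_values) / mean(replicate_values)


def median_rsd(analyte_replicates):
    """Median RSD (%) over all analytes quantified with one method.

    analyte_replicates: dict mapping peptide or protein -> list of replicates.
    """
    return median(rsd_percent(values) for values in analyte_replicates.values())
```

Summarizing by the median rather than the mean keeps a few poorly quantified, low-abundance analytes from dominating the comparison between methods.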
Figure 2
Assessment of methodological precision of peptide (as measured
by (a) MRM and (b) DDA) and (c) protein (as measured by DDA) quantification
for the different tissues and for the pooled samples. For every tissue
and for pooled sample analysis, statistically significant differences
(p < 0.05, two-tailed Wilcoxon rank-sum test)
were found between all methods, unless specified otherwise in the
figure. Discovery proteomics data were normalized by median scale
normalization, though plots for non-normalized data are included in
Figure S-1 (Supporting Information). Descriptive
statistics for the data in this figure are presented in Table S-3 (Supporting Information).
In the targeted proteomics experiments, variability
introduced
by the LC-MS system itself, as determined by five repeated injections
of a pooled sample, was similarly low for all four methods (median
RSDs ranging from 2.3% to 3.3%) as shown in Figure 2a. Variability due to the upstream sample
preparation steps was furthermore consistently low for IGD and OFD
with (median) RSDs of 8–10% and 6–9%, respectively.
ISD exhibited similar RSDs, with the exception of the nasal polyps
experiment, for which an RSD of 12% was observed. RSDs around 12% were
also observed for OPD in the parotid gland and palatine tonsil samples,
yet an RSD about twice as high (25%) was found for nasal polyps.
Thereby, OPD featured rather moderate precision of peptide quantification
in the MRM experiments, whereas good precision in all three tissues
was observed for IGD and OFD and good precision in two out of the
three tissues for ISD.
For the discovery proteomics analyses,
variability introduced by
the LC-MS system was higher compared to the MRM measurements with
(median) peptide RSDs of 5.7–9.5% (see Figure 2b) and protein RSDs of 14.5–18.9%
(see Figure 2c). For
peptide quantification, additional variability, as introduced by the
sample preparation methods, led to minor RSD increases (2–5%)
in all experiments, except for ISD in the nasal polyps experiment
for which an RSD increment of 7% was observed. Corresponding variability
for protein quantification also revealed minor RSD increases for ISD,
OFD, and OPD (3–6%, 0–4%, and 2–2%, respectively)
whereas slightly higher increases (6–9%) were observed for
IGD. In terms of overall variability, Figure 2b shows that precision for peptide quantification
was rather comparable for the four methods, and only IGD in the parotid
gland experiment gave considerably higher RSDs compared to the other
three methods. Moreover, Figure 2c shows that protein quantification (based on the sum
of the areas of unique peptides belonging to the same protein group)
was generally less precise than peptide quantification, and IGD furthermore
featured the highest RSDs for all tissues. With respect to these increases,
it should, however, be noted that (for any approach) RSDs increased
with decreasing protein and peptide quantities (see Figure S-2 in
the Supporting Information). The larger
losses for IGD should thus be considered as an (at least partial)
explanation for the greater methodological imprecision observed for
IGD.
On a final note, precision data for the discovery proteomics
experiments
were influenced to various degrees by the median scale normalization
procedure (see Figure S-1 and the Tables S-3 and S-4 in the Supporting Information). In case of ISD and OFD,
relative standard deviations were rather unaffected by this normalization
procedure, though this procedure led to some improvements in methodological
precision for OPD and even larger improvements for IGD.
Discovery Potential
The total number and the overlap
of identifications were assessed for peptides (see Figure 3a) and proteins (see Figure 3b) that were identified
in at least three of the five replicates for the different tissues.
Peptides and proteins identified in at least four and five out of
five replicates resulted in, respectively, around 20% and 40% fewer
peptide identifications as well as 15% and 30% fewer protein identifications
(see Figures S-3 and S-4 in the Supporting
Information).
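The replicate filter used for these counts (identified in at least three, four, or five of five replicates) can be sketched in Python as follows. The function name and the data layout are assumptions for illustration; the identifiers in any call are hypothetical.

```python
from collections import Counter


def reproducible_identifications(replicate_ids, min_replicates=3):
    """Return identifications seen in at least min_replicates replicates.

    replicate_ids: list with one collection of peptide or protein
    identifiers per replicate of a single method and tissue.
    """
    # set() ensures an identifier is counted at most once per replicate.
    counts = Counter(ident for ids in replicate_ids for ident in set(ids))
    return {ident for ident, n in counts.items() if n >= min_replicates}
```

The overlap between two methods, as displayed in a Venn diagram, then follows from ordinary set operations on the filtered results, for example `ids_isd & ids_opd` for shared identifications and `ids_isd - ids_opd` for those unique to the first method.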
Figure 3
Discovery potential of the different sample preparation
approaches.
Venn diagrams of (a) peptides and (b) proteins identified in at least
three out of the five replicates per sample preparation method for
the different tissues. Venn diagrams displaying the distribution of
peptides and proteins identified in at least four out of five and
five out of five replicates for the different tissues as well as those
identified in the pooled samples are shown in the Figures S-3–S-5
(Supporting Information). Percentage of
peptides identified in the pooled samples containing (c) 0, 1, and
2 or 3 missed cleavages; (d) oxidized methionine residues (relative
to the number of methionine-carrying peptides); (e) deamidated asparagine
and/or glutamine residues (relative to the number of asparagine- and/or
glutamine-carrying peptides); and (f) carbamidomethylated (CAM) cysteine
residues (relative to the total number of peptides).
The highest numbers of peptides were identified
for ISD and OPD,
whereas 10–20% fewer peptide identifications were observed
for IGD and OFD. Most identified proteins were observed for ISD and
OPD in nasal polyps and parotid gland, though 10% fewer identifications
for OPD were observed in palatine tonsils. Furthermore, the 10–20%
fewer peptide identifications for IGD and OFD corresponded to 5–10%
fewer proteins identified for OFD and notably to 20–30% fewer
protein identifications for IGD. The latter observation should be
evaluated in the context of IGD’s peptide and protein losses
and the approximately three times lower peptide and protein levels
observed for IGD compared to the other three methods (see Figure 1); however,
tripling the injection volume for IGD yielded only modest increases
in peptide and protein identifications of 11% and 12%, respectively
(see Figure S-6 in the Supporting Information).
To zoom in further on the qualitative performance of the
methods,
trypsin digestion efficiency and the abundance of selected post-translational
modifications (PTMs) and/or sample preparation artifacts were assessed.
The proportion of peptides displaying zero missed cleavages was 95%,
89%, 93%, and 94% for IGD, ISD, OFD, and OPD, respectively (see Figure 3c). For ISD, 10%
of the peptides contained one missed cleavage as compared to 5–6%
for the other methods, and only one percent (or less) of the peptides
exhibited two or more missed cleavages. Moreover, methionine-containing
peptides were more frequently oxidized (see Figure 3d) and asparagine- and/or glutamine-containing
peptides more frequently deamidated (see Figure 3e) in IGD compared to ISD, OFD, and OPD (31%
versus 4–8% and 17% versus 7–10%, respectively). Other
modifications were assessed as well (see Figure S-7 in the Supporting Information) revealing considerable
overalkylation in all samples (up to 2.4% for OFD and 3.1% for OPD),
lysine and N-terminal carbamylation of around 1% in IGD, and protein
N-terminal acetylation of 0.7–1.1% for the studied methods.
The degree and extent of cysteine carbamidomethylation were studied
more closely due to the absence of a distinct reduction step prior
to thiol alkylation in the original (and also in newer versions of
the) filter-aided sample preparation (FASP) protocol, which forms
the basis of the applied OFD protocol. For all methods, cysteine carbamidomethylation
was rather complete (see Figure S-8A in the Supporting Information), yet only 8% of the peptides identified for OFD
contained cysteine residues compared to 15% for IGD and 14% for both
ISD and OPD (see Figure 3f). The occurrence of the other 19 amino acids was evaluated as
well (see Figures S-8B and S-8C in the Supporting Information), though relevant differences were only observed
for cysteine in case of the OFD approach.
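Two of the qualitative metrics in this section, the number of missed cleavages per peptide and the share of peptides containing a given residue, can be illustrated with a short Python sketch. The function names are hypothetical, and the cleavage rule (trypsin cleaves C-terminal to lysine or arginine unless the next residue is proline) is an assumption about how missed cleavages were counted; the search engine settings may differ.

```python
def missed_cleavages(peptide):
    """Count internal K/R residues that are not followed by proline."""
    return sum(
        1
        for i in range(len(peptide) - 1)  # the C-terminal residue is skipped
        if peptide[i] in "KR" and peptide[i + 1] != "P"
    )


def percent_with_missed_cleavages(peptides, n_missed):
    """Percentage of peptides with exactly n_missed missed cleavages."""
    hits = sum(1 for pep in peptides if missed_cleavages(pep) == n_missed)
    return 100.0 * hits / len(peptides)


def percent_containing(peptides, residue):
    """Percentage of peptides containing at least one given residue."""
    hits = sum(1 for pep in peptides if residue in pep)
    return 100.0 * hits / len(peptides)
```

With a list of identified sequences per method, `percent_containing(peptides, "C")` corresponds to the kind of comparison made above for cysteine-containing peptides.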
Peptide and Protein Characteristics
The distribution
of peptides and proteins according to their molecular weight (MW),
isoelectric point (pI), and hydrophobicity (as expressed by the grand
average of hydropathy (GRAVY) scale using the method of Kyte and Doolittle[24]) were evaluated for all sample preparation methods.
For proteins, distributions according to the three physicochemical
characteristics were rather similar (see Figure 4); however, for IGD, the distributions for
MW feature modest shifts toward larger proteins (see Figure 4a), and the proportion of acidic
proteins (pI ≈ 5) appears to be lower compared with other approaches
(see Figure 4b). In
comparison with the expected distributions based on all proteins present
in the human reference proteome (i.e., UniProtKB Homo
sapiens UP000005640, canonical with 70 956
entries; represented by the straight lines in Figure ), relatively fewer small and basic proteins
were detected by the different methods (see Figure a,b). Furthermore, the distributions of GRAVY
scores for observed proteins were slightly narrower compared to the
corresponding distribution of all proteins present in the reference
proteome (see Figure c).
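For readers unfamiliar with the GRAVY score, it is the sum of the Kyte-Doolittle hydropathy values of all residues in a sequence divided by the sequence length. The snippet below is a minimal sketch of that calculation; the function name and the example sequences are illustrative and not taken from the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    # Raises KeyError for non-standard residue codes (e.g. X, U), which is intentional here.
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)

# Positive scores indicate more hydrophobic sequences, negative scores more hydrophilic ones.
print(round(gravy("LLVVAI"), 2))   # hydrophobic example
print(round(gravy("DEKRNQ"), 2))   # hydrophilic example
```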
Figure 4
Distribution of identified proteins according to (a) molecular
weight, (b) pI, and (c) hydrophobicity (GRAVY) based on proteins identified
in three out of five replicates for the pooled samples. Graphs include
(colored) lines for the different methods as well as lines for the
theoretical distributions of all proteins present in the human reference
proteome (straight line) and the distributions of all proteins detected
in any of the pooled samples (dashed line). Corresponding plots for
the different tissues are shown in the Figures S-9–S-11 (Supporting Information).
Regarding the physicochemical properties of the detected
peptides,
corresponding distributions were also rather comparable for the different
methods (see Figure 5). However, relatively more acidic peptides (pI ≈ 4) were observed
for OFD (see Figure 5b) and the MW distribution for IGD featured a minor shift toward
smaller peptides (see Figure 5a). Differences were also observed when comparing the distributions
of the four methods to those of in silico predicted
tryptic peptides derived from all proteins present in the above-mentioned
reference proteome (straight black lines in Figure 5) and undetected (in silico predicted tryptic) peptides from the proteins that were actually
detected in the specific tissue samples (dash-dot lines in Figure 5). Notably, the MW
distributions of peptides for the four methods were narrower and shifted
toward larger peptides (see Figure 5a), and the GRAVY distributions featured modest shifts
toward positive scores (more hydrophobic peptides) compared with the
undetected peptides (see Figure 5c). In addition, the peptide pI distributions for all
four methods indicate an underrepresentation of peptides with a pI
around 8.5 (see Figure 5b), which thus include peptides having their lowest solubility around
the pH value of the digestion buffer used in this study (i.e., 50
mM ammonium bicarbonate, pH ≈ 8.3).
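The in silico predicted tryptic peptides used as reference distributions can be generated with a simple digestion routine. The sketch below is an illustration under stated assumptions rather than the tool used in the study: it applies the common trypsin rule (cleavage C-terminal to K or R, except before P), keeps peptides of at least five amino acids as in the Figure 5 legend, and uses a hypothetical protein sequence and parameter names.

```python
import re

def tryptic_digest(protein, missed_cleavages=0, min_length=5):
    """Return in silico tryptic peptides, allowing up to `missed_cleavages` skipped sites."""
    # Cut positions lie directly after K or R unless the next residue is P.
    cuts = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [c for c in cuts if c < len(protein)] + [len(protein)]
    n_fragments = len(bounds) - 1
    peptides = []
    for start in range(n_fragments):
        # Join the fragment with up to `missed_cleavages` following fragments.
        last = min(start + 1 + missed_cleavages, n_fragments)
        for end in range(start + 1, last + 1):
            peptide = protein[bounds[start]:bounds[end]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides

# Hypothetical sequence, for illustration only.
protein = "MKWVTFISLLRPGAKDEFGHK"
print(tryptic_digest(protein))
print(tryptic_digest(protein, missed_cleavages=1))
```

Subtracting the set of experimentally identified peptides from the predicted set for the detected proteins would give the "undetected" peptides whose distributions are compared in the text.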
Figure 5
Distribution of identified
peptides according to (a) molecular
weight, (b) pI, and (c) hydrophobicity (GRAVY) based on peptides identified
in three out of five replicates for the pooled samples. Graphs include
(colored) lines for the different methods as well as lines for the
theoretical distributions of peptides derived from all proteins present
in the human reference proteome (straight line), distributions of
all peptides detected in any of the pooled samples (dashed line),
and theoretical distributions of undetected peptides (at least five
amino acids in length) derived from all proteins detected in any of
the pooled samples (dash-dot line). Corresponding plots for the different
tissues are shown in the Figures S-12–S-14 (Supporting Information).
Discussion
Various sample preparation methods have
been described for bottom-up
proteomics experiments targeting (solid) tissues, and a wide range
of modifications to these methods can also be found in the literature.[7,12,13] The most straightforward methods
involve direct (in-solution) digestion of proteins without distinct
procedures to remove contaminants including detergents, chaotropes,
lipids, and nucleic acids.[7,9,10] In our study, we show that such an in-solution digestion (ISD) approach
is a good option for quantitative proteomics featuring limited losses
and good precision for peptide and protein quantification on the basis
of simple and highly automatable workflows. ISD furthermore gave the
highest numbers of identified peptides and proteins in the discovery
proteomics experiments and did not exhibit a bias regarding amino
acid composition or physicochemical properties of identified peptides
and proteins, as compared with other methods. However, it is important
for direct digestion approaches that samples are sufficiently “clean”,
and we did observe column contamination leading to carryover and shifting
retention times, which was particularly an issue for the targeted
(timed MRM) experiments. In addition, we observed increased proportions
of miscleaved peptides in the ISD samples which can likely be attributed
to their lower degree of purity.[25] Moreover,
chemicals used in ISD workflows need to be compatible with proteolytic
digestion as well as LC-MS detection. For example, detergents
that are often used in proteomics workflows to solubilize proteins
(e.g., SDS, NP-40, and CHAPS) are not compatible with mass spectrometric
detection.[7−11] MS-compatible alternatives do exist (e.g., PPS Silent
Surfactant, ProteaseMAX, Invitrosol, and RapiGest SF, which was used
in our study), yet the noncompatible detergents are still used most often,
thus requiring appropriate procedures to remove these compounds prior
to LC-MS analysis.[26,27]

Common methods for detergent
removal are based on precipitating
proteins with acid (e.g., trichloroacetic acid) or organic solvents
(e.g., acetone, which was used in our study for the on-pellet digestion
method) while keeping detergents in solution, or by trapping proteins
in gels or onto centrifugal filters allowing the separation of proteins
from contaminants.[7,12−16] These approaches lead to cleaner samples compared
to ISD, which we also observed in our study as corresponding samples
did not lead to noticeable carryover or retention time shifts. These
approaches are, however, prone to induce considerable protein losses,
which we found were most relevant for the in-gel digestion (IGD) method,
which is a rather labor-intensive method featuring many steps during
which losses may occur. Despite these losses, IGD enabled efficient
contaminant removal and detection of considerable numbers of proteins
and peptides. Good precision was furthermore achieved in both targeted
and discovery experiments. However, enabling precise (label-free)
quantification in the discovery experiments required (median scale)
normalization of the data, which was likely due to the lower amounts
of material that were eventually analyzed by LC-MS.

The on-pellet
digestion (OPD) method is comparable to ISD with
regard to its simplicity and high-throughput capabilities, and also
with regard to its performance for the nasal polyps and parotid gland samples
in terms of the numbers of identifications, losses, and precision
of quantification. However, median scale normalization of the data
was also required for OPD to enable precise quantification in the
discovery experiments. In the palatine tonsil experiments, losses
were considerably larger for OPD and relatively fewer proteins
were identified. Accordingly, OPD’s reduced performance for
this tissue highlights that a single method may not perform
optimally for every type of tissue and that the outcome
of a comparative study of sample preparation methods depends greatly
on the selected tissue(s).

One of the most widely used sample
preparation methods in present-day
proteomics research is the “FASP” method which relies
on an on-filter sample cleanup and protein digestion protocol and
furthermore features considerable high-throughput capabilities.[15,28] In our study, we have tested on-filter digestion (OFD) on the basis
of the original “FASP II” protocol[15] which showed limited losses (comparable with ISD), good
precision in both targeted and discovery proteomics experiments, and
high numbers of identified peptides and proteins, which were only
somewhat lower compared to ISD and OPD. With respect to the latter,
we observed a significant (negative) bias for OFD regarding the identification
of cysteine-containing peptides. Even though our tissue lysates did
contain a reducing agent, the absence of a distinct reduction step
in the OFD protocol prior to thiol alkylation may have led to this
bias. This artifact likely affected the numbers of identifications
negatively, and it is thus advisable to assess the recovery of
cysteine-containing peptides when using OFD or to consider including
a distinct reduction step in the protocol.
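The median scale normalization mentioned above for the IGD and OPD discovery data can be illustrated with a short sketch. The exact procedure used in the study is not described in this section, so the version below is an assumption: each sample's intensities are multiplied by a factor that brings the sample median to a common target (here the median of all sample medians). Sample names and intensity values are hypothetical.

```python
from statistics import median

def median_scale_normalize(samples):
    """Scale each sample so that all samples share the same median intensity.

    `samples` maps a sample name to a list of (positive) intensities.
    The common target is the median of the per-sample medians (an assumption).
    """
    sample_medians = {name: median(values) for name, values in samples.items()}
    target = median(sample_medians.values())
    return {
        name: [value * target / sample_medians[name] for value in values]
        for name, values in samples.items()
    }

# Hypothetical label-free intensities for two replicate injections,
# the second of which contains roughly twice as much material.
raw = {"replicate_1": [1.0, 2.0, 3.0], "replicate_2": [2.0, 4.0, 6.0]}
print(median_scale_normalize(raw))
```

Such scaling compensates for differences in the total amount of material analyzed per injection, which is consistent with the explanation given above for why normalization was needed for the methods with larger losses.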
Conclusions
Every
method has its specific advantages and challenges (e.g.,
the absence of a sample cleanup procedure in the ISD protocol, the
relatively large losses for IGD or the rather varying losses for OPD,
and the risk of losing cysteine-containing peptides with OFD, as observed
in our study), and for all methods, numerous alternative protocols
exist in the literature which address these and other challenges, thereby
resulting in optimized protocols, often for specific applications.
With our study, we could not possibly grasp the full range of available
methods and variants, nor could we draw any hard, general conclusions
regarding the performances of the four methods included in our study.
In fact, our study shows that a method’s performance depends
on the type of sample being studied, and the outcomes of our comparative
study could have been different if only one of the three tissues had been
included, and likely also if three other tissues had been included.
It may furthermore be speculated that if a different detection principle
(e.g., data independent acquisition, DIA) had been employed for our
study, other differences, nuances, or outcomes could have been revealed.
Nonetheless, our data do show the relevance of selecting the most
suitable protocol for an experiment based on a fit-for-purpose evaluation
rather than just using the same method for every type of sample. In
addition, we also show that peptides and proteins detected with the
four methods share similar distributions of physicochemical characteristics,
which in turn are considerably different from those of potentially
present proteins (as predicted from the human proteome) and peptides
(as predicted from the identified proteins). Accordingly, efforts
to improve the detection capabilities of proteomics workflows, for
example by improving the detectability of currently undetected peptides,
are needed to increase the potential of proteomics research.