Joshua J Kellogg1, Tyler N Graf1, Mary F Paine2, Jeannine S McCune3, Olav M Kvalheim4, Nicholas H Oberlies1, Nadja B Cech1
1. Department of Chemistry & Biochemistry, University of North Carolina at Greensboro, Greensboro, North Carolina 27412, United States.
2. College of Pharmacy, Washington State University, Spokane, Washington 99202, United States.
3. Department of Pharmaceutics, University of Washington, Seattle, Washington 99202, United States.
4. Department of Chemistry, University of Bergen, Bergen 5020, Norway.
Abstract
A challenge that must be addressed when conducting studies with complex natural products is how to evaluate their complexity and variability. Traditional methods of quantifying a single or a small range of metabolites may not capture the full chemical complexity of multiple samples. Different metabolomics approaches were evaluated to discern how they facilitated comparison of the chemical composition of commercial green tea [Camellia sinensis (L.) Kuntze] products, with the goal of capturing the variability of commercially used products and selecting representative products for in vitro or clinical evaluation. Three metabolomic-related methods, namely untargeted ultraperformance liquid chromatography-mass spectrometry (UPLC-MS), targeted UPLC-MS, and untargeted, quantitative 1H NMR, were employed to characterize 34 commercially available green tea samples. Of these methods, untargeted UPLC-MS was most effective at discriminating between green tea, green tea supplement, and non-green-tea products. A method using reproduced correlation coefficients calculated from principal component analysis models was developed to quantitatively compare differences among samples. The obtained results demonstrated the utility of metabolomics employing UPLC-MS data for evaluating similarities and differences between complex botanical products.
It is common practice in many research fields to conduct in vitro or clinical evaluation of complex
botanical products. The selection of appropriate study material for
such investigations is confounded by the complexity and variability
of botanical source material. Botanical products contain diverse phytochemicals,
of which the identities of many are often not known. In addition,
substantive variability in phytochemical composition exists in these
products depending on the method of preparation or source material
used, and industrial processing of botanical supplements frequently
renders them unable to be analyzed using genetic techniques, such
as DNA barcoding.[1,2] Such variability in phytochemical
composition can greatly impact the interpretation of both in vitro
and clinical studies. There is currently a lack of definitive guidelines
for ensuring the quality of the product to be tested.[3] The United States Food and Drug Administration (FDA) guidance
for clinical trials involving botanical drug products[4] recommends that investigational new drug applications contain
“a chemical identification for the active constituents or characteristic
markers in the drug substance, if possible”. However, specific
guidelines for comparing available products and selecting appropriate
representative samples for investigation are currently lacking.

The goal of this study was to compare the effectiveness of several
metabolomics approaches for evaluating the variability in the phytochemical
composition of a series of commercial botanical products. Green tea
[leaves from Camellia sinensis (L.) Kuntze (Theaceae)]
was employed as a test case. Green tea is one of the most commonly
consumed beverages worldwide[5] and is also
a popular dietary supplement, ranking fifth in sales in the United
States in 2015.[6] Green tea products have
been reported to possess numerous health-protective qualities, including
cardioprotection, chemoprevention, and weight loss.[7−9] However, many
green tea clinical samples are delivered as a complex mixture (tea
or extract) as opposed to single-molecule interventions.[10−12]

The phytochemical composition of green tea is similar to that of
fresh Camellia sinensis leaves except for a few enzymatically
catalyzed reactions that occur immediately after harvest.[5,13] Green tea contains over 200 previously identified constituents,
including polyphenols, xanthines, theanine, inorganic salts, and individual
elements.[14] Polyphenols constitute up to
30% of the dry leaf by mass and are the major constituents in green
tea.[15] Catechins, specifically flavan-3-ols
and flavan-3-gallates, represent the largest group of polyphenols
in green tea leaves and are thought to be largely responsible for
the diverse bioactivity demonstrated in green tea studies.[16] The extraction efficiency of green tea polyphenols
depends on the extraction method, contact time with the solvent, solvent
composition, and the form of tea (i.e., bagged or loose).[17,18] This variability is increased with the incomplete or inconsistent
application of analytical methods, making determination of dose content
challenging.[19] Meta-analysis studies of
green tea products used in clinical studies reported polyphenol doses
ranging from 200 to 1207 mg.[10−12]

Metabolomics-based approaches have emerged as important tools in
assessing large chemical and biological data sets, including those
related to disease pathology,[20] drug response,[21] environmental toxicity,[22] and natural products discovery.[23,24] The primary
goal of metabolomics is to correlate changes in the chemical profile
of a sample with a corresponding shift in macroscopic phenotype due
to a perturbation.[25] Metabolomic studies
coupled with statistical analysis (chemometric studies) have been
employed to characterize the relationships between the metabolome
of green teas and corresponding genotype, origin, quality, or other
biotic or abiotic attributes.[26−28]

Several different analytical techniques are used for metabolomic
profiling, including infrared and Raman spectroscopy, NMR spectroscopy,
and mass spectrometry (MS).[29,30] NMR-based metabolomic
techniques, when acquired under quantitative conditions (qNMR), offer
an unbiased assessment of a complex sample composition, allow the
simultaneous identification and quantification of diverse metabolites,
and are nondestructive of the sample.[24] Mass spectrometry-based metabolomic methods have the advantages
of orders of magnitude greater sensitivity than NMR spectroscopy and
the ability to couple directly to separation techniques such as gas
chromatography (GC) or liquid chromatography (LC).[31] A disadvantage of analysis via mass spectrometry is that
ionization is required to detect sample components, yet not all chemical
compounds ionize efficiently in a mass spectrometer.[32] With these advantages and disadvantages in mind,
this study was undertaken to compare the effectiveness of untargeted
and targeted mass spectrometry and NMR spectroscopy as methods for
chemically characterizing green tea products.

One of the critical questions in selecting a botanical product
(in this case a sample of green tea) for further study is how it compares
to other available products. Chemometric analysis of metabolomics
data sets can be used to make these comparisons. Ascribing similarity
between metabolomic profiles is often achieved via multivariate statistical
modeling procedures, such as principal component analysis (PCA).[26] PCA is a graphical representation of data that
can be used to ascribe clusters of similar samples, but is not equipped
to quantify variability and similarity between samples, and generally
employs only two principal components at a time to classify the samples.[33] Hierarchical cluster analysis (HCA) can be used
to cluster samples based upon similarity, but provides only information
on similarity between adjacent samples and not for overarching comparisons
between all samples in a data set.[34,35] For the work
described herein, an alternate approach for comparison of samples
was employed, that of a reproduced correlation coefficient matrix.
The reproduced correlation coefficient matrix is based on PCA scores
and loadings, but is derived from all principal components retained in the model (i.e.,
not just a pair of components, as in traditional PCA plots). As demonstrated
herein using the example of green tea, the correlation matrix displays
a series of correlation coefficients that can be used to quantitatively
compare multiple samples in a data set and determine which are most
chemically similar. Such information can then be used to inform product
selection for later in vitro or clinical evaluation.
Results and Discussion
Comparison of Extraction Techniques
An important first step in comparing the chemistry of complex botanical products is selecting
the appropriate solvent extraction technique. Two extraction techniques
were considered for this study: hot water extraction and methanol
extraction. Hot water extraction replicates the traditional process
of brewing tea leaves and should therefore yield results relevant
to consumer use. However, methanol extraction was appealing due to
its nonselective ability to extract a wide range of secondary metabolites
and the ease in removing methanol solvent for extract storage and
processing.[36] To aid in the decision between
hot water and methanol extraction, triplicate extracts of a National
Institute of Standards and Technology (NIST) green tea standard were
prepared in both hot water and methanol, and their chemical composition
was compared.

Overall, the two extraction techniques (hot water and methanol) yielded similar quantities of the major
polyphenolic metabolites (Figure S1, Supporting Information). The metabolite profile as determined by mass
spectrometric analysis appeared similar between the two techniques.
The hot water extraction sample had a higher (−)-gallocatechin
content relative to the methanol extraction sample, whereas the methanol
extraction sample displayed higher levels of (−)-epicatechin
gallate, (−)-epigallocatechin gallate, and gallic acid relative
to the water extraction sample. Methanol was selected as the extraction
solvent for subsequent metabolomics analysis due to the overall similarities
of the extracted quantities and the ease of preparing and handling
methanolic extracts.
Differentiation of Green Tea Samples by Untargeted Mass Spectrometry Metabolomics
Commercially available green tea products (n = 34) were selected using consumer sales reports[37] and product quality reports[38,39] (Table S1, Supporting Information). A
turmeric–ginger tea served as a negative control (T23), and
NIST reference standards (T26, T27, and T37) served as positive controls.
For the sake of further comparison, two of the selected green teas
contained additional botanical additives (T24 and T38) (Table S1, Supporting Information).

Untargeted metabolomic analysis of the green tea samples using UPLC-MS yielded 2270 marker
ions (unique retention time–m/z ion pairings) for 114 objects (i.e., 38 green tea samples prepared
by extraction in triplicate), which were analyzed using PCA (Figure 1A). The extraction
replicates (e.g., T01-1, T01-2, and T01-3) of each green tea product
were overlaid on the PCA plot (Figure 1A), indicating excellent repeatability of the extraction
technique and subsequent UPLC-MS analysis. PCA using untargeted metabolomics
data identified one injection, T28-1, which originally appeared to
be an outlier, but had been mislabeled in the UPLC injection queue.
The ability to identify and address this mislabeled injection highlights
the benefit and importance of having replicate samples (each analyzed
separately) for metabolomic analyses.
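As an illustration of how replicate injections can be checked and then averaged, the following minimal sketch groups injections by sample ID, flags any injection that lies unusually far from its sibling replicates, and averages peak areas per marker ion. The function names, the distance rule, the `factor` threshold, and the toy peak areas are all assumptions made for illustration; the study itself identified the mislabeled injection by inspecting the PCA scores plot.

```python
from math import dist
from statistics import mean, median

def group_replicates(profiles):
    """Group injections named like 'T01-2' under their sample ID ('T01')."""
    groups = {}
    for name, intensities in profiles.items():
        sample_id = name.rsplit("-", 1)[0]
        groups.setdefault(sample_id, {})[name] = intensities
    return groups

def flag_inconsistent(profiles, factor=5.0):
    """Flag injections far from their nearest sibling replicate.

    Hypothetical rule: an injection is flagged when its nearest-sibling
    distance exceeds `factor` times the median of all such distances.
    """
    spread = {}
    for members in group_replicates(profiles).values():
        for name, vec in members.items():
            siblings = [v for n, v in members.items() if n != name]
            if siblings:
                spread[name] = min(dist(vec, s) for s in siblings)
    typical = median(spread.values())
    return sorted(n for n, d in spread.items() if d > factor * typical)

def average_replicates(profiles):
    """Average each marker ion's peak area across a sample's replicates."""
    return {
        sample_id: [mean(col) for col in zip(*members.values())]
        for sample_id, members in group_replicates(profiles).items()
    }

# Toy peak areas for three marker ions (invented values, not study data).
profiles = {
    "T01-1": [100.0, 20.0, 5.0],
    "T01-2": [102.0, 19.0, 5.5],
    "T01-3": [99.0, 21.0, 4.5],
    "T28-1": [10.0, 80.0, 40.0],  # stands in for a mislabeled injection
    "T28-2": [60.0, 30.0, 12.0],
    "T28-3": [61.0, 31.0, 11.0],
}

print(flag_inconsistent(profiles))
print(average_replicates(profiles)["T01"])
```

In practice a flagged injection would be investigated and corrected or excluded before averaging; the sketch averages whatever it is given.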
Figure 1
Principal component analysis (PCA) scores
plot of green tea samples
drawn with Hotelling’s 95% confidence ellipse. Data points
representing triplicate green tea samples were closely clustered,
and distinct clusters were observed between green tea supplements,
green teas, and the negative control (turmeric–ginger tea,
T23, indicated in the figure as “non-green tea”). Representative
samples are highlighted (T23, T24, T26, T27, and T37) to demonstrate
the reproducibility of the extraction and analytical protocol. SRM
represents standard reference material from the National Institute
of Standards and Technology (NIST); data points indicated as “suppl”
are green tea supplements.
Inspection of the data indicated that the sample clusters were
located at different points in the two-dimensional space prescribed
by two vectors, principal component 1 (PC1 = 38%) and principal component
2 (PC2 = 22%) (Figure 1A). Three distinct clusters were observed in the data, corresponding
to the three different sample types (green tea, green tea supplement,
and the non-green tea) studied. The negative control (T23) was located
beyond the boundary of the Hotelling’s 95% confidence ellipse.
The loose leaf and powdered green teas were grouped together and were
separate from the green tea supplements. The two green teas that contained
additional botanical components (T24 and T38), although roughly grouped
with the green tea samples, were visibly drawn away from the main
cluster of tea samples in the PCA. The positive controls (NIST samples
T27 and T37) also clustered with their commercial counterparts (loose
leaf tea and green tea supplements, respectively) and were located
centrally within each grouping. Thus, the NIST standards do, indeed,
appear to be representative of green teas used commercially by U.S.
consumers. Using the unsupervised PCA analysis, it was not possible
to visually differentiate between the green tea leaf and powdered
samples, which suggests that the chemistry of such samples is similar.

Smaller groupings apparent within the loose leaf and powdered tea
clusters were noted (Figure 1). These clusterings could have represented variations in
tea cultivar, product source, and/or processing methods.[26−28,31,33,40,41] However, commercial
suppliers do not traditionally offer records of cultivars, geographic
locations, or precise processing procedures of their tea products.
Therefore, it was not possible to determine the underlying characteristics
that produced these smaller clusters. This did not present a problem
for the investigation being conducted here, as the goal was not to
compare green tea composition from different geographical locations
but evaluate the variability in samples used by consumers and to determine
which samples are most representative for further clinical and in
vitro evaluation.

Averaging the peak area from each marker ion across the analyses
of the triplicate extraction samples resulted in a data matrix (2270
marker ions and 38 objects) that yielded a similar clustering pattern
to that observed for the individual replicates (Figure ). The averaged data set was used for comparison
against targeted chemometric analysis and NMR-based metabolomics.

The separation observed in the PCA can be explained through some
of the identified metabolites highlighted in the loading plots for
PC1 (Figure 2A) and PC2 (Figure 2B). Principal
component loadings estimate the degree to which each independent variable
contributes to the individual principal components of a PCA model;
the greater the magnitude of a particular variable’s loading
to a component, the more it contributes to that principal component.
Plotting each principal component’s loading demonstrates which
variables (marker ions) are responsible for the clusterings and shifts
observed in the PCA scores plot. The loadings plot for the first principal
component (Figure 2A) highlighted the catechins (−)-epigallocatechin gallate,
(−)-epigallocatechin, (−)-epicatechin gallate, and (−)-epicatechin,
along with the flavonol rutin, as all contributing negatively to PC1. These
ions were the dominant marker ions responsible for separating the
green tea samples from the negative control (T23) (Figure 1A), as they were present in
the green tea samples in higher concentrations than the negative control.
The components detected were identified by comparing retention times,
accurate masses, and MS-MS fragmentation patterns against standards.
Other ions that were dominant in the green tea samples included two
digallate dimers of epigallocatechin gallate, theasinensin A (m/z 913.1639) and (−)-epigallocatechin-3-O-(3-O-methyl) gallate (m/z 472.1005), which were identified tentatively
by comparing their accurate masses to literature values.[42,43] It is recognized that definitive identification of these ions is
not possible without isolation and confirmation with NMR spectroscopy,
but such experiments were beyond the scope of the current study, for
which the primary goal was to compare metabolomics data collection
and analysis approaches.
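To make the role of loadings concrete, the sketch below computes principal components of a small column-centered matrix with a NIPALS-style power iteration and ranks variables by the magnitude of their PC1 loading. It is a minimal illustration under stated assumptions: the function names are hypothetical, the intensity values are invented, the three ion labels are used only as examples, and the study's actual preprocessing and software are not reproduced here.

```python
from math import sqrt

def center_columns(X):
    """Subtract each variable's mean across samples."""
    means = [sum(col) / len(col) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def nipals_pca(X, k, iters=500):
    """Return (scores, loadings, explained) for the first k components.

    Assumes k does not exceed the rank of the centered data.
    """
    E = center_columns(X)
    n_vars = len(E[0])
    total_ss = sum(v * v for row in E for v in row)
    scores, loadings, explained = [], [], []
    for _ in range(k):
        # Start from the residual column with the largest sum of squares.
        t = list(max(zip(*E), key=lambda c: sum(v * v for v in c)))
        for _ in range(iters):
            p = [sum(ti * row[j] for ti, row in zip(t, E)) for j in range(n_vars)]
            norm = sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]          # unit-length loading vector
            t = [sum(e * pj for e, pj in zip(row, p)) for row in E]
        # Deflate: remove this component before extracting the next.
        E = [[e - ti * pj for e, pj in zip(row, p)] for ti, row in zip(t, E)]
        scores.append(t)
        loadings.append(p)
        explained.append(sum(v * v for v in t) / total_ss)
    return scores, loadings, explained

def rank_by_loading(names, loading):
    """Sort variables by absolute loading, largest first."""
    return sorted(zip(names, loading), key=lambda pair: abs(pair[1]), reverse=True)

ions = ["EGCG", "rutin", "curcumin"]  # example labels only
X = [
    [90.0, 12.0, 0.0],   # green tea (invented values)
    [85.0, 10.0, 1.0],   # green tea
    [95.0, 11.0, 0.0],   # green tea
    [2.0, 1.0, 60.0],    # non-green-tea control
]

scores, loadings, explained = nipals_pca(X, k=2)
for name, value in rank_by_loading(ions, loadings[0]):
    print(f"{name:10s} PC1 loading = {value:+.3f}")
print("explained variance:", [round(e, 3) for e in explained])
```

The sign of a component is arbitrary, so a variable that loads negatively in one software package may load positively in another; only the relative signs and magnitudes within a component carry meaning.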
Figure 2
Loadings plots from untargeted MS-based PCA
of green tea samples.
Metabolites with more negative correlation values along the x-axis (PC1, green labels) were present in higher concentrations
in the green tea samples versus the negative control (T23, turmeric–ginger
tea) and were responsible for the separation observed along the horizontal
axis of Figure 1. Labeled
metabolites with greater positive correlation along the y-axis (PC2, brown labels) were more heavily represented in green
tea supplement samples versus the loose leaf green tea samples and
were the dominant metabolites underlying the differentiation of the
two sample groups in the vertical axis of Figure 1. Metabolites were identified by comparison
against analytical standards. In cases where standards were not present,
comparisons against the literature using m/z values from high-resolution mass spectrometry are provided.
Identifications based on mass without reference standards are tentative.
The second principal component
discriminated between the green
tea supplements and the leaf/powdered teas. The corresponding loading
plot revealed several metabolites that were present in higher concentrations
in the green tea supplements than the leaf and powdered teas and,
thus, were dominant peaks in the positive direction. Myricetin, kaempferol,
and quercetin aglycones, identified via comparison against accurate
mass and fragmentation pattern of standards, were all present as leading
discriminating ions (Figure 2B). In addition, based upon accurate mass measurements, the
tentatively identified theaflavin 3-O-(3-O-methyl) gallate, which is formed via oxidative coupling
of epicatechin and epigallocatechin 3-O-(3-O-methyl) gallate,[43] and the
dimer epicatechin (4β→8)-epigallocatechin-3-O-gallate[44] were observed in the positive
direction of the PC2 loading plot. The data suggest that these compounds
are present at higher levels in green tea supplements compared to
the loose teas. Again, follow-up studies with NMR structure elucidation
would be necessary to confirm the chemical identities of theaflavin 3-O-(3-O-methyl) gallate and epicatechin-(4β→8)-epigallocatechin-3-O-gallate.
Targeted Metabolomics Analysis of Green Tea Samples
As a comparison with the untargeted metabolomics approach, targeted
quantitative analysis was conducted of a series of green tea components
for which commercial standards were available. These standards included
catechins [(+)-catechin, (−)-epicatechin, (−)-epicatechin
gallate, (−)-epigallocatechin, (−)-epigallocatechin
gallate, and (−)-gallocatechin], flavonols (kaempferol, myricetin,
quercetin, and rutin), phenolic acids (caffeic acid, chlorogenic acid,
coumaric acid, and gallic acid), an amino acid (theanine), and a purine
alkaloid (caffeine). The calibration curves for each standard were
linear over a range of 0.5–200 μg/mL, with a coefficient
of determination (R²) > 0.992 (Table S2, Supporting Information). The 16 standards were
detected in all green tea samples (Figure 3), and the concentration of each constituent
was determined (Table S3, Supporting Information). Based on the corresponding heat map of quantified standards (Figure 4), concentrations
of the main catechins and caffeine ranged between 7 and 107 μg/mg
extract in all the green tea samples analyzed. The negative control
(T23) yielded lower concentrations of most metabolites than the green
tea samples, and a number of green tea metabolites were not detected
(Figure 4). The NIST
green tea leaf standard (T26) yielded similar concentrations of (−)-epicatechin,
(−)-epicatechin gallate, (−)-gallocatechin, and gallic
acid compared to the published certificate of analysis (Table S4, Supporting Information), while NIST reported
higher concentrations of (−)-epigallocatechin and (−)-epigallocatechin
gallate. These differences are likely a result of interlaboratory
differences in extraction procedures.[45] Metabolite concentrations in all green tea samples were within the
same order of magnitude as those reported by Phenol-Explorer, a database
dedicated to the aggregation of phenolic content data from dietary
sources,[46] and published by others conducting
quantitative analysis of green tea constituents.[18,47]
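The calibration step described above can be sketched as an ordinary least-squares fit of peak area against standard concentration, followed by back-calculation of an unknown. The example below is a minimal illustration under stated assumptions: the function names and the peak areas are invented, and an unweighted linear fit is assumed because the text reports only linearity and R², not the regression model or weighting used.

```python
def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def concentration(peak_area, slope, intercept):
    """Back-calculate concentration (same units as the standards)."""
    return (peak_area - intercept) / slope

# Standards spanning the reported 0.5-200 ug/mL range; areas are invented.
standards_ug_per_ml = [0.5, 2.0, 10.0, 50.0, 200.0]
peak_areas = [560.0, 2030.0, 10100.0, 49900.0, 200080.0]

slope, intercept, r_squared = fit_line(standards_ug_per_ml, peak_areas)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.6f}")
print(f"unknown at area 25000: {concentration(25000.0, slope, intercept):.2f} ug/mL")
```

A curve would be accepted under the criterion in the text only if `r_squared` exceeded 0.992, and unknowns falling outside the calibrated range would need dilution or re-analysis.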
Figure 3
Annotated
mass spectral profile identifying green tea metabolites
used in this study. (A) Positive electrospray ionization mode. (B)
Negative electrospray ionization mode.
Figure 4
Quantification of green tea standards in tea samples. Boxes represent
average concentrations of triplicate samples in μg/mg extract. #Negative control (turmeric–ginger tea); ‡NIST standard reference materials; §green teas containing
other botanical additives. ext, extract.
With only the 16 quantified constituents as independent variables
in a 16 × 38 matrix, targeted metabolomics chemometric analysis
resulted in a less discriminating PCA scores plot (Figure 5) compared to that generated
with the untargeted metabolomics analysis (Figure 1). Using the targeted metabolite PCA plot,
it was possible to differentiate the green tea samples from the negative
control (T23). However, the NIST positive control (T26) and several
leaf tea samples (T33 and T34) were interspersed among the green tea
supplement samples. Thus, it was not possible to effectively discriminate
between the tea and supplement samples using only the targeted metabolite
data, whereas the untargeted approach (Figure 1) yielded clear delineations between the
varying types of green tea. Inclusion of more standards could have
potentially improved the targeted metabolite analysis;[47] however, green tea products routinely contain
more than 200 known bioactive phytochemical constituents and many
more undiscovered compounds.[14] Quantifying
every known phytochemical would still represent a fraction of the
>2000 individual marker ions used in the untargeted metabolomics
approach.
Thus, untargeted chemometrics provided an inherent advantage for classifying
samples, which was borne out in the observed differences between PCA
score plots (Figure 1 and Figure 5) and
their ability to distinguish between the various green tea sample
types.
Figure 5
Principal component analysis of targeted mass spectrometry data,
drawn with Hotelling’s 95% confidence ellipse. The chemometric
matrix consisted of 16 quantified constituents (targeted variables) and
38 objects (for quantification data, see Table S3, Supporting Information). Representative samples are highlighted
(T23, T26, T27, and T37).
Comparison of 1H NMR Spectroscopy and Mass Spectrometry Chemometric Analyses
The performance of untargeted mass spectrometry
and 1H NMR chemometric analysis of green tea extract samples
was further compared. Separating the 1H NMR region of
δ 0.5 to 8.0 ppm into bins of 0.05 ppm yielded 150 spectral
bins (independent variables) across all samples to describe the metabolite
profile (Figure S2, Supporting Information).

When compared to the PCA plot obtained from untargeted mass
spectrometry data (Figure 1), the NMR spectroscopic results (Figure 6) displayed similar trends in clustering
tea samples. In both score plots, separation was observed between
green tea samples and the negative control (T23), although in the 1H NMR PCA plot, T23 was visually closer to the boundary of
Hotelling’s confidence ellipse. However, the NMR metabolomics
data displayed more overlap between green tea supplement samples and
loose leaf tea and powdered tea samples (Figure 6) than was observed for the untargeted mass
spectrometry data (Figure 1). The overlap was due to the dispersal of variables along
PC1; the variables were not clustered as cleanly as they were in the
untargeted mass spectrometry analysis (Figure 1). This suggests that the spectral bins containing
overlapping information lowered discrimination between tea samples.
A higher field instrument equipped with a cryoprobe would offer improved
resolution; using such an instrument (as was done by Yuk et al.,
2013) could provide better discrimination between green teas and improve
the overall metabolomics analysis.[48]
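The binning arithmetic described at the start of this section (δ 0.5 to 8.0 ppm in 0.05 ppm steps, giving 150 bins) can be sketched as follows. This is a minimal illustration: the function name and the toy peak list are hypothetical, and steps such as normalization or exclusion of solvent regions, which are not described in this section, are deliberately left out.

```python
def bin_spectrum(shifts_ppm, intensities, lo=0.5, hi=8.0, width=0.05):
    """Sum intensities into fixed-width chemical-shift bins over [lo, hi)."""
    n_bins = round((hi - lo) / width)  # (8.0 - 0.5) / 0.05 gives 150 bins
    bins = [0.0] * n_bins
    for ppm, intensity in zip(shifts_ppm, intensities):
        if lo <= ppm < hi:
            # Guard against floating-point edge cases at the upper boundary.
            index = min(int((ppm - lo) / width), n_bins - 1)
            bins[index] += intensity
    return bins

# Toy data points (invented): two fall in the first bin, one in the last,
# and one lies outside the binned region and is ignored.
shifts = [0.52, 0.53, 7.98, 9.00]
signal = [1.0, 2.0, 4.0, 8.0]

binned = bin_spectrum(shifts, signal)
print(len(binned), binned[0], binned[-1])
```

Each bin then serves as one independent variable in the chemometric matrix, which is why overlapping resonances that fall into the same bin cannot be distinguished by the model.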
Figure 6
Principal component
analysis (PCA) scores plot of data from 1H NMR metabolomics
analysis of green tea extracts. Representative
samples are highlighted (T23, T26, T27, and T37).
Comparison of Similarity Using a Reproduced Correlation Matrix
One of the potential criticisms of using unsupervised statistical
methods, such as PCA, to evaluate similarity between samples is the
reduction of the model to only two dimensions (i.e., PC1 vs PC2),
which inherently limits the analysis. For the green tea untargeted
approach, PC1 and PC2 represented 60% of the total variation in the
sample (38% and 22%, respectively). This has been observed in other
metabolomics studies, where the principal components used for visual
discrimination represented only a fraction of the total variation
present in the samples.[26,31,40]

To address the limitations of using only two principal components
to describe the variability in the data set, a “reproduced
correlation matrix” (Figure 7) was calculated, which is based on four principal
components and is calculated according to eq . Collectively, the four components used to
generate the data in Figure 7 encapsulate 84% of the variation in the metabolomics data.
The reproduced correlation matrix is a simple and useful way to compare
differences among samples in a complex metabolomics data set. The
values in the matrix range from −1.0 to 1.0, and it is possible,
by selecting the relevant correlation value in the matrix, to obtain
a quantitative measure of the similarity between any two samples in
the data set (Table S5, Supporting Information). For example, the data set could be used to select a commercial
sample that is similar to the NIST loose leaf standard (T26). Samples
T02, T13, T21, and T22, which demonstrate correlations with T26 of
0.974, 0.983, 0.990, and 0.997, respectively, might be good choices.
Conversely, the “superantioxidant” botanical-containing
green tea T24, which shows a correlation of −0.029, is, based
on the metabolomics data, less similar to the NIST standard.
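The idea of a sample-by-sample correlation matrix built from a multi-component PCA model can be sketched as below. The equation used in the study is not reproduced in this section, so the formulation here is an assumption: each sample is represented by its score vector on the first k components, and the similarity of two samples is the cosine of the angle between those vectors (equivalent to comparing their rows in the k-component reconstruction of the centered data). Function names and the toy matrix are hypothetical.

```python
from math import sqrt

def pca_scores(X, k, iters=500):
    """Scores on the first k PCs of column-centered X (NIPALS with deflation)."""
    means = [sum(col) / len(col) for col in zip(*X)]
    E = [[x - m for x, m in zip(row, means)] for row in X]
    components = []
    for _ in range(k):
        t = list(max(zip(*E), key=lambda c: sum(v * v for v in c)))
        for _ in range(iters):
            p = [sum(ti * row[j] for ti, row in zip(t, E)) for j in range(len(means))]
            norm = sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            t = [sum(e * pj for e, pj in zip(row, p)) for row in E]
        E = [[e - ti * pj for e, pj in zip(row, p)] for ti, row in zip(t, E)]
        components.append(t)
    return [list(row) for row in zip(*components)]  # one score vector per sample

def reproduced_correlation(X, k):
    """Assumed formulation: cosine similarity of k-component score vectors.

    Fails for a sample whose score vector is exactly zero (at the centroid).
    """
    T = pca_scores(X, k)
    norms = [sqrt(sum(v * v for v in t)) for t in T]
    return [
        [sum(a * b for a, b in zip(ti, tj)) / (ni * nj) for tj, nj in zip(T, norms)]
        for ti, ni in zip(T, norms)
    ]

samples = ["leaf_A", "leaf_B", "suppl_A", "suppl_B", "control"]  # invented
X = [
    [90.0, 12.0, 3.0, 0.0],
    [88.0, 11.0, 4.0, 1.0],
    [60.0, 30.0, 25.0, 0.0],
    [62.0, 31.0, 24.0, 1.0],
    [2.0, 1.0, 0.0, 60.0],
]

R = reproduced_correlation(X, k=2)
for name, row in zip(samples, R):
    print(f"{name:8s}", " ".join(f"{value:+.3f}" for value in row))
```

Reading along the row of a reference sample then gives a ranked list of candidates, in the same way that the text picks commercial teas most similar to the NIST standard.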
Figure 7
Heat map correlation
matrix for green tea samples. Correlation
was based upon the averaged metabolomic profile for each sample and
calculated from the reproduced correlation coefficient matrix comprised
of a four principal component model (Table S5, Supporting Information). Darker shades represent stronger
correlation between samples. #Negative control; ‡NIST standard reference materials; §green teas with
botanical additives.
This study demonstrates the utility of untargeted mass spectrometry-based
metabolomics to effectively discriminate between multiple classes
of green tea products. Chemometric analysis using untargeted metabolomics
profiling was more effective in separating loose leaf green teas from
green tea supplements compared to targeted mass spectrometry analysis
or 1H NMR metabolomics.

Previous green tea studies highlight the benefits of using NMR
spectroscopy to study metabolomic differences, given that this method
can detect all 1H-containing species in a sample, including
phytochemical compounds that could be difficult to analyze via mass
spectrometry.[27,33,49] One widely accepted limitation of MS has been the nonuniversality
of natural product ionization; the utilization of both positive and
negative modes in this study resulted in a wider range of metabolites
detected and used in the overall metabolomic analysis. For the samples
evaluated, the UPLC-MS data were more useful for distinguishing various
sample types (i.e., supplement versus tea) than NMR-based metabolomics
analysis. The improved performance of mass spectrometry-based metabolomics
as compared to NMR-based metabolomics can be attributed to the ease
with which mass spectrometry can be coupled to separation methods
(liquid or gas chromatography), which provide another dimension of
separation in the data. In addition, mass spectrometry is a far more
sensitive technique than NMR spectroscopy, with limits of detection
in the pM to nM range.[24,31,36] In contrast, NMR analysis is limited to the more abundant metabolites,
which may or may not be the most relevant with respect to bioactivity.[50] Other studies have also demonstrated the improved
ability of mass spectrometry to differentiate complex supplement products
due to higher sensitivity.[51] Despite the
somewhat superior data obtained here with mass spectrometric analysis
as compared to NMR spectroscopy, results from the current work suggest
that either NMR or MS could be effective methods to aid in selection
of complex natural products or botanical products.

Analyzing similarity and variation across a range of commercial products
remains a challenge when the study material is a complex natural product
or botanical sample.[1] Green tea products,
like other complex botanical preparations, contain a wide variety
of bioactive secondary metabolites, which vary considerably depending
on the cultivar used, geography, processing, and formulation.[52] The results illustrate the usefulness of untargeted
metabolomics to obtain a snapshot of this variability. Information
obtained by metabolomics analysis could be employed to make an informed
opinion as to which products are most representative of those used
by consumers or to identify outliers in a data set. Comparison of
the data in Figure (PCA based on metabolite profile) and Figure (PCA based solely on representative marker
compounds) indicates the advantage of making such decisions using
metabolomics information rather than data for selected marker compounds.
The marker compounds represent only a subset of the chemical diversity
of the samples; thus, one might presume (incorrectly) based on the
marker compound data that samples are chemically similar, when in
fact important differences exist. Specific to the test case under
investigation here, based on the PCA of complex metabolomics data
(Figure ), it is clear
that the chemical makeup of green tea supplements is different from
that of powdered or whole leaf tea samples. If one conducted the comparison
among samples using exclusively marker compound data (Figure ), these differences might
have been overlooked. Such an oversight could have important ramifications
for future studies, given that differences in the chemistry of tea
versus supplement samples could lead to different results in in vitro
or clinical studies.

The need to ascertain similarity and variability has numerous applications
in natural products research, whether it is to monitor quality control
of products for adulteration,[53,54] authenticate botanical
samples,[55] or select samples for further
in vitro or in vivo studies. The chemometric approach described herein
(untargeted mass spectrometric analysis coupled with reproduced data
matrix calculation) has the potential to provide a wealth of data
for comparisons of multiple, complex data sets. One of the challenges
of using metabolite profiles to characterize similarities and differences
among samples is handling the magnitude and complexity of the data
that are generated with such analyses. The bottleneck for metabolomics
experiments tends not to be in the data collection, but in meaningful
data interpretation. An important contribution of this study is the
application of the reproduced correlation coefficient matrix as a
simple metric for measuring the similarity between multiple samples
in a complex data set based on the whole metabolite profile. The reproduced correlation coefficient is a single value that incorporates
multiple PCA model components and provides a useful alternative to
visual inspection for comparing samples in a PCA plot.
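As an illustration of how a reproduced correlation coefficient matrix can be obtained, the following Python sketch applies a fourth-root transform, column mean-centering, and a four-component PCA to a synthetic peak-area matrix. It then divides the scalar products of the reproduced sample vectors by their norms. The sketch assumes NumPy and computes the PCA by singular value decomposition. The function name and the synthetic data are hypothetical, and the code is not the Sirius and Excel workflow used in this study.

```python
import numpy as np


def reproduced_object_correlations(peak_areas, n_components=4):
    """Reproduced correlation coefficients between samples (rows).

    peak_areas: 2-D array, samples x variables, non-negative peak areas.
    Returns a samples x samples matrix with values between -1 and 1.
    """
    X = np.asarray(peak_areas, dtype=float)
    # Fourth-root transform to reduce heteroscedastic noise.
    Xt = X ** 0.25
    # Column mean-centering: subtract the mean of each variable.
    Xc = Xt - Xt.mean(axis=0)
    # PCA by singular value decomposition of the centered matrix.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = n_components
    scores = U[:, :k] * s[:k]    # sample scores on the first k components
    loadings = Vt[:k, :]         # variable loadings (rows are orthonormal)
    X_hat = scores @ loadings    # data reproduced by the k-component model
    # Scalar product of each pair of sample vectors divided by their norms.
    # A sample with a zero reproduced vector would give a division by zero.
    norms = np.linalg.norm(X_hat, axis=1)
    return (X_hat @ X_hat.T) / np.outer(norms, norms)


# Synthetic example: 8 samples x 50 marker ions (values are made up).
rng = np.random.default_rng(0)
X_demo = rng.gamma(shape=2.0, scale=1.0e6, size=(8, 50))
R = reproduced_object_correlations(X_demo, n_components=4)
print(np.round(R, 3))
```

Because the loading vectors are orthonormal, the same matrix can be computed from the score vectors alone, which is convenient when only the scores and loadings exported from a PCA package are available.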
Experimental Section
Chemicals
Unless otherwise noted, all chemicals were
of reagent or spectroscopic grade and obtained from Fisher Scientific
(Waltham, MA, USA).
Green Tea Product Selection
Green
tea products were
selected using readily available consumer sales reports[37] and product quality reports.[38,39] The 34 products included 21 whole-leaf teas, six powders, and seven
supplements (Table S1, Supporting Information) and were each coded with a T number. A single non-green tea (turmeric–ginger
tea) was included as a negative control (T23), and Camellia
sinensis standard reference materials from NIST for loose
leaf tea (T26), supplement (T27), and oral dosage form (T37) (nos.
3254, 3255, and 3256, respectively) served as positive controls (Table
S1, Supporting Information). Two green
teas that contained other botanical additives were also selected for
comparison (T24 and T38). A retention sample of each product, containing
several grams of material, was maintained in the lab for future reference.
Green Tea Product Extraction and Isolation
Green tea
products were extracted in triplicate via a hot water or methanol
extraction procedure. For the hot water procedure, 200 mg of sample
and 20 mL of water were added to a 20 mL scintillation vial and heated
to 90 °C. The mixture was stirred in a hot water bath for 5 min,
after which the suspension was immediately filtered and cooled to
room temperature. Each sample was lyophilized and stored at −80
°C until analysis. Sample extractions in methanol were performed
in the same 10:1 (mg sample per mL solvent) ratio as the hot water extracts. Thus, to each 200
mg tea sample was added 20 mL of reagent-grade methanol, and the mixtures
were shaken overnight at room temperature, filtered, and dried under
reduced pressure. NMR and MS analyses were conducted upon these samples
in triplicate.
1H NMR Analysis
NMR spectra
were acquired
with a JEOL ECA-400 NMR spectrometer (400 MHz, JEOL Ltd., Tokyo, Japan)
equipped with a high-sensitivity JEOL Royal probe and a 24-slot autosampler.
NMR chemical shift values were referenced to residual solvent signals
for CD3OD (δH 3.31 ppm). To collect 1H NMR data, each sample was resuspended in CD3OD
(Cambridge Isotope Laboratories, Andover, MA, USA) to a concentration
of 10 mg/mL.
Mass Spectrometry Analysis
Ultraperformance
(UP) LC-MS
data were acquired using a Q Exactive Plus quadrupole-orbitrap mass
spectrometer (Thermo Scientific, Waltham, MA, USA) with an electrospray
ionization source coupled to an Acquity UPLC system (Waters, Milford,
MA, USA). Before UPLC-MS analysis, each sample was resuspended in
methanol to yield a concentration of 1 mg/mL. Triplicate injections
of 3 μL were then performed. Samples were eluted from the column
(Acquity UPLC BEH C18 1.7 μm, 2.1 × 50 mm, Waters)
at a flow rate of 0.3 mL/min using the following binary gradient,
with solvent A consisting of H2O (0.1% formic acid added)
and solvent B consisting of CH3CN (0.1% formic acid added):
initial isocratic composition of 95:5 (A:B) for 1.0 min, increasing
linearly to 0:100 over 20 min, followed by an isocratic hold at 0:100
for 1 min, gradient returned to starting conditions of 95:5 for 2
min, and held isocratic again for 1 min. The mass spectrometer was
operated in the positive/negative switching ionization mode over a
full scan range of m/z 150–2000
with the following settings: capillary voltage, 5 V; capillary temperature,
300 °C; tube lens offset, 35 V; spray voltage, 3.80 kV; sheath
gas flow and auxiliary gas flow, 35 and 20 arbitrary units, respectively.
Metabolite Quantification
Quantification of the major
catechin, phenolic acid, and flavonoid components of the green tea
products used 15 calibration standards obtained from Chromadex (Irvine,
CA, USA) (Table S2, Supporting Information). LC-MS analysis was conducted as described above. Standards were
prepared in spectrometric-grade MeOH and diluted in a 2-fold dilution
series ranging from 0.1 to 200 μg/mL before injection. A calibration
curve was constructed by plotting the area of the selected ion chromatogram
for each standard versus nominal concentration. Concentrations of
each standard in the extracts were determined by 1/x² weighted least-squares linear regression.
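The weighting scheme can be made explicit with a short Python sketch, given here as an illustration rather than the procedure actually used. It fits area = slope × concentration + intercept by minimizing the sum of squared residuals weighted by 1/x², then back-calculates a concentration from a peak area. The function names, the 12-point 2-fold series, and the synthetic detector response are assumptions made for the example.

```python
import numpy as np


def fit_weighted_calibration(conc, area):
    """Weighted least-squares line through (conc, area) with weights 1/conc**2."""
    x = np.asarray(conc, dtype=float)
    y = np.asarray(area, dtype=float)
    w = 1.0 / x**2
    sw, swx, swy = w.sum(), (w * x).sum(), (w * y).sum()
    swxx, swxy = (w * x * x).sum(), (w * x * y).sum()
    # Closed-form solution of the weighted normal equations.
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx**2)
    intercept = (swy - slope * swx) / sw
    return slope, intercept


def back_calculate(area, slope, intercept):
    """Concentration corresponding to a measured peak area."""
    return (area - intercept) / slope


# Hypothetical 2-fold dilution series from 200 ug/mL down to about 0.1 ug/mL.
conc = 200.0 / 2.0 ** np.arange(12)
# Synthetic, noise-free response used only to exercise the functions.
area = 1.0e5 * conc + 2.0e3

slope, intercept = fit_weighted_calibration(conc, area)
unknown = back_calculate(5.02e5, slope, intercept)
print(slope, intercept, unknown)
```

Weighting by 1/x² keeps the high-concentration standards from dominating the fit, which matters when the calibration range spans more than three orders of magnitude.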
Chemometric Analysis
Chemometric analysis was conducted
using a slightly modified version of a previously reported method.[56] Both the untargeted and targeted UPLC-MS data
sets for each sample were individually analyzed, aligned, and filtered
with MZmine 2.20 software (http://mzmine.sourceforge.net/).[57] Peak detection in MZmine used the following parameters:
noise level (absolute value), 5 × 10⁵ counts; minimum peak duration, 0.05 min; tolerance for m/z variation, 0.05; and tolerance for m/z intensity variation, 20%. Deisotoping,
peak list filtering, and retention time alignment algorithm packages
were used to refine peak detection. Finally, the join algorithm integrated
all metabolomic profiles into a single data matrix using the following
parameters: the balance between m/z and retention time was set at 10.0 each, m/z tolerance was set at 0.001, and retention time tolerance
size was defined as 0.5 min. The spectral data matrix was exported
for analysis, both as a set of peak areas for individual ions detected
in triplicate extractions and the average peak areas for the triplicate
extractions. Throughout the MZmine data processing steps, samples
that did not possess a particular marker ion were coded with a peak
area of 0, to maintain the same number of variables for all data sets.
Chemometric analysis was performed on the data sets (both the individual
triplicate data and the average of the triplicates for each sample)
using Sirius version 9.0 (Pattern Recognition Systems AS, Bergen,
Norway).[58] Initially, transformation from
heteroscedastic to homoscedastic noise was carried out by a fourth
root transform of the spectral variables.[59] Principal component analysis was used to provide unsupervised statistical
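analysis of the green tea samples.

Correlation coefficients (r) were calculated from principal component models
and used to indicate similarity between samples. A data matrix X can be
decomposed into the sum of the mean values of the variables (X̅), the data
estimated from principal components representing the major variation in X
(X̂_PCA), and residual data (noise and other sources of small variation, E_PCA):

$$\mathbf{X} = \bar{\mathbf{X}} + \hat{\mathbf{X}}_{\mathrm{PCA}} + \mathbf{E}_{\mathrm{PCA}}$$

The product of the estimated PCA matrix and its transpose, divided by the norms
of the vectors involved, yields a reproduced correlation coefficient matrix that
can be used for comparison between variables (columns of X̂_PCA) or objects
(samples; rows of X̂_PCA):

$$\mathbf{R}_{\mathrm{variables}} = \frac{\hat{\mathbf{X}}_{\mathrm{PCA}}^{\mathrm{T}}\,\hat{\mathbf{X}}_{\mathrm{PCA}}}{\lVert \hat{\mathbf{X}}_{\mathrm{PCA}}^{\mathrm{T}} \rVert \, \lVert \hat{\mathbf{X}}_{\mathrm{PCA}} \rVert}$$

$$\mathbf{R}_{\mathrm{objects}} = \frac{\hat{\mathbf{X}}_{\mathrm{PCA}}\,\hat{\mathbf{X}}_{\mathrm{PCA}}^{\mathrm{T}}}{\lVert \hat{\mathbf{X}}_{\mathrm{PCA}} \rVert \, \lVert \hat{\mathbf{X}}_{\mathrm{PCA}}^{\mathrm{T}} \rVert}$$

Here the division is applied element by element, so that each scalar product is
divided by the norms of the two vectors that form it. Thus, the reproduced
correlation coefficient between each pair of variables or objects can be
determined from the scalar product of the corresponding vectors of X̂_PCA divided
by their norms. For any two objects k and l, with x̂_k and x̂_l denoting the
corresponding rows of X̂_PCA,

$$r_{kl} = \frac{\hat{\mathbf{x}}_{k}\,\hat{\mathbf{x}}_{l}^{\mathrm{T}}}{\lVert \hat{\mathbf{x}}_{k} \rVert \, \lVert \hat{\mathbf{x}}_{l} \rVert}$$

This expression provides
a correlation coefficient that describes the extent to which a given
sample (in the present case, a green tea extract) correlates with
any other sample in the data set after removing noise and other sources
of small variation from the data. Coefficient values closer to 1 demonstrate
a stronger correlation (i.e., greater similarity) between the two
samples. This calculation was performed in Excel (Microsoft, Redmond,
WA, USA), using the PCA loading and score information obtained from
the Sirius software output.

For NMR-based metabolomics, NMR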
spectra were processed using Mnova
(Mestrelab Research, Santiago de Compostela, Spain) with exponential
apodization (exponent 1); global phase correction; Bernstein polynomial
baseline correction; Savitzky–Golay line smoothing; and normalization
using total spectral area as provided in Mnova. Spectral regions from
0.5 to 8 ppm were included in the normalization and analysis. The
NMR spectra of all the green tea samples were binned at 0.05 ppm intervals.
The narrow bin size preserved spectral detail and retained
information on low-intensity peaks.[60] In
this data set, each bin was considered to be one variable. Binned
data were used to conduct PCA using the Sirius software.
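The binning step described above can be sketched in Python as follows. The sketch sums intensities into 0.05 ppm bins between 0.5 and 8 ppm, giving 150 variables per spectrum, and scales each binned spectrum to unit total area. It assumes NumPy, uses a synthetic two-peak spectrum, and normalizes after binning for simplicity, whereas the study normalized in Mnova. The function name is hypothetical.

```python
import numpy as np


def bin_and_normalize(ppm, intensity, lo=0.5, hi=8.0, width=0.05):
    """Sum intensities into fixed-width ppm bins and scale to unit total area.

    Points outside [lo, hi] are ignored. Returns (bin_edges, binned_values).
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = int(round((hi - lo) / width))   # 150 bins for the defaults
    edges = np.linspace(lo, hi, n_bins + 1)
    # Weighted histogram: each point contributes its intensity to its bin.
    binned, _ = np.histogram(ppm, bins=edges, weights=intensity)
    total = binned.sum()
    if total <= 0:
        raise ValueError("spectrum has no positive area in the selected region")
    return edges, binned / total


# Synthetic spectrum with two Lorentzian-shaped peaks (made-up positions).
ppm_axis = np.linspace(10.0, 0.0, 20001)
signal = (2.0 / (1.0 + ((ppm_axis - 6.93) / 0.01) ** 2)
          + 1.0 / (1.0 + ((ppm_axis - 2.02) / 0.01) ** 2))

edges, binned = bin_and_normalize(ppm_axis, signal)
print(len(binned), binned.sum())
```

Each row of a samples × bins matrix built this way corresponds to one spectrum and can be passed to PCA in the same manner as the UPLC-MS peak-area matrix.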