
Comparison of Metabolomics Approaches for Evaluating the Variability of Complex Botanical Preparations: Green Tea (Camellia sinensis) as a Case Study.

Joshua J Kellogg¹, Tyler N Graf¹, Mary F Paine², Jeannine S McCune³, Olav M Kvalheim⁴, Nicholas H Oberlies¹, Nadja B Cech¹.

Abstract

A challenge that must be addressed when conducting studies with complex natural products is how to evaluate their complexity and variability. Traditional methods of quantifying a single or a small range of metabolites may not capture the full chemical complexity of multiple samples. Different metabolomics approaches were evaluated to discern how they facilitated comparison of the chemical composition of commercial green tea [Camellia sinensis (L.) Kuntze] products, with the goal of capturing the variability of commercially used products and selecting representative products for in vitro or clinical evaluation. Three metabolomics-related methods, namely untargeted ultraperformance liquid chromatography-mass spectrometry (UPLC-MS), targeted UPLC-MS, and untargeted, quantitative 1H NMR, were employed to characterize 34 commercially available green tea samples. Of these methods, untargeted UPLC-MS was most effective at discriminating between green tea, green tea supplement, and non-green-tea products. A method using reproduced correlation coefficients calculated from principal component analysis models was developed to quantitatively compare differences among samples. The obtained results demonstrated the utility of metabolomics employing UPLC-MS data for evaluating similarities and differences between complex botanical products.

Substances: Tea

Year: 2017 PMID: 28453261 PMCID: PMC5469520 DOI: 10.1021/acs.jnatprod.6b01156

Source DB: PubMed Journal: J Nat Prod ISSN: 0163-3864 Impact factor: 4.050

It is common practice in many research fields to conduct in vitro or clinical evaluation of complex botanical products. The selection of appropriate study material for such investigations is confounded by the complexity and variability of botanical source material. Botanical products contain diverse phytochemicals, many of which are often unidentified. In addition, substantive variability in phytochemical composition exists in these products depending on the method of preparation or source material used, and industrial processing of botanical supplements frequently renders them unable to be analyzed using genetic techniques, such as DNA barcoding.[1,2] Such variability in phytochemical composition can greatly impact the interpretation of both in vitro and clinical studies. There is currently a lack of definitive guidelines for ensuring the quality of the product to be tested.[3] The United States Food and Drug Administration (FDA) guidance for clinical trials involving botanical drug products[4] recommends that investigational new drug applications contain “a chemical identification for the active constituents or characteristic markers in the drug substance, if possible”. However, specific guidelines for comparing available products and selecting appropriate representative samples for investigation are currently lacking. The goal of this study was to compare the effectiveness of several metabolomics approaches for evaluating the variability in the phytochemical composition of a series of commercial botanical products. Green tea [leaves from Camellia sinensis (L.) Kuntze (Theaceae)] was employed as a test case.
Green tea is one of the most commonly consumed beverages worldwide[5] and is also a popular dietary supplement, ranking fifth in sales in the United States in 2015.[6] Green tea products have been reported to possess numerous health-protective qualities, including cardioprotection, chemoprevention, and weight loss.[7−9] However, many green tea clinical samples are delivered as a complex mixture (tea or extract) as opposed to single-molecule interventions.[10−12] The phytochemical composition of green tea is similar to that of fresh Camellia sinensis leaves except for a few enzymatically catalyzed reactions that occur immediately after harvest.[5,13] Green tea contains over 200 previously identified constituents, including polyphenols, xanthines, theanine, inorganic salts, and individual elements.[14] Polyphenols constitute up to 30% of the dry leaf by mass and are the major constituents in green tea.[15] Catechins, specifically flavan-3-ols and flavan-3-gallates, represent the largest group of polyphenols in green tea leaves and are thought to be largely responsible for the diverse bioactivity demonstrated in green tea studies.[16] The extraction efficiency of green tea polyphenols depends on the extraction method, contact time with the solvent, solvent composition, and the form of tea (i.e., bagged or loose).[17,18] This variability is increased with the incomplete or inconsistent application of analytical methods, making determination of dose content challenging.[19] Meta-analysis studies of green tea products used in clinical studies reported polyphenol doses ranging from 200 to 1207 mg.[10−12]
Metabolomics-based approaches have emerged as important tools in assessing large chemical and biological data sets, including those related to disease pathology,[20] drug response,[21] environmental toxicity,[22] and natural products discovery.[23,24] The primary goal of metabolomics is to correlate changes in the chemical profile of a sample with a corresponding shift in macroscopic phenotype due to a perturbation.[25] Metabolomic studies coupled with statistical analysis (chemometric studies) have been employed to characterize the relationships between the metabolome of green teas and corresponding genotype, origin, quality, or other biotic or abiotic attributes.[26−28]
Several different analytical techniques are used for metabolomic profiling, including infrared and Raman spectroscopy, NMR spectroscopy, and mass spectrometry (MS).[29,30] NMR-based metabolomic techniques, when acquired under quantitative conditions (qNMR), offer an unbiased assessment of a complex sample composition, allow the simultaneous identification and quantification of diverse metabolites, and are nondestructive of the sample.[24] Mass spectrometry-based metabolomic methods have the advantages of orders of magnitude greater sensitivity than NMR spectroscopy and the ability to couple directly to separation techniques such as gas chromatography (GC) or liquid chromatography (LC).[31] A disadvantage of analysis via mass spectrometry is that ionization is required to detect sample components, yet not all chemical compounds are ionized in a mass spectrometer.[32] With these advantages and disadvantages in mind, this study was undertaken to compare the effectiveness of untargeted and targeted mass spectrometry and NMR spectroscopy as methods for chemically characterizing green tea products.
One of the critical questions in selecting a botanical product (in this case a sample of green tea) for further study is how it compares to other available products. Chemometric analysis of metabolomics data sets can be used to make these comparisons.
Ascribing similarity between metabolomic profiles is often achieved via multivariate statistical modeling procedures, such as principal component analysis (PCA).[26] PCA is a graphical representation of data that can be used to ascribe clusters of similar samples, but is not equipped to quantify variability and similarity between samples, and generally employs only two principal components at a time to classify the samples.[33] Hierarchical cluster analysis (HCA) can be used to cluster samples based upon similarity, but provides only information on similarity between adjacent samples and not for overarching comparisons between all samples in a data set.[34,35] For the work described herein, an alternate approach for comparison of samples was employed, that of a reproduced correlation coefficient matrix. The reproduced correlation coefficient matrix is based on PCA scores and loadings, but is derived from all principal components (i.e., not just a pair of components, as in traditional PCA plots). As demonstrated herein using the example of green tea, the correlation matrix displays a series of correlation coefficients that can be used to quantitatively compare multiple samples in a data set and determine which are most chemically similar. Such information can then be used to inform product selection for later in vitro or clinical evaluation.

Results and Discussion

Comparison of Extraction Techniques

An important first step in comparing the chemistry of complex botanical products is selecting the appropriate solvent extraction technique. Two extraction techniques were considered for this study: hot water extraction and methanol extraction. Hot water extraction replicates the traditional process of brewing tea leaves and should therefore yield results relevant to consumer use. However, methanol extraction was appealing due to its nonselective ability to extract a wide range of secondary metabolites and the ease in removing methanol solvent for extract storage and processing.[36] To aid in the decision between hot water and methanol extraction, triplicate extracts of a National Institute of Standards and Technology (NIST) green tea standard were prepared in both hot water and methanol, and their chemical composition was compared. Overall, the two different extraction techniques—hot water and methanol—yielded similar quantities of the major polyphenolic metabolites (Figure S1, Supporting Information). The metabolite profile as determined by mass spectrometric analysis appeared similar between the two techniques. The hot water extraction sample had a higher (−)-gallocatechin content relative to the methanol extraction sample, whereas the methanol extraction sample displayed higher levels of (−)-epicatechin gallate, (−)-epigallocatechin gallate, and gallic acid relative to the water extraction sample. Methanol was selected as the extraction solvent for subsequent metabolomics analysis due to the overall similarities of the extracted quantities and the ease of preparing and handling methanolic extracts.

Differentiation of Green Tea Samples by Untargeted Mass Spectrometry Metabolomics

Commercially available green tea products (n = 34) were selected using consumer sales reports[37] and product quality reports[38,39] (Table S1, Supporting Information). A turmeric–ginger tea served as a negative control (T23), and NIST reference standards (T26, T27, and T37) served as positive controls. For the sake of further comparison, two of the selected green teas contained additional botanical additives (T24 and T38) (Table S1, Supporting Information). Untargeted metabolomic analysis of the green tea samples using UPLC-MS yielded 2270 marker ions (unique retention time–m/z ion pairings) for 114 objects (i.e., 38 green tea samples prepared by extraction in triplicate), which were analyzed using PCA (Figure 1A). The extraction replicates (e.g., T01-1, T01-2, and T01-3) of each green tea product were overlaid on the PCA plot (Figure 1A), indicating excellent repeatability of the extraction technique and subsequent UPLC-MS analysis. PCA using untargeted metabolomics data flagged one injection, T28-1, which initially appeared to be an outlier but was found to have been mislabeled in the UPLC injection queue. The ability to identify and address this mislabeled injection highlights the benefit and importance of having replicate samples (each analyzed separately) for metabolomic analyses.

Figure 1

Principal component analysis (PCA) scores plot of green tea samples drawn with Hotelling’s 95% confidence ellipse. Data points representing triplicate green tea samples were closely clustered, and distinct clusters were observed between green tea supplements, green teas, and the negative control (turmeric–ginger tea, T23, indicated in the figure as “non-green tea”). Representative samples are highlighted (T23, T24, T26, T27, and T37) to demonstrate the reproducibility of the extraction and analytical protocol. SRM represents standard reference material from the National Institute of Standards and Technology (NIST); data points indicated as “suppl” are green tea supplements.

Inspection of the data indicated that the sample clusters were located at different points in the two-dimensional space prescribed by two vectors, principal component 1 (PC1 = 38%) and principal component 2 (PC2 = 22%) (Figure 1A). Three distinct clusters were observed in the data, corresponding to the three different sample types (green tea, green tea supplement, and the non-green tea) studied. The negative control (T23) was located beyond the boundary of the Hotelling’s 95% confidence ellipse. The loose leaf and powdered green teas were grouped together and were separate from the green tea supplements. The two green teas that contained additional botanical components (T24 and T38), although roughly grouped with the green tea samples, were visibly drawn away from the main cluster of tea samples in the PCA. The positive controls (NIST samples T27 and T37) also clustered with their commercial counterparts (loose leaf tea and green tea supplements, respectively) and were located centrally within each grouping. Thus, the NIST standards do, indeed, appear to be representative of green teas used commercially by U.S. consumers.
Using the unsupervised PCA analysis, it was not possible to visually differentiate between the green tea leaf and powdered samples, which suggests that the chemistry of such samples is similar. Smaller groupings apparent within the loose leaf and powdered tea clusters were noted (Figure 1). These clusterings could have represented variations in tea cultivar, product source, and/or processing methods.[26−28,31,33,40,41] However, commercial suppliers do not traditionally offer records of cultivars, geographic locations, or precise processing procedures of their tea products. Therefore, it was not possible to determine the underlying characteristics that produced these smaller clusters. This did not present a problem for the investigation being conducted here, as the goal was not to compare green tea composition from different geographical locations but to evaluate the variability in samples used by consumers and to determine which samples are most representative for further clinical and in vitro evaluation.
Averaging the peak area from each marker ion across the analyses of the triplicate extraction samples resulted in a data matrix (2270 marker ions and 38 objects) that yielded a similar clustering pattern to that observed for the individual replicates (Figure 1). The averaged data set was used for comparison against targeted chemometric analysis and NMR-based metabolomics.
The separation observed in the PCA can be explained through some of the identified metabolites highlighted in the loading plots for PC1 (Figure 2A) and PC2 (Figure 2B). Principal component loadings estimate the degree to which each independent variable contributes to the individual principal components of a PCA model; the greater the magnitude of a particular variable’s loading on a component, the more it contributes to that principal component.
Plotting each principal component’s loading demonstrates which variables (marker ions) are responsible for the clusterings and shifts observed in the PCA scores plot. The loadings plot for the first principal component (Figure 2A) highlighted the catechins (−)-epigallocatechin gallate, (−)-epigallocatechin, (−)-epicatechin gallate, (−)-epicatechin and a flavonol, rutin, as all contributing negatively to PC1. These ions were the dominant marker ions responsible for separating the green tea samples from the negative control (T23) (Figure 1A), as they were present in the green tea samples in higher concentrations than the negative control. The components detected were identified by comparing retention times, accurate masses, and MS-MS fragmentation patterns against standards. Other ions that were dominant in the green tea samples included two digallate dimers of epigallocatechin gallate, theasinensin A (m/z 913.1639) and (−)-epigallocatechin-3-O-(3-O-methyl) gallate (m/z 472.1005), which were identified tentatively by comparing their accurate masses to literature values.[42,43] It is recognized that definitive identification of these ions is not possible without isolation and confirmation with NMR spectroscopy, but such experiments were beyond the scope of the current study, for which the primary goal was to compare metabolomics data collection and analysis approaches.
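The replicate averaging and the scores/loadings decomposition described in this section can be illustrated with a minimal sketch. It assumes NumPy, mean-centering only (the scaling and transformation actually applied to the marker-ion data are not stated in this section), and a small random matrix standing in for the real 114 × 2270 data; the function names `average_replicates` and `pca` are hypothetical.

```python
import numpy as np

def average_replicates(X, n_rep=3):
    """Average consecutive blocks of n_rep rows (the replicates of one sample)."""
    n_obj, n_var = X.shape
    return X.reshape(n_obj // n_rep, n_rep, n_var).mean(axis=1)

def pca(X, n_components=2):
    """Mean-centered PCA via SVD.

    Returns scores (objects x components), loadings (variables x components),
    and the fraction of total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

# Hypothetical stand-in data: 4 samples x 3 replicates, 6 marker ions.
rng = np.random.default_rng(0)
X_rep = rng.random((12, 6))

X_avg = average_replicates(X_rep)        # 12 x 6 -> 4 x 6
scores, loadings, explained = pca(X_avg)
```

In this sketch, a marker ion with a large negative entry in the first column of `loadings` pulls samples rich in that ion toward negative PC1 scores, which is the reading applied to the catechins above.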

Figure 2

Loadings plots from untargeted MS-based PCA of green tea samples. Metabolites with more negative correlation values along the x-axis (PC1, green labels) were present in higher concentrations in the green tea samples versus the negative control (T23, turmeric–ginger tea) and were responsible for the separation observed along the horizontal axis of Figure 1. Labeled metabolites with greater positive correlation along the y-axis (PC2, brown labels) were more heavily represented in green tea supplement samples versus the loose leaf green tea samples and were the dominant metabolites underlying the differentiation of the two sample groups in the vertical axis of Figure 1. Metabolites were identified by comparison against analytical standards. In cases where standards were not present, comparisons against the literature using m/z values from high-resolution mass spectrometry are provided. Identifications based on mass without reference standards are tentative.

The second principal component discriminated between the green tea supplements and the leaf/powdered teas. The corresponding loading plot revealed several metabolites that were present in higher concentrations in the green tea supplements than the leaf and powdered teas and, thus, were dominant peaks in the positive direction. Myricetin, kaempferol, and quercetin aglycones, identified via comparison against accurate mass and fragmentation pattern of standards, were all present as leading discriminating ions (Figure 2B). In addition, based upon accurate mass measurements, the tentatively identified theaflavin 3-O-(3-O-methyl) gallate, which is formed via oxidative coupling of epicatechin and epigallocatechin 3-O-(3-O-methyl) gallate,[43] and the dimer epicatechin (4β→8)-epigallocatechin-3-O-gallate[44] were observed in the positive direction of the PC2 loading plot. The data suggest that these compounds are present at higher levels in green tea supplements compared to the loose teas. Again, follow-up studies with NMR structure elucidation would be necessary to confirm the chemical identity of theaflavin 3-O-(3-O-methyl) gallate and epicatechin-(4β→8)-epigallocatechin-3-O-gallate.

Targeted Metabolomics Analysis of Green Tea Samples

As a comparison with the untargeted metabolomics approach, targeted quantitative analysis was conducted of a series of green tea components for which commercial standards were available. These standards included catechins [(+)-catechin, (−)-epicatechin, (−)-epicatechin gallate, (−)-epigallocatechin, (−)-epigallocatechin gallate, and (−)-gallocatechin], flavonols (kaempferol, myricetin, quercetin, and rutin), phenolic acids (caffeic acid, chlorogenic acid, coumaric acid, and gallic acid), an amino acid (theanine), and a purine alkaloid (caffeine). The calibration curves for each standard were linear over a range of 0.5–200 μg/mL, with a coefficient of determination (R²) > 0.992 (Table S2, Supporting Information). The 16 standards were detected in all green tea samples (Figure 3), and the concentration of each constituent was determined (Table S3, Supporting Information). Based on the corresponding heat map of quantified standards (Figure 4), concentrations of the main catechins and caffeine ranged between 7 and 107 μg/mL extract in all the green tea samples analyzed. The negative control (T23) yielded lower concentrations of most metabolites than the green tea samples, and a number of green tea metabolites were not detected (Figure 4). The NIST green tea leaf standard (T26) yielded similar concentrations of (−)-epicatechin, (−)-epicatechin gallate, (−)-gallocatechin, and gallic acid compared to the published certificate of analysis (Table S4, Supporting Information), while NIST reported higher concentrations of (−)-epigallocatechin and (−)-epigallocatechin gallate. These differences are likely a result of interlaboratory differences in extraction procedures.[45] Metabolite concentrations in all green tea samples were within the same order of magnitude as those reported by Phenol-Explorer, a database dedicated to the aggregation of phenolic content data from dietary sources,[46] and those published by others conducting quantitative analysis of green tea constituents.[18,47]
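The external-calibration step behind such targeted quantification can be sketched as an ordinary least-squares line with its coefficient of determination. This is a generic illustration, not the authors' processing software; the calibration levels and peak areas below are hypothetical values chosen only to span the reported 0.5–200 μg/mL range, and the function names are illustrative.

```python
def fit_calibration(conc, area):
    """Ordinary least-squares fit of peak area against concentration.

    Returns (slope, intercept, r_squared).
    """
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(area) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, area))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(conc, area))
    ss_tot = sum((y - mean_y) ** 2 for y in area)
    return slope, intercept, 1.0 - ss_res / ss_tot

def quantify(area, slope, intercept):
    """Invert the calibration line to estimate concentration from a peak area."""
    return (area - intercept) / slope

# Hypothetical calibration levels (ug/mL) and an idealized linear detector response.
levels = [0.5, 5.0, 50.0, 100.0, 200.0]
areas = [2.0 * c + 1.0 for c in levels]

slope, intercept, r2 = fit_calibration(levels, areas)
estimate = quantify(21.0, slope, intercept)  # concentration for a peak area of 21.0
```

A standard would pass the linearity criterion quoted above when `r2` exceeds 0.992; areas falling outside the calibrated range should not be extrapolated with this line.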

Figure 3

Annotated mass spectral profile identifying green tea metabolites used in this study. (A) Positive electrospray ionization mode. (B) Negative electrospray ionization mode.

Figure 4

Quantification of green tea standards in tea samples. Boxes represent average concentrations of triplicate samples in μg/mg extract. #Negative control (turmeric–ginger tea); ‡NIST standard reference materials; §green teas containing other botanical additives. ext, extract.

With only the 16 quantified constituents as independent variables in a 16 × 38 matrix, targeted metabolomics chemometric analysis resulted in a less discriminating PCA scores plot (Figure 5) compared to that generated with the untargeted metabolomics analysis (Figure 1). Using the targeted metabolite PCA plot, it was possible to differentiate the green tea samples from the negative control (T23). However, the NIST positive control (T26) and several leaf tea samples (T33 and T34) were interspersed among the green tea supplement samples. Thus, it was not possible to effectively discriminate between the tea and supplement samples using only the targeted metabolite data, whereas the untargeted approach (Figure 1) yielded clear delineations between the varying types of green tea. Inclusion of more standards could have potentially improved the targeted metabolite analysis;[47] however, green tea products routinely contain more than 200 known bioactive phytochemical constituents and many more undiscovered compounds.[14] Even quantifying every known phytochemical would capture only a fraction of the >2000 individual marker ions used in the untargeted metabolomics approach. Thus, untargeted chemometrics provided an inherent advantage for classifying samples, which was borne out in the observed differences between PCA score plots (Figure 1 and Figure 5) and their ability to distinguish between the various green tea sample types.

Figure 5

Principal component analysis of targeted mass spectrometry data, drawn with Hotelling’s 95% confidence ellipse. The chemometric matrix consisted of 15 quantified samples (targeted variables) and 38 objects (for quantification data, see Table S3, Supporting Information). Representative samples are highlighted (T23, T26, T27, and T37).

Comparison of 1H NMR Spectroscopy and Mass Spectrometry Chemometric Analyses

The performance of untargeted mass spectrometry and 1H NMR chemometric analysis of green tea extract samples was further compared. Separating the 1H NMR region of δ 0.5 to 8.0 ppm into bins of 0.05 ppm yielded 150 spectral bins (independent variables) across all samples to describe the metabolite profile (Figure S2, Supporting Information). When compared to the PCA plot obtained from untargeted mass spectrometry data (Figure 1), the NMR spectroscopic results (Figure 6) displayed similar trends in clustering tea samples. In both score plots, separation was observed between green tea samples and the negative control (T23), although in the 1H NMR PCA plot, T23 was visually closer to the boundary of Hotelling’s confidence ellipse. However, the NMR metabolomics data displayed more overlap between green tea supplement samples and loose leaf tea and powdered tea samples (Figure 6) than was observed for the untargeted mass spectrometry data (Figure 1). The overlap was due to the dispersal of variables along PC1; the variables were not clustered as cleanly as they were in the untargeted mass spectrometry analysis (Figure 1). This suggests that the spectral bins containing overlapping information lowered discrimination between tea samples. A higher field instrument equipped with a cryoprobe would offer improved resolution; using such an instrument (as was achieved by Yuk et al., 2013) could provide better discrimination between green teas and improve the overall metabolomics analysis.[48]
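The binning arithmetic is straightforward: (8.0 − 0.5) / 0.05 = 150 bins. A minimal sketch of equal-width binning follows, assuming each bin holds the summed intensity of the points falling inside it; the integration and normalization actually used are not specified in this section, and `bin_spectrum` is a hypothetical helper name.

```python
def bin_spectrum(ppm, intensity, lo=0.5, hi=8.0, width=0.05):
    """Sum intensities into equal-width chemical-shift bins over [lo, hi).

    Points outside the window are ignored.
    """
    n_bins = round((hi - lo) / width)  # (8.0 - 0.5) / 0.05 = 150
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if lo <= shift < hi:
            # Clamp guards against floating-point overshoot at the upper edge.
            idx = min(int((shift - lo) / width), n_bins - 1)
            bins[idx] += value
    return bins

# Hypothetical four-point spectrum; the 9.0 ppm point lies outside the window.
binned = bin_spectrum([0.52, 0.53, 7.99, 9.0], [1.0, 2.0, 4.0, 8.0])
```

Each sample then contributes one row of 150 values to the matrix submitted to PCA, so signals that share a 0.05 ppm bin cannot be distinguished from one another.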

Figure 6

Principal component analysis (PCA) scores plot of data from 1H NMR metabolomics analysis of green tea extracts. Representative samples are highlighted (T23, T26, T27, and T37).

Comparison of Similarity Using a Reproduced Correlation Matrix

One of the potential criticisms of using unsupervised statistical methods, such as PCA, to evaluate similarity between samples is the reduction of the model to only two dimensions (i.e., PC1 vs PC2), which inherently limits the analysis. For the green tea untargeted approach, PC1 and PC2 represented 60% of the total variation in the sample (38% and 22%, respectively). This has been observed in other metabolomics studies, where the principal components used for visual discrimination represented only a fraction of the total variation present in the samples.[26,31,40] To address the limitations of using only two principal components to describe the variability in the data set, a “reproduced correlation matrix” (Figure 7) was calculated, which is based on four principal components and is calculated according to eq . Collectively, the four components used to generate the data in Figure 7 encapsulate 84% of the variation in the metabolomics data. The reproduced correlation matrix is a simple and useful way to compare differences among samples in a complex metabolomics data set. The values in the matrix range from −1.0 to 1.0, and it is possible, by selecting the relevant correlation value in the matrix, to obtain a quantitative measure of the similarity between any two samples in the data set (Table S5, Supporting Information). For example, the data set could be used to select a commercial sample that is similar to the NIST loose leaf standard (T26). Samples T02, T13, T21, and T22, which demonstrate correlations with T26 of 0.974, 0.983, 0.990, and 0.997, respectively, might be good choices. Conversely, the “superantioxidant” botanical-containing green tea T24, which shows a correlation of −0.029, is, based on the metabolomics data, less similar to the NIST standard.
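The equation used by the authors is not reproduced in this section, so the following is only one plausible reading of a reproduced correlation matrix: the mean-centered data are rebuilt from the first four PCA components (scores times transposed loadings), and Pearson correlations are then computed between the rebuilt sample profiles. NumPy, the centering choice, the random stand-in data, the sample labels, and the helper names `reproduced_correlation` and `most_similar` are all assumptions made for illustration.

```python
import numpy as np

def reproduced_correlation(X, n_components=4):
    """Sample-by-sample correlations of the data rebuilt from a truncated PCA.

    Assumed form: X_hat = T_A @ P_A.T on mean-centered data, followed by
    Pearson correlation between the rows (samples) of X_hat.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # scores
    P = Vt[:n_components].T                     # loadings
    X_hat = T @ P.T
    return np.corrcoef(X_hat)

def most_similar(R, labels, reference, top=3):
    """Rank the other samples by their correlation with the reference sample."""
    ref = labels.index(reference)
    order = np.argsort(R[ref])[::-1]
    return [(labels[i], float(R[ref, i])) for i in order if i != ref][:top]

# Hypothetical stand-in data: 8 samples, 20 marker ions.
rng = np.random.default_rng(1)
X = rng.random((8, 20))
labels = [f"S{i}" for i in range(1, 9)]

R = reproduced_correlation(X)
hits = most_similar(R, labels, "S1")
```

Used this way, a single row of the matrix ranks every product against a chosen reference, which is the kind of lookup described above for the NIST loose leaf standard.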

Figure 7

Heat map correlation matrix for green tea samples. Correlation was based upon the averaged metabolomic profile for each sample and calculated from the reproduced correlation coefficient matrix derived from a four principal component model (Table S5, Supporting Information). Darker shades represent stronger correlation between samples. #Negative control; ‡NIST standard reference materials; §green teas with botanical additives.

This study demonstrates the utility of untargeted mass spectrometry-based metabolomics to effectively discriminate between multiple classes of green tea products. Chemometric analysis using untargeted metabolomics profiling was more effective in separating loose leaf green teas from green tea supplements compared to targeted mass spectrometry analysis or 1H NMR metabolomics. Previous green tea studies highlight the benefits of using NMR spectroscopy to study metabolomic differences, given that this method can detect all 1H-containing species in a sample, including phytochemical compounds that could be difficult to analyze via mass spectrometry.[27,33,49] One widely accepted limitation of MS has been the nonuniversality of natural product ionization; the utilization of both positive and negative modes in this study resulted in a wider range of metabolites detected and used in the overall metabolomic analysis. For the samples evaluated, the UPLC-MS data were more useful for distinguishing various sample types (i.e., supplement versus tea) than NMR-based metabolomics analysis. The improved performance of mass spectrometry-based metabolomics as compared to NMR-based metabolomics can be attributed to the ease with which mass spectrometry can be coupled to separation methods (liquid or gas chromatography), which provide another dimension of separation in the data.
In addition, mass spectrometry is a far more sensitive technique than NMR spectroscopy, with limits of detection in the pM to nM range.[24,31,36] In contrast, NMR analysis is limited to the more abundant metabolites, which may or may not be the most relevant with respect to bioactivity.[50] Other studies have also demonstrated the improved ability of mass spectrometry to differentiate complex supplement products due to higher sensitivity.[51] Despite the somewhat superior data obtained here with mass spectrometric analysis as compared to NMR spectroscopy, results from the current work suggest that either NMR or MS could be effective methods to aid in selection of complex natural products or botanical products. Analyzing similarity and variation from a range of commercial products remains a challenge when the study material is a complex natural product or botanical sample.[1] Green tea products, like other complex botanical preparations, contain a wide variety of bioactive secondary metabolites, which vary considerably depending on the cultivar used, geography, processing, and formulation.[52] The results illustrate the usefulness of untargeted metabolomics to obtain a snapshot of this variability. Information obtained by metabolomics analysis could be employed to make an informed decision as to which products are most representative of those used by consumers or to identify outliers in a data set. Comparison of the data in Figure 1 (PCA based on metabolite profile) and Figure 5 (PCA based solely on representative marker compounds) indicates the advantage of making such decisions using metabolomics information rather than data for selected marker compounds. The marker compounds represent only a subset of the chemical diversity of the samples; thus, one might presume (incorrectly) based on the marker compound data that samples are chemically similar, when in fact important differences exist.
Specific to the test case under investigation here, based on the PCA of complex metabolomics data (Figure 1), it is clear that the chemical makeup of green tea supplements is different from that of powdered or whole leaf tea samples. If one conducted the comparison among samples using exclusively marker compound data (Figure 5), these differences might have been overlooked. Such an oversight could have important ramifications for future studies, given that differences in the chemistry of tea versus supplement samples could lead to different results in in vitro or clinical studies. The need to ascertain similarity and variability has numerous applications in natural products research, whether it is to monitor quality control of products for adulteration,[53,54] authenticate botanical samples,[55] or select samples for further in vitro or in vivo studies. The chemometric approach described herein (untargeted mass spectrometric analysis coupled with reproduced data matrix calculation) has the potential to provide a wealth of data for comparisons of multiple, complex data sets. One of the challenges of using metabolite profiles to characterize similarities and differences among samples is handling the magnitude and complexity of the data that are generated with such analyses. The bottleneck for metabolomics experiments tends not to be in the data collection, but in meaningful data interpretation. An important contribution of this study is the application of the reproduced correlation coefficient matrix as a simple metric for measuring the similarity between multiple samples in a complex data set based on the whole metabolite profile. The reproduced correlation coefficient is a single value that incorporates multiple PCA model components and provides a useful alternative to visual inspection for comparing samples in a PCA plot.

Experimental Section

Chemicals

Unless otherwise noted, all chemicals were of reagent or spectroscopic grade and obtained from Fisher Scientific (Waltham, MA, USA).

Green Tea Product Selection

Green tea products were selected using readily available consumer sales reports[37] and product quality reports.[38,39] The 34 products included 21 whole-leaf teas, six powders, and seven supplements (Table S1, Supporting Information) and were each coded with a T number. A single non-green tea (turmeric–ginger tea) was included as a negative control (T23), and Camellia sinensis standard reference materials from NIST for loose leaf tea (T26), supplement (T27), and oral dosage form (T37) (nos. 3254, 3255, and 3256, respectively) served as positive controls (Table S1, Supporting Information). Two green teas that contained other botanical additives were also selected for comparison (T24 and T38). A retention sample of each product, containing several grams of material, was maintained in the lab for future reference.

Green Tea Product Extraction and Isolation

Green tea products were extracted in triplicate via a hot water or methanol extraction procedure. For the hot water procedure, 200 mg of sample and 20 mL of water were added to a 20 mL scintillation vial and heated to 90 °C. The mixture was stirred in a hot water bath for 5 min, after which the suspension was immediately filtered and cooled to room temperature. Each sample was lyophilized and stored at −80 °C until analysis. Sample extractions in methanol were performed at the same ratio as the hot water extracts (10 mg of sample per 1 mL of solvent). Thus, to each 200 mg tea sample was added 20 mL of reagent-grade methanol, and the mixtures were shaken overnight at room temperature, filtered, and dried under reduced pressure. NMR and MS analyses were conducted upon these samples in triplicate.

1H NMR Analysis

NMR spectra were acquired with a JEOL ECA-400 NMR spectrometer (400 MHz, JEOL Ltd., Tokyo, Japan) equipped with a high-sensitivity JEOL Royal probe and a 24-slot autosampler. NMR chemical shift values were referenced to residual solvent signals for CD3OD (δH 3.31 ppm). To collect 1H NMR data, each sample was resuspended in CD3OD (Cambridge Isotope Laboratories, Andover, MA, USA) to a concentration of 10 mg/mL.

Mass Spectrometry Analysis

Ultraperformance (UP) LC-MS data were acquired using a Q Exactive Plus quadrupole-orbitrap mass spectrometer (Thermo Scientific, Waltham, MA, USA) with an electrospray ionization source coupled to an Acquity UPLC system (Waters, Milford, MA, USA). Before UPLC-MS analysis, each sample was resuspended in methanol to yield a concentration of 1 mg/mL. Triplicate injections of 3 μL were then performed. Samples were eluted from the column (Acquity UPLC BEH C18 1.7 μm, 2.1 × 50 mm, Waters) at a flow rate of 0.3 mL/min using the following binary gradient, with solvent A consisting of H2O (0.1% formic acid added) and solvent B consisting of CH3CN (0.1% formic acid added): initial isocratic composition of 95:5 (A:B) for 1.0 min, increasing linearly to 0:100 over 20 min, followed by an isocratic hold at 0:100 for 1 min, gradient returned to starting conditions of 95:5 for 2 min, and held isocratic again for 1 min. The mass spectrometer was operated in the positive/negative switching ionization mode over a full scan range of m/z 150–2000 with the following settings: capillary voltage, 5 V; capillary temperature, 300 °C; tube lens offset, 35 V; spray voltage, 3.80 kV; sheath gas flow and auxiliary gas flow, 35 and 20 arbitrary units, respectively.

Metabolite Quantification

Quantification of the major catechin, phenolic acid, and flavonoid components of the green tea products used 15 calibration standards obtained from Chromadex (Irvine, CA, USA) (Table S2, Supporting Information). LC-MS analysis was conducted as described above. Standards were prepared in spectrometric-grade MeOH and diluted in a 2-fold dilution series ranging from 0.1 to 200 μg/mL before injection. A calibration curve was constructed for each standard by plotting the area of its selected ion chromatogram versus nominal concentration. The concentration of each of these compounds in the extracts was determined from its calibration curve, fitted by 1/x² weighted least-squares linear regression.
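The calibration step can be illustrated with a minimal sketch. It assumes NumPy, and every number in it is a synthetic, noise-free placeholder rather than data from this study. It fits a straight line to peak area versus nominal concentration with 1/x² weights, which give the low-concentration standards more influence than an unweighted fit would, and then back-calculates a concentration from a peak area. The function names `weighted_linear_fit` and `quantify` are hypothetical.

```python
import numpy as np

def weighted_linear_fit(x, y, w):
    """Fit y = slope * x + intercept by minimizing sum(w * residual**2)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    x_mean = np.sum(w * x) / np.sum(w)      # weighted mean of x
    y_mean = np.sum(w * y) / np.sum(w)      # weighted mean of y
    slope = np.sum(w * (x - x_mean) * (y - y_mean)) / np.sum(w * (x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical 2-fold dilution series: 200 ug/mL down to roughly 0.1 ug/mL.
conc = 200.0 / 2.0 ** np.arange(12)

# Synthetic peak areas (assumed linear response, no noise), for illustration only.
true_slope, true_intercept = 1.5e5, 2.0e3
area = true_slope * conc + true_intercept

# 1/x^2 weighting of the squared residuals.
slope, intercept = weighted_linear_fit(conc, area, 1.0 / conc ** 2)

def quantify(peak_area):
    """Back-calculate a concentration (ug/mL) from a peak area."""
    return (peak_area - intercept) / slope

# A made-up extract peak area corresponding to 12.5 ug/mL on this synthetic curve.
sample_conc = quantify(true_slope * 12.5 + true_intercept)
print(f"slope={slope:.4g}, intercept={intercept:.4g}, sample={sample_conc:.3f} ug/mL")
```

With real data the residuals are nonzero, and the weighted and unweighted fits would generally give different slopes and intercepts.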

Chemometric Analysis

Chemometric analysis was conducted using a slightly modified version of a previously reported method.[56] Both the untargeted and targeted UPLC-MS data sets for each sample were individually analyzed, aligned, and filtered with MZmine 2.20 software (http://mzmine.sourceforge.net/).[57] Peak detection in MZmine used the following parameters: noise level (absolute value), 5 × 10⁵ counts; minimum peak duration, 0.05 min; tolerance for m/z variation, 0.05; and tolerance for m/z intensity variation, 20%. Deisotoping, peak list filtering, and retention time alignment algorithm packages were used to refine peak detection. Finally, the join algorithm integrated all metabolomic profiles into a single data matrix using the following parameters: the balance between m/z and retention time was set at 10.0 each, m/z tolerance was set at 0.001, and retention time tolerance size was defined as 0.5 min. The spectral data matrix was exported for analysis, both as a set of peak areas for individual ions detected in triplicate extractions and as the average peak areas for the triplicate extractions. Throughout the MZmine data processing steps, samples that did not possess a particular marker ion were coded with a peak area of 0, to maintain the same number of variables for all data sets.

Chemometric analysis was performed on the data sets (both the individual triplicate data and the average of the triplicates for each sample) using Sirius version 9.0 (Pattern Recognition Systems AS, Bergen, Norway).[58] Initially, transformation from heteroscedastic to homoscedastic noise was carried out by a fourth root transform of the spectral variables.[59] Principal component analysis was used to provide unsupervised statistical analysis of the green tea samples. Correlation coefficients (r) were calculated from principal component models and used to indicate similarity between samples.
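The sequence of fourth-root transform, PCA, and sample-to-sample correlation from the PCA model can be sketched in a few lines. This is an illustration under stated assumptions, not the Sirius and Excel workflow used in the study: it uses NumPy's singular value decomposition as a stand-in for the PCA software, the peak-area matrix is invented, and the function name `reproduced_correlation` and the choice of two components are hypothetical.

```python
import numpy as np

def reproduced_correlation(peak_areas, n_components):
    """Return PCA scores and the sample-by-sample reproduced correlation matrix.

    peak_areas: 2-D array, rows = samples (objects), columns = ions (variables).
    """
    X = np.asarray(peak_areas, dtype=float) ** 0.25   # fourth-root transform
    Xc = X - X.mean(axis=0)                           # subtract variable means
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # PCA scores
    loadings = Vt[:n_components]                      # PCA loadings
    X_hat = scores @ loadings                         # data reproduced by the model
    norms = np.linalg.norm(X_hat, axis=1)             # assumes no all-zero row
    R = (X_hat @ X_hat.T) / np.outer(norms, norms)    # scalar products / norms
    return scores, R

# Invented peak areas: two "tea-like" rows and two "supplement-like" rows.
# Zeros mark ions not detected in a sample, as in the exported data matrix.
peak_areas = np.array([
    [1.0e6, 8.0e5, 0.0,   2.0e5, 0.0,   5.0e4],
    [1.1e6, 7.5e5, 0.0,   2.2e5, 0.0,   6.0e4],
    [2.0e5, 0.0,   9.0e5, 0.0,   7.0e5, 1.0e4],
    [2.5e5, 0.0,   8.5e5, 0.0,   6.5e5, 2.0e4],
])

scores, R = reproduced_correlation(peak_areas, n_components=2)
print(np.round(R, 3))
```

In this toy matrix the two tea-like rows correlate positively with each other and negatively with the supplement-like rows. Because only the retained components enter the calculation, variation left in the residuals does not affect the coefficients.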
A data matrix $X$ can be decomposed into the sum of the mean values of the variables ($\bar{X}$), the data estimated from principal components representing the major variation in $X$ ($\hat{X}_{PCA}$), and residual data (noise and other sources of small variation, $E_{PCA}$) (eq 1):

$$X = \bar{X} + \hat{X}_{PCA} + E_{PCA} \qquad (1)$$

The product of the estimated PCA matrix and its transpose ($\hat{X}_{PCA}^{T}$) divided by the norm of the two matrices yields a reproduced correlation coefficient matrix that can be used for comparison between variables (eq 2) or objects (samples) (eq 3):

$$\hat{R}_{variables} = \frac{\hat{X}_{PCA}^{T}\,\hat{X}_{PCA}}{\lVert \hat{X}_{PCA}^{T} \rVert \, \lVert \hat{X}_{PCA} \rVert} \qquad (2)$$

$$\hat{R}_{objects} = \frac{\hat{X}_{PCA}\,\hat{X}_{PCA}^{T}}{\lVert \hat{X}_{PCA} \rVert \, \lVert \hat{X}_{PCA}^{T} \rVert} \qquad (3)$$

Thus, the reproduced correlation coefficient between each pair of variables (eq 2) or objects (eq 3) can be determined from the scalar product divided by the norm of the vectors ($\hat{X}_{PCA}$) (eq 4). Thus, for any two objects, $i$ and $j$,

$$\hat{r}_{ij} = \frac{\hat{x}_{i}^{T}\,\hat{x}_{j}}{\lVert \hat{x}_{i} \rVert \, \lVert \hat{x}_{j} \rVert} \qquad (4)$$

where $\hat{x}_{i}$ and $\hat{x}_{j}$ are the rows of $\hat{X}_{PCA}$ corresponding to the two objects. Equation 4 provides a correlation coefficient that describes the extent to which a given sample (in the present case, a green tea extract) correlates with any other sample in the data set after removing noise and other sources of small variation from the data. Coefficient values closer to 1 demonstrate a stronger correlation (i.e., greater similarity) between the two samples. This calculation was performed in Excel (Microsoft, Redmond, WA, USA), using the PCA loading and score information obtained from the Sirius software output.

For NMR-based metabolomics, NMR spectra were processed using Mnova (Mestrelab Research, Santiago de Compostela, Spain) with exponential apodization (exponent 1); global phase correction; Bernstein polynomial baseline correction; Savitzky–Golay line smoothing; and normalization using total spectral area as provided in Mnova. Spectral regions from 0.5 to 8 ppm were included in the normalization and analysis. The NMR spectra of all the green tea samples were binned by 0.05 ppm. The narrow bin size allowed details to be revealed and provided much information on low-intensity peaks.[60] In this data set, each bin was considered to be one variable. Binned data were used to conduct PCA using the Sirius software.
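The NMR binning step can also be sketched briefly. This is an illustration only: it assumes NumPy rather than Mnova, the spectrum is synthetic, the function name `bin_spectrum` is hypothetical, and total-area normalization is approximated here by dividing the binned intensities by their sum over the 0.5 to 8 ppm window. Each of the 150 resulting bins (0.05 ppm wide) would become one variable in the PCA.

```python
import numpy as np

def bin_spectrum(ppm, intensity, lo=0.5, hi=8.0, width=0.05):
    """Sum intensities into fixed-width chemical shift bins and normalize to unit total."""
    n_bins = int(round((hi - lo) / width))            # 150 bins for the defaults
    binned, edges = np.histogram(
        ppm, bins=n_bins, range=(lo, hi), weights=intensity
    )                                                 # points outside lo..hi are ignored
    return edges, binned / binned.sum()

# Synthetic spectrum: two Gaussian peaks on a 0 to 10 ppm axis (placeholder data).
ppm = np.linspace(0.0, 10.0, 10001)
intensity = (
    2.0 * np.exp(-0.5 * ((ppm - 6.93) / 0.01) ** 2)
    + 1.0 * np.exp(-0.5 * ((ppm - 2.02) / 0.01) ** 2)
)

edges, binned = bin_spectrum(ppm, intensity)
top_bin = int(np.argmax(binned))
print(len(binned), f"largest bin: {edges[top_bin]:.2f} to {edges[top_bin + 1]:.2f} ppm")
```

Stacking the binned vectors of all samples row by row gives a samples-by-bins matrix that can be analyzed by PCA in the same way as the UPLC-MS peak-area matrix.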
