Identifying the geographical origin of Panax ginseng is a matter of scientific and economic importance in Korea. We describe a metabolomics approach for discriminating and predicting the origins of ginseng roots grown in Korea. Fresh ginseng roots from six ginseng cooperative associations (Gangwon, Gaeseong, Punggi, Chungbuk, Jeonbuk, and Anseong) were analyzed by a UPLC-MS-based approach combined with orthogonal projections to latent structure-discriminant analysis (OPLS-DA). The ginsengs from Gangwon and Gaeseong were easily differentiated. We then analyzed the remaining samples as a subgroup: Punggi, Chungbuk, Jeonbuk, and Anseong ginseng could be easily differentiated by the first two orthogonal components. To validate the discrimination model, we performed blind prediction tests of sample origins using an external test set. Our model predicted their geographical origins with 99.7% probability. The robust discriminatory power and statistical validity of our method suggest its general applicability for determining the origins of P. ginseng samples.
Keywords:
Discrimination of origins; Metabolomics; Orthogonal projections to latent structure-discriminant analysis; Panax ginseng
Ginseng is a plant in the genus Panax found in the Northern Hemisphere and in East Asian countries. It is estimated that about 65% to 80% of the world's population uses traditional medicine as the primary form of healthcare [1]. Ginseng has been reported to contain polyacetylenes, sesquiterpenes, polysaccharides, peptidoglycans, and vitamins, in addition to more than 30 ginsenosides, which have primarily been associated with the major therapeutic effects of ginseng roots [2,3]. It has recently been reported that differences in bioactivity among species, geographical origins, and extraction methods are linked to different ratios of ginsenosides [4,5]. As the composition of ginseng roots can vary significantly according to origin and age, it is important to discriminate among ginseng roots from different sources [6-9].
Metabolomics has been applied to the classification of plant materials, with principal component analysis (PCA) as the main statistical approach [10-12]. Orthogonal projection to latent structure-discriminant analysis (OPLS-DA), a type of supervised classification, was recently developed [13]. The predictive and orthogonal components provided by OPLS-DA facilitate the interpretation of class discrimination and the prediction of the class membership of unknown samples. Therefore, OPLS-DA is more appropriate than PCA for differentiating origins in cases where many factors can affect metabolite profiles [14,15]. Metabolite profiling has been carried out using a number of techniques, including NMR and different combinations of LC, GC, and MS [16-21].
Moreover, the quality and profiles of natural products have been reported to vary widely depending on the geographical origin, growing environment, storage conditions, and postharvest processing of the herbal ingredients [12,22,23].
In the present study, a metabolomics approach combining UPLC-MS-based analysis with OPLS-DA was developed for discriminating the six geographical origins (Gangwon, Gaeseong, Punggi, Chungbuk, Jeonbuk, and Anseong) of Korean P. ginseng roots. The OPLS-DA model allowed the origins to be differentiated statistically. Importantly, validation of the prediction model on blind test samples gave a statistical measure of the reliability of the approach, which is required for practical application.
MATERIALS AND METHODS
Chemicals and materials
Leucine-enkephalin and formic acid were purchased from Sigma-Aldrich (St. Louis, MO, USA). Acetonitrile and methanol of HPLC grade were obtained from SK Chemical Reagent (Seoul, Korea). All aqueous solutions were prepared with ultrapure water produced by a Milli-Q system (18.2 MΩ; Millipore, Bedford, MA, USA). Four hundred twenty-nine Korean P. ginseng roots (4 to 6 years old) were collected locally from the Gangwon (GW, 90 samples), Gaeseong (GA, 78 samples), Punggi (PG, 18 samples), Chungbuk (CB, 108 samples), Jeonbuk (JB, 36 samples), and Anseong (AS, 99 samples) Insam Cooperative Associations of the National Agricultural Cooperative Federation (Nonghyup) from October to November 2009 (Fig. 1).
Fig. 1.
The origins of ginseng samples. GA, Gaeseong; GW, Gangwon; AS, Anseong; CB, Chungbuk; PG, Punggi; JB, Jeonbuk.
Each root was washed with distilled water to remove adhering soil particles. Each ginseng root body, including root hairs, was freeze-dried and ground. The dried ginseng powder (0.5 g) was weighed, and 5 mL of 70% methanol was added. The methanol extract was obtained in an ultrasonic water bath for 60 min at room temperature [24]. The extract was filtered through a syringe filter (0.22 μm) and injected directly into the UPLC system.
UPLC-QTOF/MS analysis
P. ginseng metabolite profiling was performed using an ACQUITY UPLC system (Waters Corporation, Milford, MA, USA) equipped with a binary solvent delivery manager and a sample manager, coupled to a Micromass Q-TOF Premier mass spectrometer (Waters Corporation) equipped with an electrospray interface. Chromatographic separations were performed on a 2.1×100 mm, 1.7 μm ACQUITY BEH C18 chromatography column. The column temperature was maintained at 35℃, and mobile phases A and B were water with 0.1% formic acid and acetonitrile with 0.1% formic acid, respectively. The gradient program was: 0 min, 10% B; 0 to 7 min, 10% to 33% B; 7 to 14 min, 33% to 56% B; 14 to 21 min, 56% to 100% B; 21 to 23 min, 100% B; 23 to 25 min, 10% B. The flow rate was 0.4 mL/min and the injection volume was 5 μL in partial loop mode.
The mass spectrometer was operated in positive ion mode. N2 was used as the desolvation gas. The desolvation temperature was set to 350℃ at a flow rate of 500 L/h, and the source temperature was 100℃. The capillary and cone voltages were set to 2,700 and 27 V, respectively. The data were collected for each sample from 200 to 1,500 Da with a 0.25-second scan time and a 0.01-second interscan delay over a 25-minute analysis time. Leucine-enkephalin (m/z 556.2771) was used as a reference compound.
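The gradient program above can be read as a piecewise-linear function of time. The following is a minimal sketch of that reading, not part of the published method: the function name `percent_b` is hypothetical, and the return to 10% B at 23 min, which the text gives only as "23 to 25 min, 10% B", is approximated here by a 0.01-min ramp.

```python
import numpy as np

# Breakpoints (time in min, %B) transcribed from the gradient program in the text.
# Assumption: the step back to 10% B at 23 min is modeled as a 0.01-min ramp.
GRADIENT = [
    (0.0, 10.0),
    (7.0, 33.0),
    (14.0, 56.0),
    (21.0, 100.0),
    (23.0, 100.0),
    (23.01, 10.0),
    (25.0, 10.0),
]


def percent_b(t_min):
    """Return the mobile phase B percentage at time t_min by linear interpolation."""
    times, percents = zip(*GRADIENT)
    return float(np.interp(t_min, times, percents))


if __name__ == "__main__":
    for t in (0, 3.5, 7, 10.5, 17.5, 22, 24):
        print(f"{t:5.1f} min: {percent_b(t):5.1f}% B")
```

Such a helper can be convenient when relating a retention time in the 0.5 to 18.5 min window to the approximate solvent composition at elution.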
Chemometric data analysis
The raw mass data were analyzed with the MarkerLynx applications manager ver. 4.1 (Waters, Manchester, UK). The parameters were as follows: retention time range 0.5 to 18.5 min, mass range 200 to 1,500 Da, and mass tolerance 0.04 Da; isotopic data were excluded from the analysis, the noise elimination level was set at 10, and the mass window and retention time window were set at 0.04 and 0.1 min, respectively. After a mean-centered and Pareto-scaled (par-scaled) data set was created in the Create Dataset window, PCA and OPLS-DA were performed to discriminate and predict the geographical origins of ginseng roots. The resulting two-dimensional data matrix of mass values and their peak intensities was further investigated with SIMCA-P+ software 12.0 (Umetrics, Umea, Sweden) for multivariate statistical analysis. In the OPLS method, the systematic variation in X is separated into two parts: one that is linearly related to Y and one that is orthogonal to Y [15,25]. Hence, the OPLS model comprises two blocks of model variation: 1) the Y-predictive block, which represents the between-class variation, and 2) the Y-orthogonal block, also referred to as the uncorrelated variation, which constitutes the within-class variation [26].
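The preprocessing and the split of X into Y-predictive and Y-orthogonal variation can be illustrated with a short numerical sketch. This is not the SIMCA-P+ implementation, whose internals the text does not describe; it assumes the commonly described single-response OPLS step, a sample standard deviation (ddof=1) for Pareto scaling, and a centered response vector. The function names are hypothetical.

```python
import numpy as np


def pareto_scale(X):
    """Mean-center each column, then divide it by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    # Zero-variance columns stay at zero instead of triggering a division by zero.
    return centered / np.sqrt(np.where(sd > 0, sd, 1.0))


def remove_orthogonal_component(X, y):
    """Remove one Y-orthogonal component from X (single-response OPLS step).

    X must be column-centered and y centered. Returns the filtered X and the
    orthogonal score vector t_o, which is uncorrelated with y by construction.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = X.T @ y / (y @ y)             # predictive weight vector
    w = w / np.linalg.norm(w)
    t = X @ w                          # predictive scores
    p = X.T @ t / (t @ t)              # predictive loadings
    w_o = p - (w @ p) * w              # part of the loading orthogonal to w
    w_o = w_o / np.linalg.norm(w_o)
    t_o = X @ w_o                      # Y-orthogonal scores
    p_o = X.T @ t_o / (t_o @ t_o)      # Y-orthogonal loadings
    return X - np.outer(t_o, p_o), t_o


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = pareto_scale(rng.normal(size=(20, 5)))   # synthetic data, illustration only
    y = np.array([0.0] * 10 + [1.0] * 10)
    y = y - y.mean()
    X_filtered, t_o = remove_orthogonal_component(X, y)
    print("t_o . y =", float(t_o @ y))           # numerically zero
```

Repeating the step on the filtered matrix yields further orthogonal components, which is how a model with several orthogonal components could be built up in principle.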
RESULTS AND DISCUSSION
In the total ion chromatograms (TICs), thousands of ion peaks from each extract of P. ginseng were detected in the range of m/z 200 to 1,500 at 0.5 to 18.5 min. Because the TICs were complex and similar to one another, multivariate analysis such as PCA or OPLS-DA was performed to discriminate the origin of each sample. In the PCA, ginseng samples of the 6 geographical origins were not clearly discriminated because the major peaks overlapped severely (data not shown).
However, in the score plot of the OPLS-DA model, three clusters were clearly separated: GW, GA, and the other 4 regions (Fig. 2A). The cross-validated predictive ability (Q2) and the variance related to the differences among the classes (R2[y]) were found to be significant (Q2=0.794, R2[y]=0.875) with five predictive and five orthogonal (5+5) components. The GW and GA clusters could be easily separated from the cluster of the 4 other regions by combining predictive component 1 with predictive component 2. After the data sets of GW and GA were removed, the remaining samples were also clearly separated into four clusters with R2(y)=0.965 and Q2=0.833 in the OPLS-DA model (Table 1 and Fig. 2B). This result implies that the tested ginseng roots can be assigned to six regions by two successive steps of OPLS-DA.
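For readers unfamiliar with the two statistics, R2(y) and Q2 share one formula and differ only in which predictions are used: R2(y) uses values fitted on the training data, whereas Q2 uses values predicted for samples left out during cross-validation. The sketch below assumes a single dummy-coded response and hypothetical numbers; the software's exact cross-validation scheme is not described in the text, so this is an illustration of the definitions only.

```python
import numpy as np


def explained_fraction(y_true, y_hat):
    """Return 1 - (residual sum of squares) / (total sum of squares about the mean).

    With fitted values this is R2(y); with cross-validated predictions the
    residual sum of squares is PRESS and the result is Q2.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y_true - y_hat) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical class labels (0/1) and predictions, for illustration only.
    y = [0, 0, 1, 1]
    fitted = [0.05, 0.10, 0.90, 0.95]       # from a model trained on all samples
    cross_validated = [0.1, 0.2, 0.8, 0.9]  # from models that left each sample out
    print("R2(y) =", round(explained_fraction(y, fitted), 3))
    print("Q2    =", round(explained_fraction(y, cross_validated), 3))
```

Because left-out samples are harder to predict than training samples, Q2 is normally lower than R2(y), as in the values reported above.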
Fig. 2.
Multivariate statistical analysis of ginseng samples. (A) Orthogonal projection to latent structure-discriminant analysis (OPLS-DA) score plot of 6 geographical origins (Gaeseong [GA], Gangwon [GW], Anseong [AS], Chungbuk [CB], Punggi [PG], and Jeonbuk [JB]). (B) OPLS-DA score plot of 4 geographical origins (AS, PG, JB, and CB).
Table 1.
The predictive ability (Q2) and total variance (R2[y]) of each OPLS-DA model
OPLS-DA, orthogonal projection to latent structure-discriminant analysis; GW, Gangwon; GA, Gaeseong; JB, Jeonbuk; PG, Punggi; CB, Chungbuk; AS, Anseong.
1) Number of samples in the blind test set.
Another critical step in multivariate statistical analysis is to validate a model on samples not used in building the model itself. To validate the model, we randomly withheld 143 test samples (1/3 of the total; 26 samples of GA, 30 samples of GW, 33 samples of AS, 36 samples of CB, 12 samples of JB, and 6 samples of PG) as blind samples and ran the OPLS-DA prediction model. All of the blind samples of GW and GA were correctly assigned to their origins on the predicted score plot (Fig. 3A) except for one sample (GW-88), which was located between the GW cluster and the PG samples of the mixed cluster. However, GW-88 was positioned correctly in GW in the OPLS-DA prediction model built with GW and PG samples only (Fig. 3B). The other prediction models between GW and each of the remaining origins (CB, JB, or AS) gave the same result: all of the blind samples were assigned to their corresponding origins (Fig. 3C-E). The predictive components, orthogonal components, R2(y) values, and Q2 values of the described score plots and prediction plots are shown in Table 1.
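The blind test set described above amounts to a stratified hold-out of one third of the samples from each origin. The following sketch reproduces only the arithmetic of that split. The per-origin totals come from the Materials and Methods; the sample identifiers, the random seed, and the function name are hypothetical, and the authors' actual selection procedure is not described beyond being random.

```python
import random

# Samples per origin, as listed in the Materials and Methods.
COUNTS = {"GW": 90, "GA": 78, "PG": 18, "CB": 108, "JB": 36, "AS": 99}


def stratified_holdout(counts, denominator=3, seed=0):
    """Randomly withhold 1/denominator of the samples of every origin.

    Returns {origin: (training_ids, blind_ids)}. The IDs are made-up labels
    of the form "GW-1"; they do not correspond to the published sample IDs.
    """
    rng = random.Random(seed)
    split = {}
    for origin, n in counts.items():
        ids = [f"{origin}-{i}" for i in range(1, n + 1)]
        blind = set(rng.sample(ids, n // denominator))
        training = [s for s in ids if s not in blind]
        split[origin] = (training, sorted(blind))
    return split


if __name__ == "__main__":
    split = stratified_holdout(COUNTS)
    for origin, (training, blind) in split.items():
        print(origin, "training:", len(training), "blind:", len(blind))
    print("total blind:", sum(len(b) for _, b in split.values()))
```

Holding out the same fraction from every origin keeps the class proportions of the training and test sets identical, which matters here because the groups range from 18 to 108 samples.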
Fig. 3.
Predicted score plot of the ginseng for discrimination of geographical origins. (A) Predicted with Gaeseong (GA), Gangwon (GW), Anseong (AS), Chungbuk (CB), Punggi (PG), and Jeonbuk (JB) origins. (B) Predicted with GW and PG origins. (C) Predicted with GW and CB origins. (D) Predicted with GW and JB origins. (E) Predicted with GW and AS origins. (F) Predicted with AS, JB, PG and CB origins. (G) Predicted with PG and CB origins.
The OPLS-DA prediction model was then applied to predict origins in the four overlapping regions. The blind samples of JB and AS were positioned perfectly within their own origins; however, the samples of PG and CB were not clearly assigned to their own clusters (Fig. 3F). When the samples of PG and CB were compared directly in an OPLS-DA model, the origins of PG and CB were not predicted successfully (Fig. 3G). The classification scores (Y calculated) of the sample data sets and the prediction scores (Y predicted) of the blind sample data sets of PG and CB are shown in Fig. 4.
Fig. 4.
Prediction of origins of the Punggi and Chungbuk ginseng samples (■ Punggi [PG] ginseng, ● Chungbuk [CB] ginseng, □ no class of Punggi ginseng, and ○ no class of Chungbuk ginseng).
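In a two-class model of this kind, class membership is typically read from the predicted Y value against a cutoff, which the discussion of Fig. 4 places at 0.5. The sketch below shows that decision rule under stated assumptions: which class is coded as 1 is not given in the text (PG is assumed here), the `margin` used to flag borderline samples is a hypothetical choice, and the example values are invented.

```python
def classify(y_predicted, threshold=0.5, margin=0.05):
    """Assign each predicted Y value to PG or CB and flag borderline cases.

    Assumption: PG is dummy-coded as 1 and CB as 0. A sample is flagged as
    borderline when its value lies within `margin` of the threshold.
    """
    results = []
    for y in y_predicted:
        label = "PG" if y >= threshold else "CB"
        borderline = abs(y - threshold) <= margin
        results.append((label, borderline))
    return results


if __name__ == "__main__":
    # Invented prediction scores, for illustration only.
    for score, (label, borderline) in zip(
        [0.92, 0.51, 0.08], classify([0.92, 0.51, 0.08])
    ):
        note = " (borderline)" if borderline else ""
        print(f"Y predicted = {score:.2f} -> {label}{note}")
```

Flagging values near the cutoff makes it explicit which assignments rest on a small margin and may deserve a second look.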
The PG-4 sample, which lay close to the CB cluster in Fig. 3G, was positioned on the borderline at the threshold level of 0.5. The ginsengs of CB and PG origins were discriminated statistically, even though some CB and PG blind test samples were located close to the borderline. Our results suggest that the geographical origin of Korean ginseng can be identified with 99.7% probability. This method has been used as a stringent assessment tool in recent studies on origin discrimination and age differentiation of ginseng [27-29].
Multivariate models find relations among correlated variables to separate systematic variation from noise. OPLS-DA has an advantage over the unsupervised PCA method: it separates the predictive variation from the orthogonal variation, so the two can be studied and interpreted separately. In this study, OPLS-DA multivariate analysis showed that the geographical origin of P. ginseng cultivated in Korea could be determined from metabolites based on LC-MS data. Ginseng from six origins in Korea was successfully discriminated by OPLS-DA models built stage by stage, with already identified origins excluded from the raw data set at each stage. Each model was also validated by predicting the origins of blind test samples, which were predicted with 99.7% probability. Our results suggest that the growth origin of Korean ginseng can be discriminated by multivariate models, which could contribute to the development of the ginseng industry through regional characterization in Korea.