
Feasibility of identifying the tobacco-related global metabolome in blood by UPLC-QTOF-MS.

Ping-Ching Hsu, Bin Zhou, Yi Zhao, Habtom W Ressom, Amrita K Cheema, Wallace Pickworth, Peter G Shields.

Abstract

Metabolomics is likely an ideal tool to assess tobacco smoke exposure and the impact of cigarette smoke on human exposure and health. To assess reproducibility and feasibility of this by UPLC-QTOF-MS, three experiments were designed for the assessment of smokers' blood. Experiment I was an analysis of 8 smokers with 8 replicates. Experiment II was an analysis of 62 pooled quality control (QC) samples from 7 nonsmokers' plasma placed as every tenth sample among a study of 613 samples from 160 smokers. Finally, to examine the feasibility of metabolomic study in assessing smoke exposure, Experiment III consisted of 9 smokers and 10 nonsmokers' serum to evaluate differences in their global metabolome. There was minimal measurement and sample preparation variation in all experiments, although some caution is needed when analyzing specific parts of the chromatogram. When assessing QC samples in the large scale study, QC clustering indicated high stability, reproducibility, and consistency. Finally, in addition to the identification of nicotine metabolites as expected, there was a characteristic profile distinguishing smokers from nonsmokers. Metabolites selected from putative identifications were verified by MS/MS, showing the potential to identify metabolic phenotypes and new metabolites relating to cigarette smoke exposure and toxicity.

Substances: Nicotine

Year: 2013 PMID: 23240883 PMCID: PMC3579455 DOI: 10.1021/pr3007705

Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466

Introduction

Tobacco smoking is a major cause of morbidity and mortality in developed countries.[1] There are more than 4500 identified chemicals[2] and over 60 potential or probable human carcinogens in cigarette smoke.[2] The complex mixture of cigarette smoke has been classified by IARC (The International Agency for Research on Cancer) as a Group 1 known human carcinogen.[3] The Institute of Medicine (IOM) in 2000, at the request of the Food and Drug Administration (FDA), considered whether harm reduction approaches through reducing toxin exposures in smoke via modified-risk tobacco products (MRTPs) could feasibly enhance tobacco control for smokers who will not or could not quit.[4] They concluded that such an approach was feasible. Recently, the FDA has been given legislative authority over tobacco products, including the ability to establish performance standards (e.g., regulating the amount of carcinogen emissions and exposure to smokers through product design changes) and evaluating manufacturers’ health claims of MRTPs.[5] In the last several years, there has been a renewed interest by the tobacco companies to manufacture these.[6] While tobacco product design changes developed by the manufacturers or mandated by the FDA can be screened in the laboratory for changes in smoke chemical constituents and toxicology,[7] ultimately the impact of such changes must be evaluated in humans using biomarkers. However, there are only a limited number of biomarkers that assess exposure, and none that have been sufficiently validated for lung cancer risk.[8] Given the wide variety of tobacco toxicants, a broad range of biomarkers needs to be developed and validated to assess the impact of cigarette design changes on human exposure and health. The best available cigarette smoke biomarkers are chemically specific, reflecting only a narrow range of known toxicants. 
These target polycyclic aromatic hydrocarbons (e.g., benzo(a)pyrene), tobacco-specific nitrosamines (e.g., NNN, NNK), aromatic amines, and volatile hydrocarbons (e.g., benzene, 1,3-butadiene and acrolein).[9−11] Among these, polycyclic aromatic hydrocarbons (PAHs) and tobacco-specific nitrosamines (TSNAs) are considered the major causative agents in cigarette smoke contributing to lung cancer,[12−14] and to cancer risk overall.[15] 1,3-Butadiene is a potent lung carcinogen in experimental animals.[15] Acrolein is a suspected human carcinogen and carries the highest level of risk for respiratory effects in experimental animals,[15] while benzene is a known human leukemogen.[16] Metabolomics is an ‘omics technology that provides a simultaneous assessment of numerous molecules (i.e., the metabolome), allowing for the quantification of individual metabolites and the identification of a phenotypic profile for clusters of metabolites.[17−19] It provides information about metabolites from exogenous toxin exposures and about those from cellular endogenous pathways that change as a result of the toxic exposures.[17] In contrast to currently available smoke exposure biomarkers, which are mostly chemically specific,[13] metabolomics can provide information that allows for broader phenotype assessments incorporating profiles within and across disease pathways. 
Separately, metabolomics provides phenotypic information about the cell’s environment and mechanistic pathways that genomics and transcriptomics do not.[20,21] Metabolomics is becoming an important component of systems biology, especially in determining the global metabolic profile by detecting thousands of small and large molecules in various media ranging from cell cultures to human biological fluids such as urine, saliva, and blood.[18,22−25] A growing number of instruments and technologies are available for metabolomics research, such as gas chromatography–mass spectrometry (GC–MS), liquid chromatography–mass spectrometry (LC–MS), high-performance liquid chromatography coupled with electrochemical coulometric array (LCECA), and nuclear magnetic resonance (NMR).[26] Each platform has its specific strengths and weaknesses. For example, GC–MS has great separation efficiency and resolution but analyzes only volatile metabolites;[27] LCECA has great reproducibility and sensitivity but is low-throughput and provides only limited chemical structure information;[19,28,29] NMR is a fast, reliable, and nondestructive detector but has poor resolving power and poor sensitivity, requiring a larger amount of analyte than mass spectrometry.[19,29,30] LC–MS is an important tool with great flexibility in metabolomics compared to the other methods.[31] It is often used to identify low-abundance metabolites in a targeted study or to obtain the largest metabolomics profile in a global approach. 
With the development of Ultrahigh Pressure Liquid Chromatography (UPLC), better chromatographic resolution and peak capacity compared to HPLC is achieved, and therefore it has become the platform of choice in our study.[19,32] There has been a great increase in the research of metabolomics, and a number of studies have shown the utility in assessing human disease risk.[33−35] However, while some studies show the utility in urine samples,[36−39] little has been shown to validate the reproducibility of global metabolomics profile using LC–MS on human blood samples.[40] Thus, this study assesses Ultra Performance Liquid Chromatography coupled to Quadrupole with Time-of-Flight Mass Spectrometry (UPLC–QTOF-MS) method for use in metabolomics and applies the procedure to human samples from smokers and nonsmokers.

Materials and Methods

Reagents and Chemicals

All reagents and solvents were of HPLC grade. 4-nitrobenzoic acid (4-NBA), debrisoquine (as debrisoquine sulfate), cotinine, (±)-nicotine, 1,11-undecanedicarboxylic acid and 3-hydroxycoumarin were purchased from Sigma-Aldrich (St. Louis, MO); pseudooxynicotine, 3-hydroxycotinine, and cotinine N-oxide were purchased from Toronto Research Chemicals (North York, ON, Canada); acetonitrile (ACN) and water were purchased from Fisher Optima grade (Fisher Scientific, Waltham, MA).

Experimental Design

Three experiments were conducted. In Experiment I, to test the variation of the UPLC–QTOF-MS assay, plasma from each of eight smokers was split into two aliquots (5 μL); the first aliquot was injected consecutively five times, immediately followed by the second aliquot injected three times. A water blank was sampled between each of the subjects. The eight subjects were sampled consecutively, followed by a repeat of the experiment for the first two subjects. Replicate comparison analysis is indicated below. Experiment II takes advantage of a quality control procedure used for ensuring assay consistency during the course of a large sample set analysis. Here, a pooled sample from 7 nonsmoker subjects undergoing therapeutic phlebotomy was placed as every tenth sample among a sample set of 613 independent samples from 160 smokers; 62 injections of the pooled control were analyzed. Replicate comparison analysis is indicated below. Experiment III was intended to determine whether the metabolomic results differed between smokers and nonsmokers. There were nine smokers and ten nonsmokers; the latter were recruited from a three-arm study validating biomarkers in smokers, former smokers, and nonsmokers.

Subjects

Blood samples were obtained from 160 smokers enrolled in a cross-sectional study intended to characterize cigarette smoke exposure and to develop and validate biomarkers. These individuals were healthy and had no history of cancer. As an eligibility criterion, they had a stable smoking pattern for at least 6 months of 10 cigarettes per day or more. There were 613 samples from these subjects that were analyzed in a single session. Interspersed with these samples were replicates as indicated in Experiment II. Two groups of nonsmokers were utilized. The samples in the first group, used for the laboratory validation studies for Experiment II, came from nonsmokers who were undergoing therapeutic phlebotomy for hemochromatosis or myeloproliferative diseases. The former had low iron levels and both groups had stable disease. Their plasma was pooled and aliquots were interspersed with the smokers’ samples in every tenth aliquot. The second group, used to assess metabolomic differences between smokers and nonsmokers for Experiment III, consisted of nonsmokers from a study of biomarker validation in smokers, former smokers, and nonsmokers from the Tobacco Product Assessment Consortium (TobPRAC). These were healthy persons who had smoked fewer than 100 cigarettes in their lifetime and were recruited through IRB-approved local media. Demographics of all the participants are listed in Supplementary Tables 1 to 3 in the Supporting Information. Blood was collected as serum or plasma (heparin green top tubes) as indicated.

UPLC–QTOF-MS Analysis

Sample aliquots were mixed with 195 μL of 66% ACN containing the internal standards debrisoquine and 4-NBA. The samples were centrifuged at 16000× g for 10 min at 4 °C to remove particulates and precipitated proteins. One-hundred fifty microliters of the supernatant was then transferred into an autosampler vial, followed by UPLC–QTOF-MS analysis. Samples were injected onto a reverse-phase 50 × 2.1 mm ACQUITY 1.7-μm C18 column (Waters, Milford, MA) using an ACQUITY UPLC system (Waters, Milford, MA) with a gradient mobile phase consisting of 2% ACN in water containing 0.1% formic acid (A) and 2% water in ACN containing 0.1% formic acid (B). Each sample was resolved for 10 min at a flow rate of 0.5 mL/min. The gradient consisted of 100% A for 0.5 min then a ramp of curve 6 to 60% B from 0.5 to 4.0 min, then a ramp of curve 6 to 100% B from 4.0 to 8.0 min, hold at 100% B until 9.0 min, then a ramp of curve 6 to 100% A from 9.0 to 9.2 min, followed by a hold at 100% A until 10 min. The column eluent was introduced directly into the mass spectrometer by electrospray. Mass spectrometry was performed on a Q-TOF Premier (Waters, Milford, MA) operating in either negative-ion (ESI−) or positive-ion (ESI+) electrospray ionization mode with a capillary voltage of 3200 V and a sampling cone voltage of 20 V in negative mode and 35 V in positive mode. The desolvation gas flow was set to 800 L/h and the temperature was set to 350 °C. The cone gas flow was 25 L/h, and the source temperature was 120 °C. Accurate mass was maintained by introduction of LockSpray interface of sulfadimethoxine (311.0814 [M + H]+ or 309.0658 [M – H]−) at a concentration of 250 pg/μL in 50% aqueous acetonitrile and a rate of 150 μL/min. Data were acquired in centroid mode from 50 to 850 m/z in MS scanning. 
The metabolite identifications were confirmed by comparing the retention time under the same chromatographic conditions and by matching the fragmentation pattern of the parent ion from the biological sample to that of the standard metabolite using tandem mass spectrometry (UPLC–QTOF-MS/MS).
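The gradient program described above can be written down as a small table of breakpoints, which makes it easy to check the mobile-phase composition at any time point. The following is a minimal sketch, not the instrument method itself: it assumes that "curve 6" denotes a linear ramp between breakpoints, and the names `GRADIENT` and `percent_b` are illustrative.

```python
# Gradient program from the text as (time in min, %B) breakpoints.
# ASSUMPTION: "curve 6" is treated as a linear ramp between breakpoints.
GRADIENT = [
    (0.0, 0.0),    # start at 100% A
    (0.5, 0.0),    # hold 100% A until 0.5 min
    (4.0, 60.0),   # ramp to 60% B by 4.0 min
    (8.0, 100.0),  # ramp to 100% B by 8.0 min
    (9.0, 100.0),  # hold 100% B until 9.0 min
    (9.2, 0.0),    # return to 100% A by 9.2 min
    (10.0, 0.0),   # hold 100% A until 10 min
]


def percent_b(t_min: float) -> float:
    """Return %B at time t_min by linear interpolation between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-10 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole run")


if __name__ == "__main__":
    for t in (0.25, 2.25, 4.0, 6.0, 8.5, 9.5):
        b = percent_b(t)
        print(f"{t:5.2f} min: {b:5.1f} %B, {100 - b:5.1f} %A")
```

Note that %A and %B here are fractions of the two mobile phases, not of acetonitrile itself, since phase A already contains 2% ACN and phase B 2% water.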

Data Analysis

The raw data from UPLC–QTOF instrument were converted to Network Common Data Format (NetCDF) files. They were then preprocessed using XCMS[41] for peak detection, retention time correction and peak matching to obtain a peak list in which each peak is represented by its m/z value, retention time and intensities (peak area) across samples. Preprocessed data sets were analyzed using Matlab (MathWorks, Natick, MA) and Metaboanalyst (www.metaboanalyst.ca)[42] to perform scatter plot, hierarchical clustering analysis and principal component analysis (PCA). R was used for ANOVA (Analysis of Variance) and coefficients of variation analysis in Experiment I and Random Forest classification. Significant features were searched against the Madison-Qingdao Metabolomic Consortium Database (MMCD)[43] and the Human Metabolome Database (HMDB)[44] with the mass accuracy of 10 parts per million to identify putative metabolite identifications in Experiment III.
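To make the 10 ppm search window concrete, the sketch below computes the signed mass error in parts per million and keeps only candidates inside the tolerance. It is an illustration of the arithmetic, not the query logic of MMCD or HMDB. The sulfadimethoxine value is the lock-mass m/z quoted in the Methods; the observed value and the decoy entry are hypothetical.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_candidates(observed_mz, candidates, tol_ppm=10.0):
    """Return (name, ppm error) for candidates within tol_ppm, best match first."""
    hits = [
        (name, ppm_error(observed_mz, mz))
        for name, mz in candidates.items()
        if abs(ppm_error(observed_mz, mz)) <= tol_ppm
    ]
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Reference m/z values: sulfadimethoxine [M+H]+ is quoted in the Methods;
# the decoy is a made-up entry used only to show a rejected candidate.
CANDIDATES = {
    "sulfadimethoxine [M+H]+": 311.0814,
    "hypothetical decoy": 311.0900,
}

if __name__ == "__main__":
    observed = 311.0830  # hypothetical measured m/z
    for name, err in match_candidates(observed, CANDIDATES):
        print(f"{name}: {err:+.2f} ppm")
```

At m/z 311.0814 a 10 ppm tolerance corresponds to roughly ±0.0031 m/z units, so the hypothetical decoy 0.0086 units away is excluded.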

Results and Discussion

For metabolomic experiments to provide sound biological insights into pathobiology, it is imperative to demonstrate that the variability of metabolomics measurements is within acceptable limits. Previously, the reproducibility of the LC–MS platform for the metabolomic analysis of urine samples from unspecified healthy subjects was examined,[39] indicating that the within-day reproducibility of the UPLC–QTOF-MS system is sufficient to ensure data quality in global metabolomic studies after sufficient equilibration of the system.[36,39] In this study, the reproducibility of LC–MS-based metabolomic experiments using blood samples of smokers and nonsmokers is examined. Blood, while more difficult to collect than urine, is conceptually a better matrix for biomarkers because it does not depend on renal excretion; for example, there is no need to adjust for urinary creatinine or to account for the stability of a metabolite in urine in vivo and ex vivo. For blood, there is a choice of serum or plasma; the former includes metabolites that result from coagulation and blood processing, and so includes compounds that do not exist in vivo. Significant differences between plasma and serum have been noted by Wedge et al.,[45] who found that many metabolites in the biological pathways, including glycerophosphocholines, creatinine, erythritol and glutamine, correlated with the prognosis of small cell lung cancer in plasma but not in serum, indicating the clinical feasibility of plasma for metabolomics studies. A recent study comparing metabolite profiles in both biofluids reported a higher concentration of metabolites in serum and better reproducibility in plasma.[46] However, the overall reproducibility was good in both biofluids. For smoke exposure in particular, more nicotine-related metabolites were seen in serum samples, while better stability was observed in plasma. 
The choice of serum versus plasma in our study was dictated by the availability of large volumes of sample needed for the numerous replicate experiments. The data indicate, however, that the choice for serum versus plasma was not important for the purposes of validation and reproducibility. The reproducibility analysis is performed in three parts: the first utilizes plasma from 8 smokers in Experiment I to analyze the reproducibility of the LC–MS-based metabolomics data over a short period of time. The various factors that affect the quality of the data are examined and their respective contributions to the data variability are analyzed. As part of Experiment I, immediately following the multiple injections of the 8 smokers, the samples from the first two subjects were reinjected 5 times and 3 times. Thus, the replicates followed the initial 64 chromatographic runs and 13 h in time. Results of these latter two subjects are compared against the first analyses to examine the effect of running time and potential carry-over on the data. Next, the reproducibility of LC–MS platform over multiple days is examined using Experiment II through a QC analysis similar to the approach by Gika, et al.[36] After establishing the reproducibility of LC–MS-based metabolomics data for plasma samples, Experiment III was conducted to validate the use of metabolomics for the assessment of smoke exposure, as recommended by Hatsukami, et al.[47]

Factors Affecting Data Quality and Their Contributions to Variability

Previously in LC–MS-based proteomics experiments, it is assumed that the variability of the intensity of a peak can be attributed to either sample preparation variation or machine measurement.[48,49] For the two aliquots of each subject $S_i$, $i = 1, 2, \ldots, 8$, in Experiment I, their difference is mainly due to the sample preparation variation. And for the repeated injections of each aliquot, their difference is derived from the measurement noise, which results from variability in the chromatography and the mass spectroscopic measurements. We refer to the variability due to biological differences among subjects as “inter-individual variation”. For the first 8 biological subjects S1–S8, it can be shown that there is minimal sample preparation variability and measurement noise but large differences due to the interindividual variation. The intensity of each peak from the peak-wise variation in the metabolomics data can be modeled as

$$y_{ijk} = \mu + \varepsilon^{B}_{i} + \varepsilon^{S}_{ij} + \varepsilon^{M}_{ijk} \tag{1}$$

where $y_{ijk}$ is the log transformation of the observed intensity of the $i$th biological subject ($i = 1, 2, \ldots, 8$), the $j$th aliquot ($j = 1, 2$), and the $k$th injection ($k = 1, 2, \ldots, 5$ if $j = 1$; $k = 1, 2, 3$ if $j = 2$); $\mu$ represents the average concentration of each peak; $\varepsilon^{B}_{i}$, $\varepsilon^{S}_{ij}$ and $\varepsilon^{M}_{ijk}$ follow the normal distribution with mean zero and variances $\sigma_B^2$, $\sigma_S^2$, and $\sigma_M^2$ respectively, and $\varepsilon^{B}_{i}$, $\varepsilon^{S}_{ij}$ and $\varepsilon^{M}_{ijk}$ are mutually independent. A random effects analysis of variance (ANOVA) model was used to estimate the variations $\sigma_B^2$, $\sigma_S^2$, and $\sigma_M^2$, which characterize the contributions of the interindividual difference, sample preparation difference and measurement noise to the variability of the data. The variability due to chromatography and machine measurement error[36] was first assessed by visually inspecting the total ion chromatograms (TICs) of the repeated runs from the same aliquot. The TIC is a direct representation of the raw LC–MS data without preprocessing. As a result, the assessment of the variability will not be affected by different choices of preprocessing schemes. 
In Figure 1, the TICs of the five runs from the first aliquot of S1 are compared with each other. It was found that there was perfect qualitative reproducibility for all five runs from the same biological subject and aliquot.
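The nested random-effects model described above (subject, aliquot within subject, injection within aliquot) can be illustrated with a small simulation and a method-of-moments estimator. This is a sketch under stated assumptions, not the authors' R analysis: it uses a balanced design (the same number of injections per aliquot), whereas Experiment I used five and three injections, and the function names and simulated values are hypothetical.

```python
import random
from statistics import fmean


def simulate(n_subj, n_aliq, n_inj, mu, var_b, var_s, var_m, seed=0):
    """Draw y[i][j][k] = mu + eps_B[i] + eps_S[i][j] + eps_M[i][j][k]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subj):
        e_b = rng.gauss(0.0, var_b ** 0.5)          # subject effect
        subject = []
        for _ in range(n_aliq):
            e_s = rng.gauss(0.0, var_s ** 0.5)      # aliquot (sample prep) effect
            subject.append([mu + e_b + e_s + rng.gauss(0.0, var_m ** 0.5)
                            for _ in range(n_inj)])  # injection (measurement) noise
        data.append(subject)
    return data


def variance_components(data):
    """Method-of-moments estimates (var_b, var_s, var_m), balanced design only."""
    n_subj, n_aliq, n_inj = len(data), len(data[0]), len(data[0][0])
    aliq_means = [[fmean(a) for a in subj] for subj in data]
    subj_means = [fmean(m) for m in aliq_means]
    grand = fmean(subj_means)
    ss_m = sum((y - aliq_means[i][j]) ** 2
               for i, subj in enumerate(data)
               for j, aliq in enumerate(subj)
               for y in aliq)
    ss_s = n_inj * sum((aliq_means[i][j] - subj_means[i]) ** 2
                       for i in range(n_subj) for j in range(n_aliq))
    ss_b = n_aliq * n_inj * sum((m - grand) ** 2 for m in subj_means)
    ms_m = ss_m / (n_subj * n_aliq * (n_inj - 1))   # E = var_m
    ms_s = ss_s / (n_subj * (n_aliq - 1))           # E = var_m + K*var_s
    ms_b = ss_b / (n_subj - 1)                      # E = var_m + K*var_s + J*K*var_b
    var_m = ms_m
    var_s = max(0.0, (ms_s - ms_m) / n_inj)         # clip negative estimates to 0
    var_b = max(0.0, (ms_b - ms_s) / (n_aliq * n_inj))
    return var_b, var_s, var_m


if __name__ == "__main__":
    sim = simulate(400, 2, 4, mu=10.0, var_b=0.068, var_s=0.016, var_m=0.021, seed=1)
    print("estimated (var_b, var_s, var_m):", variance_components(sim))
```

With only eight subjects the estimate of the between-subject variance is itself noisy, which is why the simulated example uses many more subjects than the study did. An unbalanced design such as five plus three injections would normally be fitted with a mixed-model routine rather than these closed-form expressions.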

Figure 1

Total ion chromatogram of the five injections of the first aliquot of S1.

The peak list from preprocessing was then used to assess the variability among the repeated injections from an aliquot. For the first aliquot of biological subjects S1 to S8, the logarithmic intensities of all the peaks from the first injection were compared against the logarithmic intensities of all the peaks from the other injections using scatter plots in Figure 2. The peak intensities from two injections lie largely on the diagonal line in the scatter plots and exhibit a high correlation (Pearson correlation = 0.98; p < 0.001), indicating a high resemblance between the repeated injections. Since the five repeated injections span a time of 57.5 min, it is evident that the LC–MS system is stable over a short period of time.
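The injection-to-injection comparison amounts to a Pearson correlation of log-transformed peak intensities for the same peaks in two runs. The sketch below shows that calculation on made-up peak areas; the values and function names are hypothetical, and the base of the logarithm does not matter because Pearson correlation is unchanged by rescaling.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_intensity_correlation(inj_a, inj_b):
    """Correlate log-transformed peak areas of matched peaks in two injections."""
    return pearson([math.log(v) for v in inj_a], [math.log(v) for v in inj_b])


# Hypothetical peak areas for five matched peaks in two injections.
first = [1200.0, 560.0, 98000.0, 4300.0, 15000.0]
second = [1150.0, 600.0, 99500.0, 4100.0, 15800.0]

if __name__ == "__main__":
    print(f"r = {log_intensity_correlation(first, second):.4f}")
```

Working on the log scale keeps a few very intense peaks from dominating the correlation, which is the usual reason for comparing logarithmic rather than raw intensities.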

Figure 2

Evaluation of the variation due to the measurement noise. Scatter plots of logarithmic intensities of the five injections of the first aliquot of S1 to S8 represent (a) first injections vs second injections, (b) first injections vs third injections, (c) first injections vs fourth injections, and (d) first injections vs fifth injections.

The variance calculated from the ANOVA model and the coefficient of variation were examined against m/z values and retention times to see whether the measurement error of the LC–MS system is more pronounced over a particular region of the chromatogram and mass spectrum, as sample preparation variation and interindividual variation should not be affected by chromatogram or mass measurement. Generally, the largest variation appears at the beginning (less than 25 seconds) of the chromatogram run (Figure 3). The large variation at the beginning of the chromatogram may be attributed to the highly aqueous initial mobile phase of the reverse-phase chromatography, which cannot retain very polar or hydrophilic metabolites at the beginning of the elution.[50] If this area is considered to be important, it likely can be resolved by using HILIC (hydrophilic interaction liquid chromatography).[51] Variation toward the end of the chromatogram is often observed, partly because of baseline shift, and can result in an overestimate of the intensities of the analytes.[31] Thus, we have discarded the last two minutes of the chromatography during the preprocessing of the data. As shown in Figure 3, the retention times of peaks only go up to 480 s, where the variability of data is moderate. We also examined the variance in relation to peak intensity. The variation caused by the measurement noise shows a significant negative correlation (Spearman correlation = −0.66; p < 0.001, see Supplementary Figure 1, Supporting Information) with peak intensity. 
It is understandable as the noise effect is less severe on high-intensity peaks and it is relatively easy to detect and quantify high-intensity peaks. Coefficients of variation (CV) were determined. Overall, the mean CV for the measurement noise was 0.02, which ranged from 0.003 to 0.297. Similar to the above, the greatest CVs were in the areas at the beginning of the chromatogram. The correlation coefficient for the CV in relation to peak intensities is shown in Figure 4 (Spearman correlation = −0.785; p < 0.001). Those metabolites with CV more than 10% have been removed from the analysis for biomarker discovery.
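The 10% CV cut-off can be sketched as a simple per-peak filter. This is an illustration only: the paper derives its measurement-noise CV from the ANOVA model, whereas the sketch assumes a direct CV (sample standard deviation over mean) across replicate injections, and the peak names and intensities are hypothetical.

```python
from statistics import fmean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / fmean(values)


def filter_by_cv(peaks, max_cv=0.10):
    """Keep peaks whose CV across replicate injections is at most max_cv."""
    return {name: vals for name, vals in peaks.items()
            if coefficient_of_variation(vals) <= max_cv}


# Hypothetical replicate intensities (five injections per peak).
PEAKS = {
    "peak_A": [100.0, 102.0, 98.0, 101.0, 99.0],   # tight replicates
    "peak_B": [100.0, 140.0, 70.0, 120.0, 90.0],   # noisy replicates
}

if __name__ == "__main__":
    for name, vals in PEAKS.items():
        print(f"{name}: CV = {coefficient_of_variation(vals):.3f}")
    print("retained:", sorted(filter_by_cv(PEAKS)))
```

Filtering on CV before biomarker discovery removes peaks whose replicate-to-replicate noise could be mistaken for a group difference, at the cost of discarding some low-intensity features.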

Figure 3

Distribution of the estimated measurement error in CV (%) over m/z values and retention times. Each dot represents a single peak. The dot size and color correspond to the CV value: the larger the dot, and the brighter the color of the dot, the larger the CV value of the individual peak.

Figure 4

Scatter plot of the estimated measurement noise in coefficient of variation (%) versus the mean intensities of peaks over S1 to S8.

Scatter plots were also used to examine the variability due to sample preparation. The repeated injections from each aliquot were averaged to obtain the mean intensity of each peak for an aliquot. The logarithmic intensities of all peaks of the first aliquot of the eight biological subjects were compared with the logarithmic intensities of peaks of the second aliquot of the biological subjects. The two aliquots show a high degree of resemblance, as the intensities lie largely on the diagonal line (Supplementary Figure 2, Supporting Information) and exhibit a high correlation (Pearson correlation = 0.99; p < 0.001). The results demonstrate that the sample preparation does not significantly affect the quality of the LC–MS data. Hierarchical clustering of the 8 subjects each with 8 injections (five plus three injections) from Experiment I is shown in Figure 5a. The clustering clearly distinguishes biological subjects but not injections from different aliquots. The result confirms that the variability due to sample preparation is generally equal to or smaller than the measurement error, so it is sometimes masked by the measurement error.

Figure 5

Hierarchical clustering of all metabolites (a) from subject 1 (S1) to subject 8 (S8) and (b) from S1 to S8 plus the analytical replicates S1′ and S2′. Heat map colors represent relative values, in which red represents values above the mean, black represents the mean, and green represents values below the mean of a row (metabolite) across all columns (samples).

Effects of Running Time and Carry-overs on Data Variability

In Experiment I, the first two biological subjects S1 and S2 were repeated at the end of the experiment as S1′ and S2′, with separate sample preparations and injections. Between S1 (S2) and S1′ (S2′), there were 56 other sample runs, and they were separated by more than 10 h. The effects of run time and carry-over on data variability can be evaluated by comparing S1 vs S1′ and S2 vs S2′. The logarithmic intensities of the four aliquots from S1 and S2 were compared with those of the four aliquots from S1′ and S2′. A high degree of resemblance is evident between S1, S2 and S1′, S2′ (Supplementary Figure 3, Supporting Information), with a high correlation (Pearson correlation = 0.97; p < 0.001), which shows that the LC–MS platform remains stable over a moderate time period (more than 10 h). We then repeated the hierarchical clustering with the analytical replicates (S1′ and S2′) that were run at the end of the experiment included, and the result showed that the same sample’s analytical replicates clustered together (S1 with S1′, and S2 with S2′, see Figure 5b). Thus, the increased running time and carry-over from the samples analyzed between them do not significantly affect the data reproducibility. The estimated variances are shown as the box plot in Figure 6, and the variance and coefficient of variation are summarized in Table 1. The results confirm that the variations due to measurement error and sample preparation are comparable, and both are smaller than the interindividual variation, even when the replicates were analyzed 13 h apart.

Figure 6

Box-plot of the estimated variances due to interindividual variation, sample preparation and measurement noise.

Table 1

Summary of the Measured Variance and Coefficient of Variation in Human Plasma Samples

	variance	CV
	mean (median)	mean (median)
Interindividual variation	0.068 (0.021)	0.032 (0.025)
Sample preparation variation	0.016 (0.002)	0.011 (0.007)
Measurement noise	0.021 (0.005)	0.019 (0.013)

From the model equation (1), we can deduce that the variance of the mean intensity over all the biological subjects is, in general form,

$$\mathrm{Var}(\bar{y}) = \frac{\sigma_B^2}{I} + \frac{\sigma_S^2}{IJ} + \frac{\sigma_M^2}{IJK}$$

where I is the number of biological subjects (I = 8 in Experiment I), J is the number of aliquots per subject, and K is the number of injections per aliquot. This means that the variability of the mean of the observed peak intensity is dominated by the variation due to interindividual difference. Thus, the most effective way to reduce the variability in LC–MS data is to increase the number of biological subjects under investigation, given the total number of sample runs. Because of this understanding, in Experiment II, only one aliquot per sample and one injection per aliquot are used for the LC–MS-based metabolomics analysis of the 613 samples.
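The design argument can be checked numerically. For a balanced nested design with I subjects, J aliquots per subject and K injections per aliquot, the model implies that the variance of the grand mean is σB²/I + σS²/(IJ) + σM²/(IJK). The sketch below plugs in the mean variances from Table 1 and compares three hypothetical ways of spending 64 runs; the function name and the alternative designs are illustrative assumptions.

```python
# Mean variance components from Table 1 (log-intensity scale).
VAR_B = 0.068   # interindividual variation
VAR_S = 0.016   # sample preparation variation
VAR_M = 0.021   # measurement noise


def variance_of_mean(n_subjects, n_aliquots, n_injections,
                     var_b=VAR_B, var_s=VAR_S, var_m=VAR_M):
    """Variance of the grand mean for a balanced nested design."""
    return (var_b / n_subjects
            + var_s / (n_subjects * n_aliquots)
            + var_m / (n_subjects * n_aliquots * n_injections))


# Three hypothetical ways to spend the same 64 runs.
DESIGNS = {
    "8 subjects x 2 aliquots x 4 injections": (8, 2, 4),
    "16 subjects x 2 aliquots x 2 injections": (16, 2, 2),
    "64 subjects x 1 aliquot x 1 injection": (64, 1, 1),
}

if __name__ == "__main__":
    for label, (i, j, k) in DESIGNS.items():
        print(f"{label}: {variance_of_mean(i, j, k):.5f}")
```

Because every term is divided by the number of subjects, adding subjects shrinks all three contributions at once, whereas extra aliquots or injections only reduce the smaller terms.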

Reproducibility of LC–MS platform over multiple days

Experiment II provides the quality control data from a large metabolomics study of smokers (n = 160) with up to four separate blood draws before and after two cigarettes each (613 samples). For the quality control, 62 pooled healthy control samples were prepared by mixing equal volumes (400 μL) of plasma from seven nonsmokers and were inserted as every 10th sample of the run. Nonsmokers were used in order to assess background low level peaks. The pooled aliquot is assumed to be homogeneous and repeatedly injected across the entire LC–MS analysis with real biological samples analyzed between them. In Experiment II, 62 pooled quality control samples were injected during the analysis of 675 samples, including the 62 pooled samples. The analyses were conducted uninterrupted over 138 h. The peak list is acquired after peak detection and alignment using XCMS. It allows more in-depth analysis of LC–MS data. The preprocessing may help to reduce the variation in data due to instrument noise as only peak regions are considered in the analysis. This assesses reproducibility of the chromatography, quantitation of the spectroscopy, and run-to-run carryover and cross-contamination while running as a batch in a chromatographic system. The reproducibility of the quality control replicates was analyzed through unsupervised principal component analysis (PCA) by MetaboAnalyst. PCA has been shown to be an effective approach to visualize high-dimensional data by projecting the data point into a low-dimension space. If a certain degree of platform stability has been attained, the QC samples should cluster tightly together in the PCA score plot. PCA of the experimental samples and QC samples has revealed a pattern as shown in Figure 7, which gives an indication about the reproducibility of the data. Although there are some variations among the QC samples, they occupy a relatively constrained space in the PCA score plot. 
When we compare our study (675 injections with 62 QCs, total run time 138 h) with the validation guideline for urine samples from Gika et al.[36] and the same experiment presented by Want et al.[38] (130 injections with 16 QCs, total run time 29 h), our QC samples clustered tightly with little variation until the samples run later in time. The deviation from the QC cluster might be due to the time-related drift in instrumental performance over a long time span in the large scale study,[52,53] or to the matrix effect of plasma samples, which are largely peptide/protein-based compared with urine.[54] Figure 8 shows the hierarchical clustering of samples and QCs as a heatmap, using Pearson’s correlation as the similarity measure (distance) and Ward’s linkage as the clustering algorithm (merging clusters so as to minimize the increase in the within-cluster sum of squares). The result demonstrated that our platform is capable of discriminating the metabolomes of the QC samples from those of the experimental samples, which again provides confidence in the quality of our pooled healthy controls throughout the run.
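The reported counts (613 study samples, 62 QC injections, 675 injections in total) can be reproduced with a simple interleaving rule. The exact run order is not given in the text, so the sketch below is only one arrangement consistent with those numbers: it assumes a QC at the start of the batch and another after every tenth study sample. The function name and sample labels are hypothetical.

```python
def build_run_order(n_samples, block=10):
    """Start with a QC, then inject a QC after every `block` study samples."""
    order = ["QC"]
    for idx in range(1, n_samples + 1):
        order.append(f"S{idx}")
        if idx % block == 0:
            order.append("QC")
    return order


order = build_run_order(613)
n_qc = order.count("QC")

# Average time per injection implied by the reported 138 h uninterrupted run.
minutes_per_injection = 138 * 60 / len(order)

if __name__ == "__main__":
    print(f"{len(order)} injections, {n_qc} of them QCs")
    print(f"about {minutes_per_injection:.2f} min per injection")
```

The implied average of roughly 12 minutes per injection is a little longer than the 10 min gradient, which would be consistent with some per-injection overhead, although the text does not break the run time down.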

Figure 7

Scores plot between the selected principal components (PCs) showing the difference between samples and QCs in their metabolomic profiles. The explained variance captured by each PC is shown in brackets.

Figure 8

Clustering result shown as heatmap (distance measure using Pearson, and clustering algorithm using Ward) of the pooled QC samples and experimental samples.

Biomarkers Indicative of Smoking Behavior

There is wide interindividual variation in smoking behavior, which is affected by several factors such as race,[55] gender,[56] psychological factors[57] and genetic background.[58] Metabolomic profiling may identify differences in exposure and response, reflecting inherent biological differences. This could also lead to better prevention and early detection strategies. Metabolomics has the power to simultaneously detect carcinogen metabolites and endogenous metabolites affected by smoke exposure. Thus, both exposure and effect can be assessed. To examine the feasibility of a metabolomic study in assessing smoke exposure, Experiment III, with 9 smokers and 10 nonsmokers, was performed using serum to evaluate differences in their global metabolomic profiles. Random Forests analysis was used to perform supervised classification and feature selection; the top 50 most important features separated the two groups with 100% accuracy, as shown in the multidimensional scaling plot (Figure 9a), and can be visualized according to their rank importance (Figure 9b). To preliminarily identify the metabolites in Experiment III, the Madison-Qingdao Metabolomic Consortium Database and the Human Metabolome Database were searched; 169 putative features from the positive mode and 53 from the negative mode were identified. Then, 12 candidate metabolites were manually selected based on their putative identifications and the availability of chemical standards for comparison, as listed in Table 2. These candidate metabolites were then validated by acquiring MS/MS spectra. As expected, the MS/MS spectra of nicotine, cotinine, 3-hydroxycotinine and cotinine N-oxide in our serum samples matched well with those of the authentic compounds (Figure 10 and Supplementary Figure 4, Supporting Information). 
Nicotine is the major addictive component in cigarette smoke.[59] In humans, it is primarily metabolized to cotinine by the cytochrome P450 2A6 (CYP2A6) enzyme and further metabolized by the same enzyme to 3′-hydroxycotinine.[60,61] Thus, the ratio of cotinine and 3′-hydroxycotinine reflects a stable CYP2A6 metabolic activity for nicotine metabolism.[62] Pseudooxynicotine, 1,11-undecanedicarboxylic acid and 3-hydroxycoumarin were also verified by MS/MS identification. Pseudooxynicotine is an amino ketone produced from nicotine by soil bacteria[63,64] and was reported to be the direct precursor to the tobacco-specific lung carcinogen NNK in bacterial systems.[65] Hecht et al. showed in vitro that incubation of nicotine with human liver microsomes produced pseudooxynicotine through 2′-hydroxylation of nicotine,[63] but the compound has not been identified in humans. Because the microflora has been found to play a crucial role in the human metabolome,[66,67] tobacco-related metabolites produced by bacteria could contribute to carcinogenesis in ways not previously considered. The nicotine-related metabolites validated by MS/MS, including nicotine, cotinine, 3-hydroxycotinine, cotinine N-oxide (a minor metabolite of nicotine) and pseudooxynicotine, were all significantly higher in intensity among smokers, compared to virtually zero among nonsmokers (Figure 11).

In our results, 3-hydroxycoumarin and 1,11-undecanedicarboxylic acid were seen in both groups but had higher peak areas among nonsmokers compared to smokers (Figure 11). 3-Hydroxycoumarin is a metabolic product of the natural compound coumarin, which is found in plants and spices. Coumarin is metabolized in humans mainly by CYP2A6 to 7-hydroxycoumarin,[68] but it can also be metabolized by CYP3A4 to 3-hydroxycoumarin.[68] Nicotine is mainly metabolized by the CYP2A6 enzyme, and thus could increase bioactivation of CYP2A6 and the 7-:3-hydroxycoumarin ratio among smokers. However, most reports on 3-hydroxycoumarin come from rodent models, and thus this needs further clarification in humans. 1,11-Undecanedicarboxylic acid (tridecanedioic acid) is a 13-carbon dicarboxylic acid occurring in plant and animal tissues. Its relationship to cigarette smoke is still unknown; it may involve altered carboxylic acid metabolism in the tobacco leaf or during the manufacture of cigarettes.

Figure 9

Table 2

Candidate metabolites selected from 222 putative identifications differentiating smokers versus non-smokers

Mode	Metabolite	Observed m/z	Monoisotopic mass (Da)	RT (min)	Mass difference (ppm)	Smokers vs nonsmokers	HMDB ID
Positive	Cotinine	177.10	176.09	0.35	4.89	↑	HMDB01046
	3-Hydroxycotinine	193.10	192.09	0.36	4.92	↑	HMDB01390
	Cotinine N-oxide	193.10	192.09	0.36	4.92	↑	HMDB01411
	Pseudooxynicotine	179.12	178.11	0.71	3.43	↑	HMDB01240
	Nicotine	163.12	162.12	0.51	6.32	↑	HMDB14330
	Cysteine-S-sulfate	201.99	200.98	3.44	9.25	↓	HMDB00731
	7a,12a-dihydroxy-3-oxo-4-cholenoic acid	405.26	404.26	4.89	2.60	↓	HMDB00447
	Trans-3-hydroxycotinine glucuronide	369.13	368.13	0.37	6.70	↑	HMDB01204
Negative	Aminoparathion	260.05	261.06	3.62	7.36	↑	HMDB01504
	1,11-Undecanedicarboxylic acid	243.16	244.17	3.60	0.75	↓	HMDB02327
	3-Hydroxycoumarin	161.02	162.03	4.44	6.90	↓	HMDB02149
	Alpha-CEHC	277.14	278.15	4.47	1.92	↓	HMDB01518
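The mass difference in Table 2 is a parts-per-million error between an observed m/z and the value expected for a database compound. The sketch below shows how such a putative match is conventionally computed, assuming singly charged [M+H]+ and [M-H]- ions. The 10 ppm tolerance, the function names, the small in-memory database, and the observed m/z values are illustrative assumptions, not parameters reported in the paper. The table shows m/z to two decimals only, so this calculation cannot reproduce its ppm column from the printed values.

```python
PROTON_MASS = 1.007276  # Da

# Monoisotopic neutral masses (Da) computed from elemental formulas.
# 3-Hydroxycotinine and cotinine N-oxide are isomers (C10H12N2O2).
DATABASE = {
    "nicotine": 162.115698,           # C10H14N2
    "cotinine": 176.094963,           # C10H12N2O
    "3-hydroxycotinine": 192.089878,  # C10H12N2O2
    "cotinine N-oxide": 192.089878,   # C10H12N2O2
}


def expected_mz(neutral_mass, mode):
    """Expected m/z of the singly charged [M+H]+ or [M-H]- ion."""
    if mode == "positive":
        return neutral_mass + PROTON_MASS
    if mode == "negative":
        return neutral_mass - PROTON_MASS
    raise ValueError("mode must be 'positive' or 'negative'")


def ppm_error(observed_mz, theoretical_mz):
    """Absolute mass error in parts per million."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_candidates(observed_mz, mode, database, tolerance_ppm=10.0):
    """Return (name, ppm error) for database entries within tolerance."""
    hits = []
    for name, neutral_mass in database.items():
        err = ppm_error(observed_mz, expected_mz(neutral_mass, mode))
        if err <= tolerance_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical observed ions in positive mode.
cotinine_hits = match_candidates(177.1031, "positive", DATABASE)
isomer_hits = match_candidates(193.0980, "positive", DATABASE)
print([name for name, _ in cotinine_hits])
print([name for name, _ in isomer_hits])
```

The second query returns both 3-hydroxycotinine and cotinine N-oxide, since isomers share an exact mass. This is why accurate mass alone gives only a putative identification and MS/MS comparison with authentic standards is needed.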

Figure 10

MS/MS spectra of the authentic compounds cotinine (a1), hydroxycotinine (b1), pseudooxynicotine (c1), and 1,11-undecanedicarboxylic acid (d1). MS/MS spectra obtained from serum samples of a smoker (a2, b2, c2) or a nonsmoker (d2) are presented for comparison.

Figure 11

Box plots of peak areas for seven candidate metabolites among smokers and nonsmokers. The points outside the quartiles are outliers.

Figure 9. Top 50 metabolites selected from Random Forests. (a) Forest accuracies were calculated and the sample classifications are presented in a multidimensional scaling (MDS) plot. In this plot, nonsmokers (red) and smokers (blue) were well separated in serum samples. (b) Visualization of the top metabolites across all samples identifies the rank importance of the ions.

Our results further show the feasibility and potential of metabolomic profiling to identify new biomarkers of cigarette smoke exposure and lung cancer risk. Given that cigarette smoke is a complex mixture, it is possible that metabolic profiles will provide additional information on tobacco-related disease risk beyond that from existing chemically specific biomarkers, such as those reported by Hecht et al.[13] Also, broad detection methods such as those used here can screen for unanticipated changes in smoke exposure as cigarette designs change. For example, the technology used to decrease some constituents might increase others (e.g., for the Eclipse cigarette,[8] acrolein and CO increased[69]). Another reason for needing a broad biomarker, such as a metabolomic profile of exposure and risk, is to help tailor prevention and early detection strategies for former and current smokers. While this study does not directly assess biomarkers for lung cancer risk, the first step in developing these is the development of biomarkers of smoke exposure.[47]

Conclusions

Metabolomics provides information about the metabolic status of living systems and can provide phenotypic information about the cell’s environment and mechanistic pathways, as well as having clinical utility, including as a risk biomarker.[70−72] It is inexpensive and high-throughput, and has the potential to identify new biomarkers of cigarette smoke exposure and the consequent disease risk. Such biomarkers will assist in the evaluation of tobacco products and performance standards, and in identifying smokers at risk for lung cancer. Based on our study using UPLC–QTOF-MS, both measurement variation and sample preparation variation were low. When QC samples were applied to the large-scale study, QC clustering indicated high stability of the platform, except for the QC samples run later in time. Various factors, including the stability and maintenance of the individual instrument, sample preparation techniques, and the choice of column, could affect chromatographic performance. Therefore, we recommend that future metabolomics studies, especially large-scale epidemiological studies, run a separate experiment to assess the variability of the platform before committing precious human samples, to confirm that the variability does not exceed that reported herein. After determining the required sample size and the number of aliquots and injections needed given the variability of the specific platform, it is crucial to run a pooled QC sample between the experimental samples throughout the run to further assess the repeatability of the experiment. After preprocessing and analyzing the data, only metabolites with a CV of no more than 10–15% should be considered for biomarker discovery. Metabolomics can potentially be used to better characterize smoking-related disease risks to enhance prevention and early detection methods, and by the FDA in the regulation of tobacco products.
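The CV-based filtering recommended above can be illustrated with a short sketch. It assumes that feature intensities from repeated pooled QC injections are already available as lists of numbers. The function names, the 15% default cutoff (the upper end of the 10–15% range), and the example intensities are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Percent CV: sample standard deviation divided by the mean, times 100."""
    m = mean(values)
    if m == 0:
        return float("inf")  # CV is undefined for a zero mean; treat as unstable
    return stdev(values) / m * 100.0


def filter_by_qc_cv(qc_intensities, max_cv_percent=15.0):
    """Keep features whose CV across pooled QC injections is within the cutoff.

    qc_intensities maps a feature name to its intensities in the QC injections.
    Returns a dict of retained feature names and their percent CV.
    """
    kept = {}
    for feature, values in qc_intensities.items():
        cv = coefficient_of_variation(values)
        if cv <= max_cv_percent:
            kept[feature] = cv
    return kept


# Hypothetical peak areas for two features across four pooled QC injections.
qc = {
    "feature_A": [1000.0, 1020.0, 980.0, 1010.0],  # reproducible
    "feature_B": [500.0, 800.0, 300.0, 650.0],     # highly variable
}

kept = filter_by_qc_cv(qc)
print(sorted(kept))
```

Only features that pass this filter would then be carried forward to the comparison between smokers and nonsmokers.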
The authority recently granted to the FDA over tobacco products allows it to mandate product performance standards governing exposure to toxic smoke constituents, and also requires it to evaluate manufacturers’ health claims for modified tobacco products of purported reduced exposure.[5] The FDA’s decisions, however, will need to be supported by scientific studies and a better understanding of cigarette smoke toxicology. They will also require support from human studies and biomarkers that reflect the complex exposure to tobacco smoke. Today, though, there are only a few biomarkers of exposure and no validated biomarkers of cancer risk.[73] As proof of principle, several new biomarkers were identified herein that have not thus far been considered biomarkers of tobacco smoke exposure. Thus, metabolomics has the potential to develop a metabolic phenotype of smoke exposure and disease risk in healthy smokers.
