Jonathan N Thomas1, Joanna Roopkumar1, Tushar Patel1. 1. Department of Transplantation, Division of Gastroenterology and Hepatology, Mayo Clinic, Jacksonville, Florida, United States of America.
Abstract
Disease-related effects on hepatic metabolism can alter the composition of chemicals in the circulation and subsequently in breath. The presence of disease related alterations in exhaled volatile organic compounds could therefore provide a basis for non-invasive biomarkers of hepatic disease. This study examined the feasibility of using global volatolomic profiles from breath analysis in combination with supervised machine learning to develop signature pattern-based biomarkers for cirrhosis. Breath samples were analyzed using thermal desorption-gas chromatography-field asymmetric ion mobility spectroscopy to generate breathomic profiles. A standardized collection protocol and analysis pipeline was used to collect samples from 35 persons with cirrhosis, 4 with non-cirrhotic portal hypertension, and 11 healthy participants. Molecular features of interest were identified to determine their ability to classify cirrhosis or portal hypertension. A molecular feature score was derived that increased with the stage of cirrhosis and had an AUC of 0.78 for detection. Chromatographic breath profiles were utilized to generate machine learning-based classifiers. Algorithmic models could discriminate presence or stage of cirrhosis with a sensitivity of 88-92% and specificity of 75%. These results demonstrate the feasibility of volatolomic profiling to classify clinical phenotypes using global breath output. These studies will pave the way for the development of non-invasive biomarkers of liver disease based on volatolomic signatures found in breath.
Introduction
The liver has a central role in metabolism, and disease-related effects on hepatic functioning can alter the nature and quantity of metabolites that are generated. Amongst these are volatile organic compounds (VOC), high vapor pressure molecules that can diffuse through the circulation and eventually be exhaled in the breath. While VOC account for only <1% of breath, hundreds of high vapor pressure molecules associated with systemic metabolic functions can be detected within each breath [1]. Thus, alterations in the VOC metabolomic (volatolomic) output of the liver associated with disease pathophysiology can be detected in exhaled breath [2]. This phenomenon has been recognized for millennia, forming the basis of fetor hepaticus and other breath-based manifestations of disease.
The application of VOC analysis to capture disease-relevant information from exhaled breath provides an untapped opportunity to develop non-invasive biomarkers for liver diseases that may facilitate earlier diagnosis or guide patient management. Chronic liver disease and cirrhosis may be present in the absence of symptoms, yet are a major cause of morbidity and mortality [3, 4]. A timely diagnosis of cirrhosis may enable interventions to limit inflammation or progression of fibrosis as well as the initiation of surveillance approaches for early detection of hepatocellular cancer. Once cirrhosis is present, decompensation is clinically defined by the onset of complications such as ascites, and portends a higher risk of morbidity, hospitalizations, prolonged care and mortality [5]. Furthermore, the hepatic volatolomic output could potentially be altered as a consequence of progressive portal hypertension and hepatic dysfunction prior to the onset of clinical manifestations of decompensation.
Although prior studies have described and analyzed VOC in the breath of patients with liver diseases, the feasibility of using breath VOC analysis for disease detection remains poorly defined.
Accurate identification of individual VOC is highly dependent on the detection technology used. This has varied across studies and has confounded efforts to identify or catalog disease-specific compounds. Consequently, there is a lack of consensus on the optimal use of individual or groups of VOC to differentiate between different clinical states. This has hampered the use of exhaled breath analysis for biomarker applications. Compared with the study of single VOC, global or broad volatolomic analysis would incorporate changes that are reflected within a wider range of low abundance disease-associated VOC present in exhaled breath [6–9]. We performed a proof-of-concept study to establish the utility of breath volatolomic profiling to develop predictive models. A highly sensitive separation and detection approach to generate volatolomic profiles from exhaled breath samples was developed by combining thermal desorption (TD) with both gas chromatography (GC) and field asymmetric ion mobility spectrometry (FAIMS). This enabled us to capture multi-dimensional volatolomic data based on both chromatographic separation and ion-mobility spectrometry, which separates ions based on their drift in high electric fields. The data were then combined with supervised machine learning to generate breath volatolomic-based classifiers for the presence or stage of cirrhosis, thereby demonstrating the feasibility of using this approach to develop non-invasive biomarkers associated with liver diseases.
Methods
Ethical approval
The study was conducted under a Mayo Clinic Institutional Review Board (IRB) approved protocol and conformed to the ethical guidelines of the Declaration of Helsinki. Informed consent was obtained from each participant in writing. The trial was registered at ClinicalTrials.gov (NCT04341012).
Study design and participants
The study was a prospective, single-institution study. All study participants were enrolled between September 2019 and March 2020. The study inclusion criteria were the ability to provide informed consent and age greater than 18 years. There were no exclusion criteria. Participants were categorized into groups based on absence or presence of cirrhosis and/or portal hypertension, or their complications as determined on the basis of histologic, clinical, biochemical or elastographic features. Participants with no cirrhosis or portal hypertension were designated as Stage 0. Participants with cirrhosis or portal hypertension were designated as Stage 1, 2 or 3 based on the absence or presence of complications of portal hypertension (ascites, variceal hemorrhage, hepatic encephalopathy) or liver insufficiency (jaundice). Stage 1 had no varices or other clinically evident complications, Stage 2 had varices only but no other complications, while Stage 3 had decompensated disease, manifest with ascites, variceal hemorrhage, or hepatic encephalopathy. The clinical diagnoses were made independently by two hepatologists. All participants completed a questionnaire at the time of the breath collection regarding their lifestyle, recent dietary choices, current symptoms and other clinical information.
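The staging rules above can be written as a small decision function. This is only an illustrative sketch: the function name and boolean arguments are hypothetical, and jaundice (listed as a sign of liver insufficiency) is not handled because the stage definitions as written do not assign it to a specific stage.

```python
def assign_stage(has_cirrhosis_or_ph, varices=False, ascites=False,
                 variceal_hemorrhage=False, encephalopathy=False):
    """Return a stage from 0 to 3 following the grouping described in the text.

    Argument names are hypothetical; clinical adjudication in the study was
    made by two hepatologists, not by a rule like this one.
    """
    if not has_cirrhosis_or_ph:
        return 0  # no cirrhosis or portal hypertension
    if ascites or variceal_hemorrhage or encephalopathy:
        return 3  # decompensated disease
    if varices:
        return 2  # varices only, no other complications
    return 1      # no varices or other clinically evident complications


print(assign_stage(False))                              # 0
print(assign_stage(True))                               # 1
print(assign_stage(True, varices=True))                 # 2
print(assign_stage(True, varices=True, ascites=True))   # 3
```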
Breath sample collection
Breath samples were collected using the ReCIVA breath sampler (Owlstone Medical, Cambridge, UK) and analyzed by TD-GC-FAIMS (Fig 1). Subjects were asked to fast, avoiding solid food, for at least four hours prior to the breath collection. A breath sample was collected by a trained researcher using the breath sampler. Sample collection was performed with the patient seated upright, resting for at least 10 minutes. An air supply unit attached to the sampler pumps provided filtered, ambient air with reduced VOC at 40 L/min for the patient to inhale. One liter of exhaled breath from both the upper and lower airways was collected, at a flow rate of 200 mL min⁻¹, onto four preconditioned Bio-Monitoring TD tubes (Markes International, South Wales, United Kingdom). The breath sampler uses pressure and CO2 sensors to monitor the patient’s breathing rate to regulate the timing of its two pumps for the four sorbent tube ports. This allowed for control over the total volume, flow rate, and the specific phase of exhaled air collected.
Fig 1
Collection and analysis of breath samples.
Volatile organic compounds (VOC) in exhaled breath are collected onto thermal desorption tubes that pre-concentrate and focus analytes into a gas chromatography (GC) column for physical separation. Subsequently, VOC are further distinguished by field asymmetric ion mobility spectrometry that applies an alternating electric dispersion field with maximum voltage of 45V, 55V or 65V. Data output matrices for both positive and negative ions detected are preprocessed and the maximum ion peak intensity identified for each positive and negative ion resolved by GC retention time. Molecular features describe intensity resolved VOC defined by intensity greater than a threshold level, whereas separation chromatograms capture all VOC in a time-resolved manner. Differences in VOC output measured as molecular features or in separation chromatograms between controls (green lines) and disease states (red lines) are analyzed by machine learning to generate classifier models. Image generated with Biorender.com.
Collection of environmental sample blanks
Room air sample and air supply collections were performed immediately after the breath sample collection using the ReCIVA breath sampler and TD tubes from the same conditioned batch as the breath samples. For room air sample collection, the breath sampler was placed sideways on a pre-cleaned metal surface, facing outwards, with the air supply on. The ReCIVA was set to keep one pump (right) always on and collect 1 L of room air at 200 mL min⁻¹ using either one or two sorbent tubes. During air supply collection, the ReCIVA and field blank collection tubes were strapped securely to a pre-cleaned glass head and set to collect onto the other one or two tubes (left) using the same parameters. The cart and glass head were cleaned using a 70% ethanol solution or isopropyl wipes at least 1 hour prior. Environmental blanks were stored, transported, and analyzed alongside corresponding breath samples.
Preconditioning of thermal desorption tubes
Prior to breath sample collection, TD tubes were preconditioned using a TC-20 TD device (Markes International, South Wales, UK) with 55 to 60 mL min⁻¹ nitrogen (99.9999%) gas flow at 330°C for at least 2 hours. Tubes were capped with stainless steel travel caps with Viton O-rings (Owlstone Medical, Cambridge, UK) if the collection was within 7 days, or with brass caps fitted with polytetrafluoroethylene ferrules if stored for a longer period. All tubes were wrapped in non-coated aluminum and were placed in aluminum screw-top canisters, sealed with aluminum wrap, and stored at 4°C. Further, the wrapped tube canisters were transported via a cooler to the clinic before and after collection. These additional measures were taken to help prevent contamination, loss of sample, and slow diffusion of analytes across the different sorbent beds inside the tube (a porous polymer and graphitized carbon) [10].
Separation and isolation of VOC using TD-GC-FAIMS
Thermal desorption was carried out using a Unity-xr TD unit (Markes International, South Wales, UK) equipped with a material emissions cold trap. During analysis, each tube was pre-purged with nitrogen gas for 10 minutes with split flow on at 50 mL min⁻¹. Sample tube desorption was set to 120°C for 10 minutes at 50 mL min⁻¹ onto the 0°C cold trap solely. The trap was purged for 2 minutes, then heated at a rate of 100°C min⁻¹ to 140°C and held for 6 minutes, sending VOC through the 130°C TD-GC transfer line. Separation was performed using a Trace 1310 GC (Thermo Fisher Scientific, Waltham, Massachusetts) coupled with a Lonestar FAIMS detector (Owlstone Medical, Cambridge, UK). VOC were separated on a HP-5 (Agilent, Santa Clara, California) fused silica GC column (30 m length × 0.25 μm thickness × 0.32 mm inner diameter) with helium (99.999%) carrier gas. GC controls were set for an initial 40°C hold of 2 min, ramp to 120°C at 5°C/min, hold for 2 min, and final ramp to 200°C at 8°C/min for a final hold of 6 min. Medical-grade clean air was introduced into the 130°C GC-FAIMS transfer line at a rate of 2700 mL min⁻¹, providing the reactive ion cloud needed for ionization of emerging analytes.
The FAIMS was configured such that the magnitude of the alternating electric field, or dispersion field (DF), voltage cycled through 45 V, 55 V, and 65 V with a total of 5192 scans across the GC runtime. The compensation field (CF) voltage scanned to correct differential ion drift across 512 preset increments between −6 V and 6 V direct current. Detected ion intensity was measured over a range from 0 to 10 pA.
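As a quick check on the GC oven program described above, the nominal run time can be computed from the holds and ramps. The sketch below is illustrative only; the function name and the step encoding are assumptions made for this example, not part of the instrument software.

```python
def gc_runtime_minutes(start_temp_c, steps):
    """Sum the duration of an oven program.

    steps is a list of ("hold", minutes) or ("ramp", target_c, rate_c_per_min)
    tuples; this encoding is a convention chosen for the example.
    """
    total = 0.0
    temp = start_temp_c
    for step in steps:
        if step[0] == "hold":
            total += step[1]
        else:
            _, target, rate = step
            total += (target - temp) / rate  # minutes needed to reach target
            temp = target
    return total


# Program from the text: 40 C hold 2 min, 5 C/min to 120 C, hold 2 min,
# 8 C/min to 200 C, hold 6 min.
program = [("hold", 2), ("ramp", 120, 5), ("hold", 2), ("ramp", 200, 8), ("hold", 6)]
runtime_min = gc_runtime_minutes(40, program)
print(runtime_min, runtime_min * 60)  # 36.0 2160.0
```

The nominal 36 minutes (2160 seconds) is in line with the average analytical run time of about 2164 seconds reported in the Results.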
FAIMS data processing
Data from FAIMS were pre-processed and parsed using an automated MATLAB 2019b (MathWorks, Natick, MA) processing pipeline to (1) separate the ion intensity matrices for each DF setting, (2) subtract out environmental VOC and background current fluctuations using room air or air filter field control blanks (as required), and (3) generate a max peak ion chromatogram for each sample. The first step involved parsing the raw DF settings, and then combining the negative and positive ion intensity matrices to generate three FAIMS DF-specific matrices (512 CF scan points by 3460 time charge points). Next, environmental samples were directly subtracted from their corresponding breath samples across the entire 1.77 million data points to generate a separate dataset for later classification analysis. To further simplify the data to a single axis, the outer matrix cells with values below the overall max baseline intensity (0.0104 pA) were eliminated, limiting the matrix to 256 CF scans between -3 V and 3 V and removing the terminal ~40 seconds (60 time charge points) of the GC run. The maximum intensity value across all CF scans for each time-resolved point was selected, simplifying the ion peaks to a single time charge axis (S1 Fig). The breath sample data were now represented by the resulting three DF-specific time-resolved separation-based chromatograms (SC), each comprising 3400 ion intensity values (S2 Fig).
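The processing steps above (blank subtraction, cropping, and collapsing the CF axis to a maximum-intensity trace) can be sketched in a few lines. The study used a MATLAB pipeline; the Python below is a minimal stand-in on a toy matrix, and the function names, the row/column layout (rows as CF scans, columns as time points) and the crop indices are all assumptions for illustration.

```python
def subtract_blank(sample, blank):
    """Element-wise subtraction of an environmental blank from a breath sample."""
    return [[s - b for s, b in zip(s_row, b_row)]
            for s_row, b_row in zip(sample, blank)]


def crop(matrix, cf_start, cf_stop, n_time):
    """Keep CF rows cf_start..cf_stop-1 and the first n_time time points."""
    return [row[:n_time] for row in matrix[cf_start:cf_stop]]


def max_peak_chromatogram(matrix):
    """For each time point, take the maximum intensity across all CF scans."""
    return [max(column) for column in zip(*matrix)]


# Toy matrix: 4 CF scans x 5 time points (the real matrices are 512 x 3460,
# cropped to 256 x 3400 in the text).
sample = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.2, 0.9, 0.3, 0.1, 0.0],
    [0.1, 0.4, 1.2, 0.2, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0],
]
cropped = crop(sample, 1, 3, 4)      # central CF rows, terminal time point dropped
chromatogram = max_peak_chromatogram(cropped)
print(chromatogram)                  # [0.2, 0.9, 1.2, 0.2]
```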
Supervised machine learning analysis
For generating disease state classifiers, SC were imported into the MATLAB 2019b Classification Learner App. The training set comprised six randomly selected patients from each of stages 0, 1, 2 and 3. Twenty-four classifier model types were trained and subsequently tested for each analysis, generating a confusion matrix and model performance characteristics using 5-fold cross-validation. The classifiers with the best performance metrics (AUC > 0.7) were selected and then further tested and evaluated in an independent external validation set.
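The sampling scheme described above (six patients per stage for training, the remainder held out, and 5-fold cross-validation within the training set) can be illustrated as follows. This is a sketch under stated assumptions: the study used the MATLAB Classification Learner App, the patient identifiers here are made up, and the fixed seed exists only to make the example reproducible.

```python
import random
from collections import defaultdict


def split_by_stage(stage_by_patient, n_per_stage=6, seed=0):
    """Pick n_per_stage patients at random from each stage for training."""
    rng = random.Random(seed)
    by_stage = defaultdict(list)
    for patient, stage in sorted(stage_by_patient.items()):
        by_stage[stage].append(patient)
    train = []
    for stage in sorted(by_stage):
        train.extend(rng.sample(by_stage[stage], n_per_stage))
    train_set = set(train)
    validation = [p for p in sorted(stage_by_patient) if p not in train_set]
    return sorted(train), validation


def k_folds(items, k=5, seed=0):
    """Shuffle items and deal them into k roughly equal folds."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# Hypothetical patient IDs with the stage sizes reported for this cohort.
cohort = {}
for stage, n in {0: 11, 1: 14, 2: 15, 3: 10}.items():
    for i in range(n):
        cohort[f"S{stage}-{i:02d}"] = stage

train, validation = split_by_stage(cohort)
folds = k_folds(train)
print(len(train), len(validation), [len(f) for f in folds])  # 24 26 [5, 5, 5, 5, 4]
```

Splitting by patient rather than by sample keeps technical replicates from the same person on one side of the split, which is one reasonable way to avoid leakage between training and validation data.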
Results
Intra-individual variability of volatolomic detection
To analyze intra-individual variability, breath samples were obtained from a single healthy individual using a standard protocol. Five breath samples, each comprising one liter of breath, were collected on five separate days within a 7-day period. Each sample was collected after an overnight fast of at least six hours with only water, and collected between 7:30 and 10:00 AM. Data were collected using FAIMS DF settings of 45 V, 55 V, or 65 V, at a CF of 0.55 V. From each dataset, molecular features (MF) along the positive reactive ions detected were identified and analyzed. The max positive ion peak intensity derived from the background air passed through the FAIMS was 0.391 pA. MF were thus defined as those with distinctive retention times and peak maximum intensity beyond a threshold set at 0.5 pA. MF may reflect one VOC, although superposition of peaks can occur in some instances due to co-elution. Of note, even small shifts in ion intensity could reflect the presence of individual low abundance VOC. The thresholds set therefore enabled elimination of all background fluctuations, and focused the analysis on the most abundant volatolomic content in the sample. First, we analyzed the variability in MF detected in technical replicates collected on the same day. The overall coefficient of variation (CV) in technical replicates across all DF settings was 5.7%. Next, we determined the biological variation in detection of MF from day to day. An average of 60.1 MF were detected across all settings from day to day, with a CV of 14.8%. Next, we evaluated the detection of MF and variability at different DF settings. The average number of features varied at each DF setting, with far fewer detected at DF 65 V when compared with either DF 45 V or DF 55 V (Fig 2). The overall CV in number of MF detected ranged from 13% at DF 55 V to 34% at DF 65 V.
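To make the two quantities used above concrete, the sketch below counts threshold-defined features in a chromatogram and computes a coefficient of variation across replicate counts. The peak-picking rule (a local maximum above 0.5 pA) and the use of the sample standard deviation are assumptions for this illustration; the text does not specify either, and the numbers are toy values rather than study data.

```python
import statistics


def count_features(chromatogram, threshold=0.5):
    """Count local maxima whose intensity (pA) exceeds the threshold."""
    count = 0
    for i in range(1, len(chromatogram) - 1):
        y = chromatogram[i]
        # Strictly higher than the left neighbor, at least as high as the right,
        # so a flat-topped peak is counted once.
        if y > threshold and y > chromatogram[i - 1] and y >= chromatogram[i + 1]:
            count += 1
    return count


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


toy_trace = [0.1, 0.6, 0.2, 0.4, 0.9, 0.9, 0.3]
print(count_features(toy_trace))                   # 2
print(coefficient_of_variation([8, 10, 12]))       # 20.0
```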
Fig 2
Intra-individual variability of molecular features detected in breath analysis.
Molecular features (MF) were defined based on identified peaks with an intensity greater than a threshold of 0.5 pA. Twenty breath samples were collected using a standardized protocol from a single healthy volunteer over a five-day period. Chromatograms were extracted at a pre-defined compensation field at dispersion field (DF) settings of 45 V, 55 V or 65 V, and the number of MF present was quantitated. (A). The total number of MF identified at each DF setting, or with all three DF combined, on each separate collection day. (B). The coefficient of variation (CV) of separate analyses at each DF setting alone or with all three DF combined is depicted. Colored dots indicate day of sample collection.
Study subjects
The study population comprised 50 subjects, with an equal proportion of males and females. Of the study participants, 86% were white and 90% were non-Hispanic. The mean age of the population was 55.4 years and the overall mean BMI was 29.8. All except one individual were non-smokers. None of the participants reported any known occupational exposure to vapors. None of the participants reported any kind of upper respiratory infection. Other self-reported symptoms included cough (51%), dyspnea (18%), abdominal pain (38%), diarrhea (15%) and halitosis (11%) within the two weeks preceding sample collection.
All group and stage designations were verified by two experienced hepatologists with full consensus. Eleven study participants did not have cirrhosis or portal hypertension and were designated as stage 0. Cirrhosis or portal hypertension was present in 39 participants. Of these, the primary etiology was non-alcoholic steatohepatitis (21 persons), chronic hepatitis C virus infection (5 persons), alcohol-related liver disease (6 persons), hemochromatosis (1 person), primary sclerosing cholangitis (2 persons), and non-cirrhotic portal hypertension (4 persons). Of these 39 participants, 14 were designated as stage 1 (no ascites, no varices), 15 as stage 2 (with varices present but no ascites), and 10 as stage 3 (with decompensated disease). Two persons had hepatic encephalopathy. None of the participants had severe stage 4 disease as defined by a history of recurrent variceal hemorrhage, refractory ascites, and hyponatremia or hepatorenal syndrome. In these participants, the median Model for End Stage Liver Disease (MELD) score was 10 with a range from 6 to 28, the median AST to platelet ratio index (APRI) was 0.744 with a range from 0.215 to 3.539, and the median Fibrosis-4 (FIB-4) score was 3.37 with a range from 0.59 to 14.82. Detailed characteristics of the groups are described in Table 1.
Table 1
Study subjects.
| Characteristic | Stage 0 (n = 11) | Stage 1 (n = 14) | Stage 2 (n = 15) | Stage 3 (n = 10) | Overall (n = 50) |
|---|---|---|---|---|---|
| Age, mean (SD) | 43.8 (12.2) | 56.2 (10.4) | 58.1 (13.2) | 62.8 (11.0) | 55.4 (13.2) |
| Age, median (range) | 45 (24–60) | 57.5 (35–69) | 61 (33–76) | 64.5 (42–74) | 57 (24–76) |
| Female, n (%) | 6 (54%) | 6 (42%) | 8 (53%) | 5 (50%) | 25 (50%) |
| White, n (%) | 5 (45%) | 13 (92%) | 15 (100%) | 10 (100%) | 43 (86%) |
| Hispanic, n (%) | 1 (9%) | 1 (7%) | 3 (20%) | 0 | 5 (10%) |
| Cirrhosis (number) | 0 | 13 | 12 | 10 | 35 |
| Body mass index, mean (SD) | 29.2 (5.9) | 30.8 (5.6) | 29.7 (6.8) | 29.1 (5.2) | 29.8 (5.8) |
| Nose breathers, n (%) | 10 (90%) | 10 (71%) | 10 (67%) | 6 (60%) | 36 (72%) |
| Cough, n (%) | 0 | 1 (7%) | 2 (13%) | 2 (20%) | 5 (10%) |
| Shortness of breath, n (%) | 0 | 1 (7%) | 3 (20%) | 1 (10%) | 5 (10%) |
| Mean duration of fast (hours): solid foods | 12.8 | 13.5 | 11.7 | 10.4 | 12.2 |
| Mean duration of fast (hours): liquids | 5.8 | 4.7 | 4.9 | 1.4 | 4.3 |
| Most recent use of live yoghurt, n (%): more than one week | 8 (72%) | 9 (64%) | 13 (86%) | 4 (40%) | 34 (68%) |
| Most recent use of live yoghurt, n (%): within past week | 2 (18%) | 2 (14%) | 1 (7%) | 4 (40%) | 9 (18%) |
| Most recent use of live yoghurt, n (%): within past day | 1 (10%) | 3 (22%) | 1 (7%) | 2 (20%) | 7 (14%) |
Sample collection and storage
A standardized set of instructions and protocol was used. A four-hour fast was required. However, most subjects had fasted overnight as collections were scheduled during the early morning. 350 samples (200 exhaled breath collections and 150 room air or air filter collections) were collected for this study. Samples were stored in sorbent tubes for a median of 7 days prior to TD-GC-FAIMS analysis. When long-term stored samples were removed for studies of optimized conditions, the median storage was 4 days.
Analysis of molecular features
MF, bracketed within time-defined parameters, were identified using the three DF-specific SC from each breath sample. The peak maximum ion intensity and peak area were calculated for each MF, and these were averaged across all technical replicates for each patient. We examined the variability of each MF within biological replicates with age or underlying etiology. Greater variability was observed with the latter, particularly in samples from participants aged 45 years or younger (S3 Fig). We first analyzed both peak maximum and peak area separately within each disease group to determine which was more informative. A one-tailed paired Student’s t-test was used to compare these between samples from healthy controls (stage 0) and those from participants with stage 1/2/3 disease. Ten MF by peak intensity and eight MF by peak area were identified that differed by >30% between these two groups, with p values < 0.05. The area under a receiver operating characteristic curve (AUC) was determined for each one of these. The AUC ranged from 0.547 to 0.785 for individual MF peak intensity, and from 0.379 to 0.774 for individual MF peak area. There was a strong correlation (R = 0.93) between AUC for peak intensity and AUC for peak area for individual MF. These findings indicated that either the peak maximum intensity or the peak area would suffice for analytical use to generate models. Eight unique features had an AUC greater than 0.7 (Fig 3). Amongst these, four showed a trend to increase with disease stage. We postulated that these MF would be more likely to reflect VOC that are directly impacted by liver function or portal hypertension. A logistic regression analysis of the MF with the highest AUC and associated with disease stage was performed and a MF score derived (Fig 4). Next, we assessed the relationship of the MF score to disease stage using MELD and FIB-4 scores. The MF score was higher in Stage 3 disease than in Stage 1/2 disease. 
A MF score of 0.45 had a sensitivity of 90% and specificity of 57% for classifying the presence of cirrhosis or portal hypertension. The AUC of the MF score for classifying the presence of cirrhosis was 0.785. Thus, simple predictive scores can be generated from an analysis of intensity-threshold defined features in volatolomic analysis.
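For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity at a score cutoff, and the AUC, can be computed from a set of scores and disease labels. The scores are made-up toy values, not study data, and the function names are hypothetical; the AUC is computed with the rank-based (Mann-Whitney) formulation, which may differ from the software used in the study.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as disease (label 1) and return (sens, spec)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Probability that a random diseased case scores above a random control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores: five diseased cases (label 1) followed by four controls (label 0).
toy_scores = [0.9, 0.8, 0.6, 0.5, 0.3, 0.7, 0.4, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff=0.45)
print(sens, spec)                      # 0.8 0.75
print(auc(toy_scores, toy_labels))     # 0.8
```

Lowering the cutoff raises sensitivity at the cost of specificity, which is the trade-off behind the reported operating point.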
Fig 3
Breath analysis of molecular features.
Breath samples were collected using a standard protocol from 50 patients. Chromatograms depicting the maximum intensity across all compensation fields at dispersion field (DF) settings of 45 V, 55 V or 65 V were extracted. MF were defined based on identified peaks with an intensity greater than a threshold of 0.5 pA and designated with a four-digit number that included the DF setting as the first two digits, and the order of separation as the second two digits. MF with differential ion intensities were isolated. The data represent (A) the area under a receiver operator characteristic curve (AUC) for individual MF for the detection of cirrhosis; (B) MF peak intensity in samples from persons with different stages of disease. The data represent the mean and standard deviation for MF peak ion intensity in pA.
Fig 4
Performance of MF score.
An MF score was derived using logistic regression analysis for the molecular feature (MF) with the highest area under a receiver operating characteristic curve (AUC). Receiver operator curves for the MF score for presence or absence of cirrhosis (A) or for the indicated cirrhosis disease stages (B). Variation in MF score at different disease stages (C), and with Model for End Stage Liver Disease (MELD) or FIB-4 scores (D). The data represent mean and SD of the MF score.
Classifiers based on machine learning of global volatolomic output
Disease-associated alterations in VOC that have a low abundance may be detectable but may fall below arbitrarily defined intensity thresholds. To determine the utility of incorporating these minor, yet potentially informative volatolomic changes within a biomarker algorithm, we analyzed the entire GC-FAIMS output for each sample. Classifier models were generated by using supervised machine learning in an unbiased approach to analyze the time-resolved SC under DF 45 V. The average analytical run time was 2164.1 ± 1.6 (SD) seconds, with a range between 2160.2 and 2168.0 seconds. While minor, the inherent variability in run times (± 0.18%) can confound an assessment of VOC output.
To determine whether technical variations in volatolomic detection would preclude effective disease classification, we determined the inter- and intra-individual variability across different samples or participant technical replicates using a pre-trained convolutional neural network (CNN), ResNet-50. First, the entire FAIMS DF-specific matrices were imported into the fully connected CNN, generating 2048 intermediate prediction values that are used to determine the final categorization. We calculated the Euclidean mean distance (EMD) between these predictive values, providing a pair-wise measurement that reflects the dissimilarity in ResNet-50 classification between samples. The average EMD across four biological replicates from a single healthy individual collected over five separate days was 1.44, and across technical replicates on each day was 1.39. In comparison, the average EMD for samples from a random selection of cirrhotic patients was 1.99 and in the healthy controls group was 2.45 (S4 Fig). Thus, the variability across different individuals exceeded that occurring as a result of technical or biological variation within a single individual’s FAIMS breath sample.
These data support the potential utility of global volatolomic analyses to develop classifier biomarkers. 
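One plausible reading of the pairwise distance measure described above is the average Euclidean distance over all pairs of feature vectors in a group. The sketch below shows that calculation on short toy vectors; in the study each vector would hold the 2048 intermediate values from ResNet-50, and whether the published EMD was computed exactly this way is an assumption.

```python
import math
from itertools import combinations


def mean_pairwise_distance(vectors):
    """Average Euclidean distance over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Toy 2-dimensional vectors standing in for 2048-dimensional CNN features.
toy_group = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(round(mean_pairwise_distance(toy_group), 3))  # 6.667
```

A smaller value within one individual's replicates than across different individuals is the pattern reported above.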
To further reduce technical variation, we eliminated samples that had been stored for more than 6 weeks prior to analysis, or where artifacts due to humidity contributed to signal degradation. A total of 173 samples (87%) met these selection criteria.
Machine-learning training was done using MATLAB, and model performance was evaluated in an independent validation set.
Classifier model SC-2A was generated using ensemble learning with random under-sampling boosted trees (RUSBT); it had a specificity of 75% with a sensitivity of 88% for the detection of cirrhosis (Fig 5). To evaluate the potential impact of environmental VOC on these classifiers, SCs were generated with the respective air supply filter blank sample subtracted or room air blank sample subtracted, and the impact on classifier model performance was assessed. Subtraction of either the air filter or room air data prior to model generation improved the sensitivity of the classifier models for detection of cirrhosis, with the former being 100% sensitive. However, elimination of either room air or air filter blanks did not improve specificity for detection of cirrhosis.
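Random under-sampling boosting addresses class imbalance (here, many more disease samples than control samples) by drawing a balanced subset before each boosting round. The sketch below illustrates only that under-sampling step, not the boosted trees themselves or the MATLAB implementation; the function name, seed and toy labels are assumptions for illustration.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Draw an equal number of samples per class, set by the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y in sorted(by_class):
        for x in rng.sample(by_class[y], n_min):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data: seven disease samples (label 1) and three controls (label 0).
toy_samples = list(range(10))
toy_class = [1] * 7 + [0] * 3
balanced_x, balanced_y = random_undersample(toy_samples, toy_class)
print(balanced_y.count(0), balanced_y.count(1))  # 3 3
```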
Fig 5
Performance of volatolomic models for the detection of cirrhosis.
Volatolomic classifier algorithms were generated by machine learning-based analysis of time-resolved separation chromatograms (SC). (A) Classifiers were generated for the detection of cirrhosis. Models were trained on a random set of samples from 24 patients, and the exported models’ performance was assessed in an independent validation set of samples from the remaining patients. The sensitivity and specificity of models based on random under-sampling boosted trees (RUSBT) or Subspace K nearest neighbors (SKNN) are shown along with performance characteristics for each model. (B) Effect of environmental volatile compounds on performance of models for the detection of cirrhosis. Volatolomic classifiers were generated from analysis of baseline chromatograms, or after subtraction of concomitantly collected air-filter or room-air blank sample data. (C) Classifiers were generated for the detection of decompensated disease (Stage 3) in persons with cirrhosis alone (Model SC-2B, using Gaussian Naïve Bayes (GNB)) or in persons with cirrhosis or non-cirrhotic portal hypertension (Model RT-4B, using Medium Gaussian support vector machines (GSVM)). These models were trained on a random set of samples from 18 liver disease patients, and performance was assessed in an independent validation set of samples from 17 patients for SC-2B or 21 patients for RT-4B. The sensitivity and specificity of models are shown along with performance characteristics for each. AUC: area under the receiver operating characteristic curve.
While other models, such as SC-1A generated using Subspace k-Nearest Neighbors (SKNN), had a higher sensitivity of 94.9%, their specificity was lower. The performance of the models was similar across different stages (S5 Fig). Notably, models trained on datasets that included cases of non-cirrhotic portal hypertension showed a higher sensitivity of 92% while maintaining specificity of 75%. 
Thus, changes related to portal hypertension may be important contributors to volatolomic outputs.
Additional models were generated for the detection of stage 3 disease in persons with known cirrhosis or portal hypertension. A classifier based on Gaussian Naïve Bayes (GNB), SC-2B, had a specificity of 0.769 and a sensitivity of 0.739 with an AUC of 0.754. With the inclusion of data from patients with non-cirrhotic portal hypertension, classifiers for prediction of decompensated cirrhosis could be generated using Medium Gaussian support vector machines (GSVM) that had a higher specificity (0.903), albeit with a lower sensitivity. Preprocessing to subtract out room air blanks improved the sensitivity but not the specificity. However, the subtraction of air filter blank data did not improve either sensitivity or specificity.
Composite tandem models were generated by combining individual SC-based classifiers for the prediction of cirrhosis that could also further separate samples into compensated or decompensated cirrhosis (Fig 6). Combining both RUSBT and GSVM classifiers into a single tandem model, RT-4AB, performed well in distinguishing either compensated or decompensated cirrhosis from those without cirrhosis. The tandem model had an accuracy of 89% for detection of the presence of cirrhosis, and 84% for the detection of decompensation when cirrhosis was present. A separate tandem model, SC-2AB, combining both RUSBT and GNB models and that included data from patients with non-cirrhotic portal hypertension had better performance, with a sensitivity of 83% and specificity of 78% for detection of stage 3 disease. In conclusion, with particular attention to pre-analytical variables in sample collection and processing, the use of automated machine learning-derived models based on time-resolved volatolomic profiling provides a higher-performance alternative to predictive scores based on intensity-derived MF in breath samples.
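The tandem arrangement described above, in which a first classifier detects cirrhosis and a second sub-classifies positive samples by stage, can be sketched as follows. The two threshold rules are hypothetical placeholders for trained models such as RUSBT, GNB or GSVM, and none of the names or numbers come from the study.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def make_tandem(
    detect_cirrhosis: Callable[[Features], bool],
    detect_stage3: Callable[[Features], bool],
) -> Callable[[Features], str]:
    """Chain two binary classifiers into one three-class classifier."""

    def classify(x: Features) -> str:
        # First stage: samples called negative are never sub-classified.
        if not detect_cirrhosis(x):
            return "no cirrhosis"
        # Second stage: only applied to samples called cirrhosis.
        return "decompensated" if detect_stage3(x) else "compensated"

    return classify


# Hypothetical rules standing in for the two trained models (assumption).
def first_stage(x: Features) -> bool:
    return sum(x) / len(x) > 1.0


def second_stage(x: Features) -> bool:
    return max(x) > 5.0


tandem = make_tandem(first_stage, second_stage)
print(tandem([0.2, 0.5, 0.1]))
print(tandem([2.0, 2.0, 2.0]))
print(tandem([2.0, 6.0, 1.0]))
```

One consequence of this design is that errors made by the first classifier propagate: a cirrhosis sample missed at the first stage can never be labelled compensated or decompensated by the second.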
Fig 6
Performance of tandem classifier models.
Tandem models were created by combining individual models for classification of cirrhosis and for classification of stage 3 (decompensated) disease. (A) In the tandem model, samples classified as cirrhosis using the former model would then be sub-classified into either compensated or decompensated disease using the latter. Models were trained and validated on a set of optimized samples, and performance was validated on an independent set of samples using the exported tandem model. (B, C) The sensitivity and specificity (top) and confusion matrices (bottom) for tandem models for distinguishing between disease stages in independent validation cohorts are shown for subjects with (B) cirrhosis only (SC-2AB), or (C) either cirrhosis or non-cirrhotic portal hypertension (RT-4AB).
Discussion
In this pilot study, we demonstrate the feasibility of a systematic approach to the detection of exhaled breath-based volatolomic profiles by illustrating their use for the detection of cirrhosis. These profiles capture the breadth of metabolomic activity without direct identification of individual VOC, and can capture information from low-abundance VOC. The volatolomic profiles were generated by TD-GC-FAIMS as a three-dimensional data matrix comprising time-resolved ion intensities at different compensation field points. Intensity-defined features derived from these data matrices can be used to generate a biomarker score, whereas time-resolved features can be used to generate disease classifiers using machine learning. Thus, both intensity and time-resolved features of global breath volatolomic analyses could be used to generate clinically useful biomarkers that are distinctive, yet complementary. The multimodality separation approach combining GC for physical and time-dependent separation with FAIMS for ion differential mobility separation provides higher resolution separation of VOC within a single work stream. Combining the data obtained with the experimentally derived algorithmic classifiers offers a platform that can be adopted within diagnostic laboratories.
The variability and sensitivity of VOC detection on breath analysis have limited the ability to develop breath-based biomarkers. Sources of variation can include environmental, technical, biological or patient-specific factors. Patient age, gender, diet, oral hygiene, smoking history, body mass, medical co-morbidities, and concomitant use of probiotics, antibiotics or other drugs could potentially affect breath VOC. However, in one study, alterations in hematological or biochemical markers such as white blood cell count, cholesterol, or triglyceride levels were not reflected in changes in VOC profiles [11]. 
Technical factors that can contribute to variability include instrument settings or scanning rate. GC separation is susceptible to minor RT variations during volatile physical separation, although the use of TD technology provides a more consistent method for sample introduction into the column. The humidity of the FAIMS clean air supply can alter background noise and reduce the sensitivity. Additionally, perceptible but minor increases in scan rate were observed with current FAIMS settings. Robust deep learning approaches that can incorporate these effects should be evaluated when analyzing disease-associated volatolomic profiles. Data from raw detector outputs such as those used in this study are less amenable to noise filtering or other correction steps when compared with data generated from established chromatographic methods. Although many technical factors that can contribute to variation cannot be completely eliminated, their impact can be minimized by using meticulous collection and analytic protocols. The utility of volatolomic signatures as disease biomarkers will thus be highly dependent on disease-associated alterations that are of sufficiently greater magnitude to overcome some of these variations. As demonstrated in this pilot study, this is feasible for individuals with cirrhosis using GC-FAIMS.
Approaches using GC-MS-based VOC identification require hands-on, stringent analysis by skilled personnel. Operator-generated discrepancies further increase the amount of non-biological information within the dataset. Automated assessment of raw instrument data output bypasses the need for manual specialist involvement and processing while ensuring consistency in detection and analysis. The supervised use of CNN trained on raw GC-MS abundance matrices based on time-resolved mass-to-charge ratio has shown high sensitivity for VOC detection [12]. 
Automation of analysis would reduce the labor required for large cohort metabolomic studies and also provide a framework for standardization for multi-site studies that may enable detection of batch effects [13]. In addition, automated methods of assigning time- or intensity-defined descriptors to individual VOC, verified through the use of standards, could further streamline the recognition of individual disease-associated VOC.
Sample storage conditions are of particular importance for breath VOC analysis, but their effects can be mitigated by limiting the storage time of samples prior to analysis. Although storage of breath samples at 4°C and analysis within 30 days has been recommended [14, 15], VOC stability and lack of storage artifacts were reported during storage for 1.5 months at -80°C using dual-bed Tenax TA and Carbograph sorbent tubes. Our models performed best when trained and tested on samples that had not undergone prolonged storage. Confounding artifacts during storage could result from the migration and separation of trapped VOC between beds within multi-sorbent tubes, leakage out of their caps, or contamination from VOC that diffuse into the storage tubes onto the sorbent material from the coolant, external environment, or from foreign substances adhering to the non-emitting tube caps.
The study has some limitations. The study cohort encompassed a broad range of diseases of diverse etiologies that may have variable metabolic effects on VOC production. Having demonstrated the feasibility within this context, further studies to determine the utility of volatolomic profiling as a biomarker of specific clinical phenotypes in disease-specific cohorts are warranted. A further limitation is the reliance on algorithmic approaches for data obtained from a single study site. The use of a novel separation approach precluded validation in an independent setting. 
Cross-site validation studies will become possible once the approaches in this study have been adopted and implemented in other settings. These will require particular attention to potential batch effects that could arise as a result of the collection or analysis environment, instrument use and operator practices. Standardization of volatolomic profiling across different sites will be necessary prior to further use as diagnostic biomarkers in practice. This would entail the development and use of volatolomic-centric quality control mixtures within and between studies to compare cross-study measurements and the use of within-batch correction algorithms to mitigate the impact of any batch effects that are observed [16].
The use of volatolomic signatures and machine learning to generate and analyze predictive biomarker profiles obviates the need for detailed identification of individual VOC. Future studies directed towards the targeted detection and identification of specific VOC metabolites that are informative components of the volatolomic biomarker profiles may be considered, and could eventually enhance our understanding of underlying disease pathophysiology.
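To make the idea of a within-batch correction mentioned among the limitations above more concrete, the sketch below applies one very simple scheme, per-batch median centering, to a single feature. This is an illustrative assumption only: it was not used in this study and is not necessarily the algorithm described in reference [16], and the function name and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def center_by_batch(values, batches):
    """Shift each batch so that its median matches the overall median."""
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    # Median of each batch, and of all values pooled together.
    batch_median = {batch: median(vals) for batch, vals in grouped.items()}
    overall = median(values)
    return [v - batch_median[b] + overall for v, b in zip(values, batches)]


# Toy intensities for one feature measured in two batches (made-up numbers).
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batches = ["A", "A", "A", "B", "B", "B"]
print(center_by_batch(values, batches))
```

A correction of this kind removes a constant offset between batches but would also remove genuine biological differences if disease groups were unevenly distributed across batches, which is one reason quality control mixtures measured in every batch are useful.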
S1 Fig. Data pre-processing pipeline. (PDF)
S2 Fig. Separation based chromatograms from breath analysis of patients at different disease stages. (PDF)
S3 Fig. Variability of molecular features with age or etiology. (PDF)
S4 Fig. Inter-individual variability in separation chromatograms. (PDF)
S5 Fig. Performance of classifier models in distinguishing across disease stages.
(PDF)
27 May 2021
PONE-D-21-15285
Machine learning analysis of volatolomic profiles in breath can identify non-invasive biomarkers of liver disease: A pilot study
PLOS ONE
Dear Dr. Patel,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Jul 02 2021 11:59PM.
Kind regards,
Matias A Avila, Ph.D.
Academic Editor
PLOS ONE
Journal Requirements:
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.
2. Please review your reference list to ensure that it is complete and correct.
3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly.
4. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements.
5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly.
Reviewers' comments:
Reviewer's Responses to Questions
1. Is the manuscript technically sound, and do the data support the conclusions? Reviewer #1: Yes. Reviewer #2: Yes.
2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes. Reviewer #2: Yes.
3. Have the authors made all data underlying the findings in their manuscript fully available? Reviewer #1: Yes. Reviewer #2: Yes.
4. Is the manuscript presented in an intelligible fashion and written in standard English? Reviewer #1: Yes. Reviewer #2: Yes.
5. Review Comments to the Author
Reviewer #1: The manuscript is technically sound and it represents an extensive piece of research in machine learning field. I have only some reflections and considerations to share with the authors. The major issues had been resolved adequately taking into account the limitation, principally the sample size. As far as it demonstrates the consistency of the data, techniques (5fold-CV, correctly defined train-validation cohorts) and the groups are well balanced, there is nothing to remark. In the case of the SVM and other mentioned approaches, when they reach a 100% sensitivity and the specificity falls, often is a result of sacrificing the smaller group classifying all samples into the bigger group, so the accuracy (normally the score used to fit the model) will reach the maximum. The use of ensemble and tandem methods is particularly indicated when the data is complex and the individual markers have small to medium predictive capacity (AUC=0.5-0.7) as it is in this case. The implementation of ResNet50 is a foot in the door in the good sense, because it’s easily customizable to normalize and remove the noise from raw data, for this purpose is currently used in computer vision. So it can be easily seen the future application of it to automatize the analysis. As a final remark, it will be very exciting to implement a reinforced learning to this approach, this will give to the algorithm the capacity of learning on each example and thereby it will be very useful in diagnostic units, but this is the future as far as it is a pilot study.
Reviewer #2: Thomas and colleagues demonstrate in this proof of concept study the feasibility and utility of breath volatolomic profiling to classify liver disease (cirrhosis or portal hypertension). The manuscript is well conducted, material and methods detailed and results well structured and supported. I have only few minor suggestions:
1) Similar to determination of inter- and intra-individual variability across different samples or participant technical replicates, did authors evaluate the age-related variability? They have participants from 24 to 76 years old.
2) Did authors explore the correlation with the primary liver disease etiology?
3) In line 269, related to Figure 3 (graph on the right): authors have written that “four showed a relationship with disease stage”. I would suggest to explain that these four trend to increase with disease stage.
6. Do you want your identity to be public for this peer review? Reviewer #1: Yes: Jose Maria Herranz Alzueta. Reviewer #2: No.
22 Oct 2021
Response to reviewers.
We thank the editor and reviewers for their careful and thoughtful review of our manuscript and for their comments. We have responded to each of these below:
Reviewer #1
Response: We thank the reviewer for their helpful comments and insights. We concur that reinforced learning approaches could be interesting and we will embark on additional studies to explore these approaches in future.
Reviewer #2
1) Did authors evaluate the age-related variability?
Response: We have now examined the age-related variability in molecular features within biological replicates. A greater variability was observed in samples from participants aged 45yr or younger. These data are now included in the revised manuscript in lines 258-261, and the data included in the Supplementary Information as S3 Fig.
2) Did authors explore the correlation with the primary liver disease etiology?
Response: We have examined the average peak area, and the median SD of each molecular feature across replicates from individuals with disease of different etiology. These data are now included in the revised manuscript and the data included in the Supplementary Information as S3 Fig.
3) In line 269, related to Figure 3 (graph on the right): authors have written that “four showed a relationship with disease stage”. I would suggest to explain that these four trend to increase with disease stage.
Response: We thank the expert reviewer for their helpful comments and have revised this statement to clarify this as recommended by the reviewer.
Submitted filename: Response to reviewers.docx
3 Nov 2021
Machine learning analysis of volatolomic profiles in breath can identify non-invasive biomarkers of liver disease: A pilot study
PONE-D-21-15285R1
Dear Dr. Patel,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Kind regards,
Matias A Avila, Ph.D.
Academic Editor
PLOS ONE
17 Nov 2021
PONE-D-21-15285R1
Machine learning analysis of volatolomic profiles in breath can identify non-invasive biomarkers of liver disease: A pilot study
Dear Dr. Patel:
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr Matias A Avila
Academic Editor
PLOS ONE