L Plantier, A Smolinska, R Fijten, M Flamant, J Dallinga, J J Mercadier, D Pachen, M P d'Ortho, F J van Schooten, B Crestani, A W Boots.
Abstract
BACKGROUND: Fibrotic interstitial lung diseases (ILD) are a heterogeneous group of chronic lung diseases characterized by varying degrees of lung inflammation and remodeling. They include idiopathic ILD such as idiopathic pulmonary fibrosis (IPF), and ILD secondary to chronic inflammatory diseases such as connective tissue disease (CTD). Precise differential diagnosis of ILD is critical because anti-inflammatory and immunosuppressive drugs, which are beneficial in inflammatory ILD, are detrimental in IPF. However, differential diagnosis of ILD remains difficult and often requires an invasive lung biopsy. The primary aim of this study is to identify volatile organic compound (VOC) patterns in exhaled air that non-invasively discriminate IPF from CTD-ILD. As a secondary aim, the association between the VOC patterns discriminating IPF from CTD-ILD and functional impairment is investigated.
Keywords: Connective tissue disease associated-ILD (CTD-ILD); Diagnostic profiles; Gas chromatography–time of flight–mass spectrometry (GC-tof–MS); Idiopathic pulmonary fibrosis (IPF); Volatile organic compound (VOC)
Year: 2022 PMID: 35057817 PMCID: PMC8772159 DOI: 10.1186/s12931-021-01923-5
Source DB: PubMed Journal: Respir Res ISSN: 1465-9921
Fig. 1 Conceptual flowchart presenting the approach used for statistical analysis. In step 1, a database is built from all clinical data and the preprocessed VOC data, covering three main groups: IPF (n = 53), CTD-ILD (n = 51) and healthy controls (n = 51). In step 2, the machine learning method Random Forests (RF) is used to find discriminatory VOCs. For that purpose, three different discriminatory RF models are built. Each model is constructed on a training set (containing 80% of the samples of each group) and validated using an independent test set (containing the remaining 20% of the samples of each group). Training and test sets are selected using the Duplex method (27). The first RF model is built on VOC data from IPF patients and controls to find compounds linked to IPF. The second model is built on chromatograms from CTD-ILD patients and healthy controls to allow selection of VOCs related solely to CTD-ILD. The third model is built on breath samples from IPF and CTD-ILD patients to find VOCs that differ between these two pulmonary pathologies. The performance of each RF analysis is shown by its receiver operating characteristic (ROC) curve, from which sensitivities and specificities are determined. In step 3, the compounds selected as significant in step 2 are combined. In step 4, the final RF model is constructed using chromatograms from IPF patients, CTD-ILD patients and healthy controls. In step 5, Principal Component Analysis (PCA) is performed on the proximities obtained from the final RF model to visualize the relation between all breath samples and the differences between the three groups
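The per-group 80/20 split and the ROC-based evaluation described in this caption can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the study's pipeline: a seeded random stratified split stands in for the Duplex method, no Random Forest is trained, and the function names (`stratified_split`, `roc_auc`, `sens_spec`) and the demo scores are hypothetical.

```python
import random

def stratified_split(labels, train_frac=0.8, seed=0):
    """Return (train_idx, test_idx), keeping train_frac of each group for training.

    Assumption: a seeded random split replaces the Duplex method.
    """
    rng = random.Random(seed)
    by_group = {}
    for i, lab in enumerate(labels):
        by_group.setdefault(lab, []).append(i)
    train, test = [], []
    for idx in by_group.values():
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Group sizes taken from the caption: IPF n = 53, controls n = 51.
labels = ["IPF"] * 53 + ["control"] * 51
train_idx, test_idx = stratified_split(labels)

# Hypothetical classifier scores for six test samples (1 = IPF, 0 = control).
y_demo = [1, 1, 1, 0, 0, 0]
s_demo = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc_demo = roc_auc(y_demo, s_demo)
sens_demo, spec_demo = sens_spec(y_demo, s_demo, threshold=0.5)
```

In practice the scores would be the class probabilities that a trained classifier assigns to the held-out test samples; the split indices only decide which samples the model sees during training.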
Fig. 2 The importance of the variables for each of the three comparisons. The dashed horizontal lines indicate the cut-off chosen to select the most important VOCs for chemical identification. A IPF vs. controls; B CTD-ILD vs. controls; C IPF vs. CTD-ILD
Characteristics of all subjects analyzed in the present study
| | Controls | IPF patients | CTD-ILD patients | Significance Ctrls vs. IPF | Significance Ctrls vs. CTD-ILD | Significance IPF vs. CTD-ILD |
|---|---|---|---|---|---|---|
| Number of subjects | 51 | 53 | 51 | ns | ns | ns |
| Age (yrs. ± STD) | 53 (± 10) | 69 (± 9) | 57 (± 12) | 5.8 × 10⁻⁶ | ns | 1 × 10⁻⁶ |
| % male | 68 | 75 | 26 | ns | 0.002 | 1.63 × 10⁻⁶ |
| % smokers ex/current | 35/16 | 60/0 | 24/4 | 0.02 | ||
| mMRC class 0 | 44 | 5 | 5 | ns | ||
| mMRC class 1 | 7 | 29 | 22 | ns | ||
| mMRC class 2 | 0 | 12 | 16 | ns | ||
| mMRC class 3 | 0 | 6 | 5 | ns | ||
| mMRC class 4 | 0 | 1 | 3 | ns | ||
| VC (% ± STD) | NA | 77 (± 22) | 78 (± 23) | ns | ||
| TLC (% ± STD) | NA | 71 (± 15) | 76 (± 15) | ns | ||
| FRC (% ± STD) | NA | 74 (± 17) | 78 (± 15) | ns | ns | 0.001 |
| FEV1 (% ± STD) | NA | 80 (± 22) | 78 (± 25) | ns | ns | 0.0008 |
| DLCO (% ± STD) | NA | 49 (± 17) | 49 (± 17) | ns | ns | 0.03 |
| PaO2 (± STD mm Hg) | NA | 75 (± 9) | 80 (± 12) | ns | ||
| PaCO2 (± STD mm Hg) | NA | 41 (± 4) | 39 (± 4) | ns | ||
| 6MWD (% ± STD) | NA | 88 (± 20) | 78 (± 20) | ns | ||
Continuous variables are displayed as mean and standard deviation or as percentages. Lung function parameters are displayed as a percentage of the predicted value based on age and gender. mMRC modified Medical Research Council dyspnea scale, VC vital capacity, TLC total lung capacity, FRC functional residual capacity, FEV1 forced expiratory volume in 1 s, DLCO diffusing capacity for carbon monoxide, PaO2 and PaCO2 oxygen and carbon dioxide pressure in arterial blood, 6MWD six-minute walk distance, NA not available. Significances were calculated using a Student’s t-test (the data are normally distributed based on the Lilliefors test) in combination with False Discovery Rate correction. ns not significant
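The False Discovery Rate correction mentioned in this footnote can be illustrated with a short sketch. The footnote does not name the procedure, so the Benjamini-Hochberg step-up method is assumed here; the function name `benjamini_hochberg` and the raw p-values are hypothetical and are not taken from the table.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the study's FDR correction is of this type; the source
    does not say which procedure was used.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from five t-tests.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

With these illustrative inputs, two raw p-values that fall below 0.05 no longer do so after adjustment, which is the behavior the correction is meant to provide when many parameters are tested at once.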
Fig. 3 VOC profiling for IPF versus controls. A ROC curve of the 34-VOC IPF versus controls profile. The AUC is 91.2%. B 3D PCA plot of Random Forests proximities comparing IPF and controls. The distance between individual points expresses their similarity, i.e. a short distance indicates a highly similar VOC profile and vice versa
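The proximities plotted in panel B can be understood through a small sketch. A common definition, assumed here, is that the proximity of two samples is the fraction of trees in which they end up in the same terminal leaf. The leaf assignments below are hypothetical, the function names are illustrative, and the conversion to a distance as 1 minus proximity is one convention among several. The PCA step itself is not shown.

```python
def proximity_matrix(leaves):
    """leaves[i][t] is the leaf that sample i reaches in tree t.

    Proximity of two samples = fraction of trees in which they share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = shared / n_trees
    return prox

def to_distance(prox):
    """Turn proximities into distances (assumed convention: 1 - proximity)."""
    return [[1.0 - p for p in row] for row in prox]

# Hypothetical leaf indices for four breath samples across four trees.
leaves = [
    [1, 4, 2, 7],
    [1, 4, 2, 3],
    [5, 4, 6, 3],
    [5, 9, 6, 8],
]
prox = proximity_matrix(leaves)
dist = to_distance(prox)
```

Samples that often share a leaf get a proximity near 1 and a distance near 0, which is why nearby points in the PCA plot correspond to similar VOC profiles.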
Putative chemical identities of the most contributing VOCs in the comparison between IPF and controls
| Chemical identity | CAS number | Change in IPF with respect to controls |
|---|---|---|
| Ethanol | 64-17-5 | Down |
| Heptane | 142-82-5 | Down |
| Benzaldehyde | 100-52-7 | Up |
| Unknown | NA | Down |
| Dimethyl sulfide | 75-18-3 | Down |
Putative chemical identities of the most contributing VOCs in the comparison between CTD-ILD and controls
| Chemical identity | CAS number | Change in CTD-ILD with respect to controls |
|---|---|---|
| 2-Heptanone | 110-43-0 | Down |
| 4-Penten-1-ol | 821-09-0 | Down |
| 2,5-Dimethylfuran | 625-86-5 | Down |
| Ethanol | 64-17-5 | Down |
Fig. 4 VOC profiling for CTD-ILD patients versus controls. A ROC curve of the 11-VOC CTD-ILD versus controls profile. The AUC is 83.9%. B 3D PCA plot of Random Forests proximities comparing CTD-ILD patients and controls
Fig. 5 VOC profiling for IPF patients versus CTD-ILD patients. A ROC curve of the 16-VOC IPF versus CTD-ILD profile. The AUC is 83.8%. B 3D PCA plot of Random Forests proximities comparing IPF and CTD-ILD patients
Putative chemical identities of all discriminatory VOCs in the comparison between IPF and CTD-ILD
| Chemical identity | CAS number | Change in IPF with respect to CTD-ILD |
|---|---|---|
| Acetone | 67-64-1 | Down |
| Dimethylsulfone | 3877-15-4 | Up |
| Heptane | 142-82-5 | Down |
| 4-methyl-2-heptene | 3404-56-6 | Up |
| Branched C11H24 | NA | Up |
| Undecane | 1120-21-4 | Down |
| Tridecane | 629-50-5 | Up |
| Octadecane | 593-45-3 | Down |
| Branched C12H24 | NA | Up |
| Pyrrolidine | 123-75-1 | Down |
| Decanal | 112-31-2 | Down |
| 2-heptanone | 110-43-0 | Up |
| Branched C14H30 | NA | Up |
| 4-penten-1-ol | 821-09-0 | Up |
| 2,5-dimethylfuran | 625-86-5 | Down |
| 2-thiapentane | 3877-15-4 | Up |
Fig. 6 VOC profiling of IPF versus CTD-ILD versus controls. 3D score plot of the combined binary classification RF model
The influence of the significant study parameters on the discriminatory VOC profiles
| Study Parameter | Comparison | p-value |
|---|---|---|
| Age | Controls vs. IPF | 0.212 |
| Age | IPF vs. CTD-ILD | 0.219 |
| Gender | CTD-ILD vs. Controls | 0.072 |
| Gender | IPF vs. CTD-ILD | 0.814 |
| Smoking | IPF vs. CTD-ILD | 0.637 |
Regularized MANOVA was used to test whether significant study parameters were influential. A p-value < 0.05 was considered significant
Fig. 7 Correlation between the discriminatory VOCs and the lung function parameters TLC and 6MWD. This correlation plot depicts the canonical variate of the VOCs on the x-axis and the canonical variate of TLC and 6MWD on the y-axis
Fig. 8 Relative concentrations of individual VOCs reported in the literature to differ in the breath of IPF patients and healthy controls. The boxplots represent the following volatiles: A Isoprene, B p-Cymene, C Ethylbenzene, D m- and/or p-Xylene, E o-Xylene. Each plot displays the p-value; a p-value < 0.05 is considered significant. m-, p-, and o-xylene are hard to distinguish from ethylbenzene, which can lead to misidentification, so their significances are also reported