Exploring the performance of a functionalized CNT-based sensor array for breathomics through clustering and classification algorithms: from gas sensing of selective biomarkers to discrimination of chronic obstructive pulmonary disease.

Giovanni Drera^1,2, Sonia Freddi^1,2,3, Aleksei V Emelianov^4,5, Ivan I Bobrinetskiy^4,6, Maria Chiesa¹, Michele Zanotti^1,2, Stefania Pagliara^1,2, Fedor S Fedorov⁷, Albert G Nasibulin^7,8, Paolo Montuschi⁹, Luigi Sangaletti^1,2.

Abstract

An array of carbon nanotube (CNT)-based sensors was produced for sensing selective biomarkers and evaluating breathomics applications with the aid of clustering and classification algorithms. We assessed the sensor array performance in identifying target volatiles and we explored the combination of various classification algorithms to analyse the results obtained from a limited dataset of exhaled breath samples. The sensor array was exposed to ammonia (NH3), nitrogen dioxide (NO2), hydrogen sulphide (H2S), and benzene (C6H6). Among them, ammonia (NH3) and nitrogen dioxide (NO2) are known biomarkers of chronic obstructive pulmonary disease (COPD). Calibration curves for individual sensors in the array were obtained following exposure to the four target molecules. A remarkable response to ammonia (NH3) and nitrogen dioxide (NO2), according to benchmarking with available data in the literature, was observed. Sensor array responses were analyzed through principal component analysis (PCA), thus assessing the array selectivity and its capability to discriminate the four different target volatile molecules. The sensor array was then exposed to exhaled breath samples from patients affected by COPD and healthy control volunteers. A combination of PCA, support vector machine (SVM), and linear discriminant analysis (LDA) shows that the sensor array can be trained to accurately discriminate healthy from COPD subjects, in spite of the limited dataset. This journal is © The Royal Society of Chemistry.

Entities: Chemical

Year: 2021 PMID: 35480252 PMCID: PMC9041100 DOI: 10.1039/d1ra03337a

Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036

Introduction

Pristine and functionalized carbon nanotube (CNT) layers are largely used to develop gas sensors.[1] Indeed, CNT-based sensors display a remarkable stability over time and can operate at room temperature. Furthermore, their virtually one-dimensional nature is at the origin of the high surface to volume ratio and the remarkable charge transport properties, which are both required for highly sensitive gas/volatile sensors. Functionalization and decoration of CNTs offer manifold solutions that can be exploited to develop multiplexed systems for volatile organic compound (VOC) profiling in exhaled breath, a non-invasive, real-time, potential diagnostic tool of many diseases.[2-4] For these reasons, electronic noses may become an important tool in diagnostics and in health screening programs. Since each sensor in the electronic nose can respond to several different volatiles, thus reducing the selectivity towards specific analytes, suitable data processing from the whole set of sensors is required to obtain relevant information on exhaled breath VOC patterns. In this context, chemometrics and classification machine learning methods such as linear discriminant analysis (LDA), support vector machine (SVM), and principal component analysis (PCA) represent effective tools to analyse electronic nose data.[5] Along with the sensitivity of properly functionalized CNTs, this statistical approach can, therefore, be quite helpful to cope with a fundamental challenge of gas/volatile sensing, i.e. the capability to monitor environments where a manifold of target molecules is present, providing a response that is able to discriminate the desired property, even though the sensors can be individually poorly selective. 
Although CNT-based sensors are largely used as single gas sensors, studies on electronic noses completely based on CNTs and exposed to human breath samples are still few.[6-10] Usually, these arrays are exposed to simulated (synthetic) human breath, consisting of a mixture of gas/volatile molecules,[11-13] or to single target gases/volatiles that are known biomarkers of specific diseases.[14-31] For example, exhaled breath C6H6 is found elevated in lung cancer patients,[21] whereas high exhaled breath NH3 concentrations are related to liver or kidney disease.[2,32-34] Low ammonia concentrations[35] and high NO2 concentrations are related to COPD;[14,36,37] finally, H2S has been proposed as a biomarker of asthma.[38,39] The use of CNT-based arrays in breath analysis applications is detailed in Table 1, where the different CNT-based sensing layers, data analysis strategies, and analysed samples (human breath, synthetic breath, biomarkers) are listed. Studies using CNT-based sensor arrays to analyse exhaled breath from sick individuals are relatively few[6-10] and mainly focused on cancer, particularly lung cancer.[8,10,19,21-26] Most of these studies used PCA for data analysis. Only one study has been reported in patients affected by a chronic lung pathology other than cancer, i.e., chronic obstructive pulmonary disease (COPD).[6] However, the recent COVID-19 pandemic has pointed out the importance of developing rapid and reliable diagnostic tests for respiratory diseases such as pneumonia and seasonal flu,[40] along with the possibility of deploying screening tests for the detection of COVID-19 infections.[41] In this frame, e-noses might play a major role in the screening of respiratory diseases.

Table 1

CNT-based sensor arrays for breathomics applications: materials, targeted disease, testing gas/volatile and data analysis methodology. DFA denotes discriminant function analysis.

Sensors material	Disease	Testing gas/volatiles	Data analysis	Ref.
SWCNT-organic semiconductors layers	COPD	Exhaled breath + 9 biomarkers	PCA, SVM	6
Bilayers of polycyclic aromatic hydrocarbons and SWCNT	Multiple sclerosis	Exhaled breath	DFA	7
Organically stabilized spherical gold NPs and SWCNTs capped with polycyclic aromatic hydrocarbons	Lung cancer	Exhaled breath	DFA	8
CNT and gold NPs	Precancerous gastric lesions and gastric cancer	Exhaled breath	DFA	9
Polymer and functionalized SWCNTs	Liver cancer	Exhaled breath + 5 biomarkers	PCA	10
CNT coated with nonpolymeric organic materials	Lung cancer	Simulated breath	PCA	11
CNT	Chronic renal failure	Simulated breath	PCA	12
Ionic liquid-CNT	—	Simulated breath and 8 biomarkers	PCA	13
Nanoparticle decorated SWCNT	—	9 biomarkers	PCA	16
Transition metal decorated SWCNT	Lung cancer	1 biomarker		17
CNT/Hexa-peri-hexabenzocoronene bilayers	Cancer	5 biomarkers	PCA	18
CNT-conductive polymer nanocomposites	Lung diseases	9 biomarkers	PCA	19
MWCNT and gold NPs	—	7 biomarkers	PCA	20
CNT-conductive polymer nanocomposite	Lung cancer	19 biomarkers	PCA	21
COOH-MWCNT functionalized with polyhedral oligomeric silsesquioxanes (POSS)	Lung cancer, diabetes, malignant pleural mesothelioma	9 biomarkers	PCA	22
Sulfonated poly(ether ketone) (SPEEK) nanocomposites based on hybrid nanocarbons	Lung cancer	7 biomarkers	PCA	23
Surfactant-CNT	Lung cancer	15 biomarkers	PCA	24
Polymer coated CNT	Lung cancer	8 biomarkers	PCA	25
Carbon nanorods	Lung diseases	19 biomarkers	PCA	26
DNA-CNT	Skin cancer	18 biomarkers		27
SWCNT and metallic NPs	—	4 biomarkers	PCA	28
DNA-functionalized SWCNT	Lung cancer, excessive drinking and diabetes	7 biomarkers		29
SWCNT, SWCNT + NPs, polymer coated SWCNT	—	6 biomarkers	PCA	30
Metal decorated MWCNT	—	1 biomarker	LDA	31

In the present study, the overall sensing behaviour of an 8-element array of CNT-based sensors, each with a different functionalization, is investigated. In a previous study,[6] we presented the general physical properties and behaviour of the array; in this study, we focus on a more comprehensive data analysis at all stages of the array testing: exposure to selected biomarkers, extraction of calibration curves of individual sensors, and exposure to a limited number of breath samples from COPD patients. Finally, we assess various multivariate analysis strategies to optimize the classification capability of the chemical sensor array. According to this scheme, a set of NH3, NO2, C6H6, and H2S exposures was carried out and the calibration curves for each of the 8 sensors in the array exposed to the 4 volatiles were built. Data were analysed by PCA, to assess the array selectivity and its capability to discriminate the four different target volatile molecules. Next, the sensor array was exposed to exhaled breath samples from COPD patients and healthy control volunteers. The array's ability to discriminate breathprints of sick and healthy volunteers was assessed by three different algorithms: PCA, SVM and LDA. In particular, we explored the classification ability of the present device in a limited number of individuals, including repeated measurements taken from the same individual and collected in different conditions (such as ambient relative humidity and temperature, different days). The sample collection was defined by a minimal set of rules, mostly based on the use of disposable materials. This approach might have applications in harsh environments where control over strict sample collection protocols may not be feasible.

Experimental

Sample preparation and characterization

To synthesise single-walled carbon nanotubes (SWCNTs) we utilised an aerosol chemical vapor deposition (CVD) method.[42] Namely, SWCNTs were grown by gas-phase formation based on the thermal decomposition of ferrocene in the presence of carbon monoxide and deposited onto a low-adhesion filter in the same process. SWCNT films with a thickness of about 30 nm were collected on the filter from the gas phase downstream of the reactor and subsequently dry-transferred[43] onto a polyethylene terephthalate (PET) substrate, with sample dimensions of 1.5 × 1.0 cm². The array of sensors was composed of 8 SWCNT films: one was made of pristine SWCNTs, one was UV-functionalized (hereafter called COOH), while the remaining 6 films were functionalized with selected organic molecules, namely DNA oligonucleotide (hereafter called DNA), α,α′-dihexylquaterthiophene (hereafter called Hex-4T-Hex), polyaniline (hereafter called PANI), perylene-3,4,9,10-tetracarboxylic-dianhydride (hereafter called PTCDA), tris(4-carbazoyl-9-ylphenyl)amine (hereafter called TCTA) and 4,4′-cyclohexylidenebis [N,N bis(4-methylphenyl) benzene-amine] (hereafter called TAPC). DNA and PANI layers were deposited from an aqueous solution with a concentration of 0.5 mM and 0.1 μM, respectively, forming a monolayer on top of the nanotubes. A film of Hex-4T-Hex molecules was deposited via thermal evaporation at 70 °C and a rate of 0.01 nm s−1 to form an average thickness of 5 nm. PTCDA molecules were evaporated at 120 °C and 0.025 nm s−1 to form a porous 4 nm film. A TCTA film with an average thickness of 6 nm was formed via thermal evaporation at 105 °C and a rate of 0.012 nm s−1. A TAPC film with a thickness of 10–20 nm was deposited by the same technique at 77 °C and a rate of 0.1 nm s−1.
The structural and optical properties of the samples were characterized by atomic force microscopy (AFM), transmission electron microscopy (TEM), UV-Vis and Raman spectroscopy.[6] Cr/Pd (5 nm/150 nm) electrodes were sputtered through a paper mask on the opposite sides of the 1.5 × 1.0 cm² sample.

Sensing properties

A schematic view of the SWCNT-based sensor is shown in Fig. 1a. The 8 sensors were mounted on a purpose-designed board (Fig. 1b), with 8 independent channels (the single-channel readout scheme is shown in Fig. 1c) for the simultaneous detection of each sensor's response. Relative humidity (RH) (humidity sensor HIH-4000 series, Honeywell Sensing) and temperature (Thermistor NTC PCB 5K, Murata) were also recorded by placing these sensors at the centre of the board (RH and T in Fig. 1b). The sensor signal was acquired using a script written in the LabVIEW environment.

Fig. 1

(a) Single sensor layout; (b) sensor array layout, where T and RH represent the temperature and relative humidity sensors, respectively; (c) single sensor readout scheme; (d) resistance change vs. time measured by the eight-sensor array during exposure to two different ammonia concentrations: 15.9 ppm and 8.1 ppm, respectively. Exposure time was set at 180 s.

The sensing properties of the array upon gas/volatile exposures were analysed in the chemiresistive configuration, where the presence of gases/volatiles is detected by monitoring the change in the resistance of the sensitive element, i.e. bundles of SWCNTs (pristine or functionalized). The basic chemiresistor electronic circuit includes a load resistor (RL) in series with the sensor, to which a constant voltage (V = 5 V) is applied. The resistance RS of the sensor is tracked by monitoring the voltage VOUT across the sample. The response, defined as ΔR/R0 = (R − R0)/R0, where R0 is the baseline resistance of the sensor measured without the gas/volatile and ΔR is the resistance change during the gas/volatile exposure, was then measured. Following a set of multiple exposures, calibration curves for each sensor were obtained by plotting the sensor response ΔR/R0 versus the gas/volatile concentration. In order to probe the capability of the sensor array to discriminate different target molecules, the array was exposed to 4 gases/volatiles: ammonia, benzene, hydrogen sulphide and nitrogen dioxide. The exposure to ammonia was carried out in laboratory air and the NH3 concentration was measured with a properly calibrated sensor. Exposures to benzene, hydrogen sulphide, and nitrogen dioxide were carried out in a sealed test chamber. In this case, the gas concentration was controlled by properly mixing dry air with analyte molecules through mass flow controllers. In all cases, humidity was constantly monitored through the RH sensor placed in the middle of the board. All measurements were performed at room temperature.
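The readout described above can be illustrated with a minimal sketch. It assumes an ideal series voltage divider in which VOUT is measured across the sensor, so that RS = RL·VOUT/(V − VOUT). The load resistance and the voltages used below are hypothetical values chosen for illustration and are not taken from this study.

```python
def sensor_resistance(v_out, r_load, v_supply=5.0):
    """Sensor resistance from the voltage measured across the sensor.

    Assumes an ideal series divider: the same current flows through the
    load resistor and the sensor, so v_out / R_S = (v_supply - v_out) / R_L.
    """
    if not 0.0 < v_out < v_supply:
        raise ValueError("v_out must lie strictly between 0 and v_supply")
    return r_load * v_out / (v_supply - v_out)


def relative_response(r, r0):
    """Relative response dR/R0 = (R - R0) / R0."""
    return (r - r0) / r0


# Hypothetical numbers, for illustration only (not from the paper).
R_LOAD = 10_000.0                      # load resistor in ohm (assumed)
r0 = sensor_resistance(2.5, R_LOAD)    # baseline resistance before exposure
r = sensor_resistance(2.6, R_LOAD)     # resistance during exposure
print(relative_response(r, r0))
```

A positive value corresponds to a resistance increase, as reported for ammonia; a negative value corresponds to a resistance decrease, as reported for nitrogen dioxide.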

Breath analysis

Breath samples were collected (after signed consent) from 11 volunteers aged 22–88 years. Among them, 7 volunteers suffer from COPD, while 4 were healthy control volunteers. All volunteers were recruited within a research project funded by the Università Cattolica del Sacro Cuore in the frame of the 2016–2018 D 3.2 Strategic Program “Anapnoi”. For each volunteer, several samples were collected on different days. An overall number of 52 samples were analysed. Subject characteristics including age, gender, COPD category as well as the number of tests carried out for each subject are shown in Table S1 (in the ESI†). Breath sampling was carried out in a disposable bag (volume = 0.6 liters), containing the sensor array, and inflated by breath through a disposable plastic straw. This procedure took around 10–15 seconds until the bag was fully inflated. We did not record significant differences among volunteers during the bag inflation phase, likely due to the reduced volume to fill and to the lack of any filter along the collection pipeline, which could hinder the bag inflation step. The overall sensor exposure time inside the bag was set to 3 minutes, to let all sensors fully interact with the breath sample.

Data analysis algorithms

Data obtained from gas/volatile exposures and exhaled breath samples are shown in ‘Sensing of selected biomarkers’ and ‘Breath analysis’ sections. Column mean-centring was used for data pre-treatment. PCA was used to analyse a set of 26 exposures from 4 gases/volatiles. PCA, LDA, and SVM were used to analyse 52 exhaled breath samples obtained from 11 subjects.
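As an illustration of the pre-treatment and data reduction steps, the following sketch applies column mean-centring followed by PCA computed through a singular value decomposition. It is not the code used in this study: the function name, the synthetic data, and the choice of 9 columns (8 sensors plus the RH sensor, as suggested by the loading plots) are assumptions made for the example.

```python
import numpy as np


def pca_mean_centred(X, n_components=3):
    """PCA of a (samples x sensors) matrix after column mean-centring."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # column mean-centring
    # Rows of Vt are the principal directions; s holds singular values.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T               # one column per component
    explained = s ** 2 / np.sum(s ** 2)          # explained variance ratios
    return scores, loadings, explained[:n_components]


# Synthetic stand-in for 26 exposures x 9 channels (assumed layout).
rng = np.random.default_rng(0)
X = rng.normal(size=(26, 9))
scores, loadings, explained = pca_mean_centred(X)
print(explained)           # fraction of variance carried by PC1, PC2, PC3
print(explained[:2].sum()) # cumulative variance of the PC1-PC2 plane
```

The score columns give the coordinates used in plots such as PC1 vs. PC2, while each loading column shows how much every sensor contributes to the corresponding component.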

Results and discussion

Sensing of selected biomarkers

Fig. 1d shows an example of sensor array resistance variation during 3 min exposure to ammonia at 8.1 ppm and 15.9 ppm concentrations. All sensors show increased resistance due to the presence of ammonia, suggesting a p-type doping of CNTs.[44] For all sensors, the resistance increase depends on ammonia concentration and the recovery is always reached in about 20 minutes. Calibration curves obtained by sensor array exposures at various ammonia concentrations are shown in Fig. 2. The same y-axis scale was used to plot all the sensor responses; the error bars, estimated on the basis of the signal-to-noise ratio, are basically negligible. COOH and PANI sensors displayed the greatest responses to ammonia.

Fig. 2

Calibration curves extracted from measurements upon exposure to ammonia (NH3), nitrogen dioxide (NO2), hydrogen sulphide (H2S), and benzene (C6H6). Error bars are estimated on the basis of the signal-to-noise ratio. Concentration range: 0–60 ppm for NH3, 0–2.5 ppm for NO2, 0–2.5 ppm for H2S, 0–0.30 ppm for C6H6.

All ammonia calibration curves shown in Fig. 2 display a linear behaviour, except the PANI and COOH curves, which show a clear sublinear behaviour across the ammonia concentration range; all data were fitted with the Freundlich isotherm (ΔR/R0 = A[NH3]^pow, dashed line in Fig. 2), with an exponent (pow) that turned out to be virtually 1 for the six sensors that display a linear behaviour (see also Fig. S1 in the ESI†). Similar data were obtained by exposing the sensor array to different nitrogen dioxide, hydrogen sulphide and benzene concentrations, as shown in Fig. 2. All of these curves display a sub-linear behaviour. As expected, the ΔR/R0 value decreased upon NO2 exposure, as NO2 is known to be an oxidizing molecule. In particular, COOH and TAPC sensors displayed the highest responses to nitrogen dioxide exposure, while DNA and TCTA sensors displayed the lowest ones. Hex-4T-Hex and CNT responses to hydrogen sulphide and benzene were quite low compared to the noise and their evaluation was difficult; thus, we did not proceed with fitting these datasets. The same holds for the TAPC response to benzene. Finally, it is worth noticing that the lowest volatile concentrations used to build the calibration curves were in the sub-ppm range (0.3 ppm for nitrogen dioxide, 0.12 ppm for hydrogen sulphide, and 50 ppb for benzene). Following the analysis of the calibration curves, nitrogen dioxide (NO2) and ammonia (NH3) were selected to carry out a benchmarking against literature results based on CNT sensors and published since 2000. 
These volatiles are among the most used compounds to test chemical sensors, due to their oxidizing and reducing properties, respectively; in particular, testing of CNT-based sensors to evaluate their performances is often carried out with nitrogen dioxide (NO2) and ammonia (NH3).[44] For these reasons, ammonia (NH3) and nitrogen dioxide (NO2) exposure datasets available in literature are much larger than hydrogen sulphide (H2S) and benzene exposure datasets, making a benchmarking statistically more significant.[16,45-101] Fig. 3 shows this benchmarking, carried out by considering the sensitivity, S, defined as S = 100 × (ΔR/R0)/[gas/volatile], where [gas/volatile] is the gas/volatile (ammonia, NH3 or nitrogen dioxide, NO2) concentration. The comparison shows that the present results, with testing mainly in the low gas/volatile concentration range (1–10 ppm) are quite remarkable in terms of sensitivity for all sensors in the case of NO2 (Fig. 3 upper panel), and especially for PANI and COOH sensors in the case of NH3 (Fig. 3 lower panel).
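The two quantities used in this section, the Freundlich power law ΔR/R0 = A·c^pow and the sensitivity S = 100 × (ΔR/R0)/c, can be sketched as follows. The fitting procedure of the study is not specified here; the example swaps in a simple linear regression in log-log space, which requires positive responses (for a decreasing response such as NO2 one could fit the absolute value). The concentrations and responses are synthetic.

```python
import math


def fit_freundlich(conc, resp):
    """Fit resp = A * conc**power by least squares on log-transformed data."""
    x = [math.log(c) for c in conc]
    y = [math.log(r) for r in resp]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    power = sxy / sxx                      # slope in log-log space
    amplitude = math.exp(my - power * mx)  # intercept gives ln(A)
    return amplitude, power


def sensitivity(resp, conc):
    """S = 100 * (dR/R0) / concentration, in percent per concentration unit."""
    return 100.0 * resp / conc


# Synthetic sublinear calibration curve (hypothetical values, in ppm).
conc = [2.0, 8.0, 16.0, 32.0, 60.0]
resp = [0.02 * c ** 0.5 for c in conc]
A, p = fit_freundlich(conc, resp)
print(A, p)                                # recovers A = 0.02, pow = 0.5
print(sensitivity(resp[1], conc[1]))       # sensitivity at 8 ppm
```

An exponent close to 1 corresponds to the linear curves, while an exponent below 1 corresponds to the sublinear behaviour reported for some sensors.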

Fig. 3

Benchmarking of the sensor array with respect to CNT-based chemiresistor performances for nitrogen dioxide (NO2) (upper panel) and ammonia (NH3) (lower panel) exposures.

Sensor responses obtained following exposures to the 4 gases/volatiles and used to draw the calibration curves were analyzed with PCA, as shown in Fig. 4a and b. The explained variance is 99.71% when considering the first and second components (PC1–PC2 plot, Fig. 4a) and 96.61% when considering the first and third components (PC1–PC3 plot, Fig. 4b).

Fig. 4

PCA of the sensor array responses following nitrogen dioxide (0.3–2.5 ppm), hydrogen sulphide (0.12–2.5 ppm), benzene (0.05–0.25 ppm) and ammonia (2.2–60 ppm) exposures. (a) space generated by PC1 and PC2, (b) space generated by PC1 and PC3, (c) loadings on PC1; (d) loadings on PC2; (e) loadings on PC3.

The contributions of ammonia and nitrogen dioxide are well separated in the PC1 vs. PC2 space (Fig. 4a), while benzene and hydrogen sulphide exposure data overlap. However, benzene and hydrogen sulphide can clearly be discriminated considering a 2D space generated by PC1 and PC3 (Fig. 4b). As expected, all 4 gases/volatiles follow a clear trend, which goes from the edges to the plot center as gas/volatile concentrations decrease. Furthermore, ammonia and nitrogen dioxide, which are a reducing and an oxidizing gas/volatile, respectively, span opposite sides of the space generated by both the PC1–PC2 and PC1–PC3 component pairs. Loading plots for PC1, PC2, and PC3 help clarify the role of each individual sensor in the discrimination of target gas/volatile molecules. As shown in Fig. 4c, all sensors contribute equally to PC1 loadings, resulting in the capability to separate nitrogen dioxide (NO2) and ammonia (NH3) measurements along the PC1 axis of the PC1–PC2 space. Benzene and hydrogen sulphide (H2S) exposure data are not separated from each other and are found nearly superposed on a vertical line in the bottom part of the PC2 range. Thus, PC2 separates the ammonia (NH3) and nitrogen dioxide (NO2) pair from the benzene and hydrogen sulphide (H2S) pair. The loading plots (Fig. 4d) show that the RH sensor, and TCTA to a lesser extent, provide the major contribution to PC2. As shown by the loading plots for PC3 (Fig. 4e), the discrimination between hydrogen sulphide (H2S) and benzene, provided by the PC3 component in the PC1–PC3 plot, can be related to the contribution of the PANI sensor and, to a lesser extent, of the TCTA and RH sensors.

Breath analysis

We exposed the sensor array to breath samples obtained from 4 healthy volunteers and 7 subjects with COPD. To avoid systematic bias in breath sampling, samples were collected on different days and under different environmental conditions. Subject characteristics and the number of samples collected from each person are presented in Table S1† (in the ESI). Samples were collected by asking the participants to fully exhale through a plastic straw into a disposable plastic bag where the sensor array was placed. After breath collection, the bags were sealed, and the sensor array was exposed to the breath sample for 180 seconds. Then, the bags were opened, and the sensor array was flushed with dry air until recovery of the baseline signal before exposure to the next breath sample. Fig. 5 shows a schematic representation of the breath sampling setup (left panel), along with an example of the 8-sensor array responses to an exhaled breath sample from a healthy subject (right panel). Sensor responses were analysed by PCA.

Fig. 5

Left panel: schematic representation of the breath sampling setup. Right panel: resistance change vs. time measured by the eight sensors array upon 180 s exposure to an exhaled breath sample from a healthy subject. The COOH sensor curve has been rescaled by a 0.3 factor.

An overview of the data collected from the 52 exposures considered in this study (Fig. 6) shows that each sensor in the array gives a clearly lower response for COPD patients than for healthy subjects. The average and standard deviation of these values are reported in Fig. 7 for each sensor of the array. This finding is rather appealing and can be rationalized by relating the array response to the specific sensitivity to NO2 and NH3. Qualitatively, on the basis of the literature (ref. 14, 36, 37 and 35, respectively) we know that COPD patients display a higher NO2 concentration and a lower NH3 concentration in the exhaled breath. Both facts are expected to decrease the overall resistance of the p-doped CNT layers upon exposure to exhaled breath. Indeed, NO2 is known to act as an electron acceptor when interacting with CNTs while NH3 behaves as an electron donor (see, e.g. ref. 44). Therefore, a high NO2 concentration increases the density of carriers (holes) in p-type CNTs, thus decreasing their resistivity, while a decrease of NH3 concentration also decreases the resistivity, as NH3 decreases the density of holes in p-type CNTs.

Fig. 6

ΔR/R0 data used for the PCA analysis. (a) Sick patients (b) healthy subjects. Each vertical stack of points represents an e-nose readout after a single exposure to exhaled breath. Data from the COOH-doped sensor have been multiplied by 0.5 to provide a more effective representation of the data set.

Fig. 7

Mean ΔR/R0 and standard deviation registered for each sensor in the breath analysis. Mean and standard deviation were calculated separately for COPD patients and healthy subjects: COPD = green dots; healthy subjects = blue dots.

In principle, the difference in signal intensity between healthy subjects and COPD patients could be regarded as a method for classification, as the mean value for healthy subjects is always higher than the corresponding value for COPD patients. However, the standard deviation displayed in Fig. 7 is still too large, and the two classes (sick and healthy) can still overlap. For this reason, PCA with SVM or LDA can provide more robust classification schemes, which are explored in detail in the following. Fig. 8 shows the PC spaces obtained by carrying out the analysis on the full sensor data set. An overall good cluster separation between healthy and COPD subjects can be detected along the first principal component, and the classes can be well separated by a linear border with the inclusion of the second (Fig. 8a) and third (Fig. 8b) PCs. These are very promising results, since they were obtained with an unsupervised, simple data reduction algorithm such as PCA. The maximal data variance, related to the first PC and to its specific linear combination of sensors, can then be used to discriminate between healthy and COPD subjects.

Fig. 8

(a) 2D PCA of exhaled breath sample data from seven (green dots) COPD and four healthy (blue dots) subjects; the number in each dot identifies the individual from whom it was obtained. (b) 3D PCA with best boundary plane; COPD and healthy subjects are shown with green and blue spheres, respectively.

For this analysis, an additional humidity sensor was also considered. In fact, several studies report on the importance of taking humidity into account when performing gas/volatile sensing measurements.[11,12,16,100] In our study, we found a small influence of RH on the data clustering. Finally, in Fig. 8a, each dot representing a breath sample has been labelled with the study number of the individual from whom it was obtained, as shown in Table S1 (in the ESI†); it is worth noticing that the samples from each subject, collected at different times and on different days, tend to cluster in the same spatial region, especially for healthy subjects. These results support the reproducibility and reliability of our method. In order to explore the accuracy of COPD classification, our dataset was further analysed by three algorithms: PCA, LDA, and SVM. In a practical application, the discriminating algorithm should be trained with a well-characterized initial dataset, which should provide the optimal linear combination of sensor results and the best boundary definition between the two data clusters. In this context, LDA can be used to better separate the two clusters, since it is specifically designed to maximize the distance between the class averages relative to the variance within each class; moreover, LDA results can be used to directly classify unknown data. We used the SVM approach to find the optimal cluster separation plane for PCA. The projections obtained with the two techniques are shown in Fig. 9a and b for PCA and LDA, respectively.

Fig. 9

(a) PCA first and second principal components with optimal linear cluster boundaries (dashed lines). The corresponding support vectors are highlighted with larger dots. Note that the two data clusters can also be resolved by considering the first three PCA components (Fig. 8b). (b) First PCA component vs. LDA projection (top), with LDA projection histogram (bottom) and predicted probability for COPD identification (black line).

Data have been standardized in the usual way, i.e. by subtracting each sensor's average response and dividing by the corresponding standard deviation. As already shown, despite PCA being an unsupervised method, the first principal component (PCA-1, horizontal axis in Fig. 9a) shows a remarkable separation of the two data classes; however, no class separation is observed for the PCA-2 component. LDA (Fig. 9b) leads to a (C − 1)-dimensional data projection, where C is the number of classes. In this case, it defines a single direction in the feature space with the best cluster separation. We then represented the data by showing the PC-1 vs. LDA subspace; the cluster separation is clearly better for LDA. However, being a supervised method, it requires a priori knowledge of the class of each sample, making it less suitable for prediction models. A good metric for LDA performance is the ratio between the distance of the class averages and the sum of the standard deviations of each class; we obtained 1.7 for PC-1 and 2.9 for the LDA projection. The data distribution histogram is shown in the lower panel of Fig. 9b, together with the calculated probability function for COPD identification (black line). For LDA we defined the overall accuracy as the average of the identified probabilities associated with the projected data. In order to better quantify the performance of PCA, we resorted to SVM analysis, as implemented in the LibSVM package.[102] This method allows defining a cluster boundary by finding a data subgroup (the support vectors, larger dots in Fig. 9a) with the minimal interclass distance. Even if the two classes are formally perfectly separable, for this work we applied a standard linear kernel with a soft margin, thus trading the “perfect” boundary identification for a better reliability of the prediction model on possible additional unclassified data. The related cost function was calculated through cross-correlation methods. 
Due to the clear class separation and the relatively small number of measurements, in SVM we avoided the use of more complex kernels, such as Gaussian or polynomial ones. Furthermore, in addition to the identification of the best class border, SVM also allows for the calculation of the identification probability, i.e. for the definition of a continuous function that assigns to each data point a likelihood of belonging to a certain class. An example of probability functions for PCA is shown in Fig. 10a with contour lines. For each dataset, we can then evaluate an identification performance index (the ratio of correctly labelled data) and an overall accuracy index, which represents the average probability. In a best-case scenario, both these values should be close to 1, since we expect to assign each data point to its class with the highest possible accuracy.
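The standardization and the two-class LDA projection discussed above can be sketched with a minimal Fisher discriminant. This is an illustration under stated assumptions, not the implementation used in this study: the data are synthetic, the split of the 52 samples into 24 and 28 per class is hypothetical, and the separation metric is read here as the distance between the class means divided by the sum of the class standard deviations.

```python
import numpy as np


def standardize(X):
    """Subtract each column's mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def fisher_direction(X, y):
    """Unit vector proportional to Sw^-1 (m1 - m0) for two classes."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix, summed over both classes.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)


def fisher_criterion(w, X, y):
    """Squared distance of projected class means over within-class scatter."""
    z = X @ w
    z0, z1 = z[y == 0], z[y == 1]
    scatter = np.sum((z0 - z0.mean()) ** 2) + np.sum((z1 - z1.mean()) ** 2)
    return (z1.mean() - z0.mean()) ** 2 / scatter


def separation_ratio(z, y):
    """|mean difference| / (sum of class standard deviations) for 1D scores."""
    z0, z1 = z[y == 0], z[y == 1]
    return abs(z1.mean() - z0.mean()) / (z0.std(ddof=1) + z1.std(ddof=1))


# Synthetic stand-in: 52 samples x 9 channels, class sizes are hypothetical.
rng = np.random.default_rng(1)
y = np.array([0] * 24 + [1] * 28)
X = rng.normal(size=(52, 9)) + 1.5 * y[:, None]
Xs = standardize(X)
w = fisher_direction(Xs, y)
print(separation_ratio(Xs @ w, y))
```

The Fisher direction maximizes the criterion computed by `fisher_criterion`, which is why a supervised projection of this kind can separate the two classes better than an unsupervised principal component.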

Fig. 10

(a) PCA calculated on a random data subset (filled circles). SVM calculated probabilities (dashed lines) evaluated on the test subset (empty circles). (b) LDA projection histograms (left axis) calculated on a random subset (dark green and blue bars) and on the projected test set (light green and blue bars). Red circle highlights two outliers among the test data, evaluated through the calculated probability curve (black, right axis).

For the full dataset both methods achieve a perfect identification score, since the SVM boundary perfectly separates the two classes, but LDA yields a much better accuracy index (0.99) than PCA (0.91), owing to the better class separation with respect to the boundary. We then investigated the capability of these methods to classify unknown data. To do so, we removed a random section of the dataset and used it as an unlabelled test subset for both SVM and LDA. We removed up to Nt = 24 test patients from the N = 52 total group, picking 4000 random subsets for each case. For PCA, the data reduction can be performed on the full dataset (thus including both classified and unclassified data), and the SVM classification is then carried out on its results. We chose to work with three principal components (as for LDA), since they describe 95% of the data variance. An example of PCA-SVM analysis with a reduced set (Nt = 24) is given in Fig. 10a. The SVM probability function and boundary are calculated considering only the training set (filled circles) and are then applied to the test subset (open circles). In this particular case, one COPD case (empty green circle) falls outside the p = 0.5 threshold and is thus wrongly labelled. Note that the data point positions in Fig. 10a are identical to those in Fig. 9a, since the PCA is carried out on the whole dataset. For LDA, the data reduction cannot be performed on the full dataset, because the unknown data lack a classification. After normalizing the whole dataset (as in PCA), we therefore split the sensor data directly into a training and a test set, and performed LDA on the training set only. The test set is then projected onto the LDA direction. An example of LDA analysis with a reduced set is given in Fig. 10b. In this case, the point distribution of the LDA plot differs from that of Fig. 9b, since the data reduction is evaluated on a different set.
In the example of Fig. 10b, two COPD cases appear as outliers among the test data (red circle). The results for the identification ratio and accuracy index are shown in Fig. 11. PCA-SVM performs well, with an identification ratio larger than 92% and an accuracy index greater than 86% even after the removal of nearly half of the dataset. However, the removal of just one data point in PCA can lead to detection errors, dropping the identification ratio to 94% for the N = 51 dataset. In contrast, the LDA discrimination performance is extremely good, remaining above 95% even with the reduced dataset. Accuracy and identification ratios are very similar, owing to the wide separation between classes and to the steepness of the probability functions in the inter-cluster region.
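
The repeated random hold-out procedure described above for LDA can be illustrated with a minimal Python sketch. It is not the authors' code: the data are synthetic (two Gaussian clusters in three dimensions), the class sizes, cluster shift, and number of repeats are arbitrary assumptions, and the identification ratio is assumed here to be the fraction of correctly labelled test samples. The function names (`fit_lda`, `holdout_identification_ratio`, `make_synthetic`) are hypothetical. A two-class Fisher discriminant is fitted on the training subset only, and the held-out points are then projected onto the resulting direction.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_lda(X, y):
    """Two-class Fisher discriminant: w = Sw^-1 (m1 - m0), midpoint threshold."""
    groups = {0: [x for x, lab in zip(X, y) if lab == 0],
              1: [x for x, lab in zip(X, y) if lab == 1]}
    m0, m1 = mean_vec(groups[0]), mean_vec(groups[1])
    d = len(m0)
    Sw = [[0.0] * d for _ in range(d)]  # within-class scatter matrix
    for rows, m in ((groups[0], m0), (groups[1], m1)):
        for x in rows:
            for i in range(d):
                for j in range(d):
                    Sw[i][j] += (x[i] - m[i]) * (x[j] - m[j])
    for i in range(d):
        Sw[i][i] += 1e-9  # tiny ridge term, an assumption for numerical safety
    w = solve(Sw, [a - b for a, b in zip(m1, m0)])
    threshold = 0.5 * (dot(w, m0) + dot(w, m1))
    return w, threshold

def holdout_identification_ratio(X, y, n_test, n_repeats, seed=0):
    """Fraction of held-out samples labelled correctly over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    correct = total = 0
    for _ in range(n_repeats):
        test = set(rng.sample(idx, n_test))
        X_train = [X[i] for i in idx if i not in test]
        y_train = [y[i] for i in idx if i not in test]
        w, threshold = fit_lda(X_train, y_train)  # training subset only
        for i in test:  # project held-out points onto the LDA direction
            predicted = 1 if dot(w, X[i]) > threshold else 0
            correct += int(predicted == y[i])
            total += 1
    return correct / total

def make_synthetic(n_per_class=26, dim=3, shift=4.0, seed=1):
    """Synthetic stand-in data; NOT the breath measurements of this study."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y

X, y = make_synthetic()
ratio = holdout_identification_ratio(X, y, n_test=24, n_repeats=200)
print(f"identification ratio on synthetic data: {ratio:.3f}")
```

Because the discriminant is refitted for every split, the projected positions of the points change from one split to the next, which is the same reason the point distribution in Fig. 10b differs from that in Fig. 9b.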

Fig. 11

Identification ratio (dotted line) and accuracy index (solid line) for PCA-SVM (black) and LDA (red), evaluated for different datasets. Bottom axis: training set dimension (N, with 28 < N < 51). Top axis: test dataset dimension (52-N).

In the present approach, SVM with a linear kernel turned out to be sufficient to separate the two expected clusters. This means that the data are linearly separable, i.e. the dataset can be split into two classes by a single straight line (in the PC1–PC2 2-dimensional plot) or by a single plane (in the PC1–PC2–PC3 3-dimensional plot). To explore non-linear kernels, which may in principle be required for larger datasets, we added new experimental data, enlarging the dataset to 130 measurements collected from 50 subjects (30 COPD + 20 healthy). As shown in the ESI file (Fig. S2†), cluster separation with a straight line (in the PC1–PC2 2-dimensional plot) or a plane (in the PC1–PC2–PC3 3-dimensional plot) still works with this enlarged dataset; for this reason we did not proceed further with classification based on non-linear kernels. The present results therefore support the application potential of this sensor array, even with a relatively small training dataset such as the one used in this study. While PCA still performs remarkably well and is computationally lighter, LDA may be preferable for practical applications, depending on the desired accuracy.
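
As an illustration of what linear separability means in practice, the following minimal Python sketch uses a perceptron, which is guaranteed to find a separating line or plane in a finite number of updates whenever one exists. This is a swapped-in technique for illustration only, not the linear-kernel SVM used in this work; the function name `perceptron_separates`, the epoch limit, and the toy points are all hypothetical.

```python
def perceptron_separates(X, y, max_epochs=1000):
    """Try to find w, b such that sign(w.x + b) matches the labels (0 or 1).

    Returns (True, w, b) if a full pass makes no mistakes. A False result
    only means that no separating hyperplane was found within max_epochs;
    it is a heuristic indication, not a proof of non-separability.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            s = 1 if label == 1 else -1
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if s * activation <= 0:  # misclassified or on the boundary
                w = [wi + s * xi for wi, xi in zip(w, x)]
                b += s
                mistakes += 1
        if mistakes == 0:
            return True, w, b
    return False, w, b

# Toy 2-dimensional points standing in for (PC1, PC2) scores.
separable_X = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]
separable_y = [0, 0, 0, 1, 1, 1]

# XOR-like layout: no single straight line separates the two classes.
xor_X = [[0, 0], [1, 1], [0, 1], [1, 0]]
xor_y = [0, 0, 1, 1]

found_separable, w, b = perceptron_separates(separable_X, separable_y)
found_xor, _, _ = perceptron_separates(xor_X, xor_y)
print(found_separable, found_xor)
```

A dataset of the second kind is the situation in which a non-linear kernel would become necessary.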

Conclusions

In conclusion, the overall sensing behaviour of an 8-element array of CNT-based sensors was investigated by exposing the array to a set of volatile biomarkers. Several exposures to ammonia (NH3), nitrogen dioxide (NO2), benzene (C6H6), and hydrogen sulphide (H2S) were carried out, and calibration curves were built for each of the 8 sensors in the array exposed to the 4 volatiles. Data were analyzed by PCA to assess the selectivity of the array and its capability to discriminate the four target gas/volatile molecules. The sensor performance was benchmarked against other CNT-based sensors for NH3 and NO2 detection, while the main characteristics of the e-nose were compared with an updated list of e-noses equipped with CNT-based sensors. Following this phase, the sensor array was exposed to exhaled breath samples obtained from COPD patients and healthy control volunteers. A combination of PCA, SVM, and LDA methods shows that the present sensor array can be trained to clearly discriminate healthy from COPD subjects, in spite of the relatively limited dataset. Accuracy indexes above 0.90 and 0.97 were obtained for PCA and LDA, respectively. Thus, the robustness of this approach supports its potential application in harsh environments where strict control over sample collection protocols may not be feasible. Overall, further developments in the field are expected with the use of graphene-based sensors[103,104] that, as in the case of CNTs, can operate at room temperature (RT), a requirement that is also drawing attention in the case of metal-oxide-based chemiresistors.[105]

Author contributions

G. Drera: formal analysis, methodology, software, writing – original draft; S. Freddi: formal analysis, writing – original draft, visualization, investigation; A. V. Emelianov: investigation, visualization; I. I. Bobrinetskiy: investigation, resources, writing – review & editing; M. Chiesa: resources, writing – review & editing; M. Zanotti: formal analysis, methodology, software; S. Pagliara: investigation, writing – review & editing; F. S. Fedorov: investigation, writing – review & editing; A. G. Nasibulin: funding, investigation, resources, writing – review & editing; P. Montuschi: conceptualization, funding, writing – review & editing; L. Sangaletti: conceptualization, funding, writing – original draft.

Conflicts of interest

There are no conflicts to declare.
