Shidiq Nur Hidayat, Trisna Julian, Agus Budi Dharmawan, Mayumi Puspita, Lily Chandra, Abdul Rohman, Madarina Julia, Aditya Rianjanu, Dian Kesumapramudya Nurputra, Kuwat Triyana, Hutomo Suryo Wasisto.
Abstract
Breath pattern analysis based on an electronic nose (e-nose), which is a noninvasive, fast, and low-cost method, has been continuously used for detecting human diseases, including the coronavirus disease 2019 (COVID-19). Nevertheless, having big data with many available features is not always beneficial because only a few of them are relevant and useful for distinguishing different breath samples (i.e., positive and negative COVID-19 samples). In this study, we develop a hybrid machine learning-based algorithm combining hierarchical agglomerative clustering analysis and the permutation feature importance method to improve the data analysis of a portable e-nose for COVID-19 detection (GeNose C19). Using this learning approach, we obtain an effective and optimum feature combination that halves the number of employed sensors without degrading the classification model performance. In cross-validation on the training data, the hybrid algorithm achieves accuracy, sensitivity, and specificity values of (86 ± 3)%, (88 ± 6)%, and (84 ± 6)%, respectively. On the testing data, all three metrics reach 87%. These results demonstrate the feasibility of using this hybrid filter-wrapper feature-selection method to optimize the GeNose C19 performance.
Keywords: Breath analysis; Electronic nose; Feature permutation importance; GeNose C19; Hierarchical agglomerative clustering; Machine learning
Year: 2022 PMID: 35659391 PMCID: PMC9110307 DOI: 10.1016/j.artmed.2022.102323
Source DB: PubMed Journal: Artif Intell Med ISSN: 0933-3657 Impact factor: 7.011
Fig. 1 Configuration of a portable e-nose for COVID-19 detection (GeNose C19). (a) Output signal characteristics of chemoresistive metal-oxide-semiconductor (MOS) gas sensors. The sensor conductivity changes because of redox reactions between the active MOS material and adsorbed gas molecules. The sensor signal during VOC exposure of the sensor surface is monitored in real time using data logging software (DAQ software). (b) Procedure to collect the breath samples and process the data utilizing an extra-trees classifier. A hybrid learning algorithm combining hierarchical agglomerative clustering (HAC) analysis and the permutation feature importance method enhances the GeNose C19 performance and simultaneously reduces the required sensor number.
Selective target gases for all chemoresistive sensors used in GeNose C19. Cross-sensitivity toward different gases has been a typical characteristic for such inorganic MOS gas sensors. In terms of selectivity, each sensor sensitively reacts to more than two target gases.
| Gas sensor | Selective target gases |
|---|---|
| S1 | Carbon monoxide, ethanol, hydrogen, isobutane, and methane |
| S2 | Ammonia, ethanol, hydrogen, hydrogen sulfide, and toluene |
| S3 | Ethanol, hydrogen, isobutane, and methane |
| S4 | Carbon monoxide, ethanol, hydrogen, isobutane, and methane |
| S5 | Carbon monoxide, ethanol, hydrogen, isobutane, methane, and propane |
| S6 | Carbon monoxide, ethanol, hydrogen, isobutane, methane, and propane |
| S7 | Carbon monoxide, ethanol, hydrogen, and methane |
| S8 | Acetone, benzene, carbon monoxide, ethanol, isobutane, methane, and n-hexane |
| S9 | Ammonia, ethanol, hydrogen, and isobutane |
| S10 | Chlorofluorocarbons, ethanol, and hydrofluorocarbons |
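The overlap between the nominal target gases in the table above can be checked with a short script. This is only an illustrative sketch: the dictionary is transcribed from the table, while the helper names `jaccard` and `identical_pairs` are hypothetical and not part of the GeNose C19 software. Identical target lists are a hint of possible redundancy, not proof that two sensors respond identically to breath.

```python
from itertools import combinations

# Target gases per sensor, transcribed from the table above.
TARGET_GASES = {
    "S1": {"carbon monoxide", "ethanol", "hydrogen", "isobutane", "methane"},
    "S2": {"ammonia", "ethanol", "hydrogen", "hydrogen sulfide", "toluene"},
    "S3": {"ethanol", "hydrogen", "isobutane", "methane"},
    "S4": {"carbon monoxide", "ethanol", "hydrogen", "isobutane", "methane"},
    "S5": {"carbon monoxide", "ethanol", "hydrogen", "isobutane", "methane", "propane"},
    "S6": {"carbon monoxide", "ethanol", "hydrogen", "isobutane", "methane", "propane"},
    "S7": {"carbon monoxide", "ethanol", "hydrogen", "methane"},
    "S8": {"acetone", "benzene", "carbon monoxide", "ethanol", "isobutane",
           "methane", "n-hexane"},
    "S9": {"ammonia", "ethanol", "hydrogen", "isobutane"},
    "S10": {"chlorofluorocarbons", "ethanol", "hydrofluorocarbons"},
}


def jaccard(a, b):
    """Jaccard similarity of two gas sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)


def identical_pairs(table):
    """Sensor pairs whose nominal target-gas lists are exactly the same."""
    return [(p, q) for p, q in combinations(table, 2) if table[p] == table[q]]


# Gases listed for every one of the ten sensors.
common_to_all = set.intersection(*TARGET_GASES.values())

print(identical_pairs(TARGET_GASES))
print(common_to_all)
```

In this table, S1/S4 and S5/S6 share identical target lists, which is in line with the positive correlations reported for those pairs in the Fig. 4 caption.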
Clinical characteristics of tested patients, including age, sex, and comorbid condition. The numbers of the RT-qPCR-confirmed positive and negative COVID-19 patients are nP = 230 and nN = 230, respectively.
| Characteristics | RT-qPCR-confirmed positive COVID-19 (nP = 230) | RT-qPCR-confirmed negative COVID-19 (nN = 230) | Total number |
|---|---|---|---|
| Age distribution (years old) | | | |
| 0–20 | 43 | 52 | 95 |
| 21–40 | 69 | 118 | 187 |
| 41–60 | 85 | 43 | 128 |
| 61–80 | 33 | 17 | 50 |
| Sex distribution | | | |
| Male | 127 | 161 | 288 |
| Female | 103 | 69 | 172 |
| Patients with symptoms | 64 | | |
| Comorbidities | | | |
| Respiratory problems | 38 | | |
| Thermoregulation problems | 26 | | |
| Anosmia and hypogeusia | 16 | | |
| Gastroenteritis problems | 11 | | |
| Systematic problems | 14 | | |
Fig. 2 Raw and preprocessed data of sensor output signals recorded by GeNose C19 from breath measurements. (a) Typical raw and (b) normalized sensor signals in the breath measurements recorded by the GeNose C19 DAQ software (2 s baseline time, 40 s sampling time, and 3 s purging time). The distributions of the (c) baseline and (d) temperature and relative humidity values calculated from all 460 training data. (e) Calculated feature values of all sensors (S1–S10) based on their area under the curve (AUC) after signal preprocessing steps. (f) Plot of the first two principal components (PC1 and PC2) showing the distributions of all P and N training data.
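The preprocessing chain named in the Fig. 2 caption (baseline, normalization, AUC feature) can be sketched as follows. The caption does not give the normalization formula or the sampling rate, so both are assumptions here: the signal is expressed as relative change against the mean of the baseline points, and the toy trace is sampled at a hypothetical 1 Hz. The function names are illustrative only.

```python
def normalize_to_baseline(signal, n_baseline):
    """Relative change against the mean of the first n_baseline points.

    Assumed normalization scheme; the actual GeNose C19 formula is not
    stated in the caption.
    """
    baseline = sum(signal[:n_baseline]) / n_baseline
    return [(x - baseline) / baseline for x in signal]


def trapezoid_auc(values, dt):
    """Area under a uniformly sampled curve using the trapezoidal rule."""
    return dt * sum((a + b) / 2.0 for a, b in zip(values, values[1:]))


# Toy sensor trace (arbitrary units): two baseline points, then a rise
# during sampling and a return toward baseline during purging.
raw = [2.0, 2.0, 3.0, 4.0, 3.0, 2.0]

normalized = normalize_to_baseline(raw, n_baseline=2)
auc_feature = trapezoid_auc(normalized, dt=1.0)

print(normalized)
print(auc_feature)
```

Applying the same two steps to each of the ten sensor traces would give one AUC value per sensor, i.e. a ten-element feature vector per breath sample.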
Performance of the hybrid learning-based classification model for different numbers of selected sensors, evaluated using 5-fold cross-validation repeated 10 times. The models with 5 and 10 sensors achieve similar accuracy, sensitivity, and specificity, demonstrating that the number of sensors used in GeNose C19 can be reduced.
| Number of sensors | Selected sensors | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| 2 | S4, S9 | 78 ± 3 | 78 ± 6 | 78 ± 7 |
| 3 | S4, S9, S10 | 83 ± 3 | 86 ± 4 | 80 ± 6 |
| 4 | S4, S9, S10, S2 | 85 ± 3 | 89 ± 5 | 82 ± 6 |
| 5 | S4, S9, S10, S2, S8 | 86 ± 3 | 88 ± 6 | 84 ± 6 |
| 6 | S4, S9, S10, S2, S8, S3 | 85 ± 3 | 87 ± 6 | 83 ± 6 |
| 10 | S1–S10 | 86 ± 3 | 87 ± 6 | 84 ± 6 |
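The evaluation protocol in the table above (5-fold cross-validation repeated 10 times, reporting accuracy, sensitivity, and specificity) could be reproduced along these lines with scikit-learn. This is a sketch under stated assumptions: the data are synthetic stand-ins generated with `make_classification`, not the study data; the extra-trees hyperparameters are arbitrary; and the mean and standard deviation are taken over all 50 folds, which may differ from how the authors aggregated their results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

# Synthetic stand-in for a 460 x 10 feature matrix (NOT the study data).
X, y = make_classification(n_samples=460, n_features=10, n_informative=5,
                           n_redundant=3, random_state=0)

# Sensitivity is recall of the positive class (label 1);
# specificity is recall of the negative class (label 0).
scoring = {
    "accuracy": "accuracy",
    "sensitivity": make_scorer(recall_score, pos_label=1),
    "specificity": make_scorer(recall_score, pos_label=0),
}

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
results = cross_validate(model, X, y, cv=cv, scoring=scoring)

# Mean and standard deviation (in percent) over the 50 validation folds.
summary = {
    name: (results[f"test_{name}"].mean() * 100,
           results[f"test_{name}"].std() * 100)
    for name in scoring
}
for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.0f} ± {std:.0f} %")
```

To compare sensor subsets as in the table, the same loop would be run on `X[:, selected_columns]` for each candidate subset.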
Fig. 3 Processing results of the positive (P) and negative (N) COVID-19 data using the hybrid learning method. (a) Dendrogram of the hierarchical agglomerative clustering (HAC) on the training data employing Ward's linkage. (b) Boxplot analysis of the permutation feature importance using the extra-trees classifier to obtain the importance value of each sensor for classifying the class labels of positive (P) and negative (N) COVID-19. Confusion matrix results and receiver operating characteristic (ROC) curves demonstrating learning model performances on the testing data when (c) the all-sensor (10) and (d) the 5-selected-sensor models are utilized.
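One common way to combine the two ingredients named in the Fig. 3 caption is to cluster features with Ward-linkage HAC (filter step) and then keep the feature with the highest permutation importance in each cluster (wrapper step). The sketch below follows that recipe; it is not confirmed to be the authors' exact procedure. The data are synthetic, the correlation-based distance `1 - |r|`, the `maxclust` cut at three clusters, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-in: 3 latent signals, each seen by two near-duplicate
# "sensors" (columns j and j + 3 are strongly correlated).
latent = rng.normal(size=(n, 3))
X = np.hstack([latent + 0.05 * rng.normal(size=(n, 3)),
               latent + 0.05 * rng.normal(size=(n, 3))])
y = (latent[:, 0] + 0.5 * latent[:, 1] > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Filter step: Ward-linkage HAC on a correlation-based feature distance.
corr = np.corrcoef(X_tr, rowvar=False)
dist = 1.0 - np.abs(corr)
dist = (dist + dist.T) / 2.0          # enforce exact symmetry
np.fill_diagonal(dist, 0.0)
linkage = hierarchy.ward(squareform(dist, checks=False))
cluster_ids = hierarchy.fcluster(linkage, t=3, criterion="maxclust")

# Wrapper step: permutation importance of an extra-trees model,
# measured on held-out data.
model = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
imp = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)

# Keep the most important feature from each cluster.
selected = []
for c in np.unique(cluster_ids):
    members = np.flatnonzero(cluster_ids == c)
    selected.append(int(members[np.argmax(imp.importances_mean[members])]))
selected = sorted(selected)

print("cluster ids:", cluster_ids)
print("selected feature columns:", selected)
```

Here the six synthetic features collapse to three representatives, mirroring the idea of halving the sensor count while retaining one informative sensor per group of correlated sensors.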
Fig. 4 Correlation plot among features on the training data using Ward's linkage method. S1 and S4 have a correlation value of 0.97, indicating a strong positive correlation: a high S4 response is accompanied by a high S1 output signal. In addition, S7 is strongly positively correlated with S1 and S4. S7 is also negatively correlated with S2, meaning that an increase in the response of one sensor is accompanied by a decrease in the response of the other. S5 and S6 are positively correlated with a value of 0.82, and both are positively correlated with S10, with correlation values of 0.72 and 0.74, respectively. Overall, several sensors exhibit high multicollinearity with one another, which makes it feasible to reduce the feature set through feature selection.
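A pairwise screen for multicollinearity like the one described in the Fig. 4 caption can be sketched with NumPy. The function name `strongly_correlated_pairs`, the 0.7 threshold, and the toy data are all assumptions for illustration; the caption does not state which correlation coefficient or cutoff the authors used, so Pearson correlation is assumed here.

```python
import numpy as np


def strongly_correlated_pairs(X, names, threshold=0.7):
    """List feature pairs whose absolute Pearson correlation >= threshold."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 2)))
    return pairs


# Toy data: B tracks A, C mirrors A with opposite sign, D is independent.
rng = np.random.default_rng(1)
base = rng.normal(size=500)
X = np.column_stack([
    base,
    base + 0.1 * rng.normal(size=500),
    -base + 0.1 * rng.normal(size=500),
    rng.normal(size=500),
])
names = ["A", "B", "C", "D"]

pairs = strongly_correlated_pairs(X, names)
for first, second, r in pairs:
    print(f"{first} vs {second}: r = {r}")
```

Both positive and negative correlations are flagged because either sign indicates that one feature carries much of the information of the other.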
Comparison of different e-nose technologies used for COVID-19 detection in exhaled breath. The compared parameters include sensor type, sensor number, breath sample number, positive rate of samples, measurement time, and results during assessment in exhaled breath test.
| Electronic nose (e-nose) technology | Number of sensors | Number of samples | Positive rate of samples | Measurement time | Results | Ref. |
|---|---|---|---|---|---|---|
| MOS sensor (PEN3 e-nose) | 10 | 503 | 5.4% | 80 s | 66.7% true positive rate | |
| MOS sensor (aeoNose) | 3 | 219 | 26.0% | 300 s | 86% sensitivity and 92% negative predictive value | |
| MOS sensor (SpiroNose) | 7 | 4510 | 7.7% | Not available | 93.1% ROC-AUC | |
| Multiplexed nanomaterial-based chemoresistive sensor | 8 | 130 | 37.7% | 3 s | 100% sensitivity and 61% specificity | |
| MOS sensor (GeNose C19) | 10 reduced to 5 | 460 | 50.0% | 45 s | (88 ± 6)% cross-validation sensitivity and | This work |