Chi-Hsiang Huang, Chian Zeng, Yi-Chia Wang, Hsin-Yi Peng, Chia-Sheng Lin, Che-Jui Chang, Hsiao-Yu Yang.
Abstract
Lung cancer is the leading cause of cancer death around the world, and lung cancer screening remains challenging. This study aimed to develop a breath test for the detection of lung cancer using a chemical sensor array and a machine learning technique. We conducted a prospective study that enrolled lung cancer cases and non-tumour controls between 2016 and 2018 and analysed alveolar air samples using carbon nanotube sensor arrays. A total of 117 cases and 199 controls were enrolled, of which 72 subjects were excluded because of cancer at another site, benign lung tumours, metastatic lung cancer, carcinoma in situ, minimally invasive adenocarcinoma, prior chemotherapy, or other diseases. Subjects enrolled in 2016 and 2017 were used for model derivation and internal validation. The model was externally validated in subjects recruited in 2018. Diagnostic accuracy was assessed using the pathological reports as the reference standard. In the external validation, the areas under the receiver operating characteristic curve (AUCs) were 0.91 (95% CI = 0.79–1.00) by linear discriminant analysis and 0.90 (95% CI = 0.80–0.99) by the support vector machine technique. The combination of the sensor array technique and machine learning can detect lung cancer with high accuracy.
Keywords: electronic nose; lung cancer; sensor array
Year: 2018 PMID: 30154385 PMCID: PMC6164114 DOI: 10.3390/s18092845
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Schematic of the system framework and sample collection.
Figure 2. Flow diagram depicting the inclusion and exclusion of the study subjects. We employed an independent external validation set and conducted a repeated double cross-validation. The repeated double cross-validation used two nested loops. The inner loop used the study subjects enrolled between 2016 and 2017 as a calibration set for model selection and parameter optimization; these subjects were divided into a training set (80%) and an internal validation set (20%). The outer loop applied the prediction model established from the calibration set to the study subjects enrolled in 2018 as an external validation.
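The data partitioning described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `temporal_split`, the seeded shuffle, and the rounding rule for the 80/20 split are all assumptions, and the source does not say whether the split was stratified by case status.

```python
import random
from typing import Dict, List, Sequence


def temporal_split(
    years: Sequence[int],
    external_year: int = 2018,
    train_fraction: float = 0.8,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Split subject indices by enrolment year (hypothetical helper).

    Subjects enrolled before `external_year` form the calibration set,
    which is shuffled and divided into training and internal validation
    subsets. Subjects enrolled in `external_year` are held out entirely
    for external validation.
    """
    calibration = [i for i, y in enumerate(years) if y < external_year]
    external = [i for i, y in enumerate(years) if y == external_year]

    # Seeded shuffle so that one repeat of the inner loop is reproducible.
    rng = random.Random(seed)
    rng.shuffle(calibration)

    n_train = round(train_fraction * len(calibration))
    return {
        "train": sorted(calibration[:n_train]),
        "internal": sorted(calibration[n_train:]),
        "external": external,
    }


# Toy enrolment years, invented for illustration only.
example_years = [2016] * 6 + [2017] * 4 + [2018] * 5
example_split = temporal_split(example_years)
```

Calling `temporal_split` with different `seed` values yields different training and internal validation subsets while leaving the 2018 external set untouched, which is one way to realise the "repeated" part of the scheme.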
Demographic characteristics of the study subjects.
| Characteristics | Lung Cancer Cases | Non-Tumour Controls |
|---|---|---|
| Age (year), mean (SD) | 65.3 (8.8) | 53.5 (16.1) |
| Male, no. (%) | 12 (21.4) | 106 (56.4) |
| Cigarette smoking | ||
| Pack-years, mean (SD) | 21.0 (10.7) | 20.6 (18.3) |
| Smoking status | ||
| Current smokers, no. (%) | 2 (3.6) | 25 (13.3) |
| Former smokers, no. (%) | 8 (14.3) | 11 (5.9) |
| Never smoked, no. (%) a | 44 (78.6) | 150 (79.8) |
| Second-hand smokers, no. (%) | 2 (3.6) | 2 (1.1) |
| Tumour histological type | ||
| Squamous cell carcinoma, no. (%) | 1 (1.8%) | |
| Adenocarcinoma, no. (%) | 52 (92.9%) | |
| Small cell lung cancer, no. (%) | 1 (1.8%) | |
| Other carcinomas, no. (%) | 2 (3.6%) | |
| Clinical stage | ||
| I | 37 (66.1%) | |
| II | 7 (12.5%) | |
| III | 11 (19.6%) | |
| IV | 1 (1.8%) | |
a “Never smoked” means having smoked fewer than 20 packs of cigarettes in a lifetime or less than one cigarette per day for one year.
Diagnostic accuracy of the E-nose.
| Model | Sensitivity | Specificity | PPV | NPV | FP | FN | Accuracy |
|---|---|---|---|---|---|---|---|
| LDA internal validation | 100.0% | 88.6% | 60.0% | 100.0% | 11.4% | 0.0% | 90.2% |
| LDA external validation | 75.0% | 96.6% | 90.0% | 90.3% | 3.4% | 25.0% | 85.4% |
| SVM internal validation | 92.3% | 92.9% | 85.7% | 96.3% | 7.1% | 7.7% | 92.7% |
| SVM external validation | 83.3% | 86.2% | 71.4% | 92.6% | 13.8% | 16.7% | 85.4% |
LDA, linear discriminant analysis; SVM, support vector machine; PPV, positive predictive value; NPV, negative predictive value; FP, false-positive rate; FN, false-negative rate.
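The columns of the table above follow from the four cells of a confusion matrix. The sketch below shows the arithmetic under the standard definitions; the function name `diagnostic_metrics` is hypothetical. The source does not report raw counts, so the counts in the example are an assumption, chosen only because they reproduce the percentages listed for the SVM external validation row.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard diagnostic accuracy measures as fractions in [0, 1]."""
    cases = tp + fn          # subjects with lung cancer (reference standard)
    controls = tn + fp       # subjects without lung cancer
    test_pos = tp + fp       # subjects the model flags as positive
    test_neg = tn + fn       # subjects the model flags as negative
    if min(cases, controls, test_pos, test_neg) == 0:
        raise ValueError("each row and column total must be non-zero")

    return {
        "sensitivity": tp / cases,
        "specificity": tn / controls,
        "ppv": tp / test_pos,
        "npv": tn / test_neg,
        "fp_rate": fp / controls,   # equals 1 - specificity
        "fn_rate": fn / cases,      # equals 1 - sensitivity
        "accuracy": (tp + tn) / (cases + controls),
    }


# Assumed counts (not reported in the source): 12 cases and 29 controls.
svm_external = diagnostic_metrics(tp=10, fp=4, tn=25, fn=2)
for name, value in svm_external.items():
    print(f"{name}: {100 * value:.1f}%")
```

Because the false-positive rate is the complement of specificity and the false-negative rate is the complement of sensitivity, each pair in a row of the table should sum to 100%.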
Figure 3. Receiver operating characteristic curves for lung cancer in the internal and external validation sets determined by LDA and SVM. The internal validation shows high accuracy by both linear and non-linear methods. The accuracy decreases slightly in the external validation.
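For readers unfamiliar with the area under the ROC curve, the sketch below computes it from classifier scores using its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name `auc_from_scores` and the example scores are hypothetical, and this is not necessarily how the authors computed their AUCs or confidence intervals.

```python
from typing import Sequence


def auc_from_scores(case_scores: Sequence[float],
                    control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")

    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0      # case ranked above control
            elif c == k:
                wins += 0.5      # tie counts as half
    return wins / (len(case_scores) * len(control_scores))


# Invented scores for illustration only; not data from the study.
toy_auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

An AUC of 1.0 means every case scores above every control, and 0.5 corresponds to chance-level ranking.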