Rian Ka Praja, Molin Wongwattanakul, Patcharaporn Tippayawat, Wisitsak Phoksawat, Amonrat Jumnainsong, Kanda Sornkayasit, Chanvit Leelayuwat.
Abstract
In the aging process, the presence of interleukin (IL)-17-producing CD4+CD28-NKG2D+ T cells (termed pathogenic CD4+ T cells) is strongly associated with inflammation and the development of various diseases, so their presence needs to be monitored. Attenuated total reflectance-Fourier transform infrared (ATR-FTIR) spectroscopy empowered by machine learning has been a breakthrough in medical diagnostics. This study aimed to discriminate between elderly individuals with a low percentage (LP; ≤3%) and a high percentage (HP; ≥6%) of pathogenic CD4+CD28-NKG2D+IL17+ T cells using ATR-FTIR coupled with machine learning algorithms. ATR spectra of serum, exosomes, and HDL from both groups were analyzed. Based on principal component analysis (PCA), only exosome spectra in the 1700–1500 cm−1 region showed possible discrimination between the LP and HP groups. Partial least squares discriminant analysis (PLS-DA) using the 1700–1500 cm−1 region of exosome ATR spectra differentiated the two groups with 64% accuracy, 69% sensitivity, and 61% specificity. To improve classification performance, spectral models were then built with more advanced machine learning algorithms: J48 decision tree, support vector machine (SVM), random forest (RF), and neural network (NN). NN produced the best model, with 100% accuracy, 100% sensitivity, and 100% specificity using serum spectra in the 1800–900 cm−1 region. With the NN algorithm, exosome spectra in the 1700–1500 cm−1 region and in the combined 3000–2800 and 1800–900 cm−1 regions both reached 95% accuracy, with differing sensitivity and specificity. HDL spectra with the NN algorithm also performed well in the 1800–900 cm−1 region, with 97% accuracy, 100% sensitivity, and 95% specificity. This study demonstrates that ATR-FTIR coupled with machine learning algorithms can be used to study immunosenescence.
Furthermore, this approach could potentially be applied to monitor the presence of pathogenic CD4+ T cells in the elderly. Because of the limited number of samples in this study, a large-scale study is needed to obtain more robust classification models and to assess their true clinical diagnostic performance.
Keywords: aging; attenuated total reflectance-Fourier transform infrared (ATR-FTIR) spectroscopy; immunosenescence; interleukin (IL)-17; sub-population CD4+ T cells
Year: 2022 PMID: 35159268 PMCID: PMC8834052 DOI: 10.3390/cells11030458
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. Workflow for developing the classification models and testing their performance. All spectra from the two groups of samples were split 70:30 into training and testing datasets. The training set was used to build the classification models, and their performance was evaluated on the testing set.
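The 70:30 allocation described in Figure 1 can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the split is stratified (done separately within the LP and HP groups so both sets keep roughly the original group ratio) and uses a fixed random seed for reproducibility; `split_70_30` is a hypothetical helper name.

```python
import random

def split_70_30(spectra, labels, seed=42):
    """Split spectra into training (~70%) and testing (~30%) sets.

    Assumption: the split is stratified, i.e. performed separately within
    each group (LP, HP) so both sets keep roughly the original group ratio.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train_idx, test_idx = [], []
    for group in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == group]
        rng.shuffle(idx)
        n_train = (len(idx) * 7 + 5) // 10  # 70% of this group, rounded half up
        train_idx += idx[:n_train]
        test_idx += idx[n_train:]
    train = [(spectra[i], labels[i]) for i in train_idx]
    test = [(spectra[i], labels[i]) for i in test_idx]
    return train, test

if __name__ == "__main__":
    # Hypothetical data: 10 LP and 10 HP spectra with 3 absorbance values each.
    labels = ["LP"] * 10 + ["HP"] * 10
    spectra = [[0.1 * i, 0.2 * i, 0.3 * i] for i in range(20)]
    train, test = split_70_30(spectra, labels)
    print(len(train), len(test))
```

If several replicate spectra were recorded per donor (not stated in this record), splitting by donor rather than by spectrum would keep replicates of one person out of both sets at once.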
The confusion matrix (2 × 2 table).
| Predictive Model | Flow Cytometric Analysis: HP | Flow Cytometric Analysis: LP |
|---|---|---|
| HP | A | B |
| LP | C | D |
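The performance columns reported in the tables below (accuracy, sensitivity, specificity, PPV, NPV) follow from the four cells of this matrix. The sketch below assumes rows are model predictions, columns are the flow cytometric reference, and HP is the positive class, so A, B, C, and D are true positives, false positives, false negatives, and true negatives; the excerpt does not spell out these definitions, and the counts in the example are hypothetical.

```python
def diagnostic_metrics(a, b, c, d):
    """Performance metrics from the 2 x 2 confusion matrix.

    Assumed cell meanings (HP = positive class):
      a: predicted HP, flow cytometry HP  (true positive)
      b: predicted HP, flow cytometry LP  (false positive)
      c: predicted LP, flow cytometry HP  (false negative)
      d: predicted LP, flow cytometry LP  (true negative)
    Each ratio raises ZeroDivisionError if its denominator is zero.
    """
    return {
        "accuracy": (a + d) / (a + b + c + d),
        "sensitivity": a / (a + c),  # share of HP samples predicted HP
        "specificity": d / (b + d),  # share of LP samples predicted LP
        "ppv": a / (a + b),          # share of HP predictions that are truly HP
        "npv": d / (c + d),          # share of LP predictions that are truly LP
    }

if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    for name, value in diagnostic_metrics(18, 2, 1, 18).items():
        print(f"{name}: {value:.1%}")
```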
Figure 2. Results of spectral band area analysis of serum, exosome, and HDL spectra. Representative spectra from serum (blue), exosome (red), and HDL (green); the regions selected for band area analysis were 3000–2800 cm−1 (lipid; orange region), 1700–1500 cm−1 (protein; green region), and 1270–960 cm−1 (nucleic acid; blue region) (A). Comparison of spectral band areas between the LP and HP groups for serum (B), exosome (C), and HDL (D) spectra; * = p < 0.05; ** = p < 0.01; *** = p < 0.001.
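A band area such as those compared in Figure 2 is the integral of absorbance over a wavenumber window. The sketch below uses the trapezoidal rule over the three regions named in the caption; it assumes no baseline correction and equally weighted points, neither of which the caption specifies, and the flat test spectrum is hypothetical.

```python
def band_area(wavenumbers, absorbance, low, high):
    """Trapezoidal area under the absorbance curve between low and high cm-1.

    Points are sorted by wavenumber first, so spectra stored in descending
    order (common for FTIR exports) still give a positive area.
    No baseline correction is applied (an assumption).
    """
    pts = sorted((w, a) for w, a in zip(wavenumbers, absorbance) if low <= w <= high)
    return sum(0.5 * (a0 + a1) * (w1 - w0) for (w0, a0), (w1, a1) in zip(pts, pts[1:]))

# Regions from the Figure 2 caption, in cm-1.
REGIONS = {
    "lipid": (2800, 3000),
    "protein": (1500, 1700),
    "nucleic acid": (960, 1270),
}

def region_areas(wavenumbers, absorbance):
    """Band area for each caption region of one spectrum."""
    return {name: band_area(wavenumbers, absorbance, lo, hi)
            for name, (lo, hi) in REGIONS.items()}

if __name__ == "__main__":
    # Hypothetical flat spectrum: absorbance 1.0 every 2 cm-1 from 3000 down to 900.
    wn = list(range(3000, 899, -2))
    ab = [1.0] * len(wn)
    print(region_areas(wn, ab))
```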
Figure 3. Oxidative stress levels based on spectral band area ratios: the ratio of ν C=O to νas CH3 (lipid peroxidation) (A) and the ratio of ν C=O to amide II (protein carbonyl) (B). All data are shown as median with 95% CI.
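The two oxidative stress markers in Figure 3 are ratios of band areas. In the sketch below the integration windows for ν C=O, νas CH3, and amide II are hypothetical placeholders chosen near the usual positions of these bands; the caption does not give the windows the authors used.

```python
def band_area(wavenumbers, absorbance, low, high):
    """Trapezoidal area between low and high cm-1 (points sorted by wavenumber)."""
    pts = sorted((w, a) for w, a in zip(wavenumbers, absorbance) if low <= w <= high)
    return sum(0.5 * (a0 + a1) * (w1 - w0) for (w0, a0), (w1, a1) in zip(pts, pts[1:]))

# HYPOTHETICAL integration windows (cm-1); not taken from the study.
WINDOWS = {
    "C=O stretch": (1720, 1760),
    "CH3 asym. stretch": (2945, 2975),
    "amide II": (1500, 1580),
}

def oxidative_stress_ratios(wavenumbers, absorbance):
    """Band-area ratios used as oxidative stress markers for one spectrum."""
    area = {k: band_area(wavenumbers, absorbance, lo, hi) for k, (lo, hi) in WINDOWS.items()}
    return {
        # v C=O / vas CH3: lipid peroxidation marker (Figure 3A)
        "lipid_peroxidation": area["C=O stretch"] / area["CH3 asym. stretch"],
        # v C=O / amide II: protein carbonyl marker (Figure 3B)
        "protein_carbonyl": area["C=O stretch"] / area["amide II"],
    }

if __name__ == "__main__":
    # Hypothetical flat spectrum sampled every 1 cm-1.
    wn = list(range(900, 3001))
    print(oxidative_stress_ratios(wn, [1.0] * len(wn)))
```

Per-group summaries such as the medians plotted in Figure 3 could then be taken over each group's ratios, for example with `statistics.median`.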
Figure 4. Averaged second-derivative ATR-FTIR spectra with standard normal variate (SNV) normalization in the 3000–2800 cm−1 (pale pink box) and 1800–900 cm−1 (pale grayish-green box) regions. Comparison of averaged second-derivative spectra between the LP and HP groups for serum (A), exosome (B), and HDL (C) spectra. Band intensities were compared using an independent-samples t-test; arrows (→) mark bands whose intensity differs significantly. Blue and red represent the elderly groups with a low percentage (LP; ≤3%) and a high percentage (HP; ≥6%) of pathogenic CD4+ T cells, respectively.
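The preprocessing named in Figure 4 can be sketched as a smoothed second derivative followed by SNV. This sketch assumes a 5-point quadratic Savitzky–Golay second-derivative filter, evenly spaced wavenumbers, the derivative-then-SNV order, and the sample standard deviation for SNV; the caption states none of these details.

```python
import statistics

# 5-point quadratic Savitzky-Golay second-derivative coefficients (divide by 7).
SG2_COEFFS = (2, -1, -2, -1, 2)

def second_derivative(spectrum, spacing=1.0):
    """Smoothed second derivative; drops 2 points at each end.

    Assumes evenly spaced points; `spacing` is the wavenumber step.
    """
    out = []
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out.append(sum(c * v for c, v in zip(SG2_COEFFS, window)) / (7 * spacing ** 2))
    return out

def snv(spectrum):
    """Standard normal variate: centre on the spectrum's own mean, scale by its SD."""
    mu = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD (an assumption)
    return [(x - mu) / sd for x in spectrum]

def preprocess(spectrum, spacing=1.0):
    # Assumed order: second derivative first, then SNV on the derivative spectrum.
    return snv(second_derivative(spectrum, spacing))

if __name__ == "__main__":
    # For y = x^2 the second derivative is 2 everywhere.
    print(second_derivative([float(x * x) for x in range(10)]))
```

In a second-derivative spectrum, absorption maxima appear as minima, which sharpens overlapping bands before the group-wise comparison of band intensities.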
Figure 5. PCA of the 1700–1500 cm−1 FTIR exosome spectral range: PCA score plots (A) and PCA loading plots (B). The score plots show distinct clustering of the LP (blue box) and HP (pink box) groups. The loading plots identify the specific peaks that are important for distinguishing the LP and HP groups.
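The score and loading plots in Figure 5 come from projecting mean-centred spectra onto principal components. The sketch below computes only the first component (PC1) by power iteration on the covariance matrix, using the standard library; it illustrates the technique and is not the software the authors used. For real spectra with hundreds of wavenumbers, a linear-algebra routine such as NumPy's SVD would be far faster.

```python
import math

def first_principal_component(X, iters=1000, tol=1e-12):
    """PC1 loadings, scores, and eigenvalue via power iteration.

    X: list of spectra (rows) sampled at the same wavenumbers (columns).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centre each wavenumber
    # Sample covariance matrix between wavenumbers (p x p).
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # starting vector (assumed not orthogonal to PC1)
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        done = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if done:
            break
    # The sign of a PC is arbitrary: make the largest-magnitude loading positive.
    k = max(range(p), key=lambda j: abs(v[j]))
    if v[k] < 0:
        v = [-x for x in v]
    eigenvalue = sum(v[a] * C[a][b] * v[b] for a in range(p) for b in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]  # one PC1 score per spectrum
    return v, scores, eigenvalue
```

Plotting `scores` by group gives a one-dimensional score plot, and plotting the loadings `v` against wavenumber gives the loading plot whose extreme values point to the bands driving the separation.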
Figure 6. PLS-DA results: score plot of PLS-DA for the 1700–1500 cm−1 FTIR exosome spectral range (A), regression coefficients (B), and predictive results of the PLS-DA model built on the 1700–1500 cm−1 region (C). False predictions are marked with asterisks (*). The PLS-DA predictive model produced nine false-negative and five false-positive predictions.
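PLS-DA regresses a numeric class code on the spectra through a few latent components and then thresholds the continuous prediction to assign a class, which is how false positives and negatives like those in Figure 6 arise. The sketch below is a minimal NIPALS PLS1 implementation; the class coding (1 = HP, 0 = LP), the 0.5 threshold, and the number of components are assumptions, since the excerpt does not state them.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_plsda(X, y, n_components=2):
    """NIPALS PLS1 on a 0/1 class code (assumed: 1 = HP, 0 = LP)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(p)] for r in X]  # centred X residuals
    f = [v - y_mean for v in y]                             # centred y residuals
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # X^T y direction
        w_norm = math.sqrt(_dot(w, w))
        if w_norm == 0.0:
            break  # y is fully explained; no further components
        w = [v / w_norm for v in w]
        t = [_dot(row, w) for row in E]  # component scores
        tt = _dot(t, t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = _dot(f, t) / tt  # y loading
        E = [[E[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]  # deflate y
        components.append((w, loading, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}

def predict_plsda(model, x, threshold=0.5):
    """Return (continuous prediction, class label) for one spectrum."""
    e = [x[j] - model["x_mean"][j] for j in range(len(x))]
    y_hat = model["y_mean"]
    for w, loading, q in model["components"]:
        t = _dot(e, w)
        y_hat += q * t
        e = [e[j] - t * loading[j] for j in range(len(e))]
    return y_hat, int(y_hat >= threshold)

if __name__ == "__main__":
    # Hypothetical two-variable "spectra"; the first variable separates the groups.
    X = [[0.0, 0.0], [0.1, 1.0], [1.0, 0.0], [1.1, 1.0]]
    model = fit_plsda(X, [0, 0, 1, 1], n_components=1)
    print([predict_plsda(model, x)[1] for x in X])
```

Predictions on the testing set can then be tallied against the flow cytometric labels to count false positives and false negatives.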
Comparison of multiple advanced machine learning algorithms for classification models in serum samples.
| Sample | Region (cm−1) | Algorithm | Acc (%) | Sens (%) | Spec (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|---|
| Serum | 3000–2800 | J48 Decision Tree | 54 | 54 | 53 | 65 | 42 |
| | | RF | 51 | 53 | 50 | 50 | 53 |
| | | SVM | 44 | 46 | 38 | 60 | 26 |
| | | NN (4) | 51 | 53 | 50 | 50 | 53 |
| | 1800–900 | J48 Decision Tree | 72 | 80 | 67 | 60 | 84 |
| | | RF | 92 | 100 | 86 | 85 | 100 |
| | | SVM | 77 | 79 | 75 | 75 | 79 |
| | | NN (20) | 100 | 100 | 100 | 100 | 100 |
| | 1700–1500 | J48 Decision Tree | 69 | 75 | 65 | 60 | 79 |
| | | RF | 90 | 90 | 89 | 90 | 89 |
| | | SVM | 62 | 58 | 75 | 90 | 32 |
| | | NN (14) | 90 | 86 | 94 | 95 | 84 |
| | 1500–900 | J48 Decision Tree | 56 | 60 | 54 | 45 | 68 |
| | | RF | 90 | 90 | 89 | 90 | 89 |
| | | SVM | 72 | 74 | 70 | 70 | 74 |
| | | NN (12) | 97 | 100 | 95 | 95 | 100 |
| | 3000–2800 and 1800–900 | J48 Decision Tree | 74 | 81 | 70 | 65 | 84 |
| | | RF | 87 | 89 | 85 | 85 | 89 |
| | | SVM | 56 | 58 | 55 | 55 | 58 |
| | | NN (11) | 98 | 95 | 100 | 100 | 97 |
Abbreviations: Acc, accuracy; Sens, sensitivity; Spec, specificity; PPV, positive predictive value; NPV, negative predictive value; RF, random forest; SVM, support vector machine; NN, neural network. Values in parentheses after NN indicate the number of hidden layers set in the NN parameters. Values highlighted in grey indicate the best model in each spectral region.
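The footnote marks the best model in each spectral region, but the selection criterion is not stated. Assuming "best" means highest test accuracy, the selection can be reproduced from the serum accuracies transcribed from the table; ties are kept rather than broken arbitrarily, and the combined region is assumed to be 3000–2800 plus 1800–900 cm−1, as described in the abstract for exosomes.

```python
# Accuracy (%) for serum, transcribed from the serum table.
SERUM_ACCURACY = {
    "3000-2800": {"J48": 54, "RF": 51, "SVM": 44, "NN (4)": 51},
    "1800-900": {"J48": 72, "RF": 92, "SVM": 77, "NN (20)": 100},
    "1700-1500": {"J48": 69, "RF": 90, "SVM": 62, "NN (14)": 90},
    "1500-900": {"J48": 56, "RF": 90, "SVM": 72, "NN (12)": 97},
    "3000-2800 + 1800-900": {"J48": 74, "RF": 87, "SVM": 56, "NN (11)": 98},
}

def best_by_accuracy(table):
    """Return, per region, every algorithm tied for the highest accuracy."""
    best = {}
    for region, scores in table.items():
        top = max(scores.values())
        best[region] = sorted(name for name, acc in scores.items() if acc == top)
    return best

if __name__ == "__main__":
    for region, algorithms in best_by_accuracy(SERUM_ACCURACY).items():
        print(region, "->", ", ".join(algorithms))
```

Under this criterion the 1700–1500 cm−1 region has a tie between RF and NN (14), so a secondary criterion such as sensitivity or specificity would be needed to pick a single model there.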
Comparison of multiple advanced machine learning algorithms for classification models in exosome samples.
| Sample | Region (cm−1) | Algorithm | Acc (%) | Sens (%) | Spec (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|---|
| Exosome | 3000–2800 | J48 Decision Tree | 85 | 79 | 93 | 95 | 74 |
| | | RF | 74 | 75 | 74 | 75 | 74 |
| | | SVM | 72 | 71 | 72 | 75 | 68 |
| | | NN (14) | 77 | 79 | 75 | 75 | 79 |
| | 1800–900 | J48 Decision Tree | 82 | 81 | 83 | 85 | 79 |
| | | RF | 90 | 90 | 89 | 90 | 89 |
| | | SVM | 74 | 81 | 70 | 65 | 84 |
| | | NN (10) | 90 | 94 | 86 | 85 | 95 |
| | 1700–1500 | J48 Decision Tree | 67 | 67 | 67 | 70 | 63 |
| | | RF | 79 | 83 | 76 | 75 | 84 |
| | | SVM | 72 | 76 | 68 | 65 | 79 |
| | | NN (11) | 95 | 95 | 95 | 95 | 95 |
| | 1500–900 | J48 Decision Tree | 85 | 94 | 78 | 75 | 95 |
| | | RF | 87 | 86 | 89 | 90 | 84 |
| | | SVM | 72 | 76 | 68 | 65 | 79 |
| | | NN (16) | 92 | 90 | 94 | 95 | 89 |
| | 3000–2800 and 1800–900 | J48 Decision Tree | 79 | 83 | 76 | 75 | 84 |
| | | RF | 90 | 90 | 89 | 90 | 89 |
| | | SVM | 82 | 84 | 80 | 80 | 84 |
| | | NN (9) | 95 | 91 | 100 | 100 | 89 |
Abbreviations: Acc, accuracy; Sens, sensitivity; Spec, specificity; PPV, positive predictive value; NPV, negative predictive value; RF, random forest; SVM, support vector machine; NN, neural network. Values in parentheses after NN indicate the number of hidden layers set in the NN parameters. Values highlighted in grey indicate the best model in each spectral region.
Comparison of multiple advanced machine learning algorithms for classification models in HDL samples.
| Sample | Region (cm−1) | Algorithm | Acc (%) | Sens (%) | Spec (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|---|
| HDL | 3000–2800 | J48 Decision Tree | 69 | 79 | 64 | 55 | 84 |
| | | RF | 44 | 45 | 42 | 45 | 42 |
| | | SVM | 56 | 56 | 57 | 70 | 42 |
| | | NN (8) | 72 | 70 | 75 | 80 | 63 |
| | 1800–900 | J48 Decision Tree | 72 | 74 | 70 | 70 | 74 |
| | | RF | 85 | 85 | 84 | 85 | 84 |
| | | SVM | 74 | 73 | 76 | 80 | 68 |
| | | NN (14) | 97 | 100 | 95 | 95 | 100 |
| | 1700–1500 | J48 Decision Tree | 79 | 83 | 76 | 75 | 84 |
| | | RF | 74 | 75 | 74 | 75 | 74 |
| | | SVM | 51 | 52 | 50 | 60 | 42 |
| | | NN (8) | 79 | 77 | 82 | 85 | 74 |
| | 1500–900 | J48 Decision Tree | 92 | 95 | 90 | 90 | 95 |
| | | RF | 90 | 83 | 100 | 100 | 79 |
| | | SVM | 77 | 74 | 81 | 85 | 68 |
| | | NN (9) | 92 | 100 | 86 | 85 | 100 |
| | 3000–2800 and 1800–900 | J48 Decision Tree | 90 | 94 | 86 | 85 | 95 |
| | | RF | 82 | 84 | 80 | 80 | 84 |
| | | SVM | 69 | 67 | 73 | 80 | 58 |
| | | NN (15) | 90 | 100 | 83 | 80 | 100 |
Abbreviations: Acc, accuracy; Sens, sensitivity; Spec, specificity; PPV, positive predictive value; NPV, negative predictive value; RF, random forest; SVM, support vector machine; NN, neural network. Values in parentheses after NN indicate the number of hidden layers set in the NN parameters. Values highlighted in grey indicate the best model in each spectral region.
Prominent ATR-FTIR exosome spectral bands for discrimination of the LP and HP groups using PCA and PLS-DA [51,62,63,64,65].
| PCA peak (cm−1) | PLS-DA peak (cm−1) | Group | Assignment |
|---|---|---|---|
| 1651 | 1651 | HP | Amide I (α-helix) |
| 1541 | 1541 | HP | Amide II |
| 1670 | 1670 | LP | Amide I (anti-parallel β-sheet) |
| 1629 | 1626 | LP | β-sheet amide I region structure |
| 1558 | 1555 | LP | Ring base |
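Peak positions such as those listed above are typically read off PCA loading or PLS-DA regression-coefficient curves as local extrema. The sketch below assumes a peak is a strict local maximum of the absolute value above a hypothetical cut-off `min_abs`; the authors' exact peak-picking criterion is not given here, and which sign corresponds to LP or HP depends on the orientation of the fitted model. The example curve is hypothetical.

```python
def pick_peaks(wavenumbers, values, min_abs=0.0):
    """Return (wavenumber, value) pairs where |value| is a strict local maximum.

    `values` can be PCA loadings or PLS-DA regression coefficients sampled at
    `wavenumbers`; the sign of each returned value shows which side of the
    model the band pushes a sample towards.
    """
    mags = [abs(v) for v in values]
    peaks = []
    for i in range(1, len(values) - 1):
        if mags[i] > mags[i - 1] and mags[i] > mags[i + 1] and mags[i] >= min_abs:
            peaks.append((wavenumbers[i], values[i]))
    return peaks

if __name__ == "__main__":
    # Hypothetical loading curve over part of the amide I region.
    wn = [1700, 1690, 1680, 1670, 1660, 1650, 1640]
    loadings = [0.0, 0.1, 0.0, -0.3, 0.0, 0.5, 0.1]
    print(pick_peaks(wn, loadings, min_abs=0.2))
```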