Wei Xu1,2, Jixian Lin3, Ming Gao4,5, Yuhan Chen4,5, Jing Cao1,2, Jun Pu1,2, Lin Huang6, Jing Zhao3, Kun Qian1,2.
Abstract
Stroke is a leading cause of mortality and disability worldwide, expected to result in 61 million disability-adjusted life-years in 2020. Rapid diagnosis is central to stroke management, enabling early prevention and medical treatment. Serum metabolic fingerprints (SMFs) reflect underlying disease progression and are predictive of patient phenotypes. Deep learning (DL) that encodes SMFs together with clinical indexes outperforms single biomarkers, but its predictions are difficult to interpret through feature selection. Herein, rapid computer-aided diagnosis of stroke is performed using SMF based multi-modal recognition by DL, combining adaptive machine learning with a novel feature selection approach. SMFs are extracted by nano-assisted laser desorption/ionization mass spectrometry (LDI MS), consuming 100 nL of serum in seconds. A multi-modal recognition is constructed by integrating SMFs and clinical indexes, with an enhanced area under curve (AUC) of up to 0.845 for stroke screening compared to single-modal diagnosis using only SMFs or only clinical indexes. The interpretability of the DL prediction is addressed by selecting 20 key metabolite features with differential regulation through a saliency map approach, shedding light on the molecular mechanisms in stroke. The approach highlights the emerging role of DL in precision medicine and suggests an expanding utility for computational analysis of SMFs in stroke screening.
Keywords: deep learning; diagnostics; mass spectrometry; metabolic fingerprints; stroke
Year: 2020 PMID: 33173737 PMCID: PMC7610260 DOI: 10.1002/advs.202002021
Source DB: PubMed Journal: Adv Sci (Weinh) ISSN: 2198-3844 Impact factor: 16.806
Figure 1. Overall schematics for extraction of serum metabolic fingerprints (SMFs) towards stroke diagnosis by deep learning (DL). a) Only 100 nL of native serum was directly loaded onto a microarray without any labeling, derivatization, or chromatography procedures. b) A Nd:YAG laser (355 nm) then irradiated the analyte microarray, driving the nano‐assisted laser desorption/ionization (LDI) process to obtain raw mass spectra. c) SMFs were extracted from the discovery cohort and validation cohort for downstream analysis. The DL algorithm was employed to achieve direct d) SMF based diagnosis using only SMFs as inputs and e) SMF based multi‐modal diagnosis integrating SMFs with clinical indexes. Classification performance was evaluated in the discovery cohort and the validation cohort. Key metabolite features were selected by a DL based saliency map approach.
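The saliency map idea mentioned in the caption can be illustrated with a minimal sketch. It assumes a plain logistic model as a stand-in for the network (the record does not describe the actual network at this level of detail), and the names `predict`, `saliency`, and `top_k_features`, as well as the toy weights and samples, are hypothetical. Each feature is scored by the magnitude of the gradient of the predicted probability with respect to that input, averaged over samples, and the highest-scoring features are kept.

```python
import math

def predict(weights, bias, x):
    """Predicted probability from a logistic model (stand-in for the DL network)."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def saliency(weights, bias, x):
    """Absolute gradient |dp/dx_j| of the output w.r.t. each input feature."""
    p = predict(weights, bias, x)
    return [abs(p * (1.0 - p) * w) for w in weights]

def top_k_features(weights, bias, samples, k):
    """Rank features by mean saliency over all samples and keep the top k indices."""
    n_features = len(weights)
    totals = [0.0] * n_features
    for x in samples:
        for j, s in enumerate(saliency(weights, bias, x)):
            totals[j] += s
    mean = [t / len(samples) for t in totals]
    return sorted(range(n_features), key=lambda j: mean[j], reverse=True)[:k]

# Hypothetical toy model with 5 input features (not the 881 m/z signals).
weights = [0.1, -2.0, 0.0, 1.5, -0.3]
bias = 0.0
samples = [[1.0, 0.5, 0.2, 0.1, 0.0], [0.0, 1.0, 0.3, 0.8, 0.5]]
top = top_k_features(weights, bias, samples, k=2)
```

For this stand-in model the ranking reduces to the size of the weights; in a deep network the gradient depends on each input, which is why the scores are averaged over samples.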
Figure 2. Diagnosis of stroke by SMF based diagnosis with DL. a) Demographics of 344 clinical samples in the discovery cohort (blue) and validation cohort (red). The age and gender of the different groups were matched with no significant difference (p > 0.05). b) Typical mass spectra from a healthy control and a stroke patient by nano‐assisted LDI mass spectrometry (MS), with m/z ranging from 100 to 1000 Da. c) SMFs were extracted from raw mass spectra of 172 healthy controls and 172 stroke patients, each containing 881 m/z signals. d) Stroke network (SN) layout. The SN consisted of i) input data using SMFs (yellow), ii) four locally connected 1D layers as the feature extraction part (blue), iii) a stacked nonlinear feature interaction layer providing additional nonlinear transformations (green), and iv) a softmax classification layer that models SMFs‐to‐disease data for classification probability output (pink). e) A sample‐level plot stratifying healthy controls (red) and stroke patients (blue) for stroke screening in the validation cohort (n = 69; 35/34, controls/patients). The black dashed line indicates the threshold, determined by maximizing the Youden index in the discovery cohort. f) Receiver operating characteristic (ROC) curves using SN (red), least absolute shrinkage and selection operator (LASSO, blue), random forest (RF, pink), support vector machine (SVM, orange), and orthogonal partial least squares discriminant analysis (OPLS‐DA, black) to distinguish between controls and stroke patients in the validation cohort (n = 69; 35/34, controls/patients).
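The threshold in panel e) is chosen on the discovery cohort by maximizing the Youden index, J = sensitivity + specificity − 1, and then applied unchanged to the validation cohort. A minimal sketch of that selection follows; the function name `youden_threshold` and the toy scores and labels are hypothetical, and the rule "positive when score >= threshold" is an assumption.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    scores: predicted probability of stroke per sample
    labels: 1 for patient, 0 for control
    A sample is called positive when score >= threshold (assumed convention).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical toy "discovery" scores; not data from the study.
discovery_scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
discovery_labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, j = youden_threshold(discovery_scores, discovery_labels)

# The threshold is then fixed and reused on held-out samples.
validation_scores = [0.30, 0.45, 0.70]
validation_calls = [1 if s >= threshold else 0 for s in validation_scores]
```

Fixing the threshold on the discovery cohort keeps the validation estimate free of any tuning on the held-out samples.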
Comparison of diagnostic performance using different data inputs and algorithms
| Input data | Algorithm | Cohort | AUC (95% CI) | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Clinical indexes | SN | Discovery | 0.576 (0.508–0.643) | 34.78% | 70.80% |
| Clinical indexes | SN | Validation | 0.703 (0.576–0.829) | 67.65% | 74.28% |
| Serum metabolic fingerprints (SMFs) | SN | Discovery | 0.738 (0.678–0.799) | 71.01% | 75.18% |
| Serum metabolic fingerprints (SMFs) | SN | Validation | 0.803 (0.698–0.907) | 73.53% | 77.14% |
| Serum metabolic fingerprints (SMFs) | LASSO | Discovery | 0.600 (0.533–0.667) | 68.12% | 53.29% |
| Serum metabolic fingerprints (SMFs) | LASSO | Validation | 0.703 (0.580–0.825) | 70.59% | 62.86% |
| Serum metabolic fingerprints (SMFs) | RF | Discovery | 0.559 (0.491–0.626) | 73.19% | 32.12% |
| Serum metabolic fingerprints (SMFs) | RF | Validation | 0.657 (0.528–0.787) | 82.35% | 51.43% |
| Serum metabolic fingerprints (SMFs) | SVM | Discovery | 0.502 (0.433–0.571) | 33.58% | 66.42% |
| Serum metabolic fingerprints (SMFs) | SVM | Validation | 0.653 (0.523–0.783) | 76.47% | 51.43% |
| Serum metabolic fingerprints (SMFs) | OPLS‐DA | Discovery | 0.712 (0.651–0.773) | 83.94% | 44.20% |
| Serum metabolic fingerprints (SMFs) | OPLS‐DA | Validation | 0.624 (0.492–0.757) | 74.29% | 52.94% |
| SMF based multi‐modal data | CSN | Discovery | 0.739 (0.679–0.800) | 74.64% | 72.26% |
| SMF based multi‐modal data | CSN | Validation | 0.845 (0.745–0.944) | 88.24% | 80.00% |
| Metabolite features | SN | Discovery | 0.711 (0.648–0.773) | 75.36% | 69.34% |
| Metabolite features | SN | Validation | 0.790 (0.681–0.899) | 73.53% | 80.00% |
SN: stroke network, LASSO: least absolute shrinkage and selection operator, RF: random forest, SVM: support vector machine, OPLS‐DA: orthogonal partial least squares discriminant analysis, CSN: clinical stroke network;
AUC refers to the area under the receiver operating characteristic (ROC) curve;
Sensitivity refers to the ratio of the number of true positives to the total number of patients;
Specificity refers to the ratio of the number of true negatives to the total number of controls.
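The three metrics defined in the notes above can be computed as in the following minimal sketch. The function names and the toy scores are hypothetical, the 0.5 cut-off is an arbitrary choice for illustration, and AUC is computed here through its rank interpretation (the probability that a randomly chosen patient scores higher than a randomly chosen control), which may differ from the procedure the authors used.

```python
def sensitivity_specificity(predicted, labels):
    """Sensitivity = TP / patients; specificity = TN / controls."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg

def auc(scores, labels):
    """AUC as the fraction of (patient, control) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 4 patients (label 1) and 4 controls (label 0).
scores = [0.9, 0.8, 0.3, 0.6, 0.7, 0.2, 0.4, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(predicted, labels)
area = auc(scores, labels)
```

Sensitivity and specificity depend on the chosen cut-off, whereas AUC summarizes ranking quality across all cut-offs, which is why the table reports both.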
Figure 3. Diagnosis of stroke by SMF based multi‐modal diagnosis with DL. a) Schematic workflow for the integration of SMFs with clinical indexes for multi‐modal recognition (clinical stroke network (CSN)). Diagnosis by SMF based multi‐modal data (CSN) as input, including a sample‐level plot stratifying healthy controls (red) and stroke patients (blue) for b) the discovery cohort (n = 275; 137/138, controls/patients) and c) the validation cohort (n = 69; 35/34, controls/patients), and d) ROC curves showing the diagnostic performance of the input data in the discovery (blue) and validation (red) cohorts. Diagnosis by clinical indexes as input, including a sample‐level plot stratifying healthy controls (red) and stroke patients (blue) for e) the discovery cohort (n = 275; 137/138, controls/patients) and f) the blind test cohort (n = 69; 35/34, controls/patients), and g) ROC curves showing the diagnostic performance of the input data in the discovery (blue) and validation (red) cohorts. The black dashed line in (b,c) and (e,f) indicates the threshold, determined by maximizing the Youden index in the discovery cohort.