Karen E Villagrana-Bañuelos, Carlos E Galván-Tejada, Jorge I Galván-Tejada, Hamurabi Gamboa-Rosales, José M Celaya-Padilla, Manuel A Soto-Murillo, Roberto Solís-Robles.
Abstract
Sudden infant death syndrome (SIDS) is the leading cause of death in infants under one year of age in developing countries. Even today, its etiology is unclear, and no biomarker is discriminative enough to predict the risk of suffering from it. In this work, using a public dataset on the lipidomic profile of babies who died from this syndrome and of a control group, a univariate analysis was performed with the Mann-Whitney U test to identify the features that discriminate between the two groups. Features with a p-value less than or equal to 0.05 were retained and used to build classification models: random forests (RF), logistic regression (LR), support vector machine (SVM), and naive Bayes (NB). Seventy percent of the data was used for model training with cross-validation (k = 5), and the remaining 30% was held out for a blind test, which simulates a real-life scenario, that is, a population unknown to the model. The best-performing model was RF, which in the blind test obtained an AUC of 0.9, a specificity of 1, and a sensitivity of 0.8. The proposed model provides the basis for building a computational SIDS risk prediction tool, which will contribute to prevention, and suggests lines of research for dealing with this pathology.
Keywords: SIDS; biomarker; glycerophospholipids; lipidomic; machine learning; metabolomic
Year: 2022 PMID: 35885829 PMCID: PMC9317003 DOI: 10.3390/healthcare10071303
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Figure 1. Flowchart of the steps followed to develop the machine learning models, up to their evaluation.
Grouped features according to their super class; ION C18 negative mode analysis.
| Group | Number of Features |
|---|---|
| Cardiolipins | 6 |
| Sphingolipids | 1 |
| Acids | 16 |
| Glycerophosphate | 1 |
| Phosphatidylcholine | 24 |
| Phosphatidylethanolamine | 24 |
| Phosphatidylglycerols | 15 |
| Phosphatidylinositols | 12 |
| Glycerophosphoserines | 9 |
| Lysophosphatidylethanolamine | 8 |
| Ether Phosphatidylethanolamines | 16 |
Grouped features according to their super class; ION C18 positive mode analysis.
| Group | Number of Features |
|---|---|
| Cholesterol esters | 12 |
| Diacylglycerols | 37 |
| Monoradylglycerols | 2 |
| Phosphatidylcholine | 37 |
| Phosphatidylethanolamine | 11 |
| Sphingomyelins | 43 |
| Triacylglycerols | 98 |
| Lysophosphatidylcholines | 25 |
| Ether Phosphatidylethanolamines | 4 |
| Ether Phosphatidylcholines | 9 |
Figure 2. Illustration of the cross-validation used in this work.
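The abstract describes a 70/30 hold-out split with 5-fold cross-validation on the training portion. The partitioning can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the hold-out is stratified by class and the folds are simple shuffled folds (the record does not say either way), and the function names, the seed, and the 20-sample toy labels are hypothetical.

```python
import random


def stratified_split(labels, test_fraction, seed):
    """Split sample indices into train/test, keeping class proportions (assumed)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def k_folds(indices, k, seed):
    """Yield (train, validation) index lists for k shuffled folds."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield training, validation


# Toy labels (hypothetical): 10 cases (1) and 10 controls (0).
labels = [1] * 10 + [0] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.30, seed=0)
folds = list(k_folds(train_idx, k=5, seed=0))

for fold_no, (training, validation) in enumerate(folds, start=1):
    print(fold_no, len(training), len(validation))
```

The blind-test indices in `test_idx` are never passed to `k_folds`, which mirrors the idea that the test population stays unknown to the model until the final evaluation.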
Evaluation metrics for each classifier of the standardized dataset.
| Classification Method | Features | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|
| RF | 410 | 0.2857 | 0.4444 | 0.5714 | 0 |
| RF | 21 | 0.9 | | 0.8000 | 1 |
| LR | 410 | 0.4500 | 0.5555 | 0.4000 | 0.7500 |
| LR | 21 | 0.7500 | 0.7777 | 0.8000 | 0.7500 |
| SVM | 410 | 0.7000 | 0.7777 | 0.8000 | 0.7500 |
| SVM | 21 | | 0.8888 | | 0.7500 |
| NB | 410 | 0.6750 | 0.6666 | 0.6000 | 0.7500 |
| NB | 21 | 0.8000 | 0.7777 | 0.6000 | |
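For readers who want to relate the accuracy, sensitivity, and specificity columns to one another, the sketch below computes all three from confusion-matrix counts. The function name and the counts are hypothetical: the counts are not taken from the paper and were chosen only so that sensitivity and specificity come out at 0.8 and 1, the RF values quoted in the abstract.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts."""
    positives = tp + fn
    negatives = tn + fp
    total = positives + negatives
    return {
        # Fraction of true cases that the classifier flags.
        "sensitivity": tp / positives if positives else float("nan"),
        # Fraction of true controls that the classifier clears.
        "specificity": tn / negatives if negatives else float("nan"),
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


# Hypothetical counts (an assumption, not reported data): 5 cases, 4 controls.
rf_like = binary_metrics(tp=4, fn=1, tn=4, fp=0)
print(rf_like)
```

With a test set this small, a single misclassified sample moves each metric by a large step, which is worth keeping in mind when comparing rows of the table.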
Figure 3. ROC curves of the four machine learning classification models with the 21 selected features (blind test results).
Features selected by Mann–Whitney U test of the lipidomic profile in SIDS.
| Features | Super Class | Main Class | Sub Class ¹ | Formula | p-Value |
|---|---|---|---|---|---|
| PC 40:7 | Glycerophospholipids | Glycerophosphocholines | PC | C | 0.00420 |
| PI 36:2 | Glycerophospholipids | Glycerophosphocholines | PC | C | 0.00420 |
| PE 35:0 | Glycerophospholipids | Glycerophosphoethanolamines | PE | C | 0.01060 |
| DG 34:1 | Glycerolipids | Diradylglycerols | DAG | C | 0.01308 |
| PC.38.7 | Glycerophospholipids | Glycerophosphocholines | PC | C | 0.01602 |
| PE 34:3 | Glycerophospholipids | Glycerophosphoethanolamines | PE | C | 0.02355 |
| TG 57:8 | Glycerolipids | Triradylglycerols | TAG | C | 0.02355 |
| CL 70:5 | Glycerophospholipids | Cardiolipins | CL | C | 0.02355 |
| SM 40:1 | Sphingolipids | Sphingomyelins | SM | C | 0.02826 |
| PC 30:2 | Glycerophospholipids | Glycerophosphocholines | PC | C | 0.02826 |
| PC 32:3 | Glycerophospholipids | Phosphatidylcholines | PC | C | 0.03372 |
| SM 36:2 | Sphingolipids | Sphingomyelins | SM | C | 0.03372 |
| PC 33:1 | Glycerophospholipids | Glycerophosphocholines | PC | C | 0.03372 |
| CE 18:2. | Sterol Lipids | Sterol esters | Chol | C | 0.03372 |
| DG 36:2 | Glycerolipids | Diradylglycerols | DAG | C | 0.03372 |
| PC 32:1 | Glycerophospholipids | Glycerophosphocholines | PC | C | 0.03999 |
| PG 36:3 | Glycerophospholipids | Glycerophosphoglycerols | PG | C | 0.04717 |
| CE 22:6 | Sterol Lipids | Sterol esters | Chol | C | 0.04717 |
| PC 40:10 | Glycerophospholipids | Glycerophosphocholines | PC | C | 0.04717 |
| PC 42:7 | Glycerophospholipids | Glycerophosphocholines | PC | C | 0.04717 |
| SM.30.1 | Sphingolipids | Sphingomyelins | SM | C | 0.04717 |
¹ PC: Phosphatidylcholines; PE: Phosphatidylethanolamines; DAG: Diacylglycerols; TAG: Triacylglycerols; CL: Cardiolipins; SM: Sphingomyelins; Chol: Cholesterol esters; PG: Phosphatidylglycerols.
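The univariate filter behind this table (keep a feature when its Mann-Whitney U test p-value is at most 0.05) can be sketched as below. This is a standard-library illustration, not the authors' code. It assumes a two-sided test using the normal approximation with tie and continuity corrections, so its p-values may differ from those of an exact test on small samples. The function names, feature names, and intensity values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties get the average rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation. Returns (U, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in tie_sizes)
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    return u1, math.erfc(z / math.sqrt(2))


def select_features(groups, alpha=0.05):
    """Keep features whose p-value is <= alpha, sorted by ascending p-value."""
    scored = [(name, mann_whitney_u(cases, controls)[1])
              for name, (cases, controls) in groups.items()]
    return sorted([(n, p) for n, p in scored if p <= alpha], key=lambda t: t[1])


# Toy intensities (hypothetical): feature -> (case values, control values).
features = {
    "lipid_a": ([1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0]),
    "lipid_b": ([1.0, 3.0, 5.0, 7.0, 9.0], [2.0, 4.0, 6.0, 8.0, 10.0]),
}
selected = select_features(features, alpha=0.05)
print(selected)
```

Here `lipid_a` separates the two groups completely and passes the filter, while `lipid_b` interleaves them and is dropped.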