Jacopo Troisi, Antonio Mollo, Martina Lombardi, Giovanni Scala, Sean M Richards, Steven J K Symes, Antonio Travaglino, Daniele Neola, Umberto de Laurentiis, Luigi Insabato, Attilio Di Spiezio Sardo, Antonio Raffone, Maurizio Guida.
Abstract
Endometrial cancer (EC) is the most common gynecological neoplasm in high-income countries. Five-year survival rates are related to stage at diagnosis, but no validated screening tests are currently available in clinical practice. The metabolome offers an unprecedented overview of the molecules underlying EC. In this study, we aimed to validate a metabolomics signature as a screening test for EC on a large study population of symptomatic women. Serum samples collected from women scheduled for gynecological surgery (n = 691) were separated into training (n = 90), test (n = 38), and validation (n = 563) sets. The training set was used to train seven classification models. The best classification performance during the training phase was achieved by the PLS-DA model (96% accuracy). The subsequent screening test was based on an ensemble machine learning algorithm that summed all the voting results of the seven classification models, statistically weighted by each model's classification accuracy and confidence. The efficiency and accuracy of these models were evaluated using serum samples taken from 871 women who underwent endometrial biopsies. The EC serum metabolomes were characterized by lower levels of serine, glutamic acid, phenylalanine, and glyceraldehyde 3-phosphate. Our results illustrate that the serum metabolome can be an inexpensive, non-invasive, and accurate EC screening test.
Keywords: endometrial cancer; ensemble machine learning; metabolomics; oncological screening
Year: 2022 PMID: 36139068 PMCID: PMC9496630 DOI: 10.3390/biom12091229
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Characteristics of subjects enrolled in the second validation set (mean ± standard deviation or number and %). Abbreviations used are CTRL: healthy controls, EC: endometrial cancer, BMI: body mass index, bpm: beats per minute, SBP: systolic blood pressure, DBP: diastolic blood pressure.
| | CTRL | Non-Cancer Diseases | Other Cancers | EC |
|---|---|---|---|---|
| Sample size (n) | 171 | 473 | 101 | 126 |
| Age (y) | 45.4 ± 19.9 | 42.3 ± 12.2 * § | 56.7 ± 12.7 * § | 62.1 ± 10.3 * |
| Smoke (%) | 23.5 | 31.5 * § | 33.0 * § | 17.7 |
| Weight (Kg) | 66.9 ± 12.3 | 67.2 ± 13.9 § | 67.8 ± 15.0 § | 79.2 ± 17.1 * |
| Height (cm) | 160.5 ± 12.9 | 160.8 ± 20.7 | 153.3 ± 32.3 | 155.2 ± 31.1 |
| BMI (kg/m²) | 27.7 ± 19.4 | 25.2 ± 5.1 * § | 26.7 ± 6.2 | 30.5 ± 6.1 |
| Hypertension (%) | 22.8 | 14.6 § | 39.6 | 53.2 * |
| Diabetes (%) | 4.9 | 4.1 § | 6.1 § | 16.4 * |
| Hypercholesterolemia (%) | 8.7 | 4.5 * | 9.0 | 9.8 |
| Hypertriglyceridemia (%) | 0.0 | 0.6 § | 2.0 * | 1.6 * |
| Hyperuricemia (%) | 0.0 | 0.6 | 2.0 * | 0.0 |
| Vasculopathies (%) | 6.8 | 7.1 § | 9.1 * § | 13.1 * |
| Cholecystectomy (%) | 6.8 | 3.8 * | 4.0 | 6.5 |
| Endometrial thickness (mm) | 6.8 ± 5.0 | 8.0 ± 7.8 * § | 5.0 ± 5.8 * § | 13.0 ± 10.4 * |
| Abnormal uterine bleeding (%) | 14.9 | 15.8 § | 8.3 * § | 37.7 * |
| SBP (mmHg) | 118.9 ± 11.7 | 117.7 ± 9.8 § | 120.3 ± 10.6 | 123.8 ± 13.4 |
| DBP (mmHg) | 76.6 ± 6.4 | 76.0 ± 7.0 | 77.7 ± 5.9 | 77.1 ± 7.1 |
| Heart rate (bpm) | 72.4 ± 6.5 | 73.2 ± 5.3 § | 75.6 ± 7.3 § | 77 ± 6.1 * |
* Indicates p-value < 0.05 compared to CTRL; § indicates p-value < 0.05 compared to EC.
Diagnostic performance of the individual and ensemble machine learning algorithms on the validation set from the first enrollment. Abbreviations used are NB: Naïve Bayes, GLM: Generalized Linear Model, FLM: Fast Large Margin, DL: Deep Learning, DT: Decision Tree, RF: Random Forest, PLS-DA: Partial Least Squares Discriminant Analysis, EML: Ensemble Machine Learning, S: sensitivity, Sp: specificity, PLR: positive likelihood ratio, NLR: negative likelihood ratio, NPV: negative predictive value, PPV: positive predictive value, A: accuracy, ND: not determinable.
| Model | S | Sp | PLR | NLR | NPV | PPV | A |
|---|---|---|---|---|---|---|---|
| NB | 0.74 ± 0.10 | 0.94 ± 0.06 | 12.53 | 0.28 | 0.76 ± 0.09 | 0.93 ± 0.06 | 0.83 |
| GLM | 0.90 ± 0.07 | 0.88 ± 0.08 | 7.20 | 0.11 | 0.88 ± 0.08 | 0.90 ± 0.07 | 0.89 |
| FLM | 1.00 ± 0.00 | 0.24 ± 0.10 | 1.31 | 0.00 | 1.00 ± 0.00 | 0.61 ± 0.09 | 0.65 |
| DL | 1.00 ± 0.00 | 0.63 ± 0.12 | 2.67 | 0.00 | 1.00 ± 0.00 | 0.77 ± 0.08 | 0.83 |
| DT | 0.95 ± 0.05 | 0.88 ± 0.08 | 8.05 | 0.06 | 0.94 ± 0.06 | 0.90 ± 0.07 | 0.92 |
| RF | 1.00 ± 0.00 | 0.29 ± 0.11 | 1.40 | 0.00 | 1.00 ± 0.00 | 0.61 ± 0.09 | 0.67 |
| PLS-DA | 0.93 ± 0.05 | 1.00 ± 0.00 | ND | 0.07 | 0.92 ± 0.05 | 1.00 ± 0.00 | 0.96 |
| EML | 1.00 ± 0.00 | 0.96 ± 0.04 | 23.00 | 0.00 | 1.00 ± 0.00 | 0.96 ± 0.04 | 0.98 |
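The PLR and NLR columns follow from sensitivity and specificity through the standard definitions PLR = S / (1 − Sp) and NLR = (1 − S) / Sp. The minimal sketch below uses a hypothetical helper name, `likelihood_ratios`, and shows why the PLS-DA row reports ND: with Sp = 1.00 the PLR denominator is zero. Because it uses the rounded S and Sp values from the table, the PLR it computes can differ slightly from the published figure (for NB, about 12.33 versus 12.53), presumably because the published ratios were derived from unrounded values.

```python
from typing import Optional, Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[Optional[float], Optional[float]]:
    """Return (PLR, NLR); None marks a ratio that is not determinable (ND)."""
    false_positive_rate = 1.0 - specificity
    false_negative_rate = 1.0 - sensitivity
    # PLR = S / (1 - Sp); undefined when no false positives are possible
    plr = sensitivity / false_positive_rate if false_positive_rate > 0 else None
    # NLR = (1 - S) / Sp; undefined when specificity is zero
    nlr = false_negative_rate / specificity if specificity > 0 else None
    return plr, nlr


# Rounded S and Sp values taken from the table above
for name, s, sp in [("NB", 0.74, 0.94), ("DT", 0.95, 0.88), ("PLS-DA", 0.93, 1.00)]:
    print(name, likelihood_ratios(s, sp))
```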
Figure 1. Partial Least Squares Discriminant Analysis (PLS-DA) based on serum metabolites determined by GC-MS. (A) Two-dimensional score plot showing clustering and separation between healthy CTRL serum profiles (orange) and endometrial cancer-affected patients’ profiles (blue) from the n = 90 training set. (B) Metabolites showing a VIP score > 2.0 in the PLS-DA analysis. (C) Permutation test results based on 2000 iterations. (D) PLS-DA classification performance using an increasing number of latent variables. The red star indicates that the best model was achieved using only 1 latent variable. (E) Volcano plot reporting metabolite concentration fold changes and their statistical significance comparing CTRL vs. EC subjects among the second validation cohort. 1. Glycine, 2. phenylpyruvic acid, 3. serine, 4. valine, 5. urea, 6. oxyproline, 7. phenylalanine, 8. glyceraldehyde 3-phosphate, 9. gluconic acid, 10. glycerol, 11. 3-hydroxybutyric acid, 12. stearic acid.
Figure 2. (A) Ensemble Machine Learning (EML) scores of healthy controls (CTRL, blue) and endometrial cancer (EC, orange)-affected patients; the red dashed line represents the Youden index-based optimized cut-off value. (B) Receiver operating characteristic (ROC) curve obtained by varying the cut-off value when applying the EML model to the test set. The dotted blue line represents the 95% confidence bounds.
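A Youden index-based cut-off is the threshold that maximizes J = sensitivity + specificity − 1 along the ROC curve. The sketch below is a plain-Python illustration of that selection, not the authors' implementation. The function name `youden_cutoff`, the convention that a score at or above the cut-off is called EC, and the toy scores are all assumptions made for the example. It also assumes both classes are present in the input.

```python
from typing import List, Tuple


def youden_cutoff(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 = EC, 0 = CTRL. A sample is called EC when score >= cutoff.
    Assumes at least one sample of each class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = float("nan"), -1.0
    # Every observed score is a candidate threshold
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Made-up scores for illustration only (not study data)
toy_scores = [-120.0, -40.0, 35.0, 90.0, 150.0, 210.0, 300.0, 410.0]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff, j = youden_cutoff(toy_scores, toy_labels)
print(cutoff, j)
```

When several thresholds tie on J, this sketch keeps the lowest one, which favors sensitivity; a screening application might reasonably prefer that, but it is a design choice rather than something stated in the source.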
Figure 3. EML score distribution among the enrolled patients’ classes. Abbreviations used are CTRL: healthy controls; E: endometriosis; EH: endometrial hyperplasia; H-SIL: high-grade squamous intraepithelial lesion; L-SIL: low-grade squamous intraepithelial lesion; M&P: myomas and/or polyps; OC: ovarian cyst; UM: uterine malformation; BC: breast cancer; CC: cervical cancer; EC: endometrial cancer; OK: ovarian cancer; US: uterine sarcoma; VC: vaginal cancer.
Mean, minimum, and maximum EML scores among the various classes of patients enrolled in the second validation cohort. Abbreviations used are CTRL: healthy controls; CIN: cervical intraepithelial neoplasia; E: endometriosis; EH: endometrial hyperplasia; H-SIL: high-grade squamous intraepithelial lesion; L-SIL: low-grade squamous intraepithelial lesion; M&P: myomas and/or polyps; OC: ovarian cyst; UM: uterine malformation; BC: breast cancer; CC: cervical cancer; EC: endometrial cancer; OK: ovarian cancer; US: uterine sarcoma; VC: vaginal cancer. * Indicates a statistically significant difference (p < 0.05) from EC subjects.
| Class | N | Mean ± St Dev | Min | Max | Classification Errors |
|---|---|---|---|---|---|
| CTRL | 171 | 40.6 ± 146.6 * | −363.4 | 435.6 | 5.6% |
| E | 42 | 80.1 ± 187.0 * | −228.7 | 443.2 | 11.9% |
| EH | 17 | 31.5 ± 152.0 * | −234.8 | 318.7 | 5.9% |
| H-SIL | 28 | 13.0 ± 190.4 * | −342.3 | 411.2 | 14.3% |
| L-SIL | 12 | 71.9 ± 133.6 | −67.6 | 301.6 | 16.7% |
| M&P | 213 | 47.4 ± 169.9 * | −351.6 | 461.8 | 9.9% |
| OC | 130 | 37.5 ± 173.4 * | −314.6 | 437.0 | 10.8% |
| UM | 31 | 29.0 ± 188.5 * | −336.9 | 406.0 | 12.9% |
| BC | 17 | −23.4 ± 126.7 * | −209.8 | 199.6 | 0.0% |
| CC | 19 | 88.6 ± 184.2 * | −207.3 | 438.1 | 21.1% |
| OK | 44 | 84.8 ± 192.0 * | −311.3 | 436.8 | 15.9% |
| US | 8 | 103.0 ± 141.5 | −79.4 | 306.0 | 12.5% |
| VC | 13 | 35.2 ± 143.1 | −234.7 | 286.8 | 7.7% |
| EC | 126 | 363.0 ± 101.6 | 122.1 | 488.9 | 4.0% |
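The abstract describes the EML score as the sum of the seven models' votes, weighted by each model's classification accuracy and confidence, but it does not give the exact formula or scaling. The sketch below is therefore only one plausible reading: each vote is coded as +1 (EC) or −1 (CTRL) and multiplied by the model's accuracy and its prediction confidence. The function name `ensemble_score`, the model names, the weights, and the multiplicative form are all assumptions, and the resulting scale does not correspond to the scores in the table above.

```python
from typing import Dict, Tuple

# Hypothetical per-model accuracies used as weights (illustrative values only)
MODEL_ACCURACY: Dict[str, float] = {"model_a": 0.90, "model_b": 0.80, "model_c": 0.60}


def ensemble_score(predictions: Dict[str, Tuple[int, float]], accuracy: Dict[str, float]) -> float:
    """Sum signed votes weighted by model accuracy and prediction confidence.

    predictions maps model name -> (vote, confidence), where vote is +1 for
    EC and -1 for CTRL, and confidence lies in [0, 1].
    """
    total = 0.0
    for name, (vote, confidence) in predictions.items():
        if vote not in (-1, 1):
            raise ValueError(f"vote for {name} must be +1 or -1")
        # Assumed weighting: accuracy times confidence (not confirmed by the source)
        total += vote * accuracy[name] * confidence
    return total


# Made-up votes for one sample: two models call EC, one calls CTRL
sample = {"model_a": (1, 0.95), "model_b": (1, 0.70), "model_c": (-1, 0.55)}
print(ensemble_score(sample, MODEL_ACCURACY))
```

Under this reading, a confident vote from an accurate model moves the score more than a hesitant vote from a weak one, which is consistent with EC samples clustering at high positive scores.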