Ankur Gupta1, Ganga Sagar1, Zaved Siddiqui1, Kanury V S Rao1,2, Sujata Nayak1,2, Najmuddin Saquib3, Rajat Anand4,5.
Abstract
We integrated untargeted serum metabolomics using high-resolution mass spectrometry with data analysis using machine learning algorithms to accurately detect early stages of the women-specific cancers of the breast, endometrium, cervix, and ovary across diverse age groups and ethnicities. A two-step approach was employed wherein cancer-positive samples were first identified as a group. A second, multi-class algorithm then helped to distinguish between the individual cancers of the group. The approach yielded high detection sensitivity and specificity, highlighting its utility for the development of multi-cancer detection tests, especially for early-stage cancers.
Year: 2022 PMID: 35145183 PMCID: PMC8831619 DOI: 10.1038/s41598-022-06274-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Demographic, ethnicity, and BMI group of the sample set used in the study.
| Parameter | Normal control (n = 250) | Endometrial cancer (n = 304) | Breast cancer (n = 303) | Cervical cancer (n = 250) | Ovarian cancer (n = 262) |
|---|---|---|---|---|---|
| Age (years): 20–30 | 18 | 0 | 2 | 20 | 6 |
| Age (years): 31–40 | 40 | 6 | 41 | 79 | 43 |
| Age (years): 41–50 | 92 | 48 | 80 | 76 | 72 |
| Age (years): 51–60 | 69 | 156 | 111 | 48 | 72 |
| Age (years): 61–70 | 21 | 76 | 53 | 21 | 60 |
| Age (years): 71–80 | 10 | 14 | 14 | 6 | 6 |
| Age (years): 81–90 | 0 | 4 | 2 | 0 | 3 |
| BMI: 10 to 30 | 119 | 130 | 270 | 90 | 86 |
| BMI: > 30 | 8 | 172 | 24 | 22 | 50 |
| Ethnicity: White | 153 | 302 | 246 | 233 | 256 |
| Ethnicity: Non-white | 97 | 2 | 57 | 17 | 6 |
| Stage: 0 | 0 | 0 | 36 | 70 | 19 |
| Stage: I | 0 | 304 | 267 | 180 | 243 |
Figure 1. Ion chromatograms of representative samples from the normal control and the individual cancer groups. The total run time for the LC resolution was 14 min, and every sample run was alternated with a blank run, which involved injection of a 1:1 mixture of methanol and water. The spectra followed a similar trend in each case, with the major changes seen from 200 to 600 m/z at retention times from 3 to 11 min.
Figure 2. Age-wise distribution of detected metabolites. The figure provides a graphical representation of the number of metabolites detected across the individual age groups, for the normal control set as well as the individual cancer groups. The cumulative number of unique metabolites detected in normal control samples was 5895, while endometrial, breast, cervical, and ovarian cancer samples had 5971, 5982, 6300, and 6336, respectively.
Figure 3. Data processing pipeline. The data preprocessing pipeline used to render the data amenable to AI modeling is depicted here (details are given in the text).
Figure 4. PLSDA plot distinguishing between the individual cancers and the normal controls. The figure presents a PLSDA plot of the matrix of sample-specific metabolites versus metabolite intensity for normal controls and the individual women-specific cancer sets. The separation obtained between the individual groups is shown. The R² and Q² values obtained are given.
Figure 5. AI workflow for distinguishing BECO cancers from normal controls and its application. Panel A depicts the AI workflow employed to test the AI model for distinguishing the women-specific cancer group (BECO) from the normal controls. Panel B depicts the results from testing the trained model on this task, showing clear separation of the cancer group from the controls. The separation achieved between the cancer and the control group is shown in the form of a confusion matrix, with the resulting sensitivity and specificity values also given.
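The two-step approach described in the abstract (a binary cancer-versus-normal screen, followed by a multi-class call only for cancer-positive samples) can be sketched as a simple routing function. This is a minimal illustration, not the authors' implementation: `two_step_predict`, the two scoring callables, the class names, and the 0.5 threshold are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Sequence

# Hypothetical class names for the BECO group (assumption, for illustration).
CANCER_CLASSES = ("breast", "endometrial", "cervical", "ovarian")


def two_step_predict(
    sample: Sequence[float],
    cancer_score: Callable[[Sequence[float]], float],
    class_scores: Callable[[Sequence[float]], Dict[str, float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> str:
    """Return 'normal' or the name of the highest-scoring cancer class."""
    # Step 1: binary screen, cancer group (BECO) versus normal control.
    if cancer_score(sample) < threshold:
        return "normal"
    # Step 2: the multi-class model is consulted only for cancer-positive samples.
    scores = class_scores(sample)
    return max(CANCER_CLASSES, key=lambda name: scores[name])
```

Any pair of trained models that expose a score per sample could be plugged in as `cancer_score` and `class_scores`.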
Figure 6. Partitioning of training and test data sets for the multiclass AI model. Panel (A) shows the segregation of the individual cancer sets for training and testing of the multiclass AI model-2 (see text) for distinguishing between the individual cancers of the BECO group.
Figure 7. Testing the multiclass model for its ability to distinguish the individual cancer groups. Panel (A) shows the results of testing the trained multiclass model for separation of endometrial cancer samples from the other cancers (breast, cervical, ovarian) based on the model's endometrial scores. The resulting confusion matrix on applying a threshold shows good accuracy, sensitivity, and specificity. Panel (B) shows the results of testing the trained multiclass model for separation of breast cancer samples from the other cancers (endometrial, cervical, ovarian) based on the model's breast scores. The resulting confusion matrix on applying a threshold shows good accuracy, sensitivity, and specificity. Panel (C) shows the results of testing the trained multiclass model for separation of cervical cancer samples from the other cancers (breast, endometrial, ovarian) based on the model's cervical scores. The resulting confusion matrix on applying a threshold shows good accuracy, sensitivity, and specificity. Panel (D) shows the results of testing the trained multiclass model for separation of ovarian cancer samples from the other cancers (breast, endometrial, cervical) based on the model's ovarian scores. The resulting confusion matrix on applying a threshold shows high accuracy, sensitivity, and specificity.
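The one-versus-rest evaluation described in this caption (threshold one class score, then tabulate a confusion matrix against the remaining cancers) can be illustrated with a short sketch. The function name, the example scores, and the threshold are hypothetical; the study's actual thresholds are not given here.

```python
def one_vs_rest_confusion(scores, labels, target, threshold):
    """Count TP, FP, TN, FN when `target` is called for score >= threshold.

    scores: per-sample score for the target class (e.g. the endometrial score)
    labels: true class name of each sample
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        actually_positive = label == target
        if predicted_positive and actually_positive:
            counts["TP"] += 1
        elif predicted_positive:
            counts["FP"] += 1
        elif actually_positive:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Toy data for illustration only, not values from the study.
example = one_vs_rest_confusion(
    scores=[0.9, 0.2, 0.7, 0.4],
    labels=["endometrial", "breast", "cervical", "endometrial"],
    target="endometrial",
    threshold=0.5,
)
```

Repeating the call with each cancer as `target` gives one confusion matrix per panel.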
List of select metabolites involved in the signature for distinguishing the BECO cancer group from normal controls.
| S. no | Metabolite name | Involvement in cancer | m/z | RT (min) |
|---|---|---|---|---|
| 1. | 2-[(5Z,8Z,11Z,14Z,17Z)-eicosapentaenoyl]-sn-glycerol | One of the sixteen diagnostic metabolites that are able to identify early-stage ovarian cancer with high accuracy [1] | 376.259 | 9.238 |
| 2. | Varanic acid | A bile acid that has been identified as a potential biomarker in the serum of ovarian cancer patients [2] | 436.317 | 9.589 |
| 3. | Serotonin | A known growth factor for human tumor cells of different origins; implicated in cancer cell migration, metastatic dissemination, and tumor angiogenesis [3] | 176.094 | 1.696 |
| 4. | Pyridoxamine 5'-phosphate | A vitamin B6 phosphate. Vitamin B6 and its derivatives are inversely associated with cancer risk [4] | 248.056 | 3.847 |
| 5. | L-Proline | Proline availability influences collagen synthesis and maturation and the acquisition of cancer cell plasticity and heterogeneity [5] | 115.063 | 0.709 |
| 6. | Lewis X | Lewis X is a type II Lewis antigen, representing a fucosylated epitope that is overexpressed on the surface of cancer cells [6] | 529.198 | 4.726 |
| 7. | 17-beta-Estradiol | Plays a key role in breast cancer and regulates cancer/immune cell interactions in the tumor microenvironment [7] | 272.176 | 7.829 |
| 8. | 7-Ketodeoxycholic acid | It was identified as a potential biomarker during the metabolic profiling of serum in ovarian cancer patients [8] | 406.27 | 10.176 |
| 9. | PAF C-18:1 | Platelet-activating factors are frequently induced in cancer cells through the action of epidermal growth factor [9] | 549.377 | 9.374 |
| 10. | Leukotriene F4 | Leukotrienes play intricate roles in promoting tumor growth and metastasis through shaping the tumor microenvironment [10] | 496.258 | 10.236 |
| 11. | Leukotriene D4 | Leukotriene D4 plays intricate roles in promoting tumor growth and metastasis through shaping the tumor microenvironment [11] | 496.258 | 10.236 |
| 12. | N-Acetyl-DL-Histidine | Identified to correlate with colorectal cancer in an analysis of the fecal metabolome [12] | 197.079 | 10.16 |
| 13. | 6-Sulfatoxymelatonin | Present in the urine of women with breast cancer, although whether it correlates with cancer risk remains uncertain [13] | 328.071 | 3.215 |
| 14. | Androstenedione | Associated with increased risk of endometrial cancer in postmenopausal women. It likely influences endometrial carcinogenesis via estrogen metabolism [14] | 286.192 | 5.219 |
| 15. | Nisinic acid | While Nisinic acid itself has not been well studied, there is increasing evidence that PUFAs play a role in cancer risk and progression [15] | 356.27 | 8.176 |
| 16. | Formiminoglutamic acid | An intermediate in the degradative metabolism of histidine, elevated levels of this metabolite have been found in urine of patients with neoplastic disease [16] | 174.063 | 1.21 |
| 17. | Androsterone glucuronide | A conjugated steroid and, along with other conjugated steroids, has been implicated in risk of developing hormone-dependent breast cancer [17] | 466.254 | 8.517 |
| 18. | 3-Methoxytyramine | A prognostic biomarker that associates with high-risk disease and poor clinical outcome in neuroblastoma patients. Role in women-specific cancers yet unknown [18] | 167.094 | 1.43 |
| 19. | Dipeptides: Pro-Tyr, Asp-Gln, Lys-Tyr, Lys-Arg | Dipeptides with either Ala, Asp, or Ile at the C-terminus, and dipeptides with Lys, Arg, Pro, and Tyr at the N-terminus were found to be overabundantly present in the liver of patients with Hepatocellular carcinoma [19] | 278.125 | 3.961 |
| 20. | LysoPC(P-18:0) | Prospective case-cohort studies have revealed that higher levels of LysoPC(P-18:0) were consistently related to lower risks of breast, prostate, and colorectal cancer [20] | 507.366 | 11.098 |
| 21. | Platelet-activating factor | PAF has been implicated in development, growth, and metastatic manifestations of cancer cells [21] | 523.361 | 11.205 |
| 22. | 5-Methylcytidine | Its constituent base, 5-methylcytosine is a sensitive marker of progress of the tumor formation induced by the oxidative damage reactions [22] | 257.1 | 0.849 |
| 23. | 1-Methylinosine | Modified nucleosides such as 1-methylinosine represent accurate tumor markers for clinical diagnosis of cancer [23] | 282.095 | 3.562 |
| 24. | L-Glutamine | Glutamine is essential for tumor growth and host glutamine depletion is a hallmark of progressive tumor growth [24] | 199.095 | 5.672 |
| 25. | O-heptadecanoylcarnitine | One of the acylcarnitines that are increased in cancer patients, and in those patients with higher cancer grades [25] | 413.348 | 9.052 |
Literature references for the individual metabolites are listed in Supplementary Information. The mean m/z and retention time (RT) values for each metabolite are given.
Diverse biological processes are influenced by the perturbed metabolites specific for the BECO cancer group.
| Metabolic pathways |
|---|
| Nitrogen metabolism |
| Glutamine/glutamate metabolism |
| Aminoacyl-tRNA biosynthesis |
| Arginine biosynthesis |
| Histidine metabolism |
| Steroid hormone biosynthesis |
| Ether lipid metabolism |
| Alanine, aspartate, and glutamate metabolism |
| Glyoxylate and dicarboxylate metabolism |
| Arachidonic acid metabolism |
| Glycerophospholipid metabolism |
| Arginine and proline metabolism |
| Pyrimidine metabolism |
| Tryptophan metabolism |
| Tyrosine metabolism |
| Purine metabolism |
MetaboAnalyst software (https://www.metaboanalyst.ca/) was used to extract the list of biological processes associated with the metabolites of interest. Briefly, the set of 25 metabolite names was entered into MetaboAnalyst, which cross-referenced it against the KEGG pathway database and HMDB to obtain the KEGG and HMDB IDs, respectively, of the input metabolites. The associated biological processes were then extracted.
Figure 8. Preparation and scheduling of QC and samples for UHPLC-MS/MS. A small aliquot of each sample (coloured cylinders) was pooled to create a QC sample (multi-coloured cylinder), which was then injected periodically (every 50th injection) throughout the batch run. Variability among consistently detected metabolites was used to estimate overall process and batch variability. Every sample injection was followed by a blank injection to prevent carryover between the sample runs.
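The injection scheme in this caption (a blank after every sample, a pooled QC at a fixed interval) can be sketched as a run-order generator. This is only an illustration under stated assumptions: the caption does not say whether the 50-injection interval counts blanks, so the sketch counts sample injections only, and it also assumes that each QC injection is followed by a blank. The name `build_run_order` is hypothetical.

```python
def build_run_order(sample_ids, qc_every=50):
    """Interleave a blank after every sample and a pooled QC at regular intervals.

    Assumptions (not stated in the source): the interval counts sample
    injections only, and each QC injection is itself followed by a blank.
    """
    order = []
    for index, sample_id in enumerate(sample_ids, start=1):
        order.append(("sample", sample_id))
        order.append(("blank", None))  # blank run to limit carryover
        if index % qc_every == 0:
            order.append(("qc", "pooled"))
            order.append(("blank", None))
    return order


# Example: 100 hypothetical samples yield two pooled-QC injections.
run_order = build_run_order(list(range(100)), qc_every=50)
```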
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | True negative (TN) | False positive (FP) |
| Actual positive | False negative (FN) | True positive (TP) |
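The sensitivity, specificity, and accuracy values quoted in the figure captions follow from the four cells of this confusion matrix by their standard definitions. A minimal sketch, with a hypothetical function name and made-up counts rather than results from the study:

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return {
        # Fraction of actual positives that were detected.
        "sensitivity": tp / (tp + fn),
        # Fraction of actual negatives that were correctly cleared.
        "specificity": tn / (tn + fp),
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Illustrative counts only, not values reported in the paper.
metrics = screening_metrics(tp=90, fp=20, tn=80, fn=10)
```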