Olatomiwa O Bifarin1,2, David A Gaul3, Samyukta Sah3, Rebecca S Arnold4,5, Kenneth Ogan4, Viraj A Master4, David L Roberts6, Sharon H Bergquist6, John A Petros4,5, Arthur S Edison1,2,7,8, Facundo M Fernández3,9.
Abstract
Urine metabolomics profiling has potential for non-invasive renal cell carcinoma (RCC) staging, in addition to providing metabolic insights into disease progression. In this study, we utilized liquid chromatography-mass spectrometry (LC-MS), nuclear magnetic resonance (NMR), and machine learning (ML) for the discovery of urine metabolites associated with RCC progression. Two machine learning questions were posed in the study: binary classification into early RCC (stage I and II) and advanced RCC (stage III and IV), and RCC tumor size estimation through regression analysis. A total of 82 RCC patients with known tumor size and metabolomic measurements were used for the regression task, and 70 RCC patients with complete tumor-nodes-metastasis (TNM) staging information were used for the classification tasks under ten-fold cross-validation conditions. A voting ensemble regression model consisting of elastic net, ridge, and support vector regressor predicted RCC tumor size with an R² value of 0.58. A voting classifier model consisting of random forest, support vector machines, logistic regression, and adaptive boosting yielded an AUC of 0.96 and an accuracy of 87%. Some identified metabolites associated with RCC progression included 4-guanidinobutanoic acid, 7-aminomethyl-7-carbaguanine, 3-hydroxyanthranilic acid, lysyl-glycine, glycine, citrate, and pyruvate. Overall, we identified a urine metabolic phenotype associated with RCC stage, exploring the promise of a urine-based metabolomic assay for staging this disease.
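As an illustration of the ten-fold cross-validation setup mentioned in the abstract, the following minimal sketch deals the 70 staged samples (41 early, 29 advanced, per the cohort table) into ten test folds. The abstract does not say whether the folds were stratified by class, so the stratification, the helper name `stratified_folds`, and the seed are assumptions made for this example only.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign every sample index to exactly one of n_folds test folds.

    Samples are dealt round-robin class by class, so each fold receives a
    similar share of every class. Stratification is an assumption here; the
    source only states that ten-fold cross-validation was used.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Continue dealing where the previous class stopped to even out sizes.
        for k, idx in enumerate(members):
            folds[(offset + k) % n_folds].append(idx)
        offset += len(members)
    return folds


# Class sizes taken from the cohort table: 41 early and 29 advanced samples.
labels = ["early"] * 41 + ["advanced"] * 29
folds = stratified_folds(labels)
for i, test_idx in enumerate(folds):
    n_advanced = sum(labels[j] == "advanced" for j in test_idx)
    print(f"fold {i}: {len(test_idx)} test samples, {n_advanced} advanced")
```

In each round, one fold would serve as the held-out test set while the remaining nine are used for training.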
Keywords: biomarker; liquid chromatography-mass spectrometry; machine learning; metabolomics; nuclear magnetic resonance spectroscopy; renal cell carcinoma; tumor metabolism
Year: 2021 PMID: 34944874 PMCID: PMC8699523 DOI: 10.3390/cancers13246253
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. Staging protocol for classification of early- and advanced-stage RCC. Abbreviations: T, primary tumor; T1, the tumor is 7 cm or less in its greatest dimension and limited to the kidney; T2, the tumor is greater than 7 cm in its greatest dimension but limited to the kidney; T3, the tumor extends into major veins or perinephric tissues but not into the ipsilateral adrenal gland and not beyond Gerota fascia; T4, tumor invades beyond Gerota fascia (including contiguous extension into the ipsilateral adrenal gland); N, regional lymph nodes; NX, regional lymph nodes cannot be assessed; N0, no spread to regional lymph nodes; N1, spread to regional lymph node(s); M, distant metastasis; M0, no distant metastasis; M1, distant metastasis [3].
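The mapping from TNM categories to the early/advanced labels can be sketched in a few lines. The caption defines the T, N, and M categories but not the stage grouping itself, so the grouping rules below are an assumption drawn from the standard AJCC scheme for kidney cancer; treating NX like N0 is a further simplifying assumption, and the function names are hypothetical.

```python
def rcc_stage(t, n, m):
    """Return the stage group ("I" to "IV") for a TNM triple.

    t: 1-4 (primary tumor), n: "0", "1" or "X" (regional nodes), m: 0 or 1.
    Grouping rules are assumed from the standard AJCC scheme, not taken
    from the figure caption. NX is treated as node-negative (assumption).
    """
    if t not in (1, 2, 3, 4) or n not in ("0", "1", "X") or m not in (0, 1):
        raise ValueError("unexpected TNM value")
    if m == 1 or t == 4:
        return "IV"
    if t == 3 or n == "1":
        return "III"
    return "II" if t == 2 else "I"


def stage_label(stage):
    """Early RCC = stage I/II, advanced RCC = stage III/IV (per the abstract)."""
    return "early" if stage in ("I", "II") else "advanced"


print(stage_label(rcc_stage(1, "0", 0)))
print(stage_label(rcc_stage(3, "0", 0)))
```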
Patient cohort characteristics.
| | Early RCC | Advanced RCC |
|---|---|---|
| No. of Urine Samples | 41 | 29 |
| Mean Age ± SD | 60.1 ± 13.3 | 61.6 ± 13.2 a |
| Mean BMI ± SD | 29.9 ± 5.2 | 27.9 ± 6.2 b |
| Race | ||
| Caucasian | 26 (63.4%) | 21 (72.4%) |
| Black/African American | 14 (34.1%) | 5 (17.2%) |
| American Indian/Alaskan Native | 1 (2.4%) | 1 (3.4%) |
| Mixed | - | 1 (3.4%) |
| Unknown/Missing | - | 1 (3.4%) |
| Smoker | ||
| Never | 24 (58.5%) | 19 (65.5%) |
| Former/Current | 17 (41.5%) | 10 (34.5%) |
| Gender | ||
| Male | 19 (46.3%) | 20 (68.9%) |
| Female | 22 (53.7%) | 9 (31.1%) |
| Histological Subtypes | ||
| Pure Clear Cell | 23 (56.1%) | 26 (89.6%) |
| Papillary | 9 (21.9%) | 1 (3.4%) |
| Clear Cell Papillary | 4 (9.8%) | - |
| Chromophobe | 4 (9.8%) | - |
| Unclassified | 1 (2.4%) | 2 (6.9%) |
| Nuclear Grade | ||
| 1 | - | - |
| 2 | 21 (51.2%) | 3 (10.3%) |
| 3 | 17 (41.5%) | 10 (34.5%) |
| 4 | 3 (7.3%) | 16 (55.2%) |
| RCC Stage | ||
| I | 33 (80.5%) | - |
| II | 8 (19.5%) | - |
| III | - | 15 (51.7%) |
| IV | - | 14 (48.3%) |
p-values were calculated using Student's t-test. a Age p-value = 0.63; b BMI p-value = 0.14. Twelve samples with missing TNM staging information were excluded.
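The age and BMI comparisons can be approximately re-derived from the summary statistics in the table. This is a rough sketch under several assumptions: a pooled-variance (equal-variance) t-test, age and BMI available for all 41 and 29 samples, and a normal approximation to the t distribution (68 degrees of freedom) because the Python standard library has no t-distribution CDF. The function name is hypothetical, and the results are expected only to land near the reported p-values, not to reproduce them.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample pooled-variance t statistic from means, SDs and group sizes.

    Returns (t, degrees of freedom, approximate two-sided p-value).
    The p-value uses a normal approximation, which is an assumption made
    to keep the sketch standard-library only.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Summary statistics from the cohort table (early vs. advanced RCC).
t_age, df_age, p_age = pooled_t_from_summary(60.1, 13.3, 41, 61.6, 13.2, 29)
t_bmi, df_bmi, p_bmi = pooled_t_from_summary(29.9, 5.2, 41, 27.9, 6.2, 29)
print(f"age: t = {t_age:.2f}, df = {df_age}, p ~ {p_age:.2f}")
print(f"BMI: t = {t_bmi:.2f}, df = {df_bmi}, p ~ {p_bmi:.2f}")
```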
Figure 2. Metabolites with the highest correlation with maximum tumor width. Pearson correlation coefficient and p-values for testing non-correlation are provided. The threshold for the correlation coefficient was r > 0.55.
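The selection rule in this caption (keep features whose Pearson correlation with tumor width exceeds 0.55) can be sketched as follows. The tumor widths, feature names, and abundances are hypothetical values invented for illustration; only the threshold comes from the caption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


R_THRESHOLD = 0.55  # threshold stated in the figure caption

# Hypothetical data: maximum tumor width (cm) and two feature abundances.
tumor_width_cm = [2.1, 3.5, 4.0, 6.2, 7.8, 9.5]
features = {
    "feature_A": [0.2, 0.5, 0.4, 0.9, 1.1, 1.4],
    "feature_B": [1.0, 0.2, 0.9, 0.3, 0.8, 0.4],
}

correlations = {name: pearson_r(vals, tumor_width_cm) for name, vals in features.items()}
selected = {name: r for name, r in correlations.items() if r > R_THRESHOLD}
for name, r in selected.items():
    print(f"{name}: r = {r:.2f}")
```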
Compound annotation for the metabolites with the highest correlation (r > 0.55) with RCC tumor size.
| ID No. | Retention Time (min) | Theoretical m/z | Experimental m/z | Adduct Type | Mass Error (ppm) | Elemental Formula | Metabolite Name | Confidence Level |
|---|---|---|---|---|---|---|---|---|
| 2745 | 1.87 | 223.0938 | 223.0936 | [M + H]+ | −0.64 | C8H10N6O2 | cytosine dimer | 2 |
| 3163 | 3.53 | 279.1187 | 279.1194 | [M + H]+ | 2.54 | C10H18N2O7 | - | 4 |
| 5362 | 3.46 | 245.0774 | 245.0775 | | 0.61 | C9H14N2O6 | dihydrouridine | 2 |
| 6681 | 2.80 | 244.0933 | 244.0934 | [M − H]− | 0.31 | C9H15N3O5 | hydroxyprolyl-asparagine/asparaginylhydroxyproline | 2 |
Metabolite identification level was assigned based on the following criteria: (1) exact mass, isotopic pattern, retention time, and MS/MS spectrum of chemical standard matched to the feature; (2) exact mass, isotopic pattern, and MS/MS spectrum matched with literature spectra or fragmentation ions observed are consistent with the proposed structure; (3) tentative ID assignment based on elemental formula match with literature; (4) unknowns.
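The mass error column can be related to the two m/z columns with the usual parts-per-million formula, (experimental − theoretical) / theoretical × 10^6. The sketch below applies it to three rows of the table above. Because the tabulated m/z values are rounded to four decimals, the recomputed errors are expected to differ somewhat from the listed ones; that explanation is an assumption, not something the source states.

```python
def mass_error_ppm(theoretical_mz, experimental_mz):
    """Relative mass error in parts per million."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6


# Feature ID -> (theoretical m/z, experimental m/z, tabulated error in ppm).
rows = {
    2745: (223.0938, 223.0936, -0.64),
    3163: (279.1187, 279.1194, 2.54),
    6681: (244.0933, 244.0934, 0.31),
}

for feature_id, (theo, exp, listed) in rows.items():
    recomputed = mass_error_ppm(theo, exp)
    print(f"{feature_id}: recomputed {recomputed:+.2f} ppm, listed {listed:+.2f} ppm")
```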
Compound annotation for the 16-metabolite panel (m/z = mass-to-charge ratio, min = minutes, ppm = parts per million).
| ID | Retention Time (min) | Theoretical m/z | Experimental m/z | Adduct Type | Mass Error (ppm) | Elemental Formula | Metabolite Identity | Confidence Level |
|---|---|---|---|---|---|---|---|---|
| 1372 | 3.94 | 146.0924 | 146.0924 | [M + H]+ | 0.03 | C5H11N3O2 | 4-guanidinobutanoic acid | 2 |
| 1904 | 4.00 | 180.0879 | 180.0880 | [M + H]+ | 0.08 | C7H9N5O | 7-aminomethyl-7-carbaguanine | 2 |
| 2122 | 1.20 | 184.1081 | 184.1080 | [M + H]+ | −0.36 | C8H13N3O2 | N,N-dimethyl-histidine | 2 |
| 2317 | 0.89, | 203.0913, 422.2020 | 203.0912, 422.2023 | [M + H]+ | −0.44 | C9H14O5 | diethyl-2-methyl-3-oxosuccinate | 3 |
| 2465 | 0.89, | 154.0498 | 154.0497, 136.0392 | [M + H]+ | −0.62 | C7H7NO3 | 3-hydroxyanthranilic acid | 2 |
| 3163 | 3.53 | 279.1187 | 279.1194 | [M + H]+ | 2.54 | C10H18N2O7 | -- | 4 |
| 3766 | 3.63 | 174.1237 | 174.1238 | [M + H]+ | 0.37 | C7H15N3O2 | apo-[3-methylcrotonoyl-CoA:carbon-dioxide ligase (ADP-forming)] | 2 |
| 4116 | 3.79 | 119.0577 | 119.0580 | [M + H]+ | 4.51 | C4H8NO3 | -- | 4 |
| 5045 | 3.49 | 218.0129 | 218.0123 | [M − H]− | −3.50 | C7H9NO5S | -- | 4 |
| 5420 | 3.38 | 205.0526 | 205.0535 | [M − H]− | 4.32 | C4H12N6P2 | -- | 4 |
| 5437 | 0.76 | 123.0114 | 123.0108 | [M − H]− | −4.47 | C9H2N | -- | 4 |
| 5713 | 1.23 | 305.0990 | 305.0989 | [M − H]− | −0.58 | C11H18N2O8 | -- | 4 |
| 5737 | 3.99 | 202.1197 | 202.1190 | [M − H]− | −3.58 | C8H17N3O3 | lys-gly/gly-lys | 2 |
| 5985 | 0.94 | 99.0087 | 99.0088 | [M − H]− | 0.21 | C4H4O3 | succinic anhydride | 2 |
| 6687 | 0.86 | 369.0517 | 369.0502 | [M − H]− | −4.30 | C6H14N10O5S2 | -- | 4 |
| 6694 | 3.82 | 409.9786 | 409.9770 | [M − H]− | −3.47 | C4H12N7O10P3 | -- | 4 |
Metabolite identification level was assigned based on the following criteria: (1) exact mass, isotopic pattern, retention time, and MS/MS spectrum of standard matched to the feature; (2) exact mass, isotopic pattern, and MS/MS spectrum matched with literature spectra or fragmentation ions observed are consistent with the proposed structure; (3) tentative ID assignment based on elemental formula match with literature; (4) unknowns.
Annotated NMR metabolites with a p-value less than 0.05. These were added to the 16-metabolite panel.
| Metabolite/Features | 1H (ppm) | 13C (ppm) | Peak Patterns | Confidence Score | Fold Change | p-Value |
|---|---|---|---|---|---|---|
| acetone | 2.23 | 32.40 | (s) | 3 | 0.49 | 0.029 |
| pyruvate | 2.41 | - | (s) | 2 | 0.31 | 0.028 |
| citrate | 2.53 | 48.52 | (d) | 3 | −0.54 | 0.003 |
| choline | 3.19 | 56.69 | (s) | 3 | 0.22 | 0.026 |
| glycine | 3.56 | 44.18 | (s) | 3 | −0.66 | 0.032 |
s = singlet, d = doublet. Fold change (FC) was calculated as the base 2 logarithm of the ratio of mean integrals between advanced RCC and early RCC samples. Positive FC values indicate increased abundance in advanced RCC, while negative values indicate higher abundance in early RCC. p-values were calculated using Student's t-test. Confidence score: (1) putatively characterized compound classes, or annotated compounds; (2) matches from 1D NMR to literature and/or 1D BBiorefcode compound (AssureNMR) or other database libraries such as Biological Magnetic Resonance Bank (BMRB) and Human Metabolome Database (HMDB); (3) matched to Heteronuclear Single Quantum Coherence (HSQC).
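A minimal sketch of the fold-change calculation described in this footnote follows. It reads the definition as the base 2 logarithm of the ratio of group means (advanced over early), which is one plausible interpretation of the wording; the peak integrals are hypothetical numbers chosen so the arithmetic is easy to follow.

```python
from math import log2
from statistics import mean


def log2_fold_change(advanced, early):
    """log2 of (mean advanced integral / mean early integral).

    Positive values mean higher abundance in advanced RCC, negative values
    mean higher abundance in early RCC.
    """
    return log2(mean(advanced) / mean(early))


# Hypothetical NMR peak integrals for one metabolite.
advanced_integrals = [0.5, 1.5, 1.0]  # mean 1.0
early_integrals = [1.5, 2.5, 2.0]     # mean 2.0

fc = log2_fold_change(advanced_integrals, early_integrals)
print(f"log2 fold change = {fc:.2f}")  # halved abundance gives a value of -1
```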
Figure 3. Box plots showing autoscaled normalized relative abundances for the 24-metabolite panel to distinguish early-stage RCC (n = 41) from advanced-stage RCC (n = 29). The mean, upper quartile, lower quartile, minimum, and maximum values are shown. All metabolites had p-values < 0.05 (Student's t-test).
Figure 4. Machine learning discriminates between early-stage and advanced-stage RCC. Machine learning predictions by random forest, AdaBoost, support vector machine radial basis function (SVM-RBF), logistic regression (LR), and voting ensembles using the 24-metabolite panel. (a) Area under the ROC curve, (b) accuracy, (c) sensitivity, (d) specificity.
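The four metrics reported in this figure can be computed from true labels and classifier scores as sketched below. Treating advanced RCC as the positive class, the 0.5 decision threshold, and the example labels and scores are all assumptions for illustration; none of these values come from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives) and specificity."""
    tp = sum(y == 1 and p == 1 for y, p in zip(y_true, y_pred))
    tn = sum(y == 0 and p == 0 for y, p in zip(y_true, y_pred))
    fp = sum(y == 0 and p == 1 for y, p in zip(y_true, y_pred))
    fn = sum(y == 1 and p == 0 for y, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels (1 = advanced RCC, 0 = early RCC) and predicted scores.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

auc = roc_auc(y_true, scores)
metrics = confusion_metrics(y_true, y_pred)
print(f"AUC = {auc:.3f}")
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```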
Comparison of early-stage and advanced-stage RCC with healthy controls using the RCC staging markers. Fold change (FC) was calculated as the base two logarithm of the average intensity ratios between two groups.
| Metabolite or ID | Early RCC/Healthy Controls | Advanced RCC/Healthy Controls | Advanced RCC/Early RCC |
|---|---|---|---|
| citrate | 0.39 | −0.16 | −0.54 |
| choline | −0.21 | 0.02 | 0.22 |
| glycine | 0.82 | 0.16 | −0.66 |
| 3-hydroxyanthranilic acid | −0.87 | 0.53 | 1.41 |
| 5045 | −1.05 | −0.02 | 1.03 |
| cytosine dimer | −0.41 | 0.29 | 0.70 |
| lys-gly/gly-lys | 0.73 | 1.87 | 1.14 |
| 7-aminomethyl-7-carbaguanine | 0.69 | 2.07 | 1.38 |
| 5713 | −0.49 | 0.53 | 1.01 |
| hydroxyprolyl-asparagine/asparaginylhydroxyproline | 0.50 | 1.44 | 0.93 |
| pyruvate | 0.09 | 0.40 | 0.31 |
| 4-guanidinobutanoic acid | 0.49 | −0.63 | −1.12 |
| diethyl-2-methyl-3-oxosuccinate | −0.82 | 0.69 | 1.51 |
| succinic anhydride | −0.50 | 1.03 | 1.53 |
| acetone | 0.16 | 0.65 | 0.49 |
| 3163 | −0.36 | 1.17 | 1.53 |
| N,N-dimethyl-histidine | −0.24 | 0.87 | 1.12 |
| dihydrouridine | 0.22 | 1.07 | 0.80 |
| 5420 | 0.22 | 1.95 | 1.73 |
| 4116 | −0.09 | −1.33 | −1.24 |
| apo-[3-methylcrotonoyl-CoA:carbon-dioxide ligase (ADP-forming)] | 0.01 | 1.05 | 1.04 |
| 6687 | −2.53 | −1.20 | 1.33 |
| 5437 | −1.67 | 0.50 | 2.18 |
| 6694 | −1.02 | −2.32 | −1.30 |
Positive FC values indicate increased abundance in the first group (numerator), while negative values indicate higher abundance in the second group (denominator).
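Because each fold change is a base 2 logarithm of a ratio, the three columns should be linked by log2(advanced/early) = log2(advanced/healthy) − log2(early/healthy). The sketch below checks this identity for four rows of the table. The tolerance of 0.02 is an assumption chosen to absorb two-decimal rounding, and the helper name is hypothetical.

```python
def implied_advanced_vs_early(early_vs_healthy, advanced_vs_healthy):
    """log2(adv/early) implied by the two comparisons against healthy controls."""
    return advanced_vs_healthy - early_vs_healthy


TOLERANCE = 0.02  # assumed allowance for two-decimal rounding in the table

# Rows from the table: (early/healthy, advanced/healthy, advanced/early).
rows = {
    "citrate": (0.39, -0.16, -0.54),
    "glycine": (0.82, 0.16, -0.66),
    "pyruvate": (0.09, 0.40, 0.31),
    "6694": (-1.02, -2.32, -1.30),
}

consistent = {}
for name, (early_hc, adv_hc, adv_early) in rows.items():
    implied = implied_advanced_vs_early(early_hc, adv_hc)
    consistent[name] = abs(implied - adv_early) <= TOLERANCE
    print(f"{name}: implied {implied:+.2f}, tabulated {adv_early:+.2f}")
```

Small discrepancies between implied and tabulated values are expected, since every entry is rounded independently.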