Khaled Bin Satter1, Zach Ramsey2, Paul M H Tran1, Diane Hopkins1, Gregory Bearden1, Katherine P Richardson1, Martha K Terris3, Natasha M Savage2, Sravan K Kavuri2, Sharad Purohit1,4,5.
Abstract
Malignant chromophobe renal cancer (chRCC) and benign oncocytoma (RO) are two renal tumor types that are difficult to differentiate with histology and immunohistochemistry-based methods because of their similar appearance. We previously developed a transcriptomics-based classification pipeline, the "Chromophobe-Oncocytoma Gene Signature" (COGS), on a single-molecule counting platform. Renal cancer patients (n = 32, chRCC = 17, RO = 15) were recruited from Augusta University Medical Center (AUMC), and formalin-fixed paraffin-embedded (FFPE) blocks from their excised tumors were collected. We created a custom single-molecule counting code set for COGS to assay RNA from the FFPE blocks. Using hematoxylin-eosin staining, pathologists classified these tumor types correctly in 91.8% of cases. Unsupervised learning with UMAP (uniform manifold approximation and projection, accuracy = 0.97) and hierarchical clustering (accuracy = 1.0) identified two clusters congruent with histology. We next developed and compared four supervised models (random forest, support vector machine, generalized linear model with L2 regularization, and supervised UMAP). Supervised UMAP classified all cases correctly (sensitivity = 1, specificity = 1, accuracy = 1), followed by the random forest model (sensitivity = 0.84, specificity = 1, accuracy = 0.93). This pipeline can be used as a clinical tool by pathologists to differentiate chRCC from RO.
Keywords: chromophobe; classification; kidney neoplasms; oncocytoma; supervised machine learning
Year: 2022 PMID: 35805014 PMCID: PMC9265083 DOI: 10.3390/cancers14133242
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. Study workflow from sample collection to RNA quantitation.
Clinical and Demographic information for chRCC and RO cases in the AUMC cohort.
| Variable | chRCC a | RO b | Overall | p-value |
|---|---|---|---|---|
| Stage | | | | |
| 1 | 11 (64.7%) | NA | 11 (34.4%) | |
| 2 | 3 (17.6%) | NA | 3 (9.4%) | |
| 3 | 3 (17.6%) | NA | 3 (9.4%) | |
| Age, mean (SD) | 57.8 (14.7) | 71.5 (8.51) | 64.3 (13.9) | 0.003 * |
| Race | | | | 0.55 ** |
| AA | 5 (29.4%) | 5 (33.3%) | 10 (31.3%) | |
| C | 12 (70.6%) | 9 (60%) | 21 (65.6%) | |
| Other | 0 | 1 (6.7%) | 1 (3.1%) | |
| Sex | | | | 0.11 ** |
| F | 8 (47.1%) | 3 (20%) | 11 (34.4%) | |
| M | 9 (52.9%) | 12 (80%) | 21 (65.6%) | |
| | 5.15 (2.4) | 4.68 (4.05) | 4.93 (3.22) | 0.696 ** |
a chRCC: chromophobe renal cell carcinoma; b RO: renal oncocytoma (RO cases are not staged); * Student’s t-test; ** Chi-square test.
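The Student's t-test p-value in the table can be sanity-checked from the summary statistics alone. The sketch below is a minimal illustration, not the authors' analysis code: it assumes a pooled-variance (equal-variance) two-sample t-test and complete data for all 17 chRCC and 15 RO cases, and it applies the test to the row reporting 57.8 (14.7) versus 71.5 (8.51). The function names are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule so that only the standard library is needed.

```python
from math import gamma, pi, sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the central mass on [-|t|, |t|] (Simpson's rule)."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


# Summary statistics copied from the table; group sizes assume no missing values.
t_stat, df = pooled_t(57.8, 14.7, 17, 71.5, 8.51, 15)
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")
```

Under these assumptions the result falls in the same range as the reported p = 0.003; a Welch (unequal-variance) test would give a slightly different value.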
Univariate analysis of chRCC and RO NanoString Data. All values were calculated at the optimum cut point between chRCC and RO count data.
| Gene | Optimal Cut Point | Accuracy | AUC | Sensitivity | Specificity |
|---|---|---|---|---|---|
| | 10.16 | 0.97 | 0.96 | 0.94 | 1 |
| | 13.24 | 1 | 1 | 1 | 1 |
| | 11.38 | 0.87 | 0.94 | 0.94 | 0.79 |
| | 9.2 | 0.97 | 0.99 | 1 | 0.93 |
| | 12.57 | 0.93 | 0.93 | 0.94 | 0.93 |
| | 6.51 | 0.9 | 0.93 | 0.86 | 0.94 |
| | 9.32 | 0.97 | 0.97 | 1 | 0.93 |
| | 9.94 | 0.97 | 0.99 | 1 | 0.93 |
| | 9.02 | 0.97 | 0.99 | 1 | 0.93 |
| | 9.97 | 0.97 | 0.98 | 0.93 | 1 |
| | 7.62 | 0.8 | 0.83 | 0.86 | 0.75 |
| | 9.8 | 0.9 | 0.92 | 0.93 | 0.88 |
| | 8.28 | 0.9 | 0.96 | 0.94 | 0.86 |
| | 5.63 | 0.6 | 0.5 | 0.36 | 0.81 |
| | 9.26 | 0.97 | 0.98 | 1 | 0.94 |
| | 9.55 | 0.8 | 0.88 | 0.63 | 1 |
| | 10.43 | 0.93 | 0.96 | 0.88 | 1 |
| | 8.66 | 0.9 | 0.9 | 0.86 | 0.94 |
| | 8.62 | 0.93 | 0.98 | 1 | 0.88 |
| | 8.37 | 0.67 | 0.54 | 0.86 | 0.5 |
| | 11.33 | 0.9 | 0.91 | 1 | 0.81 |
| | 10.78 | 0.87 | 0.9 | 0.93 | 0.81 |
| | 11.18 | 0.73 | 0.75 | 0.56 | 0.93 |
| | 8.68 | 0.87 | 0.82 | 0.86 | 0.88 |
| | 11.94 | 0.97 | 0.96 | 1 | 0.94 |
| | 8.83 | 0.83 | 0.88 | 0.69 | 1 |
| | 7.12 | 0.87 | 0.88 | 0.93 | 0.81 |
| | 9.32 | 0.93 | 0.98 | 0.94 | 0.93 |
| | 13 | 0.97 | 0.99 | 0.94 | 1 |
| | 10.83 | 0.87 | 0.85 | 0.86 | 0.88 |
AUC: Area under the curve.
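To make the columns of the univariate table concrete, the following sketch shows one common way to derive an optimal cut point, accuracy, sensitivity, specificity, and AUC for a single gene. It is an illustration under stated assumptions: the source does not say which criterion defined the "optimum cut point", so Youden's J (sensitivity + specificity − 1) is assumed here; higher expression is assumed to indicate the positive class; and the expression values are made-up toy numbers, not study data. The function names are hypothetical.

```python
def best_cut_point(values, labels):
    """Scan observed values as thresholds; call a sample positive when value >= cut.

    labels: 1 for the positive class, 0 for the negative class.
    Returns the first cut that maximizes Youden's J, with its metrics.
    """
    best = None
    for cut in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cut)
        fn = sum(1 for v, y in zip(values, labels) if y == 1 and v < cut)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cut)
        fp = sum(1 for v, y in zip(values, labels) if y == 0 and v >= cut)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        youden_j = sensitivity + specificity - 1
        if best is None or youden_j > best["youden_j"]:
            best = {
                "cut": cut,
                "sensitivity": sensitivity,
                "specificity": specificity,
                "accuracy": (tp + tn) / len(values),
                "youden_j": youden_j,
            }
    return best


def auc(values, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    score = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return score / (len(pos) * len(neg))


# Hypothetical log-scale counts for one gene: four positive and four negative samples.
toy_values = [10.5, 11.2, 9.8, 12.0, 8.1, 9.0, 10.0, 7.5]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

result = best_cut_point(toy_values, toy_labels)
toy_auc = auc(toy_values, toy_labels)
print(result, toy_auc)
```

For a gene that is lower in the positive class, the same functions can be applied to the negated values.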
Figure 2. Unsupervised classification of the NanoString data with UMAP and hierarchical clustering. (A): UMAP of the discovery data showing two clusters, which were used to train the supervised UMAP model. (B): UMAP projection of the NanoString data using the trained UMAP model from A. (C): Hierarchical clustering showing two clusters for the samples, congruent with their histological classification.
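Cluster labels from an unsupervised method are arbitrary, so scoring them against histology requires first matching each cluster to a class. The sketch below shows one simple way to do that by trying every cluster-to-class assignment and keeping the best one. It is not taken from the source: the function name is hypothetical, the cluster assignment is invented for illustration (one chRCC sample placed in the wrong cluster), and only the group sizes (17 chRCC, 15 RO) come from the abstract.

```python
from itertools import permutations


def cluster_accuracy(cluster_ids, true_labels):
    """Best agreement between clusters and classes over all one-to-one mappings.

    Assumes there are at least as many classes as clusters.
    """
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(true_labels))
    best = 0.0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best = max(best, correct / len(true_labels))
    return best


# Histology labels with the cohort's group sizes; cluster ids are hypothetical.
histology = ["chRCC"] * 17 + ["RO"] * 15
clusters = [0] * 16 + [1] + [1] * 15  # one chRCC sample falls into the RO-like cluster

acc = cluster_accuracy(clusters, histology)
print(round(acc, 2))
```

With a single mismatched sample out of 32, the agreement is 31/32, which rounds to 0.97; a perfectly congruent clustering would score 1.0.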
Supervised Model metrics for classification of chRCC and RO.
| Metric | Random Forest | SVM | GLM | Supervised UMAP |
|---|---|---|---|---|
| Sensitivity | 0.84 | 0.76 | 0.88 | 1 |
| Specificity | 1 | 1 | 1 | 1 |
| Accuracy | 0.93 | 0.83 | 0.9 | 1 |
| 95% CI | 0.78–0.99 | 0.65–0.94 | 0.73–0.98 | - |
| p-value | 4.74 × 10⁻⁵ | 0.07659 | 0.001066 | 0 |
| PPV | 1 | 1 | 1 | 1 |
| NPV | 0.87 | 0.64 | 0.78 | 1 |
SVM: Support Vector Machine, GLM: Generalized Linear Model, UMAP: Uniform Manifold Approximation and Projection, PPV: Positive Predictive Value, NPV: Negative Predictive Value.
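The metrics in this table all follow from a 2 × 2 confusion matrix, and a 95% CI for accuracy can be obtained from the binomial distribution. The sketch below is a minimal illustration under stated assumptions: the confusion-matrix counts are hypothetical and are not the study's results, and the exact (Clopper-Pearson) interval is assumed because the source does not name the CI method. The function names are hypothetical, and the interval bounds are found by bisection on the binomial CDF using only the standard library.

```python
from math import comb


def classification_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion-matrix counts (positive class = chRCC)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a proportion k/n, solved by bisection."""

    def solve(cdf_at, target):
        # cdf_at(p) decreases as p grows, so move right while it is above target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical counts: 28 of 30 test samples classified correctly, no false positives.
metrics = classification_metrics(tp=14, fn=2, tn=14, fp=0)
ci_low, ci_high = clopper_pearson(k=28, n=30)
print(metrics)
print(f"95% CI for accuracy: {ci_low:.2f} to {ci_high:.2f}")
```

When every sample is classified correctly (k equal to n), the upper bound is fixed at 1 and only the lower bound is informative.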
Figure 3. Supervised models trained on COGS discovery data and tested on NanoString data: development and comparison of four supervised models. (A): Random forest models, showing that a minimum of 250 trees achieves an error of 0. (B): A sample tree showing the distribution of KIDINS220 expression in the COGS discovery data, with the arrow pointing at the best cutoff value. (C): A sample support vector model with KRT7 and BSPRY. (D): Generalized linear model with ridge regression showing the fraction explained by the COGS genes as covariates. All metrics show the results on NanoString data (AUMC cohort).