Nathan Ing1,2, Fangjin Huang2, Andrew Conley2, Sungyong You1, Zhaoxuan Ma2, Sergey Klimov2, Chisato Ohe3, Xiaopu Yuan2, Mahul B Amin3, Robert Figlin4, Arkadiusz Gertych5,6, Beatrice S Knudsen7,8,9.
Abstract
Gene expression signatures are commonly used as predictive biomarkers, but they do not capture structural features of the tissue architecture. Here we apply a 2-step machine learning framework for quantitative imaging of tumor vasculature to derive a spatially informed, prognostic gene signature. The trained algorithms classify endothelial cells and generate a vascular area mask (VAM) in H&E micrographs of clear cell renal cell carcinoma (ccRCC) cases from The Cancer Genome Atlas (TCGA). Quantification of VAMs led to the discovery of 9 vascular features (9VF) that predicted disease-free survival in a discovery cohort (n = 64, HR = 2.3). Correlation analysis and information gain identified a 14-gene expression signature related to the 9VFs. Two generalized linear models with elastic net regularization (14VF and 14GT), based on the 14 genes, separated independent cohorts of up to 301 cases into good and poor disease-free survival groups (14VF HR = 2.4, 14GT HR = 3.33). For the first time, we successfully applied digital image analysis and targeted machine learning to develop prognostic, morphology-based gene expression signatures from the vascular architecture. This novel morphogenomic approach has the potential to improve on previous methods for biomarker development.
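The abstract names information gain as one criterion for selecting the 14 genes but does not say how expression was discretized. The sketch below is a minimal illustration under one assumption: each gene's expression is split at its median, and the target is the 9VF risk group. Information gain is then the drop in label entropy after that split. The function names and toy data are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(expression, labels):
    """Information gain of the risk labels after splitting one gene at its median.

    The median threshold is an illustrative assumption, not the paper's method.
    """
    threshold = median(expression)
    high = [lab for x, lab in zip(expression, labels) if x > threshold]
    low = [lab for x, lab in zip(expression, labels) if x <= threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (high, low) if part)
    return entropy(labels) - conditional


# Hypothetical toy data: one gene's expression and the 9VF risk group of 8 cases.
expr = [1.2, 0.8, 3.1, 2.9, 0.5, 3.3, 1.0, 2.7]
risk = ["low", "low", "high", "high", "low", "high", "low", "high"]
print(round(information_gain(expr, risk), 3))  # 1.0 (the gene splits the groups perfectly)
```

Genes could then be ranked by this score alongside their correlation with each VF. The abstract does not say how the two criteria were combined.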
Year: 2017 PMID: 29038551 PMCID: PMC5643431 DOI: 10.1038/s41598-017-13196-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Vascular area delineation in H&E-stained slides using a sequential, machine learning-based, 2-step vascular area classification approach. (A) Training of the first classification model: the original H&E image is processed into an endothelial nuclear mask, based on hematoxylin staining, and an eosin intensity image. The digital image of CD31 immunohistochemistry provides the vascular annotation used to train the vascular area classifier. (B) The second classifier outputs a vascular area mask (VAM) in which vascular areas are white and non-vascular areas are black. (C) Receiver operating characteristic (ROC) curve for vascular area classification compared with the vascular annotation provided by CD31 immunohistochemistry (AUC = 0.78). (D) Post-processing of the VAM.
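Figure 1C scores the vascular area classifier against CD31-derived annotation with an AUC of 0.78. The caption does not say how pixels or regions were sampled. The sketch below assumes per-pixel classifier scores and binary CD31 labels. Under that assumption, the AUC equals the probability that a randomly chosen vascular pixel scores higher than a randomly chosen non-vascular pixel (the Mann-Whitney formulation), with ties counted as one half. The data are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic.

    scores: classifier outputs (higher = more likely vascular).
    labels: 1 for CD31-positive (vascular) pixels, 0 otherwise.
    """
    # Rank all scores in ascending order, giving tied scores their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # number of correctly ordered pos/neg pairs
    return u / (n_pos * n_neg)


# Hypothetical per-pixel scores with CD31 labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(round(roc_auc(scores, labels), 3))  # 0.833
```

The rank formulation avoids building the full ROC curve, which matters when every pixel of a whole-slide image is a sample.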
Figure 2. Nine vascular features (VFs) predict disease-free survival (DFS) in ccRCC. (A) Workflow diagram for predicting outcomes with selected vascular features in the ccRCC discovery cohort (N = 64). (B) Hierarchical clustering of the 64 cases by the 9 vascular features into two risk groups. Blue, low risk; red, high risk. (C) Principal component analysis showing the first two principal components of the 9 selected VFs. (D) Kaplan-Meier plot using the 9VFs to separate patients into low- and high-risk groups (HR = 2.4, 95% C.I. = 1.1–5.2). (E) Box plots of VF values in the good and poor DFS risk groups (two-tailed t-test, *p < 0.1, **p < 0.05). The 9VFs are described in Table 1 and Supplementary Table S7.
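Figure 2D compares the low- and high-risk groups with Kaplan-Meier curves. The sketch below is a minimal product-limit (Kaplan-Meier) estimator for a single risk group. It assumes one follow-up time and one event indicator per patient (1 = recurrence, 0 = censored). The function name and toy numbers are hypothetical, and the hazard ratios quoted in the figures come from a separate model that is not shown here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up in months for one risk group (1 = recurrence, 0 = censored).
times = [6, 12, 12, 18, 24, 30]
events = [1, 1, 0, 1, 0, 0]
print([(t, round(s, 3)) for t, s in kaplan_meier(times, events)])
# [(6, 0.833), (12, 0.667), (18, 0.444)]
```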
Associations between vascular features (VFs) and genes in the 14GT signature. The 9 VFs and their correlated genes are grouped according to whether their average value is higher in the good or the poor outcome group. VFs associated with poor prognosis reflect high intratumoral variance (standard deviation), whereas those associated with good prognosis link high vascularity and hot/cold spots of vascular density to favorable prognosis.
| Feature group | Vascular features | Correlated genes |
|---|---|---|
| Standard deviation of features (poor prognosis) | Arm orientations*; BP Lacunarity*; EC Lacunarity; EC Density* | ADH5, NLRC4, RPL36A, RPLP2, SLC16A4, TNFSF8, ZNF16, SGCB, GOSR2 |
| Good prognosis | Arm number (S*, K); EC Density (S, K*); Arm Lacunarity (K*) | IFNA13, CMYA5, STAT3, KCNJ12, MED10 |
Asterisks (*) indicate VFs that correlate with genes in the 14GT signature.
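The table lists lacunarity features of the vascular area mask, but the section does not define them. The sketch below assumes the common gliding-box definition. An r × r box slides over the binary mask, and the box mass M (the number of foreground pixels) is recorded at each position. Lacunarity is then Λ(r) = E[M²] / E[M]². Values near 1 indicate a homogeneous pattern, and larger values indicate gappier, more clustered "hot and cold spots". The function name and toy masks are hypothetical.

```python
def gliding_box_lacunarity(mask, r):
    """Gliding-box lacunarity of a binary mask (list of rows of 0/1) for box size r."""
    rows, cols = len(mask), len(mask[0])
    masses = []
    for i in range(rows - r + 1):
        for j in range(cols - r + 1):
            # Box mass: number of foreground (vascular) pixels in the r x r window.
            masses.append(sum(mask[i + di][j + dj] for di in range(r) for dj in range(r)))
    mean = sum(masses) / len(masses)
    if mean == 0:
        raise ValueError("mask contains no foreground pixels")
    second_moment = sum(m * m for m in masses) / len(masses)
    return second_moment / (mean * mean)


# Hypothetical 4 x 4 masks: evenly filled versus one clustered vascular spot.
uniform = [[1] * 4 for _ in range(4)]
clustered = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(gliding_box_lacunarity(uniform, 2), round(gliding_box_lacunarity(clustered, 2), 3))  # 1.0 2.778
```

On a real VAM the same computation would be repeated over several box sizes r. Which sizes the paper used is not stated here.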
Figure 3. A 14-gene expression signature predicts disease-free survival in ccRCC. (A) Workflow diagram for selecting the 14 genes correlated with the 9VFs and training two outcome prediction models in the discovery cohort (N = 64). The 14VF classifier was trained on the risk groups defined by the 9VFs. The 14GT classifier was trained on 24-month disease-free survival status. (B) Supervised clustering based on the low-risk (blue) and high-risk (red) groups obtained from the 14VF model in the validation cohort (N = 301). (C) Kaplan-Meier plot of the 14VF model (n = 207, log-rank test p < 0.001, HR = 2.3). (D) Kaplan-Meier plot of the 14GT model (n = 257, log-rank test p < 0.001, HR = 3.33).
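Figure 3A trains two elastic-net-regularized generalized linear models on the 14 genes. The 14VF model uses the 9VF risk groups as its target, and the 14GT model uses 24-month DFS status. The section does not give the solver, link function or penalty settings. The sketch below assumes a logistic (binomial) GLM with the glmnet-style penalty λ[α‖β‖₁ + (1 − α)/2 ‖β‖²₂]. It fits the model by proximal gradient descent: a gradient step on the smooth part (log-loss plus ridge term), followed by soft-thresholding for the L1 part. All names, the values of λ and α, the step size and the data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, step=0.1, n_iter=2000):
    """Logistic GLM with an elastic-net penalty, fit by proximal gradient descent.

    Minimizes mean log-loss + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    The intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean log-loss with respect to b0 and b.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            resid = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += resid / n
            for j in range(p):
                g[j] += resid * xi[j] / n
        b0 -= step * g0
        # Gradient step on log-loss + ridge term, then soft-threshold for the L1 term.
        b = [soft_threshold(b[j] - step * (g[j] + lam * (1 - alpha) * b[j]), step * lam * alpha)
             for j in range(p)]
    return b0, b


def predict_risk(b0, b, x):
    """Predicted probability of the poor-outcome class for one case."""
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


# Hypothetical toy data: 2 genes (gene 1 informative, gene 2 noise), 1 = poor outcome.
X = [[-2.0, 0.3], [-1.5, -0.2], [-1.0, 0.1], [-0.5, -0.4],
     [0.5, 0.2], [1.0, -0.1], [1.5, 0.4], [2.0, -0.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b = fit_elastic_net_logistic(X, y)
print(b0, b)
```

For a 14VF-style model, y would hold the 9VF risk-group labels; for a 14GT-style model, it would hold 24-month DFS status. In practice λ and α would be tuned by cross-validation, for example with R glmnet's `cv.glmnet`, which this sketch omits.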
Functional annotation of the 14 genes that correlate with the 9VFs and recapitulate their risk prediction. Genes were annotated manually using PubMed; the associated citations are listed in Supplementary Table S5.
| Gene | Gene name | Activity | Disease association |
|---|---|---|---|
| CMYA5** | Cardiomyopathy-associated 5 | Desmin binding, Vesicular transport | Cardiomyopathy, schizophrenia |
| STAT3** | Signal transducer and activator of transcription 3 | Signal transduction, Gene transcription | Angiogenesis, Vascular leakage |
| ADH5** | alcohol dehydrogenase 5 | Opposes NO signaling, protein denitrosylation | Impaired cardiovascular function |
| NLRC4 | NLR family CARD domain containing 4 | Innate immunity inflammasome | Inflammatory disease, infantile enterocolitis |
| RPL36A*** | Ribosomal protein L36a | 60S ribosomal subunit, translational regulation | Hepatocellular carcinoma |
| RPLP2*** | ribosomal protein lateral stalk subunit P2 | Phosphoprotein involved in protein elongation | Upregulated in many cancers |
| SLC16A4* | Solute carrier family 16 member 4 | Monocarboxylate transporter for pH and energy homeostasis | Prognostic biomarker in ccRCC |
| TNFSF8 | tumor necrosis factor superfamily member 8 | CD30 ligand | Inflammation |
| ZNF16 | zinc finger protein 16 | Transcription factor | Erythroid and megakaryocyte differentiation |
| IFNA13** | interferon alpha 13 | Inflammatory/reproductive cytokine | Downregulated in dilated cardiomyopathy |
| SGCB** | Sarcoglycan beta | Dystrophin complex, sarcoglycan transport | Limb-girdle muscular dystrophy cardiomyopathy |
| KCNJ12** | potassium voltage-gated channel subfamily J member 12 | Repolarization of cardiac muscle | Dilated cardiomyopathy |
| MED10** | mediator complex subunit 10 | RNA Pol-II transcriptional regulation | Heart valve development |
| GOSR2** | golgi SNAP receptor complex member 2 | Vesicular trafficking | Familial essential hypertension |