Jun Cheng, Yuting Liu, Wei Huang, Wenhui Hong, Lingling Wang, Xiaohui Zhan, Zhi Han, Dong Ni, Kun Huang, Jie Zhang.
Abstract
Computational analysis of histopathological images can identify objective, sub-visual image features that the human eye cannot distinguish, and hence provides better modeling of disease phenotypes. This study investigates whether specific image features are associated with somatic mutations and patient survival in gastric adenocarcinoma (sample size = 310). An automated image analysis pipeline was developed to extract quantitative morphological features from H&E-stained whole-slide images. We found that four frequently somatically mutated genes (TP53, ARID1A, OBSCN, and PIK3CA) were significantly associated with tumor morphological changes. A prognostic model built on the image features significantly stratified patients into low-risk and high-risk groups (log-rank test p-value = 2.6e-4). Multivariable Cox regression showed that the model-predicted risk index was an additional prognostic factor besides tumor grade and stage. Gene ontology enrichment analysis showed that the genes whose expression correlated most strongly with the contributing features in the prognostic model were enriched in biological processes such as cell cycle and muscle contraction. These results demonstrate that histopathological image features can reflect underlying somatic mutations and identify high-risk patients who may benefit from more precise treatment regimens. Both the image features and the pipeline are highly interpretable, which enables translational applications.
Keywords: computational pathology; gastric adenocarcinoma; gastric cancer; genotype-phenotype association; prognosis; whole-slide image
Year: 2021 PMID: 33869007 PMCID: PMC8045755 DOI: 10.3389/fonc.2021.623382
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Patient demographic and clinical characteristics.
| Characteristics | Summary |
|---|---|
| No. of patients | 310 |
| Age (year) | |
| Range | 34 – 90 |
| Median | 66 |
| Gender | |
| Female | 112 (36.13%) |
| Male | 198 (63.87%) |
| Follow-up (month) | |
| Range | 0.1 – 124 |
| Median | 14.77 |
| Death | 116 (37.42%) |
| TNM staging | |
| Stage I | 43 (13.87%) |
| Stage II | 104 (33.55%) |
| Stage III | 134 (43.23%) |
| Stage IV | 29 (9.35%) |
| Tumor grade | |
| G1 | 5 |
| G2 | 105 |
| G3 | 200 |
Figure 1. Study design and analysis overview. (A) Analysis workflow depicting the overall study design. (B) Cross-validation scheme showing one round of leave-one-out cross-validation. n is the sample size. CV, cross-validation. (C) Scheme of the functional enrichment analysis of genes that correlated with the contributing features frequently selected during leave-one-out cross-validation.
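The leave-one-out scheme in panel (B) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `leave_one_out`, `loocv_predict`, and the `fit`/`predict` callbacks are hypothetical names, and the stand-in model used in the usage lines (a training-set mean) only demonstrates the data flow.

```python
def leave_one_out(n):
    """Yield (train_indices, held_out_index) for each of the n rounds."""
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        yield train, held_out


def loocv_predict(samples, fit, predict):
    """Fit on n-1 samples, predict the held-out one, and repeat n times.

    `fit` and `predict` are caller-supplied (hypothetical) callbacks; in the
    study the fitted model would be the lasso-regularized Cox model.
    """
    predictions = []
    for train_idx, test_idx in leave_one_out(len(samples)):
        model = fit([samples[i] for i in train_idx])
        predictions.append(predict(model, samples[test_idx]))
    return predictions


# Usage with a toy stand-in model: the "model" is the training-set mean.
toy_predictions = loocv_predict(
    [1.0, 2.0, 3.0],
    fit=lambda train: sum(train) / len(train),
    predict=lambda model, sample: model,
)
```

Because each held-out patient never contributes to the model that scores them, every risk prediction is an out-of-sample prediction.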
Figure 2. Comparison of image features with respect to somatic mutation status. For each feature, the fold change is the ratio of the median feature values between the mutated group and the nonmutated group (mutated/nonmutated), and the color scale is the negative logarithm of the q-value. The dots' color corresponds to the scales of the side color bars.
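The two quantities plotted in this figure can be computed as in the sketch below. Two points are assumptions rather than facts from the caption: q-values are taken to be Benjamini-Hochberg adjusted p-values, and the logarithm is taken to be base 10. The function names are hypothetical.

```python
import math
from statistics import median


def median_fold_change(values, mutated):
    """Ratio of median feature values, mutated / nonmutated.

    `values` holds one feature value per patient; `mutated` holds the
    matching booleans. Raises ZeroDivisionError if the nonmutated median is 0.
    """
    mut = [v for v, m in zip(values, mutated) if m]
    non = [v for v, m in zip(values, mutated) if not m]
    return median(mut) / median(non)


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed q-value method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the original ranking.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def neg_log10(q):
    """Color-scale value; base 10 is an assumption."""
    return -math.log10(q)
```

A fold change above 1 means the feature's median is higher in the mutated group, and a larger negative log q-value means stronger evidence after multiple-testing correction.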
Figure 3. Kaplan-Meier survival curves for patients stratified by MSI status (A), tumor grade (B), tumor stage (C), and lassoCox risk group (D).
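For readers unfamiliar with how such curves are built, here is a minimal sketch of the standard Kaplan-Meier product-limit estimator. It is a generic illustration with a hypothetical function name and toy data, not the code used to draw the figure.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times:  follow-up time per patient
    events: 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= removed
    return curve


# Toy data: five patients, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Computing one curve per group (for example, high-risk versus low-risk) and comparing them with a log-rank test gives the kind of stratified comparison shown in the panels.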
Univariable and multivariable Cox proportional hazards model analysis of the prognostic value of lassoCox and three clinical factors.
| Variable | Univariable HR (95% CI) | Univariable p-value | Multivariable HR (95% CI) | Multivariable p-value |
|---|---|---|---|---|
| LassoCox: high-risk (vs low-risk) | 2.00 (1.37-2.92) | 3.44e-4 | 2.04 (1.39-3.00) | 2.64e-4 |
| Tumor stage: III+IV (vs I+II) | 1.92 (1.30-2.82) | 9.53e-4 | 1.95 (1.32-2.88) | 7.73e-4 |
| Tumor grade: G3 (vs G1+G2) | 1.46 (0.98-2.17) | 0.0662 | 1.27 (0.85-1.91) | 0.248 |
| MSI status: MSI-L+MSI-H (vs MSS) | 0.89 (0.60-1.34) | 0.590 | 0.95 (0.63-1.42) | 0.800 |
The group in parentheses is the reference group. HR, hazard ratio; CI, confidence interval.
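The hazard ratios, confidence intervals, and p-values in the table are linked through the Cox coefficient: HR = exp(beta), and a Wald interval is exp(beta ± z·SE). The sketch below recovers beta, SE, and an approximate two-sided p-value from a reported HR and CI. It assumes the intervals are Wald intervals on the log scale, which the table does not state, and `wald_from_hr` is a hypothetical helper name.

```python
import math


def wald_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Back out (beta, se, z, p) from a hazard ratio and its 95% CI.

    Assumes a Wald interval on the log scale: exp(beta +/- z_crit * se).
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Univariable lassoCox row: HR 2.00 (1.37-2.92), reported p = 3.44e-4.
beta, se, z, p_lasso = wald_from_hr(2.00, 1.37, 2.92)

# Univariable MSI row: HR 0.89 (0.60-1.34), reported p = 0.590.
_, _, _, p_msi = wald_from_hr(0.89, 0.60, 1.34)
```

Because the table rounds HR and CI to two decimals, the recovered p-values should only be expected to match the reported ones approximately.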
Enriched biological processes for the five most frequently selected image features.
| Image feature | Mean coefficient | Count of selection/sample size | Enriched biological process for correlated genes |
|---|---|---|---|
| bMean_bin6 | 0.209 | 310/310 | N/A |
| distMin_bin5 | 0.250 | 310/310 | Cell cycle (GO:0007049); Cell cycle process (GO:0022402); Organelle fission (GO:0048285) |
| ratio_skewness | 0.311 | 310/310 | Muscle contraction (GO:0006936); Muscle system process (GO:0003012); Regulation of muscle contraction (GO:0006937) |
| major_bin10 | -0.124 | 283/310 | Muscle contraction (GO:0006936); Muscle system process (GO:0003012); Regulation of muscle contraction (GO:0006937) |
| distMean_bin3 | 0.050 | 268/310 | N/A |
Note: in each round of LOOCV, features were selected independently, so some features were not selected in every round (for example, major_bin10 and distMean_bin3).
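The "Count of selection" and "Mean coefficient" columns can be tallied from per-round model coefficients as in this sketch. It is illustrative only: `summarize_selection` is a hypothetical name, the input data are made up, and averaging the coefficient over only the rounds in which a feature was selected is an assumption (the table does not say whether unselected rounds count as zero).

```python
from collections import defaultdict


def summarize_selection(round_coefs):
    """Summarize feature selection across cross-validation rounds.

    round_coefs: list of dicts, one per round, mapping feature name to its
    fitted coefficient. A feature counts as selected when its coefficient
    is nonzero.
    Returns {feature: (times_selected, n_rounds, mean_coefficient)}.
    """
    n_rounds = len(round_coefs)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for coefs in round_coefs:
        for name, coef in coefs.items():
            if coef != 0.0:
                counts[name] += 1
                sums[name] += coef
    # Assumption: the mean is taken over selected rounds only.
    return {
        name: (counts[name], n_rounds, sums[name] / counts[name])
        for name in counts
    }


# Made-up coefficients for three rounds and two features.
toy_summary = summarize_selection([
    {"feature_a": 0.2, "feature_b": -0.1},
    {"feature_a": 0.4},
    {"feature_a": 0.3, "feature_b": 0.0},
])
```

In this toy input, `feature_a` is selected in all three rounds while `feature_b` is selected in one, mirroring how major_bin10 and distMean_bin3 fall short of 310/310 in the table.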