Łukasz Rączkowski1, Iwona Paśnik2, Michał Kukiełka1, Marcin Nicoś3, Magdalena A Budzinska4, Tomasz Kucharczyk3, Justyna Szumiło2, Paweł Krawczyk3, Nicola Crosetto5,6, Ewa Szczurek7.
Abstract
BACKGROUND: Although the tumor microenvironment (TME) and gene mutations are the main determinants of progression of lung cancer, the deadliest cancer in the world, their interrelations are not well understood. Digital pathology data provide a unique insight into the spatial composition of the TME. Various spatial metrics and machine learning approaches have been proposed for predicting either patient survival or gene mutations from these data. Still, these approaches are limited in the scope of analyzed features and in their explainability, and as such fail to transfer to clinical practice.
Keywords: Bayesian deep learning; Digital pathology; Image segmentation; Mutation prediction; Survival prediction; Tumor microenvironment
Year: 2022 PMID: 36131239 PMCID: PMC9490924 DOI: 10.1186/s12885-022-10081-w
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.638
Fig. 1 Overview of training ARA-CNN for lung cancer tissue classification. a We sourced H&E tissue slides from 55 lung cancer patients. b 26 of these slides were annotated by an expert pathologist in an active learning loop with ARA-CNN, which resulted in the LubLung dataset and a trained tissue classification model. c Example annotations of various tissue regions. d Segmentation results from ARA-CNN show that tissue heterogeneity in the TME is captured correctly. e Precision-recall curves for each tissue class obtained in a 10-fold cross-validation scheme on the LubLung dataset. The mean AUC is 0.94. f Confusion matrix for ARA-CNN trained with LubLung. Row labels indicate true classes, while column labels describe classes predicted by the model
Fig. 2 Calculation and utilization of TIP and TMEC features. a H&E slides from TCGA were downloaded and split into tissue patches. Each patch was classified with ARA-CNN, producing tissue segmentations. These segmentations were next used to calculate the TIP and TMEC features. b Distribution of individual component features in TIP and TMEC. The most often occurring features for TIP were t and t. For TMEC, these were m, m and m. c Tasks performed with the help of the TIP and TMEC features. In addition to the TIP and TMEC features, clinical and mutation data was also sourced from TCGA. These datasets were combined and served as input in two tasks: survival prediction and gene mutation classification. The results were compared to those obtained using previous spatial metrics instead of TIP and TMEC
Fig. 3 Survival prediction results. a-k Kaplan-Meier plots for TIP and TMEC features that result in patient stratification into two groups: with high and low values of the feature. Only features with statistically significant differences in patient survival are shown, as measured using the log rank test and the Benjamini-Hochberg procedure (p-values and critical values c in the top right corner, significance confirmed if p < 0.05 or p < c, where p is a p-value ranked in ascending order). For the latter, we set the False Discovery Rate at 0.1 and included all TIP and TMEC features. The cutoff value ρ (lower left corner) indicates the selected threshold yielding patient strata with high and low values of the feature. The results correlate with previous studies of the relationship between these features and patient survival. l c-index scores for Cox models from survival prediction experiments performed with different feature sets. The best results were obtained for models whose feature sets included TIP and TMEC features. m Hazard ratios for the best model that utilized the TIP features. The prevalence of the necrosis tissue class in the whole slide has a statistically significant negative effect on survival. n Hazard ratios for the best model that utilized the TMEC features. The presence of the necrosis tissue class and the vessel tissue class in the TME has a statistically significant negative effect on survival
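The Benjamini-Hochberg procedure mentioned in the caption can be illustrated with a minimal sketch of the textbook step-up rule. This is not necessarily the authors' exact implementation: the function name `benjamini_hochberg` and the example p-values are hypothetical, and only the False Discovery Rate of 0.1 comes from the caption.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Textbook Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns (critical_values, rejected), both in the original order of
    p_values. The critical value for the p-value of rank i (ascending,
    1-based) among m tests is c_i = (i / m) * fdr.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    critical = [0.0] * m
    largest_k = 0  # largest rank k with p_(k) <= c_k
    for rank, idx in enumerate(order, start=1):
        c = rank / m * fdr
        critical[idx] = c
        if p_values[idx] <= c:
            largest_k = rank
    # Step-up rule: reject every hypothesis ranked at or below largest_k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= largest_k
    return critical, rejected


# Hypothetical p-values (not taken from the paper), FDR = 0.1 as in the caption
p_vals = [0.001, 0.04, 0.03, 0.20, 0.012]
crit, sig = benjamini_hochberg(p_vals, fdr=0.1)
for p, c, s in zip(p_vals, crit, sig):
    print(f"p={p:.3f}  c={c:.3f}  significant={s}")
```

Because the rule is step-up, a p-value above its own critical value is still declared significant when a larger-ranked p-value passes its threshold.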
Mutation/rearrangement classification AUC scores (given as % of area under the precision-recall curve) for TCGA LUAD patients. The best result for each gene is marked in bold. In cases where the random forest classifier gave the best result, the cells are colored in yellow. Otherwise, if logistic regression gave the best result, the cells are colored in light blue
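The scores in this table are areas under the precision-recall curve. One common estimator of that area is average precision, sketched below. The source does not state which estimator was used, so this choice is an assumption, and the function name and example data are hypothetical.

```python
def average_precision(labels, scores):
    """Average precision: the mean of the precision values measured at the
    rank of each positive example, with examples sorted by descending score.

    labels: iterable of 0/1 ground-truth values (1 = mutated).
    scores: classifier scores, higher meaning more likely mutated.
    Tied scores are not handled specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(label for _, label in ranked)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_pos


# Hypothetical example: 2 mutated patients among 4
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(f"area under PR curve (average precision): {100 * ap:.1f}%")
```

Unlike the ROC AUC, this quantity depends on class prevalence: a random classifier scores roughly the fraction of positive cases, which is relevant when mutations are rare.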
Fig. 4 Feature importance for the two best performing mutation classification models that utilized TIP and TMEC features. a Feature importance for the PDGFRB gene mutation classifier (logistic regression). Here, feature importance is measured by the value of its regression coefficient. b Feature importance for the RET gene mutation classifier (random forest). Here, the importance is measured by the reduction of the Gini index obtained when the feature is added to the tree, averaged across the trees in the random forest model. c Distribution of feature values for four of the most important TIP or TMEC features, as presented in (a), divided between patients with the mutated and non-mutated PDGFRB gene. d Distribution of feature values for four of the most important TMEC features, as presented in (b), divided between patients with the mutated and non-mutated RET gene
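The Gini-based importance in panel b can be made concrete with a minimal sketch of the impurity reduction produced by a single split. This is a simplification: in a random forest, these reductions are typically weighted by the share of samples reaching each node, summed per feature, and averaged across trees. The function names and the example labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity reduction when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node: 3 mutated (1) and 5 non-mutated (0) patients,
# split on some feature threshold into two child nodes
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]
decrease = gini_decrease(parent, left, right)
print(f"Gini decrease for this split: {decrease:.5f}")
```

A feature that frequently yields large reductions of this kind across the trees receives a high importance score.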