Philippe Weitz¹, Yinxi Wang¹, Kimmo Kartasalo¹,², Lars Egevad³, Johan Lindberg¹,⁴, Henrik Grönberg¹, Martin Eklund¹, Mattias Rantalainen¹.
Abstract
MOTIVATION: Molecular phenotyping by gene expression profiling is central to contemporary cancer research and molecular diagnostics but remains resource-intensive to implement. Changes in gene expression occurring in tumours cause morphological changes in tissue, which can be observed at the microscopic level. The relationship between morphological patterns and some molecular phenotypes can be exploited to predict molecular phenotypes from routine haematoxylin and eosin (H&E) stained whole slide images (WSIs) using convolutional neural networks (CNNs). In this study, we propose a new, computationally efficient approach for modelling relationships between morphology and gene expression.
Year: 2022 PMID: 35595235 PMCID: PMC9237721 DOI: 10.1093/bioinformatics/btac343
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Performance of modelling approaches. (a) Boxplots of the distributions of Spearman correlation coefficients for different modelling approaches and validation sets, together with a comparison of computational efficiency. Vertical dashed lines indicate the significance threshold of 0.0001 for adjusted P-values in the validation set; vertical dotted lines indicate the corresponding threshold of 0.01 in the test set. 'corr clusters' refers to correlation-based clustering, 'rnd clusters' to random cluster assignments, 'lgbm' to prediction with boosting models based on ResNet18 features, and 'all gene' to a single CNN that predicts all 15 586 selected genes at once (the distribution shown includes only the 2636 compared genes). 'CV' denotes the boxplot of Spearman correlations between gene expression and the respective CNN prediction for all 50 clusters, comprising 15 586 genes, in the validation data using the corr clusters method; a total of 6618 genes had an adjusted P-value below 0.0001. 'Test' denotes the boxplot of Spearman correlations for these 6618 selected genes in the held-out test set, with 5419 adjusted P-values below 0.01. (b) Comparison of Spearman correlations for 50 randomly sampled transcripts predicted with single-transcript CNNs and with the proposed method. (c) Comparison of training time per gene across modelling approaches. Fitting one CNN per transcript requires ∼300 times more training time than the proposed cluster-based method.
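The CV and Test boxplots rest on a per-gene evaluation: a Spearman correlation between measured and predicted expression for every gene, followed by multiple-testing adjustment and a threshold (0.0001 in validation, 0.01 in test). The caption does not name the adjustment method, so the minimal sketch below assumes Benjamini-Hochberg correction and two-sided Spearman tests. The helper names `benjamini_hochberg` and `evaluate_genes` are hypothetical, and the input layout (arrays of shape patients × genes) is an assumption.

```python
import numpy as np
from scipy.stats import spearmanr


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values (assumed correction method)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted P-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def evaluate_genes(measured, predicted, alpha):
    """Per-gene Spearman correlation between RNA-seq and CNN predictions.

    measured, predicted: arrays of shape (n_patients, n_genes) (assumed layout).
    Returns (rho, adjusted P-values, boolean mask of genes with adjusted P < alpha).
    """
    n_genes = measured.shape[1]
    rho = np.empty(n_genes)
    pvals = np.empty(n_genes)
    for g in range(n_genes):
        # Two-sided test by default (an assumption; the caption does not say).
        rho[g], pvals[g] = spearmanr(measured[:, g], predicted[:, g])
    padj = benjamini_hochberg(pvals)
    return rho, padj, padj < alpha
```

Running the same function on the held-out test set, restricted to the genes that passed in validation and with `alpha=0.01`, would mirror the two-stage selection described in the caption.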
Fig. 2. Comparison between predicted and RNA-seq expression. The lower two rows provide examples of tiles with low and high predicted expression for selected genes. Each panel in the lower two rows contains 16 example tiles separated by black lines: each row within a panel shows four tiles from the same patient, and the four rows correspond to four different patients. Each tile has an edge length of 110.88 µm. (a) Scatter plot of CNN predictions against RNA-seq expression estimates for the best-predicted gene, BRICD5 (Spearman correlation 0.749). (b) Example tiles with low predicted BRICD5 expression. (c) Example tiles with high predicted expression. (d–f) Corresponding plots for GNMT, which is part of the androgen signalling pathway (Spearman correlation 0.501). (g–i) The corresponding relationship and examples for the DNA repair gene CDK12 (Spearman correlation 0.577). (j–l) Corresponding plots for the CCP score, where higher values are associated with higher proliferation, higher ISUP grade and poorer prognosis.
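The example panels are 4 × 4 grids: four patients, four tiles each. The caption does not say how patients or tiles were chosen, so the sketch below is hypothetical. It ranks patients by the mean of their tile-level predictions (an assumed slide-level aggregate) and then takes the most extreme tiles within each selected patient. `select_example_tiles` and its dict-of-arrays input are illustrative names, not part of the published method.

```python
import numpy as np


def select_example_tiles(tile_preds, n_patients=4, n_tiles=4, highest=True):
    """Pick a grid of example tiles for one gene (hypothetical helper).

    tile_preds: dict mapping patient ID -> sequence of tile-level predictions.
    Patients are ranked by mean tile prediction (assumed aggregate); within
    each chosen patient, the n_tiles most extreme tiles are returned.
    Returns a list of (patient_id, tile_indices), one entry per grid row.
    """
    # Negating the values turns an ascending sort into a descending one.
    sign = -1.0 if highest else 1.0
    # Only patients with enough tiles can fill a complete grid row.
    eligible = {
        pid: np.asarray(p, dtype=float)
        for pid, p in tile_preds.items()
        if len(p) >= n_tiles
    }
    ranked = sorted(eligible, key=lambda pid: sign * eligible[pid].mean())
    rows = []
    for pid in ranked[:n_patients]:
        preds = eligible[pid]
        idx = np.argsort(sign * preds, kind="stable")[:n_tiles]
        rows.append((pid, idx))
    return rows
```

Calling it once with `highest=False` and once with `highest=True` yields the two example grids shown for each gene.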
Fig. 3. Comparison between the cell-cycle progression (CCP) score based on RNA-seq and on CNN predictions. (a) Spearman correlation between sequenced and predicted gene expression in the test set, with bootstrapped confidence intervals. (b) Ranked CCP scores per ISUP grade for both RNA CCP and CNN CCP. (c) Univariate hazard analysis of time to first biochemical recurrence (BCR) for the RNA-seq-based and predicted CCP scores in the CV data and the test set. The hazard ratio (HR) of the RNA-seq-based CCP is 1.68 (1.256, 4.713) in the CV data and 1.351 (0.956, 1.909) in the test data. For the predicted CCP, the corresponding HRs are 2.579 (1.412, 4.713) in the CV data and 2.943 (1.055, 8.212) in the test set. (d) Example WSIs per ISUP grade with overlaid local CCP score predictions. Pen marks in the WSIs originate from the diagnostic workflow before digitization and likely indicate cancer regions.
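Panel (a) reports bootstrapped confidence intervals for Spearman correlations. The following is a minimal sketch, assuming a plain percentile bootstrap that resamples patients with replacement; the caption does not specify the bootstrap variant, the number of resamples or the confidence level. `bootstrap_spearman_ci` is a hypothetical helper that takes one value per patient, for example RNA-seq-based and CNN-predicted CCP scores.

```python
import numpy as np
from scipy.stats import spearmanr


def bootstrap_spearman_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for Spearman's rho, resampling patients.

    Assumptions: plain percentile method, n_boot resamples, two-sided interval
    at the given level. Returns (point estimate, (lower, upper)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample patients with replacement, keeping x/y pairs together.
        idx = rng.integers(0, n, size=n)
        stats[b] = spearmanr(x[idx], y[idx])[0]
    alpha = 1.0 - level
    # nanpercentile skips degenerate resamples where rho is undefined.
    lo, hi = np.nanpercentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    point = spearmanr(x, y)[0]
    return point, (lo, hi)
```

Resampling whole patients, rather than individual tiles, keeps each bootstrap replicate at the level at which the correlation is computed.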