Vlad Popovici1, Eva Budinská2,3, Lenka Čápková2, Daniel Schwarz2, Ladislav Dušek2, Josef Feit2, Rolf Jaggi4.
Abstract
BACKGROUND: Genomics and proteomics are nowadays the dominant techniques for novel biomarker discovery. However, histopathology images contain a wealth of information related to the tumor histology, morphology and tumor-host interactions that is not accessible through these techniques. Thus, integrating the histopathology images in the biomarker discovery workflow could potentially lead to the identification of new image-based biomarkers and the refinement or even replacement of the existing genomic and proteomic signatures. However, extracting meaningful and robust image features to be mined jointly with genomic (and clinical, etc.) data represents a real challenge due to the complexity of the images.
Keywords: Biomarker discovery; Gene expression; Histopathology images; Image analysis; Multimodal data mining
Year: 2016 PMID: 27170365 PMCID: PMC4864935 DOI: 10.1186/s12859-016-1072-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Codeblocks and codebook. (a) An example of four different hypothetical distributions of the codeblocks leading to identical frequencies. To cope with such situations, the distribution of codeblocks is also taken into account through extended image features. (b) A visual representation of the obtained codebook. The 70 image patches are the closest to the codeblocks obtained after k-means clustering. The three groups of codeblocks (with 29, 20 and 21 elements, respectively) correspond to the major clusters in Fig. 2 and the ordering of the image patches is the same as in the clustering
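The codeblock frequencies referred to in Fig. 1 can be illustrated with a minimal sketch. It assumes that each image patch has already been reduced to a numeric descriptor vector and that a codebook (the k-means cluster centers) is available; the function name `codeblock_frequencies`, the Euclidean distance and the toy data are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np

def codeblock_frequencies(descriptors, codebook):
    """Assign each patch descriptor to its nearest codeblock (Euclidean
    distance, an assumption) and return labels and relative frequencies."""
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Pairwise squared distances, shape (n_patches, n_codeblocks)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Count how often each codeblock is used, including unused ones
    counts = np.bincount(labels, minlength=len(codebook))
    return labels, counts / counts.sum()

# Toy example: 3 hypothetical codeblocks in a 2-D descriptor space
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
patches = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
labels, freqs = codeblock_frequencies(patches, codebook)
```

The frequency vector discards where in the image each codeblock occurs, so different spatial arrangements can give the same histogram; this is the situation panel (a) describes and the reason the caption mentions extended image features.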
Fig. 2 Hierarchical clustering of the codebook. Clustering the codeblocks led to identification of three major clusters, to which generic terms have been assigned. The codeblocks correlated with gene expression are marked with red dots. The codeblocks with potential prognostic value (in univariate analysis) are marked with blue squares (dark blue for p-value <0.01, light blue for 0.01≤p-value≤0.05)
Fig. 3 Regions assigned to the most prognostic codeblocks. 512×512 regions from two different samples with high image score (high risk of relapse), at 2.5× magnification. The image patches represented in full color were assigned to one of the C.41, C.56, C.65, C.67 or C.69 codeblocks. In Additional files 2 and 3, the corresponding whole slide images are provided
Fig. 4 Kaplan-Meier curves for binarized scores. The genomic (a), image-based (b) and combined scores (c) were binarized by the respective median values into “low score” (low risk) and “high score” (high risk) categories. The combined score slightly improves on the genomic score
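The median binarization and the Kaplan-Meier estimate named in Fig. 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: scores equal to the median are placed in the low group (the caption does not say how ties were handled), and the helper names and toy data are hypothetical.

```python
import numpy as np

def binarize_by_median(scores):
    """True = "high score" group. Ties at the median go to the low group
    (an assumption; the source does not specify tie handling)."""
    scores = np.asarray(scores, dtype=float)
    return scores > np.median(scores)

def kaplan_meier(times, events):
    """Product-limit estimator: returns the distinct event times and the
    estimated survival probability just after each of them."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)           # still under observation at t
        d = np.sum((times == t) & events)      # events (relapses) at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Made-up toy data: follow-up in years, event 1 = relapse, 0 = censored
scores = np.array([0.2, 0.9, 0.4, 0.7, 0.1, 0.8])
times = np.array([5.0, 2.0, 6.0, 3.0, 7.0, 4.0])
events = np.array([0, 1, 0, 1, 0, 0])

high = binarize_by_median(scores)
t_high, s_high = kaplan_meier(times[high], events[high])
t_low, s_low = kaplan_meier(times[~high], events[~high])
```

Plotting `s_high` and `s_low` as step functions over their event times gives the two curves of one panel; comparing the groups formally would need an additional test (for example a log-rank test), which is not shown here.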
Fig. 5 Prognostic scores at 4 years. Predicting the likelihood of an event (relapse) at 4 years, based on the genomic signature (PRO_10, panel a), the image-based score (panel b) and the combined score (panel c)