Paul Gallins (1), Ehsan Saghapour (1, 2), Yi-Hui Zhou (1, 2).
Abstract
The last several years have witnessed an explosion of methods and applications for combining image data with 'omics data, and for prediction of clinical phenotypes. Much of this research has focused on cancer histology, for which genetic perturbations are large and the signal-to-noise ratio is high. Related research on chronic, complex diseases is limited by tissue sample availability, lower genomic signal strength, and the less extreme and tissue-specific nature of intermediate histological phenotypes. Data from the GTEx Consortium provide a unique opportunity to investigate the connections among phenotypic histological variation, imaging data, and 'omics profiling, from multiple tissue-specific phenotypes at the sub-clinical level. Investigating histological designations in multiple tissues, we survey the evidence for genomic association and prediction of histology, and use the results to test the limits of prediction accuracy using machine learning methods applied to the imaging data, genomics data, and their combination. We find that expression data give similar or superior accuracy for pathology prediction compared with our use of imaging data, despite the fact that pathological determination is made from the images themselves. A variety of machine learning methods have similar performance, while network embedding methods offer at best limited improvements. These observations hold across a range of tissues and predictor types. The results support the use of genomic measurements for prediction, and of measurements taken from the same target tissue in which pathological phenotyping has been performed. Although this last finding is sensible, to our knowledge our study is the first to demonstrate it empirically. While prediction accuracy remains a challenge, the results show clear evidence of pathway- and tissue-specific biology.
Keywords: embedding; genomics; histology; imaging; integration; machine learning; pathology; prediction
Year: 2020 PMID: 33193632 PMCID: PMC7644963 DOI: 10.3389/fgene.2020.555886
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Pipeline to build a prediction model with integrated imaging and expression data.
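The caption does not describe how the two data types are combined. One simple option, shown here only as an assumption, is early fusion: keep the samples that have both imaging features (for example image principal components, which Figure 3 refers to) and expression features, standardize each block separately, and concatenate the blocks into one feature vector per sample for a downstream classifier. `early_fusion` and `zscore_columns` are hypothetical helper names, not the authors' code.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column of a row-major matrix to mean 0 and SD 1.

    Constant columns (SD 0) are mapped to 0.0 instead of dividing by zero.
    """
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]


def early_fusion(image_feats, expr_feats):
    """Concatenate per-sample image and expression features (hypothetical helper).

    Both arguments map a sample ID to a list of floats. Only samples present
    in both blocks are kept, and each block is z-scored on its own so that
    neither block dominates purely because of its measurement scale.
    """
    shared = sorted(set(image_feats) & set(expr_feats))
    img = zscore_columns([image_feats[s] for s in shared])
    expr = zscore_columns([expr_feats[s] for s in shared])
    return shared, [a + b for a, b in zip(img, expr)]
```

Restricting to shared samples mirrors the smaller sample counts reported for expression-based analyses; the sketch does not reweight the blocks by their number of features, which would be a separate design choice.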
Figure 2. Workflow for the network embedding of gene expression.
Summaries of predictive performance for histopathology-derived phenotypes from imaging data, gene expression data, and integrative analyses.
| Lung—fibrosis | 831 | 140/691 | 0.61 | 0.62 (RF) | 513 | 74/439 | 0.63 | 0.65 (RF) | 0.68 | 0.62 | 0.57 |
| Liver—steatosis | 600 | 260/340 | 0.73 | 0.73 (QD) | 205 | 96/109 | 0.75 | 0.74 (RF) | 0.81 | 0.75 | 0.71 |
| Liver—congestion | 600 | 259/341 | 0.70 | 0.69 (NB) | 205 | 80/125 | 0.76 | 0.76 (SVM) | 0.79 | 0.76 | 0.69 |
| Tibial artery—atherosclerosis/atherosis/sclerotic | 836 | 216/620 | 0.76 | 0.76 (RF) | 508 | 113/395 | 0.77 | 0.76 (LD) | 0.76 | 0.77 | 0.69 |
| Thyroid—Hashimoto | 892 | 71/821 | 0.89 | 0.87 (SVM) | 570 | 37/533 | 0.95 | 0.96 (SVM) | 0.93 | 0.96 | 0.82 |
| Adipose—fibrosis | 936 | 137/826 | 0.57 | 0.58 (RF) | 574 | 73/501 | 0.78 | 0.68 (LR) | 0.84 | 0.77 | 0.58 |
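The table does not name its performance metric. If the entries are ROC AUC values (an assumption, not stated in the excerpt), each one can be computed from predicted scores and true labels through the Mann–Whitney rank formulation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting one half. `roc_auc` is a hypothetical helper name.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (hypothetical helper).

    labels are 1 for positives and 0 for negatives. Tied scores receive
    their average rank, so a tie between a positive and a negative counts 1/2.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the block of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75. Because the statistic compares positives only against negatives, it is not driven by the majority class the way raw accuracy can be for unbalanced splits such as 71/821.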
Figure 3. Proportion of non-null p-values (π1) in the cross-tissue regression analysis of pathology phenotype vs. gene expression, and of image PCs 1–3 vs. expression. Larger dots are the π1 values for the same tissue in which pathology was determined, which is also highlighted in gray in each subfigure. Arrows indicate the tissue with the highest π1 for the phenotype, which in many instances coincides with the phenotype tissue. Tissues used in the cross-tissue analysis are labeled using the same color scheme used by GTEx Consortium et al. (2020).
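π1 is the estimated fraction of tests whose p-values come from non-null associations. The caption does not say which estimator was used; a minimal sketch of one common choice, Storey's estimator with a single fixed λ, is below. λ = 0.5 is an assumed default, and this is simpler than the smoothing over a grid of λ values that the qvalue package applies by default. `storey_pi1` is a hypothetical helper name.

```python
def storey_pi1(pvalues, lam=0.5):
    """Estimate pi1 = 1 - pi0 with Storey's single-lambda estimator.

    Null p-values are Uniform(0, 1), so the fraction of p-values above `lam`
    is roughly pi0 * (1 - lam); dividing by (1 - lam) estimates pi0.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must lie in [0, 1)")
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one p-value")
    pi0 = sum(1 for p in pvalues if p > lam) / (m * (1 - lam))
    pi0 = min(pi0, 1.0)  # a pi0 estimate above 1 is truncated to 1
    return 1.0 - pi0
```

With six p-values of 0.01 and four between 0.6 and 0.9, four of ten exceed λ = 0.5, so π0 ≈ 4 / (10 × 0.5) = 0.8 and π1 ≈ 0.2.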
Figure 4. Image samples for two tissues and associated pathologies: the samples with the three highest and three lowest predicted probabilities for thyroid Hashimoto's disease (top) and tibial artery atherosclerosis (bottom).
Figure 5. QQ plot of p-values from the regression analysis of pathology phenotype against gene expression for atherosclerosis in tibial artery tissue.
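A QQ plot of this kind compares sorted observed p-values against the quantiles expected under a uniform null, usually on a -log10 scale so that departures in the small-p tail stand out. The sketch below computes the plotted coordinates; the plotting position (i - 0.5)/m is an assumed convention (i/(m + 1) is another), and `qq_points` is a hypothetical helper name.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs for a uniform QQ plot.

    Observed p-values are sorted ascending; the i-th smallest (1-based) is
    paired with the uniform quantile (i - 0.5) / m.
    """
    if not all(0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    m = len(pvalues)
    points = []
    for i, p in enumerate(sorted(pvalues), start=1):
        expected = (i - 0.5) / m
        points.append((-math.log10(expected), -math.log10(p)))
    return points
```

For example, `qq_points([0.5, 0.005])` pairs the smaller p-value with an expected value of about 0.60 and an observed value of about 2.30; points above the diagonal indicate more small p-values than the null predicts.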