
High-dimensional regression analysis links magnetic resonance imaging features and protein expression and signaling pathway alterations in breast invasive carcinoma.

Michael Lehrer1, Anindya Bhadra2, Sathvik Aithala1, Visweswaran Ravikumar1, Youyun Zheng3, Basak Dogan4, Emerlinda Bonaccio5, Elizabeth S Burnside6, Elizabeth Morris7, Elizabeth Sutton7, Gary J Whitman8, Jose Net9, Kathy Brandt10, Marie Ganott11, Margarita Zuley11, Arvind Rao1.   

Abstract

BACKGROUND: Imaging features derived from MRI scans can be used not only for breast cancer detection and measurement of disease extent but also to predict gene expression and patient outcomes. The relationships between imaging features, gene/protein expression, and response to therapy hold potential to guide personalized medicine. We aim to characterize the relationship between radiologist-annotated tumor phenotypic features (based on MRI) and the underlying biological processes (based on proteomic profiling) in the tumor.
METHODS: Multiple-response regression of the image-derived, radiologist-scored features with reverse-phase protein array (RPPA) expression levels generated association coefficients for each combination of imaging feature and protein in the RPPA dataset. Significantly associated proteins for each feature were analyzed with Ingenuity Pathway Analysis (IPA) software. Hierarchical clustering of the pathway analysis results determined which features were most strongly correlated with pathway activity and cellular functions.
RESULTS: Each of the twenty-nine imaging features was found to have a set of significantly correlated molecules, associated biological functions, and pathways.
CONCLUSIONS: We interrogated the pathway alterations represented by the protein expression associated with each imaging feature. Our study demonstrates the relationships between biological processes (via proteomic measurements) and MRI features within breast tumors.

Keywords:  MRI; TCGA; breast invasive carcinoma; protein expression; signaling pathway analysis

Year:  2018        PMID: 29556516      PMCID: PMC5854291          DOI: 10.18632/oncoscience.397

Source DB:  PubMed          Journal:  Oncoscience        ISSN: 2331-4737


INTRODUCTION

Breast cancer is the most common cancer in women [1], with incidence rates rising since the 1990s [2]. Molecular expression profiling of tumors has enabled individualized therapy plans in certain types of breast cancer [3]. Expression of three receptors (estrogen receptor [ER], progesterone receptor [PR], and human epidermal growth factor receptor 2 [HER2]) is routinely used to determine optimal treatment plans for breast cancer patients [4]. ER and PR expression are associated with the luminal A and B subtypes of breast cancer, which have a lower proliferation index and pathological grade [5]. Disease-free and overall survival are lower in HER2-overexpressing and triple-negative breast cancers than in luminal A and B subtypes [5]. Even when multiple specimens are obtained from percutaneous biopsies and surgical specimens are analyzed, the temporal and spatial heterogeneity of tumor gene and protein expression cannot be adequately characterized [6, 7, 8]. Readily available imaging databases such as The Cancer Imaging Archive (TCIA) are being leveraged to address the problem of tumor heterogeneity and to predict gene expression and patient responses to therapy from imaging data [9]. Researchers now extract features from MRI and other imaging modalities that correlate with patient responses and gene expression [10-12]. Breast cancer radiomic signatures show potential for predicting recurrence risk as assessed by multi-gene assays [13]. Deciphering the associations between imaging features, breast tumor gene/protein expression levels, and patient outcomes holds the potential to guide personalized medicine [12, 14].

High-dimensional variable selection (Supplementary Table S1) is commonly used to analyze relationships between multiple modalities (copy number, expression, etc.) in genomic data. To avoid generating spurious correlations, a number of Bayesian and frequentist approaches have been devised.
Bayesian approaches use a sparsity-inducing prior, such as the spike-and-slab [15, 16], double-exponential [17], horseshoe [18], horseshoe+ [19], or generalized double Pareto prior [20]. Frequentist approaches use penalized regression models, such as the L1-norm penalty of the LASSO [21] or the combined L1/L2-norm penalty of the elastic net [21, 22]. Because these regression models treat the variables as independent, they ignore the correlation structure among them, which leads to a loss of statistical efficiency [23]. Several approaches to high-dimensional variable selection in highly correlated datasets have been proposed [24-26]. In this study, we used a Bayesian approach that models this correlation structure, as previously described [27]. Analyzing a cohort of 82 breast cancer patients included in the TCGA database, we built a model relating MRI-derived imaging features to proteomics data using a high-dimensional regression approach. Although a previous study of 353 breast cancer patients assessed correlations between 21 imaging traits and mRNA transcript levels [28], to our knowledge this approach has not yet been applied to proteomics data in breast cancer.
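To make the sparsity-inducing effect of an L1 penalty concrete, the following minimal sketch fits a LASSO by cyclic coordinate descent in plain Python. It illustrates only the frequentist penalty mentioned above, not the Bayesian method used in this study [27]; the function names and toy data are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p columns (no all-zero columns); y has length n.
    Returns the coefficient list b of length p.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each column j.
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    r = [float(v) for v in y]  # residuals y - X b, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual that excludes b_j.
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Hypothetical toy data: orthogonal columns, y depends only on the first column.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
print(lasso_coordinate_descent(X, y, lam=0.5))  # [1.5, 0.0]
```

With lam = 0.5 the informative coefficient is shrunk from 2 to 1.5 and the uninformative one is set exactly to zero; with lam of 2 or more, both coefficients are zero.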

RESULTS

Significantly correlated molecules were identified for every imaging feature except non-mass clumped internal enhancement (Supplementary Table S2). These molecules were obtained through high-dimensional regression of the RPPA protein expression data on the imaging feature set. For example, the axillary lymphadenopathy feature was directly correlated with expression of EIF4EBP1 and PRDX1, and inversely correlated with RAB25, SHC1, XRCC1, and PARK7. Cell surface receptors associated with imaging features are EGFR, KDR, and PDK1. Ingenuity Pathway Analysis (IPA) was used to determine the functional implications of these molecules. IPA generated p-values and Z-scores for the IPA Canonical Pathways of each feature (Figure 2), as well as scores for the IPA Diseases and Biological Functions of each feature (Figure 3). The Canonical Signaling Pathways most strongly associated with each imaging feature are summarized in Table 2. The same proteins were correlated with a given feature regardless of whether the data sets were separated into global or primary features.
Figure 2

Representative pattern of associations between BRCA imaging features and IPA Canonical Pathways based on (A) p-values and (B) activation Z-scores

A subset of the p-values and Z-scores is shown. P-values are shown as -log(p-value).

Figure 3

Representative pattern of associations between BRCA imaging features and IPA Diseases and Bio-Functions based on (A) p-values and (B) activation Z-scores

A subset of the p-values and Z-scores is shown. P-values are shown as -log(p-value).

Table 2

Radiological features are associated with unique pathway alterations in breast invasive carcinoma

Lists of molecules (proteins and post-translational modifications) were analyzed in IPA. Top pathways for each feature are shown with the associated –log (p-value) computed by IPA demonstrating the strength of the association of each imaging feature to each pathway.

Imaging feature | Pathway 1 (–log p) | Pathway 2 (–log p) | Pathway 3 (–log p)
T2 Signal Intensity | Pancreatic Adenocarcinoma Signaling (6.177) | Melanoma Signaling (4.405) | Non-Small Cell Lung Cancer Signaling (4.111)
T2 Heterogeneity | UVB-Induced MAPK Signaling (6.245) | EGF Signaling (6.206) | ErbB Signaling (5.725)
Skin Thickening | Epithelial Adherens Junction Signaling (8.957) | Regulation of the Epithelial-Mesenchymal Transition Pathway (8.282) | Pancreatic Adenocarcinoma Signaling (5.692)
Skin Invasion | 14-3-3-mediated Signaling (10.378) | Cell Cycle: G2/M DNA Damage Checkpoint Regulation (7.914) | UVB-Induced MAPK Signaling (7.385)
Irregular Shape | UVC-Induced MAPK Signaling (6.845) | EGF Signaling (6.206) | STAT3 Pathway (6.112)
Rim Enhancement | ATM Signaling (8.26) | AMPK Signaling (6.385) | Cell Cycle: G2/M DNA Damage Checkpoint Regulation (5.037)
Pectoral Invasion | PI3K/AKT Signaling (12.834) | Neuregulin Signaling (10.295) | p70S6K Signaling (9.069)
Non-Mass Heterogeneous Internal Enhancement | ILK Signaling (7.812) | PI3K/AKT Signaling (6.646) | Endometrial Cancer Signaling (5.511)
Non-Mass Clustered Ring Internal Enhancement | ATM Signaling (4.078) | CDK5 Signaling (3.892) | B Cell Receptor Signaling (3.349)
Non-Mass Clumped Internal Enhancement | DNA Double-Strand Break Repair by Homologous Recombination (3.181) | DNA Double-Strand Break Repair by Non-Homologous End Joining (3.181) | DNA damage-induced 14-3-3σ Signaling (3.049)
Regional Non-Mass Distribution | Acute Myeloid Leukemia Signaling (3.745) | Cancer Drug Resistance By Drug Efflux (1.94) | Autophagy (1.853)
Multiple Regions Non-Mass Distribution | PI3K/AKT Signaling (4.471) | IL-8 Signaling (4.068) | CD27 Signaling in Lymphocytes (2.311)
Linear Non-Mass Distribution | ErbB2-ErbB3 Signaling (6.883) | ErbB Signaling (6.421) | Relaxin Signaling (5.845)
Focal Non-Mass Distribution | UVC-Induced MAPK Signaling (11.037) | Cancer Drug Resistance By Drug Efflux (10.613) | AMPK Signaling (10.337)
Diffuse Non-Mass Distribution | DNA Double-Strand Break Repair by Homologous Recombination (5.395) | Role of BRCA1 in DNA Damage Response (3.879) | ATM Signaling (3.857)
Nipple Retraction | PI3K/AKT Signaling (6.112) | UVB-Induced MAPK Signaling (4.246) | FLT3 Signaling in Hematopoietic Progenitor Cells (4.025)
Nipple Invasion | Estrogen-mediated S-phase Entry (2.346) | Induction of Apoptosis by HIV1 (1.949) | Lymphotoxin β Receptor Signaling (1.901)
Margin | Molecular Mechanisms of Cancer (3.746) | DNA damage-induced 14-3-3σ Signaling (2.205) | GADD45 Signaling (2.205)
Lesion Size | Hereditary Breast Cancer Signaling (7.625) | PI3K/AKT Signaling (5.976) | Insulin Receptor Signaling (5.753)
Heterogeneous Enhancement Intensity | Prolactin Signaling (6.64) | Th1 Pathway (6.001) | Th1 and Th2 Activation Pathway (5.588)
Fibroglandular | UVC-Induced MAPK Signaling (13.629) | UVB-Induced MAPK Signaling (12.176) | Neuregulin Signaling (11.271)
Extent Heterogeneity | Prostate Cancer Signaling (7.076) | UVB-Induced MAPK Signaling (4.546) | FLT3 Signaling in Hematopoietic Progenitor Cells (4.325)
Extent - Multi-focal | CNTF Signaling (4.587) | UVB-Induced MAPK Signaling (4.546) | EGF Signaling (4.52)
Extent - Multi-centric | Hepatic Fibrosis / Hepatic Stellate Cell Activation (3.359) | Tumoricidal Function of Hepatic Natural Killer Cells (2.346) | Coagulation System (2.182)
Edema | Hereditary Breast Cancer Signaling (4.797) | AMPK Signaling (4.425) | Endometrial Cancer Signaling (3.607)
Dark Internal Septum | Huntington's Disease Signaling (3.418) | Glucocorticoid Receptor Signaling (3.267) | Parkinson's Signaling (2.646)
Background | Insulin Receptor Signaling (5.753) | Molecular Mechanisms of Cancer (5.539) | NF-κB Signaling (5.331)
Axillary Lymphadenopathy | ERK/MAPK Signaling (4.8) | EGF Signaling (3.824) | Erythropoietin Signaling (3.693)
Associated Non-Mass Enhancement | Pancreatic Adenocarcinoma Signaling (7.206) | UVC-Induced MAPK Signaling (6.399) | Cancer Drug Resistance By Drug Efflux (6.194)

Sagittal T1 post-contrast MRI of a 48-year-old female patient diagnosed with infiltrating ductal carcinoma (ER-, PR-, HER2-) shows an oval rim-enhancing mass.

MRI sequences were obtained from The Cancer Imaging Archive [37].

To determine which features were most strongly associated with functional alterations to signaling pathways, agglomerative unsupervised hierarchical clustering was performed on the p-values and Z-scores (Figures 2 and 3). This analysis grouped the features by the strength of their correlations with altered pathway activity and disease functions. The most strongly deregulated IPA Diseases and Biological Functions had activation Z-scores between -3.5 and +3.5 (Supplementary Table S3).

DISCUSSION

The strength of the associations between imaging features and protein expression, signaling pathways, and biological functions was computed using a sequential analysis that related the RPPA protein expression data to the MRI features of the TCGA patients. Correlation coefficients for each possible combination of imaging feature and protein were computed using high-dimensional regression with Bayesian selection of covariates. Corrected p-values were computed for each correlation coefficient to control the false discovery rate (FDR). Only the strongest ten percent of significantly correlated molecules were analyzed using the standard Core Analysis workflow in IPA, with correlation coefficients used in place of gene expression values. The IPA analysis provides associations with pathway activity and pathobiology, supporting hypotheses about how pathway activity at the cellular level relates to alterations visible at the macroscopic, imaging level. The activation Z-scores computed from the correlation coefficients indicate whether each pathway (or function) is up- or down-regulated by upstream transcription factor activity. A similar approach integrated breast cancer transcriptomics data with imaging features and extended the interpretation with gene set enrichment analysis to identify metagene signatures such as wound response and hypoxia [28]. Our study extends this approach by leveraging the IPA Knowledge Base to interpret the patterns of protein expression associated with each imaging feature. We used a stringent two-step method to select the correlations least likely to result from chance, mitigating a common issue with high-dimensional regression analysis. Nevertheless, the approach described here is essentially a hypothesis-generating pipeline, and its findings should be interpreted with caution until they are validated by in vivo perturbation experiments in appropriate model systems.
A previous study showed that the enhancing rim fraction score, a quantitative MRI feature, is significantly associated with expression of the long non-coding RNA HOTAIR [29], which is known to be associated with breast cancer progression and metastasis [30]. The results of the high-dimensional regression method used here hint at the molecular underpinnings of macroscopic imaging phenotypes. MRI features are known to correlate with pathologic stage and lymph node involvement [31]. The results of this study point to multiple significant associations between molecular expression patterns in tumor cells and their manifestation as MRI phenotypes [32].

METHODS

TCGA patient datasets

Eighty-two patients from multiple institutions with de-identified MRIs and reverse-phase protein array (RPPA) expression data were included in this study. All subject data were de-identified before the study through inclusion in The Cancer Genome Atlas (TCGA) and were therefore exempt from institutional review board approval under the terms of the TCGA data use agreement. Imaging data were obtained from The Cancer Imaging Archive (TCIA). RPPA protein expression data were obtained from TCGA through Firehose (https://gdac.broadinstitute.org/). Scores for twenty-nine MRI semantic features were defined by the TCGA Breast Phenotype Research Group [33]. We used the imaging features as defined by the TCGA group, which include mass- and non-mass-associated features (Table 3). These features are grouped into background features, tumor-related features, tumor dimensional features, associated features, features describing the morphology of non-mass enhancing lesions, and T2-weighted MR acquisition features.
Table 3

List of imaging features

Feature Group | Features
Background | Background Enhancement; Fibroglandular
Tumor Features | Irregular Shape; Heterogeneous Enhancement Intensity; Dark Internal Septum; Rim Enhancement; Margin
Tumor Dimensions | Lesion Size; Multicentric Extent; Multifocal Extent; Heterogeneity Extent
Associated Features | Pectoral Invasion; Nipple Invasion; Skin Invasion; Axillary Lymphadenopathy; Edema; Skin Thickening; Nipple Retraction
Morphology of Non-Mass Enhancing Lesions | Associated Non-Mass Enhancement; Non-Mass Clumped Internal Enhancement; Non-Mass Clustered Ring Internal Enhancement; Non-Mass Heterogeneous Internal Enhancement; Diffuse Non-Mass Distribution; Focal Non-Mass Distribution; Linear Non-Mass Distribution; Multiple Regions Non-Mass Distribution; Regional Non-Mass Distribution
Associated with T2-Weighted MR Acquisition | Heterogeneity; Signal Intensity
To ensure that the effect of each individual feature was appropriately described, the feature set was split into three subsets: one with only the 8 mass-associated features, one with only the 21 global features, and an aggregate set with all 29 features. The subsets were analyzed separately to determine whether any significant proteins, associated pathways, or biological functions appeared only in the global or only in the mass-associated feature set.

Statistical analysis

High dimensional regression

High-dimensional regression was performed in MATLAB using the joint Bayesian selection of covariates developed by Bhadra and Mallick [27]. In this analysis, the imaging features were the independent variables, and the molecules (proteins and phospho-proteins) were the response variables. Modeling all molecules jointly as responses allowed the correlation among proteins to be taken into account, rather than treating the expression of each protein as independent of the others.
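The arrangement of predictors and responses can be illustrated with a deliberately simple stand-in. The sketch below fits a multiple-response ridge regression, B = (XᵀX + λI)⁻¹XᵀY, where X holds the imaging feature scores (patients × features) and Y holds the RPPA levels (patients × molecules). This is not the joint Bayesian model of Bhadra and Mallick [27]; ridge regression is swapped in only to show the shapes involved, and all names and data are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def solve(A, B):
    """Solve A Z = B (A: p x p, B: p x q) by Gauss-Jordan elimination with
    partial pivoting; returns Z as a p x q list of lists."""
    p = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(p)]  # augmented [A | B]
    for c in range(p):
        pivot = max(range(c, p), key=lambda row_idx: abs(M[row_idx][c]))
        M[c], M[pivot] = M[pivot], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[p:] for row in M]


def multi_response_ridge(X, Y, lam):
    """Coefficients B (features x molecules) minimizing ||Y - X B||^2 + lam ||B||^2,
    i.e. B = (X^T X + lam I)^{-1} X^T Y, with X: patients x imaging features and
    Y: patients x molecules."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    for i in range(len(XtX)):
        XtX[i][i] += lam
    return solve(XtX, matmul(Xt, Y))


# Hypothetical toy data: 3 patients, 2 imaging features, 2 proteins.
X = [[1, 0], [0, 1], [1, 1]]
Y = [[1, 2], [3, -1], [4, 1]]  # exactly X times [[1, 2], [3, -1]]
print(multi_response_ridge(X, Y, lam=0.0))  # [[1.0, 2.0], [3.0, -1.0]]
```

Because this stand-in shares one penalty but does not model the residual correlation, each column of B equals what a separate single-protein regression would give. Borrowing strength across correlated proteins is precisely what the joint Bayesian model is meant to add.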

Multiple-testing correction

Multiple-testing correction was used to control the false discovery rate (FDR) by sequentially designating p-value thresholds [34]. First, the posterior probabilities of the covariates were thresholded at an FDR of 0.25, giving a sparse set of predictors (imaging variables). Second, t-tests were performed with “no association” as the null hypothesis and “non-zero association” as the alternative hypothesis, for each combination of imaging feature and molecule in the RPPA dataset. Coefficients whose p-values fell within the lowest 10th percentile and were below 0.05 after adjustment for multiple comparisons were considered statistically significant. This approach is similar to that used to discern the relative impact of copy number alterations on messenger RNAs and microRNAs in glioblastoma [30].
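The two-step procedure can be sketched as follows. The exact rules of [34] are not reproduced in this section, so this is a sketch under stated assumptions: step 1 keeps the covariates whose posterior inclusion probabilities hold the estimated Bayesian FDR (the mean of one minus the posterior probability over the selected set) at or below 0.25, and step 2 applies the Benjamini-Hochberg adjustment, reading "within the lowest 10th percentile" as "among the smallest 10% of p-values". The t-tests themselves are not shown, and all function names are hypothetical.

```python
import math


def bayesian_fdr_select(post_probs, fdr=0.25):
    """Step 1 (assumed rule): keep the largest set of covariates, taken in order of
    decreasing posterior inclusion probability, whose mean (1 - probability),
    i.e. the estimated Bayesian FDR, does not exceed `fdr`."""
    order = sorted(range(len(post_probs)), key=lambda i: post_probs[i], reverse=True)
    selected, cum_null = [], 0.0
    for k, i in enumerate(order, start=1):
        cum_null += 1.0 - post_probs[i]
        if cum_null / k > fdr:
            break  # the running mean can only grow from here on
        selected.append(i)
    return sorted(selected)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_associations(pvals, alpha=0.05, top_fraction=0.10):
    """Step 2 (assumed reading): indices whose p-value is among the smallest
    `top_fraction` of all p-values and whose BH-adjusted p-value is < alpha."""
    m = len(pvals)
    k = max(1, math.ceil(top_fraction * m))
    smallest = sorted(range(m), key=lambda i: pvals[i])[:k]
    adjusted = benjamini_hochberg(pvals)
    return sorted(i for i in smallest if adjusted[i] < alpha)


print(bayesian_fdr_select([0.99, 0.2, 0.9, 0.6]))  # [0, 2, 3]
```

In the toy call, adding the fourth covariate (posterior probability 0.2) would raise the estimated FDR above 0.25, so only three covariates survive step 1.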

Pathway analysis

Pathway analysis was performed on each of the three data sets (based on the image feature subsets) using the “Core Analysis” feature in the IPA software [35]. For the purposes of this analysis, regression correlation coefficients served as expression values. P-values and activation Z-scores were computed internally in IPA as previously described.
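IPA computes its statistics internally, and its implementation is proprietary. As a rough, hedged illustration of the two quantities shown in Figures 2 and 3, the sketch below computes a right-tailed Fisher's exact (hypergeometric) overlap p-value, the test IPA describes for its pathway p-values, and an unweighted activation z-score of the form (N_agree − N_disagree)/√N. IPA's actual z-score may include weighting and other refinements, and the function names are hypothetical.

```python
import math


def overlap_p_value(n_universe, n_pathway, n_input, n_overlap):
    """Right-tailed Fisher's exact (hypergeometric) p-value: the probability of
    seeing at least `n_overlap` pathway members when `n_input` molecules are
    drawn at random from `n_universe`, of which `n_pathway` belong to the pathway."""
    upper = min(n_pathway, n_input)
    tail = sum(
        math.comb(n_pathway, k) * math.comb(n_universe - n_pathway, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / math.comb(n_universe, n_input)


def activation_z_score(observed_signs, expected_signs):
    """Unweighted activation z-score (N_agree - N_disagree) / sqrt(N), where each
    sign is +1 or -1 and 'agree' means the observed direction matches the
    direction expected if the pathway were activated."""
    products = [o * e for o, e in zip(observed_signs, expected_signs)]
    return sum(products) / math.sqrt(len(products))


# Hypothetical: all 5 input molecules fall in a 5-member pathway, universe of 10.
p = overlap_p_value(10, 5, 5, 5)
print(round(-math.log10(p), 3))  # 2.401
print(activation_z_score([1, 1, 1, -1], [1, 1, 1, 1]))  # 1.0
```

In this study's setting, the observed signs would come from the signs of the regression coefficients, since those coefficients served as expression values.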

Hierarchical clustering

Agglomerative unsupervised hierarchical clustering of the p-values and activation Z-scores was carried out using the “stats” package in R. Euclidean distance matrices were computed, and Ward's method, which minimizes within-cluster variance, was used to merge clusters [36].
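The clustering step can be sketched in plain Python as agglomerative clustering with Ward's linkage, using the Lance-Williams update on squared Euclidean distances. In R, hclust(dist(x), method = "ward.D2") implements this criterion; the text does not state which Ward variant was used, so that correspondence is an assumption. Each row would be one imaging feature and each column one pathway's -log(p-value) or Z-score; the function name and toy data are hypothetical.

```python
def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def ward_clustering(rows, n_clusters):
    """Agglomerative clustering with Ward's linkage, stopped at `n_clusters`.

    Distances start as squared Euclidean distances and are updated with the
    Lance-Williams formula for Ward's method, so every merge is the one that
    least increases the total within-cluster sum of squares.
    Returns the clusters as sorted lists of row indices.
    """
    clusters = {i: [i] for i in range(len(rows))}
    d = {(a, b): squared_euclidean(rows[a], rows[b])
         for a in clusters for b in clusters if a < b}
    next_id = len(rows)
    while len(clusters) > n_clusters:
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        for k, members in clusters.items():
            nk = len(members)
            d_ka = d[(min(k, a), max(k, a))]
            d_kb = d[(min(k, b), max(k, b))]
            # Lance-Williams update for Ward's method; k < next_id always holds.
            d[(k, next_id)] = ((na + nk) * d_ka + (nb + nk) * d_kb - nk * d_ab) / (na + nb + nk)
        # Forget every distance that involves one of the two merged clusters.
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy matrix: rows = imaging features, columns = pathway scores.
rows = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
print(ward_clustering(rows, 2))  # [[0, 1], [2, 3, 4]]
```

Storing distances in a dictionary keyed by cluster-id pairs keeps the sketch short; a production implementation would use a condensed distance array, as R and SciPy do.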
Table 1

Patient demographic information

Demographics are given for the 82 patients included in this study.

Statistic | Value
Mean Age at Diagnosis (Range) | 53.2 (29–82)
Median Overall Survival (Months) | 41.72
Median Disease-Free Survival (Months) | 42.015
Estrogen Receptor (ER) Status (Positive / Negative) | 67 / 15
Progesterone Receptor (PR) Status (Positive / Negative) | 59 / 23
Infiltrating Lobular Carcinoma | 9
Infiltrating Ductal Carcinoma | 69
Medullary Carcinoma | 1
Other | 3
REFERENCES

1.  Covariate-Adjusted Precision Matrix Estimation with an Application in Genetical Genomics.

Authors:  T Tony Cai; Hongzhe Li; Weidong Liu; Jichun Xie
Journal:  Biometrika       Date:  2012-11-30       Impact factor: 2.445

2.  Gene expression profiling predicts clinical outcome of breast cancer.

Authors:  Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao; Hans L Peterse; Karin van der Kooy; Matthew J Marton; Anke T Witteveen; George J Schreiber; Ron M Kerkhoven; Chris Roberts; Peter S Linsley; René Bernards; Stephen H Friend
Journal:  Nature       Date:  2002-01-31       Impact factor: 49.962

3.  A SPARSE CONDITIONAL GAUSSIAN GRAPHICAL MODEL FOR ANALYSIS OF GENETICAL GENOMICS DATA.

Authors:  Jianxin Yin; Hongzhe Li
Journal:  Ann Appl Stat       Date:  2011-12       Impact factor: 2.083

4.  Sparse Multivariate Regression With Covariance Estimation.

Authors:  Adam J Rothman; Elizaveta Levina; Ji Zhu
Journal:  J Comput Graph Stat       Date:  2010       Impact factor: 2.302

5.  Radiogenomic analysis of breast cancer using MRI: a preliminary study to define the landscape.

Authors:  Shota Yamamoto; Daniel D Maki; Ronald L Korn; Michael D Kuo
Journal:  AJR Am J Roentgenol       Date:  2012-09       Impact factor: 3.959

Review 6.  Accuracy and surgical impact of magnetic resonance imaging in breast cancer staging: systematic review and meta-analysis in detection of multifocal and multicentric cancer.

Authors:  Nehmat Houssami; Stefano Ciatto; Petra Macaskill; Sarah J Lord; Ruth M Warren; J Michael Dixon; Les Irwig
Journal:  J Clin Oncol       Date:  2008-05-12       Impact factor: 44.544

Review 7.  Applications and limitations of radiomics.

Authors:  Stephen S F Yip; Hugo J W L Aerts
Journal:  Phys Med Biol       Date:  2016-06-08       Impact factor: 3.609

8.  Prediction of clinical phenotypes in invasive breast carcinomas from the integration of radiomics and genomics data.

Authors:  Wentian Guo; Hui Li; Yitan Zhu; Li Lan; Shengjie Yang; Karen Drukker; Elizabeth Morris; Elizabeth Burnside; Gary Whitman; Maryellen L Giger; Yuan Ji
Journal:  J Med Imaging (Bellingham)       Date:  2015-09-23

9.  Assessment of tumor heterogeneity: an emerging imaging tool for clinical practice?

Authors:  Fergus Davnall; Connie S P Yip; Gunnar Ljungqvist; Mariyah Selmi; Francesca Ng; Bal Sanghera; Balaji Ganeshan; Kenneth A Miles; Gary J Cook; Vicky Goh
Journal:  Insights Imaging       Date:  2012-10-24

10.  Radiomics: Images Are More than Pictures, They Are Data.

Authors:  Robert J Gillies; Paul E Kinahan; Hedvig Hricak
Journal:  Radiology       Date:  2015-11-18       Impact factor: 11.105
