
Correlation between MR Image-Based Radiomics Features and Risk Scores Associated with Gene Expression Profiles in Breast Cancer.

Ga Ram Kim, You Jin Ku, Jun Ho Kim, Eun-Kyung Kim.   

Abstract

Purpose: To investigate the correlation between magnetic resonance (MR) image-based radiomics features and the genomic features of breast cancer by focusing on biomolecular intrinsic subtypes and gene expression profiles based on risk scores.
Materials and Methods: We used the publicly available datasets from the Cancer Genome Atlas and the Cancer Imaging Archive to extract the radiomics features of 122 breast cancers on MR images. Furthermore, PAM50 intrinsic subtypes were classified and their risk scores were determined from gene expression profiles. The relationship between radiomics features and biomolecular characteristics was analyzed. A penalized generalized regression analysis was performed to build prediction models.
Results: The PAM50 subtype demonstrated a statistically significant association with the maximum 2D diameter (p = 0.0189), degree of correlation (p = 0.0386), and inverse difference moment normalized (p = 0.0337). Among risk score systems, GGI and GENE70 shared 8 correlated radiomic features (p = 0.0008-0.0492) that were statistically significant. Although the maximum 2D diameter was most significantly correlated to both score systems (p = 0.0139, and p = 0.0008), the overall degree of correlation of the prediction models was weak with the highest correlation coefficient of GENE70 being 0.2171.
Conclusion: Maximum 2D diameter, degree of correlation, and inverse difference moment normalized demonstrated significant relationships with the PAM50 intrinsic subtypes along with gene expression profile-based risk scores such as GENE70, despite weak correlations.
Copyright © 2020 The Korean Society of Radiology.

Keywords:  Breast Neoplasms; Gene Expression Profiling; Magnetic Resonance Imaging

Year:  2020        PMID: 36238609      PMCID: PMC9431911          DOI: 10.3348/jksr.2020.81.3.632

Source DB:  PubMed          Journal:  Taehan Yongsang Uihakhoe Chi        ISSN: 1738-2637


INTRODUCTION

Gene expression profiling by high-throughput technologies has provided deeper insight into the complex biomolecular nature of breast cancer (1, 2, 3, 4, 5, 6, 7). Some investigators have discovered gene expression profiles that can be used to classify intrinsic subtypes to predict prognosis and treatment response, and this knowledge has helped to individualize treatment strategies for breast cancer patients (8). However, gene expression profiling is not yet readily applicable in daily practice. Recent advances in the computer-aided quantitative analysis of radiologic images (so-called radiomics) have enabled us to go beyond the detection and diagnosis of cancer to image-based cancer phenotyping. With radiomics, we can not only correlate images to pathologic tumor or node stage, nuclear grade and molecular subtype, but also gather further information on prognosis and treatment response. This can be done by converting images to high-throughput quantitative data and subsequently analyzing the statistical relationships between radiomics features and clinicopathologic factors. Radiogenomics refers to the study of mathematical relationships between radiomics features and genomic features (9); the Cancer Genome Atlas (TCGA) of the National Cancer Institute (10) and its imaging counterpart, the Cancer Imaging Archive (TCIA) (11), facilitate cross-disciplinary research to find relationships between imaging phenotypes and genomic subtypes. PAM50 is a well-known gene assay for breast cancer; it was developed as a 50-gene quantitative real-time polymerase chain reaction assay that identifies and categorizes intrinsic molecular subtypes of breast cancer into the luminal A, luminal B, human epidermal growth factor receptor 2 (HER2)-enriched, basal-like, and normal-like phenotypes from RNA isolated from formalin-fixed, paraffin-embedded tissue.
The PAM50 assay was also used to develop a prognostic score for risk of relapse based on the relative distance to the centroid of each subtype; a proliferation score based on a gene subset related to cell cycle progression; and composite scores that combine tumor size with molecular phenotypes (1). The PAM50-based risk score was found to be significant in tumors less than 5 cm in size that were estrogen receptor (ER)-positive, HER2-negative and lymph node-negative (12, 13, 14, 15). If radiomics features, such as those of MRI, which is widely used in the preoperative evaluation of breast cancer, can predict the genomic features of breast cancer, we could readily acquire information that can be used to tailor treatment for individuals even within routine clinical practice. Therefore, the purpose of this study is to investigate the relationship between MR image-based radiomics features and genomic features of breast cancer by focusing on biomolecular intrinsic subtypes and gene expression profiles based on risk scores.

MATERIALS AND METHODS

DATA DOWNLOAD

We had access to only de-identified data, so approval of the Institutional Review Board was unnecessary. Clinical and genomic data for the patients were downloaded from TCGA through the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov) along with the corresponding MR images from TCIA (https://www.cancerimagingarchive.net/). After matching, 122 patients with both gene expression data and appropriate MR images were enrolled in this study. Among the included MR images, 91 cases were obtained with a GE 1.5 T MRI scanner (GE Medical Systems, Milwaukee, WI, USA), 13 with a Siemens 1.5 T MRI scanner (Siemens, Berlin, Germany), and 15 with a Philips 1.5 T MRI scanner (Philips Medical Systems, Bothell, WA, USA). Three cases were acquired on a Philips 3.0 T scanner (Philips Medical Systems). All MR images acquired with GE scanners were obtained using a standard double breast coil and a gadolinium-based contrast agent.

MR IMAGE REVIEW, FEATURE EXTRACTION AND INTEROBSERVER AGREEMENT

The obtained MR images were reviewed independently by two breast radiologists with more than 8 years of experience in breast imaging. Each radiologist reviewed all 122 cases and determined a representative slice for every patient. In case of discordant findings, the two radiologists discussed and reached a consensus. Regions of interest (ROIs) were then drawn in a semiautomatic manner using MIPAV (https://mipav.cit.nih.gov). Radiomic features were extracted using pyradiomics (https://github.com/Radiomics/pyradiomics). A total of 100 features in 7 categories were extracted from each ROI. The 7 categories were first order statistics (18 features), shape-based features (8 features), gray-level co-occurrence matrix (23 features), gray-level run-length matrix (16 features), gray-level size zone matrix (16 features), neighboring gray-tone difference matrix (5 features), and gray-level dependence matrix (14 features). Interobserver agreement for the radiomic features extracted from the ROIs was evaluated with the intraclass correlation coefficient (ICC) and 95% confidence interval (CI). The ‘irr’ R package was used for ICC analysis (R version 3.5.1, http://www.R-project.org).
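As an illustration of the interobserver analysis described above, the following minimal Python sketch computes a two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1). The study used the ‘irr’ R package and does not state which ICC model was chosen, so the model here is an assumption, and the function name and the toy feature values are hypothetical.

```python
def icc_two_way_single(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per lesion and one column per reader.
    """
    n = len(ratings)       # number of lesions
    k = len(ratings[0])    # number of readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into lesion, reader and residual parts.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical values of one radiomic feature measured by two readers.
toy_feature = [[41.2, 40.8], [55.0, 56.1], [33.7, 35.0], [62.4, 61.9]]
print(round(icc_two_way_single(toy_feature), 3))
```

In practice the coefficient would be computed once per feature and then summarized per feature class; the confidence interval is omitted from this sketch.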

DATA AND STATISTICAL ANALYSES

The purpose of this study was twofold. The first purpose was to investigate radiomic characteristics of each biomolecular subtype based on genomic characteristics and the second purpose was to determine the correlation between radiomics features extracted from MRI and gene expression profile-based recurrence (or prognosis) risk score systems. To define radiomic characteristics, we used 100 radiomics features extracted from MRI using the pyradiomics package. The Kruskal-Wallis test was performed to identify differences in radiomics features among individual biomolecular subtypes. To investigate the correlation between radiomic features and gene expression profile-based recurrence (or prognosis) risk score systems, we first performed Spearman's correlation test for individual radiomics features and 11 risk score systems. Then, we identified statistically significant features, and used these features to establish prediction models using penalized generalized regression with the least absolute shrinkage and selection operator (LASSO). Radiomics features were selected with LASSO using the ‘glmnet’ R package to study the correlation between risk scores determined with gene expression profiles and mathematical models built on radiomics features. Cross validation was performed in a leave-one-out manner. Genomic analyses were performed using 1092 gene expression profiles obtained by RNA sequencing. The normalized RSEM data were downloaded using the ‘TCGAbiolinks’ package in R. The PAM50 intrinsic subtypes were classified as described in a previous report (6) and risk scores were determined using the ‘genefu’ package in R. The risk score systems included single-gene-based prognosis prediction (ESR1, ERBB2, and AURKA) (16), EndoPredict (17), GENIUS (2), GGI (6), OncotypeDx (8), TamR (4), GENE70 (7), PIK3CA gene signature (5), and ROR-S (1). The analyses were run on the full set of 1092 cases and the results for the 122 enrolled patients were used for further analysis.
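The univariate screening step described above, in which each radiomics feature is correlated with each risk score, can be sketched in Python as follows. This is a minimal illustration rather than the authors' R code: it computes Spearman's rho as the Pearson correlation of tie-averaged ranks, omits the p-value, and uses hypothetical function names and toy values.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # sorted positions i..j share this rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical feature values and risk scores for six lesions.
max_2d_diameter = [18.0, 25.5, 31.2, 12.4, 40.1, 22.3]
risk_score = [0.21, 0.35, 0.30, 0.10, 0.62, 0.28]
print(round(spearman_rho(max_2d_diameter, risk_score), 3))
```

Repeating the call for every feature and risk score pair and keeping the pairs with p < 0.05 would give the candidate set passed on to the penalized regression.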
The Kruskal-Wallis test was performed to analyze the relationship between intrinsic subtypes and radiomics features. All statistical analyses were performed with R version 3.5.1. The R packages ‘TCGAbiolinks’, ‘genefu’, ‘glmnet’, and ‘irr’ were used to extract TCGA data, to classify intrinsic subtypes and calculate risk scores, to perform penalized generalized regression with the LASSO, and to perform the ICC analysis, respectively.
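To make the subtype comparison concrete, the sketch below computes the Kruskal-Wallis H statistic with tie correction and an approximate p-value from the chi-square distribution with k − 1 degrees of freedom. It is a simplified Python illustration, not the R routine used in the study; the function names and the toy feature values per subtype are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    terms = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def kruskal_wallis(groups):
    """Return (H, p) for a list of groups of observations."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    h /= 1.0 - ties / (n_total ** 3 - n_total)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical values of one feature grouped by PAM50 subtype.
by_subtype = [
    [21.0, 25.4, 19.8, 23.1],   # luminal A
    [27.5, 30.2, 26.1],         # luminal B
    [33.0, 29.9, 35.4],         # HER2-enriched
    [38.2, 31.7, 36.0],         # basal
]
h_stat, p_value = kruskal_wallis(by_subtype)
print(round(h_stat, 3), round(p_value, 4))
```

The chi-square approximation is only approximate for small groups such as these toy data; with 122 lesions it is the usual choice.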

RESULTS

PATIENT CHARACTERISTICS

From TCGA, 122 patients with normalized RNA sequencing data available for gene expression and corresponding TCIA MR images were enrolled. The median age of the patients was 55 years (range, 29 to 83 years). Infiltrating ductal carcinoma was the most frequent pathologic type of breast cancer observed in this study population, followed by lobular carcinoma. The luminal A type was the predominant molecular subtype (Table 1).
Table 1

Patient Characteristics

Characteristics                                           Number
Pathology
 Infiltrating duct and lobular carcinoma                  1
 Infiltrating duct carcinoma, not otherwise specified     102
 Infiltrating duct mixed with other types of carcinoma    1
 Lobular carcinoma, not otherwise specified               16
 Medullary carcinoma, not otherwise specified             1
 Pleomorphic carcinoma                                    1
Stage
 Stage I                                                  22
 Stage Ia                                                 8
 Stage II                                                 1
 Stage IIa                                                50
 Stage IIb                                                24
 Stage IIIa                                               10
 Stage IIIc                                               7
Age, years
 Median (range)                                           55 (29–83)
Race
 Asian                                                    1
 African American                                         20
 Caucasian                                                101
PAM50 subtype
 Basal                                                    17
 HER2                                                     8
 Luminal A                                                79
 Luminal B                                                18

HER2 = human epidermal growth factor receptor 2

INTEROBSERVER AGREEMENT BETWEEN THE TWO RADIOLOGISTS

The interobserver agreement for feature extraction between the two radiologists was acceptable (ICC 95% CI, 0.768–1.000). The agreement was the highest for gray-level size zone matrix features and the lowest for first order features (Table 2).
Table 2

Interobserver Agreement

Feature Class                              Intraclass Correlation Coefficient, 95% Confidence Interval
Shape                                      0.771–1.000
Gray-level dependence matrix               0.799–1.000
Gray-level co-occurrence matrix            0.817–1.000
First order statistics                     0.768–1.000
Gray-level run-length matrix               0.787–1.000
Gray-level size-zone matrix                0.820–1.000
Neighboring gray-tone difference matrix    0.818–1.000

RELATIONSHIP BETWEEN RADIOMICS FEATURES AND GENE EXPRESSION PROFILE-BASED FEATURES

The PAM50 subtype was significantly correlated to three radiomic features, which were the maximum 2D diameter (p = 0.0189), correlation (p = 0.0386) and inverse difference moment normalized (p = 0.0337). In univariate analysis, shape features overall seemed to be more related to risk scores than texture features, in contrast to the intrinsic subtype. GGI and GENE70 had more significantly related radiomics features than the other risk score systems. ERBB2, GENIUS, and PIK3CA were not significantly related to radiomic features (Table 3). Among the risk score systems, GGI was significantly correlated to 2 shape features [elongation (p = 0.0199), and max 2D diameter column (p = 0.0139)], 2 gray-level dependence matrix features [small dependence low gray-level emphasis (p = 0.0261), and low gray-level emphasis (p = 0.331)], 2 first order features [total energy (p = 0.0412), and 10th percentile (p = 0.0214)], 2 gray-level run-length matrix features [short-run low gray-level emphasis (p = 0.0244), and low gray-level run emphasis (p = 0.0320)], and a gray-level size-zone matrix feature [small-area low gray-level emphasis (p = 0.0491)]. The GENE70 score was significantly correlated to 3 shape features [elongation (p = 0.0122), max 2D diameter column (p = 0.0008), and surface area (p = 0.0377)], a gray-level dependence matrix feature [small dependence low gray-level emphasis (p = 0.0169)], 3 first order features [total energy (p = 0.0251), mean (p = 0.0492), and 10th percentile (p = 0.0101)], 2 gray-level run-length matrix features [short-run low gray-level emphasis (p = 0.0350), and low gray-level run emphasis (p = 0.0478)], and a gray-level size-zone matrix feature [small-area low gray-level emphasis (p = 0.0392)].
Table 3

Number of Radiomic Features that Demonstrated a Statistically Significant (p < 0.05) Association with Intrinsic Subtypes or Risk Scores

Feature Class        Shape  GLDM  GLCM  First Order  GLRLM  GLSZM  NGTDM
Intrinsic Subtype
 PAM50               1      0     2     0            0      0      0
 CNV*                0      0     0     1            0      0      0
 Mutation*           0      3     6     7            3      2      1
 DNA methylation*    0      2     1     4            3      2      0
 mRNA*               1      0     0     0            0      0      0
 miRNA*              4      2     10    8            1      7      4
 lncRNA*             0      4     0     0            0      5      0
 Protein*            1      0     0     0            0      0      0
 PARADIGM*           2      0     0     0            0      0      0
Risk Score†
 AURKA               2      1     0     0            0      0      3
 ESR1                1      0     0     0            0      0      0
 ERBB2               0      0     0     0            0      0      0
 GGI                 2      2     0     2            2      1      0
 GENIUS              0      0     0     0            0      0      0
 EndoPredict         1      0     0     0            0      0      0
 OncotypeDx          3      0     0     0            0      0      0
 TamR                1      1     0     1            0      0      0
 GENE70              3      3     0     3            2      1      0
 PIK3CA              0      0     0     0            0      0      0
 ROR-S               2      0     0     0            0      0      0

*Clustering results from a previous study (9).

†‘For research’ and ‘NOT for clinical’ scores determined based on gene expression profiles.

GLDM = gray-level dependence matrix, GLCM = gray-level co-occurrence matrix, GLRLM = gray-level run-length matrix, GLSZM = gray-level size-zone matrix, NGTDM = neighboring gray-tone difference matrix

On the basis of significantly correlated features, prediction models were established. Most risk score prediction models showed statistically significant p values, except those for ERBB2, GENIUS, and PIK3CA (Table 4). However, the overall correlation was weak because the adjusted R² values were low (below 0.3), with the adjusted R² value of GENE70 being the highest at 0.2171.
Table 4

Penalized Generalized Regression

Risk Scores*    Adjusted R²    p-Value
AURKA           0.1998         < 0.001
ESR1            0.152          < 0.001
ERBB2           -              not significant
GGI             0.1835         < 0.001
GENIUS          -              not significant
EndoPredict     0.1118         0.00693
OncotypeDx      0.1474         0.00167
TamR            0.1991         < 0.001
GENE70          0.2171         < 0.001
PIK3CA          -              not significant
ROR-S           0.1903         < 0.001

*‘For research’ and ‘NOT for clinical’ scores determined based on gene expression profiles.
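The penalized regression behind the prediction models in Table 4 can be illustrated with a small LASSO fitted by coordinate descent and scored with leave-one-out cross-validation. This Python sketch is only a stand-in for the ‘glmnet’ workflow used in the study: it centers but does not standardize the features (a simplification assumed here), and the function names, penalty grid and toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]   # residual with all coefficients at 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue              # a constant feature carries no signal
            # Covariance of feature j with the residual once its own
            # contribution has been added back.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_b = soft_threshold(rho, lam) / scale
            resid = [r - c * (new_b - beta[j]) for c, r in zip(col, resid)]
            beta[j] = new_b
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def predict(intercept, beta, row):
    return intercept + sum(b * v for b, v in zip(beta, row))


def loo_mse(X, y, lam):
    """Leave-one-out cross-validated mean squared error for one penalty."""
    errors = []
    for i in range(len(X)):
        b0, beta = lasso_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        errors.append((y[i] - predict(b0, beta, X[i])) ** 2)
    return sum(errors) / len(errors)


# Hypothetical data: two screened features and one risk score per lesion.
X_toy = [[1.0, 0.3], [2.0, 0.1], [3.0, 0.4], [4.0, 0.2], [5.0, 0.5], [6.0, 0.3]]
y_toy = [0.9, 2.1, 2.9, 4.2, 5.0, 6.1]
grid = [0.0, 0.05, 0.1, 0.5, 1.0]
best_lam = min(grid, key=lambda lam: loo_mse(X_toy, y_toy, lam))
print(best_lam, lasso_fit(X_toy, y_toy, best_lam))
```

Features whose coefficients are shrunk exactly to zero drop out of the model, which is how the LASSO performs feature selection.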

UNSUPERVISED HIERARCHICAL CLUSTERING

Differentially expressed genes according to PAM50 classification were extracted using the Kruskal-Wallis test. A p value less than 10⁻⁹ was considered statistically significant and 133 genes whose expression was significantly different among the PAM50 classifications were selected. An unsupervised hierarchical clustering analysis did not correlate well with PAM50 classification (Fig. 1A). Differentially extracted radiomic features according to PAM50 classification were selected using the Kruskal-Wallis test. Only 4 features with p values less than 10⁻¹ were selected. An unsupervised hierarchical clustering analysis failed to show significant correlation with PAM50 classification (Fig. 1B).
Fig. 1

Unsupervised hierarchical clustering analysis of the enrolled cases with differentially expressed genes (A) and differentially extracted radiomic features (B). Each panel shows the dendrogram (top), a color bar indicating PAM50 classification (middle; red: luminal A, cyan: luminal B, yellow: human epidermal growth factor receptor 2, green: normograde, and blue: basal), and the heatmap (bottom) of gene expression (A) or radiomic features (B).

DISCUSSION

With recent advances in computational biology, gene expression profiles allow more useful prognostic information to be collected than conventional clinicopathological studies. For breast cancer in particular, there are risk score systems based on multi-gene expression profiles that provide more information to predict recurrence and treatment response than traditional clinical and histopathological factors (1, 2, 4, 5, 6, 7, 8, 16, 17). Based on these risk scores, treatment strategies can be tailored to each individual patient. Studying the relationships between gene expression profiles and image phenotypes may provide valuable opportunities to develop robust tools for tailored treatment. Eventually, we will be able to obtain information regarding intrinsic subtypes and risk scores based on biomolecular characteristics in an automated manner with software embedded in imaging machines. In our study, the interobserver agreement between the two radiologists for computerized feature extraction by drawing the ROIs of 122 MR lesions was comparatively high (ICC 95% CI, 0.768–1.000). Qualitative assessments made by humans naturally lead to interobserver variation. In a past study, the interobserver agreement of three radiologists for 294 breast MR lesions, measured with the κ statistic, was substantial for mass internal enhancement (κ = 0.62) and moderate for peritumoral edema (κ = 0.46) (18). On the other hand, the interobserver reproducibility of two radiologists for computerized extraction of texture features by drawing the ROIs of 50 breast ultrasound (US) lesions was reported to be high in another study, with a somewhat lower ICC than ours (ICC 95% CI, 0.691–1.000) (19). We could increase interobserver agreement with semi-automated techniques for drawing ROIs. As interobserver variation originates from human judgement, we can expect automated segmentation of tumors to eliminate this variation in the future.
We found the PAM50 intrinsic subtype to be significantly related to shape and texture (gray-level co-occurrence matrix) features. The significant shape feature was the ‘maximum 2D diameter’, which reflects the size of the ROI. Previous studies reported ER-negative (ER-) and triple-negative (TN) subtypes to be related to larger tumors (20, 21). These subtypes are known to have higher microvessel density as well as higher proliferation activity (22, 23, 24). The significant texture features were ‘correlation’ and ‘inverse difference moment normalized’, which reflect texture heterogeneity. In a previous report by Waugh et al. (25), texture heterogeneity was significantly increased in HER2-enriched and TN subtypes. We also found that a number of radiomics features showed significant correlation with risk scores based on gene expression profiles. In particular, texture features reflecting heterogeneity were consistently and significantly related to risk scores. These radiomics features quantitatively measure the heterogeneous nature of enhancement within the ROI. Breast cancer has heterogeneous genomic characteristics with multiple driver mutations, the degree of which is known to be related to treatment resistance and poor prognosis (26). Thus, a non-invasive quantitative measurement of heterogeneity may be useful for determining optimal treatment strategies. Among the risk score systems analyzed, those with relatively fewer signature genes tended to have no or few significantly related radiomics features. This finding indicates that radiomics features may not reflect a single gene or individual signaling pathway, but rather overall patterns of gene expression. Zhu et al. (27) made the same speculation after observing that radiomics features were not correlated to mutations or copy number profiles in their study.
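The two texture features discussed above, ‘correlation’ and ‘inverse difference moment normalized’ (IDMN), are both derived from the gray-level co-occurrence matrix (GLCM). The Python sketch below shows how they respond to a smooth and to an alternating pattern. It is a simplification under stated assumptions: only horizontal neighbors at distance 1 are counted, whereas radiomics software typically combines several directions, and the function names, the convention of returning 1.0 for a flat region, and the toy images are hypothetical.

```python
import math


def glcm_horizontal(image, levels):
    """Symmetric, normalised co-occurrence matrix for horizontal neighbours."""
    counts = [[0.0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1   # count each pair in both directions
    total = sum(sum(r) for r in counts)
    return [[c / total for c in r] for r in counts]


def glcm_correlation(p):
    """Linear dependency of neighbouring gray levels, in [-1, 1]."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_x = sum(i * v for i, _, v in cells)
    mu_y = sum(j * v for _, j, v in cells)
    var_x = sum((i - mu_x) ** 2 * v for i, _, v in cells)
    var_y = sum((j - mu_y) ** 2 * v for _, j, v in cells)
    if var_x == 0.0 or var_y == 0.0:
        return 1.0              # flat region: convention assumed in this sketch
    cov = sum((i - mu_x) * (j - mu_y) * v for i, j, v in cells)
    return cov / math.sqrt(var_x * var_y)


def glcm_idmn(p):
    """IDMN: sum of p(i, j) / (1 + (i - j)^2 / Ng^2); higher is more homogeneous."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2 / n ** 2)
               for i in range(n) for j in range(n))


# Hypothetical two-level image patches.
smooth = [[0, 0, 1, 1], [0, 0, 1, 1]]
alternating = [[0, 1, 0, 1], [1, 0, 1, 0]]
for name, patch in (("smooth", smooth), ("alternating", alternating)):
    matrix = glcm_horizontal(patch, levels=2)
    print(name, round(glcm_correlation(matrix), 3), round(glcm_idmn(matrix), 3))
```

The alternating patch gives a lower IDMN and a negative correlation, which is the sense in which these features quantify heterogeneity of enhancement within an ROI.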
Although a number of radiomics features were significantly correlated to risk scores, generalized regression analyses failed to build strong prediction models. The adjusted R² value of the GENE70 model was the highest at 0.2171. This indicates that radiomics features cannot be used to predict prognostic risk for clinical use at this time and that further study is needed to develop mathematical models to predict biomolecular risk scores. This study has some limitations. First, the study enrolled a relatively small number of patients because they were collected from a limited source of data sets (TCGA and TCIA). Another limitation was the uneven quality of MR images. Most of the archived images were obtained on outdated machines without standardized protocols. Also, variability might have arisen when radiomics features were extracted because two radiologists drew the ROIs and the MR images were obtained with machines manufactured by three different companies. Lastly, some of the risk scores were calculated for research purposes with an algorithm-based method rather than obtained from clinical tests, and these calculation methods differed from the original methods for the risk scores. Thus, there might be discrepancies between ‘research purpose’ risk scores and ‘clinical purpose’ risk scores. Despite these limitations, the results of this study suggest that image-based biomolecular phenotypes have the potential to predict the prognosis of breast cancer. In conclusion, the radiomics features of maximum 2D diameter, correlation and inverse difference moment normalized showed significant relationships with biomolecular characteristics, namely PAM50 intrinsic subtypes and gene expression profile-based risk scores such as GENE70, although the correlations were weak. Thus, further studies are necessary to develop adequate prediction models using MR image-based phenotypes.
  26 in total

1.  A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer.

Authors:  Soonmyung Paik; Steven Shak; Gong Tang; Chungyeul Kim; Joffre Baker; Maureen Cronin; Frederick L Baehner; Michael G Walker; Drew Watson; Taesung Park; William Hiller; Edwin R Fisher; D Lawrence Wickerham; John Bryant; Norman Wolmark
Journal:  N Engl J Med       Date:  2004-12-10       Impact factor: 91.245

2.  Gene expression profiling predicts clinical outcome of breast cancer.

Authors:  Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao; Hans L Peterse; Karin van der Kooy; Matthew J Marton; Anke T Witteveen; George J Schreiber; Ron M Kerkhoven; Chris Roberts; Peter S Linsley; René Bernards; Stephen H Friend
Journal:  Nature       Date:  2002-01-31       Impact factor: 49.962

3.  Prediction of late distant recurrence after 5 years of endocrine treatment: a combined analysis of patients from the Austrian breast and colorectal cancer study group 8 and arimidex, tamoxifen alone or in combination randomized trials using the PAM50 risk of recurrence score.

Authors:  Ivana Sestak; Jack Cuzick; Mitch Dowsett; Elena Lopez-Knowles; Martin Filipits; Peter Dubsky; John Wayne Cowens; Sean Ferree; Carl Schaper; Christian Fesl; Michael Gnant
Journal:  J Clin Oncol       Date:  2014-10-20       Impact factor: 44.544

4.  The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository.

Authors:  Kenneth Clark; Bruce Vendt; Kirk Smith; John Freymann; Justin Kirby; Paul Koppel; Stephen Moore; Stanley Phillips; David Maffitt; Michael Pringle; Lawrence Tarbox; Fred Prior
Journal:  J Digit Imaging       Date:  2013-12       Impact factor: 4.056

Review 5.  Use of Biomarkers to Guide Decisions on Adjuvant Systemic Therapy for Women With Early-Stage Invasive Breast Cancer: American Society of Clinical Oncology Clinical Practice Guideline.

Authors:  Lyndsay N Harris; Nofisat Ismaila; Lisa M McShane; Fabrice Andre; Deborah E Collyar; Ana M Gonzalez-Angulo; Elizabeth H Hammond; Nicole M Kuderer; Minetta C Liu; Robert G Mennel; Catherine Van Poznak; Robert C Bast; Daniel F Hayes
Journal:  J Clin Oncol       Date:  2016-02-08       Impact factor: 44.544

6.  Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes.

Authors:  Christine Desmedt; Benjamin Haibe-Kains; Pratyaksha Wirapati; Marc Buyse; Denis Larsimont; Gianluca Bontempi; Mauro Delorenzi; Martine Piccart; Christos Sotiriou
Journal:  Clin Cancer Res       Date:  2008-08-15       Impact factor: 12.531

7.  Supervised risk predictor of breast cancer based on intrinsic subtypes.

Authors:  Joel S Parker; Michael Mullins; Maggie C U Cheang; Samuel Leung; David Voduc; Tammi Vickery; Sherri Davies; Christiane Fauron; Xiaping He; Zhiyuan Hu; John F Quackenbush; Inge J Stijleman; Juan Palazzo; J S Marron; Andrew B Nobel; Elaine Mardis; Torsten O Nielsen; Matthew J Ellis; Charles M Perou; Philip S Bernard
Journal:  J Clin Oncol       Date:  2009-02-09       Impact factor: 44.544

8.  Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis.

Authors:  Christos Sotiriou; Pratyaksha Wirapati; Sherene Loi; Adrian Harris; Steve Fox; Johanna Smeds; Hans Nordgren; Pierre Farmer; Viviane Praz; Benjamin Haibe-Kains; Christine Desmedt; Denis Larsimont; Fatima Cardoso; Hans Peterse; Dimitry Nuyten; Marc Buyse; Marc J Van de Vijver; Jonas Bergh; Martine Piccart; Mauro Delorenzi
Journal:  J Natl Cancer Inst       Date:  2006-02-15       Impact factor: 13.506

9.  Predicting prognosis using molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen.

Authors:  Sherene Loi; Benjamin Haibe-Kains; Christine Desmedt; Pratyaksha Wirapati; Françoise Lallemand; Andrew M Tutt; Cheryl Gillet; Paul Ellis; Kenneth Ryder; James F Reid; Maria G Daidone; Marco A Pierotti; Els Mjj Berns; Maurice Phm Jansen; John A Foekens; Mauro Delorenzi; Gianluca Bontempi; Martine J Piccart; Christos Sotiriou
Journal:  BMC Genomics       Date:  2008-05-22       Impact factor: 3.969

10.  Deciphering Genomic Underpinnings of Quantitative MRI-based Radiomic Phenotypes of Invasive Breast Carcinoma.

Authors:  Yitan Zhu; Hui Li; Wentian Guo; Karen Drukker; Li Lan; Maryellen L Giger; Yuan Ji
Journal:  Sci Rep       Date:  2015-12-07       Impact factor: 4.379
