Literature DB >> 33959797

Radiomics and gene expression profile to characterise the disease and predict outcome in patients with lung cancer.

Margarita Kirienko1,2, Martina Sollini3,4, Marinella Corbetta1, Emanuele Voulaz1,5, Noemi Gozzi5, Matteo Interlenghi6,7, Francesca Gallivanone6, Isabella Castiglioni6,8, Rosanna Asselta1,5, Stefano Duga1,6, Giulia Soldà9,10, Arturo Chiti1,5.   

Abstract

OBJECTIVE: The objectives of our study were to assess the association of radiomic and genomic data with histology and patient outcome in non-small cell lung cancer (NSCLC).
METHODS: In this retrospective single-centre observational study, we selected 151 surgically treated patients with adenocarcinoma or squamous cell carcinoma who performed baseline [18F] FDG PET/CT. A subgroup of patients with cancer tissue samples at the Institutional Biobank (n = 74/151) was included in the genomic analysis. Features were extracted from both PET and CT images using an in-house tool. The genomic analysis included detection of genetic variants, fusion transcripts, and gene expression. Generalised linear model (GLM) and machine learning (ML) algorithms were used to predict histology and tumour recurrence.
RESULTS: Standardised uptake value (SUV) and kurtosis (among the PET and CT radiomic features, respectively), and the expression of TP63, EPHA10, FBN2, and IL1RAP were associated with the histotype. No correlation was found between radiomic features/genomic data and relapse using GLM. The ML approach identified several radiomic/genomic rules to predict the histotype successfully. The ML approach showed a modest ability of PET radiomic features to predict relapse, while it identified a robust gene expression signature able to predict patient relapse correctly. The best-performing ML radiogenomic rule predicting the outcome resulted in an area under the curve (AUC) of 0.87.
CONCLUSIONS: Radiogenomic data may provide clinically relevant information in NSCLC patients regarding the histotype, aggressiveness, and progression. Gene expression analysis showed potential new biomarkers and targets valuable for patient management and treatment. The application of ML allows to increase the efficacy of radiogenomic analysis and provides novel insights into cancer biology.
© 2021. The Author(s).

Entities:  

Keywords:  Artificial intelligence; Gene expression; Image analysis; Lung cancer; Mutation; PET/CT; Radiogenomics

Mesh:

Substances:

Year:  2021        PMID: 33959797      PMCID: PMC8440255          DOI: 10.1007/s00259-021-05371-7

Source DB:  PubMed          Journal:  Eur J Nucl Med Mol Imaging        ISSN: 1619-7070            Impact factor:   9.236


Introduction

Lung cancer is a leading global cause of cancer-related deaths, with more than 2.2 million people diagnosed and 1.9 million deaths documented worldwide in 2017 [1]. The 5-year survival rate is less than 25% when diagnosis occurs at a locally advanced or metastatic disease stage. The survival rate rises above 50% if the disease is diagnosed early when local treatment is feasible [2]. In unresectable disease, systemic therapy with cytotoxic and targeted drugs is the main treatment option [3, 4]. Targeted therapy requires tumour tissue molecular profiling to identify specific biomarkers to tailor treatments [5]. Cancer cells are characterised by high genomic instability, responsible for accumulating somatic mutations in crucial oncogenic/oncosuppressor genes, driving uncontrolled cancer cell proliferation [6]. Genetic information may be used to predict survival, as a prognostic biomarker or response to treatment, as a predictive biomarker to support clinical decisions [7]. In particular, in lung cancer, genetic mutations in ALK, BRAF, EGFR, and ROS1 guide treatment decisions in patients affected by advanced disease and recurrence [3, 4]. Beyond these alterations, other oncogenic driver mutations—even if currently not targetable—include RET, HER2, KRAS, and MET [8]. Molecular data are not routinely used in early-stage lung cancer. In cancer patients, features of tumours identified from imaging data (e.g. CT and PET) can be used as biomarkers to reveal diagnostic, predictive, and prognostic associations, based on the identification of correlations with pathological or molecular reference, response to treatment, or survival outcomes: this process is defined radiomics. Within this framework, image features extracted and used as predictors include lesion volume, shape, and texture descriptors [9]. Radiogenomics refers to the integration of imaging-derived parameters and genomic data to find clinically relevant associations. Imaging-based typing has the advantage that it can capture information from the whole tumour lesion, can be performed at multiple time points for treatment monitoring, and can be carried out when a biopsy is not feasible. It is cost-effective, relying on routinely acquired clinical imaging. Imaging-genomic maps have been shown promising in predicting molecular alterations [10]. Additionally, the integration of data generated from complementary “omics” sources may improve single-domain predictive models [11]. Multi-dimensional omics datasets (i.e. large number of features per each subject)—especially when the biological/clinical significance of features is unknown [12]—have driven research towards the application of machine learning (ML) algorithms for analysis [13]. On the one hand, ML is hampered by several issues including the “black-box” problem, trustworthiness, ethics, and responsibility [14-16]. However, it bears the advantage of learning directly from data and improving the prediction process [17]. Therefore, it is no wonder the close connection between radiomics and ML. Current data supporting the efficacy of radiomics in lung cancer predicting diagnosis, prognosis, and optimal therapy are ample and promising and support a future role for computer-assisted diagnosis and management in clinical oncology [12]. Nonetheless, radiogenomics in lung cancer patients is still in its early stages, and extensive data studies are needed to validate the concept [5]. Many radiomic and radiogenomic studies are burdened by limitations including models’ explainability and results’ interpretation. However, in some domains such as healthcare explaining and interpreting which features or how the artificial intelligence system is returning, the predictions may be far more critical than model’s performance. Indeed, explainable and interpretable models may unveil information about biological pathways, chemical mechanisms, or neural substrates, potentially leading to new scientific insights [16, 18, 19]. The objectives of our study were (1) to assess the association of [18F]FDG PET/CT radiomic features with histology and patient outcome; (2) to identify gene expression alterations and mutations in early-stage NSCLC and test their association with histology and patient outcome, and (3) to assess the association of radiogenomics with histology and patient outcome through explainable methods (both traditional statistics and ML).

Methods

Study design

In this retrospective single-centre observational study, we applied the following criteria to select patients from the Institutional database. Inclusion criteria were (i) age > 18 years, (ii) pathologic diagnosis of non-small cell lung cancer (NSCLC), (iii) enrolment from November 2011 to April 2018, (iv) availability of baseline [18F]FDG PET/CT, (v) surgical treatment, and (vi) availability of cancer tissue sample at the Institutional Biobank for those patients to be included in a subpopulation of the study cohort. Exclusion criteria were (i) diagnosis of other malignancies, but non-melanoma skin cancer, in the previous 3 years; (ii) interval time between PET/CT and surgery >3 months; (iii) neoadjuvant treatment; (iv) NSCLC other than squamous cell carcinoma (SQC) and adenocarcinoma (AC) to avoid inhomogeneity within the patient cohort. We did not exclude patients with a history of non-melanoma skin cancers since they generally do not affect patients’ prognosis. The selection workflow is reported in Fig. 1. Demographic parameters such as age and sex were collected for all patients. Smoking habits were recorded. Performance status was not considered in this analysis.
Fig. 1

Patients’ selection workflow

Patients’ selection workflow The institutional ethics committee approved the study (study number 1751). All the patients who donated their tissue samples to the biobank signed informed consent to use their data, imaging, and samples for research purposes; for the remaining patients, a specific, informed consent was waived because of the observational and retrospective study design.

Image acquisition and processing

[18F]FDG PET/CT image acquisition was performed according to versions 1.0 [20] and 2.0 [21] of the European Association of Nuclear Medicine (EANM) guidelines until and from February 2015, respectively. Briefly, patient preparation with fasting at least 4 h before [18F] FDG injection and blood glucose levels below 200 mg/dl were requested. Images were acquired 60 ± 5 min after injection of [18F] FDG, using either a Siemens Biograph 6 LSO (Siemens, Erlangen, Germany) or a General Electric Discovery 690 (General Electric Healthcare, Waukesha, WI, USA) PET/CT scanner. All PET images were corrected for attenuation using the acquired CT data. Image acquisition parameters are reported in Supplementary Table 1. The primary lung cancer lesions were delineated on PET images applying a fully automatic segmentation method, combining an automatic threshold-based algorithm to define the tumour volume and a k-means clustering algorithm to estimate the background [22]. CT target lesion was delineated using 3D Slicer application FastGrowCut, implementing a competitive region growing algorithm using cellular automata [23]. The images were resampled to have isometric voxels with a 2-mm length. Calculation of PET parameters was performed using an in-house image processing tool, running on MATLAB based on the radiomic model of Vallières et al. [24] and providing 60 features [25, 26]. The partial volume effect correction was used for standardised uptake value (SUV) calculation. CT radiomic features (n = 57) extraction was performed using the HeterogeneityCAD tool implemented in the 3D Slicer, according to Aerts et al. [27]. The different number of radiomic features for PET and CT resulted from the use of the two different processing tools. Image processing and calculation of image-derived parameters are reported in Supplementary Table 1, according to recommendations of the imaging biomarker standardisation initiative (IBSI) reporting guidelines [28].

Pathology

Histological type and staging classification were assessed according to good clinical practice on pathology samples obtained at surgery (AJCC manual). Fresh-frozen samples were collected and stored according to the Biological Biobank’s Institutional procedures for those patients who donated their tissue.

Molecular analyses (mutations and gene expression)

Molecular analysis of 74 tumour samples (21 SQC, 53 AC) was performed using a targeted RNAseq approach. Besides, six normal tissue samples were evaluated as reference. Molecular analyses by targeted RNAseq included (1) detection of genetic variants (both single nucleotide variants, SNVs, and small insertions/deletions, indels); (2) detection of fusion transcripts; and (3) gene expression analysis. RNA extraction from fresh-frozen lung tissues, preserved in RNALater-ICE (Thermo Fisher Scientific, Waltham, MA, USA), was performed using either an automated procedure with the Maxwell RSC miRNA Tissue kit (Promega, Madison, WI, USA) or a standard protocol using the Eurogold TriFast reagent (Euroclone, Wetherby, UK). RNA quality was assessed on an Agilent 4200 TapeStation (Agilent Technologies; Santa Clara, CA, USA), obtaining a mean RNA integrity number (RIN) of 6.7 (max: 9, min: 4). Libraries were prepared starting from 55 ng of total RNA with the TruSight RNA Pan-Cancer Panel (Illumina, San Diego, CA, USA), following the manufacturer’s instructions. This panel allows the simultaneous detection of fusion transcripts, point mutations, and gene expression changes, and it is characterised by a broad dynamic range, which can robustly detect RNAs of low abundance; it covers a total of 1385 genes, including all major ones found mutated in lung cancer. Sequencing (76-bp paired-end reads) was performed on a NextSeq500 platform (Illumina). Data were analysed using the RNAseq alignment v. 2.0.10 pipeline on BaseSpace (Illumina). Briefly, input reads were filtered against abundant sequences, such as mitochondrial or ribosomal sequences, using Bowtie 0.12.9, and then aligned to the reference human genome (UCSC hg19) and the RefSeq annotation of transcripts with the Spliced Transcripts Alignment to a Reference (STAR) program (v. 2.6.1a). SNVs were identified with the Strelka Variant Caller v.2.9.9, and the presence of fusion transcripts was detected with the Manta Structural Variant Caller v.1.4.0. Gene/transcript expression was quantified by Salmon v.0.11.2. Differential expression analysis among histotypes (SQC, AC, and normal tissue) was evaluated with a likelihood ratio test (LRT) for significance using the DeSeq2 Bioconductor package [29]. We looked for known and possibly recurrent oncogenic variants to extract meaningful information to correlate with imaging and clinical data. Therefore, we selected variants with the following features: SNV, nonsynonymous (missense, nonsense, splice variants mapping at ± 2 position of the splice sites), and annotated in the Clinvar database (https://www.ncbi.nlm.nih.gov/clinvar/) as pathogenic or likely pathogenic. Concerning indels, we selected rare frameshift variants occurring only in cancer samples, covered by at least 10 reads, and with a Combined Annotation Dependent Depletion, CADD, score > 15 [30].

Follow-up and outcome assessment

After surgery, further treatment and follow-up were performed according to standard procedures and guidelines after discussion at the multidisciplinary lung tumour board. As for outcome prediction, the endpoints of this study were disease recurrence, disease-free survival (DFS), and overall survival (OS). Disease recurrence was defined as relapse occurrence during follow-up. DFS was defined as the time between the date of surgery and either the date of recurrence or tumour-related death (event) or the date of last patient access (censored). OS was defined as the time between surgery and death (event) or last patient access date (censored).

Statistical analyses

All statistical procedures were carried out using specific R program packages, release 3.6.1 (http://www.r-project.org/). All P values were two-sided. P values of <0.05 were considered statistically significant.

Conventional statistics

Patient characteristics were summarised in frequency tables, and descriptive statistics were provided in terms of basic measures of central tendency (mean, median, and range) and count proportion for continuous and dichotomic variables, respectively. Furthermore, correlation between age and sex either with histotype and tumour recurrence was tested. Lastly, the correlation between the two outcomes was analysed. For KRAS/TP53/EGFR mutations, carrier frequency data were performed using the Fisher exact test. Principal component analysis (PCA) was used to explore data; the first two components were used to visualise a possible clustering. The Kaplan-Meier method was used to generate survival curves for the subgroups in each dataset, and the log-rank test was used to determine the statistical significance of differences (survminer R package). For data analysis, we first performed an unsupervised clustering, using the pheatmap R package on log-transformed data, evaluating whether patients are distinguishable in the feature space (either by histotype or the tumour recurrence). After that, we used variables to fit a binomial generalised linear model (GLM) with logistic cumulative distribution function after checking for the correct normal distribution of residuals and the homogeneity of variance across the fitted values of the model. We used either the histotype (SQC or AC) or the tumour recurrence (yes or no) as the output variable. More specifically, for gene expression analysis, differentially expressed genes were selected using a false discovery rate (FDR) ≤0.001, a fold change ≥2, and a minimum average expression (baseMean) of at least 50 counts. Differential expression analyses identified genes specifically altered in cancer status compared to normal tissues and genes specifically altered in each histotype (SQC and AC). Differential expression analysis to identify transcriptional signatures associated with tumour recurrence (yes or not) was evaluated with a Wald test for significance and the DeSeq2 package. In this case, differentially expressed genes were selected using an FDR ≤0.05 and no threshold for fold change. For radiogenomic analysis, we focused on the top differentially expressed genes, setting a stringent threshold for significance of FDR ≤0.001, fold change ≥2, and average expression (baseMean) among samples of at least 50 counts.

Machine learning analysis

In addition to classical statistics, we applied the Forecast environment of the Rulex (RULe eXtractor) 4.0 suite (www.rulex-inc.com), an integrated suite for the analysis through statistical/ML approaches. Rulex can manage all data types, including categorical/continuous variables, variables showing a high degree of correlation, or characterised by any data distribution. Three datasets were used for analyses: (1) a dataset including only complete radiomic data (149 patients; two additional patients were not included because they were missing either PET or CT data); (2) a dataset using only molecular data (gene expression and KRAS/TP53/EGFR mutational status), comprising 74 patients; and (3) a dataset including only those 73 patients for which both radiomic and molecular data were available. We applied the logic learning machine (LLM) algorithm to our datasets using as an output variable either the histotype (SQC or AC) or the patient outcome (tumour recurrence yes or no). Each dataset was split into a training cohort and a test set (70% and 30%, respectively). Rulex LLM takes as input the features and returns a series of “rules” characterised by n conditions. As typical for decision rules algorithms [31], rule’s performance was expressed as coverage rate and error. The covering is the percentage of training patterns whose output value is equal to the rule’s output that satisfy the rule (true positive), while the error is the percentage of the training patterns whose output value is different from the output of the rule that satisfies the rule (false positive). For each rule, accuracy and F1 were calculated. We used the radiogenomic rule (integrating gene expression and radiomic data), which best-predicted tumour recurrence to build a score, and we calculated the corresponding receiver operating characteristic (ROC) curve and the area under the curve (AUC). For each condition, the score corresponded to the sum of 1 or 0 points.

Results

Overall, 151 patients were included. Patient characteristics are reported in Table 1: 70% of patients developed an AC, 30% an SQC; the ratio between those that relapsed versus those that did not was exactly 1:1.
Table 1

Patient characteristics

CharacteristicsWhole dataset (N = 151)*Genetics (n = 74)**
Age—median (range)70 (41–84) years70 (41–80) years
Sex (M:F)95:5647:27
Histology (AC:SQC)106:4553:21
Smoking status (Yes:No:Ex-smokers)***42:31:7723:14:36
Outcome

Lost at follow-up

Relapse Yes:No

Follow-up/OS—median (range)

DFS

7/151

72:72

39 (1–102) months

44 (1–102) months

6/74

31:37

24 (3–79) months

40 (4–81) months

*The whole dataset consisted of 151 patients, for 2 of them either PET or CT data were missing and hence not included in the ML analysis. **This category indicates the subset of patients submitted to mutational and differential gene expression analyses (for 1 of them, we did not have radiomics data, and hence not included in the ML analysis of combined radiomics and transcriptomics data). ***For one person, we do not have data on smoking status. AC, adenocarcinoma; DFS, disease-free survival; F, females; M, males; OS, overall survival; SQC, squamous cell carcinoma

Patient characteristics Lost at follow-up Relapse Yes:No Follow-up/OS—median (range) DFS 7/151 72:72 39 (1–102) months 44 (1–102) months 6/74 31:37 24 (3–79) months 40 (4–81) months *The whole dataset consisted of 151 patients, for 2 of them either PET or CT data were missing and hence not included in the ML analysis. **This category indicates the subset of patients submitted to mutational and differential gene expression analyses (for 1 of them, we did not have radiomics data, and hence not included in the ML analysis of combined radiomics and transcriptomics data). ***For one person, we do not have data on smoking status. AC, adenocarcinoma; DFS, disease-free survival; F, females; M, males; OS, overall survival; SQC, squamous cell carcinoma No correlations were observed between patients’ age and the relapse status’s histotype (P = 0.078 and P = 0.538, respectively). As for sex, a weak correlation was observed with histotype (being the male sex more common among SQC, with a male to female ratio = 3.5, P = 0.016). Instead, no correlation was evident between sex and relapse. Finally, we observed a weak correlation between the histotype and the relapse status, with AC cases more prone to relapse than SQC patients (P = 0.032; the significance was also retained after correction for age, sex, and smoking status, P = 0.026) (Supplementary Figure 1). The two histotype patient groups did not show any significant difference in the overall survival rate nor their tendency to relapse (Supplementary Figure 2). Mutation screening by targeted RNA sequencing identified more than 130,500 variants in the 80 analysed samples (74 tumours + 6 normal), with a mean of 1632 variants per sample (1577 SNVs, 27 deletions, and 28 insertions). Each variant was covered on average by 162 reads (min: 3, max: 12,592). We obtained a list of the 142 topmost pathogenic variants, although very few of them were present in more than one sample. We found a missense KRAS mutation at codon 12 in 28% (21/74) of tumour samples and a TP53 mutation in 55% (41/74) of cases (Supplementary Figure 3). No known pathogenic variants or hotspot mutations were detected in EGFR, although eight samples carried rare nonsynonymous SNVs of uncertain significance (Supplementary Figure 4). Based on the variant type, location within the gene/protein, and predicted deleteriousness (evaluated by CADD, score > 15), at least 6 (75%) of them might represent a likely pathogenic mutation. No recurrent gene fusions in the ALK, ROS, and RET1 were detected. Gene expression analysis on RNAseq data on 80 samples (21 SQC, 53 AC, and 6 non-tumoural tissues) showed a prevalent clusterisation based on histotype at PCA, with normal tissues separating from tumour samples (Supplementary Figure 5). According to relapse status, no clusterisation was observed at least when considering the first two principal components (data not shown). Association of PET/CT radiomic data with histology and outcome A total of 60 PET and 57 CT radiomic features were extracted in 149 patients (see Table 1). Unsupervised hierarchical clustering, based on log-transformed data of extracted radiomic features, showed a good clusterisation based on histotype (Supplementary Figure 6). Conversely, the same analysis did not perform well when discriminating patients based on relapse (data not shown). PET and CT features were further independently analysed using GLM. GLM showed that among the 117 analysed features, two outperformed in discriminating AC vs SQC patients. SUV and kurtosis resulted in the best PET, and CT features, respectively, in predicting histology (Fig. 2a). SUV resulted higher in SQC than in AC (16.91 ± 7.92 and 10.13 ± 5.77, respectively; P = 2.89*10−6). Similarly, CT-derived kurtosis was greater in SQC than in AC (11.97 ± 11.17 and 3.81 ± 5.85, respectively; P = 5.49*10−6). Notably, both variables survived Bonferroni corrections for multiple testing (threshold corresponding to P = 0.00083 and P = 0.00088 for the SUV and the kurtosis, respectively). Conversely, PET and CT features poorly correlated with the relapse status. Indeed, the best performing PET and CT features in discriminating tumour recurrence were kurtosis (P = 0.035) and LRE (P = 0.0096), respectively, but neither kurtosis nor LRE survived correction for multiple testing (Fig. 2b).
Fig. 2

Top PET and CT features discriminating patients based on their lung cancer histotype or their tendency to relapse. The best discriminative features were identified using the generalised linear model approach to predict histology (a) and outcome (b). a Boxplots show standardised uptake value and kurtosis, the best performing PET and CT features, respectively, in discriminating cancer histotype. b Boxplots show kurtosis and LRE, the best performing PET and CT features, respectively, in discriminating tumour recurrence. Boxes define the interquartile range; thick central lines refer to the median. P values before Bonferroni correction are provided for each feature

Top PET and CT features discriminating patients based on their lung cancer histotype or their tendency to relapse. The best discriminative features were identified using the generalised linear model approach to predict histology (a) and outcome (b). a Boxplots show standardised uptake value and kurtosis, the best performing PET and CT features, respectively, in discriminating cancer histotype. b Boxplots show kurtosis and LRE, the best performing PET and CT features, respectively, in discriminating tumour recurrence. Boxes define the interquartile range; thick central lines refer to the median. P values before Bonferroni correction are provided for each feature The Rulex LLM software, comprehensively analysing all the 117 radiomic features, identified four and two rules to predict histology and tumour recurrence, respectively (Supplementary Table 2). Rule number 4—composed of 8 different conditions, three based on PET and five based on CT features—was the most interesting (Table 2), reaching coverage of 85.7% with an error rate of 3.6% (accuracy = 93%). Rule number 1 composed of 6 PET-based conditions reached an accuracy of 81% in predicting tumour recurrence (Table 2).
Table 2

Histotype and relapse predictions based on the best performing Rulex analysis results

RuleOutputCovering (%)*Error (%)**Accuracy (%)F1 (%)Cond 1Cond 2Cond 3Cond 4Cond 5Cond 6Cond 7Cond 8
Prediction based on radiomics
  A—Histotype
    4Histotype = SQC85.73.693.388.6Min PET > 1.312135Uniformity PET > 0.005588GLN_ GLRLM PET > 0.019649

Compactness_2

CT

≤ 0.179247

Min_Intensity

CT

≤ −309

Max_Intensity

CT

> 69

Cluster_Shade

CT > 124,107,462,960

0.887431 < SRE CT ≤ 0.936005
  B—Relapse
    1Relapse = NO66. 73.781.378.0

HISTO_Energy

PET

≤ 17,375

GLCM_Energy

PET

> 0.000853

SRLGE

PET ≤ 0.039183

SZLGE PET > 0.0026770.009156 < LZLGE PET ≤ 0.092522Complexity PET ≤ 11,892
Prediction based on mutation and gene expression data
  A—Histotype
    1Histotype = AC94.3095.997.1HIF1A ≤ 12.662616TP63 ≤ 7.748115
  B—Relapse
    4Relapse = YES91.7095.694.9AURKA > 6.211039HIST1H2AM ≤ 9.677214IL12RB2 ≤ 7.370931CXXC4 > 1.892740RYR3 ≤ 5.160035
Prediction based on radiogenomics
  A—Histotype
    1Histotype = AC92.34.893.295.1

HIF1A

≤ 12.717780

TP63

≤ 7.853678

  B—Relapse
    4Relapse = YES73.3088.285.2CXXC4 > 1.871573

GHR

≤ 4.706566

2.614300 < PAK3 ≤ 9.111535

LRHGE_PET

> 853

AC adenocarcinoma, Cond condition, SQC squamous cell carcinoma

Association of genomic data with histology and outcome Histotype and relapse predictions based on the best performing Rulex analysis results Compactness_2 CT ≤ 0.179247 Min_Intensity CT ≤ −309 Max_Intensity CT > 69 Cluster_Shade CT > 124,107,462,960 HISTO_Energy PET ≤ 17,375 GLCM_Energy PET > 0.000853 SRLGE PET ≤ 0.039183 HIF1A ≤ 12.717780 TP63 ≤ 7.853678 GHR ≤ 4.706566 LRHGE_PET > 853 AC adenocarcinoma, Cond condition, SQC squamous cell carcinoma Mutation and gene expression data on the 74 cases were used to search for possible correlations with histotype and the relapse status. Genes specifically altered by relapse were investigated on the 68 samples with available data (31 relapses, 37 no relapse). Only four genes resulted differentially expressed with an FDR ≤0.05. Mutation analysis focused on the two genes most commonly mutated in our cohort (KRAS, TP53) and the EGFR gene, i.e. the sole—among genes guiding treatment decision in lung cancer patients [3, 4]—being mutated in the analysed cases. Concerning histotype, we evidenced a profound difference in the frequency of KRAS mutation carriers between AC and SQC (37.7% and 4.8%, respectively; P = 0.0040) (Table 3A). We also observed a significant association between KRAS/EGFR mutations and relapse status, not retained after correction for histotype (values between parentheses in Table 3B). Gene expression analysis focused on the 238 genes that resulted differentially expressed (at FDR ≤0.001 and a fold change ≥2) according to different tissue samples (normal tissue, AC, SQC; 187 genes), tumour histotypes (47 genes), and recurrent status (4 genes). Among these 238 genes, TP63, FBN2, EPHA10, and IL1RAP emerged as strongly associated with the histotype (P < 1.5*10−4; all genes survive the Bonferroni corrections for multiple tests) (Fig. 3).
Table 3

Association between KRAS/TP53/EGFR mutational status and histotype/relapse

Non-carriers (N)Carriers (N)P value *
A—Histotype
  KRAS
    AC33200.004
    SQC201
  TP53
    AC24290.80
    SQC912 (1 case with 2 mutations)
  EGFR
    AC4850.43
    SQC183
B—Relapse **
  KRAS
    YES1813

0.029

(0.086)

    NO316
  TP53
    YES1516

0.80

(1)

    NO1621 (1 case with 2 mutations)
  EGFR
    YES310

0.028

(0.17)

    NO316

*Fisher exact test. **Analysis performed on a total of 68 cases; in this analysis, the P values presented in parenthesis are corrected for the histotype. Significant P values are indicated in bold

Fig. 3

Top differentially expressed genes discriminating patients based on their lung cancer histotype. The four boxplots show mRNA expression levels of TP63, FBN2, EPHA10, and IL1RAP genes, with lung cancer individuals grouped upon histotype. Boxes define the interquartile range; thick central lines refer to the median. The P value for the difference is indicated (t-test; the threshold for Bonferroni correction for multiple testing corresponding to P = 0.00021)

Association between KRAS/TP53/EGFR mutational status and histotype/relapse 0.029 (0.086) 0.80 (1) 0.028 (0.17) *Fisher exact test. **Analysis performed on a total of 68 cases; in this analysis, the P values presented in parenthesis are corrected for the histotype. Significant P values are indicated in bold Top differentially expressed genes discriminating patients based on their lung cancer histotype. The four boxplots show mRNA expression levels of TP63, FBN2, EPHA10, and IL1RAP genes, with lung cancer individuals grouped upon histotype. Boxes define the interquartile range; thick central lines refer to the median. The P value for the difference is indicated (t-test; the threshold for Bonferroni correction for multiple testing corresponding to P = 0.00021) None among the analysed genes was associated below the Bonferroni threshold (P = 0.00021) with the relapse status. The comprehensive analysis performed by the Rulex LLM approach focused on all the 238 genes, and the data on KRAS/TP53/EGFR mutational status are summarised in Supplementary Table 3. Rulex LLM proved to be very powerful in predicting the histotype (rule number 1 reached a coverage =94.3% with 95.9% accuracy) and, above all, in predicting the outcome (rule number 4 reached a coverage = 91.7% with 95.6% accuracy). In all cases, no conditions related to the mutational status emerged. The best-performing rule predicting the histotype was based only on two conditions (one related to expression levels of TP63), whereas the best rule predicting relapse stem from the expression levels of 5 different genes (AURKA, HIST1H2AM, IL12Rb2, CXXC4, and RYR3) as detailed in Table 2. Association of radiogenomic with histology and outcome Finally, we used the Rulex LLM approach for analysing the 73 cases having the entire set of variables available (i.e. all PET and CT features) and data on KRAS/TP53/EGFR mutational status and on 238 differentially expressed genes. Results of radiogenomics analysis are detailed in Supplementary Table 4. Interestingly, using histotype as an output variable results almost entirely overlapping with those already obtained for the predictions based on genomic data (Supplementary Table 2A). The slightly different covering values depended on the missing sample. The best-performing rule in predicting the outcome (covering rate = 73.3% with 88.2% accuracy) combined conditions based on gene expression data and a PET-derived feature (i.e. LRHGE) as detailed in Table 2. This rule’s ROC curve resulted in an area under the curve of 0.87 (Fig. 4).
Fig. 4

Clusterisation of relapsing/non-relapsing patients based on the best-performing prediction rule evidenced by the Rulex LLM analysis. On the left: three-dimensional scatter plot of patients experiencing (blue dots) or not (red dots) relapse. Patients were plotted based on the three genes’ expression levels evidenced by the Rulex LLM analysis (determining the first three conditions of rule number 4; see Table 2). On the right: ROC curve for differentiating relapsing and non-relapsing patients based on a “score” including the expression levels of the CXXC4, PAK3, and GHR genes, as well as on the radiomic parameter LRHGE_PET. For each patient, the score was built summing, for each of the four conditions of the rule (Table 2), 1 or 0 points. At the bottom rich corner of the ROC panel, the AUC value is reported

Clusterisation of relapsing/non-relapsing patients based on the best-performing prediction rule evidenced by the Rulex LLM analysis. On the left: three-dimensional scatter plot of patients experiencing (blue dots) or not (red dots) relapse. Patients were plotted based on the three genes’ expression levels evidenced by the Rulex LLM analysis (determining the first three conditions of rule number 4; see Table 2). On the right: ROC curve for differentiating relapsing and non-relapsing patients based on a “score” including the expression levels of the CXXC4, PAK3, and GHR genes, as well as on the radiomic parameter LRHGE_PET. For each patient, the score was built summing, for each of the four conditions of the rule (Table 2), 1 or 0 points. At the bottom rich corner of the ROC panel, the AUC value is reported

Discussion

Our study explored the ability of radiomics, genomics, and radiogenomics to provide clinically relevant information in lung cancer patients using explainable models. Indeed, both the binomial generalised linear model and the Rulex/LLM are recognised as interpretable approaches [16]. Rulex/LLM may be used with different complementary purposes including automatic data discovery, model building, and self-explanatory predictions. Specifically, the LLM produces conditional logic-base predictive models with the advantage to be fully explainable (https://www.rulex.ai/rulex-algorithms/). We did not find any radiomic feature, mutational status, or gene expression profile to predict relapse per se using conventional statistics. However, radiomic features and gene expression profile resulted in predicting tumour recurrence when introduced in decision rules identified using ML. Specifically, the use of Rulex/LLM allowed (i) to confirm the central role of SUVmax in predicting histology which was empowered by the addition of radiomic parameters; (ii) to identify a combination of relevant imaging-derived features in predicting outcome confirming the higher discriminative power of PET compared to CT-based biomarkers; and (iii) to detect a gene expression pattern associated to lung cancer outcome. Interpretable models, revealing new insights on a disease, are an essential requirement for translating radiomics and artificial intelligence to clinical practice. Additionally, the implementation of ML into clinical practice requires the integration of the radiomics and ML systems into applications that imagers and clinicians use in their routine practice. Radiomics workflow is time-consuming and requires additional software. Once robust evidence emerges, radiomics/ML models should be promptly implemented into clinical workstations to support everyday practice. Gene expression data—analysed by Rulex LLM—proved to be very powerful both in predicting the histotype (rule number 1 reached a coverage =94.3% with 95.9 accuracy) and, above all, in predicting the outcome (rule number 4 reached a coverage =91.7% with 95.6 accuracy). The best-performing genomic rule in predicting relapse stem from the expression levels of AURKA, HIST1H2AM, IL12Rb2, CXXC4, and RYR3. The expression of AURKA—a gene that contributes to the regulation of cell cycle progression [32]—has been reported to be associated with poor prognosis in smoking-related lung AC [33]. Histone variants HIST1H2, HIST1H3, and HIST1H4, acting as transcriptional promoters or repressors of cancer-related genes, have been reported to be involved in tumour progression and metastasis [34]. In particular, in lung AC, HIST1H2AM has been counted among genes differentially expressed between smoking and non-smoking [35]. Our data suggested that the smoking habits induced changes in AURKA and HIST1H2AM genes, and their upregulation or downregulation might play a pivotal role in determining the outcome. Literature data supported the involvement of IL-12Rβ2 in tumour cell proliferation, apoptosis, and metastasis. The downregulation of IL-12Rβ2 in lung AC seems to be a tumour escape mechanism [36], and the IL-12Rβ2 expression has been negatively associated with tumour progression [37]. The Dishevelled (Dvl) inhibitor Idax, coded by the CXXC4 gene [38], seems to be involved in tumour cell invasiveness and proliferation [39, 40], and its expression is associated with poor prognosis [41]. Furthermore, CXXC4, being capable of inhibiting the mitogen-activated protein kinases (MAPK) signalling pathway [42], is emerging as a novel tumour suppressor [43]. Moreover, the downregulation of the MAPK signalling pathway reduces the expression of programmed cell death 1 ligand 1 (PD-L1) in lung AC cells [44]. The RYR3 gene encodes for a ryanodine receptor which mediates the calcium release for many cellular processes. Ryanodine receptor has been reported to be involved in epithelial-mesenchymal transition, in tumour cell apoptosis and treatment resistance in some cancers, including lung AC [45-47]. These data may provide the rationale for postoperative risk stratification with a differential follow-up scheme. The patients operated on a more aggressive tumour may be closely investigated during follow-up to identify recurrence at an earlier time point. However, molecular testing to identify molecular biomarkers are currently performed on tumour samples collected from biopsies or cytological specimens; these are invasive procedures that are not always feasible, may result in inadequate sampling, and cannot characterise intra- and inter-tumour heterogeneity. Moreover, in the case of recurrence, repetition of a biopsy is not mandatory. Indeed, targeted therapies may be administered based on the molecular testing on the specimens obtained at diagnosis, assuming that no molecular modification arises between disease onset and recurrence [48]. Consequently, other methods to identify actionable biomarkers in NSCLC are emerging to address the need for complementing or replacing traditional testing on tissue and cytological samples. We found that image-derived features were able to discriminate between NSCLC histotype (Fig. 2). Similarly, several studies have successfully demonstrated an association between radiomic features and NSCLC tumour histology based on both CT and PET radiomic features. In the study by Wu et al. [49], 53 CT radiomic features from lung tumours of 350 patients were significantly associated with tumour histological subtype. Applying multivariate classifiers using radiomic features as input tumour histological subtype could be reliably predicted (AUC = 0.72) [49]. While Aerts et al. [27] reported a radiomic analysis of 440 features extracted from CT data of 1019 patients affected by NSCLC and head-and-neck cancer, they found a significant association with histology (P = 0.019, chi-square test). While in the study by Koyasu et al., the authors found PET-based models to identify histological lung cancer subtype with an AUC up to 0.84 [50]. In a previous cohort of 534 patients with lung nodules, we have demonstrated radiomic features’ ability to potentially classify primary lung cancer subtypes (AUC = 0.59–0.70 for CT and = 0.61–0.88 for PET) [51]. SUV appeared to be the best predictor in the present work, following published data [52], regardless of the statistical approach (i.e. conventional statistics and ML). Indeed, SUVmax was recognised by GLM as significant, and minimum SUV was included in the best radiomic rule to predict histology. Randomness due to high correlation and the inherent redundancy among SUVs parameters was probably at the basis of the model’s selection of SUVmax instead of the minimum (selected by GLM and ML, respectively). On the other hand, from the Rulex LLM analysis on radiogenomics, it emerged that gene expression data alone prevail on those coming from [18F]FDG PET/CT analyses in predicting histology. Interestingly, the best-performing rule in predicting the outcome (covering rate > 73%) combined conditions based on gene expression data and PET-derived feature (i.e. LRHGE). Through this rule—less performing than the one found when analysing only gene expression data (Table 2)—it was noteworthy to underline that (i) the number of conditions for this rule was lower, (ii) we “forced” the software to give priority to radiomic features (which indeed only came up for relapse predictions), and (iii) the corresponding ROC curve gave an overall significant AUC of 0.87. We identified several genes’ levels of expression (e.g. TP63 and EPHA10) to be associated with AC vs SQC. In particular, TP63 overexpression, often due to gene amplification, is frequently found in SQC and has been associated with prolonged survival [53]. EPHA10 (the ephrin receptor A10 belonging to the subfamily of receptor tyrosine kinases and involved in cell-cell communication, regulating cell attachment, shape, and mobility in neuronal and epithelial cells), primarily explored in breast cancers [54], has been recently described in lung cancer [55]. Its mutation has been newly described as associated with a favourable outcome in lung adenocarcinoma—but not in squamous cell lung cancer—treated with immune checkpoint inhibitors [56]. This finding needs further investigation to characterise better this protein’s role in lung cancer as a prognostic biomarker or potential therapeutic target. For relapse prediction, PET and CT feature failed to predict tumour recurrence when analysed with classical statistics. We found a couple of rules through the Rulex LLM approach to correctly prognosticate the outcome in a good percentage of cases (61–67% of covering with 78–81% accuracy). These findings underlined that a radiomic feature might efficiently differentiate tumour subtypes, but it is not sufficient to explain the complexity (i.e. biological phenotyping). Conversely, combining more radiomic features, enclosing several—complementary—information, may summarise all those biological properties beyond the histotype that contributes to disease aggressiveness. Notably, the best-performing rule to predict tumour relapse comprised a total of 6 conditions all related to PET features. The inclusion of conditions related to CT-derived parameters determined a drop of performances, confirming literature data [57]. The study is limited by the retrospective design that determined using routinely acquired PET/CT images on two different scanners, but the number of images obtained using one scanner was well balanced compared to those acquired using the other one (52% versus 48%). Moreover, in our previous study, we demonstrated radiomics analysis’s reliability since the predictive models performed in the same way when considering and not considering significantly different features among scanners [51]. The outcome has been evaluated in terms of occurrence or not of disease. This evaluation may have determined a modest prognostic ability. Future analyses will take into account time information related to recurrence and overall survival. We could not perform further analyses on independent tissue samples to confirm and deeply investigate the role of gene expression findings of our work because it was out of the scope. Future studies are planned to investigate the role of these genes as molecular biomarkers and targets. In conclusion, the radiogenomic approach promises to extract relevant information regarding lung cancer histotype, aggressiveness, and progression. Gene expression may provide additional valuable information to guide patient management and follow-up. ML algorithms’ application allows to increase the efficacy of transcriptomic and radiogenomic analysis and provides novel insights into cancer biology. (DOCX 1197 kb).
  48 in total

1.  The close association between IL‑12Rβ2 and p38MAPK, and higher expression in the early stages of NSCLC, indicates a good prognosis for survival.

Authors:  Zhaoguo Liu; Weilin Yang; Shibin Yang; Kaican Cai
Journal:  Mol Med Rep       Date:  2018-06-22       Impact factor: 2.952

2.  Ability of FDG PET and CT radiomics features to differentiate between primary and metastatic lung lesions.

Authors:  Margarita Kirienko; Luca Cozzi; Alexia Rossi; Emanuele Voulaz; Lidija Antunovic; Antonella Fogliata; Arturo Chiti; Martina Sollini
Journal:  Eur J Nucl Med Mol Imaging       Date:  2018-04-06       Impact factor: 9.236

3.  Overexpressed gene signature of EPH receptor A/B family in cancer patients-comprehensive analyses from the public high-throughput database.

Authors:  Nam Nhut Phan; Shirui Liu; Chih-Yang Wang; Hui-Ping Hsu; Ming-Derg Lai; Chung-Yen Li; Chien-Fu Chen; Chung-Chieh Chiao; Meng-Chi Yen; Zhengda Sun; Jia-Zhen Jiang
Journal:  Int J Clin Exp Pathol       Date:  2020-05-01

4.  WIF1, a component of the Wnt pathway, is down-regulated in prostate, breast, lung, and bladder cancer.

Authors:  Christoph Wissmann; Peter Johannes Wild; Simone Kaiser; Stefan Roepcke; Robert Stoehr; Matthias Woenckhaus; Glen Kristiansen; Jen-Chih Hsieh; Ferdinand Hofstaedter; Arndt Hartmann; Ruth Knuechel; André Rosenthal; Christian Pilarsky
Journal:  J Pathol       Date:  2003-10       Impact factor: 7.996

Review 5.  FDG PET-CT for solitary pulmonary nodule and lung cancer: Literature review.

Authors:  D Groheux; G Quere; E Blanc; C Lemarignier; L Vercellino; C de Margerie-Mellon; P Merlet; S Querellou
Journal:  Diagn Interv Imaging       Date:  2016-08-24       Impact factor: 4.026

6.  Usefulness of gradient tree boosting for predicting histological subtype and EGFR mutation status of non-small cell lung cancer on 18F FDG-PET/CT.

Authors:  Sho Koyasu; Mizuho Nishio; Hiroyoshi Isoda; Yuji Nakamoto; Kaori Togashi
Journal:  Ann Nucl Med       Date:  2019-10-28       Impact factor: 2.668

Review 7.  PET Radiomics in NSCLC: state of the art and a proposal for harmonization of methodology.

Authors:  M Sollini; L Cozzi; L Antunovic; A Chiti; M Kirienko
Journal:  Sci Rep       Date:  2017-03-23       Impact factor: 4.379

8.  Phosphoproteomics Reveals HMGA1, a CK2 Substrate, as a Drug-Resistant Target in Non-Small Cell Lung Cancer.

Authors:  Yi-Ting Wang; Szu-Hua Pan; Chia-Feng Tsai; Ting-Chun Kuo; Yuan-Ling Hsu; Hsin-Yung Yen; Wai-Kok Choong; Hsin-Yi Wu; Yen-Chen Liao; Tse-Ming Hong; Ting-Yi Sung; Pan-Chyr Yang; Yu-Ju Chen
Journal:  Sci Rep       Date:  2017-03-14       Impact factor: 4.379

9.  The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping.

Authors:  Alex Zwanenburg; Martin Vallières; Mahmoud A Abdalah; Hugo J W L Aerts; Vincent Andrearczyk; Aditya Apte; Saeed Ashrafinia; Spyridon Bakas; Roelof J Beukinga; Ronald Boellaard; Marta Bogowicz; Luca Boldrini; Irène Buvat; Gary J R Cook; Christos Davatzikos; Adrien Depeursinge; Marie-Charlotte Desseroit; Nicola Dinapoli; Cuong Viet Dinh; Sebastian Echegaray; Issam El Naqa; Andriy Y Fedorov; Roberto Gatta; Robert J Gillies; Vicky Goh; Michael Götz; Matthias Guckenberger; Sung Min Ha; Mathieu Hatt; Fabian Isensee; Philippe Lambin; Stefan Leger; Ralph T H Leijenaar; Jacopo Lenkowicz; Fiona Lippert; Are Losnegård; Klaus H Maier-Hein; Olivier Morin; Henning Müller; Sandy Napel; Christophe Nioche; Fanny Orlhac; Sarthak Pati; Elisabeth A G Pfaehler; Arman Rahmim; Arvind U K Rao; Jonas Scherer; Muhammad Musib Siddique; Nanna M Sijtsema; Jairo Socarras Fernandez; Emiliano Spezi; Roel J H M Steenbakkers; Stephanie Tanadini-Lang; Daniela Thorwarth; Esther G C Troost; Taman Upadhaya; Vincenzo Valentini; Lisanne V van Dijk; Joost van Griethuysen; Floris H P van Velden; Philip Whybra; Christian Richter; Steffen Löck
Journal:  Radiology       Date:  2020-03-10       Impact factor: 29.146

10.  Assessment of gene expression of intracellular calcium channels, pumps and exchangers with epidermal growth factor-induced epithelial-mesenchymal transition in a breast cancer cell line.

Authors:  Felicity M Davis; Michelle T Parsonage; Peter J Cabot; Marie-Odile Parat; Erik W Thompson; Sarah J Roberts-Thomson; Gregory R Monteith
Journal:  Cancer Cell Int       Date:  2013-07-29       Impact factor: 5.722

View more
  9 in total

Review 1.  Radiomics in Oncological PET Imaging: A Systematic Review-Part 1, Supradiaphragmatic Cancers.

Authors:  David Morland; Elizabeth Katherine Anna Triumbari; Luca Boldrini; Roberto Gatta; Daniele Pizzuto; Salvatore Annunziata
Journal:  Diagnostics (Basel)       Date:  2022-05-27

Review 2.  What Genetics Can Do for Oncological Imaging: A Systematic Review of the Genetic Validation Data Used in Radiomics Studies.

Authors:  Rebeca Mirón Mombiela; Anne Rix Arildskov; Frederik Jager Bruun; Lotte Harries Hasselbalch; Kristine Bærentz Holst; Sine Hvid Rasmussen; Consuelo Borrás
Journal:  Int J Mol Sci       Date:  2022-06-10       Impact factor: 6.208

3.  A Machine Learning Model Based on PET/CT Radiomics and Clinical Characteristics Predicts Tumor Immune Profiles in Non-Small Cell Lung Cancer: A Retrospective Multicohort Study.

Authors:  Haipeng Tong; Jinju Sun; Jingqin Fang; Mi Zhang; Huan Liu; Renxiang Xia; Weicheng Zhou; Kaijun Liu; Xiao Chen
Journal:  Front Immunol       Date:  2022-04-29       Impact factor: 8.786

Review 4.  Role of Artificial Intelligence in Radiogenomics for Cancers in the Era of Precision Medicine.

Authors:  Sanjay Saxena; Biswajit Jena; Neha Gupta; Suchismita Das; Deepaneeta Sarmah; Pallab Bhattacharya; Tanmay Nath; Sudip Paul; Mostafa M Fouda; Manudeep Kalra; Luca Saba; Gyan Pareek; Jasjit S Suri
Journal:  Cancers (Basel)       Date:  2022-06-09       Impact factor: 6.575

5.  A Novel Gene Signature Associated With "E2F Target" Pathway for Predicting the Prognosis of Prostate Cancer.

Authors:  Haoran Xia; Miaomiao Wang; Xiaonan Su; Zhengtong Lv; Qiuxia Yan; Xiaoxiao Guo; Ming Liu
Journal:  Front Mol Biosci       Date:  2022-04-13

6.  Clinical parameters combined with radiomics features of PET/CT can predict recurrence in patients with high-risk pediatric neuroblastoma.

Authors:  Lijuan Feng; Luodan Qian; Shen Yang; Qinghua Ren; Shuxin Zhang; Hong Qin; Wei Wang; Chao Wang; Hui Zhang; Jigang Yang
Journal:  BMC Med Imaging       Date:  2022-05-28       Impact factor: 2.795

7.  A Novel Radiogenomics Biomarker Based on Hypoxic-Gene Subset: Accurate Survival and Prognostic Prediction of Renal Clear Cell Carcinoma.

Authors:  Jiahao Gao; Fangdie Ye; Fang Han; Xiaoshuang Wang; Haowen Jiang; Jiawen Zhang
Journal:  Front Oncol       Date:  2021-10-07       Impact factor: 6.244

8.  Diagnostic Accuracy of Deep Learning and Radiomics in Lung Cancer Staging: A Systematic Review and Meta-Analysis.

Authors:  Xiushan Zheng; Bo He; Yunhai Hu; Min Ren; Zhiyuan Chen; Zhiguang Zhang; Jun Ma; Lanwei Ouyang; Hongmei Chu; Huan Gao; Wenjing He; Tianhu Liu; Gang Li
Journal:  Front Public Health       Date:  2022-07-18

Review 9.  Monitoring of Current Cancer Therapy by Positron Emission Tomography and Possible Role of Radiomics Assessment.

Authors:  Noboru Oriuchi; Hideki Endoh; Kyoichi Kaira
Journal:  Int J Mol Sci       Date:  2022-08-20       Impact factor: 6.208

  9 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.