Nova F Smedley, Denise R Aberle, William Hsu.
Abstract
Purpose: Integrative analysis combining diagnostic imaging and genomic information can uncover biological insights into lesions that are visible on radiologic images. We investigate techniques for interrogating a deep neural network trained to predict quantitative image (radiomic) features and histology from gene expression in non-small cell lung cancer (NSCLC). Approach: Using 262 training and 89 testing cases from two public datasets, deep feedforward neural networks were trained to predict the values of 101 computed tomography (CT) radiomic features and histology. A model interrogation method called gene masking was used to derive the learned associations between subsets of genes and a radiomic feature or histology class [adenocarcinoma (ADC), squamous cell, and other].
Keywords: deep learning; gene expression; model interpretability; non-small cell lung cancer; radiogenomics
Year: 2021; PMID: 33977113; PMCID: PMC8105647; DOI: 10.1117/1.JMI.8.3.031906
Source DB: PubMed; Journal: J Med Imaging (Bellingham); ISSN: 2329-4302
Patient characteristics, estimated or replicated from the source.
| Characteristic | Category | Training (n = 262) | Testing (n = 89) |
|---|---|---|---|
| Source | Country | USA | Netherlands |
| Gender | Male | 100 (38%) | 59 (66%) |
| | Female | 124 (47%) | 28 (32%) |
| | N/A | 38 (15%) | 2 (2%) |
| Histology | ADC | 129 (49%) | 42 (48%) |
| | Squamous | 61 (23%) | 33 (38%) |
| | Other | 34 (13%) | 12 (14%) |
| | N/A | 38 (15%) | 0 (0%) |
| Stage | I | 123 (47%) | 39 (44%) |
| | II | 35 (13%) | 26 (29%) |
| | III | 46 (18%) | 12 (14%) |
| | Other | 20 (8%) | 10 (11%) |
| | N/A | 38 (14%) | 2 (2%) |
| Smoking status | Current | 66 (25%) | N/A |
| | Former | 141 (54%) | N/A |
| | None | 17 (7%) | N/A |
| | N/A | 38 (14%) | N/A |
| Tumor site | Primary | 224 (86%) | 87 (98%) |
| | N/A | 38 (14%) | 2 (2%) |
| Status | Alive | 134 (51%) | 41 (46%) |
| | Deceased | 90 (35%) | 46 (52%) |
| | N/A | 38 (14%) | 2 (2%) |
| Follow-up | Median months | 32 | 31 |
Fig. 1. An overview of this study’s approaches to (a) training and (b) interpreting radiogenomic neural networks.
Fig. 2. The architecture and hyperparameter tuning of a radiogenomic neural network.
NSCLC radiogenomic models and hyperparameters. Grid search was used for selecting hyperparameters.
| Model type | # Models | Hyperparameter | Values |
|---|---|---|---|
| Logistic regression | 100 | Penalty type | Elastic net |
| | | L1 ratio | [0–1] |
| | | Solver | SAGA |
| Support vector machines | 400 | Kernel | Linear, poly, rbf, sigmoid |
| | | C penalty | |
| Random forest | 120 | Trees | |
| | | Split criterion | Gini, entropy |
| | | Max. features | |
| | | Max. depth | None |
| Gradient boosted trees | 150 | Trees | |
| | | Max. depth | [1–3] |
| | | Learning rate | |
| Neural network | 48 | Hidden nodes | [6000–250] |
| | | Hidden layers | 3, 4 |
| | | Optimizer | Nadam, Adadelta |
| | | Activation | Sigmoid, relu |
| | | Dropout | 0.4, 0.6 |
| | | Loss | Binary cross-entropy |
| | | Epochs (patience) | 500 (200) |
| | | Batch | 10 |
G: number of genes in the transcriptome
First hidden layers were either 2000, 4000, or 6000 nodes, where the number of hidden nodes for a layer was halved with each subsequent layer.
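The neural network rows of the table and the halving rule in the footnote can be sketched as a grid enumeration. This is a minimal illustration, not the authors' code: the dictionary keys and the helper `hidden_layer_widths` are hypothetical names, and only the values listed in the table are used. Under these assumptions the grid yields 48 configurations, which matches the "# Models" entry.

```python
from itertools import product

def hidden_layer_widths(first_width, n_layers):
    """Halve the number of nodes with each subsequent hidden layer."""
    return [first_width // (2 ** i) for i in range(n_layers)]

# Values taken from the neural network rows of the table; key names are hypothetical.
grid = {
    "first_width": [2000, 4000, 6000],
    "n_layers": [3, 4],
    "optimizer": ["nadam", "adadelta"],
    "activation": ["sigmoid", "relu"],
    "dropout": [0.4, 0.6],
}

# One dict per combination of hyperparameter values.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
for cfg in configs:
    cfg["hidden_widths"] = hidden_layer_widths(cfg["first_width"], cfg["n_layers"])

print(len(configs))                  # 48
print(hidden_layer_widths(2000, 4))  # [2000, 1000, 500, 250]
print(hidden_layer_widths(6000, 4))  # [6000, 3000, 1500, 750]
```

The widest and narrowest layers produced this way are 6000 and 250, consistent with the "[6000–250]" range given for hidden nodes.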
Fig. 3. The ability of models to predict NSCLC histology and stage in (a) training and (b) testing. In training, models were evaluated using 10-fold CV, and models were compared using the mean AUC scores in CV. The top performing model was then retrained on the full training dataset and evaluated on the testing dataset. The testing performance scores are shown for each histology type and stage. (c)–(e) Gene masking of the histology neural network using gene sets from (c) published gene signatures for histology, (d) Hallmark (top five out of 50), and (e) Gene Ontology biological processes (GO.bp, top ten out of 7350). In (c)–(e), each column is a type of histology and each row is a gene set used to mask the trained model to inspect how well the model predicted a certain histology type. The color in a cell shows the model’s performance using a gene set to predict a histology type, where red denotes higher AUCs and purple denotes lower AUCs in the testing dataset. (b) The ability of the histology model to predict each histology type; the AUC score is based on using all genes in the gene expression profile. (c)–(e) The ability of a specific gene set to predict a histology type. For more details on the gene sets used, see Sec. 2. Notation: lcc = gene signature for large cell carcinoma, scc = gene signature for squamous cell carcinoma, adc = gene signature for adenocarcinomas, adc versus scc = gene signature for differentiating ADC from SCC.
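The gene masking step described in the caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes masking means keeping the expression values of genes in the gene set and setting all other genes to zero before a forward pass through the already trained model. The logistic `toy_model`, the gene names, the weights, and the profiles are all hypothetical stand-ins.

```python
import math

def mask_genes(profile, gene_set):
    """Keep values for genes in gene_set; zero out all others (assumed masking rule)."""
    return {gene: (value if gene in gene_set else 0.0) for gene, value in profile.items()}

def toy_model(profile, weights, bias=0.0):
    """Hypothetical stand-in for a trained network: a logistic score over gene expression."""
    z = bias + sum(weights[gene] * value for gene, value in profile.items())
    return 1.0 / (1.0 + math.exp(-z))

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical "trained" weights, test profiles, and labels.
weights = {"gene_a": 2.0, "gene_b": -1.0, "gene_c": 0.5}
profiles = [
    {"gene_a": 1.2, "gene_b": 0.1, "gene_c": 0.3},
    {"gene_a": 0.9, "gene_b": 0.4, "gene_c": 0.8},
    {"gene_a": 0.2, "gene_b": 1.1, "gene_c": 0.5},
    {"gene_a": 0.1, "gene_b": 0.9, "gene_c": 0.2},
]
labels = [1, 1, 0, 0]
gene_sets = {"set_1": {"gene_a"}, "set_2": {"gene_c"}}

# The model is not retrained; only its inputs are masked, one gene set at a time.
masked_auc = {}
for name, gene_set in gene_sets.items():
    scores = [toy_model(mask_genes(p, gene_set), weights) for p in profiles]
    masked_auc[name] = auc(labels, scores)

print(masked_auc)  # {'set_1': 1.0, 'set_2': 0.75}
```

Each entry of `masked_auc` plays the role of one heatmap cell: the score the model reaches for a class when it only sees one gene set.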
Summary of predictive gene expression patterns in NSCLC histology.
| Histology | Theme | Gene set | Source | Test AUC | Test AP |
|---|---|---|---|---|---|
| ADC | Synthesis | Phosphatidylcholine biosynthetic process | G | 0.94 | 0.93 |
| | | Phosphatidic acid biosynthetic process | G | 0.94 | 0.86 |
| | Transcription | Neg. regulation of DNA-binding transcription factor activity | G | 0.93 | 0.92 |
| | Vasculature | Neg. regulation of sprouting angiogenesis | G | 0.93 | 0.92 |
| | | Coagulation | H | 0.91 | 0.86 |
| | Cell development | Retina development in camera-type eye | G | 0.93 | 0.87 |
| | Cell death | Hypoxia | H | 0.91 | 0.88 |
| | KRAS | KRAS signaling down | H | 0.91 | 0.84 |
| | UV | UV response down | H | 0.92 | 0.90 |
| | E2F | E2F targets | H | 0.92 | 0.87 |
| Squamous cell carcinoma | Cell development | Metencephalon development | G | 0.95 | 0.90 |
| | Differentiation | Epidermal cell differentiation | G | 0.94 | 0.94 |
| | | Neuron fate commitment | G | 0.92 | 0.87 |
| | Catabolism | Neg. regulation of cellular catabolic process | G | 0.94 | 0.88 |
| | Cell death | Cornification | G | 0.94 | 0.93 |
| | | Hypoxia | H | 0.91 | 0.90 |
| | KRAS | KRAS signaling down | H | 0.93 | 0.93 |
| | Hormone | Estrogen response late | H | 0.92 | 0.91 |
| | Cholesterol | Cholesterol homeostasis | H | 0.92 | 0.83 |
| Other | Transport | Posttranslational protein targeting to membrane, translocation | G | 0.86 | 0.35 |
| | | Regulation of sodium ion transmembrane transporter activity | G | 0.84 | 0.54 |
| | AMPA receptor | Regulation of AMPA receptor activity | G | 0.85 | 0.44 |
| | Cell cycle | Mitotic nuclear envelope reassembly | G | 0.84 | 0.50 |
| | Ubiquitination | Neg. regulation of protein K63-linked ubiquitination | G | 0.84 | 0.38 |
| | Immune system | Inflammatory response | H | 0.73 | 0.37 |
| | Vasculature | Angiogenesis | H | 0.72 | 0.35 |
Gene sets were categorized by comparing the top five gene sets in GO and Hallmark with at least 0.70 test AUC.
Source: gene sets from MSigDB v7.0, where H = Hallmark and G = Gene Ontology; neg. = negative.
Fig. 4. Radiogenomic modeling performance. (a) Comparison between neural networks and other models in the training dataset. (b) Neural network performance in the training and testing datasets for the 13 radiomic features selected for further analysis. Train scores represent the averaged scores of the validation folds during 10-fold CV in the training dataset. The test scores are the model’s performance in the testing dataset after models were retrained on the full training dataset.
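The "train score" in this caption, an average over validation folds, can be sketched as below. This is a simplified illustration: the folds are contiguous and unshuffled, whereas the study's actual splitting scheme is not stated here, and `fit_and_score` is a hypothetical callback that would train a model on the training indices and return one validation score.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_mean(fit_and_score, n_samples, k=10):
    """Average a validation score over k folds.

    fit_and_score(train_idx, val_idx) is a hypothetical callback that fits a
    model on train_idx and returns its score on val_idx.
    """
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        # All samples outside the current validation fold form the training split.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, val_idx))
    return sum(scores) / len(scores)

# With the 262 training cases, 10 folds hold 27 or 26 cases each.
folds = k_fold_indices(262, k=10)
fold_sizes = [len(fold) for fold in folds]
print(fold_sizes)
```

After model selection on this averaged score, the chosen configuration would be refit on all 262 training cases and scored once on the held-out test set, as the caption describes.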
Fig. 5. Gene masking of the radiogenomics models with biological processes from GO. The top three gene sets ranked by test AUC for each radiomic feature are shown.
Summary of predictive gene expression patterns in NSCLC radiomic features.
| Radiomic feature | Theme | Gene set (from GO) | Test AUC | Test AP |
|---|---|---|---|---|
| Stats_skewness | Cytoskeleton | Reg. of actin filament-based process | 0.82 | 0.82 |
| | Adhesion | Neg. reg. of cell adhesion | 0.81 | 0.81 |
| | | Neg. reg. of cell-cell adhesion | 0.80 | 0.77 |
| | Immune system | Reg. of hemopoiesis | 0.81 | 0.78 |
| | | Reg. of leukocyte differentiation | 0.80 | 0.77 |
| Stats_rms | Transport | Reg. of release of sequestered calcium ion into cytosol | 0.95 | 0.65 |
| | | Sequestering of calcium ion | 0.93 | 0.30 |
| | Development | Muscle organ development | 0.93 | 0.46 |
| | | Striated muscle cell differentiation | 0.93 | 0.32 |
| | | Actin filament-based movement | 0.92 | 0.26 |
| LoG_stats_std | Post-translational | Post-translational protein modification | 1.00 | 1.00 |
| | Development | Epidermis development; epidermal cell differentiation | 1.00 | 1.00 |
| | DNA repair | DNA double-strand break processing involved in repair via single-strand annealing | 0.99 | 0.79 |
| | Cell cycle | Neg. reg. of mitotic cell cycle | 0.99 | 0.79 |
| LoG_stats_uniformity | Cell development | Liver regeneration | 0.78 | 0.64 |
| | | Epithelial tube morphogenesis | 0.77 | 0.49 |
| | Transport | Protein transmembrane transport | 0.77 | 0.64 |
| | | Intracellular protein transmembrane transport | 0.76 | 0.61 |
| | Catabolism | Organic acid catabolic process | 0.75 | 0.54 |
| LoG_stats_entropy | Localization | Establishment of organelle localization | 0.82 | 0.47 |
| | | Pos. reg. of protein localization to membrane | 0.79 | 0.60 |
| | Heart rate | Reg. of heart rate by cardiac conduction | 0.77 | 0.40 |
| | Cell mobility | Reg. of actin filament-based process | 0.77 | 0.46 |
| | External stimulus | Cellular response to mechanical stimulus | 0.77 | 0.44 |
| LoG_stats_kurtosis | Connective tissue | Elastin metabolic process | 0.73 | 0.84 |
| | | Collagen metabolic process | 0.72 | 0.85 |
| | Synthesis | Pos. reg. of receptor biosynthetic process | 0.73 | 0.81 |
| | | Pos. reg. of hormone biosynthetic process | 0.73 | 0.84 |
| | Immune system | Response-regulating cell surface receptor signaling pathway | 0.72 | 0.78 |
| GLCM_diffEntro | Muscle | Muscle fiber development | 0.76 | 0.82 |
| | | Muscle cell differentiation | 0.75 | 0.79 |
| | | Cardiac ventricle formation | 0.76 | 0.77 |
| | Bacteria | Response to molecule of bacterial origin | 0.76 | 0.73 |
| | Rho | Rho protein signal transduction | 0.75 | 0.78 |
| GLCM_invDiffnorm | Cell development | Fat cell differentiation | 0.83 | 0.63 |
| | | Neg. reg. of cell development | 0.80 | 0.60 |
| | Cell respiration | Reg. of aerobic respiration | 0.81 | 0.64 |
| | Immune system | Neg. reg. of lymphocyte activation | 0.80 | 0.63 |
| | Nervous system | Neuromuscular process controlling balance | 0.79 | 0.70 |
| GLCM_invDiffmomnor | Immune system | Reg. of osteoclast differentiation | 0.81 | 0.58 |
| | | Osteoclast differentiation | 0.80 | 0.58 |
| | Homeostasis | Multicellular organismal homeostasis(G) | 0.80 | 0.61 |
| | | Tissue homeostasis | 0.79 | 0.57 |
| | Rho | Reg. of Rho protein signal transduction | 0.79 | 0.55 |
| GLCM_entrop2 | TNF | Response to TNF | 0.78 | 0.81 |
| | Muscle | Muscle cell development | 0.77 | 0.80 |
| | | Ventricular septum morphogenesis | 0.77 | 0.77 |
| | | Striated muscle cell differentiation | 0.76 | 0.78 |
| | Drug response | Response to xenobiotic stimulus | 0.76 | 0.84 |
| RLGL_shortRunEmphasis | Localization | Pos. regulation of establishment of protein localization | 0.79 | 0.77 |
| | Catabolism | Lysine catabolic process | 0.79 | 0.80 |
| | Cell death | Pos. reg. of autophagy of mitochondrion | 0.77 | 0.78 |
| | Hormone | Pos. reg. of insulin secretion | 0.77 | 0.77 |
| | Cell mobility | Neuron projection guidance | 0.75 | 0.71 |
| RLGL_longRunHighGrayLevEmpha | Syncytium | Syncytium formation | 0.78 | 0.54 |
| | | Reg. of syncytium formation by plasma membrane fusion | 0.77 | 0.47 |
| | Synthesis | Pyrimidine nucleotide salvage | 0.76 | 0.50 |
| | AKT | Protein kinase B signaling | 0.76 | 0.54 |
| | Renal system | Renal sodium excretion | 0.76 | 0.44 |
| RLGL_runPercentage | Muscle | Smooth muscle tissue development | 0.76 | 0.75 |
| | TNF | Response to TNF | 0.76 | 0.76 |
| | | TNF-mediated signaling pathway | 0.75 | 0.76 |
| | Immune system | Myeloid leukocyte differentiation | 0.75 | 0.76 |
| | Renal system | Metanephros development | 0.74 | 0.80 |
Shown are the top five GO gene sets ranked by AUC and with at least 0.50 AP; reg. = regulation; pos. = positive; neg. = negative.
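One plausible reading of the selection rule in this footnote is a filter on AP followed by a ranking on AUC. The sketch below assumes that order of operations, which the footnote does not spell out. The function name `top_gene_sets`, the record fields, and all values are hypothetical and are not taken from the table.

```python
def top_gene_sets(results, k=5, min_ap=0.50):
    """Keep gene sets with AP >= min_ap, then return the k with the highest AUC."""
    eligible = [r for r in results if r["ap"] >= min_ap]
    return sorted(eligible, key=lambda r: r["auc"], reverse=True)[:k]

# Hypothetical gene masking results for one radiomic feature.
results = [
    {"gene_set": "set_a", "auc": 0.81, "ap": 0.62},
    {"gene_set": "set_b", "auc": 0.90, "ap": 0.41},  # dropped: AP below threshold
    {"gene_set": "set_c", "auc": 0.77, "ap": 0.55},
    {"gene_set": "set_d", "auc": 0.85, "ap": 0.70},
]

top = top_gene_sets(results, k=2)
print([r["gene_set"] for r in top])  # ['set_d', 'set_a']
```

Ranking first and filtering afterwards would give a different result whenever a high-AUC gene set falls below the AP threshold, so the order is worth stating explicitly when reproducing such a table.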
A comparison of the learned radiogenomic associations extracted from our neural networks and the modules previously identified in the same dataset. Each module consisted of a set of Reactome pathways and a set of image features. Shown are the modules that included the radiomic features used in this study. If any module’s set of pathways was ranked among the top 100 in gene masking, the top three pathways were listed.
| Radiomic trait | Reactome pathway | Test AUC (this study) | Rank (this study) | Module (Grossmann et al.) | # Pathways (Grossmann et al.) |
|---|---|---|---|---|---|
| GLCM_diffEntro | Cross presentation of soluble exogenous antigens endosomes | 0.58 | 197 | 2 | 5 |
| | Phase II conjugation | 0.69 | 9 | 12 | 35 |
| | Regulation of mitotic cell cycle | 0.66 | 21 | - | - |
| | ABCA transporters in lipid homeostasis | 0.65 | 28 | - | - |
| LoG_stats_std | Regulation of ornithine decarboxylase | 0.94 | 15 | 2 | 5 |
| | Cross presentation of soluble exogenous antigens endosomes | 0.82 | 89 | - | - |
| | Antigen processing cross presentation | 0.80 | 106 | - | - |
| | Cholesterol biosynthesis | 0.75 | 150 | 6 | 7 |
| | Signaling by TGF beta receptor complex | 0.83 | 81 | 8 | 17 |
| | Elongation arrest and recovery | 0.82 | 92 | - | - |
| | mRNA splicing | 0.75 | 142 | - | - |
| LoG_stats_entropy | Elongation arrest and recovery | 0.73 | 1 | 7 | 8 |
| | Mitochondrial protein import | 0.63 | 74 | - | - |
| | RNA pol III chain elongation | 0.58 | 180 | - | - |
| | Elongation arrest and recovery | 0.73 | 1 | 13 | 26 |
| | RNA pol II pre transcription events | 0.69 | 7 | - | - |
| | Formation of RNA pol II elongation complex | 0.69 | 9 | - | - |
| Stats_skewness | Antigen processing cross presentation | 0.62 | 139 | 2 | 5 |
All Reactome pathways were ranked by test AUC.