Jineta Banerjee, Robert J Allaway, Jaclyn N Taroni, Aaron Baker, Xiaochun Zhang, Chang In Moon, Christine A Pratilas, Jaishri O Blakeley, Justin Guinney, Angela Hirbe, Casey S Greene, Sara JC Gosline.
Abstract
Neurofibromatosis type 1 (NF1) is a monogenic syndrome that gives rise to numerous symptoms including cognitive impairment, skeletal abnormalities, and growth of benign nerve sheath tumors. Nearly all NF1 patients develop cutaneous neurofibromas (cNFs), which occur on the skin surface, whereas 40-60% of patients develop plexiform neurofibromas (pNFs), which are deeply embedded in the peripheral nerves. Patients with pNFs have a ~10% lifetime chance of these tumors becoming malignant peripheral nerve sheath tumors (MPNSTs). These tumors have a severe prognosis and few treatment options other than surgery. Given the lack of therapeutic options available to patients with these tumors, identification of druggable pathways or other key molecular features could aid ongoing therapeutic discovery studies. In this work, we used statistical and machine learning methods to analyze 77 NF1 tumors with genomic data to characterize key signaling pathways that distinguish these tumors and identify candidates for drug development. We identified subsets of latent gene expression variables that may be important in the identification and etiology of cNFs, pNFs, other neurofibromas, and MPNSTs. Furthermore, we characterized the association between these latent variables and genetic variants, immune deconvolution predictions, and protein activity predictions.
Keywords: cancer; latent variables; machine learning; metaVIPER; nerve sheath tumor; neurofibromatosis type 1; random forest; supervised learning; transfer learning; tumor deconvolution
Year: 2020 PMID: 32098059 PMCID: PMC7073563 DOI: 10.3390/genes11020226
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Description of the gene expression datasets used in the present article.
| Dataset Name | Synapse Project Name | Synapse Table Name | Synapse Access Team |
|---|---|---|---|
| WashU Biobank | Preclinical NF1-MPNST Platform Development | WashU Biobank RNA-seq data | WUSTL MPNST PDX Data Access |
| JHU Biobank | A Nerve Sheath Tumor Bank from Patients with NF1 | Biobank RNASeq Data | JHU Biobank Data Access |
| cNF Patient Data | Cutaneous Neurofibroma Data Resource | cNF RNASeq Counts | CTF cNF Resource Data Access Group |
| CBTTC Data | Children’s Brain Tumor Tissue Consortium | CBTTC RNASeq Counts | CBTTC Data Access Group |
Description of the genomic variant datasets used in the present article.
| Dataset Name | Assay | Synapse Table Name | Synapse Access Team | Synapse Project |
|---|---|---|---|---|
| JHU Biobank Exome-Seq Data | exomeSeq | Biobank ExomeSeq Data | JHU Biobank Data Access | A Nerve Sheath Tumor Bank from Patients with NF1 |
| cNF WGS Data | wholeGenomeSeq | cNF WGS Harmonized Data | CTF cNF Resource Data Access Group | Cutaneous Neurofibroma Data Resource |
Figure 1. Transfer learning reduced dimensionality and added context to gene expression datasets. (A) Principal components analysis (PCA) of gene expression data indicated that counts-level data may have been batch confounded. (B) A density plot of latent variable (LV) expression across the four tumor types indicated that the majority of LVs had an expression value near 0 and that the four tumor types had similar LV expression distributions. (C) PCA of LVs indicated that batch effects, although reduced, may still have existed in the LV data. (D) The 5% most variable LVs across the cohort of gene expression data represented a wide swath of biological processes, and included some LVs with no clear association with a defined biological pathway.
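The selection of the "5% most variable LVs" in panel (D) can be illustrated with a minimal sketch. It assumes that variability is measured as the variance of each LV across samples, which the caption does not state; the function name `top_variable_lvs`, the LV names, and the toy values are all hypothetical.

```python
import math
from statistics import pvariance


def top_variable_lvs(lv_matrix, fraction=0.05):
    """Return the names of the most variable latent variables.

    lv_matrix maps an LV name to its values across samples.
    Assumption: "most variable" means highest variance across samples.
    """
    variances = {name: pvariance(values) for name, values in lv_matrix.items()}
    # Keep at least one LV even when the fraction rounds down to zero.
    n_keep = max(1, math.ceil(fraction * len(variances)))
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_keep]


# Toy data (not from the study): 20 LVs measured in 4 samples.
toy = {f"LV{i}": [0.0, 0.0, 0.0, 0.0] for i in range(1, 21)}
toy["LV7"] = [-2.0, 0.0, 1.0, 3.0]  # the only LV that varies
selected = top_variable_lvs(toy)
```

With 20 toy LVs, a 5% cutoff keeps a single LV, here the only one with nonzero variance.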
Summary of individuals and samples for the NF1 nerve sheath tumors used in this study. All samples have gene expression data and a subset have genomic data derived from whole-exome sequencing or whole-genome sequencing. Some neurofibromas did not have more specific pathologic subtyping information available and were therefore classified as “undefined neurofibromas” or NFs.
| Tumor Type | Individuals | Samples | # with Genomic Variant Data |
|---|---|---|---|
| Cutaneous Neurofibroma (cNF) | 11 | 33 | 23 |
| MPNST | 13 | 13 | 1 |
| Undefined Neurofibroma (NF) | 12 | 12 | 11 |
| Plexiform Neurofibroma (pNF) | 19 | 19 | 5 |
Figure 2. An ensemble of random forests selected the most important latent variables for classifying different tumor types in NF1. (A) Density plot showing the distribution of F1 scores of 500 iterations of independent random forest models using all latent variables. (B) Density plot showing the distribution of F1 scores of 500 iterations of independent random forest models trained using only the top 40 features with high importance scores for each class obtained from models included in (A). (C) Ridge plots of the top 20 latent variables selected by the random forest for each tumor type, with their importance scores for each class; these latent variables were selected for later analyses.
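The two quantities this caption relies on, a per-class F1 score and a ranking of features by importance across repeated models, can be sketched as follows. This is an illustration only: it does not train a random forest, it assumes importances are aggregated by a simple mean across runs (the caption does not say how they were combined), and the function names and toy values are hypothetical.

```python
from collections import defaultdict


def f1_for_class(y_true, y_pred, label):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: F1 is taken as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_features(importance_runs, k):
    """Average per-feature importance over runs and return the top k names.

    Assumption: each run reports an importance for every feature.
    """
    totals = defaultdict(float)
    for run in importance_runs:
        for feature, score in run.items():
            totals[feature] += score
    means = {f: s / len(importance_runs) for f, s in totals.items()}
    return sorted(means, key=means.get, reverse=True)[:k]


# Toy labels and importances (not from the study).
y_true = ["cNF", "cNF", "pNF", "MPNST"]
y_pred = ["cNF", "pNF", "pNF", "MPNST"]
f1_cnf = f1_for_class(y_true, y_pred, "cNF")
runs = [
    {"LV1": 0.5, "LV2": 0.1, "LV3": 0.4},
    {"LV1": 0.3, "LV2": 0.2, "LV3": 0.5},
]
best = top_features(runs, k=2)
```

Repeating the F1 computation once per model yields the kind of score distribution that panels (A) and (B) plot as densities.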
Figure 3. Selected latent variables (LVs) represented gene combinations unique to each tumor type. (A) Venn diagram showing the distribution of the top 40 LVs from each tumor type. (B,C) Total values of the LVs as measured by multiPLIER across samples are represented in the dot-plots, where color of the dots represents the tumor type (“Class” label colors described in the lower left). Loading values for the top 10 genes for each LV are represented in bar-plots below. The higher the loading, the greater the impact that the gene expression had on the total multiPLIER value. (B,i–iii) Genes constituting the latent variables associated with known cell signaling pathways. (C,i–iii) Genes constituting the uncharacterized latent variables.
Figure 4. Some genes significantly distinguished expression of latent variables. (A) Latent variables (y-axis) whose values are significantly altered by mutations in specific genes. (B) MultiPLIER value of LV 851 across tumor samples. (C) MultiPLIER value of LV 851 across all samples. (D) Loading values of the top 20 genes that comprise LV 851.
Figure 5. Various immune cell signatures correlated with specific LVs that differentiate tumor types in NF1. (A) CIBERSORT deconvolution of bulk nerve sheath tumor expression data predicted the presence of activated mast cells, M2 macrophages, and resting CD4+ memory T cells in all of the tested tumor types. (B) MCP-counter based deconvolution of bulk nerve sheath tumor expression data predicted the presence of cancer-associated fibroblasts across all tumor types, and diversity in T cell population across tumor types. (C) Correlation of CIBERSORT immune score (x-axis) with expression of latent variable 546 highlighted the increased presence of activated mast cells and resting dendritic cells in cNFs (circles). (D) Top 20 gene loadings of LV 546. (E) Correlation of MCP-counter score of T cell infiltration (x-axis) with LV 540. (F) Top 20 gene loadings of LV 540.
Figure 6. Integration of protein activity information with LVs can identify candidate drug targets for different NF1 tumor types. (A) A heatmap of correlation scores of known proteins with regulatory networks (or regulons) that are represented in the characterized and uncharacterized LVs selected above. The green bar across the top depicts how many protein activity scores had a Spearman correlation greater than 0.65. (B) Clustering of the LV-correlated VIPER proteins highlighted five clusters of latent variables with similar VIPER protein predictions, suggesting that these five clusters may have functional overlap. (C) Mean LV expression within the clusters highlighted differential expression within the clusters across tumor types. Tumor type is indicated by colors on the right. (D) Drug set enrichment analysis of the average VIPER protein correlation of cluster 2 identified some drugs and preclinical molecules that are enriched with targets in this cluster.
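The count summarized by the green bar in panel (A), the number of protein activity scores whose Spearman correlation with an LV exceeds 0.65, can be sketched in plain Python. The 0.65 threshold comes from the caption; the helper names, the tie handling by average ranks, and the toy values are assumptions made for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def count_correlated(lv_values, protein_activity, threshold=0.65):
    """Count proteins whose activity correlates with the LV above threshold."""
    return sum(
        spearman(lv_values, activity) > threshold
        for activity in protein_activity.values()
    )


# Toy values (not from the study): one LV and three hypothetical proteins.
lv = [1.0, 2.0, 3.0, 4.0, 5.0]
proteins = {
    "P1": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly concordant
    "P2": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly discordant
    "P3": [1.0, 3.0, 2.0, 5.0, 4.0],   # mostly concordant
}
n_hits = count_correlated(lv, proteins)
```

Only positive correlations above the threshold are counted in this sketch; whether the original analysis also considered strong negative correlations is not stated in the caption.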