Flavia Esposito, Angelina Boccarelli, Nicoletta Del Buono.
Abstract
The rapid development of high-performance technologies has greatly promoted studies in molecular oncology, producing large amounts of data. Although these data are publicly available, they need to be processed and studied to extract information that helps to better understand the mechanisms of pathogenesis of complex diseases, such as tumors. In this article, we illustrate a procedure for mining biologically meaningful biomarkers from microarray datasets of different tumor histotypes. The proposed methodology automatically identifies a subset of potentially informative genes from microarray data matrices that may differ in both the number of rows (genes) and the number of columns (patients). The methodology integrates the nonnegative matrix factorization (NMF) method and a functional enrichment analysis web tool with a purpose-designed gene extraction procedure, allowing the analysis of omics input data with different row sizes. The proposed methodology has been used to mine microarray data of solid tumors of different embryonic origin to verify the presence of common genes characterizing the heterogeneity of cancer-associated fibroblasts. These automatically extracted biomarkers could be used to suggest appropriate therapies to inactivate the state of active fibroblasts, thus avoiding their action on tumor progression.
Keywords: NMF; cancer; cancer-associated fibroblast; fibroblast; metagene; microarray
Year: 2020 PMID: 32425511 PMCID: PMC7218276 DOI: 10.1177/1177932220906827
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. Workflow of the NMF-based methodology for extraction of common genes between different tumor gene expression datasets.
Figure 2. Possible cases supported by the pre-processing phase.
Figure 3. Logical view of the gene landscape extraction. Preprocessed expression data matrices for 3 different tumors are factorized by the NMF algorithm, and the corresponding metagene matrices are taken into consideration. The Gene.score scoring method selects the most relevant metagenes, which undergo the Gene Landscape Extraction procedure. This procedure uses a set intersection to identify the labels of genes common to the different tumor histotypes. The extracted subset of gene labels is then sent to the functional and pathway analyses.
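The flow in the Figure 3 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the factorization is a plain multiplicative-update NMF, the score is assumed to be the entropy-based gene score commonly associated with the name Gene.score, and the `n_top` cut-off and all function names are hypothetical.

```python
import numpy as np

def nmf(X, rank, n_iter=200, seed=0):
    """Plain multiplicative-update NMF: X (genes x samples) ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + 1e-3
    H = rng.random((rank, X.shape[1])) + 1e-3
    eps = 1e-12  # avoids division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def gene_score(W):
    """Assumed entropy-based score in [0, 1]; needs rank >= 2.
    A value near 1 means the gene loads on a single metagene."""
    k = W.shape[1]
    P = W / (W.sum(axis=1, keepdims=True) + 1e-12)
    log_p = np.log2(np.clip(P, 1e-12, None))
    return 1.0 + (P * log_p).sum(axis=1) / np.log2(k)

def top_genes(X, labels, rank, n_top):
    """Labels of the n_top highest-scoring genes (hypothetical cut-off)."""
    W, _ = nmf(X, rank)
    order = np.argsort(gene_score(W))[::-1][:n_top]
    return {labels[i] for i in order}

def common_genes(datasets, n_top):
    """Intersect selected gene labels across datasets of different row size."""
    selected = [top_genes(X, labels, rank, n_top) for X, labels, rank in datasets]
    return set.intersection(*selected)

# Synthetic stand-ins for three tumor datasets with different gene sets.
rng = np.random.default_rng(1)
datasets = [
    (rng.random((30, 4)), [f"g{i}" for i in range(0, 30)], 2),
    (rng.random((30, 3)), [f"g{i}" for i in range(10, 40)], 3),
    (rng.random((30, 6)), [f"g{i}" for i in range(20, 50)], 2),
]
print(sorted(common_genes(datasets, n_top=15)))
```

Because the intersection works on gene labels rather than row positions, the input matrices need not share the same number of rows or the same gene order.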
Gene ontology functional analysis: based on the parameters used, 10 categories are identified as enriched categories and all are shown in this table.[9]
| Gene set | Description | Size | Expect | Ratio | P value | FDR |
|---|---|---|---|---|---|---|
| GO:0006955 | Immune response | 1919 | 9.0964 | 2.5285 | .00001777 | 0.1615 |
| GO:1901565 | Organonitrogen compound catabolic process | 1240 | 5.8778 | 2.8922 | .00005655 | 0.2571 |
| GO:0042454 | Ribonucleoside catabolic process | 22 | 0.1043 | 28.7680 | .00014794 | 0.3362 |
| GO:0009164 | Nucleoside catabolic process | 34 | 0.1612 | 18.6140 | .00055180 | 0.8992 |
| GO:0072529 | Pyrimidine-containing compound catabolic process | 38 | 0.1801 | 16.6550 | .00076738 | 0.8992 |
| GO:0048525 | Negative regulation of viral process | 88 | 0.4171 | 9.5892 | .00080637 | 0.8992 |
| GO:1901658 | Glycosyl compound catabolic process | 44 | 0.2086 | 14.3840 | .00118040 | 0.8992 |
| GO:0006216 | Cytidine catabolic process | 11 | 0.0521 | 38.3570 | .00118690 | 0.8992 |
| GO:0009972 | Cytidine deamination | 11 | 0.0521 | 38.3570 | .00118690 | 0.8992 |
The parameters for the analysis of enrichment are minimum number of IDs in the category: 5; maximum number of IDs in the category: 2000; FDR method: BH; level of significance: Top 10.
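The Expect and Ratio columns can be read together to recover how many input genes fall in each category. The sketch below assumes that Ratio is the usual over-representation enrichment ratio (observed overlap divided by expected overlap); the table itself does not define its columns, and the function name is hypothetical.

```python
# (gene set, size, expect, ratio) copied from three rows of the table above.
rows = [
    ("GO:0006955", 1919, 9.0964, 2.5285),
    ("GO:0042454", 22, 0.1043, 28.7680),
    ("GO:0009972", 11, 0.0521, 38.3570),
]

def observed_overlap(expect, ratio):
    """Assumption: ratio = observed / expected, so observed = expect * ratio."""
    return round(expect * ratio)

overlaps = [observed_overlap(expect, ratio) for _, _, expect, ratio in rows]
for (gene_set, size, _, _), n in zip(rows, overlaps):
    print(f"{gene_set}: {n} of {size} category genes are in the input list")
```

Under this reading, a small category such as GO:0042454 reaches a high ratio with only a few overlapping genes, which is worth keeping in mind when comparing ratios across categories of very different size.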
Dataset information: GSE series used, bibliographical references, GEO platforms (GPL570, GPL6244, and GPL2136 indicate Affymetrix Human Genome U133 Plus_2.0 Array, Affymetrix Human Gene 1.0 ST Array, and Micro-CRIBI Human Oligo Array [Operon V2.0], respectively), fibroblast sample labels, representative cancer, number of genes, number of samples, and NMF rank value used.
| GSE series | Reference | GEO platform | Sample label | Representative cancer | No. of samples | No. of genes | NMF rank |
|---|---|---|---|---|---|---|---|
| GSE51257 | | GPL6244 | Cancer-associated fibroblast | Colon carcinoma | 4 | 20 304 | 2 |
| GSE30292 | | GPL570 | Cancer-associated fibroblast | Colon carcinoma | 3 | 22 189 | 3 |
| GSE75333 | | GPL570 | Carcinoma-associated fibroblast | Breast carcinoma | 3 | 22 189 | 2 |
| GSE20086 | | GPL570 | Carcinoma-associated fibroblast | Breast cancer | 6 | 22 189 | 2 |
| GSE40595 | | GPL570 | Ovarian Cancer stroma | Ovarian cancer | 10 | 22 189 | 2 |
| GSE24990 | | GPL2136 | Active multiple myeloma | MM | 18 | 21 520 | 8 |
| | | | MGUS | MGUS | 5 | | |
Pathway enrichment analysis: based on the parameters used, 10 categories are identified as enriched categories and all are shown in this table.[9]
| Gene set | Description | Size | Expect | Ratio | P value | FDR |
|---|---|---|---|---|---|---|
| P00053 | T cell activation | 75 | 0.3972 | 7.5533 | .00617920 | 0.3686 |
| P00032 | Insulin | 29 | 0.1536 | 13.0230 | .00964300 | 0.3686 |
| P06959 | CCKR signaling map | 172 | 0.9109 | 4.3915 | .00978500 | 0.3686 |
| P00047 | PDGF signaling pathway | 125 | 0.6620 | 4.5320 | .02499200 | 0.6789 |
| P02723 | Adenine and hypoxanthine salvage pathway | 6 | 0.0318 | 31.4720 | .03139100 | 0.6789 |
| P00010 | B cell activation | 58 | 0.3071 | 6.5115 | .03604900 | 0.6789 |
| P00031 | Inflammation mediated by chemokine and cytokine signaling pathway | 200 | 1.0591 | 2.8325 | .08217900 | 1.0000 |
| P00002 | Alpha adrenergic receptor signaling pathway | 23 | 0.1218 | 8.2101 | .11549000 | 1.0000 |
| P00004 | Alzheimer disease-presenilin pathway | 112 | 0.5931 | 3.3720 | .11570000 | 1.0000 |
| P00041 | Metabotropic glutamate receptor group I pathway | 24 | 0.1271 | 7.8681 | .12022000 | 1.0000 |
The parameters for the analysis of enrichment are minimum number of IDs in the category: 5; maximum number of IDs in the category: 2000; FDR method: BH; level of significance: Top 10.
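The BH (Benjamini-Hochberg) adjustment named in the parameters can be sketched as follows. The ten values in the second-to-last column of the pathway table are read here as unadjusted p-values. The total number of tested pathways is not stated in the source, so `N_TESTED = 113` is a hypothetical value, and the untabulated pathways are padded with a p-value of 1.0 purely for illustration.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Second-to-last column of the pathway table, top to bottom.
top10 = [0.0061792, 0.009643, 0.009785, 0.024992, 0.031391,
         0.036049, 0.082179, 0.11549, 0.1157, 0.12022]

N_TESTED = 113  # hypothetical: not given in the source
padded = top10 + [1.0] * (N_TESTED - len(top10))  # padding is an assumption

adjusted = benjamini_hochberg(padded)[:len(top10)]
print([round(q, 4) for q in adjusted])
```

Under these assumptions the adjusted values round to 0.3686, 0.6789, and 1.0, in line with the last column of the table. Several pathways share the same adjusted value because BH takes a running minimum over the ranked p-values.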
List of the 108 genes common to colon, breast, and ovarian tumors identified by the NMF-based methodology described in this article.
| CDH1 | EFEMP1 | IGLC1 | MATR3 | RARRES1 | FANCL | JCHAIN | PNN | GAGE12F |
|---|---|---|---|---|---|---|---|---|
| GAGE2A | GAGE12H | GAGE12E | GAGE2D | GAGE8 | GAGE12J | GAGE12G | GAGE13 | GAGE2E |
| GAGE6 | GAGE5 | GAGE4 | GAGE2C | GAGE1 | BEX4 | NFYB | ITGB8 | IGHV4-31 |
| IGHM | IGHG1 | IGLV1-44 | ANXA1 | COL14A1 | BST2 | SLC16A14 | GPC6 | UBA6 |
| CKAP2 | MGARP | BCL2L13 | HLA-DQB1 | ZNF644 | DMD | HOXD8 | FOS | SLC40A1 |
| SLC39A8 | ETNK1 | PMS1 | PSMB9 | RCBTB1 | EEA1 | NBN | MAP7 | TGFBR3 |
| ENPP2 | DNAJC13 | RBM25 | RSRC2 | SPARCL1 | CFH | INS-IGF2 | IGF2 | HNRNPA2B1 |
| ANKRD28 | EIF4E2 | MBD4 | TRIM14 | CP | DZIP3 | IFT80 | TAP2 | ITPR1 |
| HNMT | TTC14 | LMBRD2 | MINA | TRIM59 | IGKC | AZIN1 | AOX1 | PRPF4B |
| SYNPO2 | FBXO9 | ZNF277 | APOBEC3B | TRIM2 | GTF2H2B | NR3C1 | CHN2 | ROCK1 |
| HLA-DQA1 | RIF1 | ABCE1 | APOBEC3G | KIF26B | SLC2A13 | PCMTD2 | ZNF655 | HERC2 |
| RARRES3 | FLI1 | ZDHHC21 | PNP | SIK1 | SHROOM3 | PGRMC1 | LINC01116 | KIAA1109 |
Figure 4. Variation of the UpSet plot[59] of the most frequent genes: ITPR1, FOS, ROCK1, and CDH1. The intersection between the ITPR1 and FOS genes is also plotted to emphasize the pathways they have in common.
Figure 5. ECM (extracellular matrix). The ENPP2 protein, common to the CAFs of the 4 tumors analyzed, catalyzes the hydrolysis of LPC to LPA, which activates its local LPA receptors and the corresponding G proteins. LPA signals through its receptors to induce proliferation, survival, and invasion in tumor cells and cancer stem cells. LPA signaling also induces the recruitment of CAFs and a wide range of cellular responses, and it reduces the cytotoxic immune response.