Bruno César Feltes1, Joice de Faria Poloni1, Itamar José Guimarães Nunes2, Sara Socorro Faria3, Marcio Dorn1,2,4.
Abstract
Studies describing the expression patterns and biomarkers for the tumoral process increase in number every year. The availability of new datasets, although essential, also creates a confusing landscape where common or critical mechanisms are obscured amidst the divergent and heterogeneous nature of such results. In this work, we manually curated the Gene Expression Omnibus using rigorous filtering criteria to select the most homogeneous and highest quality microarray and RNA-seq datasets from multiple types of cancer. By applying systems biology approaches, combined with machine learning analysis, we investigated possible frequently deregulated molecular mechanisms underlying the tumoral process. Our multi-approach analysis of 99 curated datasets, composed of 5,406 samples, revealed 47 differentially expressed genes in all analyzed cancer types, which were all in agreement with the validation using TCGA data. Results suggest that the tumoral process is more related to the overexpression of core deregulated machinery than the underexpression of a given gene set. Additionally, we identified gene expression similarities between different cancer types not described before and performed an overall survival analysis using 20 cancer types. Finally, we were able to suggest a core regulatory mechanism that could be frequently deregulated.
Keywords: bioinformatics; cancer; machine learning; omics; overall survival; regulatory networks; systems biology
Year: 2020 PMID: 33329726 PMCID: PMC7719697 DOI: 10.3389/fgene.2020.586602
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Methodological steps used in this work. The work is divided into: (1) Data gathering and curation; (2) Microarray analysis; (3) ML analysis of microarray data; (4) RNA-seq analysis; (5) Systems Biology approach; and (6) Overall Survival analysis.
Figure 2. Jaccard index values and GO analysis. (A) Jaccard index matrix showing the similarity values for the overexpressed genes derived from the microarray analysis. (B) Jaccard index matrix showing the similarity values for the underexpressed genes derived from the microarray analysis. White labels were added just for clarity. RNA-seq matrices are not shown because they displayed no significant values based on our cut-off of at least 15% similarity; they can be found in Supplementary Material 2. The values can also be seen in Supplementary Tables 5, 8, 11, 14. PNC, pancreatic cancer; BON, bone [cancer]; BRN, brain [cancer]; RNL, renal [cancer]; GTC, gastric cancer; PRT, prostate [cancer]; LVR, liver [cancer]; CRC, colorectal cancer; LNG, lung [cancer]; BRC, breast cancer; HNC, head/neck cancer; OVR, ovarian [cancer]; BLD, bladder [cancer]. (C) Gene Ontology groups for the 47 DEG frequently expressed between all analyzed cancer types. Details of which GO terms composed each group can be found in Supplementary Tables 16, 17.
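The pairwise comparison behind panels (A) and (B) can be illustrated with a minimal sketch. It assumes each cancer type is represented by a set of gene symbols and reuses the 15% cut-off mentioned in the caption; the function names and the toy gene sets are hypothetical and are not taken from the study's data or code.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union.

    Defined here as 0.0 when both sets are empty (an assumption of this sketch).
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(gene_sets, cutoff=0.15):
    """Return {(type_1, type_2): index} for every pair at or above the cut-off."""
    result = {}
    for x, y in combinations(sorted(gene_sets), 2):
        score = jaccard(gene_sets[x], gene_sets[y])
        if score >= cutoff:
            result[(x, y)] = score
    return result


# Toy, made-up gene lists keyed by the caption's abbreviations.
toy = {
    "LNG": {"TOP2A", "CDK1", "BUB1", "RRM2"},
    "BRC": {"TOP2A", "CDK1", "EZH2"},
    "PNC": {"SOX4"},
}
pairs = similar_pairs(toy)
print(pairs)
```

In this toy input only the BRC/LNG pair shares genes (2 shared out of 5 distinct), so it is the only pair that passes the cut-off.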
Table listing the 47 most frequently expressed DEG found between all analyzed cancer types, and their topological properties in the PPI-networks, where applicable.
| Gene | Expression | Description | Topological property |
| ANLN | Over | Actin-binding protein related to cytokinesis and migration. | HBS |
| ASPM | Over | Related to the mitotic spindle regulation. | HBS |
| ATAD2 | Over | ATPase related to multiple cellular functions, including activation of | HS |
| ATF3 | Under | Member of the cAMP responsive element-binding factors | HBS |
| BIRC5 | Over | Involved in the inhibition of apoptosis. | HBS |
| BUB1 | Over | Plays a role in mitotic spindle-assembly, including the localization of CENPF | HBS |
| BUB1B | Over | Plays a role in mitotic spindle assembly and the localization of CENPE. | HBS |
| CDC25A | Over | Phosphatase involved in cell cycle progression. | S |
| CDCA5 | Over | Protein associated with mitosis. | HS |
| CDK1 | Over | Cyclin-Dependent Kinase deeply involved in cell cycle. | HBS |
| CENPE | Over | Member of the centromere-kinetochore complex. | HS |
| CENPF | Over | See CENPE. | HS |
| CEP55 | Over | Centrosomal protein, associated with cytokinesis. | HBS |
| CTHRC1 | Over | Putative roles in the negative regulation of collagen deposition. | NA |
| CYBRD1 | Under | Member of the cytochrome b family, related to iron absorption. | B |
| DEPDC1 | Over | Transcription corepressor, associated with apoptosis suppression and proliferation. | HS |
| ECT2 | Over | Catalyzes the GDP-GTP exchange. Also involved in cytokinesis. | HS |
| EZH2 | Over | Polycomb-group family, involved in gene silencing and DNA methylation. | B |
| FERMT2 | Under | Involved in extracellular matrix adhesion and regulates cytoskeleton assembly. | HB |
| GINS2 | Over | Associated with DNA replication. | HS |
| HELLS | Over | Helicase involved in chromatin organization. | HS |
| HHIP | Under | Hedgehog-interacting protein, which is related to several developmental processes. | NA |
| HMMR | Over | Hyaluronic acid receptor, associated with metastasis formation. | HS |
| KIF11 | Over | Kinesin family member, deeply related to spindle organization and mitotic progression. | HBS |
| KIF20A | Over | See KIF11. | HBS |
| KIF2C | Over | See KIF11. | HBS |
| KLF4 | Under | Transcription factor associated with embryonic stem cell maintenance. | HBS |
| LIFR | Under | Cytokine receptor, which is heavily associated with the Leukemia Inhibitory Factor. | B |
| MAD2L1 | Over | Member of the mitotic spindle assembling complex. | HBS |
| MCM6 | Over | Required for DNA replication initiation through several processes. | HBS |
| METTL7A | Under | Putative methyltransferase. | NA |
| MT1E | Under | Metallothionein, which alters the intracellular concentration of heavy metals. | NA |
| MT2A | Under | See MT1E. | NA |
| NDRG2 | Under | A hydrolase, which is related to Wnt-signaling. | NTR |
| NEK2 | Over | Kinase that regulates several centrosome-associated events during mitosis. | HBS |
| PBK | Over | Kinase involved in MAPKK activation. | HBS |
| PLPP3 | Under | Phospholipid phosphatase involved in the synthesis of glycerolipids. | NA |
| PRC1 | Over | Involved in the mitotic spindle organization. | HBS |
| PRR11 | Over | Related to cell cycle progression. | H |
| RCAN1 | Under | Inhibitor of calcineurin A. | B |
| RRM2 | Over | Subunit of a ribonucleotide reductase. | HBS |
| SELENOP | Under | Involved in selenium transportation. | NA |
| SOX4 | Over | Transcription factor related to the regulation of embryonic development. | NA |
| TOP2A | Over | DNA topoisomerase. | HBS |
| TPX2 | Over | Associated with microtubule spindle assembly. | HBS |
| UBE2C | Over | Member of the E2 ubiquitin-conjugating enzyme family. | HBS |
| ZBTB16 | Under | Zinc-finger protein, associated with cell cycle progression. | HBS |
HBS, hub-bottleneck-switch; HB, hub-bottleneck; HS, hub-switch; H, hub; B, bottleneck; S, switch; NA, not applicable (the gene was not present in the network); NTR, no topological relevance.
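To make the topological labels more concrete, the sketch below classifies nodes of a small undirected network as hubs and/or bottlenecks. It rests on assumptions that this record does not confirm: a hub is taken to be a node with above-average degree, and a bottleneck a node with above-average betweenness centrality (computed with Brandes' algorithm). The switch property is not covered because its definition is not given here. All function names and the toy edges are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness (Brandes) for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


def classify(adj):
    """Label nodes 'H', 'B', 'HB', or 'NTR' using above-average thresholds (assumed)."""
    degree = {v: len(n) for v, n in adj.items()}
    btw = betweenness(adj)
    mean_deg = sum(degree.values()) / len(adj)
    mean_btw = sum(btw.values()) / len(adj)
    labels = {}
    for v in adj:
        tag = ("H" if degree[v] > mean_deg else "") + ("B" if btw[v] > mean_btw else "")
        labels[v] = tag or "NTR"
    return labels


# Toy network: a triangle A-B-C with a tail C-D-E (placeholder node names).
adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
labels = classify(adj)
print(labels)
```

In the toy network, C has the highest degree and lies on most shortest paths, while D has average degree but still bridges E to the rest of the graph, which is why a node can be a bottleneck without being a hub.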
Figure 3. Gene expression panorama of the 47 frequently expressed genes, according to the DDBv3 database. The red-colored gradient indicates levels of higher expression, whereas the green-colored gradient indicates levels of lower expression.
Figure 4. Networks built using the 47 DEG frequently expressed between all analyzed cancer types. (A) Over-DEG-Net. The red nodes depict the 31 (of 33) overexpressed genes frequently expressed between cancer types, whereas the green nodes show their first neighbors. The pink nodes are shared between the two networks. Over-DEG-Net is composed of 231 nodes and 9,144 edges, displaying high connectivity. (B) Under-DEG-Net. The blue nodes depict the 8 (of 14) underexpressed genes frequently expressed between cancer types, whereas the yellow nodes show their first neighbors. The network is composed of 115 nodes and 484 edges, displaying low connectivity. (C) The octagonal nodes refer to the Hub-Bottleneck-Switch (HBS) proteins from the overexpressed network. This subnetwork is composed of 55 nodes and 1,428 edges. (D) The diamond nodes indicate the Hub-Bottleneck-Switch (HBS) proteins from the underexpressed network. This subnetwork is composed of 18 nodes and 74 edges.
Figure 5. TAR-Net. (A) The initial network, composed of the DEG (red and blue nodes, as in the Over- and Under-DEG-Nets), the nodes in common between the Over- and Under-DEG-Nets (pink nodes), N3O's top features (orange nodes), and the transcription factors associated with them (gray nodes). The red edges represent repression, whereas the green edges depict activation. The gray edges are unknown connections. Unconnected nodes were excluded prior to the analysis. (B) REG-Net, composed only of the nodes predicted to form a regulatory network. This network is composed of 44 nodes and 97 edges. (C) Final subnetwork, containing only the transcription factors and DEG that have at least two other connections. The subnetwork thus contains 32 nodes and 84 edges. (D) Graph depicting the number of connections in TAR-Net and REG-Net.
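The filtering step described for panel (C) can be sketched as a simple degree filter over a signed edge list. This is an illustration under stated assumptions, not the authors' procedure: the filter is applied in a single pass (the caption does not say whether it was repeated until stable), edges are stored as (source, target, sign) tuples, and the node names TF1, TF2, G1, G2, G3 are placeholders.

```python
def prune_by_degree(edges, min_degree=2):
    """Keep only edges whose two endpoints each have at least min_degree connections.

    Degree is counted once over the input edge list (single pass, assumed).
    """
    degree = {}
    for src, dst, _sign in edges:
        degree[src] = degree.get(src, 0) + 1
        degree[dst] = degree.get(dst, 0) + 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(s, t, sign) for s, t, sign in edges if s in keep and t in keep]


# Toy regulatory edges; the sign mirrors the caption's edge colors
# (activation, repression, unknown).
toy_edges = [
    ("TF1", "G1", "activation"),
    ("TF1", "G2", "repression"),
    ("TF2", "G1", "activation"),
    ("TF2", "G2", "unknown"),
    ("TF2", "G3", "activation"),
]
pruned = prune_by_degree(toy_edges)
print(pruned)
```

Here G3 has a single connection, so its edge is dropped and the remaining four edges form the pruned subnetwork.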
Figure 6. Summary of the overall analysis. The graph displays the 47 genes previously obtained. The gray squares represent genes that showed no statistical significance, whereas the colored ones presented significant p-values. The color scheme illustrates the HR score from lowest to highest. Underexpressed genes were colored blue, whereas overexpressed genes were painted red.