Mohamed Hamed, Christian Spaniol, Alexander Zapp, Volkhard Helms.
Abstract
BACKGROUND: Breast cancer is a genetically heterogeneous cancer that is among the most prevalent cancer types and has a high mortality rate. Treatment and prognosis of breast cancer would benefit greatly from correct classification and from identification of the genetic key drivers and major determinants of tumorigenesis. Given the availability of tumor genomic and epigenomic data from different sources and experiments, new integrative approaches are needed to increase the probability of identifying such genetic key drivers. We present an integrative network-based approach that associates regulatory network interactions with the development of breast carcinoma by integrating information from gene expression, DNA methylation, miRNA expression, and somatic mutation datasets.
Year: 2015 PMID: 26040466 PMCID: PMC4460623 DOI: 10.1186/1471-2164-16-S5-S2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The integrative network-based approach. A schematic diagram describing data processing and integration of different data sources to detect major determinants and key driver molecules controlling breast carcinomas.
The key driver elements identified from TF-gene interactions and miRNA-mRNA interactions.
| Module | Gene count | Top GO category | Top KEGG categories | Key driver count | Key drivers |
|---|---|---|---|---|---|
| black | 41 | Regulation of transcription | Pathways in cancer, Renal cell carcinoma | 5 | SORBS3, ZNF43, ZNF681, RBMX, POU2F1 |
| blue | 247 | Nucleobase, nucleoside, nucleotide and nucleic acid metabolic process | Cell cycle, Prostate cancer, Melanoma | 9 | |
| brown | 195 | Anatomical structure morphogenesis | Leukocyte transendothelial migration | 5 | TMOD3, CREB1, POU5F1, SP3, TERT |
| green | 110 | Cellular macromolecule metabolic process | Endometrial cancer, Insulin signaling pathway | 15 | |
| grey | 148 | Anatomical structure development | Sulfur metabolism | 18 | |
| magenta | 26 | Regulation of metabolic process | p53 signaling pathway, Alzheimer's disease | 3 | |
| pink | 30 | Transcription initiation from RNA polymerase II promoter | Basal transcription factors | 4 | |
| red | 93 | Regulation of cellular process | Endometrial cancer, Neurotrophin signaling pathway | 14 | |
| turquoise | 295 | Regulation of cellular metabolic process | p53 signaling pathway, Pancreatic cancer, Apoptosis | 2 | UBL5, RNF111 |
| yellow | 132 | Immune system process | Chemokine signaling pathway, Natural killer cell mediated cytotoxicity | 19 | |
| Total | 1317 | | | | |
| Genes | 869 | Regulation of macromolecule metabolic process | Pathways in cancer, Pancreatic cancer, Prostate cancer | 17 | |
| miRNAs | 120 | miRNA tumor suppressors, immune response, Onco-miRNA, cell death, human embryonic stem cells regulation | Breast cancer (65), Neoplasms (58), Melanoma (56), Ovarian Neoplasms (51), Pancreatic Neoplasms (38), Prostatic Neoplasms (38) | 68 | mir-126, mir-609, mir-488, mir-191, mir-200c, mir-200a, mir-30a, mir-30d, mir-335, mir-190b, mir-223, mir-106b, mir-519e, mir-210, mir-379, mir-203, mir-205, mir-708, mir-29c, mir-29a, mir-182, mir-183, mir-127, mir-187, mir-425, let-7g, let-7d, mir-152, mir-155, mir-21, mir-22, mir-758, mir-921, mir-922, mir-375, mir-377, mir-181a-2, mir-657, mir-302d, mir-100, mir-10b, mir-10a, mir-625, mir-629, mir-92a-2, mir-26b, mir-25, mir-145, mir-143, mir-141, mir-221, mir-193b, mir-193a, mir-374a, mir-134, mir-146a, mir-31, let-7a-2, mir-27a, mir-27b, mir-133a-1, let-7i, mir-93, mir-23a, mir-148a, mir-196a-2, mir-487b, mir-149 |
For the 10 gene modules identified in TF-mRNA interactions, we list the number of genes involved, the most significant GO and KEGG terms, and the key driver genes identified in each module. Similarly, for the miRNA-mRNA interactions, we list the key driver molecules among both genes and miRNAs. Driver genes whose protein products are known drug targets are shown in bold.
Figure 2. Gene network modules of TF-gene interactions. (a) Topological overlap matrix (TOM) heatmap corresponding to the ten co-expression modules. Each row and column of the heatmap represents a single gene. Bright colors denote weak interactions, whereas darker colors denote strong interactions. The dendrograms on the upper and left sides show the hierarchical clustering tree of genes. (b), (c), and (d) are the final GRN networks highlighting the identified key driver genes for the green, magenta, and red modules, respectively. Square nodes denote the identified driver genes that are targeted by drugs. Networks were visualized using the igraph package in R.
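The heatmap in panel (a) is built on topological overlap, which scores two genes as similar when they are directly connected and also share neighbors. The following is a minimal sketch of that idea, assuming the commonly used unsigned definition TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij); the record does not state which variant the authors used, and the function name `topological_overlap` and the toy adjacency values are hypothetical.

```python
def topological_overlap(adj):
    """Return the topological overlap matrix for a symmetric adjacency matrix.

    Assumption (not stated in the source): entries of `adj` lie in [0, 1],
    the diagonal is ignored, and the standard unsigned formula is used.
    """
    n = len(adj)
    # Connectivity k_i: summed connection strength of gene i to all other genes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # a gene fully overlaps with itself
    for i in range(n):
        for j in range(i + 1, n):
            # Shared-neighbor term: strength of two-step paths i - u - j.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u not in (i, j))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


# Toy example with invented values: genes 1 and 2 are not directly connected,
# but both are connected to gene 0, so their overlap is still non-zero.
toy_adj = [
    [0.0, 0.8, 0.6],
    [0.8, 0.0, 0.0],
    [0.6, 0.0, 0.0],
]
toy_tom = topological_overlap(toy_adj)
```

In a heatmap such as panel (a), values like these would be arranged by the clustering order of the genes, so that blocks of high overlap line up with the modules.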
Figure 3. Regulatory interactions of the 17 key driver genes identified from miRNA-mRNA interactions. Large nodes represent key driver genes and small nodes represent miRNAs, which regulate or are regulated by these driver genes. Square nodes are the identified driver genes that are targeted by drugs. The network was visualized using the igraph package in R.
Figure 4. Proximity analysis of the somatic mutations with the dysregulated miRNAs and differentially methylated genes. Ideogram plots showing the genomic distribution for (a) the 21 cases of deregulated miRNAs adjacent to somatic mutations. The outer green circle shows the entire dataset of miRNAs, whereas the next highlighted red lines refer to the adjacent deregulated miRNAs (20 miRNAs, one of which is matched to 2 SNVs). The inner blue circle represents the entire set of somatic SNVs and the next highlighted red lines depict the SNVs matched to the 21 cases. (b) The 347 cases of somatic mutations occurring in the promoter regions of differentially methylated genes. The outer green circle shows the entire set of differentially methylated genes, whereas the next highlighted red lines refer to the identified cases adjacent to the somatic mutations. The inner blue circle represents the entire set of somatic SNVs and the next highlighted red lines depict the SNVs matched to the identified cases. The plot also shows the fractions of the three considered mutation types (C->T, C->G and C->A), giving the occurrence frequency of each.
List of the identified driver mutations ordered by CHASM score.
| Chrom | Occurring gene | SNV position | CHASM score | P-value | Ref | Alt | Amino acids | Codons | SIFT score | PolyPhen score |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PTPRC | 198711494 | 0.158 | 6.00E-04 | G | A | E/K | Gag/Aag | Deleterious (0) | probably_damaging (0.999) |
| 8 | TNKS | 9413850 | 0.162 | 6.00E-04 | C | T | S/F | tCc/tTc | Deleterious (0.01) | Unknown (0) |
| X | GRIA3 | 122319694 | 0.298 | 0.0119 | C | A | F/L | ttC/ttA | Deleterious (0) | probably_damaging (0.996) |
| 5 | PCDHB14 | 140604126 | 0.308 | 0.0134 | C | T | S/L | tCg/tTg | Deleterious (0.02) | Benign (0.368) |
| X | HUWE1 | 53644041 | 0.31 | 0.0136 | C | A | R/L | cGa/cTa | Deleterious (0) | probably_damaging (1) |
| 17 | NFE2L1 | 46136186 | 0.326 | 0.0175 | C | T | S/F | tCc/tTc | Deleterious (0.01) | probably_damaging (0.994) |
| 9 | NAIF1 | 130829249 | 0.336 | 0.0204 | C | G | K/N | aaG/aaC | Deleterious (0) | probably_damaging (0.995) |
| 2 | KLHL23 | 170592167 | 0.354 | 0.0251 | C | G | R/G | Cga/Gga | Deleterious (0) | probably_damaging (0.999) |
| 12 | KCNA1 | 5021107 | 0.384 | 0.0406 | C | T | T/M | aCg/aTg | Deleterious (0) | probably_damaging (0.997) |
The CHASM score is defined as the fraction of trees in the Random Forest that voted for classifying the mutation as a passenger. Lower scores therefore indicate higher confidence that a mutation is a driver. P-values are calculated based on the null score distribution. The table also reports the changes in the affected codons and amino acids. The SIFT and PolyPhen scores predict whether an amino acid substitution affects the function and structure of the human protein. The SIFT prediction is based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences (lower scores indicate higher impact), whereas the PolyPhen prediction uses physical and evolutionary comparative considerations (higher scores indicate higher impact and a more severe effect on protein function and structure).
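The score definition above can be illustrated with a small sketch. It only shows how a vote fraction is formed and how mutations would be ranked by it; it is not the CHASM implementation, and the function names, the number of trees, and the vote counts are invented for illustration.

```python
def passenger_vote_fraction(votes):
    """Fraction of per-tree votes that label a mutation as a passenger.

    `votes` holds one label per tree, either "passenger" or "driver".
    """
    votes = list(votes)
    if not votes:
        raise ValueError("at least one tree vote is required")
    passenger_votes = sum(1 for v in votes if v == "passenger")
    return passenger_votes / len(votes)


def rank_by_score(scores):
    """Order mutation labels from most to least driver-like (ascending score)."""
    return sorted(scores, key=scores.get)


# Hypothetical forest of 1000 trees; the vote counts are made up.
example_scores = {
    "mutation_A": passenger_vote_fraction(["passenger"] * 158 + ["driver"] * 842),
    "mutation_B": passenger_vote_fraction(["passenger"] * 384 + ["driver"] * 616),
}
ranking = rank_by_score(example_scores)
```

Sorting in ascending order mirrors the table, where the mutations with the fewest passenger votes appear first.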