Xinguo Lu, Xing Li, Ping Liu, Xin Qian, Qiumai Miao, Shaoliang Peng.
Abstract
With advances in next-generation sequencing (NGS) technologies, large volumes of high-throughput genomics data of multiple types are now available. A major challenge in studying cancer progression is to identify the driver genes among the variant genes by analyzing and integrating multiple types of genomics data. Breast cancer is a heterogeneous disease, and identifying subtype-specific driver genes is critical for guiding its diagnosis, prognosis assessment, and treatment. We developed an integrated framework based on gene expression profiles and copy number variation (CNV) data to identify breast cancer subtype-specific driver genes. In this framework, we employed a statistical machine-learning method to select gene subsets and a module-network analysis method to identify potential candidate driver genes. The final subtype-specific driver genes were obtained by pairwise comparison between subtypes. To validate the specificity of the driver genes, their gene expression data were used to classify patient samples with 10-fold cross-validation, and enrichment analysis was also conducted on the identified driver genes. The experimental results show that the proposed integrative method can identify potential driver genes, and that a classifier built on these genes performed better than classifiers built on genes identified by other methods.
Keywords: breast cancer; cancer subtypes; copy number variation; gene expression; integrative analysis; module network
Year: 2018 PMID: 29364829 PMCID: PMC6099653 DOI: 10.3390/molecules23020183
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. A comparison of identified cancer drivers between the module-network method and two state-of-the-art integrative methods on breast cancer subtype datasets.
Figure 2. Heatmap of expression values of the 28 most significantly differentially expressed genes in the LumA subtype. The heatmap was generated by clustering the expression values. The genes form clear clusters across the four tumor subtypes.
Figure 3. (a) The top 12 driver genes in LumA by mutation frequency; (b) the mutation proportion of these top 12 genes in the samples of each breast cancer subtype.
The accuracy of 10-fold cross validation between subtypes.
| Subtypes | Module-Network | Information Gain | Chi-Squared | Lemon-Tree |
|---|---|---|---|---|
| LumA-others | 83.13% | 79.60% | 80.39% | 85.88% |
| LumB-others | 84.70% | 76.47% | 76.86% | 80.39% |
| Basal-others | 98.82% | 97.64% | 98.82% | 98.03% |
| Her2-others | 92.94% | 92.94% | 94.90% | 92.15% |
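The accuracies above come from 10-fold cross-validation on one-vs-rest tasks (for example, LumA versus all other subtypes). The following is a minimal sketch of that evaluation scheme, not the authors' pipeline: the source does not name the classifier, so a toy nearest-centroid rule stands in for it, and the data, the function names `stratified_folds`, `nearest_centroid_predict`, and `cross_validated_accuracy`, and the stratified splitting are all assumptions made for illustration.

```python
import random
from statistics import mean

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with a similar class mix in each fold."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold receives a share of each class.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds

def nearest_centroid_predict(train_x, train_y, x):
    """Toy stand-in classifier: pick the class whose mean vector is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cross_validated_accuracy(x, y, k=10, seed=0):
    """Hold out each fold once, train on the rest, and pool the correct predictions."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_x = [x[i] for i in range(len(x)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        for i in test_idx:
            correct += nearest_centroid_predict(train_x, train_y, x[i]) == y[i]
    return correct / len(y)

# Synthetic two-gene "expression" data: label 1 = target subtype, 0 = others.
data_rng = random.Random(1)
x = [[data_rng.gauss(2.0, 0.5), data_rng.gauss(0.0, 0.5)] for _ in range(30)]
x += [[data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)] for _ in range(70)]
y = [1] * 30 + [0] * 70

accuracy = cross_validated_accuracy(x, y)
print(f"10-fold cross-validated accuracy: {accuracy:.2%}")
```

Pooling correct predictions over all held-out folds gives a single accuracy per one-vs-rest task, which is the form in which the table reports its values.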
The recall of 10-fold cross validation between subtypes.
| Subtypes | Module-Network | Information Gain | Chi-Squared | Lemon-Tree |
|---|---|---|---|---|
| LumA-others | 0.801 | 0.844 | 0.853 | 0.827 |
| LumB-others | 0.847 | 0.402 | 0.291 | 0.5 |
| Basal-others | 0.988 | 0.952 | 0.976 | 0.928 |
| Her2-others | 0.929 | 0.56 | 0.6 | 0.44 |
The F-measure of 10-fold cross validation between subtypes.
| Subtypes | Module-Network | Information Gain | Chi-Squared | Lemon-Tree |
|---|---|---|---|---|
| LumA-others | 0.812 | 0.790 | 0.798 | 0.842 |
| LumB-others | 0.719 | 0.491 | 0.415 | 0.590 |
| Basal-others | 0.964 | 0.930 | 0.964 | 0.939 |
| Her2-others | 0.590 | 0.608 | 0.697 | 0.523 |
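Accuracy, recall, and F-measure can diverge on imbalanced one-vs-rest tasks; the Her2-others row for Module-Network, for instance, pairs a recall of 0.929 with an F-measure of 0.590. The sketch below shows how the three metrics relate through a confusion matrix. It assumes the F-measure is the balanced F1 score of the positive class, which the source does not state, and the counts used are invented for illustration rather than taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (assumed definition).
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts for a rare positive class (27 positives, 228 negatives).
metrics = binary_metrics(tp=25, fp=30, fn=2, tn=198)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare positive class, a modest number of false positives barely moves accuracy but lowers precision, so the F-measure can sit well below both accuracy and recall.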
Top-18 pathways obtained for breast cancer subtypes after enrichment performed by MITHrIL.
| LumA | | LumB | |
|---|---|---|---|
| Chemokine signaling pathway | 0 | MAPK signaling pathway | 0 |
| HIF-1 signaling pathway | 0 | PI3K-Akt signaling pathway | 0 |
| VEGF signaling pathway | 0 | Apoptosis | 0 |
| Osteoclast differentiation | 0 | Neurotrophin signaling pathway | 0 |
| Hippo signaling pathway | 0 | Type I diabetes mellitus | 0 |
| ErbB signaling pathway | 0 | Natural killer cell mediated cytotoxicity | 0 |
| Calcium signaling pathway | 0 | Chagas disease (American trypanosomiasis) | 0 |
| HIF-1 signaling pathway | 0 | HTLV-I infection | 0 |
| Focal adhesion | 0 | | |
| Adherens junction | 0 | | |
Figure 4. The process of module-network learning. It is an iterative procedure that determines both the partition of genes to modules and the regulation program for each module.
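The alternating structure described in this caption can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: here a module's "regulation program" is reduced to the single candidate regulator whose profile correlates most strongly with the module's mean expression, and genes are reassigned to the module whose regulator best matches them. The function names, the toy profiles, and the use of absolute Pearson correlation are all assumptions of this sketch.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

def learn_modules(genes, regulators, initial, max_iter=20):
    """Alternate between choosing a regulator per module and reassigning genes.

    genes, regulators: dicts mapping name -> expression profile across samples.
    initial: dict mapping gene name -> starting module id.
    """
    assignment = dict(initial)
    programs = {}
    for _ in range(max_iter):
        # Step 1: pick each module's regulator from its mean expression profile.
        programs = {}
        for module in sorted(set(assignment.values())):
            members = [genes[g] for g, m in assignment.items() if m == module]
            module_mean = [mean(col) for col in zip(*members)]
            programs[module] = max(
                sorted(regulators),
                key=lambda r: abs(pearson(regulators[r], module_mean)),
            )
        # Step 2: move each gene to the module whose regulator explains it best.
        new_assignment = {
            g: max(
                sorted(programs),
                key=lambda m: abs(pearson(regulators[programs[m]], profile)),
            )
            for g, profile in genes.items()
        }
        if new_assignment == assignment:  # partition is stable: stop iterating
            break
        assignment = new_assignment
    return assignment, programs

# Toy profiles over six samples; g3 starts in the wrong module on purpose.
regulators = {"R1": [1, 2, 3, 4, 5, 6], "R2": [1, -1, 1, -1, 1, -1]}
genes = {
    "g1": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
    "g2": [2, 4, 6, 8, 10, 12],
    "g3": [6, 5, 4, 3, 2, 1],
    "g4": [2, -2, 2, -2, 2, -2],
    "g5": [0.9, -1.1, 1.0, -0.8, 1.2, -1.0],
}
initial = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 1}

assignment, programs = learn_modules(genes, regulators, initial)
print(assignment)
print(programs)
```

Using the absolute correlation lets a gene join a module whether the regulator appears to activate or repress it, which is why `g3` (anti-correlated with `R1`) moves into the same module as `g1` and `g2`.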
Figure 5. Schematic diagram of the integrative method based on module-network. The first part indicates the pre-processing steps based on the differential expression analysis. The middle part is the construction of the initial modules. The bottom part represents the process of module network learning.