Lina Chen, Xiaoli Qu, Mushui Cao, Yanyan Zhou, Wan Li, Binhua Liang, Weiguo Li, Weiming He, Chenchen Feng, Xu Jia, Yuehan He.
Abstract
Identifying breast cancer patients is crucial to the clinical diagnosis and therapy of this disease. Conventional gene-based methods for breast cancer diagnosis ignore gene-gene interactions and may therefore lose power. In this study, we propose a novel method for selecting classification features, called "Selection of Significant Expression-Correlation Differential Motifs" (SSECDM). The method takes a network motif-based approach, combining a human signaling network with high-throughput gene expression data to distinguish breast cancer samples from normal samples. It achieves higher classification accuracy, and more stable classification accuracy, than either the mutual information (MI) method or the individual gene set method. It may become a useful tool for identifying and treating patients with breast cancer and other cancers, thus contributing to the clinical diagnosis and therapy of these diseases.
Year: 2013 PMID: 24284521 PMCID: PMC3842546 DOI: 10.1038/srep03368
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Number of normal and tumor samples in each breast cancer gene expression profile dataset used in this study
| Sample | GSE5364 | GSE9574 | GSE15852 | GSE20437 | GSE27562 |
|---|---|---|---|---|---|
| Normal | 13 | 15 | 43 | 18 | 31 |
| Tumor | 183 | 14 | 43 | 24 | 116 |
Figure 1. Normal distribution plotted against expression-correlation differential scores for breast cancer expression profile dataset GSE5364.
X-axis: expression-correlation differential score for network motifs. Y-axis: number of network motifs. Red arrow: score at which p-value equals 0.05.
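The red-arrow cutoff in Figure 1 can be illustrated with a short sketch. It assumes the motif scores are summarized by a fitted normal distribution and that significance is judged on the upper tail only (one-sided p < 0.05); the exact scoring formula and test used by SSECDM are not given in this record, and the helper names below are hypothetical.

```python
from statistics import NormalDist

def significance_cutoff(scores, alpha=0.05):
    """Fit a normal distribution to the motif scores (sample mean and
    sample standard deviation) and return the score whose upper-tail
    p-value equals alpha. One-sided test is an assumption."""
    fitted = NormalDist.from_samples(scores)
    return fitted.inv_cdf(1.0 - alpha)

def significant_motifs(motif_scores, alpha=0.05):
    """Return the (hypothetical) motif IDs whose score exceeds the cutoff."""
    cutoff = significance_cutoff(list(motif_scores.values()), alpha)
    return sorted(m for m, s in motif_scores.items() if s > cutoff)

# Toy scores, not data from the study.
toy = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0, "e": 4.0, "f": 10.0}
print(significant_motifs(toy))
```

With these toy scores only motif "f" lies beyond the fitted cutoff (about 9.19).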
Number of SCDMs shared between pairs of expression profile datasets. After removing redundancies, 56 significantly differential motifs were obtained.
| | GSE5364 | GSE9574 | GSE15852 | GSE20437 | Total |
|---|---|---|---|---|---|
| GSE5364 | — | 12 | 7 | 8 | 27 |
| GSE9574 | 12 | — | 11 | 6 | 17 |
| GSE15852 | 7 | 11 | — | 19 | 19 |
| GSE20437 | 8 | 6 | 19 | — | — |
| Total | 27 | 17 | 19 | — | 56 * |
Figure 2. High-stability significant differential motifs.
Solid lines represent activating or inhibitory interactions. Dotted lines represent physical interactions. Nodes in pink or green represent cancer-associated or non-cancer-associated genes, respectively.
Motif genes, and differential genes identified by traditional analysis of variance. Boldface indicates genes associated with breast cancer; boldface italics indicate known breast cancer genes.
| Gene type | Gene names | Literature-confirmed rate |
|---|---|---|
| 85 motif genes | ABL1, | 81.18% |
| 31 differential genes | | 71% |
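The literature-confirmed rate is read here as the percentage of genes in each set that have published support for an association with breast cancer. As a quick arithmetic check, the confirmed counts below are back-calculated from the reported percentages (they are not stated in the table): 69 of 85 motif genes and 22 of 31 differential genes are the only integer counts that reproduce 81.18% and 71%.

```python
def confirmed_rate(n_confirmed, n_total):
    """Percentage of genes with literature support, rounded to 2 decimals."""
    return round(100.0 * n_confirmed / n_total, 2)

# Counts inferred from the reported percentages, not taken from the source.
print(confirmed_rate(69, 85))  # motif genes
print(confirmed_rate(22, 31))  # differential genes (reported as 71%)
```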
Figure 3. Network topology characteristics of high-stability significant differential motifs.
(A) Average number of neighbors. (B) Characteristic path length. Red dots: average number of neighbors and characteristic path length of the original signaling network, the signaling network with motif M1 removed, and the signaling network with motif M2 removed (2GSE). Box plots summarize results from 100 random networks.
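The two topology measures in Figure 3 can be sketched as follows, assuming the signaling network is treated as undirected and that the definitions match common network-analysis conventions: average number of neighbors is the mean node degree, and characteristic path length is the mean shortest-path length over all connected node pairs. The adjacency format and helper names are hypothetical, and the random-network comparison is not reproduced.

```python
from collections import deque

def average_neighbors(adj):
    """Mean degree of an undirected graph given as {node: set(neighbors)}."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def characteristic_path_length(adj):
    """Mean shortest-path length over all connected node pairs, via BFS.
    Each unordered pair is visited twice, which leaves the mean unchanged."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

def remove_nodes(adj, nodes):
    """Copy of the graph with the given nodes (e.g. a motif's genes) deleted."""
    nodes = set(nodes)
    return {u: nbrs - nodes for u, nbrs in adj.items() if u not in nodes}

# Toy path graph a - b - c.
g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(average_neighbors(g), characteristic_path_length(g))
print(characteristic_path_length(remove_nodes(g, {"b"})))
```

Removing the hub "b" disconnects the toy graph, which is the kind of change the red dots compare against the box plots.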
Classification accuracy using four kernel functions
| Kernel function | GSE5364 | GSE9574 | GSE15852 | GSE20437 |
|---|---|---|---|---|
| Linear | 0.9745 | 0.8276 | 0.8488 | 0.8095 |
| Quadratic | 0.9694 | 0.6207 | 0.6279 | 0.6667 |
| Polynomial | 0.949 | 0.7586 | 0.8488 | 0.7381 |
| RBF | 0.9337 | 0.4828 | 0.0698 | 0.5714 |
Figure 4. Influence of different sample gradients on classification accuracy.
X-axis: the proportion of normal samples to tumor samples. Y-axis: classification accuracy using the given normal-to-tumor sample proportion. Box plot summarizes 100 randomized selections of normal and tumor samples.
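The resampling behind Figure 4 can be sketched as drawing normal and tumor samples at a fixed normal-to-tumor proportion, repeating the draw 100 times, and recording one accuracy per draw. This is a sketch under assumptions: how many tumor samples were drawn per repeat is not stated, and `evaluate` is a hypothetical callable standing in for training and testing the classifier on the selected samples.

```python
import random

def subsample(labels, normal_to_tumor, n_tumor, rng):
    """Draw sample indices with the given normal:tumor proportion.
    labels: sequence of "normal"/"tumor"; n_tumor: tumor samples to draw."""
    normal = [i for i, y in enumerate(labels) if y == "normal"]
    tumor = [i for i, y in enumerate(labels) if y == "tumor"]
    n_normal = round(normal_to_tumor * n_tumor)
    if n_normal > len(normal) or n_tumor > len(tumor):
        raise ValueError("not enough samples for this proportion")
    return rng.sample(normal, n_normal) + rng.sample(tumor, n_tumor)

def accuracy_distribution(labels, normal_to_tumor, n_tumor, evaluate,
                          repeats=100, seed=0):
    """Repeat the random selection and collect one accuracy per repeat,
    i.e. the values summarized by one box in the plot."""
    rng = random.Random(seed)
    return [evaluate(subsample(labels, normal_to_tumor, n_tumor, rng))
            for _ in range(repeats)]

# GSE5364 group sizes from the dataset table: 13 normal, 183 tumor.
labels = ["normal"] * 13 + ["tumor"] * 183
picked = subsample(labels, 0.5, 20, random.Random(1))
print(len(picked))
```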
Classification accuracy of different features. Accuracy is shown for five classification features: the first row gives the accuracy of the HSCDMs, and the other four rows give the accuracy of the individual gene set classification method. BC, breast cancer.
| Feature | GSE5364 | GSE9574 | GSE15852 | GSE20437 | GSE27562 |
|---|---|---|---|---|---|
| Motifs | 0.9745 | 0.8276 | 0.8488 | 0.8095 | 0.9592 |
| Motifs’ genes | 0.9592 | 0.6552 | 0.8372 | 0.7857 | 0.9456 |
| BC genes | 0.9490 | 0.5517 | 0.7093 | 0.6905 | 0.8639 |
| Marker genes | 0.9694 | 0.6552 | 0.7674 | 0.6667 | 0.9184 |
| BC & Marker genes | 0.9745 | 0.5862 | 0.8488 | 0.6905 | 0.9184 |
Classification accuracy of different classifiers
| Classifier | GSE5364 | GSE9574 | GSE15852 | GSE20437 |
|---|---|---|---|---|
| SVM | 0.9745 | 0.8276 | 0.8488 | 0.8095 |
| Bayes | 0.9388 | 0.7931 | 0.6977 | 0.7381 |
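The reported accuracies are consistent with accuracy being the fraction of all samples in a dataset that are classified correctly. The validation scheme (for example, leave-one-out) is not stated in this record, so this is an assumption. The sketch below uses the sample counts from the dataset table to recover the implied number of correctly classified samples for the SVM row.

```python
# Total samples (normal + tumor) per dataset, from the dataset table.
samples = {"GSE5364": 13 + 183, "GSE9574": 15 + 14,
           "GSE15852": 43 + 43, "GSE20437": 18 + 24}
# SVM accuracies from the classifier table.
svm_accuracy = {"GSE5364": 0.9745, "GSE9574": 0.8276,
                "GSE15852": 0.8488, "GSE20437": 0.8095}

def implied_correct(accuracy, n):
    """Integer number of correct calls closest to accuracy * n (an inference)."""
    return round(accuracy * n)

for gse, n in samples.items():
    k = implied_correct(svm_accuracy[gse], n)
    print(gse, f"{k}/{n}", round(k / n, 4))
```

For example, GSE9574 has 29 samples, and 24/29 rounds to the reported 0.8276.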
Figure 5. Flowchart of the SSECDM method.
μ indicates the average expression level of genes in HSCDM.
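The μ feature in Figure 5 can be sketched as building one value per motif per sample: the average expression of that motif's genes in the sample. This assumes μ is a plain arithmetic mean and that these per-motif values form the classifier's input vector; the data layout and function name are hypothetical.

```python
from statistics import mean

def motif_features(expression, motifs):
    """Per-sample features: for each motif, mu = mean expression of its genes.
    expression: {sample_id: {gene: level}}; motifs: {motif_id: [genes]}."""
    return {
        sample: {m: mean(levels[g] for g in genes) for m, genes in motifs.items()}
        for sample, levels in expression.items()
    }

# Toy values, not data from the study.
expr = {"s1": {"A": 1.0, "B": 3.0, "C": 5.0},
        "s2": {"A": 2.0, "B": 2.0, "C": 8.0}}
motifs = {"M1": ["A", "B"], "M2": ["B", "C"]}
print(motif_features(expr, motifs))
```

Each sample is thereby reduced to a vector with one entry per HSCDM, which keeps the feature count equal to the number of selected motifs rather than the number of genes.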