Narumol Doungpan, Worrawat Engchuan, Jonathan H Chan, Asawin Meechai.
Abstract
BACKGROUND: Gene expression has been used to identify disease gene biomarkers, but challenges remain. Single-gene or gene-set biomarkers do not provide sufficient understanding of complex disease mechanisms or of the relationships among the genes involved. Network-based methods have thus been considered for inferring the interactions within a group of genes to further study the disease mechanism. Recently, the Gene-Network-based Feature Set (GNFS), which is capable of handling case-control and multiclass expression for gene biomarker identification, has been proposed; it partly takes network topology into account. However, its performance relies on a greedy search for building subnetworks and thus requires further improvement. In this work, we establish a new approach named Gene Sub-Network-based Feature Selection (GSNFS) by implementing the GNFS framework with two proposed searching and scoring algorithms, namely gene-set-based (GS) search and parent-node-based (PN) search, to identify subnetworks. An additional dataset is used to validate the results.
Year: 2016 PMID: 28117655 PMCID: PMC5260788 DOI: 10.1186/s12920-016-0231-4
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 GSNFS framework for gene subnetwork identification. For each gene-set, the expression data were integrated with network data to identify gene subnetwork biomarkers for phenotype outcome classification. The improvement over the existing algorithm focused on the subnetwork expansion procedure used to aggregate significant genes. Two searching methods (GS and PN search) were implemented as searching algorithms in the GSNFS method. GS search treats the seed nodes in the ith iteration as the set forming the current subnetwork (i, ii) and searches all neighbours of that subnetwork, while PN search only looks for neighbours of a particular gene member of the current subnetwork (i or ii), bypassing genes that are already part of the current subnetwork (in this diagram, gene (i) and gene (ii)).
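The difference between the two expansion strategies described in the caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the adjacency-dictionary graph, the function names `gs_candidates` and `pn_candidates`, and the toy gene labels are all assumptions, and the scoring step that decides which candidate is actually added to the subnetwork is omitted because the caption does not describe it.

```python
from typing import Dict, Set

# Hypothetical representation: each gene maps to the set of its network neighbours.
Graph = Dict[str, Set[str]]


def gs_candidates(graph: Graph, subnetwork: Set[str]) -> Set[str]:
    """GS-style expansion: neighbours of ANY member of the current subnetwork."""
    neighbours: Set[str] = set()
    for gene in subnetwork:
        neighbours |= graph.get(gene, set())
    # Genes already in the subnetwork are not candidates.
    return neighbours - subnetwork


def pn_candidates(graph: Graph, subnetwork: Set[str], parent: str) -> Set[str]:
    """PN-style expansion: neighbours of ONE parent node only,
    bypassing genes that already belong to the subnetwork."""
    return graph.get(parent, set()) - subnetwork


# Toy network with placeholder gene labels (not taken from the paper's data).
toy_graph: Graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "E"},
    "D": {"B"},
    "E": {"C"},
}
current = {"A", "B"}

print(sorted(gs_candidates(toy_graph, current)))       # ['C', 'D']
print(sorted(pn_candidates(toy_graph, current, "B")))  # ['D']
```

With the same current subnetwork, the GS-style call considers candidates reachable from either member, whereas the PN-style call restricts the candidates to those adjacent to the chosen parent node.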
The homogeneity of datasets
| Data | Purity | Purity |
|---|---|---|
| Lung1*2 | 0.081 | 0.6 |
| Lung1*3 | 0.35 | 0.12 |
| Lung1*4 | 0.27 | 0.34 |
| Lung2*3 | 0.27 | 0.14 |
| Lung2*4 | 0.34 | 0.64 |
| Lung3*4 | 0.56 | 0.029 |
The homogeneity of each pair of independent datasets was measured and presented in terms of Purity and Purity. The low Purity and high Purity indicate high compatibility of the two datasets for further cross-dataset validation.
The number of gene subnetworks identified by the three approaches using GGI from PathwayAPI and the PPI network
| Data | GGI | | | PPI | | |
|---|---|---|---|---|---|---|
| | GNFS-greedy [ | GSNFS-GS | GSNFS-PN | GNFS-greedy [ | GSNFS-GS | GSNFS-PN |
| Lung1 | 65 | 84 | 94 | 40 | 46 | 62 |
| Lung2 | 49 | 65 | 92 | 29 | 34 | 42 |
| Lung3 | 7 | 4 | 53 | 6 | 11 | 25 |
| Lung4 | 16 | 23 | 46 | 14 | 20 | 35 |
Evaluating the resulting subnetworks based on gene level/gene-set level agreement
| Data | GGI | | | PPI | | |
|---|---|---|---|---|---|---|
| | GNFS-greedy [ | GSNFS-GS | GSNFS-PN | GNFS-greedy [ | GSNFS-GS | GSNFS-PN |
| Gene level | | | | | | |
| Lung1*2 | 0.099 | | 0.118 | | 0.140 | 0.145 |
| Lung1*3 | 0.023 | 0.004 | | 0.047 | | 0.085 |
| Lung1*4 | 0.047 | 0.084 | | 0.1 | 0.108 | |
| Lung2*3 | 0.043 | 0.019 | | 0.081 | 0.104 | |
| Lung2*4 | 0.091 | 0.105 | | 0.087 | | 0.076 |
| Lung3*4 | 0.037 | 0 | | | 0.076 | 0.104 |
| Gene-set level | | | | | | |
| Lung1*2 | 0.2 | 0.319 | | 0.23 | 0.23 | |
| Lung1*3 | 0.058 | 0.0115 | | 0 | 0.056 | |
| Lung1*4 | 0.125 | 0.103 | | 0.08 | 0.179 | |
| Lung2*3 | 0.037 | 0.015 | | 0.059 | 0.125 | |
| Lung2*4 | 0.083 | 0.219 | | 0 | | 0.167 |
| Lung3*4 | 0.095 | 0 | | 0.111 | 0.148 | |
The gene/gene-set level agreement is a ratio between the number of common genes/gene-sets found in two datasets and the total number of genes/gene-sets identified from the two datasets. The gene/gene-set level agreements were calculated for different searching strategies and presented above
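The agreement ratio defined above can be written as a short function. This is a sketch under one stated assumption: "the total number of genes/gene-sets identified from the two datasets" is read here as the size of the union of the two result sets, which makes the ratio a Jaccard index. The function name `agreement` and the placeholder labels are illustrative only.

```python
from typing import Iterable


def agreement(items_a: Iterable[str], items_b: Iterable[str]) -> float:
    """Ratio of common items to all items identified in either dataset.

    Assumption: the denominator is the union of the two sets (Jaccard index).
    """
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        # Nothing identified in either dataset: define the agreement as 0.
        return 0.0
    return len(a & b) / len(union)


# Placeholder gene labels, not results from the paper.
genes_dataset_1 = {"g1", "g2", "g3"}
genes_dataset_2 = {"g1", "g4"}

print(agreement(genes_dataset_1, genes_dataset_2))  # 0.25 (1 common gene out of 4)
```

The same function applies at the gene-set level by passing gene-set identifiers instead of gene names.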
Classification performance of cross-dataset validation using GGI from PathwayAPI and PPI network
| Data | GGI | | | PPI | | |
|---|---|---|---|---|---|---|
| | GNFS-greedy [ | GSNFS-GS | GSNFS-PN | GNFS-greedy [ | GSNFS-GS | GSNFS-PN |
| Lung1*2 | | 0.704 | 0.546 | 0.634 | | 0.717 |
| Lung1*3 | 0.54 | | 0.546 | | 0.481 | 0.535 |
| Lung1*4 | 0.75 | | 0.635 | 0.519 | 0.635 | |
| Lung2*1 | 0.834 | | 0.578 | 0.724 | | 0.689 |
| Lung2*3 | 0.479 | | 0.532 | 0.53 | | 0.537 |
| Lung2*4 | | 0.827 | 0.692 | | 0.712 | 0.635 |
| Lung3*1 | 0.275 | 0.539 | | | 0.607 | 0.647 |
| Lung3*2 | 0.378 | 0.624 | | 0.35 | 0.526 | |
| Lung3*4 | 0.481 | | 0.635 | 0.462 | | 0.615 |
| Lung4*1 | | 0.848 | 0.69 | 0.563 | 0.693 | |
| Lung4*2 | 0.805 | | 0.77 | 0.583 | 0.72 | |
| Lung4*3 | 0.603 | 0.549 | | 0.595 | 0.608 | |
The classification performance of the subnetworks identified by the different searching strategies used in subnetwork identification.
Common genes found in the identified subnetworks with at least three gene members across four datasets by using three different approaches
| GGI | | | PPI | | |
|---|---|---|---|---|---|
| GNFS-greedy [ | GSNFS-GS | GSNFS-PN | GNFS-greedy [ | GSNFS-GS | GSNFS-PN |
| MAPK1 | None | MAPK1 | MAPK1 | | |
| MAP2K3 | TNFRSF1A | MAPK1 | MAPK1 | ||
| MAP2K4 | SHC1 | TNFRSF1A | TNFRSF1A | ||
| PRKCA | SOS2 | MAP3K7 | |||
| STAT3 | |||||
All members of gene subnetworks with at least three gene members were inspected for genes common to the subnetworks identified across the four datasets, using different searching strategies and different network data in subnetwork identification. Genes in italics are the important genes in lung cancer.