Jingxue Xin, Xianwen Ren, Luonan Chen, Yong Wang.
Abstract
Identifying effective biomarkers to battle complex diseases is an important but challenging task in biomedical research today. Molecular data on complex diseases are increasingly abundant owing to the rapid advance of high-throughput technologies. However, a great gap remains in linking the massive molecular data to phenotypic changes, in particular at a network level; that is, a novel method for identifying network biomarkers is urgently needed to accurately classify and diagnose diseases from molecular data and to shed light on the mechanisms of disease pathogenesis. Rather than seeking differential genes at an individual-molecule level, here we propose a novel method for identifying network biomarkers based on protein-protein interaction affinity (PPIA), which identifies differential interactions at a network level. Specifically, we first define PPIAs by estimating the concentrations of protein complexes from gene expression data according to the law of mass action. We then select a small, non-redundant group of protein-protein interactions and single proteins according to the PPIAs that maximizes the ability to discriminate cases from controls. This method is mathematically formulated as a linear programming problem, which can be solved efficiently and guarantees a globally optimal solution. Extensive results on experimental breast cancer data demonstrate the effectiveness and efficiency of the proposed method for identifying network biomarkers, which not only accurately distinguish the phenotypes but also provide significant biological insights at a network or pathway level. In addition, our method provides a new way to integrate static protein-protein interaction information with dynamic gene expression data.
Year: 2015 PMID: 26044366 PMCID: PMC4460625 DOI: 10.1186/1755-8794-8-S2-S11
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Figure 1. The workflow of the PPIA + ellipsoidFN method. (A1) Gene expression data for cancer (rows are genes; columns are samples of multiple types) and the human protein-protein interaction network. (A2) Gene expression data are combined with the human PPI network by product approximation, where each protein-protein interaction affinity Ei is computed based on the law of mass action. The rows are PPI edges or gene nodes and the columns are samples. (A3) With the PPIA + ellipsoidFN method, different cancer types, or normal and disease cases, are represented by different ellipsoids, and the distances among these ellipsoids are maximized. (A4) PPIA network biomarkers are identified by the PPIA + ellipsoidFN method to classify different phenotypes.
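Step (A2) can be illustrated with a minimal sketch. It assumes that the affinity of an interaction in a given sample is taken as the product of the two partners' expression levels, which is what the law of mass action gives up to an equilibrium constant that is dropped here. The function name `ppia_matrix`, the gene labels, and the numbers are hypothetical, and the exact formula used in the paper may differ.

```python
def ppia_matrix(expression, edges):
    """Approximate per-sample interaction affinities by a product.

    expression: dict mapping gene -> list of per-sample expression values.
    edges: list of (gene_a, gene_b) pairs from a PPI network.
    Returns a dict mapping each usable edge -> list of per-sample affinities.
    """
    affinities = {}
    for a, b in edges:
        # Skip interactions where one partner has no expression measurement.
        if a not in expression or b not in expression:
            continue
        # Assumed product approximation: affinity ~ x_a * x_b in each sample.
        affinities[(a, b)] = [xa * xb for xa, xb in zip(expression[a], expression[b])]
    return affinities


# Toy data (hypothetical): three genes measured in three samples.
expression = {
    "P1": [2.0, 4.0, 1.0],
    "P2": [3.0, 0.5, 2.0],
    "P3": [1.0, 1.0, 5.0],
}
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P9")]  # P9 is unmeasured

ppia = ppia_matrix(expression, edges)
for edge, values in ppia.items():
    print(edge, values)
```

The result has one row per retained edge and one column per sample, matching the edge-by-sample layout described in (A2); single-protein rows would simply reuse the expression values.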
Figure 2. Biomarkers identified by PPIA + ellipsoidFN and DEG + ellipsoidFN for the breast cancer data GSE10797. The network biomarkers identified by PPIA + ellipsoidFN contain 3 proteins and 6 protein-protein interactions, covering 14 genes, while DEG + ellipsoidFN identified 22 genes. Four genes are shared by the two methods.
Performance comparison among various methods.

(A)

| Case | Dataset | Method | Accuracy | Redundancy score | Number of genes |
|---|---|---|---|---|---|
| Two-class | GSE10797 | DEG+ellipsoidFN | 93.94% (62/66) | 0.2090 | 22 |
| Two-class | GSE7904 | DEG+ellipsoidFN | 98.39% (61/62) | 0.3698 | 18 |
| Multiple-class | GSE18229 | DEG+ellipsoidFN | 80.00% (248/310) | 0.1532 | 170 |
| Multiple-class | GSE10797 | DEG+ellipsoidFN | 78.79% (52/66) | 0.1663 | 161 |

(B)

| Case | Dataset | Method | | |
|---|---|---|---|---|
| Two-class | GSE10797 | t-test | 0.2715 | 13 |
| Two-class | GSE7904 | t-test | 0.3077 | 27 |
| Multiple-class | GSE18229 | F-test | 0.3031 | 15 |
| Multiple-class | GSE10797 | F-test | 0.3280 | 11 |

(A) Comparison of the PPIA + ellipsoidFN method with DEG + ellipsoidFN in prediction accuracy, redundancy score, and the number of genes identified. (B) Comparison of PPIA + ellipsoidFN with the t-test for the two-class case and with the F-test for the multiple-class case, based on the top 50 PPIs and proteins.
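The t-test baseline in (B) ranks features (PPIs and single proteins) by a two-sample statistic and keeps the top ones. The sketch below shows one way such a ranking could be done, assuming Welch's unequal-variance t-statistic and ranking by its absolute value; the source does not say which t-test variant was used. The helper names, feature labels, and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Welch's two-sample t-statistic (assumes non-zero variance in at least one group)."""
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


def top_features(features, is_case, k=50):
    """Rank features by |t| between cases and controls and return the top k names.

    features: dict mapping feature name -> list of per-sample values.
    is_case: list of booleans, True where the sample is a case.
    """
    scores = {}
    for name, values in features.items():
        case = [v for v, c in zip(values, is_case) if c]
        control = [v for v, c in zip(values, is_case) if not c]
        scores[name] = abs(welch_t(case, control))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): two features, three cases followed by three controls.
features = {
    "P1-P2": [6.0, 5.0, 7.0, 1.0, 2.0, 1.5],
    "P3": [1.0, 2.0, 1.5, 1.2, 1.8, 1.6],
}
is_case = [True, True, True, False, False, False]

ranked = top_features(features, is_case, k=50)
print(ranked)
```

Here `k=50` mirrors the "top 50" cutoff in the caption; for the multiple-class case an F-statistic would take the place of `welch_t`.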
Figure 3. Comparison of pathways enriched by biomarkers identified by the PPIA + ellipsoidFN and DEG + ellipsoidFN methods. For the two biomarker sets identified by the two methods, we used KEGG pathway enrichment analysis to find enriched pathways and their p-values. We then sorted these p-values in ascending order and compared the two methods on dataset GSE10797 for both the two-class and four-class cases. The x-axis denotes pathways and the y-axis denotes the p-value of each corresponding pathway. For example, the blue line in the top panel shows that five pathways in total are enriched by the node biomarkers with a p-value lower than 0.1. The small number of significantly enriched pathways may result from the small number of genes in the biomarker set and their non-redundancy. The figure shows that the network biomarkers identified by the PPIA + ellipsoidFN method tend to be enriched in more pathways with lower p-values.
KEGG pathway and DAVID functional analysis results for the PPIA + ellipsoidFN and DEG + ellipsoidFN methods on dataset GSE10797 for the two-class case.
| | PPIA + ellipsoidFN | DEG + ellipsoidFN |
|---|---|---|
| KEGG pathways | Renal cell carcinoma | Focal adhesion |
| DAVID functional analysis | GOTERM_MF_FAT transcription factor binding | GOTERM_BP_FAT peptide cross-linking |
The top 5 terms are listed and ranked by their p-values.