Xu Shi¹, Sharmi Banerjee¹, Li Chen², Leena Hilakivi-Clarke³, Robert Clarke³, Jianhua Xuan¹
Abstract
One of the important tasks in cancer research is to identify biomarkers and build classification models for clinical outcome prediction. In this paper, we develop CyNetSVM, a software package implemented in Java and integrated with Cytoscape as an app, to identify network biomarkers using network-constrained support vector machines (NetSVM). The CyNetSVM app is designed to improve the usability of NetSVM with three enhancements: (1) a user-friendly graphical user interface (GUI), (2) a computationally efficient core program, and (3) convenient network visualization. The CyNetSVM app has been used to analyze breast cancer data and identify network genes associated with breast cancer recurrence. These network genes are enriched in signaling pathways associated with breast cancer progression, showing the effectiveness of CyNetSVM for cancer biomarker identification. The CyNetSVM package is available at the Cytoscape App Store and http://sourceforge.net/projects/netsvmjava; a sample data set is also provided at sourceforge.net.
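The abstract names network-constrained SVMs without stating the objective. As a rough illustration only, the sketch below assumes one common way to build such a model: a linear SVM hinge loss plus a graph-Laplacian penalty λ Σ over edges (j, k) of (w_j − w_k)², which pulls the weights of interacting genes toward each other, plus a small ridge term. This assumed formulation is trained by plain subgradient descent; it is not CyNetSVM's Java solver, and the names `train_network_svm`, `lam`, `mu`, and `lr` are hypothetical.

```python
"""Sketch: linear SVM with a graph-Laplacian penalty (assumed form, not CyNetSVM's solver)."""
from typing import List, Sequence, Tuple


def laplacian_penalty_grad(w: Sequence[float], edges: Sequence[Tuple[int, int]]) -> List[float]:
    """Gradient of sum over edges of (w_j - w_k)^2, i.e. of w^T L w with L = D - A."""
    grad = [0.0] * len(w)
    for j, k in edges:
        diff = w[j] - w[k]
        grad[j] += 2.0 * diff
        grad[k] -= 2.0 * diff
    return grad


def train_network_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    edges: Sequence[Tuple[int, int]],
    lam: float = 0.1,
    mu: float = 0.01,
    lr: float = 0.05,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Subgradient descent on
    (1/n) * sum_i max(0, 1 - y_i (w.x_i + b)) + lam * w^T L w + (mu/2) * ||w||^2.
    X holds one row of gene values per sample; y holds labels in {-1, +1};
    edges are pairs of gene (feature) indices that interact."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge term is active: its subgradient is -y_i * x_i
                for j in range(p):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        lap = laplacian_penalty_grad(w, edges)  # computed from the current weights
        for j in range(p):
            gw[j] += lam * lap[j] + mu * w[j]
            w[j] -= lr * gw[j]
        b -= lr * gb
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Sign of the linear decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Genes with large absolute weights would be the biomarker candidates under this sketch; the Laplacian term is what makes connected genes tend to be selected together.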
Year: 2017; PMID: 28122019; PMCID: PMC5266326; DOI: 10.1371/journal.pone.0170482
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Fig 1. An overview of the CyNetSVM app.
Fig 2. Screenshot of the CyNetSVM app.
Input data of CyNetSVM.
| Data | Format | Description |
|---|---|---|
| Protein-protein interaction data | TSV | Protein interaction networks |
| Gene expression data | GCT | Microarray gene expression data |
| Group 1 index | TSV | Index of group 1 samples (e.g., early recurrence; value = 1) |
| Group 2 index | TSV | Index of group 2 samples (e.g., late recurrence; value = 2) |
| Gene id | TSV | Gene list of interest |
| Gene product location | TSV | File containing each gene's Entrez ID, gene symbol, the cellular location of its product, and the pathways it participates in |
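The expression input uses GCT, the Broad Institute's tab-delimited format; version 1.2 has a `#1.2` line, a line with the row and column counts, a header row (`Name`, `Description`, then sample names), and one row per gene. The sketch below parses GCT text and a protein-protein interaction TSV. The two-column layout assumed for the interaction file (one interacting gene pair per line) is a guess, not CyNetSVM's documented format, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def parse_gct(text: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse GCT 1.2 text into (sample names, {gene name: expression values})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines[0].startswith("#1.2"):
        raise ValueError("expected a GCT 1.2 header line")
    n_rows, n_cols = (int(v) for v in lines[1].split("\t")[:2])
    samples = lines[2].split("\t")[2:]  # skip the Name and Description columns
    if len(samples) != n_cols:
        raise ValueError("sample count does not match the dimension line")
    expr: Dict[str, List[float]] = {}
    for ln in lines[3:]:
        fields = ln.split("\t")
        expr[fields[0]] = [float(v) for v in fields[2:]]
    if len(expr) != n_rows:
        raise ValueError("gene count does not match the dimension line")
    return samples, expr


def parse_ppi_tsv(text: str) -> Set[Tuple[str, str]]:
    """Parse an assumed two-column TSV of interacting genes into undirected edges."""
    edges: Set[Tuple[str, str]] = set()
    for ln in text.splitlines():
        fields = ln.strip().split("\t")
        if len(fields) < 2 or fields[0] == fields[1]:
            continue  # skip blank lines and self-interactions
        a, b = sorted(fields[:2])  # store each undirected edge once
        edges.add((a, b))
    return edges
```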
Means and standard deviations of phenotype prediction accuracy and network identification AUC on simulated data at different signal-to-noise ratios (SNR).
| SNR (dB) | Prediction accuracy (CyNetSVM) | Prediction accuracy (NetSVM) | Prediction accuracy (SVM) | Network AUC (CyNetSVM) | Network AUC (NetSVM) | Network AUC (SVM) |
|---|---|---|---|---|---|---|
| 10 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.86 ± 0.04 | 0.85 ± 0.03 | 0.76 ± 0.04 |
| 8 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.83 ± 0.05 | 0.84 ± 0.06 | 0.76 ± 0.04 |
| 6 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.83 ± 0.04 | 0.84 ± 0.04 | 0.77 ± 0.03 |
| 4 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.83 ± 0.06 | 0.84 ± 0.06 | 0.77 ± 0.03 |
| 2 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.83 ± 0.05 | 0.83 ± 0.06 | 0.76 ± 0.03 |
| 0 | 0.99 ± 0.01 | 0.99 ± 0.01 | 0.99 ± 0.01 | 0.83 ± 0.04 | 0.83 ± 0.04 | 0.77 ± 0.04 |
| -2 | 0.98 ± 0.02 | 0.98 ± 0.02 | 0.99 ± 0.01 | 0.81 ± 0.03 | 0.80 ± 0.04 | 0.74 ± 0.05 |
| -4 | 0.91 ± 0.02 | 0.91 ± 0.03 | 0.91 ± 0.02 | 0.79 ± 0.04 | 0.79 ± 0.07 | 0.72 ± 0.02 |
| -6 | 0.85 ± 0.03 | 0.85 ± 0.03 | 0.83 ± 0.04 | 0.79 ± 0.06 | 0.79 ± 0.06 | 0.71 ± 0.03 |
| -8 | 0.77 ± 0.06 | 0.78 ± 0.06 | 0.78 ± 0.06 | 0.79 ± 0.06 | 0.79 ± 0.07 | 0.72 ± 0.04 |
| -10 | 0.71 ± 0.06 | 0.70 ± 0.05 | 0.71 ± 0.05 | 0.70 ± 0.05 | 0.71 ± 0.05 | 0.68 ± 0.03 |
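The simulation is parameterized by SNR in decibels, but the source does not give its exact definition. A minimal sketch, assuming the standard power ratio SNR_dB = 10 log10(P_signal / P_noise) with zero-mean Gaussian noise, shows how a target SNR translates into a noise level; the helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def noise_std_for_snr(signal: Sequence[float], snr_db: float) -> float:
    """Noise standard deviation that yields the requested SNR, assuming
    SNR_dB = 10 * log10(signal power / noise power)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return math.sqrt(noise_power)


def add_noise(signal: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Add zero-mean Gaussian noise at the requested SNR (seeded for repeatability)."""
    rng = random.Random(seed)
    sigma = noise_std_for_snr(signal, snr_db)
    return [s + rng.gauss(0.0, sigma) for s in signal]
```

Under this definition, 0 dB means equal signal and noise power, and -10 dB means the noise power is ten times the signal power, which is consistent with accuracy and AUC dropping at the bottom of the table.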
Fig 3. Network identified from the Loi et al. data.
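The source does not describe how the displayed networks are extracted. One plausible approach, assumed here, is to keep the genes with the largest absolute classifier weights, keep the PPI edges between them (the induced subgraph), and inspect its connected components. The top-k rule and all helper names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def top_genes(weights: Dict[str, float], k: int) -> Set[str]:
    """Hypothetical selection rule: keep the k genes with the largest |weight|."""
    ranked = sorted(weights, key=lambda g: abs(weights[g]), reverse=True)
    return set(ranked[:k])


def induced_subnetwork(genes: Set[str], edges: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """PPI edges whose two endpoints are both selected genes."""
    return {(a, b) for a, b in edges if a in genes and b in genes}


def connected_components(genes: Set[str], edges: Set[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components by iterative depth-first search.
    The edges must come from induced_subnetwork(genes, ...)."""
    adj: Dict[str, Set[str]] = {g: set() for g in genes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in sorted(genes):
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:
            g = stack.pop()
            if g in comp:
                continue
            comp.add(g)
            stack.extend(adj[g] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```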
Signaling pathways enriched among the genes identified from the Loi et al. data, with associated p-values.
| Pathway | Genes | P-value |
|---|---|---|
| FOXO signaling pathway | AKT1,CREBBP,SMAD2,SMAD4,FOXO3,IGF1R,MAPK10,MAPK9,PLK1,USP7 | 1.1E-6 |
| MAPK signaling pathway | AKT1,RASA1,CDC42,FLNA,FLNB,HSPA1B,HSPA8,HSPB1,MAPK10,MAPK9 | 1.2E-4 |
| Ras signaling pathway | AKT1,RASA1,CDC42,GRIN1,GRIN2B,IGF1R,MAPK10,MAPK9 | 1.1E-3 |
| TGF-Beta signaling pathway | CREBBP,SMAD2,SMAD4,SP1,THBS1 | 1.3E-3 |
| Estrogen signaling pathway | AKT1,GNAI2,SP1,HSPA1B,HSPA8 | 2.3E-3 |
| Wnt signaling pathway | CREBBP,SMAD4,DVL2,MAPK10,MAPK9 | 6.9E-3 |
| ErbB signaling pathway | AKT1,ERBB2,MAPK10,MAPK9 | 1.2E-2 |
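The enrichment p-values are reported without the underlying counts or the test used. One common way to obtain such a p-value is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): with N background genes, K of them in a pathway, and n identified genes, the p-value for an overlap of k genes is P(X ≥ k). The sketch assumes that test; the source does not state which test, background gene set, or multiple-testing correction produced these tables.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n draws):
    the chance that n randomly chosen genes include at least k pathway genes."""
    total = comb(N, n)
    upper = min(K, n)  # the overlap cannot exceed the pathway size or the draw size
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

For a pathway row above, k would be the number of listed genes (for example, 10 for the FOXO signaling pathway), while N, K, and n depend on the background and gene list, which the source does not report.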
Fig 4. ROC curve of the classification of patients in the Loi et al. data.
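Figs 4 and 6 summarize classification performance with ROC curves. A minimal sketch of how such a curve and its area under the curve (AUC) can be computed from classifier scores follows; it assumes that higher scores indicate the positive class (label 1, an illustrative choice such as early recurrence) and treats tied scores as a single threshold. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold; label 1 is positive."""
    pos = sum(1 for lab in labels if lab == 1)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == thr:  # tied scores share one threshold
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The trapezoidal AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.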
Fig 5. Network identified from the METABRIC discovery data.
Signaling pathways enriched among the genes identified from the METABRIC discovery dataset, with associated p-values.
| Pathway | Genes | P-value |
|---|---|---|
| Estrogen signaling pathway | AKT1,GNAI1,GNAI2,JUN,SRC,ESR1,GRB2,MAPK1 | 5.6E-6 |
| Ras signaling pathway | AKT1,RELA,GRB2,KDR,MAPK1,NF1,PDGFB,PRKCB,RGL1 | 2.5E-4 |
| ErbB signaling pathway | AKT1,JUN,SRC,GRB2,MAPK1,PRKCB | 3.2E-4 |
| MAPK signaling pathway | AKT1,JUN,RELA,GRB2,MAPK1,MAPKAPK2,NF1,PDGFB,PRKCB | 5.2E-4 |
| TGF-Beta signaling pathway | CREBBP,TGIF1,BMP6,GDF6,MAPK1 | 1.3E-3 |
| Wnt signaling pathway | CREBBP,JUN,CTNNB1,DVL2,PSEN1,PRKCB | 1.4E-3 |
| FOXO signaling pathway | AKT1,CREBBP,GRB2,MAPK1,STAT3 | 9.0E-3 |
Fig 6. ROC curve of the classification of patients in the METABRIC validation data.
Computation time of the CyNetSVM app for different network sizes and numbers of cross-validation folds.
| No. of nodes | No. of edges | Average node degree | Cross-validation folds | Time (s) |
|---|---|---|---|---|
| 100 | 143 | 2.86 | 5 | 3.1 |
| 100 | 143 | 2.86 | 10 | 3.7 |
| 300 | 553 | 3.69 | 5 | 4.5 |
| 300 | 553 | 3.69 | 10 | 5.4 |
| 500 | 2162 | 8.65 | 5 | 8.3 |
| 500 | 2162 | 8.65 | 10 | 9.2 |
| 1000 | 2181 | 4.36 | 5 | 60.1 |
| 1000 | 2181 | 4.36 | 10 | 64.3 |
| 1000 | 3539 | 7.20 | 5 | 60.9 |
| 1000 | 3539 | 7.20 | 10 | 65.2 |
| 1000 | 4919 | 9.84 | 5 | 59.7 |
| 1000 | 4919 | 9.84 | 10 | 64.8 |
| 2545 | 15094 | 11.86 | 5 | 1856.3 |
| 2545 | 15094 | 11.86 | 10 | 1874.9 |
| 5000 | 19207 | 7.68 | 5 | 2980.3 |
| 5000 | 19207 | 7.68 | 10 | 2998.8 |
| 9673 | 40563 | 8.39 | 5 | 21653.5 |
| 9673 | 40563 | 8.39 | 10 | 21706.3 |
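The average node degree column follows the identity for undirected graphs, average degree = 2E/N, because each edge adds to the degree of two nodes (for example, 2 × 143 / 100 = 2.86). By the same identity, the 1000-node, 3539-edge row would have an average degree of about 7.08 rather than the listed 7.20. The sketch below computes that value and, as a hypothetical illustration of the cross-validation folds varied in the table, assigns shuffled sample indices to k folds; the source does not describe how CyNetSVM actually forms its folds.

```python
import random
from typing import List


def average_degree(n_nodes: int, n_edges: int) -> float:
    """Average degree of an undirected graph: each edge counts toward two nodes."""
    return 2.0 * n_edges / n_nodes


def kfold_indices(n_samples: int, k: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical fold assignment: shuffle the sample indices, then deal them
    round-robin into k folds whose sizes differ by at most one."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]
```

In k-fold cross-validation, each fold serves once as the test set while the model is trained on the other k - 1 folds, so going from 5 to 10 folds doubles the number of training runs; in the table this adds only a few seconds per network size.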