
SurvNet: a web server for identifying network-based biomarkers that most correlate with patient survival data.

Jun Li, Paul Roebuck, Stefan Grünewald, Han Liang.

Abstract

An important task in biomedical research is identifying biomarkers that correlate with patient clinical data; these biomarkers then provide a critical foundation for the diagnosis and treatment of disease. Conventionally, such an analysis is based on individual genes, but the results are often noisy and difficult to interpret. When a biological network is used as the search platform, network-based biomarkers are expected to be more robust and to provide deeper insights into the molecular mechanisms of disease. We have developed SurvNet, a novel bioinformatics web server for identifying network-based biomarkers that most correlate with patient survival data. The web server takes three input files: one biological network file, representing a gene regulatory or protein interaction network; one molecular profiling file, containing any type of gene- or protein-centred high-throughput biological data (e.g. microarray expression data or DNA methylation data); and one patient survival data file (e.g. patients' progression-free survival data). Given user-defined parameters, SurvNet automatically searches for subnetworks that most correlate with the observed patient survival data. As output, SurvNet generates a list of network biomarkers and displays them through a user-friendly interface. SurvNet can be accessed at http://bioinformatics.mdanderson.org/main/SurvNet.


Substances: Biomarkers

Year: 2012 PMID: 22570412 PMCID: PMC3394266 DOI: 10.1093/nar/gks386

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

With advances in genome characterization technology, high-throughput genomic and proteomic data from patients have accumulated rapidly, allowing the systematic identification of biomarkers (1–4). Biomarkers that correlate with patient survival data are of particular interest because they provide a critical foundation for the diagnosis and treatment of disease (5,6). Conventionally, such an analysis is based on individual genes. However, the results obtained in this way are often noisy and offer little insight into the underlying mechanisms of disease. Biological networks (e.g. gene regulatory networks or protein interaction networks) represent a reasonable way to summarize the functional behaviour of the components of a biological system (7–9). Therefore, when a biological network is used as the search platform, network-based biomarkers (i.e. groups of functionally related genes or proteins) are expected to be more robust and to provide valuable insights into the molecular mechanisms of disease. Previous studies on this topic (10,11) have focused on other clinical data (such as metastasis status); the use of patient survival data has not been explored. In this study, we introduce SurvNet, a novel bioinformatics web server for identifying network-based biomarkers that most correlate with patient survival data. The web server takes three input files: a biological network file, a molecular profiling file and a patient survival data file. To identify network-based biomarkers, SurvNet uses established algorithms (10–12) to search for and evaluate candidate biomarkers. As output, SurvNet generates a user-friendly display of the network-based biomarkers. We expect SurvNet to be a valuable bioinformatics tool for the biomedical community.

MATERIALS AND METHODS

The computational approach used by SurvNet to identify network-based biomarkers consists of three component processes: (i) a scoring function (combining the subnetwork property, molecular profile and patient survival data), (ii) a searching algorithm (for finding the candidate biomarkers) and (iii) an evaluation (validating the statistical significance of the biomarkers).

Scoring function

SurvNet first evaluates each gene (node) i by calculating a P-value $p_i$ from a univariable Cox proportional hazards regression model (13,14), which quantifies how significantly the molecular profiling data of the gene correlate with the patient survival data. Each gene i is then assigned a node score $s_i$, the z-score obtained by transforming $p_i$:

$$s_i = \Phi^{-1}(1 - p_i)$$

where $\Phi^{-1}$ is the inverse standard normal cumulative distribution function (12). For random data, $p_i$ follows a uniform distribution on [0, 1], so $s_i$ follows a standard normal distribution, with smaller P-values corresponding to larger z-scores. The score F of a subnetwork G with n genes is the aggregate z-score (12):

$$F = \frac{1}{\sqrt{n}} \sum_{i \in G} s_i$$

F follows a standard normal distribution if the $s_i$ are drawn independently from a standard normal distribution. Because this null distribution does not depend on the subnetwork size n, subnetworks of different sizes are comparable under this scoring function.
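The two formulas of the scoring function can be sketched in a few lines of Python. This is a minimal illustration, not SurvNet's own code: it assumes the per-gene Cox P-values have already been computed (for example with a survival-analysis package), and the function names `node_score` and `subnetwork_score` and the clipping constant `eps` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def node_score(p_value: float, eps: float = 1e-12) -> float:
    """Transform a univariable Cox P-value into a z-score s = Phi^-1(1 - p).

    Assumption of this sketch: p is clipped to [eps, 1 - eps] so that
    p = 0 or p = 1 does not produce an infinite score.
    """
    p = min(max(p_value, eps), 1.0 - eps)
    return _STD_NORMAL.inv_cdf(1.0 - p)


def subnetwork_score(node_scores: list[float]) -> float:
    """Aggregate z-score F = (sum of s_i) / sqrt(n) for a subnetwork of n genes."""
    n = len(node_scores)
    if n == 0:
        raise ValueError("a subnetwork needs at least one gene")
    return sum(node_scores) / sqrt(n)
```

For instance, four genes with s = 1 each give F = 4/√4 = 2, the same score as a single gene with s = 2. Dividing by √n rather than by n is what keeps F standard normal under the null hypothesis, because a sum of n independent standard normal scores has variance n.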

Searching algorithm

Because finding the connected subnetwork with the maximal score is NP-hard (12), SurvNet uses a greedy search algorithm, as previously described (10–12). The search starts from a seed gene i and expands iteratively. The algorithm terminates and outputs the candidate subnetwork when no candidate gene j adjacent to the current subnetwork G satisfies both of the following conditions: (i) the shortest path between j and the seed gene i contains at most δ edges and (ii) the score of G with gene j added is higher than (1 + ρ)F, that is, $F(G \cup \{j\}) > (1 + \rho)\,F(G)$, where δ and ρ are two predetermined parameters. Specifically, δ restricts the search space, and ρ is a fixed minimum improvement rate: a gene is added to the subnetwork only if it increases the network score F by a relative amount greater than ρ.
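The greedy expansion can be sketched as follows, under stated assumptions. The network is a hypothetical adjacency dictionary, `scores` maps each gene to its node score $s_i$, and at each step the sketch tries the neighbouring gene that yields the largest new score; if even that gene fails condition (ii), no candidate can satisfy it and the search stops. Picking the single best candidate per step is one reasonable reading of the greedy scheme above; SurvNet's exact candidate order and tie-breaking are not stated. The default `delta=2` mirrors the web server's default search distance (presumably δ), and `rho=0.05` is only an illustrative value.

```python
from collections import deque
from math import sqrt


def bfs_distances(graph, seed, max_dist):
    """Shortest-path distance (in edges) from seed to every node within max_dist."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the allowed distance
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def aggregate(scores, genes):
    """Aggregate z-score F = sum(s_i) / sqrt(n) over a set of genes."""
    return sum(scores[g] for g in genes) / sqrt(len(genes))


def greedy_search(graph, scores, seed, delta=2, rho=0.05):
    """Grow a subnetwork from `seed`; the seed is assumed to have a node score."""
    reachable = bfs_distances(graph, seed, delta)  # condition (i): distance <= delta
    members = {seed}
    current = aggregate(scores, members)
    while True:
        # Candidates: scored neighbours of the current subnetwork within delta of the seed.
        candidates = {
            nb
            for g in members
            for nb in graph.get(g, ())
            if nb not in members and nb in reachable and nb in scores
        }
        best_gene, best_score = None, None
        for j in sorted(candidates):  # sorted only for deterministic tie-breaking
            s = aggregate(scores, members | {j})
            if best_score is None or s > best_score:
                best_gene, best_score = j, s
        # Condition (ii): the new score must exceed (1 + rho) * F.
        if best_gene is None or best_score <= (1 + rho) * current:
            return members, current
        members.add(best_gene)
        current = best_score
```

Calling `greedy_search` once for every gene that has a score, and collecting the returned subnetworks, corresponds to using each valid node as a seed.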

Evaluation

SurvNet evaluates the statistical significance of the subnetworks identified in the search step, as previously described (12). It first uses random sampling to test whether the score of a subnetwork is significantly higher than that of a random gene set in the network. To do so, SurvNet randomly samples gene sets of n genes 10 000 times and applies the same scoring function to each random gene set. The population mean μ and standard deviation σ are estimated from the sampled gene sets, and F is then calibrated against this background distribution as follows:

$$F_{\mathrm{cal}} = \frac{F - \mu}{\sigma}$$

This calibrated score is the final network score reported for a subnetwork in the output. Moreover, because the multivariable Cox proportional hazards regression model is widely used to quantify the correlation between a group of genes and patient survival data, SurvNet also calculates a multivariable Cox P-value for each subnetwork to assess its clinical utility.

One potential advantage of SurvNet is its ability to identify key disease genes that single-gene analyses could miss. For example, TP53 is a master cancer gene in ovarian carcinoma. Based on the protein expression and patient survival data from a recent study (1), the TP53 protein as a single node shows no significant correlation with patient survival, yet a TP53-centered network is among the top biomarkers that SurvNet identifies.
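The background calibration in the evaluation step can be sketched as below. This illustrative version assumes that node scores are available as a flat list and that each random gene set is drawn uniformly without replacement from all genes in the network; `calibrate` and `aggregate_score` are hypothetical names, and only the default of 10 000 samples comes from the text.

```python
import random
from math import sqrt
from statistics import mean, stdev


def aggregate_score(node_scores):
    """Aggregate z-score F = sum(s_i) / sqrt(n)."""
    return sum(node_scores) / sqrt(len(node_scores))


def calibrate(F, n, all_node_scores, n_samples=10_000, rng=None):
    """Return (F_cal, mu, sigma), where F_cal = (F - mu) / sigma.

    mu and sigma are estimated from the scores of n_samples random gene
    sets of size n, each drawn without replacement from all scored genes.
    """
    if rng is None:
        rng = random.Random()
    background = [
        aggregate_score(rng.sample(all_node_scores, n)) for _ in range(n_samples)
    ]
    mu, sigma = mean(background), stdev(background)
    if sigma == 0:
        raise ValueError("background has zero variance; cannot calibrate")
    return (F - mu) / sigma, mu, sigma
```

Because μ and σ depend only on n, an implementation could compute the background once per subnetwork size and reuse it for every subnetwork of that size, rather than resampling for each one.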

WEB SERVER

Input

The web server accepts three input files. The first is a biological network file representing a gene regulatory or protein interaction network [a human protein–protein interaction network (15) is provided as the default]. This file lists all the edges of the network, one edge per line. The second is a molecular profiling file containing any type of gene- or protein-centred high-throughput biological data (e.g. microarray-based gene expression data, reverse-phase protein array (16) protein expression data, DNA methylation data or gene mutation data). This file is a tab-separated numeric matrix in which the column names are sample IDs and the row names are gene IDs. The third is a patient survival data file (e.g. patients’ overall survival time or progression-free survival time). This file has three columns, named ‘id’, ‘censor’ and ‘time’. After uploading the required input files, users can set the search distance (Figure 1A). This parameter defines the search area in the network: starting from each valid gene (or protein) node as the seed, SurvNet automatically searches for the optimal subnetwork(s) within this distance, using the network search algorithm described previously (10,11). A larger search distance requires a longer computation time. The default search distance is 2.
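The three input formats can be read with the Python standard library as sketched below. The details are assumptions rather than SurvNet's documented specification: each network line is taken to hold two tab-separated gene IDs, edges are treated as undirected, the survival file is assumed to be tab-separated with a header row, and `censor` is parsed as an integer without assuming which value marks an event. All function names are hypothetical.

```python
import csv
from typing import Iterable


def read_network(lines: Iterable[str]) -> dict[str, set[str]]:
    """One edge per line; assumes two tab-separated gene IDs per line."""
    graph: dict[str, set[str]] = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        a, b = parts[0], parts[1]
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)  # edges treated as undirected
    return graph


def read_profile(lines: Iterable[str]) -> tuple[list[str], dict[str, list[float]]]:
    """Tab-separated matrix: column names are sample IDs, row names are gene IDs."""
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    # R's write.table() omits the corner cell; otherwise the first cell labels the gene column.
    samples = header if body and len(header) == len(body[0]) - 1 else header[1:]
    profile = {row[0]: [float(v) for v in row[1:]] for row in body}
    return samples, profile


def read_survival(lines: Iterable[str]) -> dict[str, tuple[int, float]]:
    """Columns 'id', 'censor' and 'time'; assumed tab-separated with a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["id"]: (int(row["censor"]), float(row["time"])) for row in reader}


def shared_samples(samples: list[str], survival: dict[str, tuple[int, float]]) -> list[str]:
    """Sample IDs present in both files; raises if the files cannot be matched."""
    shared = [s for s in samples if s in survival]
    if not shared:
        raise ValueError("no sample IDs in common between profile and survival files")
    return shared
```

The functions accept any iterable of lines, so an open file handle or a list of strings works equally well. Checking the sample overlap up front catches mismatched IDs before any Cox model is fitted.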

Figure 1.

Snapshots of the SurvNet web server. (A) Input page, through which input files and a search parameter can be specified. (B) Output page, on which the top subnetwork biomarkers identified are displayed. (C) Visualization page, on which subnetworks can be visualized in a user-friendly way.

Output

In the final output, the subnetworks that SurvNet identifies are first displayed in a table (Figure 1B). These results can be downloaded directly; the network files are in ‘.dot’ format and can be visualized with Graphviz (http://www.graphviz.org). As shown in Figure 1B, the identified subnetworks are ranked in the table by network score, from high to low. Each row reports the following items. The network score, which quantifies how significantly the nodes in a subnetwork correlate with the observed patient survival data, is calculated from the univariable Cox proportional hazards model and the network properties; a higher network score indicates a more significant correlation between the network and the patient survival time. The ‘Cox P-value’ is the P-value derived from the multivariable Cox proportional hazards regression model. The ‘gene_ID’ indicates the seed node of each subnetwork, the number of nodes indicates how many genes (or proteins) the subnetwork contains, and the number of edges indicates how many interactions it contains.

Users can further narrow down the results with two output parameters: the network P-value cut-off and the minimum number of nodes. The network P-value cut-off sets how significant the returned subnetworks must be relative to the random background; the default significance level is 0.05. The minimum number of nodes sets the smallest subnetwork size returned; the default is 2.

After clicking the ‘Graph’ button on the final output page, users can visualize an identified subnetwork (Figure 1C) in a user-friendly Java applet that supports panning, zooming and searching, and that retrieves information from GeneCards for a node of interest. A detailed description of the Java applet is available on the visualization page.
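The two output filters and the ranking described above can be sketched as follows, together with a minimal writer for Graphviz DOT text. The `Subnetwork` record, its field names and the strict `<` comparison against the P-value cut-off are assumptions for illustration; the text does not say how the network P-value is computed or whether the cut-off is inclusive.

```python
from dataclasses import dataclass


@dataclass
class Subnetwork:
    seed: str                     # 'gene_ID' column: the seed node
    score: float                  # calibrated network score
    network_p: float              # network P-value against the random background
    edges: list[tuple[str, str]]  # interactions within the subnetwork

    @property
    def nodes(self) -> set[str]:
        return {gene for edge in self.edges for gene in edge} | {self.seed}


def filter_results(results, p_cutoff=0.05, min_nodes=2):
    """Apply both output parameters, then rank by network score (high to low)."""
    kept = [r for r in results if r.network_p < p_cutoff and len(r.nodes) >= min_nodes]
    return sorted(kept, key=lambda r: r.score, reverse=True)


def to_dot(sub: Subnetwork, name: str = "subnetwork") -> str:
    """Render a subnetwork as an undirected Graphviz DOT graph."""
    lines = [f'graph "{name}" {{', f'  "{sub.seed}";']  # seed listed so it always appears
    lines += [f'  "{a}" -- "{b}";' for a, b in sub.edges]
    lines.append("}")
    return "\n".join(lines)
```

The output of `to_dot` can be rendered with the Graphviz command-line tools, for example `dot -Tpng subnetwork.dot -o subnetwork.png`.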

CONCLUSION

We have developed SurvNet, a web server that efficiently identifies network-based biomarkers that most correlate with patient survival data. To the best of our knowledge, SurvNet is the only available bioinformatics tool for this purpose. SurvNet uses network-based biomarker search algorithms established in previous studies and provides a user-friendly interface for exploring the identified biomarkers. We expect SurvNet to be a valuable resource for generating meaningful hypotheses for disease diagnosis and treatment.

FUNDING

The University of Texas MD Anderson Cancer Center, the National Institutes of Health [U24 CA143883 and P30 CA016672]; G.S. Hogan Gastrointestinal Research fund and the Lorraine Dell Program in Bioinformatics for Personalization of Cancer Medicine (to H.L.). Funding for open access charge: The University of Texas MD Anderson Cancer Center.

Conflict of interest statement. None declared.

REFERENCES

1. Discovering regulatory and signalling circuits in molecular interaction networks.

Authors: Trey Ideker; Owen Ozier; Benno Schwikowski; Andrew F Siegel
Journal: Bioinformatics Date: 2002 Impact factor: 6.937

2. [Review] Network biology: understanding the cell's functional organization.

Authors: Albert-László Barabási; Zoltán N Oltvai
Journal: Nat Rev Genet Date: 2004-02 Impact factor: 53.242

3. dmGWAS: dense module searching for genome-wide association studies in protein-protein interaction networks.

Authors: Peilin Jia; Siyuan Zheng; Jirong Long; Wei Zheng; Zhongming Zhao
Journal: Bioinformatics Date: 2010-11-02 Impact factor: 6.937

4. Reverse phase protein array: validation of a novel proteomic technology and utility for analysis of primary leukemia specimens and hematopoietic stem cells.

Authors: Raoul Tibes; Yihua Qiu; Yiling Lu; Bryan Hennessy; Michael Andreeff; Gordon B Mills; Steven M Kornblau
Journal: Mol Cancer Ther Date: 2006-10 Impact factor: 6.261

5. [Review] Molecular approaches to personalizing management of ovarian cancer.

Authors: R C Bast
Journal: Ann Oncol Date: 2011-12 Impact factor: 32.976

6. [Review] Proteomics of gliomas: initial biomarker discovery and evolution of technology.

Authors: Juliya Kalinina; Junmin Peng; James C Ritchie; Erwin G Van Meir
Journal: Neuro Oncol Date: 2011-09 Impact factor: 12.300

7. Towards patient-based cancer therapeutics.

Authors: Stuart L Schreiber; Alykhan F Shamji; Paul A Clemons; Cindy Hon; Angela N Koehler; Benito Munoz; Michelle Palmer; Andrew M Stern; Bridget K Wagner; Scott Powers; Scott W Lowe; Xuecui Guo; Alex Krasnitz; Eric T Sawey; Raffaella Sordella; Lincoln Stein; Lloyd C Trotman; Andrea Califano; Riccardo Dalla-Favera; Adolfo Ferrando; Antonio Iavarone; Laura Pasqualucci; José Silva; Brent R Stockwell; William C Hahn; Lynda Chin; Ronald A DePinho; Jesse S Boehm; Shuba Gopal; Alan Huang; David E Root; Barbara A Weir; Daniela S Gerhard; Jean Claude Zenklusen; Michael G Roth; Michael A White; John D Minna; John B MacMillan; Bruce A Posner
Journal: Nat Biotechnol Date: 2010-09 Impact factor: 54.908

8. Discover protein complexes in protein-protein interaction networks using parametric local modularity.

Authors: Jongkwang Kim; Kai Tan
Journal: BMC Bioinformatics Date: 2010-10-19 Impact factor: 3.169

9. Human Protein Reference Database--2009 update.

Authors: T S Keshava Prasad; Renu Goel; Kumaran Kandasamy; Shivakumar Keerthikumar; Sameer Kumar; Suresh Mathivanan; Deepthi Telikicherla; Rajesh Raju; Beema Shafreen; Abhilash Venugopal; Lavanya Balakrishnan; Arivusudar Marimuthu; Sutopa Banerjee; Devi S Somanathan; Aimy Sebastian; Sandhya Rani; Somak Ray; C J Harrys Kishore; Sashi Kanth; Mukhtar Ahmed; Manoj K Kashyap; Riaz Mohmood; Y L Ramachandra; V Krishna; B Abdul Rahiman; Sujatha Mohan; Prathibha Ranganathan; Subhashri Ramabadran; Raghothama Chaerkady; Akhilesh Pandey
Journal: Nucleic Acids Res Date: 2008-11-06 Impact factor: 16.971

10. Network-based classification of breast cancer metastasis.

Authors: Han-Yu Chuang; Eunjung Lee; Yu-Tsueng Liu; Doheon Lee; Trey Ideker
Journal: Mol Syst Biol Date: 2007-10-16 Impact factor: 11.429
