
Biological Networks for Cancer Candidate Biomarkers Discovery.

Wenying Yan¹, Wenjin Xue², Jiajia Chen³, Guang Hu¹.

Abstract

Due to its extraordinary heterogeneity and complexity, cancer is often proposed as a model case of a systems biology disease or network disease. There is a critical need for effective biomarkers for cancer diagnosis and/or outcome prediction from system-level analyses. Methods based on integrating omics data into networks have the potential to revolutionize the identification of cancer biomarkers. Deciphering the biological networks underlying cancer is undoubtedly important for understanding the molecular mechanisms of the disease and identifying effective biomarkers. In this review, the networks constructed for cancer biomarker discovery based on data from different omics levels are described and illustrated with recent advances in the field.


Keywords: biomarker; cancer; network; omics

Year: 2016 PMID: 27625573 PMCID: PMC5012434 DOI: 10.4137/CIN.S39458

Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351

Background

Along with overall improvement in the quality of medical services and technologies, the prevention and treatment of cancer have been greatly improved over the past decades; however, death from cancer is still common and increasing. In 2012, there were an estimated 8.2 and 2.2 million cancer deaths worldwide and in China, respectively.1,2 Besides the high mortality rates, the pathophysiology of cancer is not completely understood. The heterogeneous properties of cancer still pose significant challenges for preventing, treating, and gaining a deep understanding of the pathological mechanisms of cancer; thus, an expedited discovery of effective biomarkers is of prime importance. During the past decade, extensive research has been performed to identify molecular biomarkers for presymptomatic diagnosis, stratification of cancer subtype, assessment of cancer progression, prediction of patient response to therapy, and detection of recurrences.3–5 However, the available biomarkers of the oncogenic process remain poorly predictive of outcome and, therefore, are too unreliable for clinical application.
Network biology has been widely used to represent, compute, and model intracellular interactions to gain insights into cellular mechanisms.6 The recent progress of network biology has provided new methods for cancer-related biomarker discovery.7 Network-based holistic analysis integrates multidimensional high-throughput omics profiles from tumor tissues, blood specimens, or other samples and has vastly improved our knowledge of the molecular basis of carcinogenesis and our ability to identify novel cancer biomarkers.8,9 With advances in different biological networks, including gene regulation networks, gene coexpression networks (GCNs), microRNA–mRNA networks, and protein–protein interaction (PPI) networks, and in approaches for network construction, analysis, and interpretation, it is possible to discover reliable and accurate molecular biomarkers and sub-network biomarkers for monitoring cancer progression, treatment, and management, which will also pave the way toward the realization of biomarker-driven personalized medicine against cancer. Here, we give a comprehensive overview of the biological networks that have been used to identify biomarkers at the genomic, transcriptomic, and proteomic levels (Fig. 1). We also summarize the network-based tools for biomarker discovery (Table 1). The intention of this review is to provide an understanding of the potential benefits of using network analysis of complex systems for biomarker discovery.

Figure 1

Biological networks used for biomarker discovery.

Table 1

Network-based analysis tools for biomarker discovery.

NAME	PLATFORM	INPUT	NETWORK TYPE	FUNCTION	CONSTRUCTION	REFERENCE
IPA	Web	Gene/protein list	Molecular network	Network construction, visualization, biomarkers annotation	Reference	www.ingenuity.com
MetaCore	Web	Gene/protein list	Molecular network	Network construction, visualization, biomarkers annotation	Reference	portal.genego.com
CCA	Pajek	Gene mutation data	CCA	Understanding the contribution of gene mutation to tumorigenesis	Genes that have significant co-occurrence or anti-co-occurrence with other genes (Fisher’s exact test)	Cui11
EPoC	R, Matlab	Copy number aberration data; Gene expression data	Gene network	Network construction, visualization, survival scores	The transcription of a gene is determined both by its own DNA copy number and the product of other genes (Differential equations)	Jornsten et al.12
HyperModules	Cytoscape	Disease mutation information; Molecular network	Gene network	Mutated gene modules, visualization	–	Leung et al.15
SDLS	R	Copy number aberration data; Gene expression data	Gene network	The correlations among gene expression and CNAs	Gene network (sparse double Laplacian shrinkage approach)	Shi et al.14
WGCNA	R	Gene expression data	GCN	Module detection, gene selection, density, cluster coefficient, visualization	Co-expression similarity for each gene pair (topological overlap measure)	Langfelder et al.20
DDN	Matlab, Cytoscape	Gene expression data	GRN	Topological changes between networks	Regulatory dependencies of genes (local dependency models which select the number of dependent variables automatically by the Lasso method)	Zhang et al.26,27
ActMiR	R	miRNA and gene expression data	microRNA regulatory network	Key microRNAs and miRNA-mediated regulatory networks	miRNA-target pairs (linear model, an iteratively reweighted least squares (IRLS) regression method to estimate coefficient in the linear model)	Lee et al.38
ancGWAS	R	GWAS data; PPI	PPI	Significant disease sub-networks	Weighting the PPI network with Linkage Disequilibrium data (z-transforms)	Chimusa et al.16
atBioNet	Web	Gene/protein list	PPI	Functional modules, Page Rank, Degree Centrality, HITS, and betweenness	Knowledge based PPI (integrates seven publicly available PPI databases)	Ding et al.58
HyperPrior	Matlab	ArrayCGH data; Gene expression data; PPI	PPI	Sample classification and biomarker selection	PPI (Hypergraph-based iterative learning method)	Tian et al.75

Genomic Level

A variety of genomic alterations, such as point mutations, copy number variations, and gene rearrangements, contribute to tumor formation and development. Genome-scale loss-of-function assays, such as Project Achilles, which contains genome-scale loss-of-function screens in hundreds of cancer cell lines,10 also provide important resources for biomarker discovery. Several studies have used the direct genomic alteration data from primary tumors in biomarker research. Using only cancer gene mutation data, Cui11 built a gene network from the co-occurring and anti-co-occurring relationships between gene mutations (CCA network). The resulting CCA network had two complementary modules with distinct functions and roles in tumorigenesis. Genomic alteration information also plays a complementary role in the construction of molecular networks. Endogenous perturbation analysis of cancer (EPoC) is a causal network model that explains the transcriptional consequences of DNA copy number alterations and has been used to detect survival markers for glioblastoma.12,13 Shi et al.14 also developed a network model that combined copy number alteration and mRNA expression data using a sparse double Laplacian shrinkage (SDLS) method. The advantage of SDLS is that it effectively accommodates correlations on both sides of the regression of gene expression on copy number alteration. Based on gene mutation information, biomolecular interaction networks, and patient clinical information, Leung et al.15 developed a Cytoscape plug-in called HyperModules to identify clinically and phenotypically significant mutated gene modules as potential multivariate biomarkers for cancer. HyperModules can analyze diverse biomolecular interaction networks including gene regulatory networks (GRNs), PPI networks, and curated biological pathways. By integrating genome-wide association datasets and PPI data, Chimusa et al.16 presented an algebraic graph-based method (ancGWAS) to identify significant disease-specific sub-networks. ancGWAS can use not only linkage disequilibrium data as PPI weights but also other user-defined weights.
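
To make the idea of a co-occurrence/anti-co-occurrence network concrete, the following is a minimal Python sketch of how gene pairs might be linked using Fisher's exact test, the test listed for CCA in Table 1. It is not the published implementation: the gene names, sample identifiers, the 0.05 cutoff, and the function names `fisher_exact_two_sided` and `mutation_pair_network` are all hypothetical, and no multiple-testing correction is applied.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table whose top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def mutation_pair_network(mutations, samples, alpha=0.05):
    """mutations: dict mapping gene -> set of sample ids carrying a mutation.

    Returns edges (gene1, gene2, kind, p) for significantly associated pairs.
    """
    n = len(samples)
    edges = []
    for g1, g2 in combinations(sorted(mutations), 2):
        s1, s2 = mutations[g1], mutations[g2]
        a = len(s1 & s2)        # samples mutated in both genes
        b = len(s1 - s2)        # mutated in g1 only
        c = len(s2 - s1)        # mutated in g2 only
        d = n - a - b - c       # mutated in neither
        p = fisher_exact_two_sided(a, b, c, d)
        if p < alpha:
            expected = len(s1) * len(s2) / n
            kind = "co-occurrence" if a > expected else "anti-co-occurrence"
            edges.append((g1, g2, kind, p))
    return edges


# Hypothetical toy cohort of 12 samples.
samples = [f"s{i}" for i in range(12)]
mutations = {
    "GENE_A": {"s0", "s1", "s2", "s3", "s4", "s5"},
    "GENE_B": {"s0", "s1", "s2", "s3", "s4", "s5"},      # always with GENE_A
    "GENE_C": {"s6", "s7", "s8", "s9", "s10", "s11"},    # never with GENE_A
    "GENE_D": {"s0", "s3", "s6", "s9"},                  # unrelated
}
edges = mutation_pair_network(mutations, samples)
for g1, g2, kind, p in edges:
    print(f"{g1} - {g2}: {kind} (p = {p:.4f})")
```

In this toy cohort the pair GENE_A/GENE_B is linked by co-occurrence, GENE_C is linked to both by anti-co-occurrence, and GENE_D remains unconnected. A real analysis would need a much larger cohort and correction for the number of gene pairs tested.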

Transcriptomic Level

At the transcriptomic level, GCNs and transcription regulatory networks are most widely used for biomarker discovery. In a GCN, two genes (nodes) are connected if there is a significant correlation (eg, P < 0.05) between the expression levels of the genes across the samples. A series of GCNs were constructed for genes identified by a guilt-by-association algorithm, and the conserved gene interactions that impact cancer outcomes were detected.17 Malusa et al.18 built coexpression networks based on time-course gene profiling data to find key modules related to the demethylated retinoblastoma cell line, and Xing and Zeng19 also constructed a GCN for biomarker discovery in glioma. In weighted GCNs, the correlations can be weighted with an adjacency function. An R software package called weighted coexpression network analysis (WGCNA) was developed for weighted gene correlation network analysis; working from properly processed and normalized microarray data, it supports weighted GCN construction, module detection, gene selection, calculation of topological properties, and visualization.20 It has been used to identify a gene module associated with lung cancer,21 prognostic biomarkers for estrogen receptor-positive breast cancer treated with tamoxifen,22 and biomarkers for predicting the chemotherapy response in breast cancer.23 Yang et al.24 also used the WGCNA package to build weighted GCNs for four cancer types using data from the Cancer Genome Atlas and found common properties of prognostic genes across multiple cancer types.
However, one limitation of this technology is that it is only applicable to undirected networks.20 The aim of GRNs is to mathematically capture the dependencies between transcriptional regulatory genes and their downstream targets.25 Zhang et al.26,27 proposed a local dependency method called differential dependency network (DDN) to find statistically significant topological changes in GRNs under two biological conditions, which provides an alternative means for biomarker prediction. This method can capture topological changes even when the fold change in gene expression is not significant and can be executed either with the Matlab package or with the Cytoscape plug-in (CytoDDN). However, DDN only takes account of linear relationships. Based on biological knowledge and conditional independency, GRNs can be reconstructed from microarray data and have been employed to predict prostate cancer-related genes and sub-networks.28 Using the GRN inferred by the algorithm for the reconstruction of accurate cellular networks (ARACNE), Remo et al.29 predicted NFAT5 as a novel marker of inflammatory breast cancer. Seifert et al.30 inferred signature-specific GRNs to distinguish different subtypes of astrocytoma, and Akutekwe and Seker31 developed a GRN construction method based on support vector regression and dynamic Bayesian networks, which has been applied to time-course data in ovarian carcinoma. Due to the emerging role of noncoding RNAs, networks containing noncoding RNAs have also contributed to biomarker discovery. MicroRNA (miRNA) biomarkers offer a powerful alternative to protein-coding gene signatures and have the flexibility of gene expression signature classifiers.32 MicroRNA regulatory networks (miRNA–mRNA networks) represent the regulation patterns between microRNAs and genes.
By using these networks, Zhang et al.33 identified that hsa-let-7i and its target genes play crucial roles in colorectal cancer metastasis, Canturk et al.34 found that the network hub microRNAs and genes might be candidate markers for bladder cancer, Zafari et al.35 determined the disease-related and housekeeping microRNAs, and Sehgal et al.36 identified microRNAs that regulate functional pathways in multiple cancers. A combinatorial regulatory network that comprises the interactions between microRNAs and genes, transcription factors (TFs) and genes, and TFs and microRNAs was constructed to identify key regulators that contribute to hepatocellular carcinoma metastasis.37 Lee et al.38 developed software called ActMiR, which uses regression models on the expression data of microRNAs and their predicted target genes to infer miRNA-mediated regulatory networks and the activity of microRNAs that could serve as potential prognostic biomarkers of cancers. This has proven to be a relatively robust approach for modeling microRNA activity and was applied to multiple breast cancer data sets. More recently, a pipeline called pipeline of outlier microRNA analysis (POMA) was constructed to identify candidate microRNAs by exploring the sub-structure of the microRNA regulatory network, which is constructed by integrating the miRNA–mRNA interaction database with microRNA and mRNA expression data. POMA has already been applied to find candidate microRNA biomarkers in prostate cancer, clear cell renal cell carcinoma (ccRCC), and pediatric acute myeloid leukemia.39–41 Currently, most microRNA regulatory networks only take account of the relationships between microRNAs and mRNAs and do not capture the cooperative or synergetic effects between miRNAs.
Long noncoding RNAs (lncRNAs) are nonprotein-coding transcripts longer than 200 nucleotides that can function as scaffolds for chromatin modification and for transcriptional and posttranscriptional regulation42,43 and exhibit aberrant expression in various human cancers.44 Yang et al.45 built a lncRNA-coding GCN and indicated that lncRNAs and mRNAs may act as biomarkers for nasopharyngeal carcinoma. A coexpression network among differentially expressed lncRNAs and mRNAs was also constructed for breast cancer patients and a control group to identify the core genes of the network as biomarkers for HER-2-enriched subtype breast cancer.46 Other methods construct network-based models for cancer biomarker discovery by measuring the relationships between transcripts, such as feature selection-based genetic networks for lung cancer47 and Boolean implication networks for both normal and malignant tissues.48,49
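
The GCN definition given earlier in this section (an edge for each strongly correlated gene pair, optionally weighted by an adjacency function) can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the WGCNA implementation: the significance test is replaced by a hard cutoff on the absolute Pearson correlation, the weighted adjacency uses the power form |r|^beta, and the gene names, expression values, `r_cutoff`, `beta`, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_network(expr, r_cutoff=0.8, beta=6):
    """expr: dict mapping gene -> expression values over the same samples.

    Returns (hard_edges, soft_adj):
      hard_edges - unweighted edges where |r| >= r_cutoff (assumed cutoff,
                   standing in for a significance test);
      soft_adj   - weighted adjacency |r| ** beta for every gene pair.
    """
    hard_edges = set()
    soft_adj = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= r_cutoff:
            hard_edges.add((g1, g2))
        soft_adj[(g1, g2)] = abs(r) ** beta
    return hard_edges, soft_adj


# Hypothetical expression profiles across five samples.
expr = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 10],   # rises with G1
    "G3": [5, 4, 3, 2, 1],    # falls as G1 rises
    "G4": [2, 1, 3, 1, 2],    # unrelated to the others
}
hard_edges, soft_adj = coexpression_network(expr)
print(sorted(hard_edges))
print({pair: round(w, 3) for pair, w in soft_adj.items()})
```

Because the absolute correlation is used, negatively correlated genes such as G1 and G3 are also connected, which corresponds to an unsigned network. Raising |r| to a power keeps all pairs but pushes weak correlations toward zero, so the weighted network avoids an all-or-nothing threshold.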

Proteomic Level

Proteomics has been increasingly applied to cancer research, especially for biomarker discovery,50 and quite a few tools for PPI network analysis51 and PPI-based methods for cancer biomarker identification based on proteomic data have emerged. STRING is a well-known database to predict PPIs52 and has been used to find proteins and modules related to the diethylnitrosamine-induced progression of immune suppression and apoptosis resistance in hepatocellular carcinoma.53 STRING has additional applications related to biomarker discovery, eg, revealing the transition from early stage to late stage in colorectal cancer,54 detecting key genes related to inflammatory responses in bladder cancer,55 finding platelet-derived growth factor receptor beta as a urinary biomarker for recurrence in bladder cancer,56 and identifying noninvasive blood-based diagnosis markers for pancreatic cancer.57 Ding et al.58 developed a web-based PPI network tool for biomarker discovery named atBioNet, which integrates seven public PPI databases and uses a fast network-clustering algorithm called structural clustering algorithm for networks to find disease-related functional modules in the PPI network. Functional modules are constructed at the time of the query in atBioNet, which allows novel modules to be generated based on the input proteins/genes. atBioNet also has powerful network analysis and visualization tools, but its biological annotation is relatively weak, relying only on KEGG. Pradhan et al.59 generated a TF interaction network by text mining to find key TFs in colorectal cancer. Based on the theory of coevolution, Zhang et al.60 constructed PPI networks and identified seven key proteins in non-small cell lung cancer. Using phosphoproteomic data of breast cancer cells treated with transforming growth factor-β (TGF-β), Ahn et al.61 constructed a TGF-β-affected PPI network and detected sub-network markers for TGF-β treatment.
Shen et al.62 identified caveolin-1 as a candidate biomarker for gastric cancer through the PPI network built by the Human Interaction Network for differentially expressed proteins between gastric cancer-associated fibroblasts and their corresponding inflammation-associated fibroblasts. Oh and Deasy63 constructed PPI networks for proteins selected by literature-based methods to identify chemoresistance-related genes and pathways in cancer. New methods that combine PPI networks and other omics data such as gene expression data are currently being used to identify biomarkers. Methods combining PPIs and gene expression profiles were applied to identify proteins associated with liver, lung, and brain metastases in breast cancer,64,65 novel interactions in pancreatic cancer,66 sub-network markers for breast cancer metastasis,67 protein biomarkers for lung cancer diagnosis68 and hepatocellular carcinoma diagnosis,69 proteins related to clinical outcome in patients with metastatic melanoma,70 biomarkers of early onset colorectal cancer,71 module markers for gastric cancer,72 and core and specific network markers of carcinogenesis in bladder, colorectal, liver, and lung cancers.73 Similarly, by taking proteins that changed significantly between normal and late-stage colon cancer in two gel-based proteomics experiments as seeds, sub-networks were constructed by MetaCore, and the scores were calculated using gene expression data in the sub-networks to identify protein biomarkers.74 PPI could also be used as prior knowledge in a hypergraph-based semisupervised learning algorithm (HyperPrior) that uses gene expression data and array-based comparative genomic hybridization data for sample classification and biomarker selection.75 HyperPrior is a natural extension of label propagation algorithms and can handle the problem of learning the optimal weighting of hyperedges.
Recently, Zhang et al.76 built dynamic PPI networks to identify informative proteins based on PPI networks and gene expression data. He et al.77 used a PPI network formed from four public PPI databases as a background network and mapped the differentially expressed genes to the network to suggest three hub genes as potential markers for oral squamous cell carcinoma. Like GCNs, protein coexpression networks can also be constructed based on the correlation between protein expression levels. Based on this kind of network, Yang et al.78 predicted specific profiles of inflammatory mediators for non-small cell lung cancer. Analysis of the protein coexpression network among cytokines revealed biomarkers for HIV/HPV-associated anal cancer.79
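
As a rough illustration of the general strategy of mapping differentially expressed genes onto a background PPI network and proposing hubs as candidate markers, the Python sketch below merges edge lists, keeps only interactions among the differentially expressed genes, and ranks them by degree. It is a sketch under simplifying assumptions, not the procedure of any cited study: the protein names, the edge list, the `top_n` value, and the function name `hub_genes` are hypothetical, and degree is only one of several possible hub criteria.

```python
from collections import defaultdict


def hub_genes(ppi_edges, de_genes, top_n=3):
    """Rank differentially expressed genes by degree in the induced PPI sub-network.

    ppi_edges: iterable of (protein_a, protein_b) pairs, possibly merged from
               several databases and therefore containing duplicates.
    de_genes:  iterable of differentially expressed gene/protein names.
    """
    de = set(de_genes)
    # Treat interactions as undirected and drop duplicates and self-loops.
    unique = {frozenset((u, v)) for u, v in ppi_edges if u != v}
    degree = defaultdict(int)
    for pair in unique:
        u, v = tuple(pair)
        if u in de and v in de:   # keep edges whose two ends are both DE
            degree[u] += 1
            degree[v] += 1
    # Highest degree first; ties broken alphabetically for reproducibility.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical background network merged from several sources.
ppi_edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P3", "P1"),   # duplicate of P1-P3 in reverse order
    ("P4", "P5"), ("P5", "P6"),   # P6 is not differentially expressed
]
de_genes = ["P1", "P2", "P3", "P4", "P5"]
hubs = hub_genes(ppi_edges, de_genes)
print(hubs)
```

Deduplicating edges before counting matters when several PPI databases are merged, since the same interaction reported by two sources would otherwise inflate a protein's degree.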

Other Network-Based Methods

Recently, there has been a tendency to construct networks by integrating data from multiple omics levels for biomarker discovery. For example, a biomolecular network was constructed by combining a GCN and the STRING PPI database and was used to identify T-cell homing factors as the genes whose expression was significantly associated with disease-free survival in colorectal cancer.80 Butz et al. built a gene interaction network by integrating mRNA, microRNA, and protein expression profiles of ccRCC and normal samples from 28 publications and identified three genes as potential biomarkers for ccRCC.81 Sehgal et al. detected the key network modules from pathways that were enriched by differentially expressed genes for colorectal cancer.82 An SDLS method was used to construct a network that combined gene expression and copy number alteration data to describe the interconnections between genes.14 Xu et al.83 constructed a combinatorial network by integrating PPI networks, microRNA regulatory networks, and GRNs and took the network hubs as candidate biomarkers for hepatocellular carcinoma. High-throughput screening for drug sensitivity patterns is also frequently integrated into biomarker discovery, such as the drug–drug network that was employed in a systemic identification of genomic markers of drug sensitivity.84,85 Moreover, two commercially available web-based applications, MetaCore (portal.genego.com) and Ingenuity Pathway Analysis (IPA, www.ingenuity.com), were used to construct molecular networks that contributed to biomarker discovery. Both applications construct networks based on literature annotations and provide features for identifying promising and relevant biomarker candidates within experimental data.
MetaCore has been used to identify radiation-specific biomarkers by constructing gene networks for the top 500 genes that were predicted by a linear regression model,86 to find novel proteins and functional sub-networks with altered expression in prostate cancer,87 and to build systems biology-based classifiers for hepatocellular carcinoma.88 Meanwhile, IPA has been used to generate connections between proteins identified by a fluorescence two-dimensional difference gel electrophoresis approach combined with matrix-assisted laser desorption/ionization time-of-flight tandem mass spectrometry (MALDI–ToF-MS/MS),89 to help find differentiation-related biomarkers for head and neck cancer,90 to predict plasma protein biomarkers for cervical cancer,45 and to investigate tumor-specific changes of plasma proteins in hereditary breast cancer.91

Conclusions

The past decade has seen rapid developments in network models for cancer biomarker discovery. Different network-based methods have provided a new paradigm and hold great promise for the future study of cancer. However, network-based methods also have their limitations and disadvantages, which fall mainly into two categories. One is that most of the current methods lack effective validation, especially in large and multiple datasets, which is also the key problem for identifying efficient and clinically useful biomarkers. Another is that the power to integrate data from multiple levels is still relatively weak, and most of the methods can only integrate data from two or three different levels. Therefore, several challenges remain in the field of network-based biomarker discovery. First, due to the heterogeneous properties of cancer, individual biomarkers respond differently, which makes the identification of clinically useful and precise biomarkers for cancer diagnosis and for predicting clinical outcomes quite difficult. Second, integration of multiscale omics data, cell level data, tissue level data, phenotype level data, and clinical data remains a major challenge in network medicine. The continued refinement of network-based algorithms and tools is critically needed and will have a significant impact on the development of personalized biomarkers. Third, the development of robust and standardized methods for the assessment of molecular biomarkers, especially sub-network biomarkers, will be essential in the future. It is hoped that network-based approaches will guide treatment decisions and accelerate the development of personalized therapeutic regimens for cancer patients.

90 in total

1. ancGWAS: a post genome-wide association study method for interaction, pathway and ancestry analysis in homogeneous and admixed populations.

Authors: Emile R Chimusa; Mamana Mbiyavanga; Gaston K Mazandu; Nicola J Mulder
Journal: Bioinformatics Date: 2015-10-27 Impact factor: 6.937

2. Expression of endoplasmic reticulum stress proteins is a candidate marker of brain metastasis in both ErbB-2+ and ErbB-2- primary breast tumors.

Authors: Rebeca Sanz-Pamplona; Ramón Aragüés; Keltouma Driouch; Berta Martín; Baldo Oliva; Miguel Gil; Susana Boluda; Pedro L Fernández; Antonio Martínez; Víctor Moreno; Juan J Acebes; Rosette Lidereau; Fabien Reyal; Marc J Van de Vijver; Angels Sierra
Journal: Am J Pathol Date: 2011-06-25 Impact factor: 4.307

3. Integrative bioinformatics analysis reveals new prognostic biomarkers of clear cell renal cell carcinoma.

Authors: Henriett Butz; Peter M Szabó; Roy Nofech-Mozes; Fabio Rotondo; Kalman Kovacs; Lorna Mirham; Hala Girgis; Dina Boles; Attila Patocs; George M Yousef
Journal: Clin Chem Date: 2014-08-19 Impact factor: 8.327

4. Inference of nonlinear gene regulatory networks through optimized ensemble of support vector regression and dynamic Bayesian networks.

Authors: Arinze Akutekwe; Huseyin Seker
Journal: Conf Proc IEEE Eng Med Biol Soc Date: 2015-08

5. Structural templates predict novel protein interactions and targets from pancreas tumour gene expression data.

Authors: Gihan Dawelbait; Christof Winter; Yanju Zhang; Christian Pilarsky; Robert Grützmann; Jörg-Christian Heinrich; Michael Schroeder
Journal: Bioinformatics Date: 2007-07-01 Impact factor: 6.937

6. An integrated transcriptomic and computational analysis for biomarker identification in human glioma.

Authors: Wenli Xing; Chun Zeng
Journal: Tumour Biol Date: 2015-12-14

7. Identification of clinically relevant protein targets in prostate cancer with 2D-DIGE coupled mass spectrometry and systems biology network platform.

Authors: Ramesh Ummanni; Frederike Mundt; Heike Pospisil; Simone Venz; Christian Scharf; Christine Barett; Maria Fälth; Jens Köllermann; Reinhard Walther; Thorsten Schlomm; Guido Sauter; Carsten Bokemeyer; Holger Sültmann; A Schuppert; Tim H Brümmendorf; Stefan Balabanov
Journal: PLoS One Date: 2011-02-11 Impact factor: 3.240

8. Platelet-derived growth factor receptor beta: a novel urinary biomarker for recurrence of non-muscle-invasive bladder cancer.

Authors: Jiayu Feng; Weifeng He; Yajun Song; Ying Wang; Richard J Simpson; Xiaorong Zhang; Gaoxing Luo; Jun Wu; Chibing Huang
Journal: PLoS One Date: 2014-05-06 Impact factor: 3.240

9. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells.

Authors: Wanjuan Yang; Jorge Soares; Patricia Greninger; Elena J Edelman; Howard Lightfoot; Simon Forbes; Nidhi Bindal; Dave Beare; James A Smith; I Richard Thompson; Sridhar Ramaswamy; P Andrew Futreal; Daniel A Haber; Michael R Stratton; Cyril Benes; Ultan McDermott; Mathew J Garnett
Journal: Nucleic Acids Res Date: 2012-11-23 Impact factor: 16.971

10. Signal propagation in protein interaction network during colorectal cancer progression.

Authors: Yang Jiang; Tao Huang; Lei Chen; Yu-Fei Gao; Yudong Cai; Kuo-Chen Chou
Journal: Biomed Res Int Date: 2013-03-20 Impact factor: 3.411
