Abstract
Traditionally, biologists have devoted their careers to studying individual biological entities of their own interest, partly because few data were available for those entities. Large, high-throughput datasets too complex for conventional processing methods (i.e., "big data") have accumulated in cancer biology and are freely available in public data repositories. Such challenges urge biologists to inspect their biological entities of interest using novel approaches, beginning with data retrieval from repositories. Essentially, these revolutionary changes demand new interpretations of huge datasets at a systems level, by so-called "systems biology". One representative application of systems biology is to generate a biological network from high-throughput big data, providing a global map of molecular events associated with specific phenotype changes. In this review, we introduce the repositories of cancer big data and cutting-edge systems biology tools for network generation and improved identification of therapeutic targets. [BMB Reports 2017; 50(1): 12-19].
Year: 2017 PMID: 27502015 PMCID: PMC5319659 DOI: 10.5483/bmbrep.2017.50.1.135
Source DB: PubMed Journal: BMB Rep ISSN: 1976-6696 Impact factor: 4.778
Fig. 1. Systems biology, databases, and network generation. (A) The diverse types of high-throughput data available (genomics, epigenomics, transcriptomics, proteomics, metabolomics). Relationships among the data types are shown as edges. (B) The flow (represented by "edges") of genetic information from DNA to protein is aligned with the diverse data types. Public repositories corresponding to each data type are listed (further description in Table 1). (C) Network differences between correlation-based approaches and Bayesian network approaches. The correlation-oriented (or mutual information-oriented) tools, ARACNE (39) and WGCNA (36), do not report the directions of edges in networks. Bayesian-driven networks naturally reveal directed edges among the network entries. In other words, the undirected network of entries G1, G2, and G3 produced by ARACNE and WGCNA (to the left of the grey-shaded triangle) can be resolved into directed networks (to the right of the grey-shaded triangle) using Bayesian network tools (48–51).
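The undirected case in panel C can be illustrated with a minimal sketch of a correlation-thresholded network. This is not the WGCNA or ARACNE implementation: the gene labels G1 to G3 follow the figure, while the expression values, the `threshold` parameter, and the function names are hypothetical and chosen only for illustration. Because Pearson correlation is symmetric, each edge is stored as an unordered pair, which is why a network built this way carries no direction.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_network(expression, threshold=0.8):
    """Return undirected edges (frozensets) whose |r| meets the threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            # A frozenset has no order, so G1-G2 and G2-G1 are the same edge.
            edges[frozenset((g1, g2))] = r
    return edges


# Hypothetical expression values for three genes across five samples.
expression = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.1, 3.9, 6.2, 8.1, 9.8],
    "G3": [5.0, 1.0, 4.0, 2.0, 3.0],
}

network = correlation_network(expression)
for pair, r in network.items():
    print(sorted(pair), round(r, 3))
```

With these made-up values only the G1 and G2 profiles pass the threshold. Assigning a direction to that edge would require a different model, such as the Bayesian network tools named in the caption.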
Table 1. Cancer-related, high-throughput data repositories. The databases in Fig. 1B are described with additional information, including the number of available datasets, data types, and websites. The numbers of entries are valid as of 05/02/2016.
| Name | Description | Address | Cancer-related data |
|---|---|---|---|
| TCGA | The Cancer Genome Atlas (TCGA): now one of the programs organized by the newly established NCI Center for Cancer Genomics | | 34 cancer studies (types), 11,091 samples |
| dbGaP | The database of Genotypes and Phenotypes (dbGaP): archive of human genome and phenotype data | | 991 datasets |
| SRA | Sequence Read Archive (SRA): raw sequencing files and alignment files from next-generation sequencing | | 1,950 cancer studies |
| cBioPortal | Multi-functional platform: supporting intuitive visualization, literate clinical pie chart, and simple data access | | 126 cancer genomics studies, 26,080 samples |
| ICGC | The International Cancer Genome Consortium (ICGC): global-scale cancer projects | | 66 cancer projects, 17,867 donors |
| ArrayExpress | An archive of functional genomics data | | 14,974 datasets |
| EGA | The European Genome-phenome Archive (EGA) | | 1,997 datasets |
| UCSC CGB | UCSC Cancer Genomics Browser (UCSC CGB): supplying interactive heat-map-based visualization and ready-to-use tab-delimited genomics and clinical data downloads | | 720 datasets |
| GEO | The Gene Expression Omnibus (GEO) | | 19,554 datasets |
| ENCODE | The Encyclopedia of DNA Elements (ENCODE) Consortium: decoding functional elements in DNA | | Cancer cell lines available |
| CCLE | The Cancer Cell Line Encyclopedia (CCLE) project: genomics and visualization for about 1,000 cell lines; drug sensitivity data available for the cell lines | | Genomic characterization of 1,000 cell lines |
| PeptideAtlas | An archive of proteome information | | 99 datasets |
| PRIDE | PRoteomics IDEntifications (PRIDE) database: protein and peptide identifications, post-translational modifications | | 290 datasets |
Table 2. Summary of tools for network construction. Short descriptions and homepages of some of the tools discussed in the manuscript are summarized.
| Class | Name | Homepage and description |
|---|---|---|
| Data-driven model | ARACNE | Standalone tool available; mutual information-based network generation |
| | WGCNA | R package available; correlation-based network generation |
| | Cancer Landscapes | Web-based tool; sparse inverse covariance selection-based network generation |
| | Ultranet | Standalone tool available; sparse inverse covariance selection-based network generation |
| | Banjo | Standalone tool available; network generation using Bayesian networks |
| | CATNET | Standalone tool available; Bayesian networks |
| Hybrid model | EDDY | Standalone tool available; gene sets and Bayesian networks combined |
| | PATHOME | Web version of the algorithm under construction (available on request); KEGG pathways and correlation-based statistic combined |
| | SPIA | R package available; KEGG pathways and permutation tests combined |
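As a rough illustration of the "mutual information based" entry in the table above, the sketch below estimates mutual information between two discretized expression profiles. It is an assumption-laden toy, not ARACNE's estimator or its pruning procedure: the equal-width binning, the `bins` parameter, the function names, and the expression values are all hypothetical. Like correlation, mutual information is symmetric in its two arguments, so an edge scored this way has no direction.

```python
from collections import Counter
from math import log2


def discretize(values, bins=3):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    low, high = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (high - low) / bins or 1.0
    return [min(int((v - low) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


# Hypothetical expression profiles for two genes across six samples.
g1 = discretize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
g2 = discretize([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
print(mutual_information(g1, g2))
```

A network tool would compute such a score for every gene pair and keep only the pairs above a chosen cutoff. The cutoff and any further filtering are design choices not covered by this sketch.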