
A survey and evaluation of Web-based tools/databases for variant analysis of TCGA data.

Zhuo Zhang¹, Hao Li¹, Shuai Jiang¹, Ruijiang Li¹, Wanying Li¹, Hebing Chen¹, Xiaochen Bo¹.

Abstract

The Cancer Genome Atlas (TCGA) is a publicly funded project that aims to catalog and discover major cancer-causing genomic alterations with the goal of creating a comprehensive 'atlas' of cancer genomic profiles. The availability of this genome-wide information provides an unprecedented opportunity to expand our knowledge of tumourigenesis. Computational analytics and mining are frequently used as effective tools for exploring this byzantine series of biological and biomedical data. However, some of the more advanced computational tools are often difficult to understand or use, thereby limiting their application by scientists who do not have a strong computational background. Hence, it is of great importance to build user-friendly interfaces that allow both computational scientists and life scientists without a computational background to gain greater biological and medical insights. To that end, this survey was designed to systematically present available Web-based tools and facilitate the use of TCGA data for cancer research.


Keywords: The Cancer Genome Atlas; bioinformatics tools; cancer; databases; survey

Substances:
Biomarkers, Tumor

Year: 2019 PMID: 29617727 PMCID: PMC6781580 DOI: 10.1093/bib/bby023

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Introduction

Cancer continues to be a key field of interest for human geneticists, despite the complexities involved. Moreover, despite the frequency of cancer diagnoses, scientists still do not know the causes for many cancers, or how best to treat them. More recently, high-throughput DNA sequencing [1-3] has revolutionized the study of cancer, and the use of sequencing data to assist in diagnosis is generally referred to as precision medicine [4, 5]. Thus, advances in our understanding of the cancer genome have the potential to improve precision medicine for individuals. In particular, massive efforts to undertake parallel next-generation sequencing (NGS) have revolutionized most facets of scientific discovery, and they are also responsible for many advances in the application of genomic information to human health, particularly in the field of oncology. Regarding the latter, the potential utility of these data encompasses early detection, diagnosis, prognosis ascertainment, recurrence detection, risk assessment and treatment selection for many cancers. The Cancer Genome Atlas (TCGA) project [6] represents a significant advance in cancer genomics with its aim to provide a comprehensive catalog of key genomic changes that occur in major cancer types [7, 8]. In addition, these data facilitate more effective diagnoses, treatments and prevention. Thus, this project has remarkable potential for scientists who study cancer, and many achievements with these data have already been published [9-14]. Comprehensive genomic data from a large number of patients would undoubtedly improve our knowledge and understanding of cancer-related genes and their clinical relevance. Currently, analyses of TCGA data are complex, with multiple steps involved (Figure 1) [15]. Moreover, to obtain meaningful biological results, each step of an analysis needs to be carefully considered, with specific tools applied to certain experimental models. 
To develop relevant and realistic exploration tools for available data, coordination between experimentalists and computational scientists is needed. However, life scientists may find it difficult to use many of the computational tools that have been developed by computational scientists, which require data preparation as well as the installation and use of packaged software. This problem is further complicated by the fact that some software is platform- or operating system-specific. Conversely, computer scientists may face challenges in performing experimental validations to confirm predictions based on data analysis. Fortunately, there are Web-based tools that provide sophisticated computational solutions to help bridge this gap between wet-lab scientists and the many in silico tools available for the analysis of cancer genomic data. Choosing the appropriate tools is not a trivial task, especially for inexperienced users. To the best of our knowledge, a comprehensive review of all available Web-based TCGA data analysis tools has not been reported. Such a review would be tremendously helpful for researchers with an interest in analyzing cancer genomic data, as it could provide a guide for selecting analytical tools for a particular application. Therefore, we initiated this survey of existing Web-based tools/databases to compile a comprehensive list of programs that can perform variant analysis of TCGA data. Nonpublic and commercial tools were excluded from this survey.

Figure 1

Overview of common analysis and some applications for multidimensional data available from TCGA.

A total of 61 online analysis tools for cancer genome data were surveyed, including 32 which are primarily based on TCGA data. We have listed the functions, characteristics and suitable research areas for each. In addition, we have classified these complex tools into three categories based on their different uses of cancer genome data to facilitate their application by scientists lacking relevant data analysis experience. Furthermore, five case studies are described from a user’s perspective, which illustrate the major international cancer research areas and apply our review to the selection of these tools. It is anticipated that these efforts will enable researchers to select and use publicly available analysis tools. The present article is structured as follows. First, the TCGA database is introduced as a resource for understanding cancer genome data, which is important for researchers accessing this database for the first time. Next, all of the publicly available online analysis tools and their classifications are described. Finally, five cancer genome research questions with case studies are presented and discussed, and general recommendations for tool selection and prioritization according to the different types of cancer research are presented.

Variant data types within TCGA

To provide a comprehensive analysis of cancer genome profiles, TCGA applied high-throughput technologies based on microarray data of nucleic acids and proteins and NGS methods that provide global analyses of nucleic acids to generate genomic, transcriptomic, epigenomic and clinical data for several cancer types. To date, there are >10 000 cases of 33 tumor types available, with 20 cancer types each having >200 cases. The TCGA Data Portal is no longer operational, and all TCGA data have been centralized at the Genomic Data Commons (GDC) (https://gdc.nci.nih.gov/). The data can be downloaded for academic use. The identifier (ID) types listed at the GDC include: file universally unique identifier (UUID), file submitted ID (file name), case UUID, case submitted ID (case ID) and project ID. These ID types provide good identification and cataloging of a large amount of data (Table 1). The data types for each cancer include: somatic mutations, copy numbers, gene expression, microRNA (miRNA) expression, DNA methylation, reverse phase protein array (RPPA) and clinical information. Each data type includes raw and processed data that are available for public download, except for the raw sequencing files (Table 2). Somatic mutations are identified based on exome sequencing data, with exome sequencing able to detect single-nucleotide variants that are categorized as nonsynonymous or synonymous. Nonsynonymous single-nucleotide variants cause single amino acid substitutions, which may lead to altered protein function(s) or truncated proteins. Copy number alterations are generally the most frequent genetic events that occur during tumor development, and they have been determined with the Affymetrix SNP (Single Nucleotide Polymorphism) 6.0 array, which detects gains and losses in the genome. Gene expression and miRNA expression are determined with RNA sequencing (RNAseq) and miRNA sequencing analyses, respectively. 
The abundances of transcripts, isoforms, novel transcripts, gene fusions and noncoding RNAs can be extracted from the sequencing data. DNA methylation is determined by using the Illumina platform, which provides single-nucleotide resolution of CpGs across the vast majority of CpG islands and promoters in the genome. DNA methylation profiling provides information regarding epigenetic changes that have occurred in the genome. Protein expression is determined with RPPA [16], which is an array-based method of detecting proteins at nanogram levels. Validated antibodies are used to determine protein levels, as well as the levels of phosphorylated proteins. This analysis allows activated proteins to be detected, which could not be inferred from RNA expression data. Clinical data are listed for each patient with standard metrics such as patient age, patient gender and time to death or last known contact date. For each cancer, there are specific stratification parameters. For instance, Gleason scores are provided for prostate cancer, and Breslow index values are provided for melanoma. Overall survival, as well as progression-free survival, can be calculated and stratified according to cancer-specific staging. Generated data are categorized not only by data type but also by data level. Raw, nonnormalized data (Level I), processed data (Level II) and segmented/interpreted data (Level III) apply to individual samples, while summarized data (Level IV) refer to analyses across sample sets. Levels III and IV data are freely available from publicly accessible databases; access to lower level data (e.g. Levels I and II), however, requires specific permissions to be requested and granted. Overall, each data type is comprehensive in its coverage of the genome, and it is ideal for scientists who are studying cancer to obtain an integrated analysis of TCGA data.

Table 1

ID types within TCGA

ID type	Description	Example
File UUID	ID of data in TCGA	00a2364d-7385-4fa8-8562-b4f19548505a
File Submitted ID	ID of data uploaded to TCGA	147f470-7440-42b8-8e3a-4e28b654916e-beta-value
Case UUID	Sample/case ID in TCGA	942c0088-c9a0-428c-a879-e16f8c5bfdb8
Case Submitted ID	ID of sample/case uploaded to TCGA, which is commonly used to represent sample/case	TCGA-CJ-4642
Project ID	Project ID which sample/case belongs to	TCGA-BRCA
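
The ID types in Table 1 can be told apart by their shape, which is useful when a downloaded file list mixes several of them. The following is a minimal sketch only: the two regular expressions are assumptions inferred from the examples in Table 1, not an official GDC specification, and the function name `classify_identifier` is hypothetical.

```python
import re
import uuid

# Patterns inferred from the examples in Table 1 (assumptions, not a GDC spec).
CASE_SUBMITTED_ID = re.compile(r"^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}$")  # e.g. TCGA-CJ-4642
PROJECT_ID = re.compile(r"^TCGA-[A-Z]+$")                           # e.g. TCGA-BRCA


def classify_identifier(value: str) -> str:
    """Return a coarse label for a GDC-style identifier string."""
    if CASE_SUBMITTED_ID.match(value):
        return "case submitted ID"
    if PROJECT_ID.match(value):
        return "project ID"
    try:
        # File UUIDs and case UUIDs share one format and cannot be
        # distinguished from the string alone.
        uuid.UUID(value)
        return "UUID (file or case)"
    except ValueError:
        return "other (e.g. file submitted ID)"


for example in ("TCGA-CJ-4642", "TCGA-BRCA",
                "942c0088-c9a0-428c-a879-e16f8c5bfdb8"):
    print(example, "->", classify_identifier(example))
```

Because a file UUID and a case UUID look identical, the sketch deliberately reports them under one label; resolving which is which requires the accompanying metadata.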

Table 2

Description of data types and their access level

Data type	Description	Access Level
Aligned Reads	Raw sequencing data	Controlled
Raw Simple Somatic Mutation	Raw mutation information data	Controlled
Annotated Somatic Mutation	Annotated mutation information data	Controlled
Aggregated Somatic Mutation	Aggregated mutation information data	Controlled
Masked Somatic Mutation	Transformed mutation information data	Open
Gene Expression Quantification	Gene expression data	Open
Copy Number Segment	Copy number information data	Open
Masked Copy Number Segment	Transformed copy number information data	Open
Methylation Beta Value	Methylation data	Open
Isoform Expression Quantification	Mature miRNA expression data	Open
miRNA Expression Quantification	miRNA expression data	Open
Biospecimen Supplement	Biospecimen information	Open
Clinical Supplement	Clinical information	Open
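
When planning a download, it can help to separate the data types that are openly available from those that need approved access. The sketch below simply transcribes Table 2 into a lookup; the names `ACCESS_LEVEL` and `open_access_types` are illustrative and not part of any GDC tool.

```python
# Access levels transcribed from Table 2.
ACCESS_LEVEL = {
    "Aligned Reads": "Controlled",
    "Raw Simple Somatic Mutation": "Controlled",
    "Annotated Somatic Mutation": "Controlled",
    "Aggregated Somatic Mutation": "Controlled",
    "Masked Somatic Mutation": "Open",
    "Gene Expression Quantification": "Open",
    "Copy Number Segment": "Open",
    "Masked Copy Number Segment": "Open",
    "Methylation Beta Value": "Open",
    "Isoform Expression Quantification": "Open",
    "miRNA Expression Quantification": "Open",
    "Biospecimen Supplement": "Open",
    "Clinical Supplement": "Open",
}


def open_access_types(levels=ACCESS_LEVEL):
    """Return the data types that can be downloaded without special permission."""
    return sorted(name for name, level in levels.items() if level == "Open")


print(open_access_types())
```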


Overview and categories of public Web-based tools for analyzing TCGA data

Owing to the large amount of genomic data available, specialized Web-based tools have been developed to aid clinicians and researchers in their analysis and interpretation of available data types in a meaningful way. Here, we have attempted to build an exhaustive list of Web-based tools that are publicly available for the analysis of TCGA data. In addition, we have classified these tools into specific categories. Table 3 provides a detailed list of the Web-based tools that represent the main resources currently available for analyzing TCGA data. Many useful indices are also indicated to facilitate the selection of tools according to a particular need. Furthermore, an enumeration of all back-end databases used, as well as main analysis content, uniform resource locator (URL), visualization type, download, batch query and application programming interface (API) availability, is presented. In the sections of each category below, and in Tables 3–4, the tools are presented in alphabetical order. To further distinguish and guide the selection of these available tools, we have divided our systematic exploration into three main categories as follows: (1) Global analysis; (2) Target analysis; and (3) Auxiliary analysis.

Table 3

List of Web servers and databases

Name	Databases	Batch queries	Mutation analysis	Correlation analysis	Differential expression analysis	Pathway analysis	Kaplan–Meier plots	Pan-cancer analysis	Visualization type	Download	API	URL
BCMD	TCGA	No	No	No	No	No	No	No	Image	No	No	http://tcga.lbl.gov:9999/
Broad GDAC Firehose	TCGA	No	Yes	Yes	Yes	Yes	Yes	Yes	Matrix Histogram	Yes	Yes	http://gdac.broadinstitute.org/
Cancer Landscapes	TCGA	No	No	Yes	No	Yes	Yes	Yes	Networks Matrix	Yes	No	http://cancerlandscapes.org/
Cancer3D	TCGA CCLE	No	Yes	No	No	No	No	No	Genomic coordinates Network Scatter plots/box plots 3D structure	Yes	No	http://www.cancer3d.org
canEvolve	TCGA ICGC GEO	Yes	No	Yes	Yes	Yes	Yes	No	Heatmap Network Plots	Yes	No	http://www.canevolve.org/
cBioPortal	TCGA CCLE	Yes	Yes	Yes	Yes	No	Yes	Yes	Networks Matrix Heatmaps	Yes	Yes	http://cbioportal.org
CDSA	TCGA	No	No	No	No	No	No	No	Image	No	No	http://cancer.digitalslidearchive.net/
CELLX	TCGA CCLE GEO GSK GTEx	Yes	Yes	Yes	Yes	No	Yes	No	Heatmap Matrix	Yes	No	http://cellx.sourceforge.net
GDISC	TCGA	No	No	Yes	No	No	Yes	No	Matrix Box plots	Yes	No	https://gdisc.bme.gatech.edu
GEPIA	TCGA GTEx	Yes	No	Yes	Yes	No	Yes	No	Matrix Bar graph Box plots/violin plots/dot plots	Yes	Yes	http://gepia.cancer-pku.cn/
IntOGen	TCGA ICGC	Yes	Yes	No	No	No	No	Yes	Heatmap Matrix Histogram	Yes	No	https://www.intogen.org/search
KMplotter	TCGA GEO EGA	Yes	No	No	No	No	Yes	No	Linear plots	Yes	No	http://kmplot.com/analysis/
MethHC	TCGA	Yes	No	Yes	No	Yes	No	No	Matrix Heatmaps	Yes	No	http://methhc.mbc.nctu.edu.tw
MEXPRESS	TCGA	No	No	Yes	Yes	No	No	No	Genomic coordinates	Yes	Yes	http://mexpress.be/
OASISPRO	TCGA	No	No	Yes	No	No	Yes	No	Histogram linear plots/box plots	Yes	No	http://tinyurl.com/oasispro
OncoScape	TCGA CCLE	Yes	No	No	Yes	Yes	No	No	Heatmap Pathway maps Matrix Scatter plot	Yes	No	http://oncoscape.nki.nl/
PathwayMapper	TCGA	No	No	No	No	Yes	No	No	Pathway maps	Yes	Yes	http://pathwaymapper.org
PROGgeneV2	TCGA GEO NKI	Yes	No	No	No	No	Yes	No	Linear plots	Yes	No	http://www.compbio.iupui.edu/proggene
Regulome Explorer	TCGA	No	No	Yes	No	Yes	No	Yes	Circos Genomic coordinates Network Matrix	Yes	No	http://explorer.cancerregulome.org/all_pairs/
TANRIC	TCGA CCLE	No	Yes	Yes	Yes	No	Yes	No	Heatmaps	Yes	No	http://ibl.mdanderson.org/tanric/_design/basic/index.html
TCGA Clinical Explorer	TCGA	No	Yes	Yes	No	No	Yes	No	Matrix Histogram	Yes	No	http://genomeportal.stanford.edu/pan-tcga/
TCGA Mbatch	TCGA	No	No	No	No	No	No	No	Matrix PCA diagrams Hierarchical clustering diagrams	Yes	No	http://bioinformatics.mdanderson.org/tcgambatch/
TCGA NG-CHM	TCGA	No	No	Yes	No	Yes	No	Yes	Heatmaps	Yes	No	http://bioinformatics.mdanderson.org/chm
TCGA SpliceSeq	TCGA	No	No	No	No	No	No	No	Matrix	Yes	No	http://bioinformatics.mdanderson.org/TCGASpliceSeq/
TCGA4U	TCGA	Yes	Yes	No	Yes	No	Yes	No	Heatmap Matrix Histogram	Yes	No	http://www.tcga4u.org:8888
TCIA	TCGA	No	No	No	No	No	No	No	Image	Yes	Yes	http://www.cancerimagingarchive.net
TCPA	TCGA	No	No	Yes	Yes	No	Yes	No	Networks Heatmaps	Yes	No	http://www.tcpaportal.org/tcpa/
UALCAN	TCGA	Yes	No	No	Yes	No	Yes	No	Heatmap Boxplots Linear plots	Yes	No	http://ualcan.path.uab.edu/tutorial.html
UCSC Xena	TCGA GDC ICGC GTEx TARGET TOIL	No	Yes	No	No	No	Yes	Yes	Heatmaps Scatter plot Histogram	Yes	Yes	http://xena.ucsc.edu/getting-started/
Vanno	TCGA	No	Yes	No	No	No	No	No	Circos Matrix 3D structure Heatmap	Yes	No	http://cgts.cgu.edu.tw/vanno
Wanderer	TCGA	No	No	Yes	Yes	No	No	No	Genomic coordinates Scatter plot	Yes	Yes	http://maplab.cat/wanderer
Zodiac	TCGA	Yes	No	Yes	No	No	No	Yes	Matrix Circular network	No	No	http://www.compgenome.org/zodiac2/
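
The Yes/No columns of Table 3 lend themselves to a simple programmatic filter when shortlisting tools for a given need. The sketch below is illustrative only: it transcribes four of the feature columns for six of the 32 rows, and the `Tool` type and `select_tools` helper are hypothetical names introduced here.

```python
from typing import NamedTuple


class Tool(NamedTuple):
    name: str
    mutation: bool       # "Mutation analysis" column
    kaplan_meier: bool   # "Kaplan–Meier plots" column
    pan_cancer: bool     # "Pan-cancer analysis" column
    api: bool            # "API" column


# A handful of rows transcribed from Table 3 (subset of rows and columns).
TOOLS = [
    Tool("Broad GDAC Firehose", True, True, True, True),
    Tool("cBioPortal", True, True, True, True),
    Tool("GEPIA", False, True, False, True),
    Tool("IntOGen", True, False, True, False),
    Tool("KMplotter", False, True, False, False),
    Tool("UCSC Xena", True, True, True, True),
]


def select_tools(tools, **required):
    """Return names of tools whose features match every keyword given."""
    return [
        t.name for t in tools
        if all(getattr(t, feature) == wanted for feature, wanted in required.items())
    ]


# Tools offering survival plots that can also be scripted through an API.
print(select_tools(TOOLS, kaplan_meier=True, api=True))
```

Extending the list to all rows and columns of Table 3 would follow the same pattern.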

Additional databases and Web servers

In Table 4, an additional 29 online resources are provided. In these tools, TCGA data are not the major analysis object, and many of the tools do not access TCGA data unless an upgraded version is used.

Table 4

Additional databases and Web servers

Name	Content	URL
AnimalTFDB 2.0	Animal transcription factors	http://bioinfo.life.hust.edu.cn/AnimalTFDB/
ArrayMap	A resource for genomic copy number profiles of human tumors	http://www.arraymap.org
BloodSpot	Gene expression profiles and transcriptional programs for healthy and malignant hematopoiesis	www.bloodspot.eu
BreCAN-DB	Break point profiles of cancer genomes	http://brecandb.igib.res.in
Cancer RNA-Seq Nexus	Phenotype-specific transcriptome profiling	http://syslab4.nchu.edu.tw/CRN
canSAR	Cancer research and drug discovery	http://cansar.icr.ac.uk/
ccmGDB	Cancer cell metabolism gene	http://bioinfo.mc.vanderbilt.edu/ccmGDB
CGWB	A computational platform to integrate clinical tumor mutation profiles with the reference human genome	https://cgwb.nci.nih.gov/
ChimerDB 3.0	Fusion gene	http://ercsb.ewha.ac.kr/fusiongene/
ChIPBase v2.0	Transcriptional regulatory networks of noncoding RNAs and protein-coding genes	http://rna.sysu.edu.cn/chipbase/
CMPD	Cancer mutant proteome database	http://cgbc.cgu.edu.tw/cmpd
COSMIC	Somatic mutations in human cancer	http://cancer.sanger.ac.uk
dbDEMC 2.0	Differentially expressed miRNAs in human cancer	http://www.picb.ac.cn/dbDEMC
DBTSS	Transcriptome, epigenome and genome sequence variation data	http://dbtss.hgc.jp/
DiseaseMeth	Human disease methylation database	http://bioinfo.hrbmu.edu.cn/diseasemeth/
DriverDBv2	Human cancer driver gene	http://ngs.ym.edu.tw/driverdb
LNCediting	A database for functional effects of RNA editing in lncRNAs	http://bioinfo.life.hust.edu.cn/LNCediting/
lncRNASNP	SNPs in lncRNAs	http://bioinfo.life.hust.edu.cn/lncRNASNP/
miRTarBase 2016	MiRNA database	http://miRTarBase.mbc.nctu.edu.tw/
Mutagene	Cancer genetic heterogeneity	https://www.ncbi.nlm.nih.gov/projects/mutagene/
MutationAligner	Recurrent mutation hot spots	http://www.mutationaligner.org
mutLBSgeneDB	Mutated ligand-binding site gene DataBase	http://zhaobioinfo.org/mutLBSgeneDB
NetGestalt	Multidimensional omics data	http://www.netgestalt.org
Oncotator	Cancer variant annotation tool	http://www.broadinstitute.org/oncotator/
PhosphoSitePlus	Protein posttranslational modifications	http://www.phosphosite.org/
POSTAR	Posttranscriptional regulation	http://postar.ncrnalab.org/
RBP-Var	Functional variants involved in regulation mediated by RNA-binding proteins	http://www.rbp-var.biols.ac.cn/
WebGestalt 2017	Enrichment analysis	http://www.webgestalt.org
YM500v2	MiRNAs for human cancer	http://ngs.ym.edu.tw/ym500/

Global analysis

Global analysis tools allow users to examine the overall features of cancer genomes, and they can be a valuable resource for scientists who have just started to study cancer genomic data. There are two types of global analysis tools: type I and type II. The former only provides a global analysis, while the latter provides selected target analysis in addition to global analysis.

Type I

Broad GDAC Firehose

Broad GDAC Firehose (http://gdac.broadinstitute.org/) is a Web portal site developed by the Broad Institute to perform automated analyses of TCGA data for general users. Preprocessed annotated data and association analysis across all types of data, including clinical data, are provided. For example, it can provide a list of genes whose copy number alterations, methylation status, mRNA expression and mutations significantly correlate with tumor stage and patient survival, gender, age and ethnic background. Gene expression across all cancer types can also be easily assessed at the Firebrowse Web portal (http://firebrowse.org/).

Cancer Landscapes

Cancer Landscapes [17] is a Web-based tool that derives data networks by using a newer data-driven modeling method that is based on generalized sparse inverse covariance selection. This tool integrates genetic, epigenetic and transcriptional data from multiple cancers. Users are provided with interactive Web content that visualizes constructed network models based on statistical optimization.

canEvolve

The Web portal canEvolve [18] stores functional genomics and other large-scale data on cancer, including gene and miRNA expression profiles and copy number changes. This tool provides users with easy access to information and analysis results derived from primary, integrative and network analyses of oncogenomic data that are generated by using various functional genomics platforms. The algorithms used for the analysis pipelines were selected based on the creators’ experience in creating and using such tools to generate biologically relevant hypotheses.

Regulome Explorer

Regulome Explorer [19] is a Web tool that integrates associations between clinical and molecular features of TCGA data. This tool enables users to search and visualize analytical data that are filtered according to user-specified parameters. All data types are mapped to a circos plot with genomic coordinates. There are other views available, which can be used to evaluate associations, including graphs and tables. Two-dimensional distributions of feature pairs (identified by association analysis) are also provided. Correlation of features is represented as edges between corresponding nodes.

TCGA Mbatch

TCGA Mbatch (http://bioinformatics.mdanderson.org/tcgambatch/) allows the user to assess and quantify the presence of any batch effects in a given TCGA data set via algorithms such as hierarchical clustering and principal component analysis. The results from these algorithms are then presented graphically as both simple and interactive diagrams. If significant batch effects are observed in the data, the user has the option to download data that have been computationally corrected according to methods such as Empirical Bayes (ComBat), Median Polish and analysis of variance.

TCGA Next-Generation Clustered Heatmaps

TCGA Next-Generation Clustered Heatmaps (TCGA NG-CHM) (http://bioinformatics.mdanderson.org/chm) is a tool that creates interactive large-scale visualizations of data based on a classic heat map approach. The user is able to zoom and pan across a heatmap, alter its color scheme, generate production quality PDFs and access rows, columns and individual heatmap entries that are related to statistics, databases and other information. TCGA NG-CHM also provides pathway and gene ontology (GO) information, chromosomal interactive ideograms, rapid recoloring, high-resolution graphics output and links to public information resources (e.g. cBioPortal) regarding genes, proteins, pathways and drugs.

The Cancer Proteome Atlas

The Cancer Proteome Atlas (TCPA) [20] is a portal for accessing proteomic data available from the TCGA project, which were generated with extensively validated antibodies for nearly 200 proteins and phosphoproteins. Correlation analyses can be performed between proteins and for associations between proteins and patient prognosis. In addition to TCGA data, TCPA can also access data from established cancer cell lines and can provide validation of findings from TCGA RPPA data through independent sample cohorts.

Type II

MethHC

MethHC [21] is a database that integrates a large collection of DNA methylation data and mRNA/miRNA expression profiles in human cancers, and also identifies correlations between DNA methylation and mRNA/miRNA expression data from TCGA. The methylation data span gene regions [e.g. promoter, enhancer, 5′ untranslated region (UTR), first exon, gene body and 3′ UTR] and CpG islands (e.g. regions, shelves and shores). MethHC also provides methylation patterns of different cancers with hierarchical clustering graphs. Users can easily obtain 250 hypermethylated genes, 250 hypomethylated genes and 250 of the most differentially methylated genes for particular cancer types.

Omics Analysis System for Precision Oncology

Omics Analysis System for Precision Oncology (OASISPRO) [22] is an online platform that is designed to mine quantitative omics information from TCGA. This tool can effectively visualize patients’ clinical profiles and other omics data and can evaluate prediction performance by using held-out test sets. OASISPRO is also rather unique in that it uses a machine learning method.

OncoScape

OncoScape [23] is an R package software for cancer gene prioritization that has a Web portal for interactive analyses. OncoScape can access five complementary data types across 11 different cancers to identify new candidate cancer genes and explore cancer aberrations by using a fusion of genomic data. For example, with this tool, molecular profiling data of two groups of samples can be compared to identify genes that exhibit significant differences. OncoScape can also perform analyses of gene expression, DNA copy number, DNA methylation, mutation and short hairpin RNA (shRNA) knock-down data. Users can explore candidate genes for each cancer type and upload their own gene list to obtain a detailed aberration profile. OncoScape can provide box plots that show log changes in gene expression (e.g. copy number data) for tumor and normal samples, and can provide an overview of the prioritization scores in genomic regions and pathway diagrams.

TCGA Clinical Explorer

TCGA Clinical Explorer [24] enables the cancer research community and others to explore clinically relevant associations inferred from TCGA data. With its accessible Web and mobile interfaces, users can examine queries and test hypotheses regarding genomic/proteomic alterations across a broad spectrum of malignancies. This tool also summarizes TCGA clinical parameters and translates these data into a list of clinically relevant cancer drivers, including genes, miRNAs and proteins. All analyses include 25 cancer types and 18 clinical parameters. Users can query TCGA data in multiple ways, including searching for clinically relevant gene/protein/miRNAs by name, cancer type or clinical parameter; profiling genomic/proteomic changes according to clinical parameters in a cancer type; and testing two-hit hypotheses.

TCGA SpliceSeq

TCGA SpliceSeq [25] investigates cross-tumor and tumor-normal alterations in mRNA splicing patterns of TCGA RNASeq data. Percent Spliced In (PSI) values for splice events derived from 33 different types of tumor samples, including available adjacent normal samples, have been loaded into this tool. As a result, users can investigate the splicing pattern of a gene of interest in a variety of tumor types. TCGA SpliceSeq also provides knowledge discovery via genome-wide PSI splice event searches to locate significant splice variations among tumor types, or between tumor and normal tissue, and these splicing data can be downloaded for integrative analyses.
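
As a point of reference for the PSI values mentioned above, the sketch below computes PSI under its commonly used definition: the fraction of reads supporting inclusion of an exon among all reads supporting either inclusion or exclusion. The exact read-counting and normalization used by TCGA SpliceSeq may differ, and the function name and toy read counts are assumptions.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """PSI = inclusion / (inclusion + exclusion); None when there is no coverage."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # the splice event is not covered in this sample
    return inclusion_reads / total


# Toy example: compare one splice event between a tumor and a normal sample.
tumor_psi = percent_spliced_in(30, 10)
normal_psi = percent_spliced_in(12, 36)
print(tumor_psi, normal_psi, tumor_psi - normal_psi)
```

The difference between the two values (often called delta PSI) is the kind of quantity a tumor-versus-normal comparison of splicing would rank.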

Target analysis

Target analysis is the category of public Web-based tools that is most often used by researchers. These tools allow researchers to investigate a target of interest with in-depth analyses of gene(s) and miRNAs.

Cancer3D

Cancer3D [26] is a public database that analyzes cancer missense mutations in the context of protein structures. It also allows users to explore two different cancer-related problems at the same time, e.g. drug sensitivity/biomarker identification and prediction of cancer drivers. In addition, somatic missense mutations from TCGA and Cancer Cell Line Encyclopedia (CCLE) can be mapped onto >24 300 structures, as well as onto 1300 potential novel protein domains.

cBioPortal

The cBioPortal [27] for Cancer Genomics offers one of the best Web-based tools for beginners who have limited experience analyzing genomic data and only want to analyze a limited number of genes. The cBioPortal is an open-access resource that was developed at the Memorial Sloan Kettering Cancer Center (MSKCC) for the visualization, analysis and download of large-scale cancer genomics data sets. It allows users to search gene(s) of interest in certain cancers or among all cancers in TCGA data, while providing a flexible interface for working with multiple data sets and easy-to-use visualization options. The cBioPortal also offers correlation plots for expression and copy number alterations or methylation of genes, an ability to assess clinical relevance of genes with Kaplan–Meier plots, co-expression analysis and network analysis. Additionally, the portal facilitates interactive explorations of custom data sets with access to the OncoPrinter and MutationMapper Web tools. OncoPrint diagrams provide intuitive summaries of genomic alterations such as somatic mutations and copy number alterations across a set of samples, while MutationMapper provides a summary diagram of mutations on a linear protein map that has links to a database of three-dimensional (3D) protein structures for the user to examine the potential effects of the mutations identified.

Gene Expression Profiling Interactive Analysis

Gene Expression Profiling Interactive Analysis (GEPIA) [28] is a Web-based tool that rapidly delivers customizable functionalities based on TCGA and GTEx data. GEPIA provides key interactive and customizable functions that include differential expression analysis, profiling plotting, correlation analysis, patient survival analysis, similar gene detection and dimensionality reduction analysis.

IntOGen

IntOGen [29] is a Web platform that can identify cancer drivers across tumor types and perform a systematic analysis of the most up-to-date large data sets of tumor somatic mutations. The IntOGen pipeline integrates the results of tumor genome studies conducted with different mutation-calling workflows, and it is scalable to hundreds of thousands of tumor genomes. This tool can also compute the frequency of mutation for individual genes and/or pathways within a project or cancer site, detect a subset of novel candidate drivers and download driver mutations from previous studies.

KMplotter

KMplotter is an online tool that draws survival plots, which can be used to assess the relevance of gene expression levels on clinical outcome for treated and untreated cancer patients. Data are derived from gene expression, relapse-free survival and overall survival data that are downloaded from Gene Expression Omnibus (GEO) (Affymetrix microarrays only), European Genome-phenome Archive (EGA) and TCGA. Specifically, survival analyses can be performed for mRNAs from four cancer types (breast, ovarian, lung and gastric) and for miRNAs from two cancer types (breast and liver) [30].
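
The survival plots described above are based on the Kaplan–Meier estimator. The following is a minimal, generic sketch of that estimator in plain Python; it is not KMplotter's code, and the function name and toy data are assumptions. In practice, patients would first be split into groups (for example, high versus low expression of a gene) and one curve estimated per group.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time with at least one observed event.

    times  -- follow-up time of each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]         # remove both events and censored patients
    return curve


# Toy cohort: five patients, two of them censored.
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```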

MEXPRESS

MEXPRESS [31] is a straightforward and easy-to-use Web tool that integrates and visualizes gene expression, DNA methylation and clinical TCGA data on a single-gene level. It also reports correlations among these data types, and it can display the different data types together for hundreds of samples. Currently, the developer of this tool is also looking into updating MEXPRESS to use the new repository of TCGA data.

PROGgeneV2

PROGgeneV2 [32] is a tool that allows researchers to use publicly available data to study the prognostic implications of genes of interest in multiple cancers. For example, this tool can be used to generate survival plots according to the expression profiles of target genes in selected data sets from multiple cancers. Furthermore, either single genes or sets of genes can be tested for their association with patient prognosis. A companion tool, PROGmiRV2 [33], provides survival analyses for miRNAs, and its usage is similar to that of PROGgeneV2.

TANRIC

TANRIC [34] is an open-access resource for investigating the function and clinical relevance of long noncoding RNAs (lncRNAs) in cancer. TANRIC provides three analysis modules that enable users to examine the function and underlying mechanisms of lncRNAs. It can characterize the expression profiles of lncRNAs in large patient cohorts of up to 20 cancer types, including TCGA, CCLE and other independent data sets. Users can examine whether lncRNAs exhibit differential expression profiles between tumor and normal samples, or among tumor subgroups. Possible correlations between lncRNAs and patient survival time can also be identified, while correlations between lncRNAs and various molecular data for protein-coding and miRNA genes can be explored.

TCGA4U

TCGA4U [35] is a tool that provides visualizations of the relationship between cancer genomic alterations and clinical data. This Web tool can apply four types of data (somatic mutation, DNA methylation, gene expression and copy number variants) for specific genes or gene lists to five types of cancer (lung squamous cell carcinoma, breast invasive carcinoma, colon adenocarcinoma, lung adenocarcinoma and rectum adenocarcinoma). Using specific genes and gene lists to analyze genomic alterations and characterize the molecular features of cancers, the tool performs cancer genomic mining with the following outputs: potential driver genes are identified, GO term maps are generated and survival analyses are conducted.

UALCAN

UALCAN [36] is an interactive Web portal that facilitates the study of gene expression variation and survival associations across tumors. All data are from the TCGA database. It can help researchers identify survival associations for any gene of interest, across different cancer types as well as cancer subtypes defined by various clinicopathologic features. The analysis results can be downloaded in several formats. Thus, this online tool can help cancer biologists and clinicians identify novel diagnostic and therapeutic targets and investigate the expression of a gene and its disease association in any particular cancer.

UCSC Xena

UCSC Xena (http://xena.ucsc.edu/getting-started/) is a new tool developed by the team behind the UCSC Cancer Browser, and it can analyze and visualize a user’s private functional genomics data sets in the context of public and shared genomic/phenotypic data sets. The Xena platform consists of a set of federated data hubs and the Xena browser. The latter integrates across the hubs, thereby providing one location at which to analyze and visualize data. There is a large public Xena hub that currently hosts an expanding set of searchable data from several large consortiums, including TCGA, GDC, International Cancer Genome Consortium (ICGC), Genotype-Tissue Expression (GTEx), Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and Scalable and Efficient Workflow Engine (TOIL). Dynamic Kaplan–Meier survival analyses can also be performed to assess survival according to certain parameters, and the data can be presented as visual spreadsheets, scatter plots and bar graphs.

Wanderer

Wanderer [37] is a public Web server for exploring and interpreting gene-associated expression and DNA methylation profiles for all of the cancer types available in TCGA. This tool also provides normal–tumor paired comparisons in the form of graphs and comprehensive tables.

Zodiac

Zodiac [38] is a search engine and computational tool that obtains multiple features of gene networks, including copy number, gene expression, methylation, mutation, miRNA and some protein expression data, to describe molecular interactions for approximately 200 million pairs of genes. Zodiac then integrates existing knowledge about cancer genetic interactions with a Bayesian graphical model of TCGA data to produce updated and data-enhanced knowledge. The results are organized into a comprehensive database that allows customized searches to be performed. Zodiac also provides data processing and analysis tools that allow users to customize prior networks and update genetic pathways of interest. Furthermore, this tool can be used to identify gene interactions, to discover potential drug targets, and to identify potential genetic aberrations such as gene fusions.

Auxiliary analysis

The third category of public Web-based tools translates TCGA data into an online resource that is easily accessed, browsed and downloaded. These data can help users complement their experimental results, or they can provide additional proof and explanation of their research for comprehensive biological discoveries.

BCMD

BCMD [39] is a platform that can be used to represent and characterize tumor histology, and it can additionally provide an integrated analysis with clinical outcome. Data and intermediaries for a number of tumor types are available, and it has an interface that allows for panning and zooming of whole-mount tissue sections with or without overlaid segmentation results for quality control.

CDSA

CDSA [40] provides interactive tools for viewing and annotating diagnostic and tissue slide images of different tumor types from the TCGA project. Currently, it hosts >20 000 whole-slide images from 22 cancer types. This searchable resource provides users with an opportunity to identify and explore sets of images according to particular genomic, pathologic or clinical criteria. Thus, CDSA represents a valuable resource for the fields of imaging and pathology.

Cell Index Database

Cell Index Database (CELLX) [41] is an online resource that can be used to manage multidimensional genomics data sets that contain gene expression, copy number variations, mutations and compound sensitivity data. Users can visualize, analyze and download data in a preformatted table that is suitable for offline computation. This tool is valuable for computational biologists who would prefer greater control over their data or would like to integrate custom data that are not available in public databases.

Gene–Drug Interaction for Survival in Cancer

Gene–Drug Interaction for Survival in Cancer (GDISC) [42] is a Web portal that integrates gene copy number, drug exposure and patient survival data. It allows users to interactively explore gene–drug interactions that have been identified in the context of TCGA, and to examine their favorite combinations of gene, drug and cancer type. Moreover, GDISC provides a list of drug names found in all cancer types, which can facilitate drug-specific analyses.

PathwayMapper

PathwayMapper [43] is a collaborative visual Web editor for cancer pathways. It can be used for viewing precurated cancer pathways, and it provides an option to overlay genomic alteration data. It also has an interactive graphical editing tool for creating and modifying pathways, and it allows multiple users to curate a pathway together in real time, with support for concurrent modifications and built-in conflict resolution. Finally, users can import data from the cBioPortal and export pathway images with alteration frequencies.

TCIA

TCIA [44] is a service created by the National Cancer Institute (NCI) to collect and share a large amount of radiological imaging data available from TCGA cases to support imaging phenotype–genotype research. Users can share or find research-relevant clinical image data collections and download detailed image files.

Vanno

Vanno [45] is a comprehensive variant annotation tool for the visualization and analysis of genetic alteration profiles. It provides an integrated framework for the functional analysis of genomic variants, and its Web portal for comparing in-house data with TCGA data supports efforts to comprehensively identify disease-relevant variations.

Case studies

The case studies presented here elaborate on five different cancer genomic research questions that can be answered visually with the available tools and resources described above. These case studies encompass major cancer research efforts, and they provide examples for the application of online tools for TCGA data analysis.

Patterns in global alteration profiles

Various alteration phenotypes have been observed in cancer cells. One of the most conspicuous of these is the mutation phenotype [46], where tumor cells exhibit an abnormally high mutation burden. Somatic mutation patterns have been described for malignant melanoma [47], small cell lung carcinoma [48], acute lymphoblastic leukemia [49], colorectal cancer [10], kidney cancer [50] and lung cancer [51]. These studies have demonstrated the value of whole-genome sequencing for obtaining global alteration profiles and analyzing the patterns observed. Broad GDAC Firehose is a good Web-based tool for exploring global alteration profiles. In this portal, the cancer type for mutation analysis can be specified directly, and a wealth of analysis content can be selected, including aggregate analysis, correlation analysis with mutation and several mutation analysis methods such as MutSig v2.0 (Figure 2A). The online results give users access to both standard data packages (right column) and a standard analysis suite (left column). Analysis results may also be accessed from the unified reports. Furthermore, the results of an analysis can be downloaded in PDF format, and this online tool has an interactive API for fine-grained querying of results via the Web. Another tool, Cancer Landscapes, provides high-performance statistical network modeling of multiple human cancers. Colors are used to represent different cancer types, and shapes represent different types of data. Users first select one of the multicancer models for further analysis. The system then loads the model, in which different data types and cancers are represented as specific shapes and colors. Users can click on nodes to view the details of a local network and associated pathways (Figure 2B). In this exploration view, users can switch between different data types, adjust the optimization parameters and organize the network.

Figure 2

Two explorations of global alteration profile patterns as provided by the publicly accessible Broad GDAC Firehose and Cancer Landscapes Web tools. (A) This window view displays the user interface of Broad GDAC Firehose, where users can choose a specific mutation analysis method. (B) This window provides network modeling of multiple cancers and data sets, as indicated by the data sets and data types that were selected at the far right in Cancer Landscapes.

Exploration of cancer drivers

Distinguishing the alterations that give cancer cells a selective advantage (drivers) from those that are merely side effects of a destabilized cancer genome (passengers) is a major problem in oncogenomics research. Many studies have focused on the identification of novel cancer genes in different cancer types, including acute lymphoblastic leukemia [52], acute myeloid leukemia [53], breast cancer [54, 55], glioblastoma [56] and liver cancer [57]. Different tools address this problem with different methods that exploit the properties of driver genes. Here, we selected two Web-based tools, OncoScape and IntOGen, to explore this problem. OncoScape draws on five complementary data types (copy number, gene expression, DNA methylation, somatic mutation and shRNA) to identify new candidate cancer genes, with screening parameters and thresholds selected by the user. All functional modules are easily found in the toolbar at the top of the page, and ‘Top Candidate Genes’ is the module that looks for candidate cancer genes. We used a combined score with a cutoff value of ≥3 to identify drivers for lung adenocarcinoma (Figure 3A); the combined score and cutoff values are described in detail in the ‘FAQ’. Meanwhile, IntOGen directly provides driver genes for the selected cancer type based on the frequency of mutations. In addition, users can upload their own data for analysis of somatic mutations. Here, we used the public data set in this tool to perform somatic mutation analysis for a specific cancer type. Figure 3B shows the most recurrently mutated cancer driver genes in lung adenocarcinoma. Each bar of the histogram indicates the number of samples with protein-affecting mutations. OncoScape and IntOGen identified 22 and 169 driver genes, respectively.
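The combined-score screening can be illustrated with a small sketch. It assumes, for illustration only, that each of the five data types contributes one point when a gene shows an aberration in that data type and that the combined score is the sum of those points; the authoritative definition is the one given in the OncoScape ‘FAQ’. The gene names and evidence flags below are hypothetical.

```python
DATA_TYPES = ("copy_number", "expression", "methylation", "mutation", "shrna")


def combined_score(flags):
    """Number of data types in which the gene is flagged as aberrant (assumed scoring)."""
    return sum(1 for data_type in DATA_TYPES if flags.get(data_type, False))


def candidate_genes(evidence, cutoff=3):
    """Genes whose combined score reaches the cutoff, in alphabetical order."""
    scored = {gene: combined_score(flags) for gene, flags in evidence.items()}
    return sorted(gene for gene, score in scored.items() if score >= cutoff)


# Hypothetical per-gene evidence; missing data types count as "not aberrant"
evidence = {
    "GENE_A": {"copy_number": True, "expression": True, "mutation": True},
    "GENE_B": {"expression": True, "methylation": True},
    "GENE_C": {data_type: True for data_type in DATA_TYPES},
}
candidates = candidate_genes(evidence)
```

Raising the cutoff trades sensitivity for specificity, which is one reason two tools with different criteria can return candidate lists of very different sizes for the same cancer type.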

Figure 3

An exploration of driver genes associated with lung adenocarcinoma was conducted in OncoScape (A) and IntOGen (B). The two windows display different formats for the results obtained.

Stratification of cancer patients

It is necessary for cancers to be properly classified to achieve effective clinical management and meaningful laboratory investigations of underlying cancer mechanisms. While tumors may appear similar when examined with conventional diagnostic methods, they may look markedly different from a molecular viewpoint, and this can lead to differences in outcome and treatment response. Therefore, the molecular features of tumors can be used to stratify patients to support more accurate clinical and therapeutic decisions. Molecular stratification of tumors has been an important area of cancer research over the past few decades [58-61], and the studies performed have underscored the heterogeneous and complex nature of cancer subgroups. Molecular subtypes can be identified through different data types, including gene expression, copy number, DNA methylation and mutation data. Moreover, an integrated analysis is needed based on the different cancer characteristics. Currently, there are no tools that can directly provide stratification because of the complexity of this analysis. As a result, scientists need to combine many data types and clinical features for a comprehensive assessment. OASISPRO can identify genes that are strongly associated with tumor stage by applying user-selected machine learning algorithms to omic data and evaluating prediction performance by using held-out test sets (Figure 4). However, OASISPRO only focuses on the classification of clinical phenotypes, and it cannot synthesize a variety of data types. Users have to strictly follow the settings of the tool for step-by-step selection. In addition, OASISPRO can only use a single clinical feature parameter for analysis. Thus, OASISPRO would be useful for preliminary analyses and scientific hypotheses.
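The held-out evaluation described for OASISPRO can be sketched as follows. A nearest-centroid classifier is used here purely as a stand-in, because the algorithms that OASISPRO offers are selected by the user and are not specified in this survey; the feature values, stage labels and function names are all invented for the example.

```python
from statistics import mean


def fit_centroids(X, y):
    """Average feature vector per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Label of the centroid closest to x (squared Euclidean distance)."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def held_out_accuracy(X, y, test_idx):
    """Train on every sample outside test_idx, then score on the held-out samples."""
    held_out = set(test_idx)
    train_X = [x for i, x in enumerate(X) if i not in held_out]
    train_y = [lab for i, lab in enumerate(y) if i not in held_out]
    model = fit_centroids(train_X, train_y)
    hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)


# Toy two-gene expression profiles with tumor stage labels (hypothetical)
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
y = ["early", "early", "early", "late", "late", "late"]
accuracy = held_out_accuracy(X, y, test_idx=[2, 5])
```

Because the held-out samples never influence the centroids, the accuracy is an estimate of how the model would perform on unseen patients rather than a measure of fit.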

Figure 4

Views of interface windows in OASISPRO. (A) The stepwise selection of parameters for conducting a classification of clinical phenotypes is shown. (B) This window presents the input variables and results obtained from a representative analysis.

Correlation with multiple molecular features

Studies of correlations among multiple molecular features can provide valuable insight into complex biological systems. Individual data sets that include genomic, epigenomic, transcriptomic or proteomic information are highly informative, and the integration of these data sets offers an exciting potential to answer many long-standing questions. For example, integrated analyses of transcriptomic, proteomic and metabolomic data have helped researchers better understand global regulatory processes and complex metabolic networks in cancer [62, 63]. Many tools can provide correlation analyses for various molecular features. In fact, more than half of the tools included in our study can conduct a correlation analysis. Among them, Regulome Explorer stands out because correlation analysis is its major function. Users can select a data set to load and obtain a genome-level view of the correlations between different data types. This tool provides both circos plots and network representations of correlations between multi-omics features, and it includes nine data types (Figure 5). It can map multi-omics features onto genomic locations for further systems biology analyses. Moreover, the parameters of a correlation can be adjusted in a filter panel that is presented on the right side of the Web server, and both network maps and detailed data tables of correlations are provided.
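A pairwise correlation screen with an adjustable threshold, in the spirit of the filter panel described above, can be sketched in a few lines. This is an illustrative example that uses the Pearson coefficient; it does not reproduce the statistics used by Regulome Explorer, and the feature names, values and the 0.8 threshold are assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def strong_correlations(features, min_abs_r=0.8):
    """All feature pairs whose absolute correlation reaches the threshold."""
    hits = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= min_abs_r:
            hits.append((a, b, r))
    return hits


# Toy per-sample measurements for one gene (hypothetical values)
features = {
    "methylation": [0.1, 0.3, 0.5, 0.7, 0.9],
    "expression": [9.0, 7.0, 5.0, 3.0, 1.0],
    "copy_number": [2, 2, 3, 2, 2],
}
strong = strong_correlations(features)
```

In this toy example only the methylation and expression pair passes the filter, reflecting the inverse relationship built into the invented values.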

Figure 5

A representative window of the results provided by Regulome Explorer for a correlation analysis. This figure displays the main user interface, including the option for using multiple data types.

Survival analysis

Identification of prognostic biomarkers, which may include genes, polymorphisms, mutations, micromolecules or epigenetic regulators, represents a major advance in the field of cancer genomics. Cancer research predominantly focuses on specific patient populations for biomarker identification. Gene signatures have been developed specifically for prognostication in a particular subtype of a cancer, for instance, in a subgroup of patients treated with a specific drug. To date, gene signatures of prognostic importance have been reported for breast cancer [64, 65], colon cancer [66, 67], liver cancer [68], lung cancer [69, 70] and pancreatic cancer [71]. Generally, the primary end point of prognostic assessment is survival analysis, and patients are divided into good or bad prognosis groups based on weighted or unweighted expression of individual genes or groups of genes. This type of analysis provides a rationale for mechanistic studies, followed by therapeutic targeting. Web-based tools can be used to identify and expand prognostic biomarker targets in different cancers based on the publicly available data these tools have compiled. In addition to providing easy-to-perform prognostic analyses for multiple cancers, they can also be important hypothesis-generating tools for researchers working on topics related to cancer. Here, PROGgeneV2 and KMplotter were selected to perform test analyses. Users can select gene(s), cancer type, survival measure and the data set for specific parameter settings. The results of the survival analysis conducted by PROGgeneV2 are presented in a KM plot (Figure 6), while KMplotter could not provide results because of an insufficient number of TCGA samples. These results demonstrate that the parameters and data sources for Web-based tools are not exactly the same, as the number of lung adenocarcinoma samples obtained from TCGA differed between the two analysis programs. Therefore, users need to carefully consider the data being subjected to analysis and select appropriate parameters.
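The division of patients into prognosis groups by gene expression, followed by a comparison of their survival, can be sketched as below. A median split and a two-group log-rank test are used as common, standard choices for illustration; the source does not state which split or test each Web tool applies, and all values are invented.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median expression."""
    cut = median(expression)
    return ["high" if value > cut else "low" for value in expression]


def logrank_chi2(times, events, groups, group_a="high"):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed = expected = variance = 0.0
    records = list(zip(groups, times, events))
    for t in sorted({ti for _, ti, e in records if e}):
        at_risk = [(g, ti, e) for g, ti, e in records if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for g, _, _ in at_risk if g == group_a)
        d = sum(1 for _, ti, e in at_risk if ti == t and e)
        d_a = sum(1 for g, ti, e in at_risk if ti == t and e and g == group_a)
        observed += d_a
        expected += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance


# Toy cohort: expression of one gene, follow-up time, event flag (hypothetical)
expression = [1.0, 2.0, 8.0, 9.0]
times = [10, 12, 3, 5]
events = [1, 1, 1, 1]

groups = split_by_median(expression)
chi2 = logrank_chi2(times, events, groups)
p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function for 1 df
```

With only four patients the test has little power, which mirrors the point made above: the number of samples a tool actually holds for a cancer type determines whether a survival analysis is meaningful at all.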

Figure 6

A representative survival plot generated with PROGgeneV2. TP53 gene expression was applied to a lung adenocarcinoma data set from TCGA.

Usage advice

Our study sorted the online TCGA analysis tools into three categories, which users can use for a preliminary screening according to their own needs. Each tool within a category has unique features, as described above. Tools can also be chosen according to the cancer genomic research question, as described in the case studies. Finally, users need to weigh the specifics of their study, such as data sources, data types, analytical methods and research purposes, to determine the specific tool for further analysis. The following are specific suggestions for different analyses of TCGA data.

Mutation analysis

There are 10 online tools (Broad GDAC Firehose, Cancer3D, cBioPortal, CELLX, IntOGen, TANRIC, TCGA Clinical Explorer, TCGA4U, UCSC Xena and Vanno) that can perform mutation analysis. In general, we recommend cBioPortal because this tool covers a variety of cancer types and offers multiple visualizations, and it is powerful but easy to use.

Correlation analysis

There are 17 online tools (Broad GDAC Firehose, Cancer Landscapes, canEvolve, cBioPortal, CELLX, GDISC, GEPIA, MethHC, MEXPRESS, OASISPRO, Regulome Explorer, TANRIC, TCGA Clinical Explorer, TCGA NG-CHM, TCPA, Wanderer and Zodiac) that can perform correlation analysis. In general, we recommend Broad GDAC Firehose from the Broad Institute of MIT and Harvard, which has a variety of analysis algorithms available to users.

Differential analysis

There are 12 online tools (Broad GDAC Firehose, canEvolve, cBioPortal, CELLX, GEPIA, MEXPRESS, OncoScape, TANRIC, TCGA4U, TCPA, UALCAN and Wanderer) that can perform differential analysis. In general, we recommend GEPIA, an analysis tool for gene expression profiling. Differential analysis is this tool’s main analysis function, and the online analysis interface is simple and easy to understand.

Pathway analysis

There are eight online tools (Broad GDAC Firehose, Cancer Landscapes, canEvolve, MethHC, OncoScape, PathwayMapper, Regulome Explorer and TCGA NG-CHM) that can perform pathway analysis. We recommend Broad GDAC Firehose and OncoScape; the former has a variety of analysis methods, and the latter is simpler and more intuitive.

Survival analysis

There are 16 online tools (Broad GDAC Firehose, Cancer Landscapes, canEvolve, cBioPortal, CELLX, GDISC, GEPIA, KMplotter, OASISPRO, PROGgeneV2, TANRIC, TCGA Clinical Explorer, TCGA4U, TCPA, UALCAN and UCSC Xena) that can perform survival analysis. If users want to perform only this analysis, we recommend PROGgeneV2, which has a wide range of data sources and adjustable parameters for survival analysis.

Pan-cancer analysis

There are eight online tools (Broad GDAC Firehose, Cancer Landscapes, cBioPortal, IntOGen, Regulome Explorer, TCGA NG-CHM, UCSC Xena and Zodiac) that can perform pan-cancer analysis. In general, we recommend cBioPortal and Cancer Landscapes. The former has a large number of samples from pan-cancer studies and powerful analytical capabilities. The latter has a combined pan-cancer model for analysis.
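The tool lists in this section can be turned into a simple lookup for finding tools that cover several analysis types at once. The sketch below encodes the six lists exactly as given above; the dictionary keys and the function name are our own labels, introduced for illustration.

```python
TOOLS_BY_ANALYSIS = {
    "mutation": {
        "Broad GDAC Firehose", "Cancer3D", "cBioPortal", "CELLX", "IntOGen",
        "TANRIC", "TCGA Clinical Explorer", "TCGA4U", "UCSC Xena", "Vanno",
    },
    "correlation": {
        "Broad GDAC Firehose", "Cancer Landscapes", "canEvolve", "cBioPortal",
        "CELLX", "GDISC", "GEPIA", "MethHC", "MEXPRESS", "OASISPRO",
        "Regulome Explorer", "TANRIC", "TCGA Clinical Explorer", "TCGA NG-CHM",
        "TCPA", "Wanderer", "Zodiac",
    },
    "differential": {
        "Broad GDAC Firehose", "canEvolve", "cBioPortal", "CELLX", "GEPIA",
        "MEXPRESS", "OncoScape", "TANRIC", "TCGA4U", "TCPA", "UALCAN", "Wanderer",
    },
    "pathway": {
        "Broad GDAC Firehose", "Cancer Landscapes", "canEvolve", "MethHC",
        "OncoScape", "PathwayMapper", "Regulome Explorer", "TCGA NG-CHM",
    },
    "survival": {
        "Broad GDAC Firehose", "Cancer Landscapes", "canEvolve", "cBioPortal",
        "CELLX", "GDISC", "GEPIA", "KMplotter", "OASISPRO", "PROGgeneV2",
        "TANRIC", "TCGA Clinical Explorer", "TCGA4U", "TCPA", "UALCAN", "UCSC Xena",
    },
    "pan-cancer": {
        "Broad GDAC Firehose", "Cancer Landscapes", "cBioPortal", "IntOGen",
        "Regulome Explorer", "TCGA NG-CHM", "UCSC Xena", "Zodiac",
    },
}


def tools_supporting(*analyses):
    """Tools listed under every requested analysis type, sorted by name."""
    selected = [TOOLS_BY_ANALYSIS[name] for name in analyses]
    return sorted(set.intersection(*selected), key=str.lower)


mutation_and_survival = tools_supporting("mutation", "survival")
all_six = tools_supporting(*TOOLS_BY_ANALYSIS)
print("Mutation + survival:", ", ".join(mutation_and_survival))
print("All six analysis types:", ", ".join(all_six))
```

A lookup of this kind only reflects which analyses a tool offers; the choice among the matching tools still depends on the data sources, data types and research purpose discussed above.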

Discussion

The functionalities of a cancer can be better characterized by integrating information from different modalities. TCGA data were collected by using a number of different modalities, and data for several tumor types are available. Consequently, TCGA data represent a valuable resource for researchers to advance their understanding of various cancers and to facilitate the realization of precision medicine in oncology. Multilayer analyses performed on different platforms reflect distinct biological characteristics, and these provide a better understanding of cancer biology. As a result, improvements in patient stratification, identification of novel prognostic or predictive markers and the identification of novel therapeutic targets can be achieved. However, integrating information from different modalities to obtain a comprehensive analysis remains a prodigious challenge [72]. Many bioinformatics tools that are compatible with TCGA data have been developed for basic scientists who do not have extensive training in informatics, statistics or clinical knowledge. Correspondingly, the wealth of available tools for analysis and interpretation of data reflects the importance of TCGA and the dynamic nature of the field of data analysis. Therefore, the goal of this review was to provide a comprehensive introduction to publicly available Web-based resources and tools to help researchers select the appropriate tool for their needs. Thus, we organized these resource tools into three categories: global analysis, target analysis and auxiliary analysis. In addition, we provided five case studies, which demonstrate classic analysis methods along with corresponding tools. However, none of these tools completely replaces advanced computational and statistical methodologies. Moreover, it remains the responsibility of cancer researchers to understand this vast amount of data and translate it into testable hypotheses and novel diagnostic and therapeutic options for the clinic. To this end, it is our hope that the current survey will afford researchers the confidence needed to extend their current knowledge of cancer genomics and its complex details and networks to identify new approaches and targets for cancer treatment and prevention.

TCGA provides unprecedented opportunities to increase our knowledge of cancer and facilitate the realization of precision medicine in oncology. The most comprehensive and currently available Web servers and resources that assist with TCGA data analysis are enumerated. The tools are classified based on their different analysis modes to help researchers select the appropriate tool for their work. Case studies are provided, which further illustrate the roles of TCGA data analysis in five predominant areas of cancer research.

72 in total

1. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer.

Authors: Soonmyung Paik; Steven Shak; Gong Tang; Chungyeul Kim; Joffre Baker; Maureen Cronin; Frederick L Baehner; Michael G Walker; Drew Watson; Taesung Park; William Hiller; Edwin R Fisher; D Lawrence Wickerham; John Bryant; Norman Wolmark
Journal: N Engl J Med Date: 2004-12-10 Impact factor: 91.245

2. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository.

Authors: Kenneth Clark; Bruce Vendt; Kirk Smith; John Freymann; Justin Kirby; Paul Koppel; Stephen Moore; Stanley Phillips; David Maffitt; Michael Pringle; Lawrence Tarbox; Fred Prior
Journal: J Digit Imaging Date: 2013-12 Impact factor: 4.056

3. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1.

Authors: Roel G W Verhaak; Katherine A Hoadley; Elizabeth Purdom; Victoria Wang; Yuan Qi; Matthew D Wilkerson; C Ryan Miller; Li Ding; Todd Golub; Jill P Mesirov; Gabriele Alexe; Michael Lawrence; Michael O'Kelly; Pablo Tamayo; Barbara A Weir; Stacey Gabriel; Wendy Winckler; Supriya Gupta; Lakshmi Jakkula; Heidi S Feiler; J Graeme Hodgson; C David James; Jann N Sarkaria; Cameron Brennan; Ari Kahn; Paul T Spellman; Richard K Wilson; Terence P Speed; Joe W Gray; Matthew Meyerson; Gad Getz; Charles M Perou; D Neil Hayes
Journal: Cancer Cell Date: 2010-01-19 Impact factor: 31.743

4. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.

Authors: Katherine A Hoadley; Christina Yau; Denise M Wolf; Andrew D Cherniack; David Tamborero; Sam Ng; Max D M Leiserson; Beifang Niu; Michael D McLellan; Vladislav Uzunangelov; Jiashan Zhang; Cyriac Kandoth; Rehan Akbani; Hui Shen; Larsson Omberg; Andy Chu; Adam A Margolin; Laura J Van't Veer; Nuria Lopez-Bigas; Peter W Laird; Benjamin J Raphael; Li Ding; A Gordon Robertson; Lauren A Byers; Gordon B Mills; John N Weinstein; Carter Van Waes; Zhong Chen; Eric A Collisson; Christopher C Benz; Charles M Perou; Joshua M Stuart
Journal: Cell Date: 2014-08-07 Impact factor: 41.582

5. Subtypes of Pediatric High-Grade Gliomas ID'd.

Authors:
Journal: Cancer Discov Date: 2017-10-20 Impact factor: 39.397

6. Vanno: a visualization-aided variant annotation tool.

Authors: Po-Jung Huang; Chi-Ching Lee; Bertrand Chin-Ming Tan; Yuan-Ming Yeh; Kuo-Yang Huang; Ruei-Chi Gan; Ting-Wen Chen; Cheng-Yang Lee; Sheng-Ting Yang; Chung-Shou Liao; Hsuan Liu; Petrus Tang
Journal: Hum Mutat Date: 2015-02 Impact factor: 4.878

7. The landscape of cancer genes and mutational processes in breast cancer.

Authors: Philip J Stephens; Patrick S Tarpey; Helen Davies; Peter Van Loo; Chris Greenman; David C Wedge; Serena Nik-Zainal; Sancha Martin; Ignacio Varela; Graham R Bignell; Lucy R Yates; Elli Papaemmanuil; David Beare; Adam Butler; Angela Cheverton; John Gamble; Jonathan Hinton; Mingming Jia; Alagu Jayakumar; David Jones; Calli Latimer; King Wai Lau; Stuart McLaren; David J McBride; Andrew Menzies; Laura Mudie; Keiran Raine; Roland Rad; Michael Spencer Chapman; Jon Teague; Douglas Easton; Anita Langerød; Ming Ta Michael Lee; Chen-Yang Shen; Benita Tan Kiat Tee; Bernice Wong Huimin; Annegien Broeks; Ana Cristina Vargas; Gulisa Turashvili; John Martens; Aquila Fatima; Penelope Miron; Suet-Feung Chin; Gilles Thomas; Sandrine Boyault; Odette Mariani; Sunil R Lakhani; Marc van de Vijver; Laura van 't Veer; John Foekens; Christine Desmedt; Christos Sotiriou; Andrew Tutt; Carlos Caldas; Jorge S Reis-Filho; Samuel A J R Aparicio; Anne Vincent Salomon; Anne-Lise Børresen-Dale; Andrea L Richardson; Peter J Campbell; P Andrew Futreal; Michael R Stratton
Journal: Nature Date: 2012-05-16 Impact factor: 49.962

8. The transcriptional landscape and mutational profile of lung adenocarcinoma.

Authors: Jeong-Sun Seo; Young Seok Ju; Won-Chul Lee; Jong-Yeon Shin; June Koo Lee; Thomas Bleazard; Junho Lee; Yoo Jin Jung; Jung-Oh Kim; Jung-Young Shin; Saet-Byeol Yu; Jihye Kim; Eung-Ryoung Lee; Chang-Hyun Kang; In-Kyu Park; Hwanseok Rhee; Se-Hoon Lee; Jong-Il Kim; Jin-Hyoung Kang; Young Tae Kim
Journal: Genome Res Date: 2012-09-13 Impact factor: 9.043

9. GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses.

Authors: Zefang Tang; Chenwei Li; Boxi Kang; Ge Gao; Cheng Li; Zemin Zhang
Journal: Nucleic Acids Res Date: 2017-07-03 Impact factor: 16.971

10. Complex landscapes of somatic rearrangement in human breast cancer genomes.

Authors: Philip J Stephens; David J McBride; Meng-Lay Lin; Ignacio Varela; Erin D Pleasance; Jared T Simpson; Lucy A Stebbings; Catherine Leroy; Sarah Edkins; Laura J Mudie; Chris D Greenman; Mingming Jia; Calli Latimer; Jon W Teague; King Wai Lau; John Burton; Michael A Quail; Harold Swerdlow; Carol Churcher; Rachael Natrajan; Anieta M Sieuwerts; John W M Martens; Daniel P Silver; Anita Langerød; Hege E G Russnes; John A Foekens; Jorge S Reis-Filho; Laura van 't Veer; Andrea L Richardson; Anne-Lise Børresen-Dale; Peter J Campbell; P Andrew Futreal; Michael R Stratton
Journal: Nature Date: 2009-12-24 Impact factor: 49.962
