
A survey and evaluation of Web-based tools/databases for variant analysis of TCGA data.

Zhuo Zhang1, Hao Li1, Shuai Jiang1, Ruijiang Li1, Wanying Li1, Hebing Chen1, Xiaochen Bo1.   

Abstract

The Cancer Genome Atlas (TCGA) is a publicly funded project that aims to catalog and discover major cancer-causing genomic alterations with the goal of creating a comprehensive 'atlas' of cancer genomic profiles. The availability of this genome-wide information provides an unprecedented opportunity to expand our knowledge of tumourigenesis. Computational analytics and mining are frequently used as effective tools for exploring this byzantine series of biological and biomedical data. However, some of the more advanced computational tools are often difficult to understand or use, thereby limiting their application by scientists who do not have a strong computational background. Hence, it is of great importance to build user-friendly interfaces that allow both computational scientists and life scientists without a computational background to gain greater biological and medical insights. To that end, this survey was designed to systematically present available Web-based tools and facilitate the use of TCGA data for cancer research.
© The Author(s) 2018. Published by Oxford University Press.

Keywords:  The Cancer Genome Atlas; bioinformatics tools; cancer; databases; survey

Year:  2019        PMID: 29617727      PMCID: PMC6781580          DOI: 10.1093/bib/bby023

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


Introduction

Cancer continues to be a key field of interest for human geneticists, despite the complexities involved. Moreover, despite the frequency of cancer diagnoses, scientists still do not know the causes for many cancers, or how best to treat them. More recently, high-throughput DNA sequencing [1-3] has revolutionized the study of cancer, and the use of sequencing data to assist in diagnosis is generally referred to as precision medicine [4, 5]. Thus, advances in our understanding of the cancer genome have the potential to improve precision medicine for individuals. In particular, massive efforts to undertake parallel next-generation sequencing (NGS) have revolutionized most facets of scientific discovery, and they are also responsible for many advances in the application of genomic information to human health, particularly in the field of oncology. Regarding the latter, the potential utility of these data encompasses early detection, diagnosis, prognosis ascertainment, recurrence detection, risk assessment and treatment selection for many cancers. The Cancer Genome Atlas (TCGA) project [6] represents a significant advance in cancer genomics with its aim to provide a comprehensive catalog of key genomic changes that occur in major cancer types [7, 8]. In addition, these data facilitate more effective diagnoses, treatments and prevention. Thus, this project has remarkable potential for scientists who study cancer, and many achievements with these data have already been published [9-14]. Comprehensive genomic data from a large number of patients would undoubtedly improve our knowledge and understanding of cancer-related genes and their clinical relevance. Currently, analyses of TCGA data are complex, with multiple steps involved (Figure 1) [15]. Moreover, to obtain meaningful biological results, each step of an analysis needs to be carefully considered, with specific tools applied to certain experimental models. 
To develop relevant and realistic exploration tools for available data, coordination between experimentalists and computational scientists is needed. However, life scientists may find it difficult to use many of the computational tools that have been developed by computational scientists, which require data preparation as well as the installation and use of packaged software. This problem is further complicated by the fact that some software is platform- or operating system-specific. Conversely, computer scientists may face challenges in performing experimental validations to confirm predictions based on data analysis. Fortunately, there are Web-based tools that provide sophisticated computational solutions to help bridge this gap between wet-lab scientists and the many in silico tools available for the analysis of cancer genomic data. Choosing the appropriate tool is not a trivial task, especially for inexperienced users. To the best of our knowledge, a comprehensive review of all available Web-based TCGA data analysis tools has not been reported. Such a review would be tremendously helpful for researchers with an interest in analyzing cancer genomic data, as it could provide a guide for selecting analytical tools for a particular application. Therefore, we initiated this survey of existing Web-based tools/databases to compile a comprehensive list of programs that can perform variant analysis of TCGA data. Nonpublic tools and business tools were excluded from this survey.
Figure 1

Overview of common analysis and some applications for multidimensional data available from TCGA.

A total of 61 online analysis tools for cancer genome data were surveyed, including 32 which are primarily based on TCGA data. We have listed the functions, characteristics and suitable research areas for each. In addition, we have classified these complex tools into three categories based on their different uses of cancer genome data to facilitate their application by scientists lacking relevant data analysis experience. Furthermore, five case studies are described from a user’s perspective, which illustrate the major international cancer research areas and apply our review to the selection of these tools. It is anticipated that these efforts will enable researchers to select and use publicly available analysis tools. The present article is structured as follows. First, the TCGA database is introduced as a resource for understanding cancer genome data, which is important for researchers who are accessing this database for the first time. Next, all of the publicly available online analysis tools and their classifications are described. Finally, five cancer genome research questions with case studies are presented and discussed, together with general recommendations for tool selection and prioritization according to the different types of cancer research.

Variant data types within TCGA

To provide a comprehensive analysis of cancer genome profiles, TCGA applied high-throughput technologies based on microarray data of nucleic acids and proteins and NGS methods that provide global analyses of nucleic acids to generate genomic, transcriptomic, epigenomic and clinical data for several cancer types. To date, there are >10 000 cases of 33 tumor types available, with 20 cancer types each having >200 cases. The TCGA Data Portal is no longer operational, and all TCGA data have been centralized at the Genomic Data Commons (GDC) (https://gdc.nci.nih.gov/). The data can be downloaded for academic use. The identifier (ID) types listed at the GDC include: file universally unique identifier (UUID), file submitted ID (file name), case UUID, case submitted ID (case ID) and project ID. These ID types provide good identification and cataloging of a large amount of data (Table 1). The data types for each cancer include: somatic mutations, copy numbers, gene expression, microRNA (miRNA) expression, DNA methylation, reverse phase protein array (RPPA) and clinical information. Each data type includes raw and processed data that are available for public download, except for the raw sequencing files (Table 2). Somatic mutations are identified based on exome sequencing data, with exome sequencing able to detect single-nucleotide variants that are categorized as nonsynonymous or synonymous. Nonsynonymous single-nucleotide variants cause single amino acid substitutions, which may lead to altered protein function(s) or truncated proteins. Copy number alterations are generally the most frequent genetic events that occur during tumor development, and they have been determined with the Affymetrix SNP (Single Nucleotide Polymorphism) 6.0 array, which detects gains and losses in the genome. Gene expression and miRNA expression are determined with RNA sequencing (RNAseq) and miRNA sequencing analyses, respectively.
The abundances of transcripts, isoforms, novel transcripts, gene fusions and noncoding RNAs can be extracted from the sequencing data. DNA methylation is determined by using the Illumina platform, which provides single-nucleotide resolution of CpGs across the vast majority of CpG islands and promoters in the genome. DNA methylation profiling provides information regarding epigenetic changes that have occurred in the genome. Protein expression is determined with RPPA [16], which is an array-based method of detecting proteins at nanogram levels. Validated antibodies are used to determine protein levels, as well as the levels of phosphorylated proteins. This analysis allows activated proteins to be detected, which cannot be inferred from RNA expression data. Clinical data are listed for each patient with standard metrics such as patient age, patient gender and time to death or last known contact date. For each cancer, there are specific stratification parameters. For instance, Gleason scores are provided for prostate cancer, and Breslow index values are provided for melanoma. Overall survival, as well as progression-free survival, can be calculated and stratified according to cancer-specific staging. Generated data are categorized not only by data type but also by data level. Raw, nonnormalized data (Level I), processed data (Level II) and segmented/interpreted data (Level III) apply to individual samples, while summarized data (Level IV) refer to analyses across sample sets. Levels III and IV data are freely available from publicly accessible databases, whereas access to lower level data (e.g. Levels I and II) requires specific permissions to be requested and granted. Overall, each data type is comprehensive in its coverage of the genome, and it is ideal for scientists who are studying cancer to obtain an integrated analysis of TCGA data.
Table 1

ID types within TCGA

ID type | Description | Example
File UUID | ID of data in TCGA | 00a2364d-7385-4fa8-8562-b4f19548505a
File Submitted ID | ID of data uploaded to TCGA | 147f470-7440-42b8-8e3a-4e28b654916e-beta-value
Case UUID | Sample/case ID in TCGA | 942c0088-c9a0-428c-a879-e16f8c5bfdb8
Case Submitted ID | ID of sample/case uploaded to TCGA, which is commonly used to represent sample/case | TCGA-CJ-4642
Project ID | Project ID which sample/case belongs to | TCGA-BRCA
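
The ID types in Table 1 can be told apart by their shape. The following is a minimal sketch of such a check; the regular expressions are assumptions inferred only from the examples in Table 1, not an official GDC specification, and `classify_gdc_id` is a hypothetical helper name. A bare UUID cannot reveal whether it refers to a file or a case, so both are reported with the same label.

```python
import re

# Patterns inferred from the examples in Table 1; they are assumptions,
# not an official GDC specification.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)
PROJECT_RE = re.compile(r"^TCGA-[A-Z]+$")
CASE_RE = re.compile(r"^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}$")


def classify_gdc_id(identifier: str) -> str:
    """Return a coarse label for a GDC-style identifier string."""
    value = identifier.strip()
    if UUID_RE.match(value.lower()):
        # A UUID alone does not say whether it names a file or a case.
        return "uuid"
    if PROJECT_RE.match(value):
        return "project_id"
    if CASE_RE.match(value):
        return "case_submitted_id"
    # Anything else is treated as a submitted file name.
    return "file_submitted_id"
```

For instance, `classify_gdc_id("TCGA-CJ-4642")` yields the case label, while the file name example from Table 1 falls through to the final branch.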
Table 2

Description of data types and their access level

Data type | Description | Access level
Aligned Reads | Raw sequencing data | Controlled
Raw Simple Somatic Mutation | Raw mutation information data | Controlled
Annotated Somatic Mutation | Annotated mutation information data | Controlled
Aggregated Somatic Mutation | Aggregated mutation information data | Controlled
Masked Somatic Mutation | Transformed mutation information data | Open
Gene Expression Quantification | Gene expression data | Open
Copy Number Segment | Copy number information data | Open
Masked Copy Number Segment | Transformed copy number information data | Open
Methylation Beta Value | Methylation data | Open
Isoform Expression Quantification | Mature miRNA expression data | Open
miRNA Expression Quantification | miRNA expression data | Open
Biospecimen Supplement | Biospecimen information | Open
Clinical Supplement | Clinical information | Open

Overview and categories of public Web-based tools for analyzing TCGA data

Owing to the large amount of genomic data available, specialized Web-based tools have been developed to aid clinicians and researchers in their analysis and interpretation of available data types in a meaningful way. Here, we have attempted to build an exhaustive list of Web-based tools that are publicly available for the analysis of TCGA data. In addition, we have classified these tools into specific categories. Table 3 provides a detailed list of the Web-based tools that represent the main resources currently available for analyzing TCGA data. Many useful indices are also indicated to facilitate the selection of tools according to a particular need. Furthermore, an enumeration of all back-end databases used, as well as main analysis content, uniform resource locator (URL), visualization type, download, batch query and application programming interface (API) availability, is presented. In the sections of each category below, and in Tables 3–4, the tools are presented in alphabetical order. To further distinguish and guide the selection of these available tools, we have divided our systematic exploration into three main categories as follows: (1) Global analysis; (2) Target analysis; and (3) Auxiliary analysis.
Table 3

List of Web servers and databases

Name | Databases | Batch queries | Mutation analysis | Correlation analysis | Differential expression analysis | Pathway analysis | Kaplan–Meier plots | Pan-cancer analysis | Visualization type | Download | API | URL
BCMD | TCGA | No | No | No | No | No | No | No | Image | No | No | http://tcga.lbl.gov:9999/
Broad GDAC Firehose | TCGA | No | Yes | Yes | Yes | Yes | Yes | Yes | Matrix; Histogram | Yes | Yes | http://gdac.broadinstitute.org/
Cancer Landscapes | TCGA | No | No | Yes | No | Yes | Yes | Yes | Networks; Matrix | Yes | No | http://cancerlandscapes.org/
Cancer3D | TCGA; CCLE | No | Yes | No | No | No | No | No | Genomic coordinates; Network; Scatter plots/box plots; 3D structure | Yes | No | http://www.cancer3d.org
canEvolve | TCGA; ICGC; GEO | Yes | No | Yes | Yes | Yes | Yes | No | Heatmap; Network; Plots | Yes | No | http://www.canevolve.org/
cbioportal | TCGA; CCLE | Yes | Yes | Yes | Yes | No | Yes | Yes | Networks; Matrix; Heatmaps | Yes | Yes | http://cbioportal.org
CDSA | TCGA | No | No | No | No | No | No | No | Image | No | No | http://cancer.digitalslidearchive.net/
CELLX | TCGA; CCLE; GEO; GSK; GTEx | Yes | Yes | Yes | Yes | No | Yes | No | Heatmap; Matrix | Yes | No | http://cellx.sourceforge.net
GDISC | TCGA | No | No | Yes | No | No | Yes | No | Matrix; Box plots | Yes | No | https://gdisc.bme.gatech.edu
GEPIA | TCGA; GTEx | Yes | No | Yes | Yes | No | Yes | No | Matrix; Bar graph; Box plots/violin plots/dot plots | Yes | Yes | http://gepia.cancer-pku.cn/
IntOGen | TCGA; ICGC | Yes | Yes | No | No | No | No | Yes | Heatmap; Matrix; Histogram | Yes | No | https://www.intogen.org/search
KMplotter | TCGA; GEO; EGA | Yes | No | No | No | No | Yes | No | Linear plots | Yes | No | http://kmplot.com/analysis/
MethHC | TCGA | Yes | No | Yes | No | Yes | No | No | Matrix; Heatmaps | Yes | No | http://methhc.mbc.nctu.edu.tw
MEXPRESS | TCGA | No | No | Yes | Yes | No | No | No | Genomic coordinates | Yes | Yes | http://mexpress.be/
OASISPRO | TCGA | No | No | Yes | No | No | Yes | No | Histogram linear plots/box plots | Yes | No | http://tinyurl.com/oasispro
OncoScape | TCGA; CCLE | Yes | No | No | Yes | Yes | No | No | Heatmap; Pathway maps; Matrix; Scatter plot | Yes | No | http://oncoscape.nki.nl/
PathwayMapper | TCGA | No | No | No | No | Yes | No | No | Pathway maps | Yes | Yes | http://pathwaymapper.org
PROGgeneV2 | TCGA; GEO; NKI | Yes | No | No | No | No | Yes | No | Linear plots | Yes | No | http://www.compbio.iupui.edu/proggene
Regulome Explorer | TCGA | No | No | Yes | No | Yes | No | Yes | Circos; Genomic coordinates; Network; Matrix | Yes | No | http://explorer.cancerregulome.org/all_pairs/
TANRIC | TCGA; CCLE | No | Yes | Yes | Yes | No | Yes | No | Heatmaps | Yes | No | http://ibl.mdanderson.org/tanric/_design/basic/index.html
TCGA Clinical Explorer | TCGA | No | Yes | Yes | No | No | Yes | No | Matrix; Histogram | Yes | No | http://genomeportal.stanford.edu/pan-tcga/
TCGA Mbatch | TCGA | No | No | No | No | No | No | No | Matrix; PCA diagrams; Hierarchical clustering diagrams | Yes | No | http://bioinformatics.mdanderson.org/tcgambatch/
TCGA NG-CHM | TCGA | No | No | Yes | No | Yes | No | Yes | Heatmaps | Yes | No | http://bioinformatics.mdanderson.org/chm
TCGA SpliceSeq | TCGA | No | No | No | No | No | No | No | Matrix | Yes | No | http://bioinformatics.mdanderson.org/TCGASpliceSeq/
TCGA4U | TCGA | Yes | Yes | No | Yes | No | Yes | No | Heatmap; Matrix; Histogram | Yes | No | http://www.tcga4u.org:8888
TCIA | TCGA | No | No | No | No | No | No | No | Image | Yes | Yes | http://www.cancerimagingarchive.net
TCPA | TCGA | No | No | Yes | Yes | No | Yes | No | Networks; Heatmaps | Yes | No | http://www.tcpaportal.org/tcpa/
UALCAN | TCGA | Yes | No | No | Yes | No | Yes | No | Heatmap; Boxplots; Linear plots | Yes | No | http://ualcan.path.uab.edu/tutorial.html
UCSC Xena | TCGA; GDC; ICGC; GTEx; TARGET; TOIL | No | Yes | No | No | No | Yes | Yes | Heatmaps; Scatter plot; Histogram | Yes | Yes | http://xena.ucsc.edu/getting-started/
Vanno | TCGA | No | Yes | No | No | No | No | No | Circos; Matrix; 3D structure; Heatmap | Yes | No | http://cgts.cgu.edu.tw/vanno
Wanderer | TCGA | No | No | Yes | Yes | No | No | No | Genomic coordinates; Scatter plot | Yes | Yes | http://maplab.cat/wanderer
Zodiac | TCGA | Yes | No | Yes | No | No | No | Yes | Matrix; Circular network | No | No | http://www.compgenome.org/zodiac2/
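
The indices in Table 3 lend themselves to simple programmatic filtering when shortlisting tools. The sketch below encodes a hand-copied subset of six rows and three of the columns; the `Tool` class and `select_tools` function are hypothetical names introduced only for illustration, and the subset is not the full table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    kaplan_meier: bool
    api: bool
    batch_queries: bool


# A hand-copied subset of Table 3 (six rows, three feature columns).
TOOLS = [
    Tool("Broad GDAC Firehose", kaplan_meier=True, api=True, batch_queries=False),
    Tool("cbioportal", kaplan_meier=True, api=True, batch_queries=True),
    Tool("GEPIA", kaplan_meier=True, api=True, batch_queries=True),
    Tool("KMplotter", kaplan_meier=True, api=False, batch_queries=True),
    Tool("MEXPRESS", kaplan_meier=False, api=True, batch_queries=False),
    Tool("UCSC Xena", kaplan_meier=True, api=True, batch_queries=False),
]


def select_tools(tools, **required):
    """Return names of tools whose features match every required value."""
    return [
        tool.name
        for tool in tools
        if all(getattr(tool, key) == value for key, value in required.items())
    ]
```

A call such as `select_tools(TOOLS, kaplan_meier=True, api=True)` narrows the subset to tools that offer both survival plots and programmatic access; adding further keyword arguments tightens the filter.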
In Table 4, an additional 29 online resources are provided. In these tools, TCGA data are not the major analysis object, and many of the tools do not access TCGA data unless an upgraded version is used.
Table 4

Additional databases and Web servers

Name | Content | URL
AnimalTFDB 2.0 | Animal transcription factors | http://bioinfo.life.hust.edu.cn/AnimalTFDB/
ArrayMap | A resource for genomic copy number profiles of human tumors | http://www.arraymap.org
BloodSpot | Gene expression profiles and transcriptional programs for healthy and malignant hematopoiesis | www.bloodspot.eu
BreCAN-DB | Break point profiles of cancer genomes | http://brecandb.igib.res.in
Cancer RNA-Seq Nexus | Phenotype-specific transcriptome profiling | http://syslab4.nchu.edu.tw/CRN
canSAR | Cancer research and drug discovery | http://cansar.icr.ac.uk/
ccmGDB | Cancer cell metabolism gene | http://bioinfo.mc.vanderbilt.edu/ccmGDB
CGWB | A computational platform to integrate clinical tumor mutation profiles with the reference human genome | https://cgwb.nci.nih.gov/
ChimerDB 3.0 | Fusion gene | http://ercsb.ewha.ac.kr/fusiongene/
ChIPBase v2.0 | Transcriptional regulatory networks of noncoding RNAs and protein-coding genes | http://rna.sysu.edu.cn/chipbase/
CMPD | Cancer mutant proteome database | http://cgbc.cgu.edu.tw/cmpd
COSMIC | Somatic mutations in human cancer | http://cancer.sanger.ac.uk
dbDEMC 2.0 | Differentially expressed miRNAs in human cancer | http://www.picb.ac.cn/dbDEMC
DBTSS | Transcriptome, epigenome and genome sequence variation data | http://dbtss.hgc.jp/
DiseaseMeth | Human disease methylation database | http://bioinfo.hrbmu.edu.cn/diseasemeth/
DriverDBv2 | Human cancer driver gene | http://ngs.ym.edu.tw/driverdb
LNCediting | A database for functional effects of RNA editing in lncRNAs | http://bioinfo.life.hust.edu.cn/LNCediting/
lncRNASNP | SNPs in lncRNAs | http://bioinfo.life.hust.edu.cn/lncRNASNP/
miRTarBase 2016 | MiRNA database | http://miRTarBase.mbc.nctu.edu.tw/
Mutagene | Cancer genetic heterogeneity | https://www.ncbi.nlm.nih.gov/projects/mutagene/
MutationAligner | Recurrent mutation hot spots | http://www.mutationaligner.org
mutLBSgeneDB | Mutated ligand-binding site gene DataBase | http://zhaobioinfo.org/mutLBSgeneDB
NetGestalt | Multidimensional omics data | http://www.netgestalt.org
Oncotator | Cancer variant annotation tool | http://www.broadinstitute.org/oncotator/
PhosphoSitePlus | Protein posttranslational modifications | http://www.phosphosite.org/
POSTAR | Posttranscriptional regulation | http://postar.ncrnalab.org/
RBP-Var | Functional variants involved in regulation mediated by RNA-binding proteins | http://www.rbp-var.biols.ac.cn/
WebGestalt 2017 | Enrichment analysis | http://www.webgestalt.org
YM500v2 | MiRNAs for human cancer | http://ngs.ym.edu.tw/ym500/

Global analysis

Global analysis tools allow users to examine the overall features of cancer genomes, and they can be a valuable resource for scientists who have just started to study cancer genomic data. There are two types of global analysis tools: type I and type II. The former only provides a global analysis, while the latter provides selected target analysis in addition to global analysis.

Type I

Broad GDAC Firehose

Broad GDAC Firehose (http://gdac.broadinstitute.org/) is a Web portal site developed by the Broad Institute to perform automated analyses of TCGA data for general users. Preprocessed annotated data and association analysis across all types of data, including clinical data, are provided. For example, it can provide a list of genes whose copy number alterations, methylation status, mRNA expression and mutations significantly correlate with tumor stage and patient survival, gender, age and ethnic background. Gene expression across all cancer types can also be easily assessed at the Firebrowse Web portal (http://firebrowse.org/).

Cancer Landscapes

Cancer Landscapes [17] is a Web-based tool that derives data networks by using a newer data-driven modeling method that is based on generalized sparse inverse covariance selection. This tool integrates genetic, epigenetic and transcriptional data from multiple cancers. Users are provided with interactive Web content that visualizes constructed network models based on statistical optimization.

canEvolve

The Web portal canEvolve [18] stores functional genomics and other large-scale data on cancer, including gene and miRNA expression profiles and copy number changes. This tool provides users with easy access to information and analysis results derived from primary, integrative and network analyses of oncogenomic data that are generated by using various functional genomics platforms. The algorithms used for the analysis pipelines were selected based on the creators’ experience in creating and using such tools to generate biologically relevant hypotheses.

Regulome Explorer

Regulome Explorer [19] is a Web tool that integrates associations between clinical and molecular features of TCGA data. This tool enables users to search and visualize analytical data that are filtered according to user-specified parameters. All data types are mapped to a circos plot with genomic coordinates. There are other views available, which can be used to evaluate associations, including graphs and tables. Two-dimensional distributions of feature pairs (identified by association analysis) are also provided. Correlation of features is represented as edges between corresponding nodes.

TCGA Mbatch

TCGA Mbatch (http://bioinformatics.mdanderson.org/tcgambatch/) allows the user to assess and quantify the presence of any batch effects in a given TCGA data set via algorithms such as hierarchical clustering and principal component analysis. The results from these algorithms are then presented graphically as both simple and interactive diagrams. If significant batch effects are observed in the data, the user has the option to download data that have been computationally corrected according to methods such as Empirical Bayes (ComBat), Median Polish and analysis of variance.
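
Of the correction methods named above, median polish is simple enough to sketch directly. The following is a generic Tukey median polish on a small table, under the assumption that rows stand for batches and columns for features; it is an illustration of the technique only and is not claimed to reproduce the TCGA Mbatch implementation. The function name and the fixed iteration count are choices made for this sketch.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Decompose a 2D table into overall + row + column effects + residuals.

    Rows are assumed to be batches and columns features (an assumption of
    this sketch). Returns (overall, row_effects, col_effects, residuals).
    """
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]

        # Sweep column medians out of the residuals into the column effects.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]

    return overall, row_eff, col_eff, resid
```

Under the stated assumption, the row effects estimate a per-batch shift, and subtracting them from the original table gives a batch-adjusted table. Medians are used instead of means so that a few outlying samples have limited influence on the estimated shifts.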

TCGA Next-Generation Clustered Heatmaps

TCGA Next-Generation Clustered Heatmaps (TCGA NG-CHM) (http://bioinformatics.mdanderson.org/chm) is a tool that creates interactive large-scale visualizations of data based on a classic heat map approach. The user is able to zoom and pan across a heatmap, alter its color scheme, generate production quality PDFs and access rows, columns and individual heatmap entries that are related to statistics, databases and other information. TCGA NG-CHM also provides pathway and gene ontology (GO) information, chromosomal interactive ideograms, rapid recoloring, high-resolution graphics output and links to public information resources (e.g. cBioPortal) regarding genes, proteins, pathways and drugs.

The Cancer Proteome Atlas

The Cancer Proteome Atlas (TCPA) [20] is a portal for accessing proteomic data available from the TCGA project, which were generated with extensively validated antibodies for nearly 200 proteins and phosphoproteins. Correlation analyses can be performed between proteins and for associations between proteins and patient prognosis. In addition to TCGA data, TCPA can also access data from established cancer cell lines and can provide validation of findings from TCGA RPPA data through independent sample cohorts.

Type II

MethHC

MethHC [21] is a database that integrates a large collection of DNA methylation data and mRNA/miRNA expression profiles in human cancers, and also identifies correlations between DNA methylation and mRNA/miRNA expression data from TCGA. The methylation data span gene regions [e.g. promoter, enhancer, 5′ untranslated region (UTR), first exon, gene body and 3′ UTR] and CpG islands (e.g. regions, shelves and shores). MethHC also provides methylation patterns of different cancers with hierarchical clustering graphs. Users can easily obtain 250 hypermethylated genes, 250 hypomethylated genes and 250 of the most differentially methylated genes for particular cancer types.
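
The kind of methylation-expression relationship described above is commonly summarized with a correlation coefficient. The sketch below computes a plain Pearson correlation between per-sample methylation beta values and expression values for one gene; the choice of Pearson correlation and the function name are assumptions made for illustration, and MethHC's own statistics may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

A strongly negative value for promoter beta values against expression would be consistent with methylation-associated silencing, although correlation alone does not establish a regulatory relationship.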

Omics Analysis System for Precision Oncology

Omics Analysis System for Precision Oncology (OASISPRO) [22] is an online platform that is designed to mine quantitative omics information from TCGA. This tool can effectively visualize patients’ clinical profiles and other omics data and can evaluate prediction performance by using held-out test sets. OASISPRO is also rather unique in that it uses a machine learning method.

OncoScape

OncoScape [23] is an R software package for cancer gene prioritization that has a Web portal for interactive analyses. OncoScape can access five complementary data types across 11 different cancers to identify new candidate cancer genes and explore cancer aberrations by using a fusion of genomic data. For example, with this tool, molecular profiling data of two groups of samples can be compared to identify genes that exhibit significant differences. OncoScape can also perform analyses of gene expression, DNA copy number, DNA methylation, mutation and short hairpin RNA (shRNA) knock-down data. Users can explore candidate genes for each cancer type and upload their own gene list to obtain a detailed aberration profile. OncoScape can provide box plots that show log changes in gene expression (e.g. copy number data) for tumor and normal samples, and can provide an overview of the prioritization scores in genomic regions and pathway diagrams.

TCGA Clinical Explorer

TCGA Clinical Explorer [24] enables the cancer research community and others to explore clinically relevant associations inferred from TCGA data. With its accessible Web and mobile interfaces, users can examine queries and test hypotheses regarding genomic/proteomic alterations across a broad spectrum of malignancies. This tool also summarizes TCGA clinical parameters and translates these data into a list of clinically relevant cancer drivers, including genes, miRNAs and proteins. All analyses include 25 cancer types and 18 clinical parameters. Users can query TCGA data in multiple ways, including searching for clinically relevant gene/protein/miRNAs by name, cancer type or clinical parameter; profiling genomic/proteomic changes according to clinical parameters in a cancer type; and testing two-hit hypotheses.

TCGA SpliceSeq

TCGA SpliceSeq [25] investigates cross-tumor and tumor-normal alterations in mRNA splicing patterns of TCGA RNASeq data. Percent Spliced In (PSI) values for splice events derived from 33 different types of tumor samples, including available adjacent normal samples, have been loaded into this tool. As a result, users can investigate the splicing pattern of a gene of interest in a variety of tumor types. TCGA SpliceSeq also provides knowledge discovery via genome-wide PSI splice event searches to locate significant splice variations among tumor types, or between tumor and normal tissue, and these splicing data can be downloaded for integrative analyses.
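
A PSI value is conventionally the share of reads supporting inclusion of a splice event among all reads that are informative for that event. The sketch below uses this common read-count definition; the exact normalization applied by TCGA SpliceSeq is not described here and may differ, and the function name is hypothetical.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Fraction of informative reads that support inclusion of an event.

    Returns a value between 0 and 1, or None when there is no coverage.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # No informative reads: PSI is undefined rather than zero.
        return None
    return inclusion_reads / total
```

Returning `None` for uncovered events keeps them distinguishable from events that are covered but fully skipped, which matters when comparing PSI between tumor and normal samples.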

Target analysis

Target analysis is the category of public Web-based tools that is most often used by researchers. These tools allow researchers to investigate a target of interest with in-depth analyses of gene(s) and miRNAs.

Cancer3D

Cancer3D [26] is a public database that analyzes cancer missense mutations in the context of protein structures. It also allows users to explore two different cancer-related problems at the same time, e.g. drug sensitivity/biomarker identification and prediction of cancer drivers. In addition, somatic missense mutations from TCGA and Cancer Cell Line Encyclopedia (CCLE) can be mapped onto >24 300 structures, as well as onto 1300 potential novel protein domains.

cBioPortal

The cBioPortal [27] for Cancer Genomics offers one of the best Web-based tools for beginners who have limited experience analyzing genomic data and only want to analyze a limited number of genes. The cBioPortal is an open-access resource that was developed at the Memorial Sloan Kettering Cancer Center (MSKCC) for the visualization, analysis and download of large-scale cancer genomics data sets. It allows users to search gene(s) of interest in certain cancers or among all cancers in TCGA data, while providing a flexible interface for working with multiple data sets and easy-to-use visualization options. The cBioPortal also offers correlation plots for expression and copy number alterations or methylation of genes, an ability to assess clinical relevance of genes with Kaplan–Meier plots, co-expression analysis and network analysis. Additionally, the portal facilitates interactive explorations of custom data sets with access to the OncoPrinter and MutationMapper Web tools. OncoPrints provide intuitive diagrams of genomic alterations such as somatic mutations and copy number alterations across a set of samples, while MutationMapper provides a summary diagram of mutations on a linear protein map that has links to a database of three-dimensional (3D) protein structures for the user to examine the potential effects of the mutations identified.

Gene Expression Profiling Interactive Analysis

Gene Expression Profiling Interactive Analysis (GEPIA) [28] is a Web-based tool that rapidly delivers customizable functionalities based on TCGA and GTEx data. GEPIA provides key interactive and customizable functions that include differential expression analysis, profiling plotting, correlation analysis, patient survival analysis, similar gene detection and dimensionality reduction analysis.

IntOGen

IntOGen [29] is a Web platform that can identify cancer drivers across tumor types and perform a systematic analysis of the most up-to-date large data sets of tumor somatic mutations. The IntOGen pipeline integrates the results of tumor genome studies conducted with different mutation-calling workflows, and it is scalable to hundreds of thousands of tumor genomes. This tool can also compute the frequency of mutation for individual genes and/or pathways within a project or cancer site, detect a subset of novel candidate drivers and download driver mutations from previous studies.

KMplotter

KMplotter is an online tool that draws survival plots, which can be used to assess the relevance of gene expression levels on clinical outcome for treated and untreated cancer patients. Data are derived from gene expression, relapse-free survival and overall survival data that are downloaded from Gene Expression Omnibus (GEO) (Affymetrix microarrays only), European Genome-phenome Archive (EGA) and TCGA. Specifically, survival analyses can be performed for mRNAs from four cancer types (breast, ovarian, lung and gastric) and for miRNAs from two cancer types (breast and liver) [30].
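
The survival plots mentioned above are based on the Kaplan–Meier estimator. A minimal sketch of the product-limit calculation follows, assuming one follow-up time per patient and an event flag of 1 for an observed event and 0 for censoring; it is a generic illustration and not the KMplotter code, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        n_at_risk -= leaving
    return curve
```

To compare high and low expression groups, as the tools in this section do, the estimator would be run once per group after splitting patients at an expression cutoff; the choice of cutoff and the significance test (typically a log-rank test) are outside this sketch.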

MEXPRESS

MEXPRESS [31] is a straightforward and easy-to-use Web tool that integrates and visualizes gene expression, DNA methylation and clinical TCGA data on a single-gene level. It also provides correlation among data sets, has a unique set of features that are easy to use, and it can integrate visualizations of different data types for hundreds of samples. Currently, the developer of this tool is also looking into updating MEXPRESS to use the new repository of TCGA data.

PROGgeneV2

PROGgeneV2 [32] is a tool that allows researchers to use publicly available data to study the prognostic implications of genes of interest in multiple cancers. For example, this tool can be used to generate survival plots according to the expression profiles of target genes in selected data sets from multiple cancers. Furthermore, either single genes or sets of genes can be tested for association with patient prognosis. A companion tool, PROGmiRV2 [33], provides survival analyses for miRNAs, and its usage is similar to that of PROGgeneV2.

TANRIC

TANRIC [34] is an open-access resource for investigating the function and clinical relevance of long noncoding RNAs (lncRNAs) in cancer. TANRIC provides three analysis modules that enable users to examine the function and underlying mechanisms of lncRNAs. It can characterize the expression profiles of lncRNAs in large patient cohorts covering up to 20 cancer types, drawing on TCGA, CCLE and other independent data sets. Users can examine whether lncRNAs exhibit differential expression profiles between tumor and normal samples, or among tumor subgroups. Possible correlations between lncRNAs and patient survival time can also be identified, and correlations between lncRNAs and various molecular data for protein-coding and miRNA genes can be explored.

TCGA4U

TCGA4U [35] is a tool that provides visualizations of the relationship between cancer genomics alterations and clinical data. This Web tool can apply four types of data (somatic mutation, DNA methylation, gene expression and copy number variants) for specific genes or gene lists to five types of cancer (lung squamous cell carcinoma, breast invasive carcinoma, colon adenocarcinoma, lung adenocarcinoma and rectum adenocarcinoma). By using specific genes and gene lists to analyze genomic alterations and characterize the molecular characteristics of cancers, cancer genomic mining is performed with the following outputs: potential driver genes are identified, GO term maps are generated and survival analyses are conducted.

UALCAN

UALCAN [36] is an interactive Web portal that facilitates the study of gene expression variation and survival associations across tumors. All data are from the TCGA database. It can help researchers identify survival associations that involve any gene of interest, across different cancer types as well as cancer subtypes defined by various clinicopathologic features. The analysis results can be downloaded in several formats. Thus, this online tool can aid cancer biologists and clinicians in identifying novel diagnostic and therapeutic targets, and in investigating gene expression and its disease associations in any particular cancer.

UCSC Xena

UCSC Xena (http://xena.ucsc.edu/getting-started/) is a new tool developed by the UCSC Cancer Browser team, and it can analyze and visualize a user’s private functional genomics data sets in the context of public and shared genomic/phenotypic data sets. The Xena platform consists of a set of federated data hubs and the Xena browser. The latter integrates across the hubs, thereby providing one location at which to analyze and visualize data. There is a large public Xena hub that currently hosts an expanding set of searchable data from several large consortiums, including TCGA, GDC, International Cancer Genome Consortium (ICGC), Genotype-Tissue Expression (GTEx), Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and Scalable and Efficient Workflow Engine (TOIL). Dynamic Kaplan–Meier survival analyses can also be performed to assess survival according to certain parameters, and these data can be presented as visual spreadsheets, scatter plots and bar graphs.

Wanderer

Wanderer [37] is a public Web server that allows users to explore and interpret gene-associated expression profiles and DNA methylation for all of the cancer types available at TCGA. This tool also provides normal–tumor paired comparisons in the form of graphs and comprehensive tables.

Zodiac

Zodiac [38] is a search engine and computational tool that obtains multiple features of gene networks, including copy number, gene expression, methylation, mutation, miRNA and some protein expression data, to describe molecular interactions for approximately 200 million pairs of genes. Zodiac then integrates existing knowledge about cancer genetic interactions with a Bayesian graphical model of TCGA data to produce updated and data-enhanced knowledge. The results are organized into a comprehensive database that allows customized searches to be performed. Zodiac also provides data processing and analysis tools that allow users to customize prior networks and update genetic pathways of interest. Furthermore, this tool can be used to identify gene interactions, to discover potential drug targets, and to identify potential genetic aberrations such as gene fusions.

Auxiliary analysis

The third category of public Web-based tools translates TCGA data into an online resource that is easily accessed, browsed and downloaded. These data can help users complement their experimental results, or they can provide additional proof and explanation of their research for comprehensive biological discoveries.

BCMD

BCMD [39] is a platform that can be used to represent and characterize tumor histology, and it can additionally provide an integrated analysis with clinical outcome. Data and intermediaries for a number of tumor types are available, and it has an interface that allows for panning and zooming of whole-mount tissue sections with or without overlaid segmentation results for quality control.

CDSA

CDSA [40] provides interactive tools for viewing and annotating diagnostic and tissue slide images of different tumor types from the TCGA project. Currently, it hosts >20 000 whole-slide images from 22 cancer types. This searchable resource provides users with an opportunity to identify and explore sets of images according to particular genomic, pathologic or clinical criteria. Thus, CDSA represents a valuable resource for the fields of imaging and pathology.

Cell Index Database

Cell Index Database (CELLX) [41] is an online resource that can be used to manage multidimensional genomics data sets that contain gene expression, copy number variations, mutations and compound sensitivity data. Users can visualize, analyze and download data in a preformatted table that is suitable for offline computation. This tool is valuable for computational biologists who would prefer greater control over their data or would like to integrate custom data that are not available in public databases.

Gene–Drug Interaction for Survival in Cancer

Gene–Drug Interaction for Survival in Cancer (GDISC) [42] is a Web portal that integrates gene copy number, drug exposure and patient survival data. It allows users to interactively explore gene–drug interactions that have been identified in the context of TCGA, and to examine their favorite combinations of gene, drug and cancer type. Moreover, GDISC provides a list of drug names found in all cancer types, which can facilitate drug-specific analyses.

PathwayMapper

PathwayMapper [43] is a collaborative visual Web editor for cancer pathways. It can be used for viewing precurated cancer pathways, and it provides an option to overlay genomic alteration data. It also has an interactive graphical editing tool for creating and modifying pathways. Multiple users can curate a pathway together in real time, with support for concurrent modifications and built-in conflict resolution. Finally, users can import data from the cBioPortal and export pathway images with alteration frequencies.

TCIA

TCIA [44] is a service created by the National Cancer Institute (NCI) to collect and share a large amount of radiological imaging data available from TCGA cases to support imaging phenotype–genotype research. Users can share or find research-relevant clinical image data collections and download detailed image files.

Vanno

Vanno [45] is a comprehensive variant annotation tool for the visualization and analysis of genetic alteration profiles. It provides an integrated framework for the functional analysis of genomic variants. Its Web portal for comparing in-house data with TCGA data supports the comprehensive identification of disease-relevant variations.

Case studies

The case studies presented here elaborate on five different cancer genomic research questions that can be answered visually with the available tools and resources described above. These case studies encompass major cancer research efforts, and they provide examples for the application of online tools for TCGA data analysis.

Patterns in global alteration profiles

Various alteration phenotypes have been observed in cancer cells. One of the most conspicuous of these is the mutation phenotype [46], in which tumor cells exhibit an abnormally high mutation burden. Somatic mutation patterns have been described for malignant melanoma [47], small cell lung carcinoma [48], acute lymphoblastic leukemia [49], colorectal cancer [10], kidney cancer [50] and lung cancer [51]. These studies have demonstrated the value of whole-genome sequencing for obtaining global alteration profiles and analyzing the patterns observed. Broad GDAC Firehose is a good Web-based tool for exploring global alteration profiles. In this portal, the cancer type for mutation analysis can be specified directly, and a wide range of analyses can be selected, including aggregate analysis, correlation analysis with mutation and several mutation analysis methods such as MutSig v2.0 (Figure 2A). The online results give users access to both the standard data packages (right column) and the standard analysis suite (left column). Analysis results may also be accessed from the unified reports. Furthermore, the results of an analysis can be downloaded in PDF format, and this online tool has an interactive API for fine-grained querying of results via the Web. Another tool, Cancer Landscapes, provides high-performance statistical network modeling of multiple human cancers. Colors represent different cancer types, and shapes represent different types of data. Users first select one of the multicancer modes for further analysis. The system then loads the model, in which different data types and cancers are represented as specific shapes and colors. Users can click on nodes to view the details of a local network and associated pathways (Figure 2B). In this exploration view, users can switch between different data types, adjust the optimization parameters and organize the network.
Figure 2

Two explorations of global alteration profile patterns as provided by the publicly accessible Broad GDAC Firehose and Cancer Landscapes Web tools. (A) This window view displays the user interface of Broad GDAC Firehose, where users can choose a specific mutation analysis method. (B) This window provides network modeling of multiple cancers and data sets as indicated by the data sets and data types that were selected at the far right in Cancer Landscapes.

Exploration of cancer drivers

Distinguishing the alterations that give cancer cells a selective advantage (drivers) from those that are merely side effects (passengers) of a destabilized cancer genome is a major problem in oncogenomics research. Many studies have focused on the identification of novel cancer genes for many different cancer types, including acute lymphoblastic leukemia [52], acute myeloid leukemia [53], breast cancer [54, 55], glioblastoma [56] and liver cancer [57]. Different tools use various methods to address this problem by exploiting the properties of driver genes. Here, we selected two Web-based tools, OncoScape and IntOGen, to examine this problem. OncoScape can access five complementary data types (copy number, gene expression, DNA methylation, somatic mutation and shRNA) to identify new candidate cancer genes, with screening parameters and thresholds selected by the user. All functional modules are available in the toolbar at the top of the page, and ‘Top Candidate Genes’ is the module that looks for candidate cancer genes. We used the combined score with a cutoff value ≥3 to identify drivers for lung adenocarcinoma (Figure 3A); the combined score and cutoff values are described in detail in the ‘FAQ’. Meanwhile, IntOGen directly provides driver genes for the selected cancer type based on the frequency of mutations. In addition, users can upload their own data for analysis of somatic mutations. Here, we used the public data set on this tool to perform somatic mutation analysis for a specific cancer type. Figure 3B shows the most recurrently mutated cancer driver genes in lung adenocarcinoma. Each bar of the histogram indicates the number of samples with protein-affecting mutations. OncoScape and IntOGen identified 22 driver genes and 169 driver genes, respectively.
Figure 3

An exploration of driver genes associated with lung adenocarcinoma was conducted in OncoScape (A) and IntOGen (B). The two windows display different formats for the results obtained.
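The two selection strategies in this case study, a score cutoff and a comparison of the resulting gene lists, can be sketched in Python as follows. This is a hedged illustration rather than the scoring used by OncoScape or IntOGen: the function names, the gene labels and every score are placeholders invented for the example.

```python
def top_candidates(scores, cutoff=3):
    """Return genes whose combined score is >= cutoff, highest score first."""
    kept = [(gene, s) for gene, s in scores.items() if s >= cutoff]
    return [gene for gene, _ in sorted(kept, key=lambda gs: (-gs[1], gs[0]))]


def compare_calls(first, second):
    """Summarize the agreement between two lists of candidate driver genes."""
    a, b = set(first), set(second)
    return {
        "both": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }


# Hypothetical combined scores (placeholders, not OncoScape output).
toy_scores = {"GENE_A": 4, "GENE_B": 3, "GENE_C": 2, "GENE_D": 1}
score_based = top_candidates(toy_scores, cutoff=3)

# Hypothetical frequency-based calls standing in for a second tool.
frequency_based = ["GENE_B", "GENE_E"]

overlap = compare_calls(score_based, frequency_based)
print(score_based)
print(overlap)
```

Listing the genes found by only one tool makes it easier to see how much two driver lists of very different sizes actually agree.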

Stratification of cancer patients

Cancers must be properly classified to achieve effective clinical management and meaningful laboratory investigations of underlying cancer mechanisms. While tumors may appear similar when examined with conventional diagnostic methods, they may look markedly different from a molecular viewpoint, and this can lead to differences in outcome and treatment response. Therefore, the molecular features of tumors can be used to stratify patients to support more accurate clinical and therapeutic decisions. Molecular stratification of tumors has been an important area of cancer research over the past few decades [58-61], and the studies performed have underscored the heterogeneous and complex nature of cancer subgroups. Molecular subtypes can be identified through different data types, including gene expression, copy number, DNA methylation and mutation data. Moreover, an integrated analysis is needed based on the different cancer characteristics. Currently, there are no tools that can directly provide stratification because of the complexity of this analysis. As a result, scientists need to combine many data types and clinical features for a comprehensive assessment. OASISPRO can identify genes that are strongly associated with tumor stage by applying user-selected machine learning algorithms to omic data and evaluating prediction performance on held-out test sets (Figure 4). However, OASISPRO only focuses on the classification of clinical phenotypes, and it cannot synthesize a variety of data types. Users have to follow the tool's step-by-step selection settings strictly. In addition, OASISPRO can only use a single clinical feature parameter for analysis. Thus, OASISPRO would be useful for preliminary analyses and hypothesis generation.
Figure 4

Views of interface windows in OASISPRO. (A) The stepwise selection of parameters for conducting a classification of clinical phenotypes is shown. (B) This window presents the input variables and results obtained from a representative analysis.
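The idea of evaluating a classifier on a held-out test set can be sketched as follows. This is a minimal Python illustration only: the nearest-centroid classifier is a stand-in chosen for brevity and is not claimed to be one of the algorithms offered by OASISPRO, and the two-feature samples and stage labels are made up.

```python
import random


def split_holdout(samples, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train):
    """Average the feature vectors of each label (nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared distance."""
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def accuracy(centroids, test):
    """Fraction of held-out samples that are classified correctly."""
    hits = sum(predict(centroids, f) == label for f, label in test)
    return hits / len(test)


# Made-up, well-separated expression values for two hypothetical stage groups.
early = [([1.0 + 0.1 * i, 1.0 - 0.1 * i], "early") for i in range(4)]
late = [([5.0 + 0.1 * i, 5.0 - 0.1 * i], "late") for i in range(4)]

train, test = split_holdout(early + late, test_fraction=0.25, seed=0)
model = fit_centroids(train)
acc = accuracy(model, test)
print(f"train={len(train)} test={len(test)} held-out accuracy={acc:.2f}")
```

The model is fitted on the training samples only, so the reported accuracy reflects samples that the classifier has not seen.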

Correlation with multiple molecular features

Studies of correlations among multiple molecular features can provide valuable insight into complex biological systems. Individual data sets that include genomic, epigenomic, transcriptomic or proteomic information are highly informative, and the integration of these data sets offers an exciting potential to answer many long-standing questions. For example, integrated analyses of transcriptomic, proteomic and metabolomic data have helped researchers better understand global regulatory processes and complex metabolic networks in cancer [62, 63]. Many tools can provide correlation analyses for various molecular features. In fact, more than half of the tools included in our study can conduct a correlation analysis. Among them, Regulome Explorer stands out because correlation analysis is its major function. Users can select a data set to load and obtain a genome-level view of the correlations between different data types. This tool provides both circos plots and network representations of correlations between multi-omics features, and it includes nine data types (Figure 5). It can map multi-omics features onto genomic locations for further systems biology analyses. Moreover, the parameters of a correlation can be adjusted through a filter panel on the right side of the Web page, and both network maps and detailed data tables of correlations are provided.
Figure 5

A representative window of the results provided by Regulome Explorer for a correlation analysis. This figure displays the main user interface, including the option for using multiple data types.
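A pairwise correlation between two molecular features, the basic quantity behind such plots, can be computed as in the following Python sketch. It makes no claim about the statistics Regulome Explorer itself uses; the Pearson and Spearman coefficients are shown as common choices, and the methylation and expression values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values for one hypothetical gene across four samples.
methylation = [0.1, 0.2, 0.4, 0.8]
expression = [9.0, 7.0, 4.0, 1.0]

r = pearson(methylation, expression)
rho = spearman(methylation, expression)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

The rank-based coefficient depends only on the ordering of the values, which makes it less sensitive to outliers than the Pearson coefficient.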

Survival analysis

Identification of prognostic biomarkers, which may include genes, polymorphisms, mutations, micromolecules or epigenetic regulators, represents a major advance in the field of cancer genomics. Cancer research predominantly focuses on specific patient populations for biomarker identification. Gene signatures have been developed specifically for prognostication in a particular subtype of a cancer, for instance, a subgroup of patients treated with a specific drug. To date, gene signatures of prognostic importance have been reported for breast cancer [64, 65], colon cancer [66, 67], liver cancer [68], lung cancer [69, 70] and pancreatic cancer [71]. Generally, the primary end point of prognostic assessment is survival analysis, and patients are divided into good or bad prognosis groups based on weighted or unweighted expression of individual genes or groups of genes. This type of analysis provides a rationale for mechanistic studies, followed by therapeutic targeting. Web-based tools can be used to identify and expand prognostic biomarker targets in different cancers based on the publicly available data these tools have compiled. In addition to providing easy-to-perform prognostic analyses for multiple cancers, they can also be important hypothesis-generating tools for researchers working on topics related to cancer. Here, PROGgeneV2 and KMplotter were selected to perform test analyses. Users can select gene(s), cancer type, survival measure and the data set for specific parameter settings. The results of the survival analysis conducted by PROGgeneV2 are presented in a KM plot (Figure 6), while KMplotter could not provide results because of an insufficient number of TCGA samples. These results demonstrate that the parameters and data sources for Web-based tools are not exactly the same, as the number of lung adenocarcinoma samples obtained from TCGA differed between the two analysis programs. Therefore, users need to carefully consider the data being subjected to analysis and select appropriate parameters.
Figure 6

A representative survival plot generated with PROGgeneV2. TP53 gene expression was applied to a lung adenocarcinoma data set from TCGA.
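The Kaplan–Meier estimate underlying such plots can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the implementation used by PROGgeneV2 or KMplotter: the median split into high- and low-expression groups is one common convention, the cohort values are made up, and no significance test (such as the log-rank test) is included.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time with at least one observed event.

    events[i] is True when the event (e.g. death) was observed at times[i]
    and False when the patient was censored at that time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients leave the risk set
    return curve


def km_by_median_split(times, events, expression):
    """Split patients at the median expression and estimate one curve per group."""
    cut = median(expression)
    groups = {"high": ([], []), "low": ([], [])}
    for t, e, x in zip(times, events, expression):
        name = "high" if x > cut else "low"
        groups[name][0].append(t)
        groups[name][1].append(e)
    return {name: kaplan_meier(ts, es) for name, (ts, es) in groups.items()}


# Made-up follow-up times (months), event flags and expression values.
times = [5, 8, 12, 20, 25, 30]
events = [True, True, True, False, True, False]
expression = [9.0, 8.0, 7.0, 1.0, 3.0, 2.0]

curves = km_by_median_split(times, events, expression)
for name, curve in curves.items():
    print(name, [(t, round(s, 3)) for t, s in curve])
```

Censored patients lower the number at risk without lowering the survival estimate, which is why the size and follow-up of each data set matter when results from different tools are compared.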

Usage advice

Our study grouped the online TCGA analysis tools into three categories. Users can make a preliminary selection according to their own needs. Each tool has the unique features described above, and tools can also be chosen according to the cancer genomic research questions illustrated in the case studies. Finally, users need to weigh the particulars of their study, such as data sources, data types, analytical methods and research purposes, to determine the specific tool for further analysis. The following are specific suggestions for different analyses of TCGA data.

Mutation analysis

There are 10 online tools (Broad GDAC Firehose, Cancer3D, cBioPortal, CELLX, IntOGen, TANRIC, TCGA Clinical Explorer, TCGA4U, UCSC Xena and Vanno) that can perform mutation analysis. In general, we recommend cBioPortal because it covers a variety of cancer types, offers multiple visualizations and is powerful but easy to use.

Correlation analysis

There are 17 online tools (Broad GDAC Firehose, Cancer Landscapes, canEvolve, cBioPortal, CELLX, GDISC, GEPIA, MethHC, MEXPRESS, OASISPRO, Regulome Explorer, TANRIC, TCGA Clinical Explorer, TCGA NG-CHM, TCPA, Wanderer and Zodiac) that can perform correlation analysis. In general, we recommend Broad GDAC Firehose from the Broad Institute of MIT and Harvard, which offers users a variety of analysis algorithms.

Differential analysis

There are 12 online tools (Broad GDAC Firehose, canEvolve, cBioPortal, CELLX, GEPIA, MEXPRESS, OncoScape, TANRIC, TCGA4U, TCPA, UALCAN and Wanderer) that can perform differential analysis. In general, we recommend GEPIA, an analysis tool for gene expression profiling. Differential analysis is this tool’s main analysis function, and the online analysis interface is simple and easy to understand.

Pathway analysis

There are eight online tools (Broad GDAC Firehose, Cancer Landscapes, canEvolve, MethHC, OncoScape, PathwayMapper, Regulome Explorer and TCGA NG-CHM) that can perform pathway analysis. We recommend Broad GDAC Firehose and OncoScape; the former has a variety of analysis methods, and the latter is simpler and more intuitive.

Survival analysis

There are 16 online tools (Broad GDAC Firehose, Cancer Landscapes, canEvolve, cBioPortal, CELLX, GDISC, GEPIA, KMplotter, OASISPRO, PROGgeneV2, TANRIC, TCGA Clinical Explorer, TCGA4U, TCPA, UALCAN and UCSC Xena) that can perform survival analysis. If users want to perform only this analysis, we recommend PROGgeneV2, which has a wide range of data sources and adjustable parameters for survival analysis.

Pan-cancer analysis

There are eight online tools (Broad GDAC Firehose, Cancer Landscapes, cBioPortal, IntOGen, Regulome Explorer, TCGA NG-CHM, UCSC Xena and Zodiac) that can perform pan-cancer analysis. In general, we recommend cBioPortal and Cancer Landscapes. The former has a large number of samples from pan-cancer studies and powerful analytical capabilities. The latter offers a combined pan-cancer model for analysis.
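The tool lists in this section can be treated as a small lookup table when a study needs more than one type of analysis. The following Python sketch transcribes the six lists given above; the dictionary layout and the function name are illustrative choices and not part of any of the tools.

```python
# Tool lists transcribed from the "Usage advice" section above.
TOOLS_BY_ANALYSIS = {
    "mutation": {
        "Broad GDAC Firehose", "Cancer3D", "cBioPortal", "CELLX", "IntOGen",
        "TANRIC", "TCGA Clinical Explorer", "TCGA4U", "UCSC Xena", "Vanno",
    },
    "correlation": {
        "Broad GDAC Firehose", "Cancer Landscapes", "canEvolve", "cBioPortal",
        "CELLX", "GDISC", "GEPIA", "MethHC", "MEXPRESS", "OASISPRO",
        "Regulome Explorer", "TANRIC", "TCGA Clinical Explorer", "TCGA NG-CHM",
        "TCPA", "Wanderer", "Zodiac",
    },
    "differential": {
        "Broad GDAC Firehose", "canEvolve", "cBioPortal", "CELLX", "GEPIA",
        "MEXPRESS", "OncoScape", "TANRIC", "TCGA4U", "TCPA", "UALCAN",
        "Wanderer",
    },
    "pathway": {
        "Broad GDAC Firehose", "Cancer Landscapes", "canEvolve", "MethHC",
        "OncoScape", "PathwayMapper", "Regulome Explorer", "TCGA NG-CHM",
    },
    "survival": {
        "Broad GDAC Firehose", "Cancer Landscapes", "canEvolve", "cBioPortal",
        "CELLX", "GDISC", "GEPIA", "KMplotter", "OASISPRO", "PROGgeneV2",
        "TANRIC", "TCGA Clinical Explorer", "TCGA4U", "TCPA", "UALCAN",
        "UCSC Xena",
    },
    "pan-cancer": {
        "Broad GDAC Firehose", "Cancer Landscapes", "cBioPortal", "IntOGen",
        "Regulome Explorer", "TCGA NG-CHM", "UCSC Xena", "Zodiac",
    },
}


def tools_supporting(*analyses):
    """Return the tools listed for every requested analysis type, sorted."""
    if not analyses:
        raise ValueError("name at least one analysis type")
    shared = set.intersection(*(TOOLS_BY_ANALYSIS[a] for a in analyses))
    return sorted(shared, key=str.lower)


print(tools_supporting("mutation", "survival"))
print(tools_supporting(*TOOLS_BY_ANALYSIS))
```

Such a lookup only narrows the candidates; the final choice still depends on the data sources and parameters each tool supports.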

Discussion

The functionalities of a cancer can be better characterized by integrating information from different modalities. TCGA data were collected by using a number of different modalities, and data for several tumor types are available. Consequently, TCGA data represent a valuable resource for researchers to advance their understanding of various cancers and to facilitate the realization of precision medicine in oncology. Multilayer analyses performed on different platforms reflect distinct biological characteristics, and these provide a better understanding of cancer biology. As a result, improvements in patient stratification, identification of novel prognostic or predictive markers and the identification of novel therapeutic targets can be achieved. However, integrating information from different modalities to obtain a comprehensive analysis remains a prodigious challenge [72]. Many bioinformatics tools that are compatible with TCGA data have been developed for basic scientists who do not have extensive training in informatics, statistics or clinical knowledge. Correspondingly, the wealth of available tools for analysis and interpretation of data reflects the importance of TCGA and the dynamic nature of the field of data analysis. Therefore, the goal of this review was to provide a comprehensive introduction to publicly available Web-based resources and tools to help researchers select the appropriate tool for their needs. To that end, we organized these tools into three categories: global analysis, target analysis and auxiliary analysis. In addition, we provided five case studies, which demonstrate classic analysis methods along with corresponding tools. However, none of these tools completely replaces advanced computational and statistical methodologies. Moreover, it remains the responsibility of cancer researchers to understand this vast amount of data and translate it into testable hypotheses and novel diagnostic and therapeutic options for the clinic. 
To this end, it is our hope that the current survey will afford researchers the confidence needed to extend their current knowledge of cancer genomics and its complex details and networks to identify new approaches and targets for cancer treatment and prevention.

Key Points

TCGA provides unprecedented opportunities to increase our knowledge of cancer and facilitate the realization of precision medicine in oncology.
The most comprehensive and currently available Web servers and resources that assist with TCGA data analysis are enumerated.
The tools are classified based on their different analysis modes to help researchers select the appropriate tool for their work.
Case studies are provided, which further illustrate the roles of TCGA data analysis in five predominant areas of cancer research.
  72 in total

1.  A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer.

Authors:  Soonmyung Paik; Steven Shak; Gong Tang; Chungyeul Kim; Joffre Baker; Maureen Cronin; Frederick L Baehner; Michael G Walker; Drew Watson; Taesung Park; William Hiller; Edwin R Fisher; D Lawrence Wickerham; John Bryant; Norman Wolmark
Journal:  N Engl J Med       Date:  2004-12-10       Impact factor: 91.245

2.  The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository.

Authors:  Kenneth Clark; Bruce Vendt; Kirk Smith; John Freymann; Justin Kirby; Paul Koppel; Stephen Moore; Stanley Phillips; David Maffitt; Michael Pringle; Lawrence Tarbox; Fred Prior
Journal:  J Digit Imaging       Date:  2013-12       Impact factor: 4.056

3.  Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1.

Authors:  Roel G W Verhaak; Katherine A Hoadley; Elizabeth Purdom; Victoria Wang; Yuan Qi; Matthew D Wilkerson; C Ryan Miller; Li Ding; Todd Golub; Jill P Mesirov; Gabriele Alexe; Michael Lawrence; Michael O'Kelly; Pablo Tamayo; Barbara A Weir; Stacey Gabriel; Wendy Winckler; Supriya Gupta; Lakshmi Jakkula; Heidi S Feiler; J Graeme Hodgson; C David James; Jann N Sarkaria; Cameron Brennan; Ari Kahn; Paul T Spellman; Richard K Wilson; Terence P Speed; Joe W Gray; Matthew Meyerson; Gad Getz; Charles M Perou; D Neil Hayes
Journal:  Cancer Cell       Date:  2010-01-19       Impact factor: 31.743

4.  Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.

Authors:  Katherine A Hoadley; Christina Yau; Denise M Wolf; Andrew D Cherniack; David Tamborero; Sam Ng; Max D M Leiserson; Beifang Niu; Michael D McLellan; Vladislav Uzunangelov; Jiashan Zhang; Cyriac Kandoth; Rehan Akbani; Hui Shen; Larsson Omberg; Andy Chu; Adam A Margolin; Laura J Van't Veer; Nuria Lopez-Bigas; Peter W Laird; Benjamin J Raphael; Li Ding; A Gordon Robertson; Lauren A Byers; Gordon B Mills; John N Weinstein; Carter Van Waes; Zhong Chen; Eric A Collisson; Christopher C Benz; Charles M Perou; Joshua M Stuart
Journal:  Cell       Date:  2014-08-07       Impact factor: 41.582

5.  Subtypes of Pediatric High-Grade Gliomas ID'd.

Authors: 
Journal:  Cancer Discov       Date:  2017-10-20       Impact factor: 39.397

6.  Vanno: a visualization-aided variant annotation tool.

Authors:  Po-Jung Huang; Chi-Ching Lee; Bertrand Chin-Ming Tan; Yuan-Ming Yeh; Kuo-Yang Huang; Ruei-Chi Gan; Ting-Wen Chen; Cheng-Yang Lee; Sheng-Ting Yang; Chung-Shou Liao; Hsuan Liu; Petrus Tang
Journal:  Hum Mutat       Date:  2015-02       Impact factor: 4.878

7.  The landscape of cancer genes and mutational processes in breast cancer.

Authors:  Philip J Stephens; Patrick S Tarpey; Helen Davies; Peter Van Loo; Chris Greenman; David C Wedge; Serena Nik-Zainal; Sancha Martin; Ignacio Varela; Graham R Bignell; Lucy R Yates; Elli Papaemmanuil; David Beare; Adam Butler; Angela Cheverton; John Gamble; Jonathan Hinton; Mingming Jia; Alagu Jayakumar; David Jones; Calli Latimer; King Wai Lau; Stuart McLaren; David J McBride; Andrew Menzies; Laura Mudie; Keiran Raine; Roland Rad; Michael Spencer Chapman; Jon Teague; Douglas Easton; Anita Langerød; Ming Ta Michael Lee; Chen-Yang Shen; Benita Tan Kiat Tee; Bernice Wong Huimin; Annegien Broeks; Ana Cristina Vargas; Gulisa Turashvili; John Martens; Aquila Fatima; Penelope Miron; Suet-Feung Chin; Gilles Thomas; Sandrine Boyault; Odette Mariani; Sunil R Lakhani; Marc van de Vijver; Laura van 't Veer; John Foekens; Christine Desmedt; Christos Sotiriou; Andrew Tutt; Carlos Caldas; Jorge S Reis-Filho; Samuel A J R Aparicio; Anne Vincent Salomon; Anne-Lise Børresen-Dale; Andrea L Richardson; Peter J Campbell; P Andrew Futreal; Michael R Stratton
Journal:  Nature       Date:  2012-05-16       Impact factor: 49.962

8.  The transcriptional landscape and mutational profile of lung adenocarcinoma.

Authors:  Jeong-Sun Seo; Young Seok Ju; Won-Chul Lee; Jong-Yeon Shin; June Koo Lee; Thomas Bleazard; Junho Lee; Yoo Jin Jung; Jung-Oh Kim; Jung-Young Shin; Saet-Byeol Yu; Jihye Kim; Eung-Ryoung Lee; Chang-Hyun Kang; In-Kyu Park; Hwanseok Rhee; Se-Hoon Lee; Jong-Il Kim; Jin-Hyoung Kang; Young Tae Kim
Journal:  Genome Res       Date:  2012-09-13       Impact factor: 9.043

9.  GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses.

Authors:  Zefang Tang; Chenwei Li; Boxi Kang; Ge Gao; Cheng Li; Zemin Zhang
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

10.  Complex landscapes of somatic rearrangement in human breast cancer genomes.

Authors:  Philip J Stephens; David J McBride; Meng-Lay Lin; Ignacio Varela; Erin D Pleasance; Jared T Simpson; Lucy A Stebbings; Catherine Leroy; Sarah Edkins; Laura J Mudie; Chris D Greenman; Mingming Jia; Calli Latimer; Jon W Teague; King Wai Lau; John Burton; Michael A Quail; Harold Swerdlow; Carol Churcher; Rachael Natrajan; Anieta M Sieuwerts; John W M Martens; Daniel P Silver; Anita Langerød; Hege E G Russnes; John A Foekens; Jorge S Reis-Filho; Laura van 't Veer; Andrea L Richardson; Anne-Lise Børresen-Dale; Peter J Campbell; P Andrew Futreal; Michael R Stratton
Journal:  Nature       Date:  2009-12-24       Impact factor: 49.962


Authors:  Yajun Shen; Lingyu Li; Yunping Lu; Min Zhang; Xin Huang; Xiaofei Tang
Journal:  Front Genet       Date:  2021-07-12       Impact factor: 4.599

