Katarzyna Klonowska1, Karol Czubak1, Marzena Wojciechowska1, Luiza Handschuh1,2, Agnieszka Zmienko1,3, Marek Figlerowicz1,3, Hanna Dams-Kozlowska4,5, Piotr Kozlowski1.
Abstract
Somatically acquired genomic alterations that drive oncogenic cellular processes are of great scientific and clinical interest. Since the initiation of large-scale cancer genomic projects (e.g., the Cancer Genome Project, The Cancer Genome Atlas, and the International Cancer Genome Consortium cancer genome projects), a number of web-based portals have been created to facilitate access to multidimensional oncogenomic data and assist with the interpretation of the data. The portals provide the visualization of small-size mutations, copy number variations, methylation, and gene/protein expression data that can be correlated with the available clinical, epidemiological, and molecular features. Additionally, the portals allow users to analyze the gathered data with various user-friendly statistical tools. Herein, we present a highly illustrated review of seven portals, i.e., Tumorscape, UCSC Cancer Genomics Browser, ICGC Data Portal, COSMIC, cBioPortal, IntOGen, and BioProfiling.de. All of the selected portals are user-friendly and can be exploited by scientists from different cancer-associated fields, including those without a bioinformatics background. It is expected that the use of the portals will contribute to a better understanding of cancer molecular etiology and will ultimately accelerate the translation of genomic knowledge into clinical practice.
Keywords: COSMIC; IntOGen; PPISURV/MIRUMIR; Tumorscape; cBioPortal
Year: 2016 PMID: 26484415 PMCID: PMC4807991 DOI: 10.18632/oncotarget.6128
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Main characteristics of the selected oncogenomic portals
| database | data source | sites of analysed cancer | organisation of data | oncogenomic data/analyses | link/literature |
|---|---|---|---|---|---|
| Tumorscape | Broad Institute | Bd; Bld; Br; Bra; Clr; Eso; GIST; HN; Htp; Kd; Lng; Lvr; Lymph; Msh; Ov; Pnc; Prst; Sk; ST; Stc; Swn; Thr; Utr; also in: cancer cell lines | level i-iii | copy number alterations | |
| UCSC Cancer Genomics Browser | TCGA, SU2C Breast Cell Line, Cancer Cell Line Encyclopedia, The Connectivity Map, TARGET, cancer data from literature | Bd; Bld; Br; Bra; Chl; Col; Clr; EG; Eso; HN; Kd; Lng; Lvr; Lymph; Msh; Ov; Pan; Pnc; Prc/Prn; Prst; Rc; Sk; ST; Stc; Thm; Thr; Utr; also: cancer cell lines; cancer data from mouse models | level i-iii | DNA copy number, miRNA/exon/gene/protein expression, DNA methylation, gene-level mutations, PARADIGM pathway activity; clinical, epidemiological, and molecular information | |
| ICGC Data Portal | ICGC, TCGA, TARGET | Bd; Bld; Bo; Br; Bra; Clr; Col; Eso; HN; Kd; Lng; Lvr; Lymph; Nb; Ov; Pnc; Prst; Rc; Sk; ST; Stc; Thr; Utr | level i-iv | simple somatic mutations, copy number somatic alterations, structural somatic mutations, simple germline variants, DNA methylation, gene/protein expression, miRNA expression, exon junction; epidemiological and clinical data | |
| COSMIC | TCGA, ICGC, cancer data from literature | Bo; Br; EA; Eso; GIST; Htp; Kd; Lvr; Lng; Ov; Pnc; Prst; Sk; Stc; Tst; Thm; Thr; Utr | level iii-iv | somatic mutations, copy number alterations, gene expression | |
| cBioPortal | AMC, BCCRC, BGI, British Columbia, Broad, Broad/Cornell, CCLE, CLCGP, Genentech, ICGC, JHU, Michigan, MKSCC, MKSCC/Broad, NCCS, NUS, PCGP, Pfizer UHK, Riken, Sanger, Singapore, TCGA, TSP, UTokyo, Yale | ACC; Bd; Bld; Br; Bra; Chl; Clr; Eso; HN; Kd; Lng; Lvr; Lymph; MM; Npx; Ov; Pnc; Prst; Sk; ST; Stc; Thr; Utr; also: cancer cell lines | level iii-iv | mutations, putative copy number alterations; mRNA expression, protein/phosphoprotein level; survival analyses | |
| IntOGen (2014.12) | TCGA, ICGC, cancer data from literature | Bd; Bld; Br; Bra; Clr; Eso; HN; Kd; Lng; Lvr; Lymph; Ov; Pnc; Prst; Sk; Stc; Thr; Utr | level iii-iv | results of the analyses indicating driver alterations and genes; therapies tailored to the mutation profiles of the analyzed patients | |
| BioProfiling.de | | | | | |
| PPISURV | for gene expression: Gene Expression Omnibus; for interactome: IntAct, HPRD, Reactome, HumanCyc, NCI_NATURE, PhosphoSitePlus | Bd; Bld; Br; Bra; Col; Htp; Lng; Lvr; Lymph; Ov; Prst; ST; Utr | level iv | survival analyses | |
| MIRUMIR | Gene Expression Omnibus | Br; Eso; Lvr; Lng; Npx; Ov; Prst; Sk | level iv | survival analyses | |
| DRUGSURV | for gene expression: Gene Expression Omnibus; for drugs modulating a gene of interest: DrugBank, Pubchem Bioassay | Bld; Br; Bd; Col; Bra; Lng; Lvr; Lymph; Prst; ST; Utr | level iv | list of drugs targeting specific genes/cancer types; survival analyses | |
List of abbreviations of cancer sites. Examples of cancer subtypes included in the portals are given in brackets.
ACC – adenoid cystic carcinoma; Bd – bladder; Bld – blood; Bo – bone; Br – breast; Bra – brain; Chl – cholangiocarcinoma; Clr – colorectal; Col – colon; EA – eye and adnexa; EG – endocrine glands; Eso – esophagus; GIST – gastrointestinal; HN – head and neck; Htp – hematopoietic; Kd – kidney; Lng – lung; Lvr – liver and biliary tract; Lymph – lymphoma; Msh – mesothelioma; Mth – mouth; Nb – neuroblastoma; Npx – nasopharynx; Ov – ovary; Pan – pancancer; Pnc – pancreas; Pnx – pharynx; Prc/Prn – pheochromocytoma and paraganglioma; Prst – prostate; Rc – rectum; Sk – skin; ST – soft tissues; Stc – stomach; Swn – schwannoma; Thm – thymus; Thr – thyroid; Tst – testis; Utr – uterine (cervix and corpus).
In oncogenomic portals cancer resources are arranged in different levels of organisation, including: (i) raw, (ii) computationally processed/normalized, (iii) interpreted and (iv) summarized data [3].
Figure 1. Examples of Tumorscape data analysis and visualization
A. An example of the results that were obtained with the “cancer-centric” analysis. The table shows a list of genomic regions that were most frequently amplified in lung adenocarcinoma. The q-value represents the likelihood of a random occurrence of the specific amplification/deletion that is calculated based on the background copy number variation. The fourth most frequently amplified region that spans EGFR is highlighted. B. Results obtained with “gene-centric” analysis; the table depicts a list of cancers in which the representative gene (EGFR) is located in or near the frequently amplified region (orange and yellow rows, respectively). C. Visualization of chromosomal regions that span the exemplary EGFR and CDKN2A genes, which are undergoing frequent amplifications and deletions, respectively. The heatmaps show copy number variations of glioma and lung adenocarcinoma samples. Each row represents an individual sample, and red and blue indicate amplification and deletion, respectively.
Figure 2. The UCSC Cancer Genomics Browser
An example of analysis focused on the EGFR genomic region that is conducted concurrently on various oncogenomic data across different cancer types and subtypes. A. Small-scale images (icons) of selected datasets that are simultaneously visualized in the browser. Datasets represented by icons are displayed in a column, similar to the datasets from panels B-D. B. A heatmap panel that presents the results of the TCGA genome-wide copy number analysis of glioblastoma multiforme (GBM) samples. A screenshot of the GBM dataset was used for presentation, based on the presence of considerable amplification of the genomic region that spans the representative EGFR. Each horizontal line (track) represents a specific sample. The red or blue colors indicate, respectively, a gain or loss in the copy number. On the right side of panel B, there is a drop-down list with epidemiological, clinical, and molecular attributes that can be used to sort the presented data (as shown in panel D). C. The TCGA copy number data identified in patients with GBM visualized as a proportions plot. D. A heatmap panel showing the results of TCGA analysis of gene expression in lung cancer samples in the genes that are indicated above (e.g., EGFR). Red and green colors indicate, respectively, upregulation and downregulation of the relative gene expression. The samples are sorted by epidemiological, clinical, and molecular attributes (selected from a drop-down list of attributes), as in panels B and C, shown on the right side of the expression panel. The copy number and expression data presented in panels B-D correspond to the same genomic region indicated above panel B. E. Kaplan-Meier plots generated using the attributes of lung cancer samples (shown on the right side of panel D).
Figure 3. The ICGC Data Portal
An example of possible data analyses and visualizations. A. Three interactive entry points to the ICGC Data Portal. B. The “Cancer Projects” entry point. Screenshot of summary results from all 55 cancer projects. The upper left-hand panel: pie chart that depicts the distribution of cancer types (internal circle) and cancer subtypes/projects (external circle) among the donors, e.g., different lung cancer types and subtypes/projects (indicated in the pie chart). The upper right-hand panel: bar plot that represents the top 20 most frequently mutated genes. Different colors indicate different projects. The middle panel: scatter plot that depicts the distribution of the number of somatic mutations in the donors' exomes across cancer projects. Each dot represents the number of somatic mutations (per 1 Mb) that are identified in the analyzed sample. Vertical lines indicate the median number of mutations. The bottom part of panel B shows a summary of each project. More information about the specific project (types of experimental analyses, available genomic data, most commonly mutated genes, most common mutations, and most affected donors) can be found by clicking on the specific project code. C. The “Advanced Search” entry point, which enables extended analysis of the oncogenomic data. This screenshot shows the browsing of donor features. The upper left-hand panel depicts features that can be used for filtering the donor data. The middle panel (pie charts) provides a summary of the clinical, epidemiological, and molecular attributes of the donors. The bottom panel represents summary data about specific donors. More information (clinical and genetic) can be found by clicking on the donor ID.
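The per-megabase mutation counts and per-project medians described for the scatter plot in panel B can be illustrated with a minimal Python sketch. It does not reproduce the portal's own computation: the project names, mutation counts, and the 30 Mb covered-exome size are hypothetical values chosen only for illustration.

```python
from statistics import median


def mutations_per_mb(mutation_count, covered_bases):
    """Normalize a raw somatic mutation count to mutations per 1 Mb."""
    return mutation_count / (covered_bases / 1_000_000)


# Hypothetical donors per project: (somatic mutation count, covered bases).
# These numbers are invented for illustration and are not ICGC data.
projects = {
    "PROJECT_A": [(60, 30_000_000), (150, 30_000_000), (300, 30_000_000)],
    "PROJECT_B": [(15, 30_000_000), (45, 30_000_000)],
}

# One value per donor corresponds to one dot in the scatter plot;
# the per-project median corresponds to the vertical line.
rates = {
    name: [mutations_per_mb(count, bases) for count, bases in donors]
    for name, donors in projects.items()
}
medians = {name: median(values) for name, values in rates.items()}
```

Normalizing by the covered length makes samples sequenced with different capture sizes comparable, which is why the plot reports mutations per 1 Mb rather than raw counts.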
Figure 4. Three levels of data analysis in the COSMIC browser
A. Screenshot shows exemplary EGFR gene data. The upper left-hand panel demonstrates basic information about the gene, whereas the right-hand panel of “Mutation analysis” provides links to the detailed data of mutations that were detected in the EGFR. Within the panel, there is a “Histogram” link that allows detailed analysis of the gene alterations, whose features are shown in the framed panel. One of the histograms shows the distribution of EGFR tyrosine kinase domain mutations, with the most frequently occurring mutation being L858R. The distribution can also be visualized as a table (on the right). B. The screenshots present the results for the representative lung adenocarcinoma cancer type. The left framed panel shows a list of the 20 most frequently mutated genes, whereas the middle and right framed panels display a CNV plot and the Mutation Matrix, respectively. The CNV circular plot shows a summary of the copy number variations across the whole genome of the lung adenocarcinoma. The height of the corresponding bars shows the total number of samples with CNV in a specific region. The Mutation Matrix presents alterations in the most frequently mutated genes (y-axis) in the adenocarcinoma samples that have the highest number of alterations (x-axis). C. Circular plot of all of the alterations (coding mutations, gene expression and CNV) that are detected in an individual exemplary sample (TCGA-A6-5657-01) of adenocarcinoma.
Figure 5. Exemplary data analysis and visualization available in the cBioPortal
A. The table shows nonsynonymous mutations in the TCGA-50-5944-01 sample of lung adenocarcinoma. They are characterized by the mutation name, its type, its frequency and its effect on the expression of the mutated gene. Additional information on the frequency of specific mutations can be found under the “cBioPortal” and “Cosmic” columns. The table also provides information about the predicted impact of a given mutation on the gene function (under the Mutation Assessor tool). B. Genes with copy number alterations (CNAs) in the TCGA-50-5944-01 sample are shown. The table also contains information on the frequency of CNA in a specific gene and the effect of the alterations on the gene expression. C. Summary of the genomic alterations in four selected genes of lung adenocarcinoma samples. Each column shows an individual tumor sample in which homozygous deletions (blue), amplifications (red), missense mutations (green squares), truncating mutations (black squares) or no alterations (grey) were found. D. A plot of the correlation between copy number alterations and mRNA expression of the exemplary EGFR gene. E. Kaplan-Meier plot of overall survival shown for patients with (red) and without (blue) changes in EGFR. F. Summary graph of EGFR alterations (shown in different colors) in individual studies deposited in the portal. For a selected study, the distribution of the mutations is shown in the inset. For a selected mutation (here L858R), a 3D interactive protein structure can be displayed (the position of the mutation is indicated in red).
Figure 6. Exemplary results generated in the PPISURV and MIRUMIR databases
A. Results generated with the PPISURV. Survival analysis shown for representative EGFR and its interactome. From the top: the first table depicts the summary of EGFR interactions that are annotated according to different interactomes across the available datasets. The last column of the table provides a link for more detailed characteristics of a selected interactome (shown in the second table). It includes the results of the analysis of the influence of the particular interactome on survival determined for all of the available datasets. The third table presents datasets on the direct correlation between EGFR expression and survival. The last column of the table is a link for the visualization of the data in the Kaplan-Meier graph. The exemplary graph shows the influence of EGFR expression on survival in lung adenocarcinoma patients. B. Results generated with MIRUMIR. The table shows a summary analysis for a representative microRNA-21 on the influence of its expression on survival in a specific cancer type. The inset represents the Kaplan-Meier graph of the effect of the microRNA expression on disease-free survival in breast cancer.
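Several of the portals described above report survival analyses as Kaplan-Meier graphs. As a minimal sketch of how such a curve is estimated, the following Python function implements the standard product-limit (Kaplan-Meier) estimator. It is not the implementation used by any of the portals, and the follow-up times and event flags are hypothetical values invented for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up durations in any consistent unit
    events : 1 if the event (e.g., death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every subject leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t are removed after time t.
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up data in months (not taken from any portal).
example_times = [5, 8, 12, 12, 20]
example_events = [1, 1, 0, 1, 0]
example_curve = kaplan_meier(example_times, example_events)
```

Censored subjects contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is what distinguishes this estimator from a simple fraction of survivors. Comparing two groups (for example, high versus low expression of a gene) would mean calling the function once per group and plotting both step curves.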