Marcel Ramos, Ludwig Geistlinger, Sehyun Oh, Lucas Schiffer, Rimsha Azhar, Hanish Kodali, Ino de Bruijn, Jianjiong Gao, Vincent J Carey, Martin Morgan, Levi Waldron.
Abstract
PURPOSE: Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic data from > 260 public studies, including The Cancer Genome Atlas (TCGA) data sets, but integration of different data types remains challenging and error prone for computational methods and tools using these resources. Recent advances in data infrastructure within the Bioconductor project enable a novel and powerful approach to creating fully integrated representations of these multiomic, pan-cancer databases.
Year: 2020 PMID: 33119407 PMCID: PMC7608653 DOI: 10.1200/CCI.19.00119
Source DB: PubMed Journal: JCO Clin Cancer Inform ISSN: 2473-4276
FIG 1. Comparison of The Cancer Genome Atlas (TCGA) data resources by integration, ease of use, and data completeness. Integration refers to the ability to use the resource within an analysis platform such as R and Bioconductor. A resource with high data completeness allows users to download the entirety of TCGA data. Ease of use is defined as low cognitive overhead in using a resource, as imposed by its data models and by the knowledge of query structures it requires.
FIG A1. Flow diagram of the curatedTCGAData pipeline and cBioPortalData data provenance. NCI, National Cancer Institute; NHGRI, National Human Genome Research Institute; NIH, National Institutes of Health.
TCGA Cancer and Curation Data Available From curatedTCGAData
Descriptions of Data Types Available in curatedTCGAData by Bioconductor Data Class
FIG A2. (A) Example code for installing and downloading The Cancer Genome Atlas (TCGA) data using curatedTCGAData. (B) Example cBioPortalData code for downloading and exporting TCGA data from cBioPortal and through the cBioPortal application programming interface (API). (C) Example hg19 to hg38 liftOver procedure using Bioconductor tools.
FIG A3. Example code for downloading data through GenomicDataCommons and loading with TCGAutils.
FIG 2. OncoPrint plot of selected cancer driver genes frequently mutated across 33 The Cancer Genome Atlas cancer types.
FIG 3. Pan-cancer differential expression analysis. Shown are the top eight consistently downregulated genes (bottom left) and the top eight consistently upregulated genes (top right) when comparing cancer versus adjacent normal samples across 14 cancer types.
FIG 4. Pan-cancer gene set enrichment analysis. Shown are the 15 Gene Ontology Biological Process terms that were most frequently found to be enriched for differential expression in cancer v adjacent-normal comparisons across 14 cancer types. On the left, a term counts as enriched if an over-representation analysis (ORA) yields P < .05. For comparison, the right shows whether these terms were also found to be enriched according to another enrichment method (Pathway Analysis with Down-weighting of Overlapping Genes [PADOG]).
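ORA is commonly computed as a one-sided hypergeometric test: given a universe of N genes, a gene set of size K, and n differentially expressed (DE) genes, it asks how likely an overlap of at least k genes would be by chance. The caption does not state the exact test or whether P values were adjusted for multiple testing, so the sketch below assumes the plain hypergeometric formulation with an unadjusted P < .05 cutoff. The article's workflow is built on Bioconductor in R; this Python is only a language-neutral illustration, not the authors' code, and the function names `ora_pvalue` and `ora` are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe: int, n_set: int, n_de: int, n_overlap: int) -> float:
    """One-sided hypergeometric P value, P(X >= n_overlap).

    X counts gene-set members among n_de genes drawn without replacement
    from n_universe genes, of which n_set belong to the gene set.
    """
    if not 0 <= n_overlap <= min(n_set, n_de):
        raise ValueError("overlap must lie between 0 and min(n_set, n_de)")
    total = comb(n_universe, n_de)
    # Sum P(X = k) = C(K, k) * C(N - K, n - k) / C(N, n) over the upper tail.
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_de - k)
        for k in range(n_overlap, min(n_set, n_de) + 1)
    )
    return tail / total


def ora(de_genes, universe, gene_sets, alpha=0.05):
    """Return {set name: P value} for gene sets with unadjusted P < alpha.

    Hypothetical helper: gene_sets maps a term name to its member genes.
    """
    universe = set(universe)
    de = set(de_genes) & universe
    hits = {}
    for name, genes in gene_sets.items():
        members = set(genes) & universe  # restrict the set to the universe
        p = ora_pvalue(len(universe), len(members), len(de), len(members & de))
        if p < alpha:
            hits[name] = p
    return hits
```

With toy data, a set containing all five DE genes out of a 20-gene universe is flagged, while a disjoint set is not:

```python
universe = [f"G{i}" for i in range(1, 21)]
de = [f"G{i}" for i in range(1, 6)]
sets = {"A": de, "B": [f"G{i}" for i in range(16, 21)]}
print(ora(de, universe, sets))  # only "A" passes the cutoff
```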
FIG 5. Histogram of the distribution of Pearson correlation coefficients between gene copy number and RNA sequencing gene expression in adrenocortical carcinoma. An integrative representation readily allows comparison and correlation of multiomics experiments.
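Each value in such a histogram is one gene's Pearson correlation between its copy number and its expression, computed over the samples measured by both assays. The key integration step is matching samples and genes across the two experiments before correlating. The sketch below assumes each assay is a hypothetical mapping of gene to {sample: value}; in the article this matching is handled by Bioconductor data structures in R, and the helper names `pearson` and `per_gene_correlation` are illustrative only.

```python
from math import isnan, nan, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan  # undefined when either variable is constant
    return sxy / sqrt(sxx * syy)


def per_gene_correlation(copy_number, expression):
    """Correlate copy number with expression gene by gene.

    Only genes present in both assays, and only samples measured for that
    gene in both assays, are used (a hypothetical stand-in for the sample
    matching an integrated multiomic container provides).
    """
    result = {}
    for gene in copy_number.keys() & expression.keys():
        samples = sorted(copy_number[gene].keys() & expression[gene].keys())
        if len(samples) < 2:
            continue  # too few shared samples to correlate
        x = [copy_number[gene][s] for s in samples]
        y = [expression[gene][s] for s in samples]
        result[gene] = pearson(x, y)
    return result
```

Collecting the values of the returned dictionary, skipping any `nan`, gives the distribution that a histogram like the one in FIG 5 summarizes.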
FIG 6. Gene dosage effect on SNRPB2 expression in adrenocortical carcinoma (ACC) tumors. The violin plots show increasing expression of SNRPB2 with increasing copy number, corresponding to a Pearson correlation of 0.83 (the highest correlation observed in ACC).