A Gordon Robertson, Christina Yau, Jian Carrot-Zhang, Jeffrey S Damrauer, Theo A Knijnenburg, Nyasha Chambwe, Katherine A Hoadley, Anab Kemal, Jean C Zenklusen, Andrew D Cherniack, Rameen Beroukhim, Wanding Zhou.
Abstract
Cellular and molecular aberrations contribute to the disparity of human cancer incidence and etiology between ancestry groups. Multiomics profiling in The Cancer Genome Atlas (TCGA) allows for querying of the molecular underpinnings of ancestry-specific discrepancies in human cancer. Here, we provide a protocol for integrative associative analysis of ancestry with molecular correlates, including somatic mutations, DNA methylation, mRNA transcription, miRNA transcription, and pathway activity, using TCGA data. This protocol can be generalized to analyze other cancer cohorts and human diseases. For complete details on the use and execution of this protocol, please refer to Carrot-Zhang et al. (2020).
Keywords: Bioinformatics; Cancer; Genomics
Year: 2021 PMID: 33982016 PMCID: PMC8082263 DOI: 10.1016/j.xpro.2021.100483
Source DB: PubMed Journal: STAR Protoc ISSN: 2666-1667
Figure 2. Expected outcomes
(A) Distribution of ancestry-associated DNA methylation for EDAR. Slope coefficients (Δβ) are labeled on top of the plot.
(B) Distribution of ancestry-associated DNA methylation for PACS2.
(C) The slope coefficients are plotted for multiple CpGs (ordered by genomic coordinates) for PACS2.
(D) Comparison of pan-cancer ancestry-association (Y-axis) with cancer-type-specific ancestry-association (X-axis, showing LIHC on the left and COAD on the right).
(E) Venn diagram showing the number of significant associations of non-EUR ancestry group with EUR.
(F) Bar plot for the number of non-EUR cases for COAD and LIHC.
(G) Slope coefficients associated with iCluster memberships.
(H) The overlap between subtype-corrected ancestry association and uncorrected inference on BLCA.
(I) The effect size distribution of Δβ in each cancer type.
(J) Distribution of cancer-type-specific ancestry associations by the number of cancer types in which each association is found (X-axis). The overlap with the pan-cancer analysis is labeled on top of each bar. The left panel shows the distribution without the effect size filter. The right panel shows the distribution after applying the effect size filter.
Figure 1. Input format example and workflow for ancestry association
(A) Example input files for regression. The response variable is colored in red. Top: DNA methylation. Bottom: mRNA expression.
(B) General analysis workflow, showing DNA methylation in expanded panels.
Regression models for detecting ancestry-associated molecular correlates
| Alternative hypothesis | Method | Example model |
|---|---|---|
| Somatic alteration is associated with local ancestry | Logistic Regression | somatic alteration ~ local ancestry + percentage of EUR ancestry + percentage of AFR ancestry |
| Ancestry is associated with arm-level SCNA | Logistic Regression | ancestry ~ arm-level SCNA + aneuploidy + genome doubling + age + gender + subtype |
| Ancestry is associated with TMB, aneuploidy, and IMS | Logistic Regression | ancestry ~ aneuploidy + TMB + IMS + age + gender + subtype |
| DNA methylation is associated with ancestry | Linear Regression | DNA methylation level ( |
| mRNA expression is associated with ancestry | Linear Regression | log2(RPM) ~ ancestry + TCGA plate number + subtype |
| miRNA expression is associated with ancestry | Linear Regression | log2(RPM+1) ~ ancestry + subtype |
| Pathway activity is associated with ancestry | Linear Regression | IPL ~ ancestry + subtype |
RPM: Reads Per Million. IPL: Integrated Pathway Level.
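As an illustration of the simplest linear models in the table (for example, IPL ~ ancestry + subtype), the following minimal Python sketch fits an ordinary least-squares model with ancestry and subtype dummy-coded against reference levels. It is not the protocol's own implementation. The helper names (`one_hot`, `solve`, `fit_ols`), the subtype labels `A` and `B`, and the toy values are all hypothetical, and the response values are synthetic, constructed so that the AFR-versus-EUR difference is exactly 0.5. Standard errors, p-values, and multiple-testing correction are omitted.

```python
def one_hot(labels, reference):
    """Dummy-code a categorical variable, dropping the reference level."""
    levels = sorted(set(labels) - {reference})
    rows = [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]
    return levels, rows


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(design, response):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, response))
           for i in range(p)]
    return solve(xtx, xty)


# Synthetic toy data (hypothetical, not TCGA values):
# response = 2.0 + 0.5*[AFR] - 0.3*[EAS] + 1.0*[subtype B]
ancestry = ["EUR", "EUR", "AFR", "AFR", "EAS", "EAS", "EUR", "AFR"]
subtype = ["A", "B", "A", "B", "A", "B", "A", "B"]
response = [2.0, 3.0, 2.5, 3.5, 1.7, 2.7, 2.0, 3.5]

# EUR and subtype A serve as reference levels, so each ancestry
# coefficient is a difference relative to EUR within the same subtype.
anc_levels, anc_cols = one_hot(ancestry, reference="EUR")
sub_levels, sub_cols = one_hot(subtype, reference="A")
design = [[1.0] + a + s for a, s in zip(anc_cols, sub_cols)]
names = ["intercept"] + anc_levels + ["subtype_" + s for s in sub_levels]

coef = dict(zip(names, fit_ols(design, response)))
for name, value in coef.items():
    print(f"{name}: {value:.3f}")
```

Each ancestry coefficient plays the role of the slope coefficient reported in Figure 2: the estimated shift in the response for that ancestry group relative to EUR after adjusting for subtype. In a real analysis, a statistics package that also reports standard errors and p-values would be the more practical choice, and the covariates would follow the table row for the data type at hand.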
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| TCGA patient ancestry assignment | Table S1 of | |
| TCGA cancer type and subtype | ||
| TCGA BLCA mRNA subtype | ||
| TCGA patient gender and age | Liu | |
| TCGA sample quality whitelist | Genomic Data Commons | |
| TCGA MC3 mutation calls | Ellrott et al. 2018 | |
| Cancer mutational signature | Alexandrov | |
| TCGA ABSOLUTE-estimated purity | Genomic Data Commons | |
| TCGA local ancestry call | ||
| TCGA Affymetrix SNP 6.0 microarray data | Genomic Data Commons Legacy Archive | |
| TCGA GISTIC copy number call | GISTIC: McCarroll | |
| TCGA chromosome arm calls and aneuploidy scores | Taylor | |
| TCGA Infinium BeadChip data (DNA methylation level, or beta values) | Genomic Data Commons Legacy Archive | |
| Probe annotation for Infinium BeadChip platform | ||
| TCGA WGBS for 49 samples | Zhou | |
| TCGA normalized mRNA data | Genomic Data Commons, Pancan Atlas Portal | |
| TCGA normalized mature strand miRNA RPM data | Genomic Data Commons, Pancan Atlas Portal | |
| miRNA annotation from miRBase | RRID: SCR_003152 | |
| GFF3 for gene annotation | Ensembl v94 gene annotations | |
| GRCh38 cytoband coordinates | UCSC Genome Browser | |
| General miR-gene resource | ||
| GFF3 for miRNA stem-loop and mature strand annotations from miRBase | Kozomara | |
| PARADIGM Pathway Activity Level | Campbell | |
| PARADIGM Pathway Definition | Campbell | |
| PARADIGM for integrative pathway analysis | Vaske | |
| TargetScan for miRNA target detection | ||
| SeSAMe for DNA methylation data preprocessing | ||
| Resource website for the Ancestry-associated Molecular Correlates | ||