Oliver Ocsenas, Jüri Reimand.
Abstract
Somatic mutations in cancer genomes are associated with DNA replication timing (RT) and chromatin accessibility (CA); however, these observations are based on normal tissues and cell lines, while primary cancer epigenomes remain uncharacterised. Here we use machine learning to model megabase-scale mutation burden in 2,500 whole cancer genomes and 17 cancer types via a compendium of 900 CA and RT profiles covering primary cancers, normal tissues, and cell lines. CA profiles of primary cancers, rather than those of normal tissues, are most predictive of regional mutagenesis in most cancer types. Feature prioritisation shows that the epigenomes of matching cancer types and organ systems are often the strongest predictors of regional mutation burden, highlighting disease-specific associations of mutational processes. The genomic distributions of mutational signatures are also shaped by the epigenomes of matched cancer and tissue types, with SBS5/40, carcinogenic and unknown signatures most accurately predicted by our models. In contrast, fewer associations of RT and regional mutagenesis are found. Lastly, the models highlight genomic regions with overrepresented mutations that dramatically exceed epigenome-derived expectations and show a pan-cancer convergence to genes and pathways involved in development and oncogenesis, indicating the potential of this approach for coding and non-coding driver discovery. The association of regional mutational processes with the epigenomes of primary cancers suggests that the landscape of passenger mutations is predominantly shaped by the epigenomes of cancer cells after oncogenic transformation.
Year: 2022 PMID: 35947558 PMCID: PMC9365152 DOI: 10.1371/journal.pcbi.1010393
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779
Fig 2. Chromatin accessibility profiles of primary cancers are stronger predictors of regional mutagenesis.
A. Random forest models informed by CA profiles of primary cancers are more accurate predictors of regional mutation burden than models informed by CA of normal tissues. Bar plot shows the relative change in prediction accuracy (Δ adjusted R²) of random forest regression models informed by CA profiles of primary cancers, compared to matching models informed by CA of normal tissues. Replication timing (RT) profiles are included in all models as reference. P-values of permutation tests and 95% confidence intervals from bootstrap analysis are shown. Accuracy values of models informed by cancer CA profiles are listed below the bars (adjusted R²). B. Examples of regional mutation burden predicted using models informed by CA profiles of cancer (top) vs. CA profiles of normal tissues (bottom). Scatterplots show model-predicted and observed mutation burden (X vs. Y-axis) in one-megabase regions. Prediction accuracy values are shown (bottom right).
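The accuracy metric in panel A can be illustrated with a minimal sketch of adjusted R² and a percentile bootstrap interval for the difference between two models. This is not the authors' pipeline: the function names, the number of predictors passed in, and the choice to resample genomic regions while keeping each model's predictions fixed are all assumptions made for illustration.

```python
import random


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(observed, predicted, n_features):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    n = len(observed)
    r2 = r_squared(observed, predicted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


def bootstrap_delta_ci(observed, pred_a, pred_b, n_features_a, n_features_b,
                       n_boot=1000, seed=0):
    """95% percentile bootstrap interval for adjusted R2 (A) minus adjusted R2 (B).

    Assumption: regions are resampled with replacement and the predictions
    of both models are held fixed (models are not refitted per resample).
    """
    rng = random.Random(seed)
    n = len(observed)
    deltas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        obs = [observed[i] for i in idx]
        if len(set(obs)) < 2:
            # Degenerate resample with zero variance: R2 is undefined, skip it.
            continue
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        deltas.append(adjusted_r_squared(obs, a, n_features_a)
                      - adjusted_r_squared(obs, b, n_features_b))
    if not deltas:
        raise ValueError("no usable bootstrap resamples")
    deltas.sort()
    m = len(deltas)
    lower = deltas[int(0.025 * m)]
    upper = deltas[min(m - 1, int(0.975 * m))]
    return lower, upper
```

Here `observed` would hold the mutation counts of one-megabase regions, and `pred_a` and `pred_b` the predictions of the cancer-CA and normal-CA models for the same regions. An interval that excludes zero would suggest a consistent accuracy difference under these assumptions.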