Yu Amanda Guo, Mei Mei Chang, Anders Jacobsen Skanderup.
Abstract
Recurrence and clustering of somatic mutations (hotspots) in cancer genomes may indicate positive selection and involvement in tumorigenesis. MutSpot performs genome-wide inference of mutation hotspots in non-coding and regulatory DNA of cancer genomes. MutSpot performs feature selection across hundreds of epigenetic and sequence features followed by estimation of position- and patient-specific background somatic mutation probabilities. MutSpot is user-friendly, works on a standard workstation, and scales to thousands of cancer genomes.
Keywords: Cancer genomics; Genome informatics
Year: 2020 PMID: 32550006 PMCID: PMC7275039 DOI: 10.1038/s41525-020-0133-4
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 8.617
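The abstract describes position- and patient-specific background mutation probabilities but does not spell out how they are turned into a hotspot score. As a minimal sketch, assuming that mutations in different patients are independent and that patient i has background probability p_i of being mutated at a given position, the number of mutated patients follows a Poisson binomial distribution, and an upper-tail p-value can be computed exactly by dynamic programming. The function names `poisson_binomial_pmf` and `hotspot_p_value` are hypothetical and are not part of MutSpot.

```python
from typing import List, Sequence


def poisson_binomial_pmf(probs: Sequence[float]) -> List[float]:
    """Exact PMF of the number of successes among independent Bernoulli trials.

    probs[i] is the assumed background probability that patient i carries a
    mutation at the position (or region) being tested.
    """
    pmf = [1.0]  # with zero patients, P(K = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this patient is not mutated
            nxt[k + 1] += mass * p       # this patient is mutated
        pmf = nxt
    return pmf


def hotspot_p_value(probs: Sequence[float], observed: int) -> float:
    """Upper-tail probability P(K >= observed) under the background model."""
    if observed < 0 or observed > len(probs):
        raise ValueError("observed must be between 0 and the number of patients")
    pmf = poisson_binomial_pmf(probs)
    return min(1.0, sum(pmf[observed:]))
```

For example, `hotspot_p_value([0.5, 0.5], 1)` evaluates to 0.75. In a real cohort the p_i would be far smaller and would come from the fitted background model, and the resulting p-values would still need multiple-testing correction before applying an FDR threshold.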
Fig. 1 MutSpot analysis on 168 gastric cancer whole genomes.
(a) MutSpot analysis workflow. (b, c) For each analysis, MutSpot outputs three types of descriptive figures: a Manhattan plot, a feature importance plot of features evaluated by the background mutation model, and lollipop plots of the top hotspots. Figures produced by MutSpot from (b) a genome-wide analysis and (c) a CBS-specific analysis of 168 gastric cancer whole genomes. Hotspots with FDR < 0.05 are labeled in magenta. (d, e) Comparison of the number of hotspots detected using MutSpot with the number of hotspots detected using other statistical approaches in (d) the genome-wide and (e) CBS-specific analyses.
Details of sequence, epigenetic and structural features that can be included in the MutSpot model.
| Feature | Feature detail | Rationale | Source |
|---|---|---|---|
| Sequence context (SNVs) | Identity of mutated base (A/T or C/G). Trinucleotide and penta-nucleotide contexts centered at the mutated base, and 1 bp and 2 bp left and right flanks of the mutated base. | Sequence context is a major covariate of mutation probability. Although previous studies typically considered trinucleotide contexts, mutation rates could be affected by wider sequence contexts[ | Computed from mutation data |
| Sequence context (indels) | Presence of poly-A/T or poly-C/G sequences longer than 5 bp at the indel site. | Long mononucleotide repeats could lead to artifacts in indel calling. | Computed from mutation data |
| TF-binding profiles | ChIP-Seq peak profiles of 132 TFs and 1 meta profile including peaks of all TFs from ENCODE cell lines. | TF-binding sites have elevated mutation rates in certain cancers due to impaired nucleotide excision repair. | Zerbino et al. [ |
| Replication timing | Mean replication timing profile of 13 ENCODE cell lines. | Replication timing is inversely correlated with mutation probability. | Hansen et al. [ |
| APOBEC editing sites | Predicted APOBEC editing sites. | Elevated mutation rates at APOBEC editing sites could lead to the formation of passenger hotspots. | Buisson et al.[ |
| Local mutation rate | Mutation rate of 100 kb nonoverlapping genomic bins. | To correct for additional unexplained regional variation in mutation rates. | Computed from mutation data |
| Individual mutation count | Mutation burden of individual tumors. | To account for intertumor heterogeneity. | Computed from mutation data |
| Tissue-specific epigenetic profile | Chromatin accessibility and modification profiles from matched tissue/cell type. | Epigenetic profiles from the cell of origin better predict the mutational landscape of tumors[ | Supplied by the user |
| COSMIC mutation signatures | Proportion of mutations contributed by a specific mutation signature for each tumor. | To further correct for specific mutational processes in the tumor cohort. | Supplied by the user |
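
Several of the features in the table marked "Computed from mutation data" can be illustrated with a short sketch. The code below is not MutSpot's implementation. It assumes 0-based positions, an uppercase reference sequence held in a plain string, and a local rate expressed as mutations per base pair per bin. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def sequence_context(seq: str, pos: int, flank: int = 1) -> str:
    """Return the k-mer centered at pos (flank=1: trinucleotide, flank=2: penta-nucleotide)."""
    if pos - flank < 0 or pos + flank >= len(seq):
        raise ValueError("context window runs off the end of the sequence")
    return seq[pos - flank:pos + flank + 1]


def base_class(base: str) -> str:
    """Identity of the mutated base, collapsed to A/T or C/G."""
    return "A/T" if base.upper() in "AT" else "C/G"


def has_long_homopolymer(window: str, min_len: int = 6) -> bool:
    """True if the window contains a mononucleotide run of at least min_len bp.

    min_len=6 corresponds to runs "longer than 5 bp"; how wide a window to
    take around the indel site is an assumption left to the caller.
    """
    pattern = r"([ACGT])\1{%d,}" % (min_len - 1)
    return re.search(pattern, window.upper()) is not None


def local_mutation_rate(positions: Iterable[int], bin_size: int = 100_000) -> Dict[int, float]:
    """Mutations per bp in non-overlapping bins, keyed by bin index (pos // bin_size)."""
    counts = Counter(p // bin_size for p in positions)
    return {b: c / bin_size for b, c in counts.items()}
```

As a usage example, `sequence_context("ACGTACGT", 3, flank=2)` returns `"CGTAC"`, and `local_mutation_rate([10, 99_999, 100_000])` assigns two mutations to bin 0 and one to bin 1.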