Jessada Thutkawkorapin1, Jesper Eisfeldt1,2, Emma Tham1,2, Daniel Nilsson3,4.
Abstract
BACKGROUND: DNA damage accumulates over the course of cancer development. The often substantial number of somatic mutations in cancer poses a challenge to traditional methods that characterize tumors based on driver mutations. However, advances in machine learning can take advantage of this large volume of data.
Keywords: Cancer processes; Human cancer; Mutational signatures; Unsupervised learning
Year: 2020 PMID: 32245405 PMCID: PMC7118897 DOI: 10.1186/s12859-020-3451-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 pyCancerSig workflow diagram. The workflow consists of 4 steps.
1. Data preprocessing
 - The purpose of this step is to generate a list of variants. This step has to be performed by third-party software.
 - Single nucleotide variant (SNV): MuTect2 is recommended; otherwise Muse, VarScan2, or SomaticSniper.
 - Structural variant (SV): depends on FindSV.
 - Microsatellite instability (MSI): depends on MSIsensor.
2. Profiling (feature extraction) - `cancersig profile`
 - The purpose of this step is to turn the information generated in the first step into matrix features usable by the model in the next step. The output of this stage has a format similar to https://cancer.sanger.ac.uk/cancergenome/assets/signatures_probabilities.txt, which consists of at least 3 columns.
 - Column 1: Variant type (Substitution Type in COSMIC).
 - Column 2: Variant subgroup (Trinucleotide in COSMIC).
 - Column 3: Feature ID (Somatic Mutation Type in COSMIC).
 - From column 4 onward, each column represents one sample.
 - There is a subcommand for each type of genetic variation.
 - `cancersig feature snv` extracts single nucleotide variant features.
 - `cancersig feature sv` extracts structural variant features.
 - `cancersig feature msi` extracts microsatellite instability features.
 - `cancersig feature merge` merges all feature profiles into a single profile ready to be used by the next step.
3. Deciphering mutational signatures - `cancersig signature decipher`
 - The purpose of this step is to use an unsupervised learning model to find mutational signature components in the tumors.
4. Visualizing profiles - `cancersig signature visualize`
 - The purpose of this step is to visualize the mutational signature components for each tumor.
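The profile layout described in the Fig. 1 caption (three descriptive columns followed by one column per sample) and the idea of merging per-type profiles can be illustrated with a minimal Python sketch. This is not pyCancerSig's implementation: the function names, the sample names, the SV feature labels, the tab-separated input, and the choice to fill a missing sample with 0.0 are all assumptions made for illustration only.

```python
import csv
import io

# Assumed header names for the three descriptive columns (from the caption).
META_COLUMNS = ["Variant type", "Variant subgroup", "Feature ID"]


def read_profile(text):
    """Parse a tab-separated profile into (sample_names, rows).

    Each row is a pair: (3-field metadata tuple, {sample: value}).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    samples = header[3:]  # column 4 onward: one column per sample
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        meta = tuple(fields[:3])
        values = dict(zip(samples, (float(v) for v in fields[3:])))
        rows.append((meta, values))
    return samples, rows


def merge_profiles(profile_texts):
    """Stack feature rows from several profiles into one table.

    Samples absent from a profile get 0.0 (an assumption of this sketch).
    """
    all_samples = []
    merged_rows = []
    for text in profile_texts:
        samples, rows = read_profile(text)
        for sample in samples:
            if sample not in all_samples:
                all_samples.append(sample)
        merged_rows.extend(rows)
    table = [META_COLUMNS + all_samples]
    for meta, values in merged_rows:
        table.append(list(meta) + [values.get(s, 0.0) for s in all_samples])
    return table


# Hypothetical toy inputs: two SNV features and one SV feature.
snv_text = (
    "Variant type\tVariant subgroup\tFeature ID\tsample_A\tsample_B\n"
    "C>A\tACA\tA[C>A]A\t12\t3\n"
    "C>T\tTCG\tT[C>T]G\t40\t7\n"
)
sv_text = (
    "Variant type\tVariant subgroup\tFeature ID\tsample_A\n"
    "DEL\tsmall\tDEL_small\t5\n"
)

merged = merge_profiles([snv_text, sv_text])
for row in merged:
    print("\t".join(str(cell) for cell in row))
```

In this sketch each feature keeps its own row and the samples stay in columns, which mirrors the column layout the caption describes. How the actual merge step handles samples missing from one profile is not stated in the source.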
Fig. 2 Example of a visualized tumor profile
Fig. 3 Boxplot of the tumor mutation burden of the 4 sub-populations evaluated in this article. The y-axis represents tumor mutation burden (TMB), which is quantified as the total number of base substitutions in the SNV profile divided by 30 to give an estimate of SNVs per Mbp of sequence. The box spans the lower to the upper quartile. The yellow line represents the median
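The TMB calculation described in the Fig. 3 caption amounts to one division per sample. The sketch below is a hedged illustration in Python: the function name, the sample names, and the counts are hypothetical, and reading the divisor 30 as the approximate number of megabases covered is an assumption rather than something the caption states.

```python
import statistics


def tumor_mutation_burden(substitution_counts, covered_mbp=30.0):
    """Total base substitutions divided by the covered sequence in Mbp.

    The default of 30 follows the caption; interpreting it as the
    covered size in megabases is an assumption of this sketch.
    """
    return sum(substitution_counts) / covered_mbp


# Hypothetical per-feature substitution counts for three samples.
profiles = {
    "sample_A": [120, 45, 135],  # 300 substitutions in total
    "sample_B": [20, 10, 30],    # 60 substitutions in total
    "sample_C": [50, 40, 30],    # 120 substitutions in total
}

tmb = {name: tumor_mutation_burden(counts) for name, counts in profiles.items()}
median_tmb = statistics.median(tmb.values())

print(tmb)
print(median_tmb)
```

Computing the median over a group of samples gives the value that the caption's yellow line would mark for that group.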