Itay Sason¹, Damian Wojtowicz², Welles Robinson³, Mark D M Leiserson³, Teresa M Przytycka², Roded Sharan⁴.
Abstract
The characterization of mutational processes in terms of their signatures of activity relies mostly on the assumption that mutations in a given cancer genome are independent of one another. Recently, it was discovered that certain segments of mutations, termed processive groups, occur on the same DNA strand and are generated by a single process or signature. Here we provide the first probabilistic model of mutational signatures that accounts for their observed stickiness and strand coordination. The model conditions on the observed strand of each mutation and allows the same signature to generate a run of mutations. It can either use known signatures or learn new ones. We show that this model describes the properties of mutagenic processes more accurately than the independent-mutation model, achieving substantially higher likelihood on held-out data. We apply this model to characterize the processivity of mutagenic processes across multiple types of cancer.
Keywords: Bioinformatics; Cancer; Quantitative Genetics
Year: 2020 PMID: 32088392 PMCID: PMC7038582 DOI: 10.1016/j.isci.2020.100900
Source DB: PubMed Journal: iScience ISSN: 2589-0042
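As a rough illustration of the generative idea in the abstract (one signature may generate a run of mutations, conditioned on observed data), the following is a minimal sampling sketch, not the authors' implementation. It assumes a hypothetical parameterization: per-sample exposures `pi`, per-signature emission distributions `emissions`, per-signature stickiness `alpha`, and precomputed stickiness-opportunity flags `opportunity` taken from the observed data (for example, whether a mutation lies on the same pyrimidine strand as the previous one). The function name `sample_sticky_mixture` is likewise hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_sticky_mixture(
    pi: Sequence[float],                   # hypothetical: signature exposures of one sample (sum to 1)
    emissions: Sequence[Sequence[float]],  # hypothetical: emissions[k][c] = P(category c | signature k)
    alpha: Sequence[float],                # hypothetical: alpha[k] = stickiness of signature k
    opportunity: Sequence[bool],           # observed: may mutation i stick to mutation i-1?
    seed: int = 0,
) -> List[Tuple[int, int, bool]]:
    """Return (signature, category, sticky) for each mutation, in genomic order."""
    rng = random.Random(seed)
    signatures = range(len(pi))
    categories = range(len(emissions[0]))
    out: List[Tuple[int, int, bool]] = []
    prev: Optional[int] = None
    for can_stick in opportunity:
        # Reuse the previous signature only when a stickiness opportunity exists.
        sticky = bool(prev is not None and can_stick and rng.random() < alpha[prev])
        # Otherwise draw a fresh signature from the sample's exposures.
        sig = prev if sticky else rng.choices(signatures, weights=pi)[0]
        cat = rng.choices(categories, weights=emissions[sig])[0]
        out.append((sig, cat, sticky))
        prev = sig
    return out


mutations = sample_sticky_mixture(
    pi=[0.5, 0.5], emissions=[[0.9, 0.1], [0.1, 0.9]],
    alpha=[0.8, 0.2], opportunity=[True] * 20,
)
```

Setting every `alpha` to 0, or every opportunity flag to `False`, makes each signature an independent draw from `pi`, which is the independent-mutation setting.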
Figure 1. Definitions and Conventions
The figure shows normal DNA, mutated DNA, the representation and characteristics of DNA mutations, and the types of stickiness used in the StickySig model variants.
(A) The genome consists of the reference strand (the strand whose 5′-end is on the short arm of the chromosome), also known as the Watson strand or the plus strand, and the complementary strand, also known as the Crick strand or the minus strand.
(B) In the mutated DNA, changes in DNA base pair sequence are shown in red.
(C) A single base-pair substitution (SBS) can be represented as a change in which one base pair in normal DNA is replaced by another in mutated DNA. The reference allele is the nucleotide found on the reference strand of normal genomic DNA. The pyrimidine strand is the strand (+ or –) that carries the pyrimidine base (C or T) of the mutated base pair in normal DNA. Because it is usually not known which base in a pair was the source of a mutation, the convention is to annotate mutations from the pyrimidine base of the mutated base pair. This yields 6 substitution types (C>A, C>G, C>T, T>A, T>C, and T>G) when context is not considered, or 96 combinations of substitution type and the 5′ and 3′ neighboring bases (6 × 4 × 4).
(D) The StickySig model can use several types of stickiness opportunity: all, in which every mutation can be sticky; same strand, in which a mutation can be sticky if its pyrimidine base in normal DNA lies on the same strand as that of the previous mutation; same allele, in which a mutation can be sticky if it has the same reference allele as the previous mutation; same substitution, in which a mutation can be sticky if it has exactly the same base-pair substitution as the previous mutation; same mutation, in which a mutation can be sticky if it has the same mutation features (96-category type and pyrimidine strand) as the previous mutation; and none, in which no stickiness is allowed, which reduces StickySig to the MMM model. Other types of stickiness can also be considered.
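The annotation convention in (C) and the opportunity types in (D) can be sketched as follows. This is an illustrative sketch, not the StickySig code: the `Mutation` record, `annotate`, `all_categories`, and `OPPORTUNITIES` are hypothetical names, and reading "same substitution" as identical reference and mutated alleles on the + strand is an assumption.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


class Mutation(NamedTuple):
    ref: str    # reference allele, read on the + (reference) strand
    alt: str    # mutated allele, read on the + strand
    left: str   # 5' neighboring base on the + strand
    right: str  # 3' neighboring base on the + strand


def annotate(m: Mutation) -> Tuple[str, str]:
    """Return (96-category label, pyrimidine strand) following the pyrimidine convention."""
    if m.ref in ("C", "T"):
        return f"{m.left}[{m.ref}>{m.alt}]{m.right}", "+"
    # Purine on the + strand: read the mutation from the pyrimidine on the - strand.
    # Complementing reverses the 5'/3' orientation, so the flanks swap sides.
    ref, alt = COMPLEMENT[m.ref], COMPLEMENT[m.alt]
    left, right = COMPLEMENT[m.right], COMPLEMENT[m.left]
    return f"{left}[{ref}>{alt}]{right}", "-"


def all_categories() -> List[str]:
    """The 96 labels: 6 pyrimidine substitution types x 4 5' bases x 4 3' bases."""
    subs = [("C", a) for a in "AGT"] + [("T", a) for a in "ACG"]
    return [f"{five}[{r}>{a}]{three}" for r, a in subs for five in "ACGT" for three in "ACGT"]


# Stickiness opportunity: may `cur` be generated by the same signature as `prev`?
OPPORTUNITIES: Dict[str, Callable[[Mutation, Mutation], bool]] = {
    "all": lambda prev, cur: True,
    "same-strand": lambda prev, cur: annotate(prev)[1] == annotate(cur)[1],
    "same-allele": lambda prev, cur: prev.ref == cur.ref,
    # Assumption: "same substitution" = identical ref and alt alleles on the + strand.
    "same-substitution": lambda prev, cur: (prev.ref, prev.alt) == (cur.ref, cur.alt),
    "same-mutation": lambda prev, cur: annotate(prev) == annotate(cur),
    "none": lambda prev, cur: False,  # no stickiness at all: the MMM special case
}
```

For example, a G>A change in the + strand context 5′-AGT-3′ is annotated as A[C>T]T on the − strand, so it does not share a pyrimidine strand with a C>T change in the + strand context 5′-ACT-3′, even though both receive the same 96-category label.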
Datasets Analyzed in This Study: Breast Cancer (BRCA), Malignant Lymphoma (MALY), and Chronic Lymphocytic Leukemia (CLLE)
| Cancer Type | #Samples | #Mutations | COSMIC Signatures |
|---|---|---|---|
| BRCA | 560 | 3,479,652 | 1, 2, 3, 5, 6, 8, 13, 17, 18, 20, 26, 30 |
| MALY | 100 | 1,220,526 | 1, 2, 5, 9, 13, 17 |
| CLLE | 100 | 270,870 | 1, 2, 5, 9, 13 |
Performance Evaluation of MMM and StickySig Variants in a Refitting Setting Using the Leave-One-Chromosome-Out (LOCO) Method
| Model | LOCO log likelihood, BRCA | LOCO log likelihood, MALY | LOCO log likelihood, CLLE |
|---|---|---|---|
| MMM | −13743198 | −5235042 | −1178024 |
| StickySig | −13739451 | −5232119 | −1177527 |
| StickySig-same-strand | −13736711 | −5233842 | −1177905 |
| StickySig-same-allele | −13696283 | | −1173271 |
| StickySig-same-substitution | | −5206208 | |
| StickySig-same-mutation | −13683356 | −5227289 | −1176916 |
In bold are the best values for each dataset.
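The tables compare total log likelihoods on held-out data. A minimal sketch of how one sample's log likelihood could be computed under a sticky mixture is shown below. It assumes, hypothetically, that a mutation with a stickiness opportunity keeps the previous mutation's signature k with probability `alpha[k]` and otherwise draws a signature from the exposures `pi`; summing out the hidden signatures is then a forward pass over the mutations in genomic order. The function names and this parameterization are illustrative assumptions, not the published model definition.

```python
import math
from typing import Sequence


def sticky_log_likelihood(
    categories: Sequence[int],             # observed mutation categories (e.g. 0..95), genomic order
    opportunity: Sequence[bool],           # opportunity[i]: may mutation i stick to mutation i-1?
    pi: Sequence[float],                   # hypothetical: signature exposures of this sample
    emissions: Sequence[Sequence[float]],  # hypothetical: emissions[k][c] = P(c | signature k)
    alpha: Sequence[float],                # hypothetical: per-signature stickiness
) -> float:
    """log P(categories), summing out hidden signatures with a scaled forward pass."""
    K = len(pi)
    log_lik = 0.0
    f = [0.0] * K  # f[k] = P(current signature = k | observations so far)
    for i, c in enumerate(categories):
        if i == 0 or not opportunity[i]:
            prior = list(pi)  # no stickiness possible: fresh draw from the exposures
        else:
            # Stick to the previous signature j (prob. alpha[j]) or redraw from pi.
            redraw = sum(f[k] * (1.0 - alpha[k]) for k in range(K))
            prior = [f[j] * alpha[j] + redraw * pi[j] for j in range(K)]
        joint = [prior[j] * emissions[j][c] for j in range(K)]
        norm = sum(joint)  # P(category i | categories before i)
        log_lik += math.log(norm)
        f = [x / norm for x in joint]
    return log_lik


def mmm_log_likelihood(categories: Sequence[int], pi: Sequence[float],
                       emissions: Sequence[Sequence[float]]) -> float:
    """Independent-mutation mixture: every mutation draws its signature afresh."""
    return sum(math.log(sum(p * e[c] for p, e in zip(pi, emissions))) for c in categories)
```

With every `alpha` equal to 0, or with no opportunities, the forward pass reduces to `mmm_log_likelihood`, mirroring the "none" option in Figure 1D.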
Performance Evaluation of MMM and StickySig Variants in a De Novo Setting Using 10-Fold Sample Cross-Validation (CV)
| Model | 10-fold CV log likelihood, BRCA | 10-fold CV log likelihood, MALY | 10-fold CV log likelihood, CLLE |
|---|---|---|---|
| MMM | −13713230 | −5200582 | −1167727 |
| StickySig | −13708734 | −5187470 | −1167238 |
| StickySig-same-strand | −13702999 | −5202680 | −1167628 |
| StickySig-same-allele | −13231739 | | |
| StickySig-same-substitution | | −5020921 | −1128093 |
| StickySig-same-mutation | −13597846 | −5187859 | −1165856 |
In bold are the best values for each dataset.
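Because the values are log likelihoods, a difference Δ between two models corresponds to a likelihood ratio of e^Δ (assuming natural logarithms). The short script below transcribes both tables, placing each value in the dataset column its magnitude indicates and marking cells absent from this copy as `None`, then computes each model's gain over MMM and the best model among the values present. It only reuses the reported numbers; the variable and function names are illustrative.

```python
from typing import Dict, List, Optional

MODELS = [
    "MMM", "StickySig", "StickySig-same-strand", "StickySig-same-allele",
    "StickySig-same-substitution", "StickySig-same-mutation",
]

# Log likelihoods transcribed from the two tables; None = cell absent from this copy.
LOCO: Dict[str, List[Optional[int]]] = {
    "BRCA": [-13743198, -13739451, -13736711, -13696283, None, -13683356],
    "MALY": [-5235042, -5232119, -5233842, None, -5206208, -5227289],
    "CLLE": [-1178024, -1177527, -1177905, -1173271, None, -1176916],
}
CV: Dict[str, List[Optional[int]]] = {
    "BRCA": [-13713230, -13708734, -13702999, -13231739, None, -13597846],
    "MALY": [-5200582, -5187470, -5202680, None, -5020921, -5187859],
    "CLLE": [-1167727, -1167238, -1167628, None, -1128093, -1165856],
}


def gains_over_mmm(table: Dict[str, List[Optional[int]]]) -> Dict[str, Dict[str, int]]:
    """Gain of every available model over MMM (first entry); positive = better fit."""
    return {
        dataset: {m: v - values[0] for m, v in zip(MODELS, values) if v is not None}
        for dataset, values in table.items()
    }


def best_available(table: Dict[str, List[Optional[int]]]) -> Dict[str, str]:
    """Model with the highest log likelihood per dataset, among the cells present."""
    return {
        dataset: max((v, m) for m, v in zip(MODELS, values) if v is not None)[1]
        for dataset, values in table.items()
    }


for name, table in (("LOCO", LOCO), ("10-fold CV", CV)):
    print(name, best_available(table))
```

For instance, in the BRCA cross-validation column, StickySig-same-allele exceeds MMM by 481,491 log-likelihood units.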
Figure 2. Signature Learning
Performance of signature learning by StickySig-same-allele (A) and StickySig-same-substitution (B) on the three cancer datasets. For each dataset, the panels show the cosine similarity of each learned signature to its matched COSMIC signature, sorted from highest to lowest; signatures are paired by a maximum matching so that no COSMIC signature is matched more than once.
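The matching step pairs each learned signature with a distinct COSMIC signature. A small sketch of a maximum-weight one-to-one matching on cosine similarity is shown below. It uses exhaustive search, which is only practical for a handful of signatures, and stands in for whatever solver the authors used; for realistic sizes, a Hungarian-algorithm solver such as SciPy's `scipy.optimize.linear_sum_assignment` (with `maximize=True`) solves the same problem. The function names are hypothetical.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def match_signatures(learned: Sequence[Sequence[float]],
                     reference: Sequence[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Pair each learned signature with a distinct reference signature, maximizing total similarity.

    Exhaustive search over assignments (illustration only). Returns
    (learned index, reference index, similarity), highest similarity first.
    """
    if len(learned) > len(reference):
        raise ValueError("need at least as many reference signatures as learned ones")
    sims = [[cosine(l_sig, r_sig) for r_sig in reference] for l_sig in learned]
    best_total, best_perm = -math.inf, None
    # Each permutation assigns a distinct reference column to every learned row.
    for perm in permutations(range(len(reference)), len(learned)):
        total = sum(sims[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    pairs = [(i, j, sims[i][j]) for i, j in enumerate(best_perm)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Unlike greedy pairing, the maximum matching may give up the single most similar pair to raise the total: if learned signature A has similarity 0.70 to X and 0.65 to Y while B has 0.68 to X and 0.05 to Y, greedy pairing takes A→X and leaves B→Y (total 0.75), whereas the maximum matching picks A→Y and B→X (total 1.33).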
Figure 3. Stickiness in BRCA
(A) Relationship between processive group lengths (columns) and mutational signatures (rows) as modeled by StickySig-same-allele. The size of each circle represents the number of groups (log10 scale) observed for the given group length and signature. The color of each circle corresponds to the p value (−log10 scale) of detecting a processive group of that length in randomized data.
(B) The number of processive groups longer than 10 mutations for all signatures, as modeled by MMM (gray), StickySig (blue), and StickySig-same-allele (red).
(C) The total number of mutations that can be sticky (stickiness opportunities) in StickySig (blue) and StickySig-same-allele (red).
(D) The total number of sticky mutations as modeled by StickySig (blue) and StickySig-same-allele (red).
(E) The number of sticky mutations for each signature as modeled by StickySig (blue) and StickySig-same-allele (red).
(F) Signature stickiness α as learned by StickySig (blue) and StickySig-same-allele (red). All bar plots show mean values over 10 random initializations of the StickySig models, with the standard error of the mean shown as small black bars.
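Panels (A), (B), and (E) summarize per-mutation signature assignments along the genome. A minimal sketch of such summaries follows. It assumes, hypothetically, that a processive group is a maximal run of consecutive mutations assigned the same signature and that the randomized data in (A) are obtained by shuffling the order of the assignments; the paper's exact randomization may differ. All function names are illustrative.

```python
import random
from collections import Counter
from itertools import groupby
from typing import Hashable, Sequence


def group_lengths(assignments: Sequence[Hashable]) -> Counter:
    """(signature, run length) -> number of maximal runs of that signature."""
    return Counter((sig, sum(1 for _ in run)) for sig, run in groupby(assignments))


def long_groups(assignments: Sequence[Hashable], min_len: int = 11) -> Counter:
    """Per-signature number of groups with at least `min_len` mutations (11 = 'longer than 10')."""
    out: Counter = Counter()
    for (sig, length), n in group_lengths(assignments).items():
        if length >= min_len:
            out[sig] += n
    return out


def sticky_per_signature(assignments: Sequence[Hashable], sticky: Sequence[bool]) -> Counter:
    """Number of sticky mutations attributed to each signature."""
    return Counter(sig for sig, is_sticky in zip(assignments, sticky) if is_sticky)


def empirical_p(assignments: Sequence[Hashable], sig: Hashable, length: int,
                n_shuffles: int = 1000, seed: int = 0) -> float:
    """Shuffle-based p value for at least as many `sig` groups of >= `length` mutations."""
    def count(seq: Sequence[Hashable]) -> int:
        return sum(n for (s, run_len), n in group_lengths(seq).items()
                   if s == sig and run_len >= length)

    observed = count(assignments)
    rng = random.Random(seed)
    shuffled = list(assignments)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys run structure, keeps per-signature totals
        if count(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)  # add-one estimate, never exactly 0
```

Shuffling preserves how many mutations each signature explains, so a small p value indicates that the observed runs are longer than the signature's overall activity alone would produce.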