Hongchen Ji1,2, Junjie Li3, Qiong Zhang1, Jingyue Yang1, Juanli Duan4, Xiaowen Wang1, Ben Ma2, Zhuochao Zhang2, Wei Pan1, Hongmei Zhang5.
Abstract
BACKGROUND: Mutation processes leave different signatures in genes. For single-base substitutions, previous studies have suggested that mutation signatures are reflected not only in the mutated bases but also in neighboring bases. However, because methods for identifying features of long sequences flanking the mutated bases are lacking, understanding of how flanking sequences influence mutation signatures is limited.
Keywords: Cancer; Clinical feature; Mutation sequence; Prognosis; Unsupervised learning
Year: 2021; PMID: 34930241; PMCID: PMC8686331; DOI: 10.1186/s12920-021-01144-1
Source DB: PubMed; Journal: BMC Med Genomics; ISSN: 1755-8794; Impact factor: 3.063
Fig. 1 Training process of the LSTM-SOM model. a Schematic of LSTM-SOM. b The clustering process. Two classifications were used for each training period. Ten classes of mutant sequences were obtained after 3 rounds and an extra round of training. Three of the eight dimensions of the LSTM output vectors are shown in a three-dimensional Cartesian coordinate system
Fig. 2 Mutation type and composition of flanking bases in different MBs. Each bar except for “Reference Allele” and “Mutation Allele” represents one flanking genetic locus. Bars on the left of “Reference Allele” represent bases on the 5’ end of the mutation site, and bars on the right of “Mutation Allele” represent bases on the 3’ end of the mutation site
Fig. 3 Quantity and proportion of MBs in different cancers. a MBs in cancers of multiple organ origin. b MBs in cancer of different pathologic types. The left subgraph shows the proportion of different MBs in all SBS mutation data points from different kinds of cancers. The right subgraph shows the quantity and proportion of different MBs in patients. Differences in quantity are reflected in the size of the point, and differences in proportion are reflected in the color of the point
Fig. 4 Relationship between patient survival and MB in genes with high mutation frequencies. The top 4 most frequently mutated genes are shown (other genes with high mutation frequencies are shown in Additional file 1: Fig. S5). For each gene, the left subgraph shows the p value of the log-rank test between groups in the whole population, and the right subgraph shows the p value of the log-rank test between groups of patients with different cancers with high incidence. Only p values less than 0.05 are shown in the heat map
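The log-rank comparison named in this caption can be illustrated with a minimal two-group sketch. This is not the authors' code: the function name `logrank_test`, the toy survival times, and the choice of a pure standard-library implementation are assumptions made for illustration only.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up times
    events_* : 1 if the event (death) was observed, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t, per group.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for n == 1
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical toy cohorts: group A dies early, group B dies late.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5,
                       [10, 11, 12, 13, 14], [1] * 5)
print(f"chi2={chi2:.2f}, p={p:.4f}")
```

In a heat map like the one described, such a test would be run once per gene and per cancer type, and only cells with p < 0.05 would be drawn.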
Fig. 5 MBs in patients with different clinical features. *: p < 0.05 in the t test or ANOVA between groups; **: p < 0.005 in the t test or ANOVA between groups. The proportion is shown as the mean ± standard deviation, and error bars represent standard deviation
Fig. 6 Differences in survival and clinical features between patients clustered according to MB composition. a Characteristics of MB composition in patients of 7 classes clustered by the K-means method; each line represents one patient. b Survival curve of each class of patients. c Log-rank test between classes; differences in the p value are reflected in color. d, e Clinical features of patients in different classes (*: p < 0.05 in ANOVA or the chi-square test; **: p < 0.005 in ANOVA or the chi-square test; error bars represent standard deviation)
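As a rough illustration of clustering patients by MB composition with K-means, the sketch below converts per-patient MB counts into proportions and runs Lloyd's algorithm on them. Everything here is an assumption for illustration: the helper names, the toy counts, the use of two clusters instead of seven, and the deterministic farthest-first initialisation (the record does not state how the clustering was initialised or implemented).

```python
def mb_proportions(counts):
    """Convert one patient's per-MB mutation counts into proportions."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic initialisation (an assumption, not from the source):
    start at the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each patient goes to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Hypothetical counts of mutations in each of ten MBs for six patients.
toy_counts = [
    [30, 5, 2, 1, 1, 0, 0, 1, 0, 0],
    [28, 6, 3, 1, 0, 1, 0, 0, 1, 0],
    [45, 8, 3, 2, 1, 0, 1, 0, 0, 0],
    [1, 0, 0, 2, 1, 3, 25, 6, 1, 1],
    [0, 1, 1, 1, 0, 2, 40, 12, 2, 1],
    [2, 0, 1, 0, 1, 4, 22, 8, 1, 1],
]
props = [mb_proportions(c) for c in toy_counts]
labels, centroids = kmeans(props, k=2)
print(labels)
```

Working on proportions rather than raw counts keeps patients with many mutations from dominating the distance calculation; each centroid can then be read as the typical MB composition of its class.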