Liqiang Wang1,2, Jianping Lu3, Ying Song1,2, Jing Bai1,2, Wenjing Sun1,2, Jingcui Yu1,4, Mengdi Cai1,2, Songbin Fu1,2.
Abstract
DNA repair mechanisms have been proven to be essential for cells, and abnormalities in DNA repair can cause various diseases, such as cancer. However, the diversity and complexity of DNA repair mechanisms obscure the functions of DNA repair in cancers. In addition, the relationships between DNA repair, the tumor mutational burden (TMB), and immune infiltration remain ambiguous. In the present study, we evaluated the prognostic value of various types of DNA repair mechanisms and found that double-strand break repair through single-strand annealing (SSA) and nonhomologous end-joining (NHEJ) were the most prognostic DNA repair processes in gastric cancer (GC) patients. Based on the activity of these two processes and on expression profiles, we constructed an HR-LR model, which could accurately divide patients into high-risk and low-risk groups with different probabilities of survival and recurrence. Similarly, we constructed a cancer-normal model to classify an individual as having GC or normal status. The prognostic value of the HR-LR model and the accuracy of the cancer-normal model were validated in several independent datasets. Notably, low-risk samples, which had higher SSA and NHEJ activities, had more somatic mutations and less immune infiltration. Furthermore, low-risk samples had higher methylation levels in CpG islands (CGIs) and lower methylation levels in open sea regions, as well as higher expression of programmed death-ligand 1 (PD-L1) and lower methylation in the promoter of the gene encoding PD-L1. Moreover, low-risk samples were characterized primarily by higher levels of CD4+ memory T cells, CD8+ naive T cells, and CD8+ TEM cells than high-risk samples. Finally, we proposed a decision tree and a nomogram to help predict the clinical outcome of an individual. These results provide an improved understanding of the complexity of DNA repair, the TMB, and immune infiltration in GC, and present an accurate prognostic model for use in GC patients.
Keywords: DNA methylation; DNA repair; gastric cancer; immune infiltration; prognostic model; tumor mutational burden
Year: 2022 PMID: 35656545 PMCID: PMC9152153 DOI: 10.3389/fcell.2022.897096
Source DB: PubMed Journal: Front Cell Dev Biol ISSN: 2296-634X
Patient cohorts from TCGA and GEO databases.
| Cohort | Cancer samples | Normal samples | Recurrence data | Platform (GPL) |
|---|---|---|---|---|
| TCGA | 375 | 32 | r | - |
| GSE62254 | 300 | - | r | GPL570 |
| GSE26253 | 432 | - | r | GPL8432 |
| GSE84437 | 433 | - | - | GPL6947 |
| GSE26899 | 96 | - | r | GPL6947 |
| GSE15460 | 248 | - | - | GPL570 |
| GSE13861 | 65 | 25 | r | GPL6884 |
| GSE13911 | 38 | 31 | - | GPL570 |
| GSE33335 | 25 | 25 | - | GPL5175 |
| GSE66229 | 300 (GSE62254) | 100 | - | GPL570 |
In total, we obtained 375 cancer and 32 normal samples from the TCGA database. After retaining samples with survival data and excluding patients who died within 10 days, 348 samples remained. We randomly selected 242 of these as the training set and used the remaining 106 as the test set.
“r” in the table marks datasets with recurrence information.
The GSE15460 dataset comprises the GSE15455, GSE15456, GSE15459, GSE15537, GSE22183, and GSE34942 datasets. After removing the cell-line datasets and the datasets profiled on GPL96, we obtained 248 samples from GSE15459 and GSE34942.
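The random division of the 348 filtered TCGA samples into 242 training and 106 test samples, described in the notes above, can be sketched as follows. This is a minimal illustration only: the sample IDs are hypothetical placeholders, and the fixed seed and the helper name `split_train_test` are assumptions, since the source does not say how the random selection was performed.

```python
import random


def split_train_test(sample_ids, n_train, seed=0):
    """Randomly partition sample IDs into a training list and a test list."""
    if not 0 <= n_train <= len(sample_ids):
        raise ValueError("n_train must be between 0 and the number of samples")
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    shuffled = list(sample_ids)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical placeholder IDs standing in for the 348 filtered TCGA samples.
sample_ids = [f"sample_{i:03d}" for i in range(348)]
train_ids, test_ids = split_train_test(sample_ids, n_train=242)
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the partition reproducible, which matters when downstream models are compared on the same split.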
Positive and negative genes used in the HR-LR model and Cancer-Normal model.
| Gene set | Gene symbols |
|---|---|
| Positive genes | BRIP1, CDC45, CDC7, CDCA2, CENPK, CLSPN, DDIAS, DLGAP5, DTL, E2F7, EZH2, FANCA, HELLS, HIST1H2AH, KIF11, KIF15, KIF18A, KIF23, KIF2C, KNTC1, LMNB1, MCM10, MND1, NCAPG, ORC1, PCNA, PLK4, POLE2, POLQ, POLR3G, RAD51AP1, RAD54L, RFC4, RRM2, TYMS, UHRF1, XRCC2 |
| Negative genes | ADCY5, APOD, C15orf59, C16orf89, C1QTNF2, C1QTNF7, CGNL1, CRYAB, DAAM2, DACT3, DCN, ELN, FAM110B, FMOD, GHR, GREM2, GSTM5, HSPA2, HSPB8, KCNK3, LRRN4CL, MFAP4, NDNF, NEGR1, NFATC4, PDE2A, PPP1R14A, PPP1R3C, SAMD11, SCN4B, SCUBE2, SLC22A17, SMARCD3, SRPX, TCEAL7, TMEM100, TMOD1, TNFAIP8L3, ZCCHC24 |
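This record does not state how the positive and negative genes are combined into a score. One simple possibility, shown here purely as an illustrative assumption and not as the authors' method, is the mean expression of the positive genes minus the mean expression of the negative genes. The function name `signed_gene_score` and the expression values are hypothetical.

```python
from statistics import mean

# Small subsets of the gene lists in the table above, kept short for brevity.
POSITIVE_GENES = ["PCNA", "RRM2", "TYMS"]
NEGATIVE_GENES = ["DCN", "ELN", "CRYAB"]


def signed_gene_score(expression, positive, negative):
    """Mean expression of positive genes minus mean expression of negative genes.

    Genes absent from `expression` are skipped. This scoring rule is an
    assumption made for illustration, not the published model.
    """
    pos = [expression[g] for g in positive if g in expression]
    neg = [expression[g] for g in negative if g in expression]
    if not pos or not neg:
        raise ValueError("need at least one measured gene from each set")
    return mean(pos) - mean(neg)


# Hypothetical normalized expression values for one sample.
sample = {"PCNA": 2.0, "RRM2": 1.0, "TYMS": 3.0,
          "DCN": 0.5, "ELN": 1.5, "CRYAB": 1.0}
score = signed_gene_score(sample, POSITIVE_GENES, NEGATIVE_GENES)
print(score)
```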
FIGURE 1. SSA and NHEJ DNA repair approaches are primary protective factors for overall survival in GC patients. (A) Prognostic value of the SSA and NHEJ DNA repair processes evaluated by a univariate Cox proportional-hazards regression model. (B,C) Samples with higher ssGSEA scores for SSA or NHEJ have better clinical outcomes in the TCGA cohort and the GSE62254 dataset. (D) ssGSEA scores of the SSA and NHEJ processes first increase and then decline across normal, good-outcome, and poor-outcome samples. (E) Samples with higher ssGSEA scores for SSA or NHEJ have more somatic mutations.
FIGURE 2. Using the HR-LR model to predict the survival risk of samples. (A) The top panel shows the expression profile of the SSA- and NHEJ-related marker genes in the TCGA training set. The middle panel shows the score assigned to each training sample by the HR-LR model. The bottom panel shows the Kaplan–Meier survival plot of high-risk and low-risk samples in the training set. (B) Expression profile, sample score, and Kaplan–Meier survival plot for the TCGA test samples. (C) Statistics of the 76 marker genes. (D) Marker genes related to cancer hallmarks. (E) Low-risk samples have significantly higher SSA and NHEJ ssGSEA scores in the training and test sets. (F) Low-risk samples have significantly more somatic mutations. (G) The low-risk group contains more alive, low-stage, and complete-response samples. (H) Correlation of the SSA-NHEJ score with complete response. Samples were divided into ten groups according to their scores. Samples with higher scores have a higher probability of complete response.
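The grouping in panel (H), where samples are divided into ten groups by score and the complete-response rate is compared across groups, can be sketched as below. Whether the original groups were equal-sized or equal-width is not stated, so equal-sized groups are assumed; the function name and toy data are hypothetical.

```python
def response_rate_by_score_group(scores, responded, n_groups=10):
    """Sort samples by score, cut them into equal-sized groups (an assumption),
    and return the fraction of responders in each group, lowest scores first."""
    if len(scores) != len(responded):
        raise ValueError("scores and responded must have the same length")
    if len(scores) < n_groups:
        raise ValueError("need at least one sample per group")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rates = []
    for g in range(n_groups):
        start = g * len(order) // n_groups
        stop = (g + 1) * len(order) // n_groups
        members = order[start:stop]
        rates.append(sum(responded[i] for i in members) / len(members))
    return rates


# Toy data: 20 samples; only the 10 highest-scoring samples respond.
toy_scores = [float(i) for i in range(20)]
toy_responded = [i >= 10 for i in range(20)]
rates = response_rate_by_score_group(toy_scores, toy_responded)
print(rates)
```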
FIGURE 3. The prognostic effect of the HR-LR model. (A) Multivariate Cox proportional-hazards regression result for the SSA-NHEJ score and other clinical characteristics. (B) Validation of the prognostic effect of the SSA-NHEJ score in six GEO cohorts. n represents the number of samples in each GEO dataset.
FIGURE 4. The predictive ability of the SSA-NHEJ score for recurrence in GC patients. (A) New Tumor Event probability of patients in the TCGA cohort. (B) Multivariate Cox proportional-hazards regression result for recurrence prognosis of the SSA-NHEJ score and other clinical characteristics. (C–F) Recurrence probability of patients in four GEO cohorts.
FIGURE 5. Mutation information for high-risk and low-risk samples. (A,B) Statistics of variant classification and mutation type for high-risk (top panel) and low-risk (bottom panel) samples. (C) Driver mutation genes in high-risk and low-risk samples (top and bottom panels, respectively). (D) Mutational signatures identified in high-risk and low-risk samples (left and right panels, respectively). The plot title indicates the best match against validated COSMIC signatures.
FIGURE 6. Immune-related scores by “estimate”. (A–C) Correlation between the SSA-NHEJ score and StromalScore (A), ImmuneScore (B), and ESTIMATEScore (C) in TCGA samples. R and P were calculated by the Pearson test. (D–F) Low-risk samples have lower StromalScore (D), ImmuneScore (E), and ESTIMATEScore (F) than high-risk samples. p values were calculated by the Wilcoxon test.
FIGURE 7. Immune-related scores by “xCell”. (A) Infiltration of immune and stromal cell types and the immune-related scores in high-risk and low-risk samples. p values less than 0.05, 0.01, and 0.001 are marked with “*”, “**”, and “***”. (B) Heatmap of the infiltration degree of the immune and stromal cell types whose infiltration differs significantly between high- and low-risk samples. Cell types marked in orange have higher infiltration in high-risk samples, while cell types marked in green have higher infiltration in low-risk samples. Samples are sorted by SSA-NHEJ score. (C) Genome-wide hypermethylation in CGI regions and hypomethylation in open sea regions. (D) Promoter methylation and expression of PD-L1 in high-risk and low-risk samples. p values were evaluated by the Wilcoxon test.
FIGURE 8. Constructing the Cancer-Normal model and predicting the status of samples. (A) SSA-NHEJ scores of normal, good-outcome, and poor-outcome samples. p values were evaluated by the Wilcoxon test. (B) ROC curve for predicting TCGA STAD cancer and normal samples. A cutoff of 0.008 is selected to predict whether a sample has normal or cancer status. (C) ROC curves for predicting sample status in four independent GEO cohorts. AUC values are listed. (D) Five measures of predictive performance for the TCGA and four GEO cohorts are listed: true positive rate, 1 − false-positive rate, accuracy, precision, and F-measure.
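The five measures listed in panel (D) follow from a confusion matrix once each sample has been labeled by comparing its score with the cutoff. The sketch below uses the 0.008 cutoff from panel (B). The direction of the comparison (scores at or above the cutoff labeled as cancer) is an assumption, because the caption does not state it, and the toy scores and labels are hypothetical.

```python
def predict_status(scores, cutoff=0.008, cancer_above=True):
    """Label each score 'cancer' or 'normal' by comparing it with the cutoff.

    The comparison direction is an assumption; set cancer_above=False if
    lower scores indicate cancer in your data.
    """
    return ["cancer" if (s >= cutoff) == cancer_above else "normal"
            for s in scores]


def classification_measures(truth, predicted, positive="cancer"):
    """Return the five measures named in panel (D) as a dictionary."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    tpr = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "true_positive_rate": tpr,
        "one_minus_false_positive_rate": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "f_measure": ratio(2 * precision * tpr, precision + tpr),
    }


# Hypothetical scores and true labels for six samples.
toy_scores = [0.5, 0.2, 0.01, -0.1, -0.3, 0.004]
toy_truth = ["cancer", "cancer", "cancer", "normal", "normal", "cancer"]
measures = classification_measures(toy_truth, predict_status(toy_scores))
print(measures)
```

In this toy case one cancer sample falls below the cutoff, so the true positive rate drops to 0.75 while precision stays at 1.0.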
FIGURE 9. Evaluation of the two models and combining clinical features to predict risk for individuals. (A) The prognosis of 1000 random HR-LR models based on the expression of random gene sets. (B) AUC values of 1000 random Cancer-Normal models based on the expression of random gene sets. (C) Survival plot of TCGA STAD patients classified by stage and SSA-NHEJ score. (D) Decision tree to predict the patient's clinical outcome. The result for samples in the TCGA cohort is listed at the bottom. (E) A nomogram is constructed to quantify risk assessment for an individual patient.
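The comparison against 1000 random gene-set models in panels (A) and (B) resembles an empirical significance test: the statistic of the real model is compared with the distribution obtained from random gene sets of the same size. A minimal sketch follows. The `evaluate` callback, the toy gene universe, and the add-one correction in the p-value are all assumptions for illustration; the set size of 76 matches the number of marker genes reported in Figure 2.

```python
import random


def random_model_benchmark(all_genes, set_size, evaluate, observed,
                           n_models=1000, seed=0):
    """Score `n_models` random gene sets and return (random_stats, p_value).

    `evaluate` is a hypothetical callback mapping a gene list to a
    performance statistic where larger means better (for example, an AUC).
    The p-value is the add-one-corrected fraction of random models that
    perform at least as well as the observed model.
    """
    rng = random.Random(seed)
    random_stats = [evaluate(rng.sample(all_genes, set_size))
                    for _ in range(n_models)]
    n_as_good = sum(stat >= observed for stat in random_stats)
    p_value = (n_as_good + 1) / (n_models + 1)
    return random_stats, p_value


# Toy universe of 200 hypothetical genes, each with a made-up weight in [0, 1).
toy_genes = [f"gene_{i}" for i in range(200)]
toy_weight = {g: i / 200 for i, g in enumerate(toy_genes)}


def toy_evaluate(gene_set):
    """Stand-in statistic: mean weight of the gene set (always below 1)."""
    return sum(toy_weight[g] for g in gene_set) / len(gene_set)


# An observed statistic of 2.0 exceeds anything a random set can reach here.
stats, p_value = random_model_benchmark(toy_genes, 76, toy_evaluate, observed=2.0)
print(len(stats), p_value)
```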