Ziyu Wang1,2,3, Tingting Zhang1,2,3, Wei Wu1,2,3, Lingxiang Wu1,2,3, Jie Li1,2,3, Bin Huang1,2,3, Yuan Liang1,2,3, Yan Li1,2,3, Pengping Li1,2,3, Kening Li1,2,3, Wei Wang4, Renhua Guo5, Qianghu Wang1,2,3.
Abstract
Accurate detection and localization of tumor lesions are essential for improving diagnosis and personalized cancer therapy. However, the diagnosis of lesions with ambiguous histology depends mainly on experience and suffers from low accuracy and efficiency. Here, we developed a logistic regression model based on mutational signatures (MS) for each cancer type to trace the tumor origin. We observed that MS could distinguish cancer from inflammation and from healthy individuals. By collecting extensive datasets of samples from ten tumor types in a training cohort (5,001 samples) and an independent testing cohort (2,580 samples), we identified cancer-type-specific MS patterns (CTS-MS) that performed robustly in distinguishing different types of primary and metastatic solid tumors (AUC: 0.76 to 0.93). Moreover, we validated our model in an Asian population, where its AUC for predicting tumor origin was higher than 0.7. Metastatic tumor lesions inherited the MS pattern of the primary tumor, suggesting that MS can identify the tissue of origin of metastatic cancers. Furthermore, we distinguished breast cancer from prostate cancer with 90% accuracy by combining somatic mutations and CTS-MS from cfDNA, indicating that CTS-MS could improve the accuracy of cfDNA-based cancer-type prediction. In summary, our study demonstrated that MS is a novel and reliable biomarker for diagnosing solid tumors and provides new insights into predicting tissue of origin.
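The per-cancer-type modelling described in the abstract can be illustrated with a minimal sketch: one binary logistic regression per cancer type (that type versus all others), trained on per-sample signature contributions and evaluated by AUC on held-out samples. Everything below is an assumption made for illustration, not the authors' pipeline: the data are synthetic, the cancer-type names (`type_A`, `type_B`, `type_C`) and the helpers `fit_logistic`, `predict_score`, `auc` and `simulate` are hypothetical, and plain gradient descent stands in for whatever fitting procedure the study used.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_score(model, x):
    """Predicted probability that sample x belongs to the model's cancer type."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def auc(scores, labels):
    """AUC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def simulate(dominant, n, rng, n_sig=4):
    """Synthetic signature contributions (each row sums to 1); NOT real data."""
    rows = []
    for _ in range(n):
        raw = [rng.random() + (3.0 if k == dominant else 0.0) for k in range(n_sig)]
        total = sum(raw)
        rows.append([v / total for v in raw])
    return rows

rng = random.Random(0)
types = ["type_A", "type_B", "type_C"]  # hypothetical cancer types
train = {t: simulate(k, 60, rng) for k, t in enumerate(types)}
test = {t: simulate(k, 30, rng) for k, t in enumerate(types)}

# One binary model per cancer type (that type vs. all others).
models, test_auc = {}, {}
for t in types:
    X = [x for u in types for x in train[u]]
    y = [1 if u == t else 0 for u in types for _ in train[u]]
    models[t] = fit_logistic(X, y)
    Xt = [x for u in types for x in test[u]]
    yt = [1 if u == t else 0 for u in types for _ in test[u]]
    test_auc[t] = auc([predict_score(models[t], x) for x in Xt], yt)
```

Because the synthetic types are separated by construction, the AUC values in `test_auc` say nothing about real tumors; the sketch only shows how a per-type score and its AUC could be obtained.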
Keywords: cancer biomarkers; cancer diagnosis; cancer localization; liquid biopsy; mutational signatures
Year: 2022 PMID: 35547159 PMCID: PMC9081532 DOI: 10.3389/fbioe.2022.883791
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
FIGURE 1. Mutational signatures for cancer diagnosis. (A–C) The biological processes of accumulated mutations in healthy individuals (HI) and in patients with ulcerative colitis (UC) and colitis-associated neoplasia (CAN). (D) The correlation between DataSet1 (TCGA) and DataSet2 (previous studies) based on MS in bladder cancer (BLCA), non-small cell lung cancer (NSCLC), pancreatic cancer (PAAD), breast cancer (BRCA), ovarian serous cystadenocarcinoma (OV), liver hepatocellular carcinoma (LIHC), and gastrointestinal cancers, including colorectal cancer (CRC), esophageal carcinoma (ESCA), and stomach adenocarcinoma (STAD). The darker the color, the higher the similarity. (E) Heatmaps of MS in BLCA (n = 412), NSCLC (n = 1,108), PAAD (n = 179), BRCA (n = 985), OV (n = 435), skin cutaneous melanoma (SKCM, n = 468), LIHC (n = 464), CRC (n = 398), ESCA (n = 184), and STAD (n = 439). The color indicates the average contribution of each MS, and the size of the dots indicates the fraction. Fraction: the proportion of samples in each cancer type in which the contribution of a given mutational signature exceeds 0.06. Contribution: the average contribution of each mutational signature in each cancer type.
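The two quantities defined in the caption of panel (E) can be written out as a short sketch. The function name `summarize_signature` and the example values are hypothetical; only the 0.06 threshold and the two definitions come from the caption.

```python
def summarize_signature(contributions, threshold=0.06):
    """Summarize one signature within one cancer type.

    contributions: per-sample contribution of the signature (one value per sample).
    Returns (fraction, mean_contribution) as defined in the Figure 1E caption.
    """
    n = len(contributions)
    # Fraction: share of samples whose contribution is more than the threshold.
    fraction = sum(1 for c in contributions if c > threshold) / n
    # Contribution: average contribution across all samples of the cancer type.
    mean_contribution = sum(contributions) / n
    return fraction, mean_contribution

# Hypothetical contributions of one signature across five samples (not real data).
example = [0.00, 0.05, 0.10, 0.30, 0.55]
fraction, mean_contribution = summarize_signature(example)
```

In a dot-heatmap such as the one described, `fraction` would drive dot size and `mean_contribution` would drive color.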
FIGURE 2. The effectiveness of the cancer diagnosis model based on the MS of the primary tumor. (A,B) ROC curves of the cancer diagnosis models in the training (A) and validation (B) cohorts. Random classifiers, indicating the classification accuracy obtained by chance, are shown in gray. (C) The AUC values and the number of patients in the training (left) and validation (right) cohorts. (D) The model performance across different populations. The vertical axis is the AUC of the model. The horizontal axis represents tumor type. Red represents the European and American population in the training dataset; green represents the European and American population in the validation dataset; blue represents Asian populations.
FIGURE 3. The similarity between metastatic and primary tumors based on MS. (A) PCA based on the MS of matched primary and metastatic cancers. The red dots represent primary lung cancer, the red triangles represent metastatic lung cancer, the blue dots represent primary pancreatic cancer, and the blue triangles represent metastatic pancreatic cancer. The red dotted line indicates the distribution area of lung cancer, and the blue dotted line indicates the distribution area of pancreatic cancer. pri., primary cancer; met., metastatic cancer. (B) The similarity between primary and metastatic cancer based on MS. The darker the color, the higher the similarity. The first row indicates the tumor type: red represents lung cancer, and blue represents pancreatic cancer. The second row shows the origin of the sample; the same color indicates that the samples are from the same patient. (C) The correlation between primary and metastatic cancer based on MS in common cancers. The darker the color, the higher the similarity. The boxplot compares the similarity between primary and metastatic tumors of the same tumor type with the similarity between primary and metastatic tumors of different tumor types.
FIGURE 4. Tracing the origin of metastatic tumors based on MS. The first column distinguishes whether the tumor is primary liver cancer. The second column traces the origin of the metastatic cancer. TP, true positive; FP, false positive; FN, false negative; TN, true negative. Sample numbers and detection rates (in percentages) are indicated.
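The TP/FP/FN/TN bookkeeping named in the caption can be sketched as follows. The labels are invented for illustration, the helper names are hypothetical, and expressing each cell as a percentage of all samples is an assumption, since the caption does not state which denominator the figure uses.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN and TN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
    }

def as_percentages(counts):
    """Express each cell as a percentage of all samples (assumed denominator)."""
    total = sum(counts.values())
    return {name: 100.0 * value / total for name, value in counts.items()}

# Hypothetical labels: 1 = primary liver cancer, 0 = metastasis to the liver.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
percentages = as_percentages(counts)
```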
FIGURE 5. Distinguishing different cancer types based on the MS patterns and somatic mutations called from plasma ctDNA data. (A) The specific mutations in tissue (left) and ctDNA (right). Red indicates that a mutation was detected. CRPC, castration-resistant prostate cancer; MBC, metastatic breast cancer. (B) The efficiency of MS and somatic mutations in distinguishing patients with lung cancer from patients with benign lung nodules using ctDNA data. (C,D) The correlation between plasma and tissue in breast cancer based on gene expression (C) and MS (D). Orange indicates breast cancer-specific markers. (E,F) Combining MS patterns and somatic mutations called from plasma ctDNA data distinguishes breast cancer from prostate cancer. The red points represent prostate cancer, and the blue points represent breast cancer. The horizontal axis represents the score of the prostate cancer model, and the vertical axis represents the score of the breast cancer model.