Minsu Kim, Sangseon Lee, Sangsoo Lim, Doh Young Lee, Sun Kim.
Abstract
Cervical lymph node metastasis is the leading cause of poor prognosis in oral tongue squamous cell carcinoma and can occur even in the early stages. Current clinical diagnosis depends on a physical examination, which is not sufficient to determine whether micrometastasis remains. Transcriptome profiling has shown great potential for predicting micrometastasis by capturing the dynamic activation state of genes. However, using transcriptome data to model patient conditions poses several technical challenges: (1) an insufficient number of samples compared to the number of genes, (2) complex dependencies between the genes that govern the cancer phenotype, and (3) heterogeneity between patients in cohorts that differ geographically and racially. We developed a computational framework that learns a subnetwork representation of the transcriptome in order to discover network biomarkers and determine the potential for metastasis in early oral tongue squamous cell carcinoma. Our method achieved high accuracy in predicting the potential for metastasis in two geographically and racially different groups of patients. The robustness of the model and the reproducibility of the discovered network biomarkers show great potential as a tool to diagnose lymph node metastasis in early oral cancer.
Year: 2021 PMID: 34907266 PMCID: PMC8671417 DOI: 10.1038/s41598-021-03333-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Extracting subnetworks using a graph embedding technique. This involves (1) generating an adjacency matrix from a given PPI network, (2) random walk sampling from the PPI graph, and (3) generating a word2vec representation of the sampled walks to obtain a dense representation of each gene.
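Step (2) of this caption can be illustrated with a minimal sketch of uniform random-walk sampling over an adjacency list. The toy edges, the function names `build_adjacency` and `sample_walks`, and the parameters `walk_length` and `walks_per_node` are assumptions made for illustration; the record does not state which walk strategy or hyperparameters the authors used. In a full pipeline the sampled walks would be passed as "sentences" to a word2vec implementation to obtain one dense vector per gene.

```python
import random

def build_adjacency(edges):
    """Build an undirected adjacency list from (gene_a, gene_b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    # Sort neighbours so that sampling is reproducible for a fixed seed.
    return {node: sorted(nbrs) for node, nbrs in adj.items()}

def sample_walks(adj, walk_length=5, walks_per_node=2, seed=0):
    """Sample uniform random walks starting from every node of the graph."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:  # isolated node: stop the walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Hypothetical toy PPI edges; the gene symbols are real, the edges are made up.
toy_edges = [("MYC", "CCND1"), ("CCND1", "CDKN1A"), ("MYC", "FOS"), ("FOS", "TNF")]
adj = build_adjacency(toy_edges)
walks = sample_walks(adj)
```

Each walk is a list of gene symbols in which consecutive entries are neighbours in the graph, which is the property a skip-gram model relies on when it treats co-occurrence within a walk as similarity.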
Figure 2. Subnetwork clustering using latent representation. This involves (1) fitting Gaussian mixture models to the dense representation of the PPI network over a wide range of component counts, (2) evaluating each model by its Bayesian information criterion (BIC), and (3) choosing the best model to define the subnetworks of the PPI network.
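The model-selection loop described in this caption can be sketched as follows, assuming scikit-learn's `GaussianMixture` as the mixture-model implementation (the record does not name the library the authors used). The synthetic two-cluster `embeddings` array, the helper name `select_gmm_by_bic`, and the component range are hypothetical stand-ins for the real gene embeddings and search range.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm_by_bic(embeddings, component_range, seed=0):
    """Fit one GMM per component count and keep the one with the lowest BIC."""
    best_model, best_bic = None, np.inf
    for k in component_range:
        model = GaussianMixture(n_components=k, random_state=seed).fit(embeddings)
        bic = model.bic(embeddings)  # lower BIC indicates a better trade-off
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model, best_bic

# Synthetic stand-in for gene embeddings: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(loc=-5.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

best_model, best_bic = select_gmm_by_bic(embeddings, range(1, 6))
# Each gene is assigned to a mixture component, i.e. to a candidate subnetwork.
labels = best_model.predict(embeddings)
```

BIC penalises the number of free parameters, so the loop favours the smallest number of components that still explains the embedding well rather than the model with the highest raw likelihood.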
Figure 3. Constructing subnetwork level representation. This includes (1) calculating the sSAS representation for each optimized subnetwork and (2) integrating the representations into the subnetwork-level representation for each sample.
Figure 4. Performance evaluation results for PAM50 subtype prediction in breast cancer. The color-coding indicates the actual class label of samples.
Figure 5. Performance evaluation results for lymph node metastasis prediction in early oral tongue cancer. The color-coding indicates the actual class label of samples.
Detailed metrics for lymph node metastasis prediction in early oral tongue cancer.
| Measure | Value* |
|---|---|
| Number of true positives | 10 |
| Number of true negatives | 19 |
| Number of false positives | 3 |
| Number of false negatives | 1 |
| Sensitivity | 0.9091 |
| Specificity | 0.8636 |
| Precision | 0.7692 |
| Negative predictive value | 0.9500 |
| False positive rate | 0.1364 |
| False discovery rate | 0.2308 |
| False negative rate | 0.0909 |
| Accuracy | 0.8788 |
| F1 score | 0.8333 |
* Note that the metrics are not class-balanced.
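Every derived measure in this table follows from the four confusion-matrix counts. The short sketch below recomputes them using the standard definitions; the function name `confusion_metrics` and the dictionary keys are illustrative choices, not taken from the source.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "negative_predictive_value": tn / (tn + fn),
        "false_positive_rate": fp / (fp + tn),
        "false_discovery_rate": fp / (fp + tp),
        "false_negative_rate": fn / (fn + tp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1_score": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Counts reported in the table for the oral tongue cancer cohort.
metrics = confusion_metrics(tp=10, tn=19, fp=3, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With 11 positive and 22 negative samples the classes are imbalanced, which is why the footnote flags that these values are not class-balanced.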
Attention map of the best model using the proposed method.
| Subnetwork¹ | Attention² (%) | Genes |
|---|---|---|
| EPITHELIAL_MESENCHYMAL_TRANSITION_2 | 9.12 | BMP1, DAB2, FBLN5, GADD45A, GEM, LOXL1, LUM, SNAI2, TPM1, VEGFA |
| E2F_TARGETS_5 | 9.06 | AURKB, BRCA1, CCNE1, CDC20, CDKN2C, EXOSC8, GINS3, IPO7, MAD2L1, MCM7, POLA2, PRIM2, PTTG1, RAD1, RAD21, RANBP1, SYNCRIP, TK1, TUBB, XPO1 |
| MYOGENESIS_6 | 7.97 | ADAM12, CDKN1A, HSPB8, KIFC3, MB, MYOZ1, PSEN2, TNNC2, TNNT3, TPM3 |
| TNFA_SIGNALING_VIA_NFKB_6 | 7.45 | CCND1, CEBPB, CFLAR, ETS2, FOS, GADD45A, MAP3K8, MYC, NFE2L2, SMAD3, SPHK1, TNF, TRAF1, TRIB1 |
| MITOTIC_SPINDLE_2 | 4.78 | ARHGAP27, ARHGEF11, CENPE, CEP250, KIF4A, KIF5B, KIFAP3, LLGL1, LMNB1, RACGAP1, RASA1, TUBA4A |
Note that only the top five subnetworks are listed.
¹ The Subnetwork column gives the name of each subnetwork, where the prefix is the name of the HGS geneset and the suffix (the integer) is the subnetwork index.
² The Attention column gives the attention values generated by the best model of the proposed method, averaged over all samples (including the TCGA and SNUH samples).