Yeongjoo Kim1,2, Ji Wan Kang1,2, Junho Kang1,2, Eun Jung Kwon1,2, Mihyang Ha1,2, Yoon Kyeong Kim1,2, Hansong Lee1,2, Je-Keun Rhee3, Yun Hak Kim2,4.
Abstract
The tumor microenvironment (TME) within mucosal neoplastic tissue in oral cancer (ORCA) is greatly influenced by tumor-infiltrating lymphocytes (TILs). Here, CIBERSORT profiles were computed for ORCA data filtered from the publicly accessible head and neck cancer cohort in The Cancer Genome Atlas (TCGA). The patients were grouped by hierarchical clustering and then regrouped into binary risk groups based on the clustering-measuring scores and the survival patterns associated with the individual groups. Based on this analysis, clinically reasonable differences were identified in 16 out of 22 TIL fractions between groups. A deep neural network classifier was trained using the TIL fraction patterns. This internally validated classifier was applied to an independent ORCA dataset from the International Cancer Genome Consortium data portal, and patient survival patterns were precisely predicted. Seven common differentially expressed genes between the two risk groups were obtained. This new approach confirms the importance of TILs in the TME and provides a direction for the use of a novel deep-learning approach for cancer prognosis.
Keywords: Head and neck cancer; cibersort; deep learning; international cancer genome consortium; oral cancer; the cancer genome atlas; tumor microenvironment; tumor-infiltrating lymphocytes
Year: 2021 PMID: 33854823 PMCID: PMC8018482 DOI: 10.1080/2162402X.2021.1904573
Source DB: PubMed Journal: Oncoimmunology ISSN: 2162-4011 Impact factor: 8.110
Clinical characteristics of the cohort of patients with head and neck cancer in The Cancer Genome Atlas for which the oral cancer data were filtered
| Characteristic | Category | Total, n (%) | Low-risk group, n (%) | High-risk group, n (%) |
|---|---|---|---|---|
| Age (years) | 0–39 | 4(2.3) | 3(4.1) | 1(1.0) |
| | 40–49 | 20(11.6) | 8(10.8) | 12(12.1) |
| | 50–59 | 58(33.5) | 27(36.5) | 31(31.3) |
| | 60–69 | 42(24.3) | 14(18.9) | 28(28.3) |
| | 70–79 | 34(19.7) | 15(20.3) | 19(19.2) |
| | 80+ | 15(8.7) | 7(9.5) | 8(8.1) |
| Sex | Male | 124(71.7) | 53(71.6) | 71(71.7) |
| | Female | 49(28.3) | 21(28.4) | 28(28.3) |
| N stage | N0 | 75(43.4) | 34(45.9) | 41(41.4) |
| | N1 | 28(16.2) | 8(10.8) | 20(20.2) |
| | N2 | 62(35.8) | 29(39.2) | 33(33.3) |
| | N3 | 2(1.2) | 0(0.0) | 2(2.0) |
| | NX | 6(3.5) | 3(4.1) | 3(3.0) |
| T stage | T1 | 14(8.1) | 10(13.5) | 4(4.0) |
| | T2 | 56(32.4) | 29(39.2) | 27(27.3) |
| | T3 | 33(19.1) | 15(20.3) | 18(18.2) |
| | T4 | 66(38.2) | 17(23.0) | 49(49.5) |
| | TX | 4(2.3) | 3(4.1) | 1(1.0) |
| M stage | M0 | 166(96.0) | 68(91.9) | 98(99.0) |
| | M1 | 2(1.2) | 2(2.7) | 0(0.0) |
| | MX | 5(2.9) | 4(5.4) | 1(1.0) |
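The "number (percentage)" cells in the table above appear to be column-wise percentages, that is, each count divided by the size of its own group. A minimal sketch of that arithmetic follows; the helper name `n_percent` is hypothetical, and the group sizes (173, 74, and 99) are taken by summing the Male and Female rows of each column.

```python
def n_percent(count, group_size):
    """Format a count as 'n(percent)' with one decimal place.

    Hypothetical helper: the percentage is taken relative to the size of
    the column's own group, which is an assumption about how the table
    was built.
    """
    return f"{count}({100 * count / group_size:.1f})"


# Group sizes derived from the Sex rows: 124 + 49, 53 + 21, 71 + 28.
TOTAL, LOW_RISK, HIGH_RISK = 173, 74, 99

# First row of the table (age 0-39).
row = [n_percent(4, TOTAL), n_percent(3, LOW_RISK), n_percent(1, HIGH_RISK)]
print(row)  # ['4(2.3)', '3(4.1)', '1(1.0)']
```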
Figure 1. (a) Pipeline flowchart depicting the data preprocessing step. (b) Pipeline flowchart for the classifier establishment step, including the validation process using a deep neural network (DNN) classifier. GDAC, Genome Data Analysis Center; HNSC, head and neck cancer; RNA-seq, RNA sequencing; TIL, tumor-infiltrating lymphocyte; DEG, differentially expressed gene; RF, random forest; DT, decision tree; ICGC, International Cancer Genome Consortium; ORCA, oral cancer.
Mutual information (MI), normalized MI (NMI), and adjusted MI (AMI) scores of the candidate clustering methods. The most acceptable scores across the tested k values for each clustering/measuring method are highlighted.
| Consensus | Hierarchical | Hierarchical | Hierarchical | K-means | K-means | |
|---|---|---|---|---|---|---|
| 0.423562 | 0.405244 | 1.037712 | 0.426107 | 0.686712 | ||
| 0.414230 | 0.592863 | 0.818148 | 0.674453 | 0.642146 | ||
| 0.400079 | 0.589346 | 0.809483 | 0.671388 | 0.633973 |
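As a rough illustration of the scores in the table above, the sketch below computes MI and NMI between two labelings of the same samples using only the standard library. It assumes natural logarithms and normalization by the arithmetic mean of the two entropies, which is one common convention; the source does not state which convention or which reference labeling was used. AMI additionally subtracts the MI expected by chance and is omitted here. The function names and toy labels are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same samples."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def normalized_mutual_information(labels_a, labels_b):
    """MI divided by the arithmetic mean of the two entropies (assumed convention)."""
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings are a single cluster
    return mutual_information(labels_a, labels_b) / ((h_a + h_b) / 2)


# Toy labelings: the same partition with permuted cluster names.
clustering_1 = [0, 0, 1, 1, 2, 2]
clustering_2 = [1, 1, 0, 0, 2, 2]
print(mutual_information(clustering_1, clustering_2))
print(normalized_mutual_information(clustering_1, clustering_2))
```

Because unnormalized MI is bounded by the entropies rather than by 1, a value above 1 is possible for MI but not for NMI or AMI.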
Figure 2. (a) Kaplan‒Meier (K-M) plot of K-means clustering after cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT; k = 3 and n = 173). The yellow line (class 3) shows a distinct favorable survival pattern (p-value = 0.26592). (b) K-M plot of the groups in Figure 2a regrouped into binary risk groups. The groups corresponding to the blue and green lines in Figure 2a are merged into one high-risk group (p-value = 0.01441).
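The regrouping and survival comparison described in this caption can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the class numbers assigned to the blue and green lines (1 and 2), the function names, and the use of a two-group log-rank test for the p-value are all assumptions, since the source does not name the test.

```python
import math

# Hypothetical mapping: class 3 is the favorable group in Figure 2a; the
# numbering of the other two classes (1 and 2) is assumed for illustration.
RISK_OF_CLASS = {1: "high", 2: "high", 3: "low"}


def split_by_risk(classes, times, events):
    """Merge cluster classes into binary risk groups of (times, events)."""
    groups = {"high": ([], []), "low": ([], [])}
    for c, t, e in zip(classes, times, events):
        groups[RISK_OF_CLASS[c]][0].append(t)
        groups[RISK_OF_CLASS[c]][1].append(e)
    return groups


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event=1, censored=0)."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_p_value(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; p-value from chi-square with 1 degree of freedom."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n_a = sum(1 for u in times_a if u >= t)  # at risk in group A
        n = sum(1 for u in times if u >= t)      # at risk overall
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of the chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))
```

A typical use would call `split_by_risk` on the three-class labels, then pass the high-risk and low-risk `(times, events)` pairs to `kaplan_meier` for plotting and to `logrank_p_value` for the comparison.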
Figure 3. Bar plots indicating the differences in the estimated LM22 fraction between the high and low survival risk groups. Each p-value is written above the bar plots (NS: p > 0.05, *: p ≤ 0.05, **: p ≤ 0.01, ***: p ≤ 0.001, and ****: p ≤ 0.0001). The y-axis indicates the predicted fraction level of each cell subtype.
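The significance notation in this caption maps a p-value to a label by checking the thresholds from strictest to loosest. A minimal sketch, with a hypothetical function name:

```python
def significance_label(p_value):
    """Map a p-value to the star notation used in the Figure 3 caption."""
    if p_value <= 0.0001:
        return "****"
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return "NS"


print(significance_label(0.01441))  # *
print(significance_label(0.26592))  # NS
```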
Figure 4. Scalar visualization of the established deep neural network (DNN) classifier model over training steps: (a) loss function, and accuracy on (b) the training dataset, (c) the primary test set, and (d) the secondary test set.
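The two scalars tracked in this figure can be illustrated for a binary risk classifier as below. This sketch assumes a binary cross-entropy loss and a 0.5 decision threshold; the source does not state which loss or threshold the DNN used, so both are assumptions, as are the function names and the toy values.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; y_prob is the predicted probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p >= threshold) == bool(y))
    return hits / len(y_true)


# Toy example: 1 = high risk, 0 = low risk (labels and probabilities are made up).
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.1]
print(binary_cross_entropy(labels, probabilities))
print(accuracy(labels, probabilities))  # 0.75
```

Logging these two values at each training step, separately for the training and test sets, produces curves of the kind the caption describes.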
Figure 5. Kaplan‒Meier survival plot of the predicted International Cancer Genome Consortium oral cancer dataset (p-value = 0.00685).
Clinical characteristics of the risk-group predicted cohort of patients with oral cancer in the International Cancer Genome Consortium data
| Characteristic | Category | Total, n (%) | Low-risk group, n (%) | High-risk group, n (%) |
|---|---|---|---|---|
| Age (years) | 0–39 | 7(20.5) | 7(25.0) | 0(0.0) |
| | 40–49 | 10(29.4) | 8(28.6) | 2(33.3) |
| | 50–59 | 11(32.4) | 10(35.7) | 1(16.7) |
| | 60–69 | 5(14.7) | 3(10.7) | 2(33.3) |
| | 70–79 | 1(2.9) | 0(0.0) | 1(16.7) |
| | 80+ | 0(0.0) | 0(0.0) | 0(0.0) |
| Sex | Male | 28(82.4) | 22(78.6) | 6(100.0) |
| | Female | 6(17.6) | 6(21.4) | 0(0.0) |
| N stage | N0 | 8(23.5) | 8(28.6) | 0(0.0) |
| | N1 | 17(50.0) | 12(42.9) | 5(83.3) |
| | N2 | 9(26.5) | 8(28.6) | 1(16.7) |
| | N3 | 0(0.0) | 0(0.0) | 0(0.0) |
| | NX | 0(0.0) | 0(0.0) | 0(0.0) |
| T stage | T1 | 0(0.0) | 0(0.0) | 0(0.0) |
| | T2 | 0(0.0) | 0(0.0) | 0(0.0) |
| | T3 | 2(5.9) | 2(7.1) | 0(0.0) |
| | T4 | 32(94.1) | 26(92.9) | 6(100.0) |
| | TX | 0(0.0) | 0(0.0) | 0(0.0) |
| M stage | M0 | 34(100.0) | 28(100.0) | 6(100.0) |
| | M1 | 0(0.0) | 0(0.0) | 0(0.0) |
| | MX | 0(0.0) | 0(0.0) | 0(0.0) |