Jie Lian, Jiajun Deng, Edward S Hui, Mohamad Koohi-Moghadam, Yunlang She, Chang Chen, Varut Vardhanabhuti.
Abstract
Background: We proposed a population graph with Transformer-generated and clinical features to predict overall survival (OS) and recurrence-free survival (RFS) in patients with early-stage non-small cell lung carcinomas, and compared this model with traditional models.
Keywords: computational biology; computed tomography; lung cancer; medical imaging; medicine; none; prognostic model; survival; systems biology; transformer cnn
Year: 2022 PMID: 36194194 PMCID: PMC9531948 DOI: 10.7554/eLife.80547
Source DB: PubMed Journal: Elife ISSN: 2050-084X Impact factor: 8.713
Figure 1. Overall flow of the study in both the internal and external data sets.
Figure 2. Tumour image processing and feature generation.
(A) Tumour images were normalised, reshaped, and padded to standard sizes, then rearranged into 2D images. (B) 1D Transformer survival features were generated from the pretrained Transformer model.
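The preprocessing in panel (A) can be illustrated with a minimal NumPy sketch. The intensity window (`lo`, `hi`), the target shape `(16, 32, 32)`, the mosaic layout (`cols`), and all function names are assumptions chosen for illustration; the caption does not state the values actually used.

```python
import numpy as np

def normalise(volume, lo=-1000.0, hi=400.0):
    """Clip intensities to [lo, hi] and rescale to [0, 1] (window is an assumed example)."""
    clipped = np.clip(volume.astype(np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)

def pad_to(volume, shape):
    """Zero-pad a 3D volume symmetrically up to `shape` (assumes it fits inside)."""
    pads = []
    for size, target in zip(volume.shape, shape):
        total = target - size
        if total < 0:
            raise ValueError("volume is larger than the target shape")
        pads.append((total // 2, total - total // 2))
    return np.pad(volume, pads, mode="constant", constant_values=0.0)

def to_mosaic(volume, cols):
    """Tile the slices of a (D, H, W) volume into one 2D image, `cols` slices per row."""
    d, h, w = volume.shape
    if d % cols != 0:
        raise ValueError("number of slices must be divisible by cols")
    rows = d // cols
    tiled = volume.reshape(rows, cols, h, w).transpose(0, 2, 1, 3)
    return tiled.reshape(rows * h, cols * w)

# Hypothetical tumour crop of 6 slices, 20 x 24 voxels each.
crop = np.random.default_rng(0).integers(-1200, 600, size=(6, 20, 24))
padded = pad_to(normalise(crop), (16, 32, 32))
img = to_mosaic(padded, cols=4)  # 4 x 4 grid of 32 x 32 slices
```

Here slice `k` of the padded volume ends up in grid row `k // cols` and grid column `k % cols` of the 2D image, so the rearrangement loses no voxels.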
Figure 3. Population graph building and model prediction pipeline.
(A) Each patient was regarded as a node, and the Transformer-generated features were regarded as node features. (B) Graph edges and their weights were defined by similarity scores. (C) The whole population graph was then used to train the GraphSAGE network to make a prediction for each patient (pink indicates high risk and blue indicates low risk). (D) Node updating inside the GraphSAGE network.
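As a rough illustration of panels (B) and (D), the sketch below builds a weighted adjacency matrix from pairwise similarity and applies one GraphSAGE-style update with a weighted-mean aggregator. Cosine similarity, the `threshold` value, the separate `w_self` and `w_neigh` weight matrices, and the function names are all assumptions; the caption does not specify the similarity measure or the aggregator.

```python
import numpy as np

def build_population_graph(similarity_feats, threshold=0.5):
    """Weighted adjacency from cosine similarity; edges below `threshold` are dropped.

    Assumes every row of `similarity_feats` is a non-zero vector.
    """
    norms = np.linalg.norm(similarity_feats, axis=1, keepdims=True)
    x = similarity_feats / norms
    sim = x @ x.T
    adj = np.where(sim >= threshold, sim, 0.0)
    np.fill_diagonal(adj, 0.0)  # no self-loops; a node's own features enter via w_self
    return adj

def sage_mean_layer(h, adj, w_self, w_neigh):
    """One GraphSAGE-style update: ReLU(h @ w_self + weighted_mean(neighbours) @ w_neigh)."""
    deg = adj.sum(axis=1, keepdims=True)
    # Isolated nodes (deg == 0) keep an all-zero neighbour summary.
    neigh = np.divide(adj @ h, deg, out=np.zeros_like(h), where=deg > 0)
    return np.maximum(h @ w_self + neigh @ w_neigh, 0.0)

# Hypothetical data: 5 patients, 4 similarity features, 8 node features each.
rng = np.random.default_rng(0)
sim_feats = rng.normal(size=(5, 4))
node_feats = rng.normal(size=(5, 8))
adj = build_population_graph(sim_feats, threshold=0.0)
out = sage_mean_layer(node_feats, adj, rng.normal(size=(8, 3)), rng.normal(size=(8, 3)))
```

Because the whole cohort forms one graph, each patient's updated representation mixes their own features with those of similar patients, weighted by edge strength.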
Feature distribution in the total patient cohort, the training and validation cohorts, and the test cohorts. Values are mean, SD, (95% CI) or count (%).
| Feature | Content | TRAIN and VAL (n=1492) | TEST (n=213) | p | EXTERNAL (n=127) | p |
|---|---|---|---|---|---|---|
| Age | Age | 60.6, 8.7, (CI: 60.1, 61.0) | 60.7, 9.5, (CI: 59.4, 62.0) | >0.05 | 68.7, 9.1, (CI: 67.2, 70.1) | <0.01** |
| Sex | Female no. (%); Male no. (%) | 602 (40.3); 890 (59.7) | 93 (43.7); 120 (56.3) | >0.05 | 32 (25.2); 95 (74.8) | <0.01** |
| Resection | Sublobar resection no. (%); | 123 (8.2); | 23 (10.8); | >0.05 | / | / |
| Histology | Adenocarcinoma no. (%); | 1072 (71.4); | 163 (76.5); | >0.05 | 95 (74.8); | >0.05 |
| Tumour | LUL no. (%); | 384 (25.7); | 51 (23.9); | >0.05 | 30 (23.6); | >0.05 |
| Tumour size | Tumour size | 2.68, 1.38, | 2.55, 1.25, | >0.05 | / | / |
| pTNM stage | Stage I no. (%); | 1219 (81.7); | 179 (84.0); | >0.05 | 97 (76.3); | <0.01** |
| RFS status | RFS no. (%) | 1089 (73.0) | 154 (72.3) | >0.05 | 75 (59.1) | >0.05 |
| RFS month | RFS month | 57.5, 24.5, | 58.4, 23.4, | >0.05 | 39.5, 26.9, | <0.01** |
| OS status | OS no. (survival %) | 1166 (78.2) | 167 (78.4) | >0.05 | 87 (68.5) | >0.05 |
| OS month | OS month | 62.4, 19.9, | 63.4, 18.4, | >0.05 | 44.8, 27.8, | <0.01** |
Figure 4. Model performance: (A) ROC-AUC curves on the test and external data sets for OS and (B) RFS prediction, and (C) KM curves on the test data set for OS and (D) RFS prediction.
(E) Decision curves on the test data set for OS and RFS prediction. KM, Kaplan–Meier; OS, overall survival; RFS, recurrence-free survival; ROC-AUC, area under the receiver operating characteristic curve.
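For readers unfamiliar with the metrics in panels (A), (B), and (E), the following is a minimal sketch of how ROC-AUC and the net benefit plotted in a decision curve can be computed. It assumes a binary outcome (for example, event status at a fixed time point) and a predicted risk per patient; the function names and toy data are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def net_benefit(labels, scores, threshold):
    """Net benefit at a risk threshold: TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

# Toy example: four patients with hypothetical predicted risks.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
decision_curve = [(t, net_benefit(labels, scores, t)) for t in (0.3, 0.5)]
```

A decision curve plots net benefit over a range of thresholds, which lets the model be compared against the default strategies of treating all patients or none.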
Figure 4—figure supplement 1. Kaplan–Meier survival analysis.
(A) KM curve on external data set for OS and (B) RFS prediction.
Figure 5. Testing set graph analysis.
(A) A visual representation of the whole-cohort population graph of 1705 patients. (B) A visual representation of the testing sub-graph of 213 patients. (C, D) Two sub-graphs containing challenging cases, in which the graphs contained both high- and low-risk patients. (E) Node feature correlation heatmaps and edge weight distribution for patient No. 44: each square represents the correlation coefficient of a neighbour's node features, with higher values (red) indicating a closer relation to the target node; the box plot of 42 neighbours indicates that the high-risk neighbours (blue box) have a higher median edge weight. (F) Node feature correlation heatmaps and edge weight distribution for patient No. 182: the box plot of 25 neighbours indicates that the low-risk neighbours (orange box) have a higher median edge weight.