Gil Pinheiro¹, Tania Pereira², Catarina Dias¹,³, Cláudia Freitas⁴,⁵, Venceslau Hespanhol⁴,⁵, José Luis Costa⁵,⁶,⁷, António Cunha¹,⁸, Hélder P Oliveira¹,⁹.
Abstract
EGFR and KRAS are the most frequently mutated genes in lung cancer and are active research topics in targeted therapy. Biopsy is the traditional method for genetically characterising a tumour. However, it is a risky procedure that is painful for the patient, and the tumour is occasionally inaccessible. This work studies and discusses the nature of the relationships between imaging phenotypes and lung cancer-related mutation status. So far, the literature has failed to point to new research directions: it consists mainly of results-oriented works in a field where there is still not enough data available to train clinically viable models. We intend to open a discussion about critical points and to present new possibilities for future radiogenomics studies. We conducted high-dimensional data visualisation and developed classifiers, which allowed us to analyse the results for the EGFR and KRAS biological markers according to different combinations of input features. We show that EGFR mutation status might be correlated with imaging phenotypes from CT scans; however, the same does not seem to hold for KRAS mutation status. The experiments also suggest that the best way to approach this problem is to combine nodule-related features with features from other lung structures.
Year: 2020 PMID: 32107398 PMCID: PMC7046701 DOI: 10.1038/s41598-020-60202-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Visualisation of sample distributions based on PCA and t-SNE. Each point is coloured according to its mutation status: red dots represent wild-type cases and green crosses represent mutated cases.
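The PCA panel of Figure 1 projects each high-dimensional feature vector onto the two directions of largest variance. Below is a minimal pure-Python sketch of that projection, assuming the features are already numeric and scaled. It centres the data, forms the sample covariance matrix, and extracts the first two eigenvectors by power iteration with deflation. The function names are hypothetical, and the source does not state which implementation the authors used. t-SNE is not reproduced here; a library implementation such as scikit-learn's `TSNE` would normally be used for that panel.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix (d x d) of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(mat, iters=500, seed=0):
    """Power iteration: dominant eigenpair of a symmetric PSD matrix."""
    rng = random.Random(seed)
    d = len(mat)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # matrix annihilates v; keep the current unit vector
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def pca_2d(rows):
    """Project samples onto the first two principal components."""
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    components = []
    for _ in range(2):
        lam, v = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in x]
```

Each returned pair would then be plotted and coloured by mutation status. Power iteration is used only to keep the sketch dependency-free; it assumes the leading eigenvalues are well separated, and a linear-algebra library would be the usual choice in practice.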
Figure 2. Averaged ROC curve obtained for the EGFR predictive model based on semantic features. An ROC curve is calculated for each of the N = 100 runs. The blue line depicts the arithmetic mean ROC curve and the shaded band its standard deviation. The red dashed lines indicate the ROC curves of chance-level classifiers.
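Averaging ROC curves from many runs requires evaluating every curve at the same false-positive rates. One common approach, vertical averaging, is sketched below. Each run's ROC curve is built by sweeping the decision threshold over its scores, its true-positive rate is linearly interpolated onto a shared FPR grid, and the mean and standard deviation are taken point by point. The caption does not specify the interpolation scheme or whether a population or sample standard deviation was used. This sketch assumes linear interpolation and the population standard deviation, and the function names are hypothetical. In practice, `sklearn.metrics.roc_curve` would typically supply the per-run curves.

```python
import statistics


def roc_points(labels, scores):
    """Empirical ROC curve as (FPR list, TPR list).

    labels: 1 = mutated (positive), 0 = wild type. Samples with tied scores
    are consumed together, so ties produce a single diagonal step.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def tpr_at(x, fpr, tpr):
    """TPR at false-positive rate x, linearly interpolated between ROC points.

    On a vertical segment (several points sharing x) the highest TPR is used.
    """
    k = max(i for i, f in enumerate(fpr) if f <= x)
    if k == len(fpr) - 1 or fpr[k] == x:
        return tpr[k]
    slope = (tpr[k + 1] - tpr[k]) / (fpr[k + 1] - fpr[k])
    return tpr[k] + (x - fpr[k]) * slope


def average_roc(runs, grid_size=101):
    """Vertical averaging: mean and population SD of TPR on a shared FPR grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    curves = []
    for labels, scores in runs:
        fpr, tpr = roc_points(labels, scores)
        curves.append([tpr_at(x, fpr, tpr) for x in grid])
    mean = [statistics.mean([c[j] for c in curves]) for j in range(grid_size)]
    std = [statistics.pstdev([c[j] for c in curves]) for j in range(grid_size)]
    return grid, mean, std
```

With 100 `(labels, scores)` pairs, one per run, `mean` would correspond to the blue line and `mean ± std` to the shaded band. A classifier that gives every sample the same score yields the diagonal, which is the chance-level reference.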
Classification results for EGFR and KRAS mutation status predictive models considering different sets of input features.
| Features | EGFR AUC (mean ± standard deviation) | KRAS AUC (mean ± standard deviation) |
|---|---|---|
| Radiomic | 0.5797 ± 0.1238 | 0.5087 ± 0.0104 |
| Semantic Nodule | 0.6542 ± 0.0953 | 0.4381 ± 0.0679 |
| Semantic Non-Nodule | 0.6831 ± 0.0890 | 0.4921 ± 0.0851 |
| Semantic Hybrid | 0.7458 ± 0.0877 | 0.5035 ± 0.0776 |
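Each table entry summarises one AUC per run as mean ± standard deviation. Below is a minimal sketch of that summary. It assumes the AUC is computed through its rank interpretation: the probability that a randomly chosen mutated case scores higher than a randomly chosen wild-type case, with ties counted as half. This equals the area under the empirical ROC curve. The source does not state whether the sample or population standard deviation was reported, so the sample standard deviation here is an assumption. Both function names are hypothetical.

```python
import statistics


def auc_mann_whitney(labels, scores):
    """AUC as P(score of a mutated case > score of a wild-type case).

    labels: 1 = mutated, 0 = wild type. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarise(aucs):
    """Format per-run AUCs as 'mean ± standard deviation' (sample SD assumed)."""
    return f"{statistics.mean(aucs):.4f} ± {statistics.stdev(aucs):.4f}"
```

With N = 100 runs, `aucs` would hold 100 values per feature set and mutation. A mean close to 0.5, like those reported for the KRAS models, indicates discrimination no better than chance. An AUC below 0.5 means the scores rank the classes slightly backwards on average.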
Figure 3. Top 16 semantic features for predicting EGFR mutation status, ranked by XGBoost feature importance scores. Only features with an average importance score greater than 0.02 are shown. The importance score is determined for each of the N = 100 runs, and the bar graph displays the mean and standard deviation.
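The ranking in Figure 3 combines per-run importance scores into a mean and standard deviation, then keeps features whose mean exceeds 0.02, up to the 16 strongest. Below is a minimal sketch of that aggregation. It assumes each run yields a mapping from feature name to importance, which could for example be built from XGBoost's scikit-learn wrapper attribute `XGBClassifier.feature_importances_`. Treating a feature that is absent from a run as having zero importance, and using the population standard deviation, are both assumptions. The function name `rank_importances` is hypothetical.

```python
import statistics


def rank_importances(runs, threshold=0.02, top_k=16):
    """Average per-run importance scores and keep the strongest features.

    runs: list of dicts {feature_name: importance}, one dict per run.
    A feature missing from a run counts as importance 0 in that run (assumption).
    Returns (name, mean, std) tuples sorted by decreasing mean importance.
    """
    names = sorted({name for run in runs for name in run})
    rows = []
    for name in names:
        values = [run.get(name, 0.0) for run in runs]
        mean = statistics.mean(values)
        if mean > threshold:  # strict inequality, as in "greater than 0.02"
            rows.append((name, mean, statistics.pstdev(values)))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top_k]
```

Each returned tuple maps to one bar (the mean) and its error bar (the standard deviation).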
Figure 4. Overview of the feature extraction process with Pyradiomics. First, the medical images and segmentation masks are loaded into the software; the masks select the tumour region of interest (ROI). Then, filters are applied to the original image, and radiomic features are extracted from the ROI of the resulting images.
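The flow in Figure 4 (load image and mask, derive filtered images, compute features inside the ROI of each image) can be illustrated with a toy 2D example. The sketch below is not Pyradiomics. It substitutes a simple 3×3 mean filter for the filters Pyradiomics provides (such as LoG or wavelet), and it computes only a few basic first-order statistics. All function names and the feature-naming scheme are hypothetical. In Pyradiomics itself, the whole pipeline is driven by `radiomics.featureextractor.RadiomicsFeatureExtractor` and its `execute(imagePath, maskPath)` method.

```python
def mean_filter(image):
    """3x3 mean filter; border pixels average only the neighbours that exist."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [image[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out


def first_order(image, mask):
    """A few first-order statistics over the pixels where mask is non-zero (the ROI)."""
    roi = [image[r][c]
           for r in range(len(image))
           for c in range(len(image[0]))
           if mask[r][c]]
    if not roi:
        raise ValueError("mask selects no pixels")
    mean = sum(roi) / len(roi)
    return {
        "mean": mean,
        "minimum": min(roi),
        "maximum": max(roi),
        "energy": sum(v * v for v in roi),  # sum of squared intensities
        "variance": sum((v - mean) ** 2 for v in roi) / len(roi),
    }


def extract(image, mask):
    """Features from the original and the filtered image, prefixed by image type."""
    features = {}
    for image_type, img in (("original", image), ("mean3x3", mean_filter(image))):
        for name, value in first_order(img, mask).items():
            features[f"{image_type}_{name}"] = value
    return features
```

The same mask is reused for every derived image, so each feature describes the same anatomical region viewed through a different filter.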