Yining Xu, Xinran Cui, Yadong Wang.
Abstract
Tumor metastasis is the major cause of cancer mortality. Detecting changes in cancer gene expression and the transcriptome is therefore important for exploring the molecular mechanisms and cellular events of tumor metastasis. Precisely estimating a patient's cancer state and prognosis is the key challenge in developing a therapeutic schedule for that patient. In recent years, data mining and machine learning techniques have contributed widely to analyzing real-world gene expression data and predicting tumor outcomes by supplying computational models that support decision-making. Nevertheless, the limitations of real-world data severely restrict the predictive performance of these models, and the complexity of the data makes it difficult to extract vital features. In addition, the efficacy of standard machine learning pipelines remains far from satisfactory even though diverse feature selection strategies have been applied. To address these problems, we developed a directed relation-graph convolutional network (DR-GCN) to provide an advanced feature extraction strategy. We first constructed a gene regulation network and extracted gene expression features using a relational graph convolutional network method. The high-dimensional features of each sample were treated as the pixels of an image, and a convolutional neural network was used to predict the risk of metastasis for each patient. Ten-fold cross-validation on 1,779 cases from The Cancer Genome Atlas shows that our model's performance (area under the curve, AUC = 0.837; area under the precision-recall curve, AUPRC = 0.717) exceeds that of an existing network-based method (AUC = 0.707, AUPRC = 0.555).
Keywords: CNN; GCN; cancer metastasis; machine learning method; pan-cancer analysis
Year: 2021 PMID: 34179004 PMCID: PMC8220811 DOI: 10.3389/fcell.2021.675978
Source DB: PubMed Journal: Front Cell Dev Biol ISSN: 2296-634X
Number of positive and negative cases for each cancer type.
| Cancer type | Non-metastasis cases (0) | Metastasis cases (1) |
| --- | --- | --- |
| Breast invasive carcinoma | 589 | 102 |
| Lung adenocarcinoma | 152 | 122 |
| Lung squamous cell carcinoma | 157 | 76 |
| Stomach adenocarcinoma | 151 | 99 |
| Skin cutaneous melanoma | 75 | 153 |
| Pancreatic adenocarcinoma | 21 | 82 |
| Total | 1,145 | 634 |
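The class balance implied by the table above can be checked with a few lines of arithmetic. The sketch below copies the case counts from the table and computes the metastasis rate per cancer type and overall. The overall rate is a useful reference point because a classifier that scores cases at random is expected to reach an AUPRC close to the positive-class prevalence; the variable and function names are illustrative and not taken from the source.

```python
# Case counts copied from the table above: (non-metastasis, metastasis).
counts = {
    "Breast invasive carcinoma": (589, 102),
    "Lung adenocarcinoma": (152, 122),
    "Lung squamous cell carcinoma": (157, 76),
    "Stomach adenocarcinoma": (151, 99),
    "Skin cutaneous melanoma": (75, 153),
    "Pancreatic adenocarcinoma": (21, 82),
}


def metastasis_rate(neg, pos):
    """Fraction of cases labeled as metastasis (label 1)."""
    return pos / (neg + pos)


total_neg = sum(neg for neg, _ in counts.values())
total_pos = sum(pos for _, pos in counts.values())
rates = {name: metastasis_rate(neg, pos) for name, (neg, pos) in counts.items()}
overall_rate = metastasis_rate(total_neg, total_pos)

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
print(f"Overall ({total_pos} of {total_neg + total_pos} cases): {overall_rate:.3f}")
```

The per-type rates differ considerably (breast invasive carcinoma is mostly negative, pancreatic adenocarcinoma mostly positive), which is worth keeping in mind when reading pooled pan-cancer metrics.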
FIGURE 1. Workflow. (A) The input data matrix for the directed relation-graph convolutional network (DR-GCN). The data are fed into the DR-GCN feature extraction model, which yields the graph-structured data matrix shown in panel (B). This matrix, together with the labels, is then passed to a convolutional neural network prediction model, which produces the prediction result.
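The feature extraction step in the workflow can be illustrated with a generic relational graph convolution. The source does not give the DR-GCN propagation rule, so the sketch below is an assumption: it uses the commonly cited R-GCN form, in which each node sums a self-loop term and, for each relation type, the mean of the transformed features of its incoming neighbors, followed by a ReLU. The gene names, relation types (`activates`, `inhibits`), and weight matrices are hypothetical placeholders, not values from the paper.

```python
def relu(vec):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vec]


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def rgcn_layer(features, edges, w_rel, w_self):
    """One generic relational graph-convolution step (assumed form, not the paper's).

    features: {node: feature vector}
    edges:    list of directed (source, relation, target) triples
    w_rel:    {relation: weight matrix}
    w_self:   weight matrix for the self-loop term
    """
    # Group incoming neighbors by (target node, relation type).
    incoming = {}
    for src, rel, dst in edges:
        incoming.setdefault((dst, rel), []).append(src)

    updated = {}
    for node, h in features.items():
        acc = matvec(w_self, h)
        for rel, weights in w_rel.items():
            neighbors = incoming.get((node, rel), [])
            for src in neighbors:
                msg = matvec(weights, features[src])
                # Normalize by the number of incoming neighbors for this relation.
                acc = [a + m / len(neighbors) for a, m in zip(acc, msg)]
        updated[node] = relu(acc)
    return updated


# Hypothetical toy network: one regulator with two target genes.
features = {"TF": [1.0, 2.0], "G1": [0.5, 0.0], "G2": [0.0, 1.0]}
edges = [("TF", "activates", "G1"), ("TF", "inhibits", "G2")]
identity = [[1.0, 0.0], [0.0, 1.0]]
negated = [[-1.0, 0.0], [0.0, -1.0]]
w_rel = {"activates": identity, "inhibits": negated}

out = rgcn_layer(features, edges, w_rel, identity)
for gene, vec in out.items():
    print(gene, vec)
```

Because edges are directed and each relation type has its own weight matrix, an activating edge and an inhibiting edge from the same regulator push the target features in different directions, which is the property a direction-aware regulatory graph is meant to capture.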
FIGURE 2. Convolutional neural network structure and parameters.
Comparison of prediction models on pan-cancer data.
| Model | AUROC | AUPRC |
| --- | --- | --- |
| DR-GCN-CNN | 0.8365 | 0.7164 |
| NetML-SVM | 0.6122 | 0.4837 |
| NetSML | 0.6396 | 0.6331 |
FIGURE 3. Receiver operating characteristic curve and precision-recall curve of the convolutional neural network model in 10-fold cross-validation.
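The two metrics reported for the 10-fold cross-validation can be computed from predicted scores as sketched below. This is a minimal illustration under stated assumptions: AUROC is computed as the probability that a random positive case outranks a random negative one, AUPRC is approximated by average precision (one common estimator; the source does not say which was used), and the stratified fold assignment is a hypothetical choice, since the source does not describe how folds were built. The toy labels and scores are made up for the example.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive case.

    Tied scores keep their input order, so ties are broken arbitrarily.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps a similar class ratio."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    return folds


# Toy example (made-up scores): three positives, three negatives.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
print(f"AUROC = {toy_auroc:.4f}, AP = {toy_ap:.4f}")

# Fold sizes for the class counts in the case table (1,145 negatives, 634 positives).
tcga_labels = [0] * 1145 + [1] * 634
folds = stratified_folds(tcga_labels, 10)
fold_sizes = [len(fold) for fold in folds]
fold_positives = [sum(tcga_labels[i] for i in fold) for fold in folds]
print(fold_sizes, fold_positives)
```

In a full evaluation, the model would be trained on nine folds and scored on the held-out fold, with the metrics either averaged over folds or computed on the pooled held-out predictions.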