Sherry Bhalla (1, 2), Harpreet Kaur (3), Anjali Dhall (1), Gajendra P S Raghava (4).
Abstract
Metastatic Skin Cutaneous Melanoma (SKCM) has been associated with diminished survival and high mortality rates worldwide. Segregating metastatic melanoma from primary tumors is therefore crucial for choosing an optimal therapeutic strategy and prolonging patient survival. The SKCM mRNA, miRNA and methylation data from TCGA were comprehensively analysed to identify key genomic features that can segregate metastatic from primary tumors. Machine learning models were then developed using the selected features to distinguish the two tumor types. The Support Vector Classification with Weight (SVC-W) model developed using the expression of 17 mRNAs achieved an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.95 and an accuracy of 89.47% on an independent validation dataset. This study identifies the genes C7, MMP3, KRT14, LOC642587, CASP7 and S100A7 and the miRNAs hsa-mir-205 and hsa-mir-203b as key genomic features that may substantially contribute to the oncogenesis of melanoma. It also proposes the genes ESM1, NFATC3, C7orf4, CDK14, ZNF827 and ZSWIM7 as novel putative markers of cutaneous melanoma metastasis. The major prediction models and analysis modules for predicting metastatic and primary tumor samples of SKCM are available from a webserver, CancerSPP (http://webs.iiitd.edu.in/raghava/cancerspp/).
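The SVC-W model is described only as Support Vector Classification with a weight factor. One common way to weight an imbalanced two-class problem (it is what scikit-learn's `class_weight="balanced"` option computes) is inverse class frequency, w_c = n / (k · n_c). The sketch below assumes that scheme; the source does not say how the weights were chosen, and `balanced_class_weights` is a hypothetical helper. The class sizes in the usage line are the training totals implied by the first mRNA table (TP + FN = 290 metastatic, FP + TN = 81 primary).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * n_c).

    Assumption: SVC-W is taken to weight classes this way; the source only
    says the SVC uses a "weight factor".
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n_c) for c, n_c in counts.items()}


# Training-set class sizes implied by the first table: 290 metastatic, 81 primary.
labels = ["metastatic"] * 290 + ["primary"] * 81
weights = balanced_class_weights(labels)
print({c: round(w, 3) for c, w in weights.items()})  # → {'metastatic': 0.64, 'primary': 2.29}
```

With these weights each class contributes the same total weight (185.5) to the misclassification penalty, so the larger metastatic class cannot dominate the fit.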
Year: 2019 PMID: 31673075 PMCID: PMC6823463 DOI: 10.1038/s41598-019-52134-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The workflow of the study.
Performance measures of 17 mRNA expression-based features (selected by the SVC-L1 feature selection method) on the training and independent validation datasets for classifying metastatic versus primary tumor samples with various machine-learning algorithms (classifiers).
| Classifier | Dataset | TP | FP | TN | FN | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| ETrees | Training | 268 | 12 | 69 | 22 | 92.41 | 85.19 | 90.84 | 0.75 | 0.95 |
| | Validation | 67 | 5 | 16 | 7 | 90.54 | 76.19 | 87.37 | 0.65 | 0.94 |
| KNN | Training | 269 | 9 | 72 | 21 | 92.76 | 88.89 | 91.91 | 0.78 | 0.95 |
| | Validation | 66 | 5 | 16 | 8 | 89.19 | 76.19 | 86.32 | 0.62 | 0.93 |
| RF | Training | 260 | 8 | 73 | 30 | 89.66 | 90.12 | 89.76 | 0.74 | 0.96 |
| | Validation | 66 | 2 | 19 | 8 | 89.19 | 90.48 | 89.47 | 0.73 | 0.95 |
| LR | Training | 261 | 8 | 73 | 29 | 90 | 90.12 | 90.03 | 0.74 | 0.97 |
| | Validation | 65 | 2 | 19 | 9 | 87.84 | 90.48 | 88.42 | 0.71 | 0.95 |
| RC | Training | 262 | 9 | 72 | 28 | 90.34 | 88.89 | 90.03 | 0.74 | 0.96 |
| | Validation | 65 | 2 | 19 | 9 | 87.84 | 90.48 | 88.42 | 0.71 | 0.95 |
| SVC-W | Training | 269 | 8 | 73 | 21 | 92.76 | 90.12 | 92.18 | 0.79 | 0.97 |
| | Validation | 66 | 2 | 19 | 8 | 89.19 | 90.48 | 89.47 | 0.73 | 0.95 |
ETrees: Extra Trees Classifier; KNN: K-Nearest Neighbors Classifier; RF: Random Forest; LR: Logistic Regression; RC: Ridge Classifier; SVC-W: Support Vector Classification with weight factor; TP: True Positive; FP: False Positive; TN: True Negative; FN: False Negative; Sens: Sensitivity; Spec: Specificity; Acc: Accuracy; MCC: Matthews Correlation Coefficient; AUROC: Area Under the Receiver Operating Characteristic curve.
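The tabulated Sens, Spec, Acc and MCC follow directly from the four confusion counts, with metastatic samples as the positive class. The sketch below recomputes them; AUROC instead needs the classifier's continuous scores, so it is shown separately using the rank (Mann-Whitney) formulation. The function names are illustrative and not taken from the source.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return Sens, Spec, Acc (as percentages) and MCC from confusion counts."""
    sens = 100.0 * tp / (tp + fn)  # share of metastatic samples predicted metastatic
    spec = 100.0 * tn / (tn + fp)  # share of primary samples predicted primary
    acc = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, acc, mcc


def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# SVC-W validation row from the first table: TP=66, FP=2, TN=19, FN=8.
sens, spec, acc, mcc = confusion_metrics(66, 2, 19, 8)
print(f"{sens:.2f} {spec:.2f} {acc:.2f} {mcc:.2f}")  # → 89.19 90.48 89.47 0.73
```

The AUROC values in the tables cannot be recomputed this way from the counts alone, since they depend on per-sample prediction scores that are not reported here.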
Figure 2. The expression pattern of the 17 genes selected using SVC-L1.
Figure 3. 3D scatterplot (scatterplot3D) view of the t-SNE dimension reduction of the 17 selected features: (A) distribution of P1, P2, M1 and M2 samples; (B) distribution of P1, P2, M1 and M2 samples after removing 16 primary tumor samples that were annotated as distant metastatic in the NTE file.
Figure 4. The presence and absence of various features in the different gene signatures developed for segregating metastatic from primary samples.
Performance measures of 12 mRNA expression features (selected using the SVC-L1 feature selection method) for discriminating M1 from P1 samples on the training and independent validation datasets with various machine-learning algorithms.
| Classifier | Dataset | TP | FP | TN | FN | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| ETrees | Training | 170 | 10 | 71 | 7 | 96.05 | 87.65 | 93.41 | 0.85 | 0.96 |
| | Validation | 44 | 3 | 18 | 1 | 97.78 | 85.71 | 93.94 | 0.86 | 0.91 |
| KNN | Training | 175 | 10 | 71 | 2 | 98.87 | 87.65 | 95.35 | 0.89 | 0.95 |
| | Validation | 43 | 4 | 17 | 2 | 95.56 | 80.95 | 90.91 | 0.79 | 0.92 |
| RF | Training | 155 | 7 | 74 | 22 | 87.57 | 91.36 | 88.76 | 0.76 | 0.96 |
| | Validation | 37 | 2 | 19 | 8 | 82.22 | 90.48 | 84.85 | 0.69 | 0.93 |
| LR | Training | 174 | 8 | 73 | 3 | 98.31 | 90.12 | 95.74 | 0.9 | 0.98 |
| | Validation | 43 | 2 | 19 | 2 | 95.56 | 90.48 | 93.94 | 0.86 | 0.93 |
| RC | Training | 175 | 10 | 71 | 2 | 98.87 | 87.65 | 95.35 | 0.89 | 0.97 |
| | Validation | 44 | 3 | 18 | 1 | 97.78 | 85.71 | 93.94 | 0.86 | 0.95 |
| SVC-W | Training | 173 | 7 | 74 | 4 | 97.74 | 91.36 | 95.74 | 0.9 | 0.98 |
| | Validation | 43 | 2 | 19 | 2 | 95.56 | 90.48 | 93.94 | 0.86 | 0.94 |
Performance measures of 32 miRNA expression features (selected by the WEKA-FCBF feature selection method) on the training and independent validation datasets for classifying metastatic versus primary samples with various machine-learning algorithms.
| Classifier | Dataset | TP | FP | TN | FN | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| ETrees | Training | 250 | 14 | 62 | 28 | 89.93 | 81.58 | 88.14 | 0.67 | 0.92 |
| | Validation | 63 | 5 | 14 | 8 | 88.73 | 73.68 | 85.56 | 0.59 | 0.88 |
| KNN | Training | 234 | 12 | 64 | 44 | 84.17 | 84.21 | 84.18 | 0.61 | 0.91 |
| | Validation | 60 | 3 | 16 | 11 | 84.51 | 84.21 | 84.44 | 0.61 | 0.89 |
| RF | Training | 232 | 8 | 68 | 46 | 83.45 | 89.47 | 84.75 | 0.64 | 0.97 |
| | Validation | 63 | 3 | 16 | 8 | 88.73 | 84.21 | 87.78 | 0.67 | 0.95 |
| LR | Training | 241 | 14 | 62 | 37 | 86.69 | 81.58 | 85.59 | 0.62 | 0.93 |
| | Validation | 59 | 3 | 16 | 12 | 83.1 | 84.21 | 83.33 | 0.59 | 0.87 |
| RC | Training | 245 | 10 | 66 | 33 | 88.13 | 86.84 | 87.85 | 0.69 | 0.94 |
| | Validation | 62 | 4 | 15 | 9 | 87.32 | 78.95 | 85.56 | 0.61 | 0.89 |
| SVC-W | Training | 240 | 7 | 69 | 38 | 86.33 | 90.79 | 87.29 | 0.69 | 0.94 |
| | Validation | 64 | 4 | 15 | 7 | 90.14 | 78.95 | 87.78 | 0.66 | 0.89 |
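The 32-miRNA feature set above was chosen with WEKA's FCBF (Fast Correlation-Based Filter). As a rough illustration of how such a filter ranks and prunes features, the sketch below implements the published FCBF idea on already-discretized features: keep features whose symmetrical uncertainty (SU) with the class exceeds a threshold `delta`, then drop any feature that a more relevant kept feature predicts at least as well as it predicts the class. It is not WEKA's code; the discretization of continuous miRNA expression that WEKA performs beforehand is assumed to have happened already, and the feature names and toy values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); 0 = independent, 1 = fully predictive."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(x, y)))  # mutual information via joint entropy
    return 2.0 * info_gain / (hx + hy)


def fcbf(features, labels, delta=0.0):
    """Fast Correlation-Based Filter over already-discretized features.

    features: dict mapping feature name -> list of discrete values (one per sample).
    Returns the selected names, most class-relevant first.
    """
    relevance = {name: symmetrical_uncertainty(v, labels) for name, v in features.items()}
    # Step 1: keep features whose SU with the class exceeds delta, ranked by SU.
    ranked = sorted((name for name in features if relevance[name] > delta),
                    key=lambda name: relevance[name], reverse=True)
    # Step 2: drop a feature if an already kept, more relevant feature predicts it
    # at least as well as it predicts the class (i.e. it is redundant).
    selected = []
    for name in ranked:
        if all(symmetrical_uncertainty(features[kept], features[name]) < relevance[name]
               for kept in selected):
            selected.append(name)
    return selected


# Hypothetical discretized expression levels for 8 samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "mir_a": [0, 0, 0, 0, 1, 1, 1, 1],       # tracks the class exactly
    "mir_a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # redundant duplicate of mir_a
    "mir_noise": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the class
}
print(fcbf(features, labels))  # → ['mir_a']
```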
Performance measures of 5 miRNA expression features (selected by the SVC-L1 feature selection method) on the training and independent validation datasets for classifying metastatic versus primary samples with various machine-learning algorithms.
| Classifier | Dataset | TP | FP | TN | FN | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| ETrees | Training | 227 | 18 | 58 | 51 | 81.65 | 76.32 | 80.51 | 0.52 | 0.88 |
| | Validation | 59 | 3 | 16 | 12 | 83.1 | 84.21 | 83.33 | 0.59 | 0.86 |
| KNN | Training | 226 | 15 | 61 | 52 | 81.29 | 80.26 | 81.07 | 0.54 | 0.88 |
| | Validation | 57 | 3 | 16 | 14 | 80.28 | 84.21 | 81.11 | 0.56 | 0.84 |
| RF | Training | 229 | 18 | 58 | 49 | 82.37 | 76.32 | 81.07 | 0.52 | 0.9 |
| | Validation | 56 | 3 | 16 | 15 | 78.87 | 84.21 | 80 | 0.54 | 0.88 |
| LR | Training | 241 | 14 | 62 | 37 | 86.69 | 81.58 | 85.59 | 0.62 | 0.93 |
| | Validation | 59 | 3 | 16 | 12 | 83.1 | 84.21 | 83.33 | 0.59 | 0.87 |
| RC | Training | 245 | 14 | 62 | 33 | 88.13 | 81.58 | 86.72 | 0.65 | 0.93 |
| | Validation | 58 | 3 | 16 | 13 | 81.69 | 84.21 | 82.22 | 0.58 | 0.88 |
| SVC-W | Training | 231 | 12 | 64 | 47 | 83.09 | 84.21 | 83.33 | 0.60 | 0.93 |
| | Validation | 54 | 3 | 16 | 17 | 76.06 | 84.21 | 77.78 | 0.51 | 0.87 |
Performance measures of 38 gene-level average methylation features (selected using the WEKA-FCBF feature selection method) on the training and independent validation datasets for classifying metastatic versus primary samples with various machine-learning algorithms.
| Classifier | Dataset | TP | FP | TN | FN | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| ETrees | Training | 248 | 17 | 65 | 41 | 85.81 | 79.27 | 84.37 | 0.6 | 0.89 |
| | Validation | 65 | 8 | 13 | 9 | 87.84 | 61.9 | 82.11 | 0.49 | 0.87 |
| KNN | Training | 224 | 19 | 63 | 65 | 77.51 | 76.83 | 77.36 | 0.47 | 0.83 |
| | Validation | 54 | 6 | 15 | 20 | 72.97 | 71.43 | 72.63 | 0.38 | 0.82 |
| RF | Training | 255 | 23 | 59 | 34 | 88.24 | 71.95 | 84.64 | 0.58 | 0.92 |
| | Validation | 65 | 9 | 12 | 9 | 87.84 | 57.14 | 81.05 | 0.45 | 0.87 |
| LR | Training | 221 | 17 | 65 | 68 | 76.47 | 79.27 | 77.09 | 0.48 | 0.84 |
| | Validation | 58 | 6 | 15 | 16 | 78.38 | 71.43 | 76.84 | 0.44 | 0.85 |
| RC | Training | 239 | 17 | 65 | 50 | 82.7 | 79.27 | 81.94 | 0.56 | 0.88 |
| | Validation | 62 | 8 | 13 | 12 | 83.78 | 61.9 | 78.95 | 0.43 | 0.83 |
| SVC-W | Training | 221 | 19 | 63 | 68 | 76.47 | 76.83 | 76.55 | 0.46 | 0.82 |
| | Validation | 58 | 6 | 15 | 16 | 78.38 | 71.43 | 76.84 | 0.44 | 0.91 |
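The methylation features above are described as the average methylation of genes. A minimal sketch of one way to build such features, assuming a per-gene mean of probe-level beta values for a single sample, is shown below. The probe IDs, gene symbols and the probe-to-gene annotation are hypothetical, and the source does not state how probes were mapped to genes or how missing values were handled; here probes with no gene or no value are simply skipped.

```python
from collections import defaultdict
from statistics import mean


def gene_average_methylation(probe_betas, probe_to_gene):
    """Average probe-level beta values into one methylation value per gene.

    probe_betas: dict probe_id -> beta value in [0, 1], or None if missing.
    probe_to_gene: dict probe_id -> gene symbol (hypothetical annotation).
    Probes without a gene mapping or without a value are skipped.
    """
    per_gene = defaultdict(list)
    for probe, beta in probe_betas.items():
        gene = probe_to_gene.get(probe)
        if gene is not None and beta is not None:
            per_gene[gene].append(beta)
    return {gene: mean(values) for gene, values in per_gene.items()}


# Hypothetical probe IDs and beta values for one sample.
betas = {"cg0001": 0.80, "cg0002": 0.60, "cg0003": None, "cg0004": 0.10}
annotation = {"cg0001": "GENE_A", "cg0002": "GENE_A", "cg0003": "GENE_A", "cg0004": "GENE_B"}
print(gene_average_methylation(betas, annotation))
```

Applying this to every sample yields one row of gene-level features per sample, which a filter such as WEKA-FCBF can then reduce.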
Performance measures of the combined (ensemble) RNAseq, miRNAseq and methylation-seq features on the training and independent validation datasets for classifying metastatic versus primary samples with SVC.
| Dataset | TP | FP | TN | FN | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|
| Training | 244 | 5 | 71 | 34 | 87.77 | 93.42 | 88.98 | 0.73 | 0.97 |
| Validation | 60 | 1 | 18 | 10 | 85.71 | 94.74 | 87.64 | 0.71 | 0.93 |
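The source does not spell out how the three platforms were combined into ensemble features. A minimal sketch, assuming the selected features from each platform are simply concatenated per sample and only samples profiled on every platform are kept (an inner join), could look like this; `combine_platform_features` and the sample IDs are hypothetical. For reference, the combined table reports 354 training and 89 validation samples, fewer than the 371 and 95 in the mRNA tables, which is the kind of shrinkage such a join would produce.

```python
def combine_platform_features(*platforms):
    """Concatenate per-sample feature vectors from several data platforms.

    Each platform is a dict: sample_id -> list of feature values.
    Only samples present in every platform are kept (inner join), in sorted order.
    """
    shared = set(platforms[0])
    for table in platforms[1:]:
        shared &= set(table)
    return {s: [x for table in platforms for x in table[s]] for s in sorted(shared)}


# Hypothetical selected features per platform (IDs and values are illustrative).
mrna = {"S1": [1.0, 2.0], "S2": [3.0, 4.0], "S3": [5.0, 6.0]}
mirna = {"S1": [0.1], "S2": [0.2]}
meth = {"S1": [0.9], "S2": [0.8], "S4": [0.7]}
print(combine_platform_features(mrna, mirna, meth))
# → {'S1': [1.0, 2.0, 0.1, 0.9], 'S2': [3.0, 4.0, 0.2, 0.8]}
```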
Performance measures of 37 mRNA expression features (selected using the SVC-L1 feature selection method) for discriminating early-stage from late-stage primary tumors with leave-one-out cross-validation, using various machine-learning algorithms.
| Classifier | TP | FP | TN | FN | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|
| ETrees | 58 | 3 | 28 | 9 | 86.57 | 90.32 | 87.76 | 0.74 | 0.96 |
| KNN | 59 | 16 | 15 | 8 | 88.06 | 48.39 | 75.51 | 0.4 | 0.78 |
| RF | 64 | 5 | 26 | 3 | 95.52 | 83.87 | 91.84 | 0.81 | 0.96 |
| LR | 54 | 5 | 26 | 13 | 80.6 | 83.87 | 81.63 | 0.61 | 0.87 |
| RC | 53 | 8 | 23 | 14 | 79.1 | 74.19 | 77.55 | 0.51 | 0.83 |
| SVC-W | 62 | 9 | 22 | 5 | 92.54 | 70.97 | 85.71 | 0.66 | 0.88 |
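The early-versus-late stage comparisons were scored with leave-one-out cross-validation (LOOCV): each primary tumor is predicted by a model trained on all remaining samples, and the n predictions are pooled into one confusion matrix. The sketch below shows that loop. It uses a simple nearest-centroid classifier as a hypothetical stand-in (it is not one of the six classifiers in the tables), and the toy data and the choice of "late" as the positive class are illustrative only; the source does not say which stage was treated as positive.

```python
import math


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def loocv_confusion(X, y, positive):
    """Leave-one-out CV: each sample is predicted by a model trained on all the others.

    Returns (TP, FP, TN, FN) with `positive` as the positive class.
    """
    tp = fp = tn = fn = 0
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predicted_pos = predict_centroid(model, X[i]) == positive
        actual_pos = y[i] == positive
        if predicted_pos and actual_pos:
            tp += 1
        elif predicted_pos:
            fp += 1
        elif actual_pos:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Toy one-feature data (hypothetical); the early sample at 1.9 sits among the late ones.
X = [[0.0], [0.2], [0.4], [1.9], [2.0], [2.2], [2.4]]
y = ["early", "early", "early", "early", "late", "late", "late"]
print(loocv_confusion(X, y, positive="late"))  # → (3, 1, 3, 0)
```

Because every sample is held out exactly once, TP + FP + TN + FN always equals the number of samples (98 for the mRNA stage table, 91 for the miRNA one).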
Performance measures of 32 miRNA expression features (selected using the SVC-L1 feature selection method) for discriminating early-stage from late-stage primary tumors with leave-one-out cross-validation, using various machine-learning algorithms.
| Classifier | TP | FP | TN | FN | Sens (%) | Spec (%) | Acc (%) | MCC | AUROC |
|---|---|---|---|---|---|---|---|---|---|
| ETrees | 52 | 3 | 27 | 9 | 85.25 | 90 | 86.81 | 0.72 | 0.93 |
| KNN | 56 | 2 | 28 | 5 | 91.8 | 93.33 | 92.31 | 0.83 | 0.96 |
| RF | 48 | 3 | 27 | 13 | 78.69 | 90 | 82.42 | 0.65 | 0.92 |
| LR | 55 | 0 | 30 | 6 | 90.16 | 100 | 93.41 | 0.87 | 0.99 |
| RC | 60 | 2 | 28 | 1 | 98.36 | 93.33 | 96.7 | 0.93 | 0.99 |
| SVC-W | 59 | 0 | 30 | 2 | 96.72 | 100 | 97.8 | 0.95 | 0.99 |