
Bioinformatic and Machine Learning Applications in Melanoma Risk Assessment and Prognosis: A Literature Review.

Emily Z Ma, Karl M Hoegler, Albert E Zhou.

Abstract

Over 100,000 people are diagnosed with cutaneous melanoma each year in the United States. Despite recent advances in the treatment of metastatic melanoma, such as immunotherapy, there are still over 7000 melanoma-related deaths each year. Melanoma is a highly heterogeneous disease, and many of its underlying genetic drivers have been identified since the introduction of next-generation sequencing. Despite clinical staging guidelines, the prognosis of metastatic melanoma is variable and difficult to predict. Bioinformatic and machine learning analyses relying on genetic, clinical, and histopathologic inputs have been increasingly used to risk stratify melanoma patients with high accuracy. This literature review summarizes the key genetic drivers of melanoma and recent applications of bioinformatic and machine learning models in the risk stratification of melanoma patients. A robustly validated risk stratification tool could potentially guide physicians' management of melanoma patients and ultimately improve patient outcomes.

Keywords:  bioinformatics; deep learning; machine learning; melanoma; melanoma genomics

Year:  2021        PMID: 34828357      PMCID: PMC8621295          DOI: 10.3390/genes12111751

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


1. Introduction

Cutaneous melanoma is the most aggressive form of skin cancer and the fifth most common cancer in the United States [1]. The incidence of cutaneous melanoma has been rising over the past few decades, with over 100,000 new cases diagnosed in the United States each year [1]. Despite recent advances in the treatment of advanced melanoma, including targeted therapy (e.g., BRAF/MEK inhibitors) and immunotherapy (e.g., PD-1 inhibitors), there are over 7000 melanoma-related deaths each year in the United States, as most patients with advanced-stage melanoma experience recurrence after initial therapy [1,2,3]. The major risk factors for cutaneous melanoma are ultraviolet (UV) exposure and genetic susceptibility. UV-induced DNA damage and oxidative stress can cause the malignant transformation of melanocytes [4]. A family history of melanoma is a strong risk factor for the disease, which has contributed to the significant growth of melanoma genomics research over the past two decades [5]. The bioinformatic analysis of genomic data has been widely used to identify candidate genes and signaling pathways associated with melanoma pathogenesis and metastasis. More recently, bioinformatic analyses, including machine learning, have been increasingly used to predict prognosis, stratify risk, and ultimately inform personalized treatment in cutaneous melanoma. We conducted a literature review of PubMed and Google Scholar to provide an overview of bioinformatic and machine learning applications in melanoma prognostics and risk stratification. Given the large body of bioinformatics and machine learning studies in melanoma genomics and risk stratification, we summarize the currently established key drivers of melanoma whose discovery relied on bioinformatics. We also provide an overview of the key findings, algorithms, and predictive accuracy of recent studies applying bioinformatic and machine learning algorithms to melanoma risk stratification.

2. Bioinformatics in Melanoma Genomics

Melanoma is a heterogeneous disease with numerous genetic determinants. Bioinformatic tools have been widely used to help understand the genetic drivers of melanoma and to identify patient subgroups with specific genetic mutations, informing disease management and the development of therapies. Ras genes and CDKN2A were the earliest genes found to be mutated in melanoma, in the 1980s and 1990s (Figure 1) [6,7]. Ras genes are proto-oncogenes, frequently mutated in cancer, that encode a family of small G proteins, while CDKN2A encodes tumor suppressor proteins [8].
Figure 1

Key advances in melanoma genomic research. BI: bioinformatics, ML: machine learning.

In 2002, one of the first genomic studies identified mutations in BRAF, a regulator of cell survival, in 65% of malignant melanomas [9], which led to the development of BRAF inhibitors for BRAF-mutant metastatic melanoma [10,11]. The arrival of next-generation sequencing (NGS) in the early 2000s enabled profiling of the full melanoma genome [12]. Since then, whole-exome sequencing (WES) has characterized mutations in NF1, ARID2, PPP6C, RAC1, SNX31, TACC1, and STK19 related to melanoma development [13,14]. In 2015, The Cancer Genome Atlas (TCGA) Skin Cutaneous Melanoma project used WES to confirm previously identified melanoma mutations in BRAF, NRAS, CDKN2A, TP53, and PTEN [15]. TCGA also identified MAP2K1, IDH1, RB1, and DDX3X mutations in melanoma [15]. Figure 1 summarizes the key milestones in melanoma genomic research. Recent whole-genome analyses of melanoma have also identified distinct mutated genes in cutaneous, acral, and mucosal melanoma, and highlighted mutations in the TERT promoter [16]. The TERT gene encodes the catalytic subunit of telomerase, an enzyme complex that regulates telomere length [16]. Additional genomic changes include alterations in the c-KIT, c-MET, and EGF receptors and in the MAPK and PI3K signaling pathways, which are important for cell proliferation and survival [8]. The introduction of high-throughput analysis of biological information, particularly next-generation sequencing, has led to the rapid growth of genomic data [17]. As genomic databases grow, additional genetic regulators of melanoma formation and progression are expected to be characterized and may inform melanoma management.

3. Bioinformatics and Machine Learning in Melanoma Risk Assessment

Despite clinical staging guidelines, predicting the prognosis of melanoma is challenging because of its heterogeneous nature. Bioinformatic tools have been widely used to analyze NGS data and help identify potential mutations associated with melanoma pathogenesis [18]. More recently, bioinformatic analysis has been increasingly applied to melanoma risk stratification and prognosis prediction to inform treatment. Since the approval of systemic adjuvant therapies for stage III and stage IV melanoma, these therapies have become widely used following the resection of advanced melanoma. However, these systemic therapies are costly and are associated with frequent grade 3 or 4 adverse events [19,20,21,22,23]. The 2021 National Comprehensive Cancer Network (NCCN) guidelines do not recommend adjuvant therapy for stage I and II patients [24]. Yet patients with stage II melanoma have a 12% to 25% 10-year melanoma-specific mortality rate, and some stage II patients have worse survival than stage III patients [25,26]. Accurate prognostic tools that predict the probability of recurrence and survival are therefore needed to stratify risk, identify appropriate candidates for adjuvant treatment, and set the level of surveillance.

3.1. Gene-Expression Profiling

Gene expression profiling of stage IV melanomas identified molecular subtypes with unique gene signatures that correlated with different clinical outcomes [27]. This finding led to the development of a proprietary 31-gene expression profile (GEP) assay (Castle Biosciences) that classifies patients as having a high or low risk of metastasis within five years of melanoma diagnosis [28,29]. One goal of 31-GEP testing was to guide the intensity of treatment and follow-up for melanoma patients. The clinical utility and performance of 31-GEP have varied, and the test needs further validation in prospective studies [30]. Zager et al. analyzed 523 primary melanoma tumors using 31-GEP and reported that the assay identified 70% of stage I and II patients who ultimately developed distant metastasis [31]. Similarly, Gastman et al. found that 31-GEP accurately identified high-risk patients who were likely to recur or die of melanoma within low-risk subgroups (e.g., sentinel lymph node-negative disease, stage I and IIA) [32]. A meta-analysis reported that 31-GEP performance varied and that the assay was a better predictor of recurrence in stage II disease than in stage I [33]. Moreover, a separate study suggested that 31-GEP offers limited cost-benefit in stage IIIA melanoma because of its limited survival benefit in this patient subgroup [34]. Given the lack of clear evidence that 31-GEP improves outcomes in melanoma, a validated prognostic tool is still needed to accurately identify high-risk patients.
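The 70% figure reported by Zager et al. is a sensitivity: the fraction of patients who developed distant metastasis that the assay had flagged as high risk. A minimal sketch, in Python, of how sensitivity and specificity follow from a 2 × 2 table of test result against outcome; the class and function names are illustrative, and the counts in the usage lines are hypothetical and do not come from any cited study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """2 x 2 counts of a binary risk test against the observed outcome."""
    true_pos: int   # classified high risk, developed metastasis
    false_neg: int  # classified low risk, developed metastasis
    true_neg: int   # classified low risk, no metastasis
    false_pos: int  # classified high risk, no metastasis


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of patients with the outcome that the test flagged as high risk."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of patients without the outcome that the test called low risk."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Hypothetical counts, chosen only to illustrate a 70% sensitivity.
counts = ConfusionCounts(true_pos=35, false_neg=15, true_neg=300, false_pos=150)
print(f"sensitivity = {sensitivity(counts):.2f}")  # prints sensitivity = 0.70
print(f"specificity = {specificity(counts):.2f}")  # prints specificity = 0.67
```

Sensitivity alone says nothing about how many patients without metastasis are flagged unnecessarily, which is why specificity (or the full ROC curve) is usually reported alongside it.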

3.2. Current Bioinformatics in Melanoma Risk Assessment

Bioinformatic analyses of genes and biomarkers have been used not only to identify genes associated with melanoma survival and mortality, but also to predict melanoma metastasis and prognosis (summarized in Table 1).
Table 1

Summary of major studies in bioinformatic and machine learning risk stratification of melanoma.

Publication | Methods | Key finding(s) | Performance | Data
Arora et al. 2020 [39] | Multiple machine learning algorithms (e.g., SVM, decision tree, random forest) | Machine learning model based on clinicopathologic variables outperformed models based on GEP or AJCC staging in predicting OS | – | RNA expression data of cutaneous melanomas (CMs) (n = 458) from TCGA
Bellomo et al. 2020 [40] | Machine learning logistic regression model | Epithelial-to-mesenchymal transition and melanosome function genes were associated with SLN metastasis; a model combining clinicopathologic and gene expression variables predicted SLN metastasis better than models with clinicopathologic or gene expression variables alone | AUROC: 0.82 (clinicopathologic and gene expression model) | Gene expression data of primary CMs (n = 754)
Brinker et al. 2021 [41] | Artificial neural network (ANN) | ANNs trained with H&E images not matched to SLN status had an AUROC of 62% and may not be clinically relevant for predicting SLN status | AUROC: 61.8% (matched), 55.0% (unmatched) | H&E slides of primary melanomas with positive SLN (n = 291)
Cheng et al. 2015 [42] | Multivariate Cox regression analysis | BRAF and MMP2 were prognostic biomarkers for stage I/II, while p27 was a biomarker for stage III/IV | – | Primary (n = 148) and metastatic (n = 106) CMs
Farrow et al. 2021 [43] | Multivariate Cox regression analysis | 12 genes predicted RFS; increased TIGIT expression and decreased CXCL16 expression correlated with improved RFS | – | RNA samples from SLN biopsies (n = 62)
Garg et al. 2021 [44] | Random forest classifier | Machine learning models trained with 121 metastasis-associated genes predicted regional lymph node metastasis better than models trained with clinical covariates or published prognostic signatures | P (AUROC): 7.03 × 10⁻⁴ (combined model) | RNA data of primary CMs (n = 204)
Huang et al. 2021 [45] | Gradient-boosted decision trees (XGBoost) | 5-methylcytosine (m5C) signatures were used to predict CM prognosis; NSUN6 may be a marker of CM progression | – | Transcriptomic data of CMs (n = 4761) from TCGA
Jiang et al. 2021 [36] | GO and KEGG enrichment analysis, PPI network analysis | Identified 435 DEGs; FOXM1, EXO1, KIF20A, TPX2, and CDC20 were associated with reduced OS | – | Gene expression data of CMs from UCSC Xena (n = 322) and GEO (n = 45)
Johannet et al. 2021 [46] | Deep convolutional neural network (DCNN) | Machine learning algorithm trained with histology and clinicodemographic variables predicted immunotherapy response (PFS) in advanced melanoma patients with an AUC of 0.800 | AUC: 0.800 | Advanced melanoma patients (n = 121)
Jönsson et al. 2010 [27] | Unsupervised hierarchical clustering, two-group significance analysis of microarrays (SAM), support tree analysis | Four distinct subtypes with unique gene signatures were associated with different prognoses | – | Global gene expression data of stage IV CMs (n = 57)
Lee et al. 2019 [47] | Multivariate Cox regression analysis | Pre-operative ctDNA predicted melanoma-specific survival in stage III melanoma | – | Pre-operative ctDNA from stage III CM patients (n = 174)
Mancuso et al. 2021 [48] | Multiple machine learning algorithms (e.g., logistic regression, SVM, decision tree, Gaussian naïve Bayes classifier) | Machine learning algorithm classified early-stage melanoma patients as having a high or low risk of metastasis; selected serum markers (e.g., IL-4, GM-CSF, DCD) and Breslow thickness best predicted metastasis | Accuracy: 80% (Breslow thickness and serum markers model) | Stage I and II melanoma patients (n = 323)
Segura et al. 2010 [38] | SAM, KEGG enrichment analysis | 18 overexpressed miRNAs were significantly correlated with longer post-recurrence survival | Accuracy: 80.2% | Total RNA of metastatic CMs (n = 59)
Sheng et al. 2020 [35] | GO and KEGG enrichment analysis, PPI network analysis | Identified 258 DEGs as potential biomarkers of metastasis | – | Gene expression data of primary (n = 109) and metastatic (n = 136) CMs from GEO
Shepelin et al. 2016 [49] | Multiple machine learning algorithms (e.g., SVM, random forest) | Identified 44 characteristic signaling pathways associated with melanoma metastasis | Accuracy: 94% (SVM classifier) | Transcriptomic data of primary and metastatic CMs (n = 478) from GEO
Wang et al. 2020 [37] | GO enrichment analysis, PPI network analysis | CD38 level was a diagnostic factor for CM; high CD38 expression correlated with longer OS | – | Gene expression data of CD38-positive CMs from TCGA
Wei et al. 2018 [50] | KEGG and GO enrichment analysis, PPI network analysis, SVM classifier | An SVM predictor of melanoma metastasis had greater than 94% prediction accuracy; 798 DEGs were identified | Accuracy: 94.4% to 100% | Gene expression data of primary (n = 116) and metastatic (n = 296) CMs from GEO and TCGA
Wong et al. 2005 [51] | Nomogram | A nomogram using clinicopathologic information accurately predicted the probability of a positive SLN in melanoma | Accuracy: 69.4% | SLN biopsies (n = 979)
Yang et al. 2018 [52] | Two-way hierarchical clustering analysis, SVM classifier, random forest classifier | An SVM classifier based on a 6-lncRNA signature risk-stratified patients with 85% accuracy | Accuracy: 84.84% (two-way hierarchical clustering), 85.9% (SVM classifier) | lncRNA data of primary CMs (n = 376) from TCGA
Zormpas-Petridis et al. 2019 [53] | Spatially constrained convolutional neural network (SC-CNN) | A novel multi-resolution hierarchical framework (SuperCRF) predicted survival based on histology features; SuperCRF achieved a 12% improvement in accuracy over state-of-the-art SC-CNN cell classifiers | Accuracy: 84.63% | Melanoma H&E slides (n = 151)

Abbreviations: AJCC: American Joint Committee on Cancer; AUC: area under the curve; AUROC: area under the receiver operating characteristic curve; DEG: differentially expressed gene; GO: gene ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; OS: overall survival; PFS: progression-free survival; RFS: recurrence-free survival; SLN: sentinel lymph node; SVM: support vector machine; TCGA: The Cancer Genome Atlas.

Several recent studies constructed protein-protein interaction (PPI) networks to identify hub genes in melanoma. Sheng et al. constructed a PPI network to analyze differentially expressed genes (DEGs) from the Gene Expression Omnibus (GEO) database [35]. The study identified DSG3, DSC3, PKP1, EVPL, IVL, FLG, SPRR1A, and SPRR1B as potential biomarkers that predict metastasis of cutaneous melanoma [35]. Another study constructed a PPI network from melanoma gene expression data from UCSC Xena and GEO and identified FOXM1, EXO1, KIF20A, TPX2, and CDC20 as genes associated with reduced overall survival [36]. Wang et al. reported that CD38 expression could serve as a diagnostic marker for melanoma and that higher CD38 expression was associated with better survival probabilities than lower expression [37]. An analysis of miRNA expression in 59 melanoma metastases identified 18 overexpressed miRNA signatures that correlated with longer post-recurrence survival [38]. Furthermore, the study identified six miRNA signatures that predicted survival in stage III patients independently of American Joint Committee on Cancer (AJCC) staging [38]. Because sentinel lymph nodes (SLNs) regulate anti-tumor immune responses, Farrow et al. hypothesized that SLN gene expression could predict recurrence risk in melanoma [43]. Immune-related genes from SLN biopsies were used to build a multivariate regression model predicting recurrence-free survival (RFS) [43]. Twelve genes, including the immune checkpoint TIGIT, accurately predicted RFS and could therefore potentially inform patient selection for adjuvant therapy [43]. Several other prognostic biomarkers were identified with Cox regression analyses, including pre-operative circulating tumor DNA (ctDNA), which has the potential to further enrich the stage IIIA population for high-risk adjuvant therapy candidates [42,47].
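Hub genes in these PPI studies are the most highly connected nodes of the network. A minimal sketch, assuming an undirected network supplied as an edge list and using node degree as the hub score; the cited studies may have used other centrality measures or dedicated tools, and the function name, gene names, and interactions below are hypothetical placeholders, not data from any study.

```python
from collections import Counter
from typing import Iterable


def degree_hubs(edges: Iterable[tuple[str, str]], top_n: int) -> list[tuple[str, int]]:
    """Rank genes by degree in an undirected PPI network and return the top_n hubs."""
    degree: Counter[str] = Counter()
    seen: set[frozenset[str]] = set()
    for a, b in edges:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue  # ignore self-loops and duplicate interactions
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by gene name so ties are deterministic.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Placeholder interactions between hypothetical genes (GENE_B-GENE_A is a duplicate).
toy_edges = [
    ("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"), ("GENE_D", "GENE_E"), ("GENE_B", "GENE_A"),
]
print(degree_hubs(toy_edges, top_n=2))  # [('GENE_A', 3), ('GENE_B', 2)]
```

Degree is the simplest hub criterion; betweenness or closeness centrality can rank the same network differently, so the choice of measure should be reported with any hub-gene list.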
A logistic regression analysis was used to create a nomogram that predicts the probability of a positive SLN in melanoma from tumor and patient characteristics, such as tumor thickness, Clark level, ulceration, tumor site, and patient sex and age [51]. The nomogram predicted SLN metastasis more accurately than the AJCC staging system and has been externally validated at three separate institutions [54,55,56].
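A nomogram is a graphical form of a fitted logistic regression model: each predictor contributes points proportional to its coefficient term, and the point total maps to a probability through the logistic function, p = 1 / (1 + exp(−(β₀ + Σ βᵢxᵢ))). A minimal sketch in Python; the function name, predictor encoding, and every coefficient below are hypothetical placeholders, not the values fitted by Wong et al.

```python
import math


def logistic_probability(intercept: float, coefs: dict[str, float],
                         patient: dict[str, float]) -> float:
    """Convert a logistic regression linear predictor into a probability."""
    linear = intercept + sum(coefs[name] * patient[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients (log-odds per unit); NOT from the published nomogram.
HYPOTHETICAL_COEFS = {
    "thickness_mm": 0.40,   # tumor thickness in millimeters
    "ulceration": 0.60,     # 1 if ulcerated, else 0
    "age_decades": -0.15,   # patient age divided by 10
}
HYPOTHETICAL_INTERCEPT = -2.5

patient = {"thickness_mm": 2.0, "ulceration": 1.0, "age_decades": 6.0}
p = logistic_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, patient)
# Linear predictor is about -2.0, so this prints 0.12.
print(f"estimated probability of a positive SLN: {p:.2f}")
```

The points on a printed nomogram are a linear rescaling of each βᵢxᵢ term, so reading the chart and evaluating this function give the same probability.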

3.3. Machine Learning in Melanoma Risk Assessment

Machine learning is the application of computer algorithms that aim to optimize their predictive accuracy [57,58]. Machine learning algorithms are based on pattern recognition and are designed to improve their behavior based on data or experience, without additional human intervention. These algorithms can be powerful tools to assist humans in the analysis of large, heterogeneous datasets, such as genomic datasets. Machine learning research in dermatology has primarily focused on developing image recognition tools for the binary classification of malignant melanoma [59]. Recently, a growing number of machine learning studies have aimed to stratify risk and predict prognosis in melanoma, with several models outperforming currently available risk classification tools (summarized in Table 1). The studies we reviewed employed various machine learning algorithms, with neural networks, support vector machines, and random forest classifiers being the most commonly used. Several studies achieved an AUROC over 0.8 or an accuracy greater than 80%, although there was no clear association between the algorithm used and the accuracy achieved. We do not compare the predictive abilities of these studies, as the models aimed to predict different outcomes. Gene expression datasets from GEO and TCGA were used to construct a PPI network that identified 798 genes associated with melanoma metastasis [50]. These genes were used as variables in a support vector machine (SVM) classifier with a metastasis prediction accuracy ranging from 94.4% to 100% [50]. A separate study used gene expression data from 754 thin- and intermediate-thickness primary cutaneous melanomas to train logistic regression models that predict the presence of SLN metastases from molecular, clinical, and histologic variables.
The study found that models using clinicopathologic or gene expression variables alone were outperformed by a model that combined molecular variables with clinicopathologic predictors (i.e., Breslow thickness and patient age) [40]. Arora et al. also incorporated clinicopathologic variables into their machine learning models and found that models using clinicopathologic features (e.g., Breslow thickness, N stage, M stage, ulceration status) outperformed GEP-based models and AJCC staging in predicting melanoma prognosis [39]. Several studies have used machine learning to analyze large RNA datasets and identify correlations with melanoma prognosis with high accuracy. Yang et al. used multiple machine learning algorithms to analyze melanoma samples from TCGA. The study hypothesized that six long non-coding RNA (lncRNA) signatures may regulate the MAPK, immune and inflammation-related, neurotrophin signaling, and focal adhesion pathways [52]. The six lncRNA signatures were identified and used in a machine learning classifier that risk-stratified melanoma patients with 85% accuracy [52]. A separate study of transcriptomic data from 478 primary and metastatic melanoma, nevus, and normal skin samples identified six novel associations between the activation of metabolic molecular signaling pathways and melanoma progression [49]. A differential expression analysis of 205 RNA-sequenced primary melanomas revealed 121 metastasis-associated gene signatures, which were then used to train machine learning classification models. These models predicted the likelihood of metastasis better than models trained with clinical covariates or published prognostic signatures [44]. Huang et al. analyzed RNA transcriptome data from cutaneous melanoma and found 16 m5C-related proteins (e.g., USUN6, NSUN6) that were also predictors of melanoma prognosis [45]. Mancuso et al. analyzed levels of selected serum markers with machine learning to classify stage I and II melanoma patients as having a high or low risk of metastasis. The study found that IL-4, GM-CSF, and DCD, together with Breslow thickness, best predicted melanoma metastasis [48]. Johannet et al. applied deep learning to histology specimens together with clinicodemographic variables to predict low versus high risk of progression after immunotherapy in advanced melanoma [46]. A separate computational pathology-based cell classification algorithm demonstrated that a high ratio of stromal lymphocytes to all lymphocytes and a high ratio of stromal cells to all cells correlated with poor survival in melanoma [53]. Histology slides from primary melanoma tumors with known SLN status were used to train a machine learning model to predict SLN status, although the model achieved an AUROC of only about 61% and was not considered clinically relevant [41].
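Because Table 1 mixes AUROC and accuracy, it helps to see how the two metrics differ. AUROC equals the probability that a randomly chosen patient with the outcome receives a higher risk score than a randomly chosen patient without it (the Mann-Whitney formulation), whereas accuracy depends on a chosen score threshold. A minimal Python sketch; the function names are illustrative and the scores and labels are made up.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as P(score_pos > score_neg), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores: list[float], labels: list[int], threshold: float) -> float:
    """Fraction of patients whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Made-up risk scores (higher = higher predicted risk) and outcomes (1 = event).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))          # 8 of 9 pairs ranked correctly, about 0.889
print(accuracy(scores, labels, 0.5))  # 4 of 6 correct at this cutoff, about 0.667
```

The same scores give an AUROC of about 0.89 but an accuracy of about 0.67 at a 0.5 cutoff, which illustrates why AUROC and accuracy values in Table 1 are not directly comparable.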

4. Conclusions

Cutaneous melanoma is a genetically heterogeneous disease with many patient subgroups associated with different outcomes. There are currently no well-validated, widely used melanoma risk stratification tools. Bioinformatic analyses, particularly machine learning models, have accurately risk stratified melanoma patients in internal validation. However, these tools will need external validation before they can have clinical utility. Bioinformatic and machine learning analyses are growing rapidly in the field of melanoma, and we anticipate that continued research on melanoma risk stratification tools may change future patient management and outcomes.
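Internal validation usually means estimating performance on held-out portions of the development cohort, for example with k-fold cross-validation, whereas external validation tests the frozen model on patients from a different institution or dataset. A minimal sketch of k-fold index splitting in Python, assuming patients are shuffled once with a fixed seed; the function name is hypothetical, and the sketch is illustrative only, not the procedure of any specific cited study.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Split sample indices into k (train, test) pairs; every index is tested once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set is every index from the other k - 1 folds.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, sorted(test)))
    return splits


for train, test in k_fold_indices(n_samples=10, k=5):
    print(len(train), len(test))  # 8 training and 2 test indices per fold
```

Because every test fold comes from the same cohort as its training folds, cross-validated performance can still overstate performance on patients from other centers, which is the gap external validation addresses.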
References

1.  Germline p16 mutations in familial melanoma.

Authors:  C J Hussussian; J P Struewing; A M Goldstein; P A Higgins; D S Ally; M D Sheahan; W H Clark; M A Tucker; N C Dracopoli
Journal:  Nat Genet       Date:  1994-09       Impact factor: 38.330

2.  Maximizing the clinical usefulness of a nomogram to select patients candidate to sentinel node biopsy for cutaneous melanoma.

Authors:  S Pasquali; S Mocellin; L G Campana; A Vecchiato; E Bonandini; M C Montesco; S Santarcangelo; G Zavagno; D Nitti; C R Rossi
Journal:  Eur J Surg Oncol       Date:  2011-06-16       Impact factor: 4.424

3.  A nomogram that predicts the presence of sentinel node metastasis in melanoma with better discrimination than the American Joint Committee on Cancer staging system.

Authors:  Sandra L Wong; Michael W Kattan; Kelly M McMasters; Daniel G Coit
Journal:  Ann Surg Oncol       Date:  2005-03-14       Impact factor: 5.344

4.  Photocarcinogenesis: UVA vs. UVB radiation.

Authors:  Frank R de Gruijl
Journal:  Skin Pharmacol Appl Skin Physiol       Date:  2002 Sep-Oct

5.  Deep learning approach to predict sentinel lymph node status directly from routine histology of primary melanoma tumours.

Authors:  Titus J Brinker; Lennard Kiehl; Max Schmitt; Tanja B Jutzi; Eva I Krieghoff-Henning; Dieter Krahl; Heinz Kutzner; Patrick Gholam; Sebastian Haferkamp; Joachim Klode; Dirk Schadendorf; Achim Hekler; Stefan Fröhling; Jakob N Kather; Sarah Haggenmüller; Christof von Kalle; Markus Heppt; Franz Hilke; Kamran Ghoreschi; Markus Tiemann; Ulrike Wehkamp; Axel Hauschild; Michael Weichenthal; Jochen S Utikal
Journal:  Eur J Cancer       Date:  2021-07-20       Impact factor: 9.162

6.  Systematic review of machine learning for diagnosis and prognosis in dermatology.

Authors:  Kenneth Thomsen; Lars Iversen; Therese Louise Titlestad; Ole Winther
Journal:  J Dermatolog Treat       Date:  2019-10-31       Impact factor: 3.359

7.  Stage-specific prognostic biomarkers in melanoma.

Authors:  Yabin Cheng; Jing Lu; Guangdi Chen; Gholamreza Safaee Ardekani; Anand Rotte; Magdalena Martinka; Xuezhu Xu; Kevin J McElwee; Guohong Zhang; Youwen Zhou
Journal:  Oncotarget       Date:  2015-02-28

8.  Historical perspective, development and applications of next-generation sequencing in plant virology.

Authors:  Marina Barba; Henryk Czosnek; Ahmed Hadidi
Journal:  Viruses       Date:  2014-01-06       Impact factor: 5.048

9.  Molecular pathway activation features linked with transition from normal skin to primary and metastatic melanomas in human.

Authors:  Denis Shepelin; Mikhail Korzinkin; Anna Vanyushina; Alexander Aliper; Nicolas Borisov; Raif Vasilov; Nikolay Zhukov; Dmitry Sokov; Vladimir Prassolov; Nurshat Gaifullin; Alex Zhavoronkov; Bhupinder Bhullar; Anton Buzdin
Journal:  Oncotarget       Date:  2016-01-05

10.  m5C-Related Signatures for Predicting Prognosis in Cutaneous Melanoma with Machine Learning.

Authors:  Maoxin Huang; Yi Zhang; Xiaohong Ou; Caiyun Wang; Xueqing Wang; Bibo Qin; Qiong Zhang; Jie Yu; Jianxiang Zhang; Jianbin Yu
Journal:  J Oncol       Date:  2021-08-04       Impact factor: 4.375
