Efficient learning from big data for cancer risk modeling: A case study with melanoma.

Aaron N Richter, Taghi M Khoshgoftaar.

Abstract

BACKGROUND: Building cancer risk models from real-world data requires overcoming challenges in data preprocessing, efficient representation, and computational performance. We present a case study of a cloud-based approach to learning from de-identified electronic health record data and demonstrate its effectiveness for melanoma risk prediction.
METHODS: We used a hybrid distributed and non-distributed approach to computing in the cloud: distributed processing with Apache Spark for data preprocessing and labeling, and non-distributed processing for machine learning model training with scikit-learn. Moreover, we explored the effects of sampling the training dataset to improve computational performance. Risk factors were evaluated using regression weights as well as tree SHAP values.
RESULTS: Among 4,061,172 patients who did not have melanoma through the 2016 calendar year, 10,129 were diagnosed with melanoma within one year. A gradient-boosted classifier achieved the best predictive performance with cross-validation (AUC = 0.799, sensitivity = 0.753, specificity = 0.688). Compared to a model built on the original data, a model built on a dataset two orders of magnitude smaller could achieve statistically similar or better performance with less than 1% of the training time and cost.
CONCLUSIONS: We produced a model that can effectively predict melanoma risk for a diverse dermatology population in the U.S. by using hybrid computing infrastructure and data sampling. For this de-identified clinical dataset, sampling approaches significantly shortened the time for model building while retaining predictive accuracy, allowing for more rapid machine learning model experimentation on familiar computing machinery. A large number of risk factors (>300) were required to produce the best model.
Copyright © 2019 Elsevier Ltd. All rights reserved.
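
The sampling and evaluation ideas in the abstract can be illustrated with a minimal sketch. The abstract does not say which sampling scheme the authors used, so random undersampling of the majority class is assumed here purely for illustration. The function names, the `neg_per_pos` parameter, and the toy labels are all hypothetical. Only the two patient counts come from the abstract.

```python
import random


def undersample_majority(labels, neg_per_pos=1.0, seed=0):
    """Keep every positive and a random subset of negatives.

    Assumption: this is one common way to shrink an imbalanced training
    set; it is not confirmed to be the scheme used in the study.
    Returns sorted row indices into `labels`.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), round(neg_per_pos * len(pos)))
    return sorted(pos + rng.sample(neg, n_keep))


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# One-year event rate implied by the counts reported in the abstract.
event_rate = 10_129 / 4_061_172

# Toy data (hypothetical): 10 positives among 1,000 rows.
labels = [1] * 10 + [0] * 990
idx = undersample_majority(labels, neg_per_pos=5.0, seed=42)

# Toy predictions (hypothetical) to exercise the metric helper.
sens, spec = sensitivity_specificity(
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 0],
)
```

With an event rate of roughly 0.25%, keeping all positives and a small multiple of negatives shrinks the training set sharply. Any model trained this way should still be evaluated on data with the original class balance, since undersampling changes the apparent prevalence.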

Keywords:  Big data; Cloud computing; Early detection of cancer; Electronic health records; Machine learning

Year:  2019        PMID: 31112896     DOI: 10.1016/j.compbiomed.2019.04.039

Source DB:  PubMed          Journal:  Comput Biol Med        ISSN: 0010-4825            Impact factor:   4.589


  6 in total

1.  Ability to Predict Melanoma Within 5 Years Using Registry Data and a Convolutional Neural Network: A Proof of Concept Study.

Authors:  Martin Gillstedt; Sam Polesie
Journal:  Acta Derm Venereol       Date:  2022-07-13       Impact factor: 3.875

2.  Predictive Analytics for Glaucoma Using Data From the All of Us Research Program.

Authors:  Sally L Baxter; Bharanidharan Radha Saseendrakumar; Paulina Paul; Jihoon Kim; Luca Bonomi; Tsung-Ting Kuo; Roxana Loperena; Francis Ratsimbazafy; Eric Boerwinkle; Mine Cicek; Cheryl R Clark; Elizabeth Cohn; Kelly Gebo; Kelsey Mayo; Stephen Mockrin; Sheri D Schully; Andrea Ramirez; Lucila Ohno-Machado
Journal:  Am J Ophthalmol       Date:  2021-01-23       Impact factor: 5.488

Review 3.  Machine Learning in Dermatology: Current Applications, Opportunities, and Limitations.

Authors:  Stephanie Chan; Vidhatha Reddy; Bridget Myers; Quinn Thibodeaux; Nicholas Brownstone; Wilson Liao
Journal:  Dermatol Ther (Heidelb)       Date:  2020-04-06

Review 4.  Requirements of Health Data Management Systems for Biomedical Care and Research: Scoping Review.

Authors:  Leila Ismail; Huned Materwala; Achim P Karduck; Abdu Adem
Journal:  J Med Internet Res       Date:  2020-07-07       Impact factor: 5.428

5.  Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data.

Authors:  Y-H Taguchi; Turki Turki
Journal:  Genes (Basel)       Date:  2020-12-11       Impact factor: 4.096

6.  DenseNet-II: an improved deep convolutional neural network for melanoma cancer detection.

Authors:  Aparna Sinha; Shivang Gupta; Nancy Girdhar
Journal:  Soft comput       Date:  2022-08-24       Impact factor: 3.732
