Efficient learning from big data for cancer risk modeling: A case study with melanoma.

Aaron N Richter, Taghi M Khoshgoftaar.

Abstract

BACKGROUND: Building cancer risk models from real-world data requires overcoming challenges in data preprocessing, efficient representation, and computational performance. We present a case study of a cloud-based approach to learning from de-identified electronic health record data and demonstrate its effectiveness for melanoma risk prediction.
METHODS: We used a hybrid distributed and non-distributed approach to computing in the cloud: distributed processing with Apache Spark for data preprocessing and labeling, and non-distributed processing for machine learning model training with scikit-learn. Moreover, we explored the effects of sampling the training dataset to improve computational performance. Risk factors were evaluated using regression weights as well as tree SHAP values.
RESULTS: Among 4,061,172 patients who did not have melanoma through the 2016 calendar year, 10,129 were diagnosed with melanoma within one year. A gradient-boosted classifier achieved the best predictive performance with cross-validation (AUC = 0.799, Sensitivity = 0.753, Specificity = 0.688). Compared to a model built on the original data, a dataset two orders of magnitude smaller could achieve statistically similar or better performance with less than 1% of the training time and cost.
CONCLUSIONS: We produced a model that can effectively predict melanoma risk for a diverse dermatology population in the U.S. by using hybrid computing infrastructure and data sampling. For this de-identified clinical dataset, sampling approaches significantly shortened the time for model building while retaining predictive accuracy, allowing for more rapid machine learning model experimentation on familiar computing machinery. A large number of risk factors (>300) were required to produce the best model.
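The sampling and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which sampling scheme was used, so the class-stratified random subsample below is an assumption, as are the function names `stratified_sample` and `sensitivity_specificity` and the toy labels. It only shows how a dataset might be shrunk by two orders of magnitude while keeping the class ratio, and how sensitivity and specificity are computed from binary predictions.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return sorted row indices for a random sample that keeps roughly
    `fraction` of each class (at least one row per class).

    Assumption: stratified subsampling; the abstract does not specify
    the sampling scheme actually used.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity (true positive rate) and specificity
    (true negative rate) for binary labels coded as 0/1.

    Assumes both classes occur in y_true; otherwise a division by
    zero is raised.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (hypothetical, not the study data): 10,000 rows, 1% positive.
labels = [1] * 100 + [0] * 9900

# Keep 1% of each class, i.e. a sample two orders of magnitude smaller.
sample_idx = stratified_sample(labels, fraction=0.01)

# Toy predictions to exercise the metric function.
sens, spec = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0])
```

In a workflow like the one described, the returned indices would select rows of the preprocessed feature matrix before fitting a classifier on a single machine, and the metrics would be computed on held-out folds.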

Keywords: Big data; Cloud computing; Early detection of cancer; Electronic health records; Machine learning

Year: 2019 PMID: 31112896 DOI: 10.1016/j.compbiomed.2019.04.039

Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589

Cited by (6 in total)

1. Ability to Predict Melanoma Within 5 Years Using Registry Data and a Convolutional Neural Network: A Proof of Concept Study.

Authors: Martin Gillstedt; Sam Polesie
Journal: Acta Derm Venereol Date: 2022-07-13 Impact factor: 3.875

2. Predictive Analytics for Glaucoma Using Data From the All of Us Research Program.

Authors: Sally L Baxter; Bharanidharan Radha Saseendrakumar; Paulina Paul; Jihoon Kim; Luca Bonomi; Tsung-Ting Kuo; Roxana Loperena; Francis Ratsimbazafy; Eric Boerwinkle; Mine Cicek; Cheryl R Clark; Elizabeth Cohn; Kelly Gebo; Kelsey Mayo; Stephen Mockrin; Sheri D Schully; Andrea Ramirez; Lucila Ohno-Machado
Journal: Am J Ophthalmol Date: 2021-01-23 Impact factor: 5.488

3. Machine Learning in Dermatology: Current Applications, Opportunities, and Limitations. (Review)

4. Requirements of Health Data Management Systems for Biomedical Care and Research: Scoping Review. (Review)

5. Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data.

6. DenseNet-II: an improved deep convolutional neural network for melanoma cancer detection.