
Performance and efficiency of machine learning algorithms for analyzing rectangular biomedical data.

Fei Deng, Jibing Huang, Xiaoling Yuan, Chao Cheng, Lanjing Zhang.

Abstract

Most biomedical datasets, including those from 'omics, population studies, and surveys, are rectangular in shape and have few missing data. Their sample sizes have recently grown substantially, and rigorous analysis of such large datasets demands more efficient and more accurate algorithms. Machine learning (ML) algorithms, including random forests (RF), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM), have been used to classify outcomes in biomedical datasets. However, their performance and efficiency in classifying multi-category outcomes of rectangular data are poorly understood. We therefore compared the performance and efficiency of these 4 ML algorithms. As an example, we created a large rectangular dataset from the female breast cancers in the Surveillance, Epidemiology, and End Results-18 (SEER-18) database that were diagnosed in 2004 and followed up until December 2016. The outcome was the five-category cause of death, namely alive, non-breast cancer, breast cancer, cardiovascular disease, and other cause. We analyzed the 54 dichotomized features from ~45,000 patients using MATLAB (version 2018a) and tenfold cross-validation. The accuracy in classifying the five-category cause of death with DT, RF, ANN, and SVM was 69.21%, 70.23%, 70.16%, and 69.06%, respectively, in each case higher than the 68.12% accuracy of multinomial logistic regression. Based on the features' information entropy, we optimized dimension reduction (i.e., reducing the number of features in the models). We found that 32 or more features were required to maintain similar accuracy, while the running time decreased from 55.57 s for 54 features to 25.99 s for 32 features in RF, from 12.92 s to 10.48 s in ANN, and from 175.50 s to 67.81 s in SVM. In summary, we show that RF, DT, ANN, and SVM had similar accuracy for classifying multi-category outcomes in this large rectangular dataset. Dimension reduction based on information gain will increase the model's efficiency while maintaining classification accuracy.
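
The abstract does not spell out how features were ranked, so the following is only a minimal sketch of one common way to do entropy-based dimension reduction: compute the information gain of each dichotomized feature with respect to the multi-category outcome and keep the top-ranked features. It is written in Python rather than the MATLAB used in the study, and the function names, the toy data, and the cutoff `k` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Names of the k features with the highest information gain."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Toy data (hypothetical, not from the study): two dichotomized features
# and a three-category outcome for six patients.
outcome = ["alive", "alive", "breast cancer", "breast cancer",
           "other cause", "other cause"]
features = {
    "informative": [0, 0, 1, 1, 1, 1],      # separates "alive" from the rest
    "uninformative": [0, 1, 0, 1, 0, 1],    # independent of the outcome
}
ranked = top_k_features(features, outcome, k=1)
```

In a workflow like the one described, the retained subset would then be passed to each classifier under tenfold cross-validation, and accuracy and running time compared against the full 54-feature model.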

Year:  2021        PMID: 33574440     DOI: 10.1038/s41374-020-00525-x

Source DB:  PubMed          Journal:  Lab Invest        ISSN: 0023-6837            Impact factor:   5.662


  1 in total

1.  Predicting long-term multicategory cause of death in patients with prostate cancer: random forest versus multinomial model.

Authors:  Jianwei Wang; Fei Deng; Fuqing Zeng; Andrew J Shanahan; Wei Vivian Li; Lanjing Zhang
Journal:  Am J Cancer Res       Date:  2020-05-01       Impact factor: 6.166

  5 in total

1.  Multimetric feature selection for analyzing multicategory outcomes of colorectal cancer: random forest and multinomial logistic regression models.

Authors:  Catherine H Feng; Mary L Disis; Chao Cheng; Lanjing Zhang
Journal:  Lab Invest       Date:  2021-09-18       Impact factor: 5.662

2.  Classify multicategory outcome in patients with lung adenocarcinoma using clinical, transcriptomic and clinico-transcriptomic data: machine learning versus multinomial models.

Authors:  Fei Deng; Lanlan Shen; He Wang; Lanjing Zhang
Journal:  Am J Cancer Res       Date:  2020-12-01       Impact factor: 6.166

3.  Energy Efficiency of Inference Algorithms for Clinical Laboratory Data Sets: Green Artificial Intelligence Study.

Authors:  Yi-Ju Tseng; Hsin-Yao Wang; Jia-Ruei Yu; Chun-Hsien Chen; Tsung-Wei Huang; Jang-Jih Lu; Chia-Ru Chung; Ting-Wei Lin; Min-Hsien Wu
Journal:  J Med Internet Res       Date:  2022-01-25       Impact factor: 5.428

4.  Changing Trends in the Proportional Incidence and Five-year Net Survival of Screened and Non-screened Breast Cancers among Women During 1995-2011 in England.

Authors:  Haiyan Wu; Kwok Wong; Shou-En Lu; John Broggio; Lanjing Zhang
Journal:  J Clin Transl Pathol       Date:  2022-03-18

5.  The Challenges and Opportunities of Translational Pathology.

Authors:  Lanjing Zhang
Journal:  J Clin Transl Pathol       Date:  2022-02-23
