| Literature DB >> 30441551 |
Abstract
Pan-cancer analysis has been a significant research topic in recent years. Thanks to advances in sequencing technologies, researchers have more resources and knowledge with which to identify the key factors that could trigger cancer. Furthermore, since the launch of The Cancer Genome Atlas (TCGA) project, applying machine learning (ML) techniques to TCGA data has been recognized as a useful approach in this line of research. This study therefore uses RNA-sequencing data from TCGA and focuses on classifying patients across thirty-three cancer types. Five ML algorithms, namely decision tree (DT), k nearest neighbor (kNN), linear support vector machine (linear SVM), polynomial support vector machine (poly SVM), and artificial neural network (ANN), are compared in terms of accuracy, training time, precision, recall, and F1-score. The results show that linear SVM, with an accuracy of 95.8%, is the best classifier in this study. Several critical and sophisticated data pre-processing experiments are also presented to clarify and improve the performance of the built model.
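As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1-score could be computed for a multi-class prediction. The abstract does not state how per-class scores were averaged, so macro-averaging is an assumption here, and the function name `macro_metrics` and the toy labels (TCGA-style study codes used purely as example strings) are hypothetical, not taken from the study.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (an unweighted mean over classes) is an assumption;
    the abstract does not say which averaging scheme was used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with three illustrative cancer-type labels (not study data).
y_true = ["BRCA", "BRCA", "LUAD", "LUAD", "KIRC", "KIRC"]
y_pred = ["BRCA", "LUAD", "LUAD", "LUAD", "KIRC", "KIRC"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

In this toy case five of six samples are classified correctly, so accuracy is 5/6, while the macro-averaged scores differ from it because each class contributes equally regardless of how many samples it has. With thirty-three classes of unequal size, that distinction can matter when comparing classifiers.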
Year: 2018 PMID: 30441551 DOI: 10.1109/EMBC.2018.8513521
Source DB: PubMed Journal: Annu Int Conf IEEE Eng Med Biol Soc ISSN: 2375-7477