Effect of normalization methods on the performance of supervised learning algorithms applied to HTSeq-FPKM-UQ data sets: 7SK RNA expression as a predictor of survival in patients with colon adenocarcinoma.

Leili Shahriyari.

Abstract

MOTIVATION: One of the main challenges in machine learning (ML) is choosing an appropriate normalization method. Here, we examine the effect of various normalization methods on the analysis of FPKM upper quartile (FPKM-UQ) RNA sequencing data sets. We collect the HTSeq-FPKM-UQ files of patients with colon adenocarcinoma from the TCGA-COAD project. We compare the three most common normalization methods (scaling, standardizing using z-score, and vector normalization) by visualizing the normalized data set and evaluating the performance of 12 supervised learning algorithms on the normalized data set. Additionally, for each of these normalization methods, we use two different normalization strategies: normalizing samples (files) or normalizing features (genes).
RESULTS: Regardless of the normalization method, a support vector machine (SVM) model with the radial basis function kernel had the maximum accuracy (78%) in predicting the vital status of the patients. However, the fitting time of the SVM depended on the normalization method, and it reached its minimum when files were normalized to unit length. Furthermore, among all 12 learning algorithms and 6 different normalization techniques, the Bernoulli naive Bayes model after standardizing files had the best performance in terms of maximizing the accuracy as well as minimizing the fitting time. We also investigated the effect of dimensionality reduction methods on the performance of the supervised ML algorithms. Reducing the dimension of the data set did not increase the maximum accuracy of 78%. However, it led to the discovery of 7SK RNA gene expression as a predictor of survival in patients with colon adenocarcinoma, with an accuracy of 78%.
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
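
The six normalization variants named in the abstract (three methods, each applied either per sample or per gene) can be illustrated with a minimal sketch. This is not the author's code. It assumes that "scaling" means min-max scaling to [0, 1] and that "vector normalization" means dividing by the Euclidean norm; the function names and the toy matrix are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import mean, pstdev
from typing import Callable, List

Vector = List[float]


def min_max_scale(v: Vector) -> Vector:
    """Scale values to [0, 1] (assumed meaning of "scaling")."""
    lo, hi = min(v), max(v)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in v]


def z_score(v: Vector) -> Vector:
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(v), pstdev(v)
    return [(x - mu) / sigma if sigma else 0.0 for x in v]


def unit_length(v: Vector) -> Vector:
    """Divide by the Euclidean norm so the vector has length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm if norm else 0.0 for x in v]


def normalize(matrix: List[Vector],
              method: Callable[[Vector], Vector],
              axis: str) -> List[Vector]:
    """Apply `method` to each row (axis="samples") or column (axis="features").

    Rows are samples (files) and columns are features (genes).
    """
    if axis == "samples":
        return [method(list(row)) for row in matrix]
    if axis == "features":
        columns = [method(list(col)) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("axis must be 'samples' or 'features'")


# Hypothetical toy expression matrix: 3 samples (rows) x 2 genes (columns).
toy = [[3.0, 4.0],
       [6.0, 8.0],
       [0.0, 12.0]]

by_sample = normalize(toy, unit_length, "samples")      # each file has length 1
by_feature = normalize(toy, min_max_scale, "features")  # each gene spans [0, 1]
```

Swapping `method` among the three functions and `axis` between `"samples"` and `"features"` yields the six combinations. Note that the first two rows of `toy` are proportional, so they become identical after unit-length normalization per sample, whereas per-gene scaling keeps them distinct.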

Keywords:  7SK RNA; TCGA HTSeq-FPKM-UQ data sets; colon adenocarcinoma; gene expression; normalization methods; supervised machine learning algorithms

Year:  2019        PMID: 29112707     DOI: 10.1093/bib/bbx153

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  19 in total

1.  Co-occurrent Alterations of Alzheimer's Genes and Prostate Cancer Genes in Prostate Cancer.

Authors:  Steven Lehrer; Peter H Rheinstein
Journal:  Cancer Genomics Proteomics       Date:  2020 May-Jun       Impact factor: 4.069

2.  Transmissible ER stress between macrophages and tumor cells configures tumor microenvironment.

Authors:  Wei Wei; Yazhuo Zhang; Qiaoling Song; Qianyue Zhang; Xiaonan Zhang; Xinning Liu; Zhihua Wu; Xiaohan Xu; Yuting Xu; Yu Yan; Chenyang Zhao; Jinbo Yang
Journal:  Cell Mol Life Sci       Date:  2022-07-07       Impact factor: 9.207

3.  TumorDecon: A digital cytometry software.

Authors:  Rachel A Aronow; Shaya Akbarinejad; Trang Le; Sumeyye Su; Leili Shahriyari
Journal:  SoftwareX       Date:  2022-04-07

4.  Druggable genetic targets in endometrial cancer.

Authors:  Steven Lehrer; Peter H Rheinstein
Journal:  Cancer Treat Res Commun       Date:  2021-12-17

5.  Increased expression of von Willebrand factor gene is associated with poorer survival in primary lower grade glioma.

Authors:  Steven Lehrer; Peter H Rheinstein; Kenneth E Rosenzweig
Journal:  Glioma       Date:  2018-08-30

6.  A cross-study analysis of drug response prediction in cancer cell lines.

Authors:  Fangfang Xia; Jonathan Allen; Prasanna Balaprakash; Thomas Brettin; Cristina Garcia-Cardona; Austin Clyde; Judith Cohn; James Doroshow; Xiaotian Duan; Veronika Dubinkina; Yvonne Evrard; Ya Ju Fan; Jason Gans; Stewart He; Pinyi Lu; Sergei Maslov; Alexander Partin; Maulik Shukla; Eric Stahlberg; Justin M Wozniak; Hyunseung Yoo; George Zaki; Yitan Zhu; Rick Stevens
Journal:  Brief Bioinform       Date:  2022-01-17       Impact factor: 11.622

7.  Alzheimer Gene BIN1 may Simultaneously Influence Dementia Risk and Androgen Deprivation Therapy Dosage in Prostate Cancer.

Authors:  Steven Lehrer; Peter H Rheinstein
Journal:  Am J Clin Oncol       Date:  2020-10       Impact factor: 2.787

8.  Defining housekeeping genes suitable for RNA-seq analysis of the human allograft kidney biopsy tissue.

Authors:  Zijie Wang; Zili Lyu; Ling Pan; Gang Zeng; Parmjeet Randhawa
Journal:  BMC Med Genomics       Date:  2019-06-17       Impact factor: 3.063

9.  von Willebrand Factor Gene Expression in Primary Lower Grade Glioma: Mutually Co-Occurring Mutations in von Willebrand Factor, ATRX, and TP53.

Authors:  Steven Lehrer; Peter H Rheinstein; Sheryl Green; Kenneth E Rosenzweig
Journal:  Brain Tumor Res Treat       Date:  2019-04

10.  BAP1 expression is prognostic in breast and uveal melanoma but not colon cancer and is highly positively correlated with RBM15B and USP19.

Authors:  Leili Shahriyari; Mohamed Abdel-Rahman; Colleen Cebulla
Journal:  PLoS One       Date:  2019-02-04       Impact factor: 3.240
