A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data.

Yan Zhou, Bin Yang, Junhui Wang, Jiadi Zhu, Guoliang Tian.

Abstract

BACKGROUND: Identifying differentially expressed genes within the same species or across different species is a pressing need in biological and medical research. RNA-seq experiments usually involve systematic technical effects and different sequencing depths, so normalization is regarded as an essential step in discovering biologically important changes in expression. Existing methods usually normalize the data with a scaling factor and then detect significant genes. However, more than one scaling factor may exist because of the complexity of real data. Consequently, methods that normalize data by a single scaling factor may deliver suboptimal performance or may not work at all. The development of modern machine learning techniques has provided a new perspective on discriminating between differentially expressed (DE) and non-DE genes. In reality, however, the known non-DE genes comprise only a small set and may contain housekeeping genes (in the same species) or conserved orthologous genes (in different species). Therefore, the process of detecting DE genes can be formulated as a one-class classification problem, where only non-DE genes are observed, while DE genes are completely absent from the training data.
RESULTS: In this study, we transform the problem into an outlier detection problem by treating DE genes as outliers, and we propose a scaling-free minimum enclosing ball (SFMEB) method that constructs the smallest possible ball containing the known non-DE genes in a feature space. The genes outside the minimum enclosing ball can then be naturally considered to be DE genes. Compared with existing methods, the proposed SFMEB method does not require data normalization, which is particularly attractive when the RNA-seq data include more than one scaling factor. Furthermore, the SFMEB method can be easily extended to different species without normalization.
CONCLUSIONS: Simulation studies demonstrate that the SFMEB method works well in a wide range of settings, especially when the data are heterogeneous or consist of biological replicates. Analysis of the real data also supports the conclusion that the SFMEB method outperforms existing competitors. The R package of the proposed method is available at https://bioconductor.org/packages/MEB.
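
The enclosing-ball idea described in the abstract can be illustrated with a minimal sketch. This is not the SFMEB method itself and not the API of the MEB R package: it assumes a plain Euclidean ball in the input space (the abstract works in a feature space), approximates the ball with the simple Bădoiu-Clarkson iteration, and uses made-up toy data. The function names `enclosing_ball` and `flag_outliers` are hypothetical.

```python
import math

def enclosing_ball(points, iterations=1000):
    """Approximate the minimum enclosing ball of `points`.

    Uses the simple Badoiu-Clarkson iteration: repeatedly move the
    centre a shrinking step toward the currently farthest point.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, farthest)]
    # Take the largest training distance so every training point is enclosed.
    radius = max(math.dist(p, center) for p in points)
    return center, radius

def flag_outliers(candidates, center, radius):
    """Return True for each candidate lying strictly outside the ball."""
    return [math.dist(p, center) > radius for p in candidates]

# Toy data (assumption, not from the paper): each gene is a pair of
# log-expression values, one per condition.
known_non_de = [(1.0, 1.1), (2.0, 2.1), (3.0, 2.9), (2.5, 2.4), (1.5, 1.6)]
candidates = [(2.0, 2.0), (2.0, 6.0), (1.2, 1.3)]

center, radius = enclosing_ball(known_non_de)
flags = flag_outliers(candidates, center, radius)
print(flags)
```

In this toy setting only the second candidate, whose two expression values differ strongly, falls outside the ball fitted to the known non-DE genes and would be flagged as DE. No scaling factor is estimated at any point, which is the property the abstract emphasizes.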

Keywords:  Differentially expressed genes; Minimum enclosing ball; RNA-seq data

Year: 2021; PMID: 34174824; PMCID: PMC8234728; DOI: 10.1186/s12864-021-07790-0

Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969


