Robust identification of differentially expressed genes from RNA-seq data.

Md Shahjaman, Md Manir Hossain Mollah, Md Rezanur Rahman, S M Shahinul Islam, Md Nurul Haque Mollah.

Abstract

BACKGROUND: Identification of differentially expressed genes (DEGs) under two or more experimental conditions is an important task for elucidating the molecular basis of phenotypic variation. In recent years, next-generation sequencing (RNA-seq) has become an attractive and competitive alternative to microarrays because of falling sequencing costs and the limitations of microarrays. A number of methods have been developed for detecting DEGs from RNA-seq data. Most of these methods are based on either the Poisson distribution or the negative binomial (NB) distribution. However, identifying DEGs from read count data using a skewed distribution is inflexible and complicated in the presence of outliers or extreme values.
RESULTS: Most existing DEG selection methods produce lower accuracy and more false discoveries in the presence of outliers. Some robust approaches, such as edgeR_robust and DEseq2, perform well in the presence of outliers in the large-sample case, but they perform weakly in the small-sample case. To address this issue, an alternative approach has emerged that transforms RNA-seq data into microarray-like data. Among the various transformation methods, voom with the limma pipeline has been shown to perform better for RNA-seq data. However, limma with the voom transformation is sensitive to outliers in the small-sample case. Therefore, in this paper, we robustify the voom approach using the minimum β-divergence method. We evaluate the proposed method against seven popular biomarker selection methods, DEseq, DEseq2, SAMseq, Bayseq, limma (voom), edgeR and edgeR_robust, using both simulated and real datasets. Both sets of results show that the proposed method outperforms the competing methods in the presence of outliers and performs almost equally to them in the absence of outliers.
CONCLUSION: In both simulated and real RNA-seq count data analyses, the proposed method shows improved performance in the presence of outliers for both small- and large-sample cases. We therefore propose using it in place of existing methods to obtain better performance in selecting DEGs.
Copyright © 2019 Elsevier Inc. All rights reserved.

Keywords:  DEGs; Log-cpm; RNA-sequence data; β-Weight function and robustness
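
For illustration only, the two ingredients named in the keywords (log-cpm and a β-weight function) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the log-cpm formula follows the offsets used by limma's voom, and the weight `exp(-β/2 · (x − μ)² / σ²)` with the `(1 + β)` variance correction is the commonly used Gaussian form of the minimum β-divergence estimator, which the abstract itself does not spell out. The function names, the default `beta=0.2`, and the median/MAD starting values are hypothetical choices made for this example.

```python
import math


def log_cpm(counts, lib_size):
    """voom-style log2 counts-per-million for one sample.

    Uses the 0.5 count offset and +1 library-size offset from limma's voom.
    """
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]


def median(values):
    """Plain median, used only to initialise the robust estimates."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else 0.5 * (s[mid - 1] + s[mid])


def beta_weighted_mean(values, beta=0.2, n_iter=50, tol=1e-8):
    """Iteratively reweighted mean and variance with Gaussian beta-weights.

    Assumed form: w_i = exp(-beta/2 * (x_i - mu)^2 / var). Observations far
    from the current mean receive weights near zero, so outliers have little
    influence. beta=0.2 is an illustrative default, not a value from the paper.
    """
    mu = median(values)
    mad = median([abs(v - mu) for v in values])
    var = max((1.4826 * mad) ** 2, 1e-8)  # MAD-based starting variance
    weights = [1.0] * len(values)
    for _ in range(n_iter):
        weights = [math.exp(-0.5 * beta * (v - mu) ** 2 / var) for v in values]
        total = sum(weights)
        if total == 0.0:  # every weight underflowed; keep current estimates
            break
        new_mu = sum(w * v for w, v in zip(weights, values)) / total
        new_var = (1.0 + beta) * sum(
            w * (v - new_mu) ** 2 for w, v in zip(weights, values)
        ) / total
        new_var = max(new_var, 1e-8)
        converged = abs(new_mu - mu) < tol
        mu, var = new_mu, new_var
        if converged:
            break
    return mu, var, weights


# Usage: five replicate log-expression values plus one outlying value.
values = [5.0, 5.1, 4.9, 5.05, 4.95, 12.0]
robust_mu, robust_var, w = beta_weighted_mean(values)
plain_mu = sum(values) / len(values)
```

In this toy input the ordinary mean is pulled toward the outlying value, whereas the β-weighted mean stays close to the bulk of the data because the outlier's weight is driven toward zero. Setting `beta=0` makes every weight equal to 1 and recovers the ordinary mean and variance.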

Year:  2019        PMID: 31756426     DOI: 10.1016/j.ygeno.2019.11.012

Source DB:  PubMed          Journal:  Genomics        ISSN: 0888-7543            Impact factor:   5.736


  2 in total

1.  Transcriptomics of the depressed and PTSD brain.

Authors:  Jing Zhang; Alfred P Kaye; Jiawei Wang; Matthew J Girgenti
Journal:  Neurobiol Stress       Date:  2021-10-11

2.  Rank-in: enabling integrative analysis across microarray and RNA-seq for cancer.

Authors:  Kailin Tang; Xuejie Ji; Mengdi Zhou; Zeliang Deng; Yuwei Huang; Genhui Zheng; Zhiwei Cao
Journal:  Nucleic Acids Res       Date:  2021-09-27       Impact factor: 16.971
