
GEOlimma: differential expression analysis and feature selection using pre-existing microarray data.

Liangqun Lu, Kevin A Townsend, Bernie J Daigle.

Abstract

BACKGROUND: Differential expression and feature selection analyses are essential steps for the development of accurate diagnostic/prognostic classifiers of complex human diseases using transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the presence of technical and biological noise. A promising strategy for overcoming these challenges is the incorporation of pre-existing transcriptomics data in the identification of differentially expressed (DE) genes. This approach has the potential to improve the quality of selected genes, increase classification performance, and enhance biological interpretability. Although a number of methods use pre-existing data for differential expression analysis, they do not leverage the identities of experimental conditions to create a robust metric for identifying DE genes.
RESULTS: In this study, we propose a novel differential expression and feature selection method, GEOlimma, which combines pre-existing microarray data from the Gene Expression Omnibus (GEO) with the widely applied Limma method for differential expression analysis. We first quantified differential gene expression across 2481 pairwise comparisons from 602 curated GEO Datasets and converted the resulting differential expression frequencies to DE prior probabilities. Genes with high DE prior probabilities were enriched in cell growth and death, signal transduction, and cancer-related biological pathways, while genes with low prior probabilities were enriched in sensory system pathways. We then applied GEOlimma to four differential expression comparisons within two human disease datasets and performed differential expression, feature selection, and supervised classification analyses. Our results suggest that GEOlimma provides greater experimental power to detect DE genes than Limma, due to its increased effective sample size. Furthermore, in a supervised classification analysis using GEOlimma as a feature selection method, we observed classification performance similar to or better than that of Limma on small, noisy subsets of an asthma dataset.
CONCLUSIONS: Our results demonstrate that GEOlimma is more effective than the standard Limma method for differential gene expression and feature selection analyses. Due to its focus on gene-level differential expression, GEOlimma also has the potential to be applied to other high-throughput biological datasets.
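
The idea of turning differential expression frequencies into gene-specific prior probabilities can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact conversion, so the pseudocount smoothing, the function names, and the example gene labels and counts are all assumptions made for illustration. The sketch relies on one known property of Limma, namely that its B-statistic is a log posterior odds of differential expression computed under a single prior proportion shared by all genes (0.01 by default in `eBayes`); replacing that shared prior with a gene-specific one shifts the log odds by the difference of the two prior log odds.

```python
import math


def de_prior_probabilities(de_counts, n_comparisons, pseudocount=1.0):
    """Convert per-gene DE counts into smoothed prior probabilities.

    de_counts maps a gene label to the number of comparisons in which
    the gene was called DE. The pseudocount is an assumption of this
    sketch; it keeps every prior strictly between 0 and 1.
    """
    return {
        gene: (count + pseudocount) / (n_comparisons + 2.0 * pseudocount)
        for gene, count in de_counts.items()
    }


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def adjust_log_odds(b_stat, gene_prior, default_prior=0.01):
    """Replace a shared prior with a gene-specific prior in a log-odds score.

    b_stat is a log posterior odds computed under default_prior.
    Subtracting the shared prior log odds and adding the gene-specific
    prior log odds leaves the data-driven part of the score unchanged.
    """
    return b_stat - logit(default_prior) + logit(gene_prior)


# Hypothetical counts out of the 2481 comparisons mentioned in the abstract.
counts = {"GENE_A": 1200, "GENE_B": 5}
priors = de_prior_probabilities(counts, n_comparisons=2481)

# The same starting score is raised for the frequently DE gene
# and lowered for the rarely DE gene.
for gene, prior in priors.items():
    print(gene, round(prior, 4), round(adjust_log_odds(0.0, prior), 3))
```

Under these assumptions, a gene that was DE in many prior comparisons receives a higher adjusted score than a rarely DE gene with the same evidence in the current dataset, which is one way the pre-existing data can act like additional sample size.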

Keywords:  DE prior probabilities; Differential expression; Feature selection; GEOlimma; Supervised classification

Year:  2021        PMID: 33535967     DOI: 10.1186/s12859-020-03932-5

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


