Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: a case study.

Renaud Gaujoux, Cathal Seoighe.

Abstract

Heterogeneity in sample composition is an inherent issue in many gene expression studies and, in many cases, should be taken into account in the downstream analysis to enable correct interpretation of the underlying biological processes. Typical examples are infectious diseases or immunology-related studies using blood samples, where, for example, the proportions of lymphocyte sub-populations are expected to vary between cases and controls. Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, notably in bioinformatics where its ability to extract meaningful information from high-dimensional data such as gene expression microarrays has been demonstrated. Very recently, it has been applied to biomarker discovery and gene expression deconvolution in heterogeneous tissue samples. Being essentially unsupervised, standard NMF methods are not guaranteed to find components corresponding to the cell types of interest in the sample, which may jeopardize the correct estimation of cell proportions. We have investigated the use of prior knowledge, in the form of a set of marker genes, to improve gene expression deconvolution with NMF algorithms. We found that this improves the consistency with which both cell type proportions and cell type gene expression signatures are estimated. The proposed method was tested on a microarray dataset consisting of pure cell types mixed in known proportions. Pearson correlation coefficients between true and estimated cell type proportions improved substantially (typically from about 0.5 to approximately 0.8) with the semi-supervised (marker-guided) versions of commonly used NMF algorithms. Furthermore, known marker genes associated with each cell type were assigned to the correct cell type more frequently for the guided versions.
We conclude that the use of marker genes improves the accuracy of gene expression deconvolution using NMF and suggest modifications to how the marker gene information is used that may lead to further improvements.
Copyright © 2011 Elsevier B.V. All rights reserved.
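
The marker-guided idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the marker information enters the factorization, so the sketch assumes one simple choice, namely forcing each marker gene to have zero expression in the signatures of all other cell types, on top of the standard multiplicative updates for the Frobenius-norm NMF objective. The function name `marker_guided_nmf`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def marker_guided_nmf(X, markers, n_iter=500, seed=0, eps=1e-9):
    """Factor X (genes x samples) as W (genes x k) @ H (k x samples).

    markers maps a cell-type index to the row indices of its marker genes.
    Assumption of this sketch: a marker gene of type j is not expressed
    in any other cell type, so those entries of W are held at zero.
    """
    rng = np.random.default_rng(seed)
    k = len(markers)
    n_genes, n_samples = X.shape

    # mask[g, j] == 0 where gene g is a marker of a type other than j
    mask = np.ones((n_genes, k))
    for j, genes in markers.items():
        mask[genes, :] = 0.0
        mask[genes, j] = 1.0

    W = (rng.random((n_genes, k)) + eps) * mask
    H = rng.random((k, n_samples)) + eps

    for _ in range(n_iter):
        # Standard multiplicative updates (Frobenius objective)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        W *= mask  # keep the marker constraint explicit
    return W, H


# Synthetic mixture (hypothetical data, not the dataset from the study)
rng = np.random.default_rng(1)
n_genes, k, n_samples = 60, 3, 12
markers = {0: [0, 1, 2, 3, 4], 1: [5, 6, 7, 8, 9], 2: [10, 11, 12, 13, 14]}

W_true = rng.gamma(2.0, 1.0, size=(n_genes, k))
for j, genes in markers.items():
    others = [c for c in range(k) if c != j]
    W_true[np.ix_(genes, others)] = 0.0  # markers are type-specific

P_true = rng.dirichlet(np.ones(k), size=n_samples).T  # k x samples, columns sum to 1
X = W_true @ P_true

W_est, H_est = marker_guided_nmf(X, markers)

# Rescale each sample's mixing coefficients to proportions
P_est = H_est / H_est.sum(axis=0, keepdims=True)

# Pearson correlation between true and estimated proportions, per cell type
corr = [np.corrcoef(P_true[j], P_est[j])[0, 1] for j in range(k)]
print(corr)
```

Because the markers tie each component to a named cell type, the rows of `P_est` can be compared directly with the rows of `P_true`; a fully unsupervised factorization would first need its components matched to cell types. Column rescaling of `H_est` is only one way to obtain proportions and is sensitive to how the columns of `W_est` are scaled.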

Year: 2011; PMID: 21930246; DOI: 10.1016/j.meegid.2011.08.014

Source DB: PubMed; Journal: Infect Genet Evol; ISSN: 1567-1348; Impact factor: 3.342

