
A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes.

Christopher A Mancuso, Jacob L Canfield, Deepak Singla, Arjun Krishnan.

Abstract

While there are >2 million publicly available human microarray gene-expression profiles, these profiles were measured using a variety of platforms that each cover a pre-defined, limited set of genes. Therefore, methods that can computationally reconstitute the complete transcriptome in partially measured microarray samples, by imputing the expression of unmeasured genes, are key to reanalyzing and integrating this massive data collection. Current state-of-the-art imputation methods are tailored to samples from a specific platform and rely on gene-gene relationships regardless of the biological context of the target sample. We show that sparse regression models that capture sample-sample relationships (termed SampleLASSO), built on-the-fly for each new target sample to be imputed, outperform models based on fixed gene relationships. Extensive evaluation involving three machine learning algorithms (LASSO, k-nearest neighbors, and deep neural networks), two gene subsets (GPL96-570 and LINCS), and multiple imputation tasks (within and across microarray/RNA-seq datasets) establishes that SampleLASSO is the most accurate model. Additionally, we demonstrate the biological interpretability of this method by showing that, when imputing a target sample from a certain tissue, SampleLASSO automatically leverages training samples from the same tissue. Thus, SampleLASSO is a simple yet powerful and flexible approach for harmonizing large-scale gene-expression data.
© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
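The core idea of SampleLASSO, regressing a target sample's measured genes on a set of fully measured reference samples and reusing the learned sample weights to predict its unmeasured genes, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hand-rolled coordinate-descent LASSO, the fixed penalty `alpha=0.01`, the mean-centering, the helper names `lasso_cd` and `sample_lasso_impute`, and the synthetic two-tissue data are all hypothetical choices made here. The abstract does not describe the paper's preprocessing or how the penalty is selected.

```python
import numpy as np

# Hypothetical sketch: fixed alpha, plain coordinate descent; not the paper's code.


def soft_threshold(z, t):
    # Proximal operator of the L1 penalty: shrink z toward zero by t.
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-8):
    """Minimize (1/(2n))*||y - X w - b||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean        # centering lets us recover the intercept b at the end
    col_sq = (Xc ** 2).sum(axis=0) / n     # per-feature curvature of the squared-error term
    w = np.zeros(p)
    resid = yc.copy()                      # residual yc - Xc @ w (w starts at zero)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                   # constant feature: leave its weight at zero
            rho = Xc[:, j] @ resid / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, alpha) / col_sq[j]
            step = w_new - w[j]
            if step != 0.0:
                resid -= Xc[:, j] * step   # keep the residual in sync with the updated weight
                w[j] = w_new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return w, y_mean - x_mean @ w


def sample_lasso_impute(train, target_measured, measured, unmeasured, alpha=0.01):
    """Impute one target sample from a genes x samples reference matrix `train`.

    Observations are the target's measured genes; features are the reference
    samples, so the fitted weights indicate which reference samples it resembles.
    """
    w, b = lasso_cd(train[measured, :], target_measured, alpha)
    return train[unmeasured, :] @ w + b, w


# Synthetic demo (hypothetical data): two "tissues" with distinct expression profiles.
rng = np.random.default_rng(0)
n_genes, n_per_tissue = 200, 10
profile_a, profile_b = rng.normal(size=n_genes), rng.normal(size=n_genes)
train = np.column_stack(
    [profile_a + 0.1 * rng.normal(size=n_genes) for _ in range(n_per_tissue)]
    + [profile_b + 0.1 * rng.normal(size=n_genes) for _ in range(n_per_tissue)]
)
target = profile_a + 0.1 * rng.normal(size=n_genes)   # a new tissue-A sample
measured, unmeasured = np.arange(150), np.arange(150, n_genes)

pred, weights = sample_lasso_impute(train, target[measured], measured, unmeasured)
r = np.corrcoef(pred, target[unmeasured])[0, 1]
weight_a = np.abs(weights[:n_per_tissue]).sum()
weight_b = np.abs(weights[n_per_tissue:]).sum()
print(f"correlation on unmeasured genes: {r:.3f}")
print(f"|weight| on tissue A: {weight_a:.3f}, on tissue B: {weight_b:.3f}")
```

Because the features are samples rather than genes, a fresh model is fitted for every target sample, and its nonzero weights can be read directly as the reference samples it relies on. In this toy setup they should fall mostly on the tissue-A columns, which loosely echoes, on synthetic data only, the tissue-matching behavior described above.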


Year: 2020    PMID: 33074331    PMCID: PMC7708069    DOI: 10.1093/nar/gkaa881

Source DB: PubMed    Journal: Nucleic Acids Res    ISSN: 0305-1048    Impact factor: 16.971
