
Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies.

Haohan Wang, Bryon Aragam, Eric P Xing.

Abstract

A fundamental and important challenge in modern datasets of ever-increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of individual relationships in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our model.
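The general idea in the abstract (correct for hidden population structure with a low-rank covariance, then select variables sparsely) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a covariance of the form V = K_s + delta * I, where K_s is a rank-truncated version of K = X X^T / p, and it takes the rank and delta as fixed by hand, whereas the paper describes choosing the covariance complexity adaptively. The function names `truncated_whitener` and `lasso_cd`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np

def truncated_whitener(X, rank, delta):
    """Return (W, U, S) with W = V^{-1/2} for V = U diag(S) U^T + delta * I,
    where U, S are the top-`rank` eigenpairs of K = X X^T / p."""
    n, p = X.shape
    K = X @ X.T / p
    evals, evecs = np.linalg.eigh(K)            # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:rank]        # indices of the largest ones
    U = evecs[:, top]
    S = np.clip(evals[top], 0.0, None)
    # V has eigenvalues S + delta on span(U) and delta on its complement.
    scale = 1.0 / np.sqrt(S + delta) - 1.0 / np.sqrt(delta)
    W = (U * scale) @ U.T + np.eye(n) / np.sqrt(delta)
    return W, U, S

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = np.asarray(y, dtype=float).copy()       # residual y - X b (b starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new)         # keep the residual in sync
            b[j] = new
    return b

# Toy data: two subpopulations with different allele frequencies (simulated).
rng = np.random.default_rng(0)
n_half, p = 30, 50
freqs_a = rng.uniform(0.1, 0.9, size=p)
freqs_b = rng.uniform(0.1, 0.9, size=p)
X = np.vstack([rng.binomial(2, freqs_a, size=(n_half, p)),
               rng.binomial(2, freqs_b, size=(n_half, p))]).astype(float)
X -= X.mean(axis=0)
pop = np.repeat([0.0, 1.0], n_half)             # hidden subpopulation label
y = X[:, :3] @ np.ones(3) + 2.0 * pop + rng.normal(size=2 * n_half)
y -= y.mean()

# Whiten with a hand-picked rank and delta (assumptions), then run the Lasso.
W, _, _ = truncated_whitener(X, rank=1, delta=1.0)
beta_hat = lasso_cd(W @ X, W @ y, lam=5.0)
print("selected variables:", np.flatnonzero(beta_hat))
```

In this sketch the whitening step plays the role of the mixed-model correction: after multiplying by W, the assumed covariance becomes the identity, so an ordinary Lasso can be applied to the transformed data. The values of `rank`, `delta`, and `lam` here are arbitrary and would need to be tuned or estimated in practice.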


Keywords:  Applied computing → Genetics; Computational genomics; Computing methodologies → Supervised learning; Confounding Correction; Information systems → Data mining; Linear Mixed Model; Sparsity; Variable Selection

Year:  2017        PMID: 29629235      PMCID: PMC5889139          DOI: 10.1109/BIBM.2017.8217687

Source DB:  PubMed          Journal:  Proceedings (IEEE Int Conf Bioinformatics Biomed)        ISSN: 2156-1125


  8 in total

1.  Multiplex confounding factor correction for genomic association mapping with squared sparse linear mixed model.

Authors:  Haohan Wang; Xiang Liu; Yunpeng Xiao; Ming Xu; Eric P Xing
Journal:  Methods       Date:  2018-04-27       Impact factor: 3.608

2.  Trade-offs of Linear Mixed Models in Genome-Wide Association Studies.

Authors:  Haohan Wang; Bryon Aragam; Eric P Xing
Journal:  J Comput Biol       Date:  2022-02-25       Impact factor: 1.479

3.  Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS.

Authors:  M Arabnejad; B A Dawkins; W S Bush; B C White; A R Harkness; B A McKinney
Journal:  BioData Min       Date:  2018-11-03       Impact factor: 2.522

4.  Coupled mixed model for joint genetic analysis of complex disorders with two independently collected data sets.

Authors:  Haohan Wang; Fen Pei; Michael M Vanyukov; Ivet Bahar; Wei Wu; Eric P Xing
Journal:  BMC Bioinformatics       Date:  2021-02-05       Impact factor: 3.169

5.  Sparse Regression in Cancer Genomics: Comparing Variable Selection and Predictions in Real World Data.

Authors:  Robert J O'Shea; Sophia Tsoka; Gary Jr Cook; Vicky Goh
Journal:  Cancer Inform       Date:  2021-11-27

Review 6.  In Search of Biomarkers for Pathogenesis and Control of Leishmaniasis by Global Analyses of Leishmania-Infected Macrophages.

Authors:  Patricia Sampaio Tavares Veras; Pablo Ivan Pereira Ramos; Juliana Perrone Bezerra de Menezes
Journal:  Front Cell Infect Microbiol       Date:  2018-09-19       Impact factor: 5.293

7.  Discovering weaker genetic associations guided by known associations.

Authors:  Haohan Wang; Michael M Vanyukov; Eric P Xing; Wei Wu
Journal:  BMC Med Genomics       Date:  2020-02-24       Impact factor: 3.063

8.  Deep mixed model for marginal epistasis detection and population stratification correction in genome-wide association studies.

Authors:  Haohan Wang; Tianwei Yue; Jingkang Yang; Wei Wu; Eric P Xing
Journal:  BMC Bioinformatics       Date:  2019-12-27       Impact factor: 3.169

