Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies.

Haohan Wang¹, Bryon Aragam², Eric P Xing².

Abstract

A fundamental and important challenge in modern datasets of ever increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of individual relationships in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our model.
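The abstract gives no algorithmic detail, so the following is only a toy sketch of the general idea it describes (correct for population structure with a low-rank covariance, then run a sparse regression); it is not the authors' implementation. It assumes a covariance of the form V = delta * K_r + I, where K_r is a rank-r truncation of K = X Xᵀ / p, and it assumes the rank and the variance ratio `delta` are supplied by hand, whereas the paper selects the covariance complexity adaptively. The function names `truncated_rank_whitener` and `lasso_cd`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def truncated_rank_whitener(X, rank, delta):
    """Return W with W @ V @ W.T = I, where V = delta * K_r + I and
    K_r is the rank-`rank` truncation of K = X X^T / p (assumed form)."""
    n, p = X.shape
    # The left singular vectors of X are the eigenvectors of K = X X^T / p.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    U_r = U[:, :rank]
    eig_r = (s[:rank] ** 2) / p
    # Rescale only the top-`rank` directions; leave the rest untouched.
    scale = 1.0 / np.sqrt(delta * eig_r + 1.0)
    return np.eye(n) + U_r @ np.diag(scale - 1.0) @ U_r.T


def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


# Toy data: two crude "subpopulations" created by a mean shift (illustrative only).
rng = np.random.default_rng(0)
n, p = 60, 40
X = rng.standard_normal((n, p))
X[: n // 2] += 1.0
beta_true = np.zeros(p)
beta_true[:3] = 2.0
y = X @ beta_true + rng.standard_normal(n)

# Whiten with the assumed low-rank covariance, then fit the Lasso.
W = truncated_rank_whitener(X, rank=2, delta=1.0)
beta_hat = lasso_cd(W @ X, W @ y, lam=0.1)
selected = np.flatnonzero(beta_hat)
```

Whitening both `X` and `y` with the same `W` turns the correlated-noise regression into an ordinary one, which is why a standard Lasso solver can be applied afterwards. In practice `rank`, `delta`, and `lam` would need to be estimated or tuned rather than fixed as they are here.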

Keywords: Applied computing → Genetics; Computational Genomics; Computing methodologies → Supervised learning; Confounding Correction; Information systems → Data mining; Linear Mixed Model; Sparsity; Variable Selection

Year: 2017 PMID: 29629235 PMCID: PMC5889139 DOI: 10.1109/BIBM.2017.8217687

Source DB: PubMed Journal: Proceedings (IEEE Int Conf Bioinformatics Biomed) ISSN: 2156-1125

23 in total

1. Best linear unbiased estimation and prediction under a selection model.

Authors: C R Henderson
Journal: Biometrics Date: 1975-06 Impact factor: 2.571

2. Principal components analysis corrects for stratification in genome-wide association studies.

Authors: Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal: Nat Genet Date: 2006-07-23 Impact factor: 38.330

3. Genome-wide genetic association of complex traits in heterogeneous stock mice.

Authors: William Valdar; Leah C Solberg; Dominique Gauguier; Stephanie Burnett; Paul Klenerman; William O Cookson; Martin S Taylor; J Nicholas P Rawlins; Richard Mott; Jonathan Flint
Journal: Nat Genet Date: 2006-07-09 Impact factor: 38.330

4. Variable selection in linear mixed effects models.

Authors: Yingying Fan; Runze Li
Journal: Ann Stat Date: 2012-08-01 Impact factor: 4.028

5. Source verification of mis-identified Arabidopsis thaliana accessions.

Authors: Alison E Anastasio; Alexander Platt; Matthew Horton; Erich Grotewold; Randy Scholl; Justin O Borevitz; Magnus Nordborg; Joy Bergelson
Journal: Plant J Date: 2011-06-16 Impact factor: 6.417

6. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations.

Authors: Vincent Segura; Bjarni J Vilhjálmsson; Alexander Platt; Arthur Korte; Ümit Seren; Quan Long; Magnus Nordborg
Journal: Nat Genet Date: 2012-06-17 Impact factor: 38.330

7. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines.

Authors: Susanna Atwell; Yu S Huang; Bjarni J Vilhjálmsson; Glenda Willems; Matthew Horton; Yan Li; Dazhe Meng; Alexander Platt; Aaron M Tarone; Tina T Hu; Rong Jiang; N Wayan Muliyati; Xu Zhang; Muhammad Ali Amer; Ivan Baxter; Benjamin Brachi; Joanne Chory; Caroline Dean; Marilyne Debieu; Juliette de Meaux; Joseph R Ecker; Nathalie Faure; Joel M Kniskern; Jonathan D G Jones; Todd Michael; Adnane Nemri; Fabrice Roux; David E Salt; Chunlao Tang; Marco Todesco; M Brian Traw; Detlef Weigel; Paul Marjoram; Justin O Borevitz; Joy Bergelson; Magnus Nordborg
Journal: Nature Date: 2010-03-24 Impact factor: 49.962

8. Population structure and eigenanalysis.

Authors: Nick Patterson; Alkes L Price; David Reich
Journal: PLoS Genet Date: 2006-12 Impact factor: 5.917

Review 9. The advantages and limitations of trait analysis with GWAS: a review.

Authors: Arthur Korte; Ashley Farlow
Journal: Plant Methods Date: 2013-07-22 Impact factor: 4.993

10. Genes and pathways underlying regional and cell type changes in Alzheimer's disease.

Authors: Jeremy A Miller; Randall L Woltjer; Jeff M Goodenbour; Steve Horvath; Daniel H Geschwind
Journal: Genome Med Date: 2013-05-25 Impact factor: 11.117

8 in total

1. Multiplex confounding factor correction for genomic association mapping with squared sparse linear mixed model.

Authors: Haohan Wang; Xiang Liu; Yunpeng Xiao; Ming Xu; Eric P Xing
Journal: Methods Date: 2018-04-27 Impact factor: 3.608

2. Trade-offs of Linear Mixed Models in Genome-Wide Association Studies.

Authors: Haohan Wang; Bryon Aragam; Eric P Xing
Journal: J Comput Biol Date: 2022-02-25 Impact factor: 1.479

3. Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS.

Authors: M Arabnejad; B A Dawkins; W S Bush; B C White; A R Harkness; B A McKinney
Journal: BioData Min Date: 2018-11-03 Impact factor: 2.522

4. Coupled mixed model for joint genetic analysis of complex disorders with two independently collected data sets.

Authors: Haohan Wang; Fen Pei; Michael M Vanyukov; Ivet Bahar; Wei Wu; Eric P Xing
Journal: BMC Bioinformatics Date: 2021-02-05 Impact factor: 3.169

5. Sparse Regression in Cancer Genomics: Comparing Variable Selection and Predictions in Real World Data.

Authors: Robert J O'Shea; Sophia Tsoka; Gary Jr Cook; Vicky Goh
Journal: Cancer Inform Date: 2021-11-27

Review 6. In Search of Biomarkers for Pathogenesis and Control of Leishmaniasis by Global Analyses of Leishmania-Infected Macrophages.

Authors: Patricia Sampaio Tavares Veras; Pablo Ivan Pereira Ramos; Juliana Perrone Bezerra de Menezes
Journal: Front Cell Infect Microbiol Date: 2018-09-19 Impact factor: 5.293

7. Discovering weaker genetic associations guided by known associations.

Authors: Haohan Wang; Michael M Vanyukov; Eric P Xing; Wei Wu
Journal: BMC Med Genomics Date: 2020-02-24 Impact factor: 3.063

8. Deep mixed model for marginal epistasis detection and population stratification correction in genome-wide association studies.

Authors: Haohan Wang; Tianwei Yue; Jingkang Yang; Wei Wu; Eric P Xing
Journal: BMC Bioinformatics Date: 2019-12-27 Impact factor: 3.169