
Fast Numerical Optimization for Genome Sequencing Data in Population Biobanks.

Ruilin Li, Christopher Chang, Yosuke Tanigawa, Balasubramanian Narasimhan, Trevor Hastie, Robert Tibshirani, Manuel A Rivas.

Abstract

MOTIVATION: Large-scale, high-dimensional genome sequencing data pose computational challenges. General-purpose optimization tools are usually suboptimal in both computation time and memory use when applied to genetic data.
RESULTS: We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by values in the set {0, 1, 2, NA}. We take advantage of this fact and use two bits to represent each entry in a genetic matrix, which reduces the memory requirement by a factor of 32 compared to a double-precision floating-point representation (2 bits versus 64 bits per entry). Using this representation, we implemented an iteratively reweighted least squares algorithm to solve Lasso regressions on genetic matrices, which we name snpnet-2.0. When the dataset contains many rare variants, the predictors can be encoded in a sparse matrix. We exploit the sparsity of the predictor matrix to further reduce the memory requirement and computation time. Our sparse genetic matrix implementation uses both the compact 2-bit representation and a simplified version of the compressed sparse block format, so that matrix-vector multiplications can be effectively parallelized on multiple CPU cores. To demonstrate the effectiveness of this representation, we implement an accelerated proximal gradient method to solve group Lasso on these sparse genetic matrices. This solver is named sparse-snpnet and will also be included as part of the snpnet R package. Our implementation can solve Lasso and group Lasso problems for linear, logistic and Cox regression on sparse genetic matrices that contain 1,000,000 variants and almost 100,000 individuals within 10 minutes, using less than 32 GB of memory.
AVAILABILITY: https://github.com/rivas-lab/snpnet/tree/compact.
© The Author(s) (2021). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
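
The 2-bit encoding described in the abstract can be illustrated with a minimal sketch. This is not the snpnet implementation: the function names, the choice of bit pattern 3 for NA, the little-endian packing order within each byte, and the decision to skip missing entries in the dot product are all assumptions made for illustration only.

```python
from typing import List, Optional, Sequence

# Assumption for this sketch: the fourth 2-bit pattern marks a missing genotype.
NA_CODE = 3


def pack_genotypes(genotypes: Sequence[Optional[int]]) -> bytearray:
    """Pack genotypes in {0, 1, 2, None} into 2 bits each, four per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g is not None and g not in (0, 1, 2):
            raise ValueError(f"genotype at index {i} must be 0, 1, 2 or None")
        code = NA_CODE if g is None else g
        # Entry i lives in byte i // 4, at bit offset 2 * (i % 4).
        packed[i // 4] |= code << (2 * (i % 4))
    return packed


def unpack_genotype(packed: bytearray, i: int) -> Optional[int]:
    """Read entry i back from the packed buffer; None means missing."""
    code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
    return None if code == NA_CODE else code


def packed_dot(packed: bytearray, n: int, weights: Sequence[float]) -> float:
    """Dot product of a packed genotype column of length n with a weight vector.

    Assumption: missing entries contribute nothing (treated as 0 here).
    """
    total = 0.0
    for i in range(n):
        g = unpack_genotype(packed, i)
        if g:  # skips both 0 and None
            total += g * weights[i]
    return total


column: List[Optional[int]] = [0, 1, 2, None, 2, 0, 1]
packed = pack_genotypes(column)
# Round trip: unpacking recovers the original column, including the NA.
assert [unpack_genotype(packed, i) for i in range(len(column))] == column
# Seven entries fit in two bytes; seven doubles would need 56 bytes.
print(len(packed), 8 * len(column))
```

The factor of 32 quoted in the abstract follows from 64 bits per double against 2 bits per packed entry. A production solver would operate on whole bytes or machine words at a time rather than entry by entry, as this sketch does.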


Year:  2021        PMID: 34146108      PMCID: PMC9206591          DOI: 10.1093/bioinformatics/btab452

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


