Simon Dirmeier, Christiane Fuchs, Nikola S Mueller, Fabian J Theis.
Abstract
Summary: Modelling biological associations or dependencies using linear regression is often complicated when the analyzed data sets are high-dimensional and fewer observations than variables are available (n ≪ p). For genomic data sets, penalized regression methods have been applied to address this issue. Recently proposed regression models utilize prior knowledge on dependencies, e.g. in the form of graphs, arguing that this information will lead to more reliable estimates for regression coefficients. However, none of the proposed models for multivariate genomic response variables has been implemented as a computationally efficient, freely available library. In this paper we propose netReg, a package for graph-penalized regression models that use large networks and thousands of variables. netReg incorporates a priori generated biological graph information into linear models, yielding sparse or smooth solutions for regression coefficients.
Availability and implementation: netReg is implemented both as an R package and as a C++ command-line tool. The main computations are done in C++, where we use Armadillo for fast matrix calculations and Dlib for optimization. The R package is freely available on Bioconductor (https://bioconductor.org/packages/netReg). The command-line tool can be installed using the Bioconda conda channel. Installation details, issue reports, development versions, documentation and tutorials for the R and C++ versions, and the R package vignette can be found on GitHub (https://dirmeier.github.io/netReg/). The GitHub page also contains code for benchmarking and the example data sets used in this paper.
Contact: simon.dirmeier@bsse.ethz.ch.
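The abstract describes graph-penalized linear models that yield sparse or smooth coefficients. As a hedged illustration only, assume a LASSO term plus a graph-Laplacian smoothness penalty on the covariables, for a response matrix Y (n × q), a design matrix X (n × p) and coefficients B (p × q):

$$\hat{B} = \arg\min_{B} \tfrac{1}{2}\lVert Y - XB \rVert_F^2 + \lambda \lVert B \rVert_1 + \tfrac{\psi}{2}\,\mathrm{tr}\left(B^\top L B\right), \qquad L = D - A,$$

where A is the adjacency matrix of a prior covariable graph and D its degree matrix. This form is an assumption and may differ from the paper's Equation (1), which is not reproduced in this record; it is also not netReg's C++ implementation. The pure-Python sketch minimizes it with proximal gradient descent (ISTA), and all function names in it are hypothetical. Setting psi = 0 reduces it to the LASSO baseline used in the figures.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def laplacian(adj: Matrix) -> Matrix:
    """Graph Laplacian L = D - A of a symmetric, non-negative adjacency matrix."""
    p = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(p)]
            for i in range(p)]


def soft_threshold(x: float, tau: float) -> float:
    """Proximal operator of tau * |x| (the LASSO shrinkage step)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def fit_graph_penalized(X: Matrix, Y: Matrix, adj: Matrix,
                        lam: float, psi: float, n_iter: int = 500) -> Matrix:
    """Hypothetical helper: minimise 0.5*||Y - XB||_F^2 + 0.5*psi*tr(B^T L B)
    + lam*||B||_1 with ISTA. The graph penalty acts on covariables only."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    L = laplacian(adj)
    p = len(XtX)
    # Gradient of the smooth part is M B - X^T Y with M = X^T X + psi * L.
    M = [[XtX[i][j] + psi * L[i][j] for j in range(p)] for i in range(p)]
    XtY = matmul(Xt, Y)
    q = len(XtY[0])
    # The max absolute row sum bounds the largest eigenvalue of M,
    # so 1 / bound is a safe (if conservative) step size.
    bound = max(sum(abs(v) for v in row) for row in M)
    step = 1.0 / bound if bound > 0 else 1.0
    B = [[0.0] * q for _ in range(p)]
    for _ in range(n_iter):
        MB = matmul(M, B)
        # Gradient step on the smooth part, then soft-thresholding for the L1 term.
        B = [[soft_threshold(B[i][j] - step * (MB[i][j] - XtY[i][j]), step * lam)
              for j in range(q)] for i in range(p)]
    return B


if __name__ == "__main__":
    # Two covariables joined by one edge; no L1 penalty, graph weight psi = 1.
    B = fit_graph_penalized([[1.0, 0.0], [0.0, 1.0]], [[1.0], [3.0]],
                            [[0.0, 1.0], [1.0, 0.0]], lam=0.0, psi=1.0)
    print([[round(v, 3) for v in row] for row in B])  # [[1.667], [2.333]]
```

In the usage example the graph edge pulls the two coefficients (1 and 3 without the penalty) toward each other, which is the "smooth" behaviour; raising lam instead shrinks small coefficients to exactly zero, which is the "sparse" behaviour. The row-sum bound avoids needing an eigenvalue solver in pure Python.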
Year: 2018 PMID: 29077797 PMCID: PMC6030897 DOI: 10.1093/bioinformatics/btx677
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Timings of a pure R versus netReg implementation
| Implementation | Setting 1 | Setting 2 | Setting 3 |
|---|---|---|---|
| R | 2009 ms | 578 s | > 3 d |
| netReg | 25 ms | 12 s | 2.5 h |
Note: For each setting, measurements are averaged over 10 runs with q = 10 response variables.
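The note states that each timing is an average over 10 runs. A minimal sketch of that averaging in Python, assuming wall-clock timing with `time.perf_counter`; `mean_runtime_ms` is a hypothetical helper, and the authors' actual benchmarking code (on the GitHub page) may measure differently.

```python
import statistics
import time
from typing import Callable


def mean_runtime_ms(fn: Callable[[], object], runs: int = 10) -> float:
    """Hypothetical helper: call fn `runs` times, return mean wall-clock time in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Any zero-argument callable can stand in for a model fit.
    print(mean_runtime_ms(lambda: sum(range(100_000))))
```

Averaging smooths out run-to-run jitter; for very short runs such as the 25 ms entry, repeating each measurement matters more than for the multi-hour ones.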
Fig. 1. Mean residual sum of squares for LASSO versus netReg [Equation (1)]. netReg consistently outperforms the LASSO across different numbers of observations n and covariables p, and across different levels of Gaussian noise with mean 0 and variance σ² ∈ {1, 2, 5} (low, medium, high). Boxes show the 25, 50 and 75% quantiles.
Fig. 2. Mean residual sum of squares for LASSO versus netReg [Equation (1)]. netReg and the LASSO yield similar coefficient estimates.
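Both captions compare methods by their residual sum of squares (RSS), and Fig. 1 summarizes the spread with boxes at the 25, 50 and 75% quantiles. A small sketch of those summaries, assuming RSS is summed over all entries of an n × q response matrix and that the mean and quartiles are taken across repeated simulation runs; the helper names are hypothetical and the numbers in the usage lines are made up, not results from the paper.

```python
import statistics
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def residual_sum_of_squares(Y: Matrix, Y_hat: Matrix) -> float:
    """RSS = sum over all i, j of (Y[i][j] - Y_hat[i][j])^2."""
    if len(Y) != len(Y_hat):
        raise ValueError("Y and Y_hat must have the same number of rows")
    total = 0.0
    for row, row_hat in zip(Y, Y_hat):
        if len(row) != len(row_hat):
            raise ValueError("Y and Y_hat must have the same shape")
        total += sum((a - b) ** 2 for a, b in zip(row, row_hat))
    return total


def summarize_runs(rss_per_run: List[float]) -> Dict[str, float]:
    """Mean RSS plus the 25%, 50% and 75% quantiles drawn by a box plot.
    statistics.quantiles uses its default 'exclusive' method here; plotting
    libraries may interpolate quantiles differently."""
    q25, q50, q75 = statistics.quantiles(rss_per_run, n=4)
    return {"mean": statistics.mean(rss_per_run), "q25": q25, "median": q50, "q75": q75}


if __name__ == "__main__":
    # Illustrative values only (not from the paper).
    print(residual_sum_of_squares([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [2.0, 4.0]]))  # 2.0
    print(summarize_runs([2.0, 4.0, 6.0, 8.0, 10.0]))
```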