Abstract
The deluge of data emerging from high-throughput sequencing technologies poses large analytical challenges when testing for association to disease. We introduce a scalable framework for variable selection, implemented in C++ and OpenCL, that fits regularized regression across multiple Graphics Processing Units. Open source code and documentation can be found at a Google Code repository under the URL http://bioinformatics.oxfordjournals.org/content/early/2012/01/10/bioinformatics.bts015.abstract. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2012 PMID: 22238272 PMCID: PMC3289918 DOI: 10.1093/bioinformatics/bts015
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Computational requirements
| Method | Time per iteration (min : s) | Memory per node |
|---|---|---|
| 250 000 variables | | |
| | 2 : 20 | 46 GB |
| | 0 : 54 | 415 MB |
| | 0 : 1.85 | 415 MB |
| | 0 : 0.93 | 208 MB |
| 1 million variables | | |
| | NA | NA |
| | 3 : 47 | 1.7 GB |
| | 0 : 7.7 | 1.7 GB |
| | 0 : 3.8 | 863 MB |
A total of 7000 subjects in all analyses.
Fig. 1. ROC for simulations based on 1KGP exome data.
Stable variables
| SNP ID | Chr | Position | Selection probability |
|---|---|---|---|
| rs10505483 | 8 | 128 194 377 | 0.68 |
| rs7130881 | 11 | 68 752 534 | 0.95 |
| rs7210100 | 17 | 44 791 748 | 0.8 |
All reported variables are based on a threshold π_thr = 0.506, which controls the FDR at <0.05.
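
The thresholding rule described in the table note can be illustrated with a minimal sketch. It assumes that "selection probability" means the fraction of resampled fits in which a variable is selected, as in stability selection, and that a variable is reported when its probability strictly exceeds the threshold; the record itself states neither. The function name `stable_variables` is hypothetical and is not taken from the authors' software.

```python
# Threshold reported in the table note.
PI_THR = 0.506

# Selection probabilities copied from the "Stable variables" table.
selection_probability = {
    "rs10505483": 0.68,
    "rs7130881": 0.95,
    "rs7210100": 0.80,
}


def stable_variables(probabilities, threshold):
    """Return IDs whose selection probability exceeds the threshold, highest first.

    Assumption: a strict comparison (p > threshold) is used; the source does not
    say whether the cutoff is inclusive.
    """
    kept = [(snp, p) for snp, p in probabilities.items() if p > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [snp for snp, _ in kept]


stable = stable_variables(selection_probability, PI_THR)
print(stable)
```

All three SNPs in the table clear the threshold, so the sketch only reorders them by selection probability. A variable with a probability of 0.5, for example, would be dropped.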