Ross E Curtis, Anuj Goyal, Eric P Xing.
Abstract
BACKGROUND: Structured association mapping is proving to be a powerful strategy to find genetic polymorphisms associated with disease. However, these algorithms are often distributed as command-line implementations that require expertise and effort to customize and put into practice. Because these cutting-edge techniques are difficult to use, geneticists often revert to simpler, less powerful methods.
Year: 2012 PMID: 22471660 PMCID: PMC3342145 DOI: 10.1186/1471-2156-13-24
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Algorithms available to run in Auto-SAM
| Algorithm | Type | Input | Output | Mouse run time (traits) | Mouse run time (genes) | Yeast run time | Automated steps |
|---|---|---|---|---|---|---|---|
| | SAM | G, P, Ep | G-P association | 0 05:05:50 | 1 16:17:45 | 2 06:43:56 | 17 |
| | SAM | G, P, Pop | G-P association | 2 09:07:47 | - | 3 11:09:46 | 3 |
| | SAM | G, P, Ep | G-P association | 0 01:12:03 | 0 12:53:52 | 0 05:08:42 | 15 |
| | SAM | G, P, Fg | G-P association | - | - | 1 20:35:47 | 8 |
| | SAM | T, Et, P, Ep, G/T assoc | T-P association | N/A | 0 01:54:04 | N/A | 19 |
| | AM | G, P | G-P association | 0 00:23:29 | 0 09:51:23 | 0 00:54:04 | 5 |
| | AM | G, P | G-P association | 0 00:10:21 | 0 01:05:51 | 0 00:14:32 | 4 |
| | AM | G, P | G-P association | 0 00:59:21 | 0 08:22:42 | 0 04:07:53 | 6 |
| | AM | G, P, Pop | G-P association | 0 00:57:24 | 2 19:02:46 | 1 03:00:44 | 5 |
| | network | P | Ep | 0 00:01:50 | 0 00:06:05 | 0 00:09:29 | 3 |
| | network | P | Ep | 0 00:07:31 | 0 01:19:37 | 0 01:51:24 | 10 |
| | network | P | Ep | 0 00:03:11 | 0 00:41:23 | 0 00:41:08 | 6 |
| | tree | P | tree | 0 00:43:32 | 0 01:27:32 | 0 01:05:04 | 3 |
| | population | G | Pop | 0 13:30:32 | 0 13:30:32 | 0 00:41:54 | 4 |
| | Network analysis | Ep | phenotype clusters | N/A | 0 12:29:24 | 0 01:03:22 | 4 |
We present a list of all algorithms available to run through the Auto-SAM system and GenAMap. We group the algorithms by type: SAM (structured association mapping), AM (association mapping), network generation, tree generation, population determination, and network analysis. The input for each algorithm can be G (genotype), P (phenotype), T (gene expression data), Ep (edges for the phenotype), Fg (features of the genotype), Et (edges for the gene expression values), and G/T (genome/transcriptome) associations. Times are given as D HH:MM:SS, where D is days, HH is hours, MM is minutes, and SS is seconds.
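To compare the run times in the table programmatically, the D HH:MM:SS strings can be converted to durations. The following is a minimal sketch; the function name `parse_run_time` is hypothetical, and treating `-` and `N/A` as missing values is an assumption, since the table does not say what those entries mean.

```python
from datetime import timedelta


def parse_run_time(text):
    """Parse a 'D HH:MM:SS' run time into a timedelta.

    Returns None for '-' and 'N/A' (assumed here to mean "no value").
    """
    text = text.strip()
    if text in {"-", "N/A"}:
        return None
    days, clock = text.split()
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days), hours=hours, minutes=minutes, seconds=seconds)


# One of the yeast run times listed in the table above.
print(parse_run_time("2 06:43:56"))  # 2 days, 6:43:56
```

Because `timedelta` values support comparison and addition, the parsed times can be sorted or summed directly.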
Figure 1. Software design of the Auto-SAM system. Locally, through a front-end GUI, the user uploads data to the data database. The GUI also communicates with the jobs database, submitting job requests that use the loaded data. A service running on the distributed Auto-SAM system continuously monitors the jobs database, spawning and monitoring jobs as they run through the Condor cluster.
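The pattern in this caption, a GUI that writes job requests to a database and a service that polls it, can be illustrated with a small sketch. This is not the Auto-SAM implementation: the table schema, the status values, and the names `make_jobs_db`, `submit_request`, and `poll_once` are all hypothetical, an in-memory SQLite database stands in for the jobs database, and the `launch` callback stands in for submission to the cluster.

```python
import sqlite3


def make_jobs_db():
    """Create an in-memory stand-in for the jobs database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, algorithm TEXT, status TEXT)"
    )
    return conn


def submit_request(conn, algorithm):
    """What the GUI side might do: record a job request as 'queued'."""
    cur = conn.execute(
        "INSERT INTO jobs (algorithm, status) VALUES (?, 'queued')", (algorithm,)
    )
    conn.commit()
    return cur.lastrowid


def poll_once(conn, launch):
    """One pass of the monitoring service: launch every queued job.

    `launch` is a placeholder for whatever hands the job to the cluster.
    """
    rows = conn.execute(
        "SELECT id, algorithm FROM jobs WHERE status = 'queued' ORDER BY id"
    ).fetchall()
    for job_id, algorithm in rows:
        launch(job_id, algorithm)
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
    conn.commit()
    return [job_id for job_id, _ in rows]


conn = make_jobs_db()
submit_request(conn, "GFlasso")
launched = []
poll_once(conn, lambda job_id, algorithm: launched.append((job_id, algorithm)))
```

A real service would call something like `poll_once` on a timer and would also update the status of running jobs; the sketch covers only the hand-off from request to launch.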
Figure 2. Monitoring jobs in Auto-SAM. We implemented a job-monitoring system that regularly checks the progress of each job in the database. Using this monitor, the analyst can follow each job's progress, request error information, and pause and kill jobs. This job monitor is integrated into GenAMap.
Figure 3. An overview of GenAMap visualization tools. GenAMap is a visualization tool for association mapping. We present a sampling of visualizations available in GenAMap: A) network analysis, B) association analysis, C) association-by-population analysis, D) three-way genome-transcriptome-phenome analysis.
GFlasso processing pipeline
| Description | Stage | No. jobs using yeast data | Avg. time/job on yeast data | Actual time from start to finish | Time saved via parallelization |
|---|---|---|---|---|---|
| Lasso stage 1 | Preprocessing SNPs | 23 | 00:00:25 | 00:02:59 | 00:06:36 |
| Lasso validation error | Preprocessing SNPs | 1 | 00:02:14 | 00:02:14 | 00:00:00 |
| Lasso stage 2 | Preprocessing SNPs | 23 | 00:01:16 | 00:03:40 | 00:25:34 |
| Lasso validation error | Preprocessing SNPs | 1 | 00:00:39 | 00:00:39 | 00:00:00 |
| Marker Processing | Preprocessing SNPs | 1 | 00:00:10 | 00:00:10 | 00:00:00 |
| Connected component analysis | Preprocessing traits | 1 | 00:00:56 | 00:00:56 | 00:00:00 |
| Spectral clustering | Preprocessing traits | 118 | 00:00:56 | 00:06:44 | 01:43:03 |
| Trait Processing | Preprocessing traits | 1 | 00:06:02 | 00:06:02 | 00:00:00 |
| GFlasso stage 1 | GFlasso optimization | 130 | 00:51:34 | 01:41:36 | 110:02:30 |
| GFlasso validation error stage 1 | GFlasso optimization | 1 | 01:43:01 | 01:43:01 | 00:00:00 |
| GFlasso stage 2 | GFlasso optimization | 130 | 05:49:08 | 29:27:45 | 726:59:05 |
| GFlasso validation error stage 2 | GFlasso optimization | 1 | 00:45:26 | 00:45:26 | 00:00:00 |
| GFlasso stage 3 | GFlasso optimization | 130 | 03:18:04 | 10:56:18 | 418:12:37 |
| GFlasso validation error stage 3 | GFlasso optimization | 1 | 00:56:32 | 00:56:32 | 00:00:00 |
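The last column appears to follow from the others as (number of jobs × average time per job) minus the elapsed time. That relationship is inferred from the table values, not stated in the source. It reproduces the Lasso stage 1 entry exactly and comes within about a minute of the other multi-job rows, presumably because the published averages are rounded. A minimal sketch under that assumption, with hypothetical helper names:

```python
def to_seconds(clock):
    """Convert 'HH:MM:SS' to seconds; the hours field may exceed 24."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_clock(total_seconds):
    """Format a non-negative number of seconds as 'HH:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def time_saved(n_jobs, avg_per_job, elapsed):
    """Estimated serial time minus elapsed time (assumed definition)."""
    serial = n_jobs * to_seconds(avg_per_job)
    return to_clock(max(serial - to_seconds(elapsed), 0))


# Lasso stage 1: 23 jobs averaging 00:00:25, finished in 00:02:59.
print(time_saved(23, "00:00:25", "00:02:59"))  # 00:06:36
```

For single-job steps the estimate is zero, which matches the 00:00:00 entries in the table.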