Hiroyuki Mishima, Andrew C Lidral, Jun Ni.
Abstract
BACKGROUND: Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and consequently require high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most commonly used statistical packages for genetic studies are not parallelized. Alternatively, one may use grid computing and its packages to run non-parallel genetic statistical packages on a centralized HPC system or on distributed computing systems. In this paper, we report the use of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies.
Year: 2008 PMID: 18541045 PMCID: PMC2423433 DOI: 10.1186/1471-2105-9-S6-S10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Consecutive and combinational window haplotypes. Examples of consecutive and combinational window haplotypes for four loci. Each column of squares indicates a locus, and each row indicates a haplotype window.
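The two window types in Figure 1 can be enumerated with a short sketch. It assumes that a consecutive window is any run of adjacent loci and that a combinational window is any non-empty subset of loci, adjacent or not; the function names and the 1-based locus numbering are illustrative and not taken from the paper.

```python
from itertools import combinations


def consecutive_windows(n_loci, max_size=None):
    """Windows of adjacent loci (1-based), for sizes 1..max_size."""
    max_size = n_loci if max_size is None else max_size
    return [
        tuple(range(start, start + size))
        for size in range(1, max_size + 1)
        for start in range(1, n_loci - size + 2)
    ]


def combinational_windows(n_loci, max_size=None):
    """Every subset of loci (1-based) of sizes 1..max_size (assumed definition)."""
    max_size = n_loci if max_size is None else max_size
    loci = range(1, n_loci + 1)
    return [
        combo
        for size in range(1, max_size + 1)
        for combo in combinations(loci, size)
    ]


if __name__ == "__main__":
    cons = consecutive_windows(4)
    comb = combinational_windows(4)
    print(len(cons), cons)  # 10 consecutive windows for four loci
    print(len(comb), comb)  # 15 combinational windows for four loci
```

Under these assumptions the number of combinational windows grows much faster with the number of loci than the number of consecutive windows, which would explain why the table restricts CombWH analyses to window sizes 1–5.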
Computation Performance. Computation performance of each method. The haplotype window types analyzed were consecutive window haplotypes (ConsWH) and combinational window haplotypes (CombWH). Fold acceleration is defined as the accumulated time over all processes divided by the actual elapsed time. Acceleration linearity is defined as fold acceleration divided by the number of compute nodes used. Twenty-two nodes were used for every analysis except the Unphased CombWH analysis, which used five.
| program | option | analyzed haplotype | window size | elapsed time | accumulated time | fold acceleration | acceleration linearity |
|---|---|---|---|---|---|---|---|
| FBAT | -e | ConsWH | 1–26 | 2.2 min | 31.7 min | 14.4 | 65.5% |
| FBAT | -e | CombWH | 1–5 | 1.9 min | 30.3 min | 15.9 | 72.3% |
| Unphased | -uncertain | ConsWH | 1–17 | 69.9 days | 909.0 days | 13.0 | 59.1% |
| Unphased | -certain | ConsWH | 1–26 | 13.8 min | 256.1 min | 18.6 | 84.5% |
| Unphased | -certain | CombWH | 1–5 | 6.5 days | 7.2 days | 1.1 | 22.0% |
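The two derived columns can be recomputed from the reported times. The sketch below treats fold acceleration as accumulated time divided by elapsed time, which is the direction the table values follow (for example, 31.7 min / 2.2 min ≈ 14.4), and acceleration linearity as fold acceleration divided by the node count. The function names are illustrative. Because the published times are rounded, the recomputed percentages can differ from the table in the last digit.

```python
def fold_acceleration(accumulated_time, elapsed_time):
    """Accumulated process time divided by wall-clock elapsed time (same units)."""
    return accumulated_time / elapsed_time


def acceleration_linearity(fold, n_nodes):
    """Fraction of ideal linear speed-up achieved on n_nodes compute nodes."""
    return fold / n_nodes


# (label, elapsed, accumulated, nodes); times copied from the table rows above.
ROWS = [
    ("FBAT -e ConsWH", 2.2, 31.7, 22),
    ("FBAT -e CombWH", 1.9, 30.3, 22),
    ("Unphased -uncertain ConsWH", 69.9, 909.0, 22),
    ("Unphased -certain ConsWH", 13.8, 256.1, 22),
    ("Unphased -certain CombWH", 6.5, 7.2, 5),
]

if __name__ == "__main__":
    for label, elapsed, accumulated, nodes in ROWS:
        fold = fold_acceleration(accumulated, elapsed)
        linearity = acceleration_linearity(fold, nodes)
        print(f"{label}: fold {fold:.1f}, linearity {linearity:.1%}")
```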
Figure 2. Elapsed time for Unphased processes. Distribution of Unphased computation time for each window size. For window sizes 1–3, all haplotypes sharing the same window size were analyzed in a single process. For window sizes 4–17, each haplotype was analyzed in its own process. The x-axis indicates the size of the consecutive windows. The y-axis indicates the common logarithm of elapsed seconds. Each boxplot indicates the first quartile (bottom of the box), third quartile (top of the box), median (bold line in the box), smallest non-outlier (lower whisker), largest non-outlier (upper whisker), and outliers (circles), defined as points lying more than 1.5 times the box height beyond the box.
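The boxplot elements listed in the caption can be computed as in the following sketch. It assumes quartiles from Python's `statistics.quantiles` with the inclusive method; the plotting software used for the figure may compute its box edges slightly differently. The elapsed times in the demo are made up for illustration and are not data from the study.

```python
import math
import statistics


def boxplot_summary(values):
    """Box, whiskers, and outliers using the 1.5 x box-height rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_height = q3 - q1  # interquartile range
    lower_fence = q1 - 1.5 * box_height
    upper_fence = q3 + 1.5 * box_height
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),  # smallest non-outlier
        "upper_whisker": max(inside),  # largest non-outlier
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds for one window size.
    elapsed_seconds = [120, 150, 180, 200, 260, 90000]
    log_seconds = [math.log10(s) for s in elapsed_seconds]
    print(boxplot_summary(log_seconds))
```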
Figure 3. Array job execution scheme. Scheme of array job execution using the Grid Engine for FBAT (A) and Unphased (B). The Grid Engine assigns the array job ID to the environment variable SGE_TASK_ID. Script execution is then cascaded.
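A task script might map its array job ID to one haplotype window roughly as follows. This is a minimal sketch, not the authors' scripts: it assumes the job was submitted as a Grid Engine array job (for example with `qsub -t 1-N`), that task IDs start at 1, and that each task handles exactly one window. The window list and function names are hypothetical, and the actual FBAT or Unphased invocation is left out.

```python
import os


def task_index(environ=os.environ):
    """Zero-based index derived from the 1-based SGE_TASK_ID variable."""
    raw = environ.get("SGE_TASK_ID")
    # Outside an array job the variable is missing or not a number.
    if raw is None or not raw.isdigit():
        raise RuntimeError("SGE_TASK_ID is not set to a task number")
    return int(raw) - 1


def window_for_task(windows, environ=os.environ):
    """Pick the haplotype window assigned to this array task."""
    index = task_index(environ)
    if not 0 <= index < len(windows):
        raise IndexError(f"task {index + 1} has no window assigned")
    return windows[index]


if __name__ == "__main__":
    # Hypothetical windows over four loci, one per array task.
    windows = [(1,), (2,), (3,), (4,), (1, 2), (2, 3), (3, 4)]
    demo_env = {"SGE_TASK_ID": "5"}  # stand-in for the scheduler's environment
    print(window_for_task(windows, demo_env))  # (1, 2)
```

Passing the environment as a parameter keeps the lookup testable without a running scheduler; on the cluster the default `os.environ` would be used.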