Savvas Petrou, Terence M Sloan, Muriel Mewissen, Thorsten Forster, Michal Piotrowski, Bartosz Dobrzelecki, Peter Ghazal, Arthur Trew, Jon Hill.
Abstract
The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function has close to optimal scaling on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, and so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms including cloud resources and a common desktop machine with multiprocessing capabilities.
Year: 2011 PMID: 23335858 PMCID: PMC3546371 DOI: 10.1002/cpe.1787
Source DB: PubMed Journal: Concurr Comput ISSN: 1532-0626 Impact factor: 1.536
Figure 1. The SPRINT framework architecture as described in [9].
Figure 2. How permutations are distributed among the available processes.
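The figure itself is not reproduced here, so the exact distribution scheme cannot be read from this page. As a minimal sketch, assuming a contiguous and near-equal block split (an assumption, not necessarily the scheme SPRINT uses), the permutation indices could be divided among processes as follows. The function name `split_permutations` is hypothetical.

```python
from typing import List


def split_permutations(total_permutations: int, process_count: int) -> List[range]:
    """Split permutation indices 0..total-1 into contiguous, near-equal chunks.

    Assumption: a plain block distribution; the real scheme may differ.
    """
    if process_count < 1:
        raise ValueError("process_count must be at least 1")
    base, extra = divmod(total_permutations, process_count)
    chunks = []
    start = 0
    for rank in range(process_count):
        # The first `extra` ranks take one additional permutation each,
        # so chunk sizes differ by at most one.
        size = base + (1 if rank < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


# Illustrative values only: 10 permutations over 3 processes.
for rank, chunk in enumerate(split_permutations(10, 3)):
    print(f"rank {rank}: permutations {chunk.start}..{chunk.stop - 1} ({len(chunk)} total)")
```

Because each process works on its own block independently, a split like this needs no communication until the partial results are combined.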
Profile of pmaxT implementation (HECToR)
| Process count | Pre-processing (s) | Broadcast parameters (s) | Create data (s) | Main kernel (s) | Compute (s) | Speedup | Speedup (kernel) |
|---|---|---|---|---|---|---|---|
| 1 | 0.260 | 0.001 | 0.010 | 795.600 | 0.002 | 1.00 | 1.00 |
| 2 | 0.261 | 0.004 | 0.012 | 406.204 | 0.884 | 1.95 | 1.95 |
| 4 | 0.259 | 0.009 | 0.013 | 207.776 | 0.005 | 3.82 | 3.82 |
| 8 | 0.260 | 0.013 | 0.013 | 104.169 | 0.489 | 7.58 | 7.63 |
| 16 | 0.259 | 0.015 | 0.013 | 51.931 | 0.713 | 15.03 | 15.32 |
| 32 | 0.259 | 0.017 | 0.013 | 25.993 | 0.784 | 29.40 | 30.60 |
| 64 | 0.259 | 0.020 | 0.013 | 13.028 | 0.611 | 57.11 | 61.06 |
| 128 | 0.259 | 0.023 | 0.013 | 6.516 | 0.662 | 106.48 | 122.09 |
| 256 | 0.260 | 0.024 | 0.013 | 3.257 | 0.611 | 190.99 | 244.27 |
| 512 | 0.260 | 0.028 | 0.013 | 1.633 | 0.606 | 313.09 | 487.20 |
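The two speedup columns appear to be ratios against the single-process run: the total speedup divides the sum of all timed sections, while the kernel speedup uses only the main kernel. The sketch below recomputes both for three rows copied from the table above, and adds parallel efficiency (speedup divided by process count), which the table does not report. The helper name `speedups` is hypothetical.

```python
# Section timings in seconds, copied from the HECToR profile rows:
# (pre-processing, broadcast parameters, create data, main kernel, compute)
profile = {
    1: (0.260, 0.001, 0.010, 795.600, 0.002),
    64: (0.259, 0.020, 0.013, 13.028, 0.611),
    512: (0.260, 0.028, 0.013, 1.633, 0.606),
}
KERNEL = 3  # index of the main kernel column in each tuple


def speedups(profile, baseline=1):
    """Return {process_count: (total speedup, kernel speedup, efficiency)}."""
    base = profile[baseline]
    result = {}
    for procs, sections in profile.items():
        total = sum(base) / sum(sections)
        kernel = base[KERNEL] / sections[KERNEL]
        result[procs] = (total, kernel, total / procs)
    return result


for procs, (total, kernel, efficiency) in speedups(profile).items():
    print(f"{procs:>4} processes: total {total:.2f}, kernel {kernel:.2f}, "
          f"efficiency {efficiency:.2f}")
```

The recomputed values agree with the tabulated ones to within rounding of the published timings. The widening gap between the two columns at high process counts comes from the sections that do not shrink as processes are added.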
Profile of pmaxT implementation (Quad Core desktop)
| Process count | Pre-processing (s) | Broadcast parameters (s) | Create data (s) | Main kernel (s) | Compute (s) | Speedup | Speedup (kernel) |
|---|---|---|---|---|---|---|---|
| 1 | 0.140 | 0.000 | 0.007 | 566.638 | 0.001 | 1.00 | 1.00 |
| 2 | 0.136 | 0.003 | 0.008 | 282.623 | 0.085 | 2.00 | 2.00 |
| 4 | 0.135 | 0.010 | 0.013 | 167.439 | 0.705 | 3.37 | 3.38 |
Figure 3. pmaxT speed-up on the various systems.
Comparing the elapsed run times of pmaxT and the original serial R implementation for processing two datasets of different size with increasing permutation count. The pmaxT runs were executed on 256 cores of HECToR whereas the serial R run times are estimates based on smaller permutation counts on a single core
| Input array dimension and size (genes × samples) | Permutation count | Total run time (s) | Serial run time (approximation) (s) |
|---|---|---|---|
| 36 612 × 76 (21.22 MB) | 500 000 | 73.18 | 20 750 (6 h) |
| | 1 000 000 | 146.64 | 41 500 (12 h) |
| | 2 000 000 | 290.22 | 83 000 (23 h) |
| 73 224 × 76 (42.45 MB) | 500 000 | 148.46 | 35 000 (10 h) |
| | 1 000 000 | 294.61 | 70 000 (20 h) |
| | 2 000 000 | 591.48 | 140 000 (39 h) |
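The caption states that the serial times are estimates based on smaller permutation counts, but not how the estimate was formed. The tabulated serial values double whenever the permutation count doubles, which is consistent with a linear extrapolation. The sketch below assumes that linear model (an assumption, not the authors' stated method) and uses a hypothetical helper, `extrapolate_serial_seconds`.

```python
def extrapolate_serial_seconds(measured_seconds: float,
                               measured_permutations: int,
                               target_permutations: int) -> float:
    """Scale a measured serial run time linearly with the permutation count.

    Assumption: run time is proportional to the number of permutations.
    """
    if measured_permutations <= 0:
        raise ValueError("measured_permutations must be positive")
    return measured_seconds * (target_permutations / measured_permutations)


# Values taken from the table for the 36 612 x 76 dataset.
estimate = extrapolate_serial_seconds(20_750, 500_000, 2_000_000)
parallel_seconds = 290.22  # pmaxT on 256 cores, 2 000 000 permutations

print(f"estimated serial time: {estimate:.0f} s ({estimate / 3600:.1f} h)")
print(f"estimated speedup over serial R: {estimate / parallel_seconds:.0f}x")
```

Note that a ratio formed this way compares pmaxT against the original serial R code, so it is not the same quantity as the speedup columns in the profile tables, which compare pmaxT against its own single-process run.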
Profile of pmaxT implementation (ECDF)
| Process count | Pre-processing (s) | Broadcast parameters (s) | Create data (s) | Main kernel (s) | Compute (s) | Speedup | Speedup (kernel) |
|---|---|---|---|---|---|---|---|
| 1 | 0.157 | 0.000 | 0.003 | 467.273 | 0.000 | 1.00 | 1.00 |
| 2 | 0.163 | 0.002 | 0.003 | 234.848 | 0.000 | 1.99 | 1.99 |
| 4 | 0.162 | 0.003 | 0.004 | 123.174 | 0.000 | 3.79 | 3.79 |
| 8 | 0.159 | 0.004 | 0.005 | 79.576 | 1.217 | 5.77 | 5.87 |
| 16 | 0.158 | 0.032 | 0.005 | 39.467 | 1.224 | 11.43 | 11.84 |
| 32 | 0.164 | 0.072 | 0.005 | 19.862 | 1.235 | 21.91 | 23.53 |
| 64 | 0.157 | 0.072 | 0.005 | 9.935 | 1.297 | 40.77 | 47.03 |
| 128 | 0.162 | 0.086 | 0.007 | 5.813 | 1.304 | 63.40 | 80.38 |
Profile of pmaxT implementation (Amazon EC2)
| Process count | Pre-processing (s) | Broadcast parameters (s) | Create data (s) | Main kernel (s) | Compute (s) | Speedup | Speedup (kernel) |
|---|---|---|---|---|---|---|---|
| 1 | 0.272 | 0.000 | 0.006 | 539.074 | 0.000 | 1.00 | 1.00 |
| 2 | 0.271 | 0.004 | 0.009 | 291.514 | 0.005 | 1.84 | 1.84 |
| 4 | 0.273 | 0.011 | 0.014 | 187.342 | 0.043 | 2.87 | 2.87 |
| 8 | 0.278 | 0.880 | 0.014 | 90.806 | 2.574 | 5.70 | 5.93 |
| 16 | 0.268 | 1.735 | 0.022 | 43.756 | 4.983 | 10.62 | 12.32 |
| 32 | 0.270 | 2.917 | 0.019 | 22.308 | 3.834 | 18.37 | 24.16 |
Profile of pmaxT implementation (Ness)
| Process count | Pre-processing (s) | Broadcast parameters (s) | Create data (s) | Main kernel (s) | Compute (s) | Speedup | Speedup (kernel) |
|---|---|---|---|---|---|---|---|
| 1 | 0.393 | 0.000 | 0.010 | 852.223 | 0.000 | 1.00 | 1.00 |
| 2 | 0.467 | 0.007 | 0.012 | 443.050 | 0.001 | 1.92 | 1.92 |
| 4 | 0.398 | 0.029 | 0.012 | 216.595 | 0.001 | 3.93 | 3.93 |
| 8 | 0.394 | 0.032 | 0.014 | 117.317 | 0.001 | 7.24 | 7.26 |
| 16 | 0.436 | 0.109 | 0.019 | 84.442 | 0.001 | 10.03 | 10.09 |