| Literature DB >> 18808714 |
Gonzalo Vera, Ritsert C Jansen, Remo L Suppi.
Abstract
BACKGROUND: R is the preferred tool for statistical analysis of many bioinformaticians due in part to the increasing number of freely available analytical methods. Such methods can be quickly reused and adapted to each particular experiment. However, in experiments where large amounts of data are generated, for example using high-throughput screening devices, the processing time required to analyze data is often quite long. A solution to reduce the processing time is the use of parallel computing technologies. Because R does not support parallel computations, several tools have been developed to enable such technologies. However, these tools require multiple modications to the way R programs are usually written or run. Although these tools can finally speed up the calculations, the time, skills and additional resources required to use them are an obstacle for most bioinformaticians.Entities:
Year: 2008 PMID: 18808714 PMCID: PMC2557021 DOI: 10.1186/1471-2105-9-390
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example using R/parallel. To parallelize a loop, only an if-else structure needs to be added. The loop to be parallelized is placed inside the else body, and the parallelizer function runParallel inside the if body. The final step is to indicate in runParallel the names of the variables used to accumulate the partial results and the operations to apply after each iteration. Other arguments, such as the number of parallel processes (workers), are optional. Detailed documentation and examples can be found on the project web page, as well as within the package as R help pages.
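Based on the caption's description of Figure 1, a loop parallelized with R/parallel might look like the sketch below. The function name runParallel comes from the caption; the argument names (resultVar, resultOp, nWorkers), the package-detection condition, and the analysis function are illustrative assumptions, not the package's documented API:

```r
# Hedged sketch of the if-else pattern described in Figure 1.
# Argument names and the analysis code are assumptions for illustration.
if (require("rparallel", quietly = TRUE)) {
  # The parallelizer goes in the if body: we indicate the variable
  # that accumulates partial results ("pvals"), the operation used to
  # merge them after each iteration ("rbind"), and, optionally, the
  # number of parallel worker processes.
  runParallel(resultVar = "pvals", resultOp = "rbind", nWorkers = 4)
} else {
  # The original sequential loop goes in the else body, unchanged.
  pvals <- NULL
  for (i in 1:nrow(genotypes)) {
    pval  <- marker.test(genotypes[i, ], phenotype)  # hypothetical analysis step
    pvals <- rbind(pvals, pval)
  }
}
```

The appeal of this pattern is that the loop body itself is untouched: the same script runs sequentially when the package is absent and in parallel when it is available. Consult the project's documentation for the actual argument names.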
Figure 2. Performance results. A) Speedup increases linearly with the number of cores used; setting more workers (5) than available cores (4) does not improve the results. B) The super-linear speedup exceeds the theoretical maximum given by the number of processing units because the individual tasks run faster. C) With a conservative average load of 15% from other tasks, the computer is overloaded when R/parallel claims 100% of the CPU. By reducing the percentage allotted to R/parallel (via an optional argument), we can recover responsiveness and keep working on other tasks while our calculations run.