Jeyarajan Thiyagalingam, Lykourgos Kekempanos, Simon Maskell.
Abstract
Particle filtering is a numerical Bayesian technique that has great potential for solving sequential estimation problems involving non-linear and non-Gaussian models. Since the estimation accuracy achieved by particle filters improves as the number of particles increases, it is natural to consider as many particles as possible. MapReduce is a generic programming model that makes it possible to scale a wide variety of algorithms to Big data. However, despite the application of particle filters across many domains, little attention has been devoted to implementing particle filters using MapReduce. In this paper, we describe an implementation of a particle filter using MapReduce. We focus on the component that would otherwise be a bottleneck to parallel execution: the resampling component. We devise a new implementation of this component, which requires no approximations, has O(N) spatial complexity and deterministic O((log N)^2) time complexity. Results demonstrate the utility of this new component and culminate in consideration of a particle filter with 2^24 particles being distributed across 512 processor cores.
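The resampling component discussed in the abstract can be illustrated with the standard sequential form of minimum variance (systematic) resampling; the function name and the single-uniform-offset formulation below are illustrative sketches, not the paper's parallel implementation:

```python
import math
import random

def minimum_variance_resample(weights, u=None):
    """Sketch of minimum variance (systematic) resampling, sequential form.

    weights: normalised particle weights summing to 1.
    u: single uniform offset in [0, 1); drawn at random if not supplied.
    Returns the number of copies to make of each particle; the copy
    counts sum to N and each differs from N * weight by less than one.
    """
    n = len(weights)
    if u is None:
        u = random.random()
    counts, prev = [], 0
    cum = 0.0
    for w in weights:
        cum += w
        # count of grid points (k + u)/N that fall below the cumulative weight
        ticks = max(0, math.ceil(n * cum - u))
        counts.append(ticks - prev)
        prev = ticks
    return counts
```

The paper's contribution is performing the equivalent of this loop (and the subsequent redistribution of copies) in parallel with deterministic O((log N)^2) time, rather than the O(N) sequential pass shown here.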
Keywords: Big data sampling; MCMC methods; MapReduce; Particle filters; Resampling
Year: 2017 PMID: 32010202 PMCID: PMC6959401 DOI: 10.1186/s13634-017-0505-9
Source DB: PubMed Journal: EURASIP J Adv Signal Process ISSN: 1687-6172
Fig. 1 General MapReduce processing model
Theoretical complexities (in terms of time, space and total data transfers per unit time) of various algorithmic components of the particle filter with N data and P processors
| Section | Algorithmic component | Time | Space | Data transfers |
|---|---|---|---|---|
| | Element-wise operations | | | |
| | Rotation | | | |
| | Sum/max/min | | | |
| | Cumulative sum | | | |
| | Normalising the weights | | | |
| | Minimum variance resampling | | | |
| | (Bitonic) sort | | | |
| | Redistribution from [ | | | |
| | Improved redistribution | O((log N)^2) | O(N) | |
| | Naïve redistribution | | | |
Fig. 2 Example of cumulative sum for N=8 numbers. Subfigures a–d describe the sum computation, while the remaining balanced binary trees shown in subfigures e–g describe how the backward pass culminates in calculation of the cumulative sum of the given sequence
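The two-pass tree traversal that Fig. 2 depicts corresponds to the classic up-sweep/down-sweep prefix-sum scheme; the sequential sketch below (function name illustrative) simulates both passes on an array whose length is a power of two:

```python
def tree_cumulative_sum(x):
    """Sketch of a tree-based cumulative sum (N a power of two).

    Forward (up-sweep) pass: pairwise sums climb a balanced binary tree.
    Backward (down-sweep) pass: partial sums are pushed back down,
    yielding the exclusive prefix sums; adding the inputs back gives the
    inclusive cumulative sum. Each pass takes O(log N) parallel steps.
    """
    n = len(x)
    a = list(x)
    # up-sweep: a[i + 2d - 1] accumulates the sum of its subtree
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):
            a[i + 2 * d - 1] += a[i + d - 1]
        d *= 2
    # down-sweep: convert subtree sums into exclusive prefix sums
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            left = a[i + d - 1]
            a[i + d - 1] = a[i + 2 * d - 1]
            a[i + 2 * d - 1] += left
        d //= 2
    # inclusive cumulative sum = exclusive prefix sums + original values
    return [e + v for e, v in zip(a, x)]
```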
Fig. 3 Example of bitonic sort using eight numbers. Each horizontal wire corresponds to a core. Blue denotes that the larger value will be stored at the lower wire after the comparison, while green denotes the opposite
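The compare-exchange network in Fig. 3 can be sketched sequentially as follows, with one processor core per wire replaced by a loop; this is the textbook bitonic sorter for N a power of two, not the paper's Spark/Hadoop mapping:

```python
def bitonic_sort(x):
    """Sketch of bitonic sort for N a power of two (cf. Fig. 3).

    Each (k, j) stage is one round of N/2 independent compare-exchanges,
    which is what makes the network parallelisable across cores; there
    are O((log N)^2) such stages in total.
    """
    a = list(x)
    n = len(a)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # compare-exchange distance within this merge
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # swap when the pair is out of order for its direction
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```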
Fig. 4 An example of the redistribution for x=[10,9,12,6,1,3,14,2] and m=[4,2,1,1,0,0,0,0] using the original and improved (new) redistribute. The original redistribution always sorts the number-of-copies vector (bottom vector) in descending order, while this is not required in the new redistribution (e.g., see node no. 3). a Original. b New
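The required output of the redistribution step in Fig. 4 can be stated as a simple sequential reference (function name illustrative): each particle value x[i] must appear m[i] times in the result. The paper's contribution is computing this with a deterministic O((log N)^2)-time divide-and-conquer over cores rather than the O(N) pass shown here:

```python
def redistribute(x, m):
    """Sequential reference for the redistribution step (cf. Fig. 4).

    x: particle values.
    m: number of copies to make of each value, with sum(m) == len(x),
       as produced by minimum variance resampling.
    Returns the redistributed particle array of the same length N.
    """
    assert sum(m) == len(x), "copy counts must sum to the particle count"
    out = []
    for value, copies in zip(x, m):
        out.extend([value] * copies)
    return out
```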
Details of the experimental platform used for evaluation
| Details | Single node system | Multi-node system |
|---|---|---|
| Name | Platform 1 | Platform 2 |
| Number of nodes | 1 | 28 |
| Hardware cores | 16 | 512 |
| Operating system | Linux | IBM Unix |
| Primary memory | 16 GB | 384 GB |
| Spark version | 1.6.2 | 1.4.1 |
| Hadoop version | 2.7.2 | 2.7.1 |
Fig. 5 Worst-case performance of redistribution: platform 1. a Naïve implementation. b Proposed approach
Fig. 6 Worst-case performance of redistribution: platform 2. a Naïve implementation. b Proposed approach
Fig. 7 Ratio of average (and minimum and maximum) runtimes for worst-case and best-case scenarios using the deterministic and naïve redistribute
Fig. 8 Overall runtime profile of the particle filtering algorithm for the following implementations: a Sequential. b Hadoop. c Spark with 2^17 particles. d Spark with 2^20 particles
Fig. 9 Summation and cumulative summation on Spark and Hadoop. a Summation. b Cumulative summation
Fig. 10 Bitonic sort and minimum variance resampling on Spark and Hadoop. a Bitonic sort. b Minimum variance resampling
Fig. 11 Redistribution and the overall particle filtering on Spark and Hadoop. a Redistribution. b Overall particle filtering
Fig. 12 Performance of the two variants of the redistribution component (using Spark), one variant per panel (a, b)
Fig. 13 Performance of the overall particle filter using the two variants of the redistribution component, one variant per panel (a, b)
Fig. 14 Relative speedup and scalability of the variant of the redistribution component on platform 1. a Relative speedup. b Scalability
Fig. 15 Relative speedup and scalability of the variant of the redistribution component on platform 2. a Relative speedup. b Scalability
Fig. 16 Relative speedup and scalability of the overall particle filter algorithm using the variant of the redistribution component on platform 1. a Relative speedup. b Scalability
Fig. 17 Relative speedup and scalability of the overall particle filter algorithm using the variant of the redistribution component on platform 2. a Relative speedup. b Scalability
Fig. 18 Performance of summation using Spark with a fixed total number of values comprised of different numbers of keys and, therefore, different numbers of values per key