Kassiano J. Matteussi (1,2), Julio C. S. Dos Anjos (3), Valderi R. Q. Leithardt (4,5), Claudio F. R. Geyer (1).
Abstract
A significant rise in the adoption of streaming applications has changed decision-making processes in the last decade. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, and Flink. Spark Streaming, a widely used open-source implementation, processes data-intensive applications that often require large amounts of memory. However, the Spark Unified Memory Manager cannot properly handle sudden or intensive data surges and their related in-memory caching needs, resulting in performance and throughput degradation, high latency, frequent garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines, for both stateless and stateful applications. Furthermore, the evaluation points out the Spark Streaming limitations that lead to memory-related issues in data-intensive pipelines and stateful applications, and it indicates potential solutions.
Keywords: backpressure; big data; spark streaming; stream processing
Year: 2022 PMID: 35808249 PMCID: PMC9269592 DOI: 10.3390/s22134756
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Related Work: Detailed Overview.
| Work | Problem | Side Effect | Application State | | Memory Manager | Storage Level | Approach | Framework | |
|---|---|---|---|---|---|---|---|---|---|
| [ ] | End-to-end latency | Low throughput | Not specified | No | UMM | Not specified | Adaptive batch | Spark-core | No |
| [ ] | Memory shortage | High scheduling delay | Stateful | No | UMM | Memory only | Data-driven latency | Spark-core | No |
| [ ] | State checkpointing | Low throughput | Stateful | No | UMM | Memory and Disk | Backpressure from Spark | Spark-core | No |
| [ ] | Network latency | Load imbalance | Stateful | No | Flink over | Memory and Disk | Adaptive batching with | Flink-core | Yes |
| [ ] | Memory shortage | Low throughput | Stateful | No | Flink over | Memory and Disk | Backpressure mechanism | Flink-core | No |
| [ ] | Inefficient memory | OOM crashes | Stateless | No | UMM | Memory and Disk | Application semantics | Spark-core | No |
Spark Performance Counters.
| Performance Counters | Definition |
|---|---|
| Time | Timestamp of the batch interval that just finished. |
| Events | Number of records processed in the current batch. |
| Processing Time (PT) | Time, in ms, the job took to complete. |
| Scheduling Delay (SD) | Time, in ms, the job spent in the scheduling queue. |
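These counters map onto Spark Streaming's `StreamingListener` API; a minimal Scala sketch of how they can be collected per batch (the listener class name and the println sink are illustrative, while the `batchInfo` fields are part of the public API):

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Emits the four counters above for every completed batch.
class CounterListener extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    val time   = info.batchTime.milliseconds         // Time
    val events = info.numRecords                     // Events
    val pt     = info.processingDelay.getOrElse(-1L) // PT (ms)
    val sd     = info.schedulingDelay.getOrElse(-1L) // SD (ms)
    println(s"time=$time events=$events pt=${pt}ms sd=${sd}ms")
  }
}
// Register with: ssc.addStreamingListener(new CounterListener)
```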
Figure 1. PID Controller Model Implementation.
Figure 2. Spark Backpressure PID Architecture.
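Figures 1 and 2 refer to the PID controller that drives Spark's backpressure. A condensed sketch of the rate update performed by `PIDRateEstimator`, following the structure of the open-source implementation (the constants are the documented defaults of the `spark.streaming.backpressure.pid.*` properties):

```scala
// Condensed sketch of the PIDRateEstimator update step
// (org.apache.spark.streaming.scheduler.rate, Spark 2.4.x). Returns a new
// ingestion rate in records/second from the statistics of the last batch.
def computeNewRate(
    latestRate: Double,          // rate granted to the batch that just finished
    latestError: Double,         // error kept from the previous update
    numElements: Long,           // Events: records processed in the batch
    processingDelayMs: Long,     // PT: how long the batch took to run
    schedulingDelayMs: Long,     // SD: how long the batch waited in the queue
    delaySinceUpdateSec: Double, // seconds elapsed since the previous update
    batchIntervalMs: Double): Double = {
  val proportional = 1.0   // spark.streaming.backpressure.pid.proportional
  val integral     = 0.2   // spark.streaming.backpressure.pid.integral
  val derivative   = 0.0   // spark.streaming.backpressure.pid.derived
  val minRate      = 100.0 // spark.streaming.backpressure.pid.minRate

  // Rate the cluster actually sustained during the last batch.
  val processingRate  = numElements / (processingDelayMs / 1000.0)
  val error           = latestRate - processingRate
  // Records queued up by scheduling delay, expressed as a rate to drain them.
  val historicalError = schedulingDelayMs * processingRate / batchIntervalMs
  val dError          = (error - latestError) / delaySinceUpdateSec

  val newRate = latestRate -
    proportional * error -
    integral * historicalError -
    derivative * dError
  math.max(newRate, minRate)
}
```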
Figure 3. Unified Memory Manager.
Figure 4. Memory Management Behaviour.
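Figures 3 and 4 refer to the Unified Memory Manager (UMM). Its on-heap regions follow a fixed formula; a minimal sketch assuming the Spark 2.x defaults `spark.memory.fraction = 0.6` and `spark.memory.storageFraction = 0.5`:

```scala
// On-heap region sizing of the Unified Memory Manager (Spark 2.x defaults).
// 300 MB is a reserve hard-coded in Spark; the rest is split by two fractions.
val reservedBytes = 300L * 1024 * 1024

// spark.memory.fraction (default 0.6): share of the usable heap that the UMM
// manages for execution + storage; the remainder holds user data structures.
def unifiedRegion(heapBytes: Long, memoryFraction: Double = 0.6): Long =
  ((heapBytes - reservedBytes) * memoryFraction).toLong

// spark.memory.storageFraction (default 0.5): the storage half of the unified
// region. Storage and execution can borrow from each other, but only this part
// of cached data is protected from eviction by execution tasks.
def storageRegion(heapBytes: Long, storageFraction: Double = 0.5): Long =
  (unifiedRegion(heapBytes) * storageFraction).toLong

// Example: the 117 GB executor heap used on Rennes (see the configuration
// table below) yields roughly a 35 GB storage region, close to the ~33 GB
// reported there.
```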
Software Stack.
| Operating System Debian 9, Kernel 4.9.0-11 amd64 |
| Hadoop 3.1.2 |
| Spark Streaming 2.4.3 |
| Java 1.8.0_81 |
| Scala 2.13 |
| OpenMPI 4.0.1 |
| ZeroMQ 3.1.1 |
Spark Configuration.
| Parameters | Rennes | Grenoble |
|---|---|---|
| Window (Batch Interval) | 2000 ms | 2000 ms |
| Block Interval | 400 ms | 400 ms |
| Concurrent DAGs | 1 | 1 |
| Spark Parallelism | default | default |
| #Driver Instances | 1 | 1 |
| #Executor Instances | 8 | 8 |
| #Receivers per Executor | 1 | 1 |
| Main Memory per Node | 128 GB | 192 GB |
| Driver JVM Heap Memory | 117 GB | 174 GB |
| Executor JVM Heap Memory | 117 GB | 174 GB |
| Executor UMM Storage Region | 33 GB | 49 GB |
| Executors Global Storage Region Memory | 264 GB | 392 GB |
| Executors Global JVM Heap Memory | 936 GB | 1392 GB |
| Cores per Executor (HT) | 32 | 64 |
| #Total Cores in the Spark Cluster | 256 | 512 |
| JVM Memory Schema | On-heap | On-heap |
| GC Type | G1GC | G1GC |
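The parameters above map onto standard Spark properties. A sketch of the corresponding `SparkConf` for the Rennes column (the application name and the initial-rate value are placeholders, and `spark.executor.instances` assumes a YARN deployment, plausible given Hadoop in the stack):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch of the Rennes configuration expressed as Spark properties.
val conf = new SparkConf()
  .setAppName("SUMServer")                              // placeholder name
  .set("spark.streaming.blockInterval", "400ms")        // Block Interval
  .set("spark.streaming.concurrentJobs", "1")           // Concurrent DAGs
  .set("spark.executor.instances", "8")                 // #Executor Instances
  .set("spark.driver.memory", "117g")                   // Driver JVM heap
  .set("spark.executor.memory", "117g")                 // Executor JVM heap
  .set("spark.streaming.backpressure.enabled", "true")  // PID backpressure
  // Optional cap on the first batches, before the PID has feedback to act on:
  .set("spark.streaming.backpressure.initialRate", "100000") // placeholder

// Window (Batch Interval) of 2000 ms.
val ssc = new StreamingContext(conf, Seconds(2))
```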
Pipeline Configurations.
| Code | Size | Data | MQs | Driver | Executors |
|---|---|---|---|---|---|
| Pipeline 1 | Soft | 8 | 1 | 1 | 8 |
| Pipeline 2 | High | 8 | 8 | 1 | 8 |
Figure 5. Stateless SUMServer Application-Pipeline 2-Parasilo Cluster.
Figure 6. Stateful SUMServer Application Without Backpressure—Pipeline 1.
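The SUMServer workload maintains a running sum per key across batches. A minimal Scala sketch of such a stateful aggregation, continuing from the `SparkConf` sketch above (the socket source, host, checkpoint path, and state function are illustrative, not the authors' code):

```scala
// Minimal stateful running-sum per key, in the spirit of SUMServer.
// updateStateByKey carries state across batches, which is what pressures the
// UMM storage region and checkpointing in data-intensive pipelines.
ssc.checkpoint("/tmp/sumserver-checkpoint") // state requires a checkpoint dir

val lines = ssc.socketTextStream("mq-host", 9999) // illustrative source
val pairs = lines.map { line =>
  val Array(key, value) = line.split(',') // assumes "key,value" records
  (key, value.toLong)
}

// Accumulate the sum of all values seen so far for each key.
val runningSums = pairs.updateStateByKey[Long] { (values: Seq[Long], state: Option[Long]) =>
  Some(state.getOrElse(0L) + values.sum)
}

runningSums.print()
ssc.start()
ssc.awaitTermination()
```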
Performance Indicators of Stateful SUMServer Application—Pipeline 1 without Backpressure.
| Metrics | Parasilo | Dahu |
|---|---|---|
| AVG Throughput (MBps) | 870 | 918 |
| AVG PT (ms) | 1585 | 2094 |
| AVG SD (ms) | 519 | 50,679 |
| AVG Proc. Events | 95,051 | 100,279 |
Figure 7. Stateful SUMServer Application Without Backpressure—Pipeline 2.
Crash-Related Performance Indicators of Stateful SUMServer Application—Pipeline 2 without Backpressure.
| Metrics | Parasilo | Dahu |
|---|---|---|
| MAX PT (ms) | 63,371 | 69,044 |
| MAX SD (ms) | 90,792 | 137,480 |
| Crashing Start Time (s) | 14 | 26 |
Figure 8. Backpressure Initial Rate Feature Comparison for Stateful SUMServer Application—Pipeline 2.
Backpressure Initial Value Comparison.
| Cluster | Initial Value Set | Initial Value Not Set |
|---|---|---|
| Parasilo | 874 | 764 |
| Dahu | 1076 | 590 |
Figure 9. Stateful SUMServer Application With Backpressure—Pipeline 2.
GC Statistics for Stateful SUMServer Application Without Backpressure—Pipeline 2.
| GC Cause | Dahu Cluster | Dahu Cluster | Parasilo Cluster | Parasilo Cluster |
|---|---|---|---|---|
| Ergonomics | 0 | 133 | 0 | 19 |
| Allocation Failure | 13 | 1392 | 36 | 184 |
| GCLocker Initiated GC | 6 | 25 | 0 | 9 |
| Metadata GC Threshold | 0 | 22 | 3 | 2 |
| Total | 19 | 1572 | 39 | 214 |
| GC Operations | ||||
| Minor GC stats | 16 | 1428 | 36 | 194 |
| Full GC stats | 3 | 144 | 3 | 20 |
| Total | 19 | 1572 | 39 | 214 |
| Performance Indicators | ||||
| Throughput % | 99 | 45 | 99 | 45 |
| Avg Pause GC Time (ms) | 34 | 835 | 57 | 591 |
| Max Pause GC Time (ms) | 90 | 10,184 | 160 | 7920 |
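GC causes such as Allocation Failure or Ergonomics are read from the JVMs' GC logs. A sketch of the Java 8 options that produce them under G1GC, as used in these experiments, extending the `SparkConf` sketch shown earlier (the log path is a placeholder):

```scala
// Java 8 GC logging that yields the cause breakdown above (G1GC as in the
// tables). The log file path is a placeholder.
conf.set("spark.executor.extraJavaOptions",
  "-XX:+UseG1GC " +
  "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps " +
  "-XX:+PrintGCCause " + // emits causes such as 'Allocation Failure'
  "-Xloggc:/tmp/executor-gc.log")
```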
GC Statistics for Stateful SUMServer Application With Backpressure—Pipeline 2.
| GC Cause | Dahu Cluster | Dahu Cluster | Parasilo Cluster | Parasilo Cluster |
|---|---|---|---|---|
| Ergonomics | 1 | 125 | 0 | 38 |
| Allocation Failure | 725 | 1045 | 95 | 152 |
| GCLocker Initiated GC | 6 | 18 | 6 | 0 |
| Metadata GC Threshold | 1 | 32 | 0 | 30 |
| Total | 733 | 1220 | 101 | 220 |
| GC Operations | | | | |
| Minor GC stats | 731 | 1079 | 98 | 167 |
| Full GC stats | 6 | 141 | 3 | 53 |
| Total | 737 | 1220 | 101 | 220 |
| Performance Indicators | ||||
| Throughput % | 100 | 99 | 100 | 95 |
| Avg Pause GC Time (ms) | 26 | 338 | 21 | 238 |
| Max Pause GC Time (ms) | 300 | 1854 | 90 | 755 |