BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark.

Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo, Sai Deep Tetali, Tyson Condie, Todd Millstein, Miryung Kim.

Abstract

Developers use cloud computing platforms to process large quantities of data in parallel when developing big data analytics. Debugging the massively parallel computations that run in today's data centers is time-consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next-generation data-intensive scalable cloud computing platform. This requires rethinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay, and naively inspecting millions of records using a watchpoint is too time-consuming for an end user. First, BigDebug's simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BigDebug scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time saving compared to the baseline replay debugger. The results show that BigDebug supports debugging at interactive speeds with minimal performance impact.
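
The crash-culprit and watchpoint ideas in the abstract can be illustrated with a minimal, single-machine sketch. This is not BigDebug's API and does not use Spark: the names `run_with_culprit_capture`, `PartitionResult`, and the `watch` predicate are hypothetical, and the "partition" is assumed to be a plain in-memory list. It only shows the general idea of recording a failing record and continuing, instead of aborting the whole job, while a guard predicate selects the few records worth inspecting.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Optional, Tuple


@dataclass
class PartitionResult:
    """Outputs of one partition plus the records that made the user function fail."""
    outputs: List[Any] = field(default_factory=list)
    # Each culprit is (position in the partition, the record itself, error text).
    culprits: List[Tuple[int, Any, str]] = field(default_factory=list)


def run_with_culprit_capture(
    records: Iterable[Any],
    fn: Callable[[Any], Any],
    watch: Optional[Callable[[Any], bool]] = None,
) -> Tuple[PartitionResult, List[Any]]:
    """Apply fn to every record; capture failures instead of stopping.

    watch is an optional guard predicate (a hypothetical stand-in for a
    watchpoint): only records for which it returns True are collected
    for inspection, so the user never has to look at every record.
    """
    result = PartitionResult()
    watched: List[Any] = []
    for index, record in enumerate(records):
        if watch is not None and watch(record):
            watched.append(record)
        try:
            result.outputs.append(fn(record))
        except Exception as exc:  # keep going; remember the culprit record
            result.culprits.append((index, record, repr(exc)))
    return result, watched


# Hypothetical input: one malformed record among otherwise valid ones.
records = ["10", "20", "abc", "40"]
result, watched = run_with_culprit_capture(
    records, fn=int, watch=lambda r: not r.isdigit()
)
```

In this sketch the valid records are still processed, the malformed one is reported with its position, and the guard predicate surfaces the same record without inspecting the rest. A real distributed setting would additionally need to ship captured records from workers back to the driver, which this example leaves out.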

Keywords: Debugging; big data analytics; data-intensive scalable computing (DISC); fault localization and recovery; interactive tools

Year: 2016 PMID: 27390389 PMCID: PMC4933307 DOI: 10.1145/2884781.2884813

Source DB: PubMed Journal: Proc Int Conf Softw Eng ISSN: 0270-5257

1. Optimizing Interactive Development of Data-Intensive Applications.

Authors: Matteo Interlandi; Sai Deep Tetali; Muhammad Ali Gulzar; Joseph Noor; Tyson Condie; Miryung Kim; Todd Millstein
Journal: Proc ACM Symp Cloud Comput Date: 2016-10

2. Titian: Data Provenance Support in Spark.

Authors: Matteo Interlandi; Kshitij Shah; Sai Deep Tetali; Muhammad Ali Gulzar; Seunghyun Yoo; Miryung Kim; Todd Millstein; Tyson Condie
Journal: Proceedings VLDB Endowment Date: 2015-11