Optimizing Interactive Development of Data-Intensive Applications.

Matteo Interlandi, Sai Deep Tetali, Muhammad Ali Gulzar, Joseph Noor, Tyson Condie, Miryung Kim, Todd Millstein

Abstract

Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications.

Keywords: Big Data; H.2.4 [Information Systems]: Database Management—query processing; Incremental Evaluation; Interactive Development; Languages; Performance; Query Rewriting; Spark; Theory; parallel databases

Year: 2016 PMID: 28405637 PMCID: PMC5386325 DOI: 10.1145/2987550.2987565

Source DB: PubMed Journal: Proc ACM Symp Cloud Comput
