Optimizing Interactive Development of Data-Intensive Applications.

Matteo Interlandi, Sai Deep Tetali, Muhammad Ali Gulzar, Joseph Noor, Tyson Condie, Miryung Kim, Todd Millstein.

Abstract

Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of the queries used to derive the final program. Yet each successive execution of a slightly modified query is performed anew, which can significantly lengthen the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the time it takes to re-execute a modified Spark program, shortening the overall time to market for their Big Data applications.
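The abstract does not spell out how Vega reuses earlier work, so the following is only a rough illustration of the general idea that overlapping queries need not be executed anew. It is a minimal Python sketch of a simpler technique, prefix memoization: intermediate results of a pipeline are cached under the sequence of step identifiers that produced them, so a modified program re-executes only the steps after its first change. The class `PrefixCache`, the step identifiers, and the list-based "datasets" are hypothetical; this is not Vega's implementation (the keywords suggest Vega relies on query rewriting and incremental evaluation), and it assumes the base data does not change between runs.

```python
from typing import Any, Callable, Dict, List, Tuple

# A pipeline step: a (hypothetical) stable identifier paired with the function it applies.
Step = Tuple[str, Callable[[List[Any]], List[Any]]]


class PrefixCache:
    """Memoizes intermediate results keyed by the identifiers of the steps that produced them.

    Assumes the base data is unchanged between runs and that a step identifier
    always denotes the same computation.
    """

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, ...], List[Any]] = {}
        self.steps_executed = 0  # total step evaluations, to show what reuse saves

    def run(self, data: List[Any], steps: List[Step]) -> List[Any]:
        ids = tuple(step_id for step_id, _ in steps)
        # Resume from the longest prefix of this pipeline computed in an earlier run.
        start, current = 0, data
        for k in range(len(steps), 0, -1):
            if ids[:k] in self._results:
                start, current = k, self._results[ids[:k]]
                break
        # Execute only the remaining steps, caching each new intermediate result.
        for k in range(start, len(steps)):
            _, fn = steps[k]
            current = fn(current)
            self.steps_executed += 1
            self._results[ids[: k + 1]] = current
        return current


# Usage: two versions of an exploratory query that share their first step.
base = list(range(10))
cache = PrefixCache()

v1: List[Step] = [
    ("evens", lambda xs: [x for x in xs if x % 2 == 0]),
    ("square", lambda xs: [x * x for x in xs]),
]
r1 = cache.run(base, v1)  # executes both steps

v2: List[Step] = [v1[0], ("cube", lambda xs: [x ** 3 for x in xs])]
r2 = cache.run(base, v2)  # reuses the cached "evens" result; executes only "cube"

print(r1, r2, cache.steps_executed)
# [0, 4, 16, 36, 64] [0, 8, 64, 216, 512] 3
```

Keying on step identifiers rather than on the functions themselves keeps the sketch short, but it means a step whose logic changes must also change its identifier. Prefix reuse also only helps when the edit falls late in the pipeline; an edit to the first step forces a full re-run, which is the kind of case a rewriting-based approach would aim to handle better.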

Keywords:  Big Data; H.2.4 [Information Systems]: Database Management—query processing; Incremental Evaluation; Interactive Development; Languages; Performance; Query Rewriting; Spark; Theory; parallel databases

Year:  2016        PMID: 28405637      PMCID: PMC5386325          DOI: 10.1145/2987550.2987565

Source DB:  PubMed          Journal:  Proc ACM Symp Cloud Comput


