
Big Data Provenance: Challenges, State of the Art and Opportunities.

Jianwu Wang¹, Daniel Crawl², Shweta Purawat², Mai Nguyen², Ilkay Altintas²

Abstract

The ability to track provenance is a key feature of scientific workflows, supporting data lineage and reproducibility. The challenges introduced by the volume, variety and velocity of Big Data also pose related challenges for the provenance and quality of Big Data, a property referred to as veracity. The growing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle, including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related both to the veracity of the datasets themselves and to the provenance of the analytical processes that analyze these datasets. It also describes our current efforts towards tracking and utilizing Big Data provenance, using workflows as a programming model for Big Data analysis.

Keywords:  Big Data; distributed data-parallel programming models; provenance; workflows

Year:  2015        PMID: 29399671      PMCID: PMC5796788          DOI: 10.1109/BigData.2015.7364047

Source DB:  PubMed          Journal:  Proc IEEE Int Conf Big Data


  3 in total

1.  Computational solutions to large-scale data management and analysis. (Review)

Authors:  Eric E Schadt; Michael D Linderman; Jon Sorenson; Lawrence Lee; Garry P Nolan
Journal:  Nat Rev Genet       Date:  2010-09       Impact factor: 53.242

2.  Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource.

Authors:  Shulei Sun; Jing Chen; Weizhong Li; Ilkay Altintas; Abel Lin; Steve Peltier; Karen Stocks; Eric E Allen; Mark Ellisman; Jeffrey Grethe; John Wooley
Journal:  Nucleic Acids Res       Date:  2010-11-02       Impact factor: 16.971

3.  Cloudgene: a graphical execution platform for MapReduce programs on private and public clouds.

Authors:  Sebastian Schönherr; Lukas Forer; Hansi Weißensteiner; Florian Kronenberg; Günther Specht; Anita Kloss-Brandstätter
Journal:  BMC Bioinformatics       Date:  2012-08-13       Impact factor: 3.169

  4 in total

1.  A demonstration of modularity, reuse, reproducibility, portability and scalability for modeling and simulation of cardiac electrophysiology using Kepler Workflows.

Authors:  Pei-Chi Yang; Shweta Purawat; Pek U Ieong; Mao-Tsuen Jeng; Kevin R DeMarco; Igor Vorobyov; Andrew D McCulloch; Ilkay Altintas; Rommie E Amaro; Colleen E Clancy
Journal:  PLoS Comput Biol       Date:  2019-03-08       Impact factor: 4.475

2.  Big social data provenance framework for Zero-Information Loss Key-Value Pair (KVP) Database.

Authors:  Asma Rani; Navneet Goyal; Shashi K Gadia
Journal:  Int J Data Sci Anal       Date:  2021-11-09

3.  Responsible Governance for a Food and Nutrition E-Infrastructure: Case Study of the Determinants and Intake Data Platform.

Authors:  Lada Timotijevic; Indira Carr; Javier De La Cueva; Tome Eftimov; Charo E Hodgkins; Barbara Koroušić Seljak; Bent E Mikkelsen; Trond Selnes; Pieter Van't Veer; Karin Zimmermann
Journal:  Front Nutr       Date:  2022-03-23

4.  Lightweight Distributed Provenance Model for Complex Real-world Environments.

Authors:  Rudolf Wittner; Cecilia Mascia; Matej Gallo; Francesca Frexia; Heimo Müller; Markus Plass; Jörg Geiger; Petr Holub
Journal:  Sci Data       Date:  2022-08-17       Impact factor: 8.501