Extracting Databases from Dark Data with DeepDive.

Ce Zhang¹, Jaeho Shin¹, Christopher Ré¹, Michael Cafarella², Feng Niu³.

Abstract

DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that is widely collected and stored but cannot be exploited by standard relational tools. If the information in dark data (scientific papers, Web classified ads, customer service notes, and so on) were instead stored in a relational database, it would give analysts a massive and valuable new set of "big data." Compared with previous information extraction systems, DeepDive is distinctive in its ability to obtain very high precision and recall at reasonable engineering cost; in a number of applications, we have used DeepDive to create databases whose accuracy matches that of human annotators. To date, we have successfully deployed DeepDive to create data-centric applications for insurance, materials science, genomics, paleontology, law enforcement, and other domains. The data unlocked by DeepDive represents a massive opportunity for industry, government, and scientific researchers. DeepDive relies on an unusual design that combines large-scale probabilistic inference with a novel developer interaction cycle, and this design is made possible by several core innovations in probabilistic training and inference.

Year: 2016; PMID: 28316365; PMCID: PMC5350112; DOI: 10.1145/2882903.2904442

Source DB: PubMed; Journal: Proc ACM SIGMOD Int Conf Manag Data; ISSN: 0730-8078
