
Fonduer: Knowledge Base Construction from Richly Formatted Data.

Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, Christopher Ré

Abstract

We focus on knowledge base construction (KBC) from richly formatted data. In contrast to KBC from text or tabular data, KBC from richly formatted data aims to extract relations conveyed jointly via textual, structural, tabular, and visual expressions. We introduce Fonduer, a machine-learning-based KBC system for richly formatted data. Fonduer presents a new data model that accounts for three challenging characteristics of richly formatted data: (1) prevalent document-level relations, (2) multimodality, and (3) data variety. Fonduer uses a new deep-learning model to automatically capture the representation (i.e., features) needed to learn how to extract relations from richly formatted data. Finally, Fonduer provides a new programming model that enables users to convert domain expertise, based on multiple modalities of information, to meaningful signals of supervision for training a KBC system. Fonduer-based KBC systems are in production for a range of use cases, including at a major online retailer. We compare Fonduer against state-of-the-art KBC approaches in four different domains. We show that Fonduer achieves an average improvement of 41 F1 points on the quality of the output knowledge base, and in some cases produces up to 1.87× the number of correct entries, compared to expert-curated public knowledge bases. We also conduct a user study to assess the usability of Fonduer's new programming model. We show that after using Fonduer for only 30 minutes, non-domain experts are able to design KBC systems that achieve on average 23 F1 points higher quality than traditional machine-learning-based KBC approaches.
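To make the idea of converting multimodal domain expertise into supervision signals more concrete, here is a minimal sketch in plain Python. It does not use Fonduer's actual API: the `Candidate` fields, the three labeling functions, the label constants, and the transistor-datasheet example are all hypothetical, and a plain majority vote stands in for whatever label-combination model a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed label convention for this sketch only.
POSITIVE, ABSTAIN, NEGATIVE = 1, 0, -1


@dataclass
class Candidate:
    """Hypothetical candidate relation (part, value) with multimodal context."""
    part: str          # textual: the part-name mention
    value: str         # textual: the numeric value mention
    row_header: str    # tabular: header of the table row holding the value
    same_page: bool    # structural: both mentions occur on the same page
    y_aligned: bool    # visual: mentions are aligned in the rendered page


def lf_row_header_mentions_current(c: Candidate) -> int:
    # Tabular signal: a row header mentioning "current" suggests a true relation.
    return POSITIVE if "current" in c.row_header.lower() else ABSTAIN


def lf_different_page(c: Candidate) -> int:
    # Structural signal: mentions on different pages are unlikely to be related.
    return NEGATIVE if not c.same_page else ABSTAIN


def lf_visually_aligned(c: Candidate) -> int:
    # Visual signal: aligned mentions often belong to the same record.
    return POSITIVE if c.y_aligned else ABSTAIN


def majority_vote(c: Candidate, lfs: List[Callable[[Candidate], int]]) -> int:
    """Combine noisy votes by summing them; ties and all-abstain yield ABSTAIN."""
    total = sum(lf(c) for lf in lfs)
    if total > 0:
        return POSITIVE
    if total < 0:
        return NEGATIVE
    return ABSTAIN


LFS = [lf_row_header_mentions_current, lf_different_page, lf_visually_aligned]

likely = Candidate("BC546", "100", "Collector current", True, True)
unlikely = Candidate("BC546", "150", "Storage temperature", False, False)
unknown = Candidate("BC546", "150", "Storage temperature", True, False)

labels = [majority_vote(c, LFS) for c in (likely, unlikely, unknown)]
```

Each function encodes one heuristic from a single modality and may abstain, so a user can contribute partial knowledge without hand-labeling every candidate; the combined labels would then serve as noisy training data for a downstream model.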


Year:  2018        PMID: 29937618      PMCID: PMC6013301          DOI: 10.1145/3183713.3183729

Source DB:  PubMed          Journal:  Proc ACM SIGMOD Int Conf Manag Data        ISSN: 0730-8078


