Jaeho Shin, Christopher Ré, Michael Cafarella.
Abstract
End-to-end knowledge base construction systems using statistical inference are enabling more people to automatically extract high-quality domain-specific information from unstructured data. As a result of deploying the DeepDive framework across several domains, we found new challenges in debugging and improving such end-to-end systems to construct high-quality knowledge bases. DeepDive has an iterative development cycle in which users improve the data. To help our users, we needed to develop principles for analyzing the system's errors as well as provide tooling for inspecting and labeling various data products of the system. We created guidelines for error analysis modeled after our colleagues' best practices, in which data labeling plays a critical role in every step of the analysis. To enable more productive and systematic data labeling, we created Mindtagger, a versatile tool that can be configured to support a wide range of tasks. In this demonstration, we show in detail what data labeling tasks are modeled in our error analysis guidelines and how each of them is performed using Mindtagger.
Year: 2015 PMID: 27144082 PMCID: PMC4852148 DOI: 10.14778/2824032.2824101
Source DB: PubMed Journal: Proceedings VLDB Endowment ISSN: 2150-8097
Figure 1. Iterative development cycle of a knowledge base construction system built with DeepDive and error analysis steps supported by Mindtagger.
Figure 2. Screenshots of Mindtagger operating in three different modes.