

FacetGist: Collective Extraction of Document Facets in Large Technical Corpora.

Tarique Siddiqui, Xiang Ren, Aditya Parameswaran, Jiawei Han.

Abstract

Given the large volume of technical documents available, it is crucial to organize and categorize them automatically in order to understand and extract value from them. To this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search, and business intelligence. The major challenges in performing Facet Extraction arise from multiple sources: concept extraction, concept-to-facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. FacetGist constructs a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that FacetGist can lead to an improvement of over 25% in both precision and recall over competing schemes.
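To make the idea of graph-based label propagation concrete, here is a minimal sketch of a generic, textbook-style propagation scheme. It is not the FacetGist algorithm or its joint optimization, which the abstract does not specify. The node names, edge weights, facet list, and the parameters alpha and iterations are all assumptions chosen for illustration.

```python
# Generic label propagation sketch (NOT the FacetGist algorithm itself).
# All nodes, edges, and seed labels below are made-up illustrations.
from math import sqrt

FACETS = ["application", "technique", "dataset"]


def propagate(edges, seeds, nodes, alpha=0.8, iterations=50):
    """Spread seed facet labels over an undirected weighted graph.

    edges: dict mapping (u, v) pairs to positive weights
    seeds: dict mapping a few nodes to a known facet
    Returns a dict node -> list of scores, one per entry in FACETS.
    """
    # Build a symmetric adjacency map and weighted degrees.
    neighbors = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w
    degree = {n: sum(neighbors[n].values()) for n in nodes}

    # Initial label rows: one-hot for seeds, zeros elsewhere.
    initial = {n: [0.0] * len(FACETS) for n in nodes}
    for n, facet in seeds.items():
        initial[n][FACETS.index(facet)] = 1.0

    scores = {n: row[:] for n, row in initial.items()}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Keep a (1 - alpha) share of the node's own seed label.
            row = [(1 - alpha) * y for y in initial[n]]
            for m, w in neighbors[n].items():
                # Symmetric normalization: w / sqrt(d_n * d_m).
                s = w / sqrt(degree[n] * degree[m])
                for k in range(len(FACETS)):
                    row[k] += alpha * s * scores[m][k]
            updated[n] = row
        scores = updated
    return scores


def predict(scores):
    """Pick the highest-scoring facet for every node."""
    return {n: FACETS[max(range(len(FACETS)), key=row.__getitem__)]
            for n, row in scores.items()}


# Hypothetical toy graph: concept mentions ("m:") linked to the
# sentence-level context patterns ("f:") they occur in.
NODES = ["m:SVM", "m:random forest", "m:MNIST", "m:CIFAR-10",
         "f:we use X", "f:evaluated on X"]
EDGES = {
    ("m:SVM", "f:we use X"): 1.0,
    ("m:random forest", "f:we use X"): 1.0,
    ("m:MNIST", "f:evaluated on X"): 1.0,
    ("m:CIFAR-10", "f:evaluated on X"): 1.0,
}
SEEDS = {"m:SVM": "technique", "m:MNIST": "dataset"}

labels = predict(propagate(EDGES, SEEDS, NODES))
print(labels["m:random forest"], labels["m:CIFAR-10"])
```

In this toy graph, the unlabeled mentions inherit a facet from the seed mention that shares their context pattern. A real system would presumably combine many local and global features in the graph, as the abstract describes.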


Year: 2016 PMID: 28210517 PMCID: PMC5308212 DOI: 10.1145/2983323.2983828

Source DB: PubMed Journal: Proc ACM Int Conf Inf Knowl Manag ISSN: 2155-0751
