
FacetGist: Collective Extraction of Document Facets in Large Technical Corpora.

Tarique Siddiqui, Xiang Ren, Aditya Parameswaran, Jiawei Han.

Abstract

Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents in order to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search, and business intelligence. The major challenges in performing Facet Extraction arise from three sources: concept extraction, concept-to-facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. FacetGist constructs a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that FacetGist can lead to an improvement of over 25% in both precision and recall over competing schemes.
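The graph-based label propagation step mentioned in the abstract can be illustrated with a small sketch. This is a generic, textbook-style propagation on a toy graph, not the FacetGist joint optimization itself; the function name `propagate_labels`, the `alpha` damping parameter, the feature node names, and the example concepts are all assumptions made for illustration.

```python
from collections import defaultdict

FACETS = ["application", "technique", "dataset"]  # illustrative facet labels


def propagate_labels(edges, seeds, facets, alpha=0.8, iterations=50):
    """Generic graph label propagation (not the FacetGist algorithm itself).

    edges: list of (node_a, node_b, weight) undirected links, weights > 0
    seeds: dict mapping some nodes to a known facet
    Returns a dict mapping every node to its highest-scoring facet.
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    def seed_vector(node):
        # One-hot vector for seeded nodes, all zeros otherwise.
        return [1.0 if seeds.get(node) == f else 0.0 for f in facets]

    scores = {node: seed_vector(node) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, links in neighbors.items():
            total = sum(w for _, w in links)
            # Weighted average of the neighbors' current facet scores.
            spread = [
                sum(w * scores[other][i] for other, w in links) / total
                for i in range(len(facets))
            ]
            prior = seed_vector(node)
            # Blend the neighborhood evidence with the node's own seed label.
            updated[node] = [
                alpha * s + (1 - alpha) * p for s, p in zip(spread, prior)
            ]
        scores = updated
    return {
        node: facets[max(range(len(facets)), key=vec.__getitem__)]
        for node, vec in scores.items()
    }


# Hypothetical toy graph: concept mentions linked to shared sentence-level features.
edges = [
    ("svm", "feat:we_use", 1.0),
    ("random forest", "feat:we_use", 1.0),
    ("imagenet", "feat:evaluated_on", 1.0),
    ("mnist", "feat:evaluated_on", 1.0),
]
seeds = {"svm": "technique", "imagenet": "dataset"}
result = propagate_labels(edges, seeds, FACETS)
print(result["random forest"], result["mnist"])
```

In this toy setup, unlabeled concept mentions inherit a facet from labeled mentions that share a context feature, which is the intuition behind linking mentions and features in one heterogeneous graph.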


Year: 2016   PMID: 28210517   PMCID: PMC5308212   DOI: 10.1145/2983323.2983828

Source DB: PubMed   Journal: Proc ACM Int Conf Inf Knowl Manag   ISSN: 2155-0751


