
ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering.

Xiang Ren¹, Ahmed El-Kishky¹, Chi Wang², Fangbo Tao¹, Clare R Voss³, Heng Ji⁴, Jiawei Han¹.

Abstract

Entity recognition is an important but challenging research problem. In reality, many text collections come from specific, dynamic, or emerging domains, which poses significant new challenges for entity recognition: name ambiguity and context sparsity increase, and entities must be detected without domain restriction. In this paper, we investigate entity recognition (ER) with distant supervision and propose a novel relation phrase-based ER framework, called ClusType, that runs data-driven phrase mining to generate entity mention candidates and relation phrases, and enforces the principle that relation phrases should be softly clustered when propagating type information between their argument entities. We then predict the type of each entity mention based on the type signatures of its co-occurring relation phrases and the type indicators of its surface name, as computed over the corpus. Specifically, we formulate a joint optimization problem for two tasks: type propagation with relation phrases, and multi-view relation phrase clustering. Our experiments on multiple genres (news, Yelp reviews, and tweets) demonstrate the effectiveness and robustness of ClusType, with an average improvement of 37% in F1 score over the best compared method.
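The idea of propagating type information between entity mentions through the relation phrases they co-occur with can be illustrated with a minimal sketch. This is generic label propagation over a bipartite mention/relation-phrase graph, not ClusType's actual objective: it omits the surface-name view, the soft multi-view clustering of relation phrases, and the joint optimization described above. The function name `propagate_types`, the co-occurrence matrix `A`, the seed scores `Y0`, and the seed weight `alpha` are hypothetical names introduced only for illustration.

```python
import numpy as np


def propagate_types(A, Y0, alpha=0.5, iters=50):
    """Illustrative (hypothetical) type propagation via relation phrases.

    A     : (n_mentions, n_phrases) 0/1 matrix; A[i, j] = 1 if mention i
            is an argument of relation phrase j.
    Y0    : (n_mentions, n_types) seed type scores, e.g. from distant
            supervision; all-zero rows are unlabeled mentions.
    alpha : weight kept on the seed scores at every iteration (assumed).
    Returns (Y, P): mention type scores and phrase type signatures.
    """
    A = np.asarray(A, dtype=float)
    Y0 = np.asarray(Y0, dtype=float)
    # Degrees, clamped to 1 so isolated mentions/phrases avoid division by zero.
    phrase_deg = np.maximum(A.sum(axis=0), 1.0)[:, None]   # (n_phrases, 1)
    mention_deg = np.maximum(A.sum(axis=1), 1.0)[:, None]  # (n_mentions, 1)
    Y = Y0.copy()
    for _ in range(iters):
        # Phrase type signature: mean type score of the phrase's arguments.
        P = (A.T @ Y) / phrase_deg
        # Mention evidence: mean signature of the phrases it co-occurs with.
        neigh = (A @ P) / mention_deg
        # Blend seed scores with propagated evidence (a contraction for alpha > 0).
        Y = alpha * Y0 + (1.0 - alpha) * neigh
    return Y, P


if __name__ == "__main__":
    # Toy data: phrase 0 links mentions 0 and 1; phrase 1 links mentions 2 and 3.
    A = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
    # Mention 0 is seeded with type 0, mention 2 with type 1; the others are unlabeled.
    Y0 = np.array([[1, 0], [0, 0], [0, 1], [0, 0]])
    Y, P = propagate_types(A, Y0)
    print(np.argmax(Y, axis=1).tolist())  # → [0, 0, 1, 1]
```

In the toy run, the unlabeled mentions 1 and 3 inherit the type of the seeded mention that shares their relation phrase. A real system would, under the abstract's description, also draw on surface-name evidence and let similar relation phrases share type signatures through clustering.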

Keywords:  Entity Recognition and Typing; Relation Phrase Clustering

Year:  2015        PMID: 26705503      PMCID: PMC4688017          DOI: 10.1145/2783258.2783362

Source DB:  PubMed          Journal:  KDD        ISSN: 2154-817X


  3 in total

1.  FacetGist: Collective Extraction of Document Facets in Large Technical Corpora.

Authors:  Tarique Siddiqui; Xiang Ren; Aditya Parameswaran; Jiawei Han
Journal:  Proc ACM Int Conf Inf Knowl Manag       Date:  2016-10

2.  Conspiracy in the time of corona: automatic detection of emerging COVID-19 conspiracy theories in social media and the news.

Authors:  Shadi Shahsavari; Pavan Holur; Tianyi Wang; Timothy R Tangherlini; Vwani Roychowdhury
Journal:  J Comput Soc Sci       Date:  2020-10-28

3.  A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters.

Authors:  Wei Liu; Bo Chuen Chung; Rui Wang; Jonathon Ng; Nigel Morlet
Journal:  Health Inf Sci Syst       Date:  2015-12-09