Mining Quality Phrases from Massive Text Corpora.

Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, Jiawei Han.

Abstract

Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency of manipulating such data using database technology. Thus, mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that integrates phrasal segmentation to extract quality phrases from text corpora. The framework requires only limited training, yet the quality of the phrases it generates is close to human judgment. Moreover, the method is scalable: both computation time and required space grow linearly as corpus size increases. Our experiments on large text corpora demonstrate the quality and efficiency of the new method.

Year:  2015        PMID: 26705375      PMCID: PMC4688018          DOI: 10.1145/2723372.2751523

Source DB:  PubMed          Journal:  Proc ACM SIGMOD Int Conf Manag Data        ISSN: 0730-8078


  9 in total

1.  FacetGist: Collective Extraction of Document Facets in Large Technical Corpora.

Authors:  Tarique Siddiqui; Xiang Ren; Aditya Parameswaran; Jiawei Han
Journal:  Proc ACM Int Conf Inf Knowl Manag       Date:  2016-10

2.  Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications.

Authors:  Dibakar Sigdel; Vincent Kyi; Aiden Zhang; Shaun P Setty; David A Liem; Yu Shi; Xuan Wang; Jiaming Shen; Wei Wang; JiaWei Han; Peipei Ping
Journal:  J Vis Exp       Date:  2019-02-23       Impact factor: 1.355

3.  Representing Documents via Latent Keyphrase Inference.

Authors:  Jialu Liu; Xiang Ren; Jingbo Shang; Taylor Cassidy; Clare R Voss; Jiawei Han
Journal:  Proc Int World Wide Web Conf       Date:  2016-04

4.  Unveiling Evolutionary Path of Nanogenerator Technology: A Novel Method Based on Sentence-BERT.

Authors:  Huailan Liu; Rui Zhang; Yufei Liu; Cunxiang He
Journal:  Nanomaterials (Basel)       Date:  2022-06-11       Impact factor: 5.719

5.  Phrase mining of textual data to analyze extracellular matrix protein patterns across cardiovascular disease.

Authors:  David A Liem; Sanjana Murali; Dibakar Sigdel; Yu Shi; Xuan Wang; Jiaming Shen; Howard Choi; John H Caufield; Wei Wang; Peipei Ping; JiaWei Han
Journal:  Am J Physiol Heart Circ Physiol       Date:  2018-05-18       Impact factor: 4.733

6.  Mining clinical phrases from nursing notes to discover risk factors of patient deterioration.

Authors:  Zfania Tom Korach; Jie Yang; Sarah Collins Rossetti; Kenrick D Cato; Min-Jeoung Kang; Christopher Knaplund; Kumiko O Schnock; Jose P Garcia; Haomiao Jia; Jessica M Schwartz; Li Zhou
Journal:  Int J Med Inform       Date:  2019-12-14       Impact factor: 4.046

7.  RedMed: Extending drug lexicons for social media applications.

Authors:  Adam Lavertu; Russ B Altman
Journal:  J Biomed Inform       Date:  2019-10-15       Impact factor: 6.317

8.  Automatic computer science domain multiple-choice questions generation based on informative sentences.

Authors:  Farah Maheen; Yazeed Yasin Ghadi; Muhammad Asif; Haseeb Ahmad; Shahbaz Ahmad; Fahad Alturise; Othman Asiry
Journal:  PeerJ Comput Sci       Date:  2022-08-16

9.  A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters.

Authors:  Wei Liu; Bo Chuen Chung; Rui Wang; Jonathon Ng; Nigel Morlet
Journal:  Health Inf Sci Syst       Date:  2015-12-09
