
Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications.

Dibakar Sigdel, Vincent Kyi, Aiden Zhang, Shaun P Setty, David A Liem, Yu Shi, Xuan Wang, Jiaming Shen, Wei Wang, JiaWei Han, Peipei Ping.

Abstract

The rapid accumulation of biomedical textual data has far exceeded the human capacity for manual curation and analysis, necessitating novel text-mining tools to extract biological insights from large volumes of scientific reports. The Context-aware Semantic Online Analytical Processing (CaseOLAP) pipeline, developed in 2016, successfully quantifies user-defined phrase-category relationships through the analysis of textual data and has many biomedical applications. We have developed a protocol for a cloud-based environment that supports an end-to-end phrase-mining and analysis platform. The protocol includes data preprocessing (e.g., downloading, extracting, and parsing text documents), indexing and searching with Elasticsearch, creating a functional document structure called Text-Cube, and quantifying phrase-category relationships using the core CaseOLAP algorithm. Data preprocessing generates key-value mappings for all documents involved. The preprocessed data are then indexed so that documents mentioning the entities of interest can be searched, which in turn facilitates Text-Cube creation and CaseOLAP score calculation. The raw CaseOLAP scores are interpreted using a series of integrative analyses, including dimensionality reduction, clustering, temporal, and geographical analyses. Additionally, the CaseOLAP scores are used to create a graphical database, which enables semantic mapping of the documents. CaseOLAP defines phrase-category relationships in a manner that is accurate (identifies relationships), consistent (highly reproducible), and efficient (processes 100,000 words/sec). Following this protocol, users can access a cloud-computing environment to support their own configurations and applications of CaseOLAP. This platform offers enhanced accessibility and empowers the biomedical community with phrase-mining tools for widespread biomedical research applications.
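The Text-Cube and scoring steps named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy corpus, the phrase list, the function names, and above all the scoring rule, which is a simplified popularity-times-distinctiveness stand-in and not the published CaseOLAP formula. The Elasticsearch indexing step is replaced by plain substring counting so that the example stays self-contained.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: document id -> (category, text).
DOCS = {
    "d1": ("cardiomyopathy", "mitochondrial protein dysfunction in heart muscle"),
    "d2": ("cardiomyopathy", "mitochondrial protein import and heart failure"),
    "d3": ("arrhythmia", "ion channel protein mutation and heart rhythm"),
}
# Hypothetical user-defined phrases (entities) to score against each category.
PHRASES = ["mitochondrial protein", "ion channel", "heart"]


def build_text_cube(docs):
    """Group document ids by category (a one-dimensional Text-Cube stand-in)."""
    cube = defaultdict(list)
    for doc_id, (category, _text) in docs.items():
        cube[category].append(doc_id)
    return dict(cube)


def phrase_counts(docs, cube, phrases):
    """Count occurrences of each phrase within each category cell."""
    counts = {category: Counter() for category in cube}
    for category, doc_ids in cube.items():
        for doc_id in doc_ids:
            text = docs[doc_id][1]
            for phrase in phrases:
                # Substring counting stands in for an indexed search.
                counts[category][phrase] += text.count(phrase)
    return counts


def toy_scores(counts):
    """Toy score = popularity * distinctiveness (NOT the published formula)."""
    scores = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        for phrase, tf in counter.items():
            # Popularity: log-damped frequency of the phrase within the cell.
            popularity = math.log(1 + tf) / math.log(1 + total) if total else 0.0
            # Distinctiveness: share of all occurrences that fall in this cell.
            across = sum(counts[c][phrase] for c in counts)
            distinctiveness = tf / across if across else 0.0
            scores[(category, phrase)] = popularity * distinctiveness
    return scores


if __name__ == "__main__":
    cube = build_text_cube(DOCS)
    scores = toy_scores(phrase_counts(DOCS, cube, PHRASES))
    for (category, phrase), score in sorted(scores.items()):
        print(f"{category:15s} {phrase:22s} {score:.3f}")
```

In this toy setting, a phrase that occurs only in one category (such as "mitochondrial protein" in the cardiomyopathy cell) scores higher there than a phrase spread evenly across categories (such as "heart"), which is the qualitative behavior a phrase-category association score is meant to capture.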

Year: 2019    PMID: 30855564    PMCID: PMC7075490    DOI: 10.3791/59108

Source DB: PubMed    Journal: J Vis Exp    ISSN: 1940-087X    Impact factor: 1.355


