Document Sublanguage Clustering to Detect Medical Specialty in Cross-institutional Clinical Texts.

Kristina Doing-Harris, Olga Patterson, Sean Igo, John Hurdle.

Abstract

This paper reports on a set of studies designed to identify sublanguages in documents for domain-specific processing across institutions. Psychological evidence indicates that humans use context-specific linguistic information when they read. Natural Language Processing (NLP) pipelines are successful within specific domains (i.e., contexts). To limit the number of domain-specific NLP systems, a natural focus would be on sublanguages. Sublanguages are identified by shared lexical and semantic features.[1] Patterson and Hurdle[2] developed a sublanguage identification system that functioned well for 12 clinical specialties at the University of Utah. The current work compares sublanguages across institutions. Using a clinical NLP pipeline augmented by a new document corpus from the University of Pittsburgh (UPitt), new documents were assigned to clusters based on the minimum cosine distance to a Utah cluster centroid. The UPitt documents were divided into a nine-group specialty corpus. Across institutions, five of the specialty groups fell within the expected clusters. We find that clustering encounters difficulty due to documents with mixed sublanguages, naming convention differences across institutions, and document types used across specialties. The findings indicate that clinical specialty sublanguages can be identified across institutions.
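
The assignment step described in the abstract (placing a new document in the cluster whose centroid is at minimum cosine distance) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the three-dimensional feature vectors, and the two specialty labels are hypothetical, chosen only to show the technique. The abstract does not specify how the lexical and semantic features are encoded, so plain numeric vectors are assumed here.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero vector is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def assign_to_nearest_centroid(doc_vector, centroids):
    """Return (label, distance) for the centroid closest to doc_vector."""
    return min(
        ((label, cosine_distance(doc_vector, c)) for label, c in centroids.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical centroids standing in for clusters learned at one institution.
centroids = {
    "cardiology": [0.9, 0.1, 0.0],
    "neurology": [0.1, 0.8, 0.3],
}

# Hypothetical feature vector for a document from another institution.
new_doc = [0.7, 0.2, 0.1]
label, distance = assign_to_nearest_centroid(new_doc, centroids)
print(label, round(distance, 3))
```

Cosine distance compares the direction of the feature vectors rather than their magnitude, so a long note and a short note with the same feature proportions are treated as equally close to a centroid.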

Keywords:  Medical informatics applications; cognitive science; natural language processing

Year:  2013        PMID: 27077137      PMCID: PMC4827341          DOI: 10.1145/2512089.2512101

Source DB:  PubMed          Journal:  Proc ACM Int Workshop Data Text Min Biomed Inform


References: 11 in total

1.  Comparing syntactic complexity in medical and non-medical corpora.

Authors:  D A Campbell; S B Johnson
Journal:  Proc AMIA Symp       Date:  2001

2.  The structure of science information.

Authors:  Zellig S Harris
Journal:  J Biomed Inform       Date:  2002-08       Impact factor: 6.317

Review 3.  Two biomedical sublanguages: a description based on the theories of Zellig Harris.

Authors:  Carol Friedman; Pauline Kra; Andrey Rzhetsky
Journal:  J Biomed Inform       Date:  2002-08       Impact factor: 6.317

4.  Document clustering of clinical narratives: a systematic study of clinical sublanguages.

Authors:  Olga Patterson; John F Hurdle
Journal:  AMIA Annu Symp Proc       Date:  2011-10-22

5.  Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives.

Authors:  Prateek Jindal; Dan Roth
Journal:  J Am Med Inform Assoc       Date:  2012-07-10       Impact factor: 4.497

6.  An overview of MetaMap: historical perspective and recent advances.

Authors:  Alan R Aronson; François-Michel Lang
Journal:  J Am Med Inform Assoc       Date:  2010 May-Jun       Impact factor: 4.497

Review 7.  Data clustering in life sciences.

Authors:  Ying Zhao; George Karypis
Journal:  Mol Biotechnol       Date:  2005-09       Impact factor: 2.695

8.  Automatic acquisition of sublanguage semantic schema: towards the word sense disambiguation of clinical narratives.

Authors:  Olga Patterson; Sean Igo; John F Hurdle
Journal:  AMIA Annu Symp Proc       Date:  2010-11-13

9.  The problem oriented record as a basic tool in medical education, patient care and clinical research.

Authors:  L L Weed
Journal:  Ann Clin Res       Date:  1971-06

Review 10.  Linking genes to literature: text mining, information extraction, and retrieval applications for biology.

Authors:  Martin Krallinger; Alfonso Valencia; Lynette Hirschman
Journal:  Genome Biol       Date:  2008-09-01       Impact factor: 13.583

Cited by: 4 in total

1.  Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable.

Authors:  Jianlin Shi; John F Hurdle
Journal:  J Biomed Inform       Date:  2018-08-06       Impact factor: 6.317

2.  Ensembles of natural language processing systems for portable phenotyping solutions.

Authors:  Cong Liu; Casey N Ta; James R Rogers; Ziran Li; Junghwan Lee; Alex M Butler; Ning Shang; Fabricio Sampaio Peres Kury; Liwei Wang; Feichen Shen; Hongfang Liu; Lyudmila Ena; Carol Friedman; Chunhua Weng
Journal:  J Biomed Inform       Date:  2019-10-23       Impact factor: 6.317

3.  Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach.

Authors:  Wei-Hung Weng; Kavishwar B Wagholikar; Alexa T McCray; Peter Szolovits; Henry C Chueh
Journal:  BMC Med Inform Decis Mak       Date:  2017-12-01       Impact factor: 2.796

4.  Collecting specialty-related medical terms: Development and evaluation of a resource for Spanish.

Authors:  Pilar López-Úbeda; Alexandra Pomares-Quimbaya; Manuel Carlos Díaz-Galiano; Stefan Schulz
Journal:  BMC Med Inform Decis Mak       Date:  2021-05-04       Impact factor: 2.796
