Exploring subdomain variation in biomedical language.

Thomas Lippincott, Diarmuid Ó Séaghdha, Anna Korhonen.

Abstract

BACKGROUND: Applications of Natural Language Processing (NLP) technology to biomedical texts have generated significant interest in recent years. In this paper we identify and investigate the phenomenon of linguistic subdomain variation within the biomedical domain, i.e., the extent to which different subject areas of biomedicine are characterised by different linguistic behaviour. While variation at a coarser domain level such as between newswire and biomedical text is well-studied and known to affect the portability of NLP systems, we are the first to conduct an extensive investigation into more fine-grained levels of variation.
RESULTS: Using the large OpenPMC text corpus, which spans the many subdomains of biomedicine, we investigate variation across a number of lexical, syntactic, semantic and discourse-related dimensions. These dimensions are chosen for their relevance to the performance of NLP systems. We use clustering techniques to analyse commonalities and distinctions among the subdomains.
CONCLUSIONS: We find that while patterns of inter-subdomain variation differ somewhat from one feature set to another, robust clusters can be identified that correspond to intuitive distinctions such as that between clinical and laboratory subjects. In particular, subdomains relating to genetics and molecular biology, which are the most common sources of material for training and evaluating biomedical NLP tools, are not representative of all biomedical subdomains. We conclude that an awareness of subdomain variation is important when considering the practical use of language processing applications by biomedical researchers.
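The abstract does not say which features or clustering algorithm were used, so the following is only a minimal sketch of the general idea: represent each subdomain by a feature distribution, measure pairwise dissimilarity, and cluster. It assumes plain word frequencies as the feature set, the Jensen-Shannon divergence as the dissimilarity (chosen because it is the subject of the first related article listed below, not because the paper is known to use it this way), and average-linkage agglomerative clustering. The subdomain names and toy texts are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def word_distribution(text):
    """Relative frequency of each lowercased whitespace token."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (log base 2), which lies in [0, 1]."""
    jsd = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)  # mixture distribution
        if pw > 0:
            jsd += 0.5 * pw * math.log2(pw / m)
        if qw > 0:
            jsd += 0.5 * qw * math.log2(qw / m)
    return jsd


def average_linkage(names, dist):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(n,) for n in names]
    merges = []

    def d(a, b):
        # Mean pairwise distance between members of two clusters.
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: d(*ab))
        merges.append((a, b, d(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy snippets standing in for subdomain corpora (not from the paper).
toy = {
    "genetics": "the gene encodes a protein and the gene is expressed in cells",
    "molecular_biology": "the protein binds the gene promoter in cells",
    "clinical": "patients received treatment and patients were followed for outcomes",
}
dists = {name: word_distribution(text) for name, text in toy.items()}
pairwise = {
    frozenset((a, b)): jensen_shannon(dists[a], dists[b])
    for a, b in combinations(toy, 2)
}
merges = average_linkage(list(toy), pairwise)

for a, b, distance in merges:
    print(a, "+", b, f"at distance {distance:.3f}")
```

On this toy input the two laboratory-style snippets merge first and the clinical snippet joins last, which mirrors the kind of clinical versus laboratory split the conclusions describe. Real subdomain corpora would need proper tokenisation and far more text per subdomain for the distances to be meaningful.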

Year:  2011        PMID: 21619603      PMCID: PMC3118171          DOI: 10.1186/1471-2105-12-212

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  13 in total

1.  Analysis of symbolic sequences using the Jensen-Shannon divergence.

Authors:  Ivo Grosse; Pedro Bernaola-Galván; Pedro Carpena; Ramón Román-Roldán; Jose Oliver; H Eugene Stanley
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2002-03-25

2.  Two biomedical sublanguages: a description based on the theories of Zellig Harris. (Review)

Authors:  Carol Friedman; Pauline Kra; Andrey Rzhetsky
Journal:  J Biomed Inform       Date:  2002-08       Impact factor: 6.317

3.  GENIA corpus--semantically annotated corpus for bio-textmining.

Authors:  J-D Kim; T Ohta; Y Tateisi; J Tsujii
Journal:  Bioinformatics       Date:  2003       Impact factor: 6.937

4.  Data preparation and interannotator agreement: BioCreAtIvE task 1B.

Authors:  Marc E Colosimo; Alexander A Morgan; Alexander S Yeh; Jeffrey B Colombe; Lynette Hirschman
Journal:  BMC Bioinformatics       Date:  2005-05-24       Impact factor: 3.169

5.  BioInfer: a corpus for information extraction in the biomedical domain.

Authors:  Sampo Pyysalo; Filip Ginter; Juho Heimonen; Jari Björne; Jorma Boberg; Jouni Järvinen; Tapio Salakoski
Journal:  BMC Bioinformatics       Date:  2007-02-09       Impact factor: 3.169

6.  The textual characteristics of traditional and Open Access scientific journals are similar.

Authors:  Karin Verspoor; K Bretonnel Cohen; Lawrence Hunter
Journal:  BMC Bioinformatics       Date:  2009-06-15       Impact factor: 3.169

7.  Overview of BioCreative II gene normalization.

Authors:  Alexander A Morgan; Zhiyong Lu; Xinglong Wang; Aaron M Cohen; Juliane Fluck; Patrick Ruch; Anna Divoli; Katrin Fundel; Robert Leaman; Jörg Hakenberg; Chengjie Sun; Heng-hui Liu; Rafael Torres; Michael Krauthammer; William W Lau; Hongfang Liu; Chun-Nan Hsu; Martijn Schuemie; K Bretonnel Cohen; Lynette Hirschman
Journal:  Genome Biol       Date:  2008-09-01       Impact factor: 13.583

8.  Mining clinical relationships from patient narratives.

Authors:  Angus Roberts; Robert Gaizauskas; Mark Hepple; Yikun Guo
Journal:  BMC Bioinformatics       Date:  2008-11-19       Impact factor: 3.169

9.  The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes.

Authors:  Veronika Vincze; György Szarvas; Richárd Farkas; György Móra; János Csirik
Journal:  BMC Bioinformatics       Date:  2008-11-19       Impact factor: 3.169

10.  Nominalization and alternations in biomedical language.

Authors:  K Bretonnel Cohen; Martha Palmer; Lawrence Hunter
Journal:  PLoS One       Date:  2008-09-09       Impact factor: 3.240

  11 in total

1.  Biomedical text mining for research rigor and integrity: tasks, challenges, directions.

Authors:  Halil Kilicoglu
Journal:  Brief Bioinform       Date:  2018-11-27       Impact factor: 11.622

2.  The effects of natural language processing on cross-institutional portability of influenza case detection for disease surveillance.

Authors:  Jeffrey P Ferraro; Ye Ye; Per H Gesteland; Peter J Haug; Fuchiang Rich Tsui; Gregory F Cooper; Rudy Van Bree; Thomas Ginter; Andrew J Nowalk; Michael Wagner
Journal:  Appl Clin Inform       Date:  2017-05-31       Impact factor: 2.342

3.  Cardioinformatics: the nexus of bioinformatics and precision cardiology.

Authors:  Bohdan B Khomtchouk; Diem-Trang Tran; Kasra A Vand; Matthew Might; Or Gozani; Themistocles L Assimes
Journal:  Brief Bioinform       Date:  2020-12-01       Impact factor: 11.622

4.  Sublanguage Corpus Analysis Toolkit: A tool for assessing the representativeness and sublanguage characteristics of corpora.

Authors:  Irina P Temnikova; William A Baumgartner; Negacy D Hailu; Ivelina Nikolova; Tony McEnery; Adam Kilgarriff; Galia Angelova; K Bretonnel Cohen
Journal:  LREC Int Conf Lang Resour Eval       Date:  2014-05

5.  BioCause: Annotating and analysing causality in the biomedical domain.

Authors:  Claudiu Mihăilă; Tomoko Ohta; Sampo Pyysalo; Sophia Ananiadou
Journal:  BMC Bioinformatics       Date:  2013-01-16       Impact factor: 3.169

6.  Semi-supervised learning of causal relations in biomedical scientific discourse.

Authors:  Claudiu Mihăilă; Sophia Ananiadou
Journal:  Biomed Eng Online       Date:  2014-12-11       Impact factor: 2.819

7.  FlexiTerm: a flexible term recognition method.

Authors:  Irena Spasić; Mark Greenwood; Alun Preece; Nick Francis; Glyn Elwyn
Journal:  J Biomed Semantics       Date:  2013-10-10

8.  Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.

Authors:  K Bretonnel Cohen; Arrick Lanfranchi; Miji Joo-Young Choi; Michael Bada; William A Baumgartner; Natalya Panteleyeva; Karin Verspoor; Martha Palmer; Lawrence E Hunter
Journal:  BMC Bioinformatics       Date:  2017-08-17       Impact factor: 3.169

9.  Redundancy-aware topic modeling for patient record notes.

Authors:  Raphael Cohen; Iddo Aviram; Michael Elhadad; Noémie Elhadad
Journal:  PLoS One       Date:  2014-02-13       Impact factor: 3.240

10.  Using text analysis to quantify the similarity and evolution of scientific disciplines.

Authors:  Laércio Dias; Martin Gerlach; Joachim Scharloth; Eduardo G Altmann
Journal:  R Soc Open Sci       Date:  2018-01-17       Impact factor: 2.963
