
Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences.

Jung-wei Fan, Elly W Yang, Min Jiang, Rashmi Prasad, Richard M Loomis, Daniel S Zisook, Josh C Denny, Hua Xu, Yang Huang.

Abstract

OBJECTIVE: To develop, evaluate, and share (1) syntactic parsing guidelines for clinical text, with a new approach to handling ill-formed sentences, and (2) a clinical Treebank annotated according to the guidelines; and to document the process and findings for readers with similar interests.
METHODS: Using random samples from a shared natural language processing challenge dataset, we developed a handbook of domain-customized syntactic parsing guidelines based on iterative annotation and adjudication between two institutions. Special considerations were incorporated into the guidelines for handling ill-formed sentences, which are common in clinical text. Intra- and inter-annotator agreement rates were used to evaluate consistency in following the guidelines. Quantitative and qualitative properties of the annotated Treebank, as well as its use to retrain a statistical parser, were reported.
RESULTS: A supplement to the Penn Treebank II guidelines was developed for annotating clinical sentences. After three iterations of annotation and adjudication on 450 sentences, the annotators reached an F-measure agreement rate of 0.930 (while intra-annotator rate was 0.948) on a final independent set. A total of 1100 sentences from progress notes were annotated that demonstrated domain-specific linguistic features. A statistical parser retrained with combined general English (mainly news text) annotations and our annotations achieved an accuracy of 0.811 (higher than models trained purely with either general or clinical sentences alone). Both the guidelines and syntactic annotations are made available at https://sourceforge.net/projects/medicaltreebank.
CONCLUSIONS: We developed guidelines for parsing clinical text and annotated a corpus accordingly. The high intra- and inter-annotator agreement rates showed decent consistency in following the guidelines. The corpus was shown to be useful in retraining a statistical parser that achieved moderate accuracy.
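ILLUSTRATION: The abstract reports annotator agreement as an F-measure but does not spell out the computation. One common way to score agreement between two bracketed parses is labeled constituent matching (PARSEVAL-style), sketched below under that assumption. The function names and the two example trees are hypothetical and are not taken from the paper or its Treebank.

```python
from collections import Counter


def constituent_spans(tree):
    """Collect labeled (label, start, end) spans from a Penn-style bracketed tree.

    Preterminal nodes (a POS tag directly over a word) are skipped, which is
    an assumption borrowed from common PARSEVAL practice.
    """
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    spans = Counter()
    stack = []  # each entry: [label, start word index, has child constituent]
    pos = 0     # number of words consumed so far
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if stack:
                stack[-1][2] = True  # parent now dominates a sub-constituent
            stack.append([tokens[i + 1], pos, False])
            i += 2  # skip the bracket and the label
        elif tok == ")":
            label, start, has_child = stack.pop()
            if has_child:
                spans[(label, start, pos)] += 1
            i += 1
        else:
            pos += 1  # a word
            i += 1
    return spans


def bracket_f_measure(tree_a, tree_b):
    """Symmetric F-measure over labeled spans shared by two annotations."""
    a, b = constituent_spans(tree_a), constituent_spans(tree_b)
    matched = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2 * matched / total if total else 1.0


# Hypothetical annotations of the same sentence by two annotators.
annotator_1 = "(S (NP (NN Patient)) (VP (VBZ denies) (NP (NN chest) (NN pain))))"
annotator_2 = "(S (NP (NN Patient)) (VP (VBZ denies) (NP (NN chest)) (NP (NN pain))))"
print(round(bracket_f_measure(annotator_1, annotator_2), 3))
```

Here the annotators share three of their four and five constituents respectively, so the score is 2 x 3 / (4 + 5). Because the F-measure is symmetric in precision and recall, it does not matter which annotator is treated as the reference.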

Keywords:  annotation guidelines; corpus development; natural language processing; syntactic parsing

Year: 2013; PMID: 23907286; PMCID: PMC3822122; DOI: 10.1136/amiajnl-2013-001810

Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497


