Building gold standard corpora for medical natural language processing tasks.

Louise Deleger, Qi Li, Todd Lingren, Megan Kaiser, Katalin Molnar, Laura Stoutenborough, Michal Kouril, Keith Marsolo, Imre Solti.

Abstract

We present the construction of three annotated corpora to serve as gold standards for medical natural language processing (NLP) tasks. The corpora comprise clinical notes from the medical record, clinical trial announcements, and FDA drug labels. We report high inter-annotator agreement (overall F-measures between 0.8467 and 0.9176) for the annotation of Personal Health Information (PHI) elements for a de-identification task, and of medications, diseases/disorders, and signs/symptoms for an information extraction (IE) task. The annotated corpora of clinical trials and FDA labels will be publicly released. To facilitate translational NLP tasks that require cross-corpora interoperability (e.g., clinical trial eligibility screening), their annotation schemas are aligned with a large-scale, NIH-funded clinical text annotation project.
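As an illustration of how an F-measure can serve as an inter-annotator agreement score, the following is a minimal Python sketch. It is not taken from the paper: the abstract does not state the matching criterion, so exact matching on span offsets and label is an assumption here, and the function name, the label strings, and the toy annotations are all hypothetical.

```python
from typing import Set, Tuple

# One annotation: (start offset, end offset, label). Hypothetical layout.
Annotation = Tuple[int, int, str]


def pairwise_f_measure(annotator_a: Set[Annotation],
                       annotator_b: Set[Annotation]) -> float:
    """F-measure between two annotators.

    Assumption: only exact matches (same offsets and same label) count
    as agreement. Annotator A is treated as the reference; with exact
    matching the score is the same whichever annotator is the reference.
    """
    if not annotator_a and not annotator_b:
        # Neither annotator marked anything: treated as full agreement
        # (a convention chosen for this sketch).
        return 1.0
    matches = len(annotator_a & annotator_b)
    if matches == 0:
        return 0.0
    precision = matches / len(annotator_b)
    recall = matches / len(annotator_a)
    return 2 * precision * recall / (precision + recall)


# Toy example: the annotators agree on two of three entities.
a = {(0, 9, "MEDICATION"), (15, 23, "DISEASE"), (30, 38, "SYMPTOM")}
b = {(0, 9, "MEDICATION"), (15, 23, "DISEASE"), (40, 45, "SYMPTOM")}
score = pairwise_f_measure(a, b)
print(round(score, 4))  # 0.6667
```

A more lenient criterion (for example, counting overlapping spans as matches) would generally give higher scores, so any reported agreement figure should be read together with the matching rule used.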

Year:  2012        PMID: 23304283      PMCID: PMC3540456     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


