Building gold standard corpora for medical natural language processing tasks.

Louise Deleger, Qi Li, Todd Lingren, Megan Kaiser, Katalin Molnar, Laura Stoutenborough, Michal Kouril, Keith Marsolo, Imre Solti.

Abstract

We present the construction of three annotated corpora to serve as gold standards for medical natural language processing (NLP) tasks. Clinical notes from the medical record, clinical trial announcements, and FDA drug labels are annotated. We report high inter-annotator agreement (overall F-measures between 0.8467 and 0.9176) for the annotation of Personal Health Information (PHI) elements for a de-identification task and of medications, diseases/disorders, and signs/symptoms for an information extraction (IE) task. The annotated corpora of clinical trials and FDA labels will be publicly released. To facilitate translational NLP tasks that require cross-corpora interoperability (e.g., clinical trial eligibility screening), their annotation schemas are aligned with a large-scale, NIH-funded clinical text annotation project.
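The abstract reports inter-annotator agreement as an F-measure but does not say how annotations were matched. The sketch below shows one common way such a score can be computed, assuming that two annotations agree only when their character offsets and entity label are identical. The `Span` type, the function name, and the example annotations are hypothetical and are not taken from the paper.

```python
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    """One annotation: character offsets plus an entity label (hypothetical schema)."""
    start: int
    end: int
    label: str


def f_measure_agreement(annotator_a: Iterable[Span], annotator_b: Iterable[Span]) -> float:
    """F-measure between two annotators under exact span-and-label matching.

    Treating annotator A as the reference, precision = matched / |B| and
    recall = matched / |A|; their harmonic mean simplifies to
    2 * matched / (|A| + |B|), which is symmetric in A and B.
    """
    set_a, set_b = set(annotator_a), set(annotator_b)
    if not set_a and not set_b:
        return 1.0  # assumption: two empty annotation sets count as full agreement
    matched = len(set_a & set_b)
    return 2 * matched / (len(set_a) + len(set_b))


# Invented example: the annotators agree on two of three annotations each.
a = [Span(0, 7, "MEDICATION"), Span(12, 20, "DISEASE"), Span(25, 30, "SYMPTOM")]
b = [Span(0, 7, "MEDICATION"), Span(12, 20, "DISEASE"), Span(40, 45, "SYMPTOM")]
score = f_measure_agreement(a, b)
print(round(score, 4))
```

Relaxed matching (for example, counting overlapping spans as agreement) would give higher scores on the same data, so the matching criterion should be stated alongside any reported figure.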

Entities: Disease

Year: 2012 PMID: 23304283 PMCID: PMC3540456

Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076

20 in total

1. Measuring agreement in medical informatics reliability studies. (Review)

Authors: George Hripcsak; Daniel F Heitjan
Journal: J Biomed Inform Date: 2002-04 Impact factor: 6.317

2. Identification of patient name references within medical documents using semantic selectional restrictions.

Authors: Ricky K Taira; Alex A T Bui; Hooshang Kangarloo
Journal: Proc AMIA Symp Date: 2002

3. Extracting temporal constraints from clinical research eligibility criteria using conditional random fields.

Authors: Zhihui Luo; Stephen B Johnson; Albert M Lai; Chunhua Weng
Journal: AMIA Annu Symp Proc Date: 2011-10-22

4. A practical method for transforming free-text eligibility criteria into computable criteria.

Authors: Samson W Tu; Mor Peleg; Simona Carini; Michael Bobak; Jessica Ross; Daniel Rubin; Ida Sim
Journal: J Biomed Inform Date: 2010-09-17 Impact factor: 6.317

5. Agreement, the f-measure, and reliability in information retrieval.

Authors: George Hripcsak; Adam S Rothschild
Journal: J Am Med Inform Assoc Date: 2005-01-31 Impact factor: 4.497

6. Inductive creation of an annotation schema for manually indexing clinical conditions from emergency department reports.

Authors: Wendy W Chapman; John N Dowling
Journal: J Biomed Inform Date: 2005-08-22 Impact factor: 6.317

7. Community annotation experiment for ground truth generation for the i2b2 medication challenge.

Authors: Ozlem Uzuner; Imre Solti; Fei Xia; Eithon Cadag
Journal: J Am Med Inform Assoc Date: 2010 Sep-Oct Impact factor: 4.497

8. The MITRE Identification Scrubber Toolkit: design, training, and assessment.

Authors: John Aberdeen; Samuel Bayer; Reyyan Yeniterzi; Ben Wellner; Cheryl Clark; David Hanauer; Bradley Malin; Lynette Hirschman
Journal: Int J Med Inform Date: 2010-10-14 Impact factor: 4.046

9. Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes?

Authors: Frances P Morrison; Li Li; Albert M Lai; George Hripcsak
Journal: J Am Med Inform Assoc Date: 2008-10-24 Impact factor: 4.497

10. Developing a manually annotated clinical document corpus to identify phenotypic information for inflammatory bowel disease.

Authors: Brett R South; Shuying Shen; Makoto Jones; Jennifer Garvin; Matthew H Samore; Wendy W Chapman; Adi V Gundlapalli
Journal: BMC Bioinformatics Date: 2009-09-17 Impact factor: 3.169

22 in total

10. A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records.

Authors: Emily Wheater; Grant Mair; Cathie Sudlow; Beatrice Alex; Claire Grover; William Whiteley
Journal: BMC Med Inform Decis Mak Date: 2019-09-09 Impact factor: 3.298

1. LabeledIn: cataloging labeled indications for human drugs.

2. Automating the Capture of Structured Pathology Data for Prostate Cancer Clinical Care and Research.

3. Automated detection of medication administration errors in neonatal intensive care.

4. Trustworthy assertion classification through prompting.

5. Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research.

6. Data-driven method to enhance craniofacial and oral phenotype vocabularies.

7. Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review.

8. Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing.

9. Mining FDA drug labels for medical conditions.

10. A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records.