Hidden Markov model using Dirichlet process for de-identification.

Tao Chen, Richard M Cullen, Marshall Godwin.

Abstract

For the 2014 i2b2/UTHealth de-identification challenge, we introduced a new non-parametric Bayesian hidden Markov model using a Dirichlet process (HMM-DP). The model is intended to reduce task-specific feature engineering and to generalize well to new data. For the challenge, we developed a variational method to learn the model and an efficient approximation algorithm for prediction. To accommodate out-of-vocabulary words, we designed a number of feature functions to model them. The results show that the model can exploit local context cues to make correct predictions without manual feature engineering and performs as accurately as state-of-the-art conditional random field models in a number of categories. To incorporate long-range and cross-document context cues, we developed a skip-chain conditional random field model to align the results produced by HMM-DP, which further improved performance.
Copyright © 2015 Elsevier Inc. All rights reserved.

Keywords:  De-identification; Dirichlet process; Hidden Markov model; Natural language processing; Variational method

Year:  2015        PMID: 26407642      PMCID: PMC4984397          DOI: 10.1016/j.jbi.2015.09.004

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  4 in total

1.  Finding scientific topics.

Authors:  Thomas L Griffiths; Mark Steyvers
Journal:  Proc Natl Acad Sci U S A       Date:  2004-02-10       Impact factor: 11.205

2.  State-of-the-art anonymization of medical records using an iterative machine learning framework.

Authors:  György Szarvas; Richárd Farkas; Róbert Busa-Fekete
Journal:  J Am Med Inform Assoc       Date:  2007 Sep-Oct       Impact factor: 4.497

3.  Evaluating the state-of-the-art in automatic de-identification.

Authors:  Özlem Uzuner; Yuan Luo; Peter Szolovits
Journal:  J Am Med Inform Assoc       Date:  2007-06-28       Impact factor: 4.497

4.  Large-scale evaluation of automated clinical note de-identification and its impact on information extraction.

Authors:  Louise Deleger; Katalin Molnar; Guergana Savova; Fei Xia; Todd Lingren; Qi Li; Keith Marsolo; Anil Jegga; Megan Kaiser; Laura Stoutenborough; Imre Solti
Journal:  J Am Med Inform Assoc       Date:  2012-08-02       Impact factor: 4.497

  6 in total

1.  Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation.

Authors:  Hee-Jin Lee; Yaoyun Zhang; Kirk Roberts; Hua Xu
Journal:  AMIA Annu Symp Proc       Date:  2018-04-16

2.  Automatic prediction of coronary artery disease from clinical narratives.

Authors:  Kevin Buchan; Michele Filannino; Özlem Uzuner
Journal:  J Biomed Inform       Date:  2017-06-27       Impact factor: 6.317

3.  Practical applications for natural language processing in clinical research: The 2014 i2b2/UTHealth shared tasks.

Authors:  Özlem Uzuner; Amber Stubbs
Journal:  J Biomed Inform       Date:  2015-10-24       Impact factor: 6.317

4.  A hybrid approach to automatic de-identification of psychiatric notes.

Authors:  Hee-Jin Lee; Yonghui Wu; Yaoyun Zhang; Jun Xu; Hua Xu; Kirk Roberts
Journal:  J Biomed Inform       Date:  2017-06-07       Impact factor: 6.317

5.  De-identification of clinical notes via recurrent neural network and conditional random field.

Authors:  Zengjian Liu; Buzhou Tang; Xiaolong Wang; Qingcai Chen
Journal:  J Biomed Inform       Date:  2017-06-01       Impact factor: 6.317

6.  Transferability of neural network clinical deidentification systems.

Authors:  Kahyun Lee; Nicholas J Dobbins; Bridget McInnes; Meliha Yetisgen; Özlem Uzuner
Journal:  J Am Med Inform Assoc       Date:  2021-11-25       Impact factor: 7.942

