
Cost-sensitive Active Learning for Phenotyping of Electronic Health Records.

Zongcheng Ji, Qiang Wei, Amy Franklin, Trevor Cohen, Hua Xu.

Abstract

Developing high-throughput and high-performance phenotyping algorithms is critical to the secondary use of electronic health records for clinical research. Supervised machine learning-based methods have shown good performance, but often require large annotated datasets that are costly to build. Simulation studies have shown that active learning (AL) could reduce the number of annotated samples while improving model performance, under the assumption that labeling each sample takes the same amount of time (i.e., cost-insensitive). In this study, we proposed a cost-sensitive AL (CostAL) algorithm for clinical phenotyping, using the identification of breast cancer patients as a use case. CostAL implements a linear regression model to estimate the actual time required for annotating each individual sample. We recruited two annotators to manually review the medical records of 766 potential breast cancer patients and recorded the actual time spent annotating each sample. We then compared CostAL, AL, and passive learning (PL, aka random sampling) using this annotated dataset and generated learning curves for each method. Our experimental results showed that CostAL achieved the highest area under the curve (AUC) score among the three algorithms (0.784, 0.8501, and 0.8673 for PL, AL, and CostAL, respectively, for user 1, and 0.8006, 0.8806, and 0.9006 for user 2). To achieve an accuracy of 0.94, AL and CostAL saved 36% and 60% of annotation time for user 1 and 53% and 70% for user 2, compared with PL, indicating the value of cost-sensitive AL approaches.
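The selection step described in the abstract can be illustrated with a small sketch. The abstract states only that CostAL uses a linear regression model to estimate annotation time; it does not say which features feed that model or how the estimated time is combined with informativeness. The code below therefore makes its own assumptions: a single hypothetical feature (record length in words), least-confidence uncertainty as the informativeness measure, and a score of uncertainty divided by predicted seconds. All function names and the toy numbers are illustrative and are not taken from the paper.

```python
import numpy as np

def fit_cost_model(features, seconds):
    """Least-squares linear regression: annotation time ~ features + intercept."""
    X = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(X, seconds, rcond=None)
    return coef

def predict_cost(coef, features):
    """Predicted annotation time in seconds for each sample."""
    X = np.column_stack([features, np.ones(len(features))])
    # Clip so a predicted time can never be zero or negative.
    return np.clip(X @ coef, 1.0, None)

def uncertainty(prob_positive):
    """Binary-classifier uncertainty: 1 at p = 0.5, 0 at p = 0 or p = 1."""
    p = np.asarray(prob_positive, dtype=float)
    return 1.0 - 2.0 * np.abs(p - 0.5)

def select_next(prob_positive, features, coef):
    """Index of the unlabeled sample with the most uncertainty per predicted second
    (an assumed scoring rule, not necessarily the one used by CostAL)."""
    score = uncertainty(prob_positive) / predict_cost(coef, features)
    return int(np.argmax(score))

# Toy labeled history: hypothetical record lengths (words) and recorded times (s).
labeled_features = np.array([[100.0], [200.0], [400.0], [800.0]])
labeled_seconds = np.array([15.0, 25.0, 45.0, 85.0])
coef = fit_cost_model(labeled_features, labeled_seconds)

# Toy unlabeled pool: record lengths and the classifier's P(positive).
pool_features = np.array([[150.0], [900.0], [300.0]])
pool_probs = np.array([0.55, 0.50, 0.95])

chosen = select_next(pool_probs, pool_features, coef)
plain_al_choice = int(np.argmax(uncertainty(pool_probs)))
```

In this toy pool, plain uncertainty sampling would pick the longest record because its prediction is the least certain, while the cost-aware score prefers a short record that is almost as uncertain but much cheaper to review. That trade-off is the intuition behind comparing methods by annotation time rather than by number of samples.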


Year:  2019        PMID: 31259040      PMCID: PMC6568101     

Source DB:  PubMed          Journal:  AMIA Jt Summits Transl Sci Proc


  1 in total

1.  Active deep learning for the identification of concepts and relations in electroencephalography reports.

Authors:  Ramon Maldonado; Sanda M Harabagiu
Journal:  J Biomed Inform       Date:  2019-08-27       Impact factor: 6.317

