Cost-sensitive Active Learning for Phenotyping of Electronic Health Records.

Zongcheng Ji¹, Qiang Wei¹, Amy Franklin¹, Trevor Cohen², Hua Xu¹.

Abstract

Developing high-throughput and high-performance phenotyping algorithms is critical to the secondary use of electronic health records for clinical research. Supervised machine learning-based methods have shown good performance, but often require large annotated datasets that are costly to build. Simulation studies have shown that active learning (AL) can reduce the number of annotated samples needed while improving model performance, under the assumption that every sample takes the same time to label (i.e., cost-insensitive). In this study, we proposed a cost-sensitive AL (CostAL) algorithm for clinical phenotyping, using the identification of breast cancer patients as a use case. CostAL implements a linear regression model to estimate the actual time required to annotate each individual sample. We recruited two annotators to manually review the medical records of 766 potential breast cancer patients and recorded the actual time taken to annotate each sample. We then compared CostAL, AL, and passive learning (PL, also known as random sampling) on this annotated dataset and generated learning curves for each method. Our experimental results showed that CostAL achieved the highest area under the curve (AUC) score among the three algorithms (0.784, 0.8501, and 0.8673 for PL, AL, and CostAL, respectively, for user 1, and 0.8006, 0.8806, and 0.9006 for user 2). To achieve an accuracy of 0.94, AL and CostAL could save 36% and 60% of annotation time for user 1 and 53% and 70% for user 2, compared with PL, indicating the value of cost-sensitive AL approaches.
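The idea of weighing informativeness against estimated annotation time can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states only that a linear regression estimates annotation time, so the single feature used here (record length in words), the least-confidence uncertainty measure, the uncertainty-per-second selection rule, and all names and numbers are assumptions made for illustration.

```python
# Illustrative sketch only. The feature (word count), the scoring rule
# (uncertainty divided by estimated seconds), and all data are assumptions,
# not details taken from the paper.

def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def uncertainty(prob_positive):
    """Binary uncertainty: 1.0 at p = 0.5, falling to 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * prob_positive - 1.0)


def select_next(pool, cost_model):
    """Return the unlabeled record with the most uncertainty per estimated second."""
    intercept, slope = cost_model

    def score(record):
        # Guard against a non-positive time estimate from the regression.
        est_seconds = max(intercept + slope * record["n_words"], 1e-6)
        return uncertainty(record["prob_positive"]) / est_seconds

    return max(pool, key=score)


# Hypothetical timing log: (words in record, seconds the annotator took).
timed = [(100, 30.0), (200, 50.0), (300, 70.0), (400, 90.0)]
cost_model = fit_simple_linear_regression(
    [words for words, _ in timed],
    [seconds for _, seconds in timed],
)

# Hypothetical unlabeled pool with the current classifier's predicted probabilities.
pool = [
    {"id": "a", "n_words": 1000, "prob_positive": 0.50},
    {"id": "b", "n_words": 150, "prob_positive": 0.60},
    {"id": "c", "n_words": 100, "prob_positive": 0.95},
]
chosen = select_next(pool, cost_model)
print(chosen["id"])
```

In this toy pool, plain uncertainty sampling would pick record "a" because its predicted probability sits exactly at 0.5, whereas the cost-aware score prefers the shorter record "b", which is nearly as uncertain but is estimated to take far less time to annotate.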

Year: 2019 PMID: 31259040 PMCID: PMC6568101

Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
