Liang Chen (1), Liting Song (2), Yue Shao (1), Dewei Li (3), Keyue Ding (4)
1. Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, PR China.
2. Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, Department of Infectious Diseases, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, PR China.
3. Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, PR China. Electronic address: lidewei406@sina.com.
4. Medical Genetic Institute of Henan Province, Henan Provincial People's Hospital, Henan Key Laboratory of Genetic Diseases and Functional Genomics, Henan Provincial People's Hospital of Henan University, Zhengzhou, Henan Province, PR China. Electronic address: ding.keyue@gmail.com.
Abstract
AIMS: To develop a natural language processing (NLP)-based algorithm for extracting clinically useful information on patients with hepatocellular carcinoma (HCC) from Chinese electronic medical records (EMRs), and to use the extracted data to assess HCC staging. MATERIALS AND METHODS: Clinical documents, including operation notes, radiology and pathology reports, of 92 HCC patients were collected from Chinese EMRs. We randomly assigned these patients to training (n = 60) and testing (n = 32) datasets. Rule-based and hybrid methods for extracting information were developed using the training set of manually annotated operation notes. The method with better performance was used to process the other documents. The performance of the algorithm was assessed by calculating the precision, recall and F-score under exact-boundary and partial-boundary matching strategies. The utility of the extracted information for HCC staging was assessed by comparison with staging based on manual review. RESULTS: For operation notes, the rule-based and hybrid methods each achieved a precision, recall and F-score of ≥80% when the exact-boundary and partial-boundary matching strategies were applied to the testing dataset. With the rule-based method (which performed better than the hybrid method), good performance was also obtained on three other types of documents. When the extracted information was used for HCC staging, the concordance rate with the manual review was 75%. CONCLUSION: An NLP system was developed for clinical information extraction and HCC staging based on EMRs, and the results indicate that Chinese NLP has potential utility in clinical research.
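The evaluation described above (precision, recall and F-score under exact-boundary and partial-boundary matching, plus a concordance rate for staging) can be illustrated with a minimal sketch. This is not the authors' implementation: the span representation `(start, end, label)`, the definition of a partial match as any character overlap between spans with the same label, the greedy one-to-one matching, and the function names `score_spans` and `concordance_rate` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical span format: (start offset, end offset exclusive, entity label).
Span = Tuple[int, int, str]


def spans_match(pred: Span, gold: Span, exact: bool) -> bool:
    """Return True if a predicted span counts as a hit for a gold span."""
    if pred[2] != gold[2]:
        return False  # labels must agree under both strategies
    if exact:
        # Exact-boundary matching: both offsets must be identical.
        return pred[0] == gold[0] and pred[1] == gold[1]
    # Partial-boundary matching (assumed here to mean any overlap).
    return pred[0] < gold[1] and gold[0] < pred[1]


def score_spans(pred: List[Span], gold: List[Span], exact: bool = True):
    """Compute (precision, recall, F-score) with greedy one-to-one matching."""
    unmatched = list(gold)
    true_pos = 0
    for p in pred:
        for g in unmatched:
            if spans_match(p, g, exact):
                unmatched.remove(g)  # each gold span is matched at most once
                true_pos += 1
                break
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def concordance_rate(auto: Sequence[str], manual: Sequence[str]) -> float:
    """Fraction of patients whose automatic stage equals the manual stage."""
    if len(auto) != len(manual):
        raise ValueError("stage lists must have the same length")
    if not auto:
        return 0.0
    return sum(a == m for a, m in zip(auto, manual)) / len(auto)


# Toy data, invented for illustration only (not from the study).
gold = [(0, 4, "tumor_size"), (10, 15, "tumor_number"), (20, 26, "vascular_invasion")]
pred = [(0, 4, "tumor_size"), (11, 15, "tumor_number"), (30, 35, "tumor_size")]

exact_scores = score_spans(pred, gold, exact=True)
partial_scores = score_spans(pred, gold, exact=False)
toy_concordance = concordance_rate(["I", "II", "II", "III"], ["I", "II", "III", "III"])
```

In this toy data the second predicted span overlaps its gold span but starts one character late, so it counts as a hit only under partial-boundary matching. That is why partial-boundary scores are never lower than exact-boundary scores for the same predictions.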