Yvonne Sada, Jason Hou, Peter Richardson, Hashem El-Serag, Jessica Davila. Michael E. DeBakey Veterans Administration Medical Center and Baylor College of Medicine; Health Services Research and Development Section; Departments of Oncology and Gastroenterology, Baylor College of Medicine, Houston, TX.
Abstract
BACKGROUND: Accurate identification of hepatocellular cancer (HCC) cases from automated data is needed for efficient and valid quality improvement initiatives and research. We validated HCC International Classification of Diseases, 9th Revision (ICD-9) codes, and evaluated whether natural language processing by the Automated Retrieval Console (ARC) for document classification improves HCC identification.

METHODS: We identified a cohort of patients with ICD-9 codes for HCC during 2005-2010 from Veterans Affairs administrative data. Pathology and radiology reports were reviewed to confirm HCC. The positive predictive value (PPV), sensitivity, and specificity of ICD-9 codes were calculated. A split validation study of pathology and radiology reports was performed to develop and validate ARC algorithms. Reports were manually classified as diagnostic of HCC or not. ARC generated document classification algorithms using the Clinical Text Analysis and Knowledge Extraction System. ARC performance was compared with manual classification. PPV, sensitivity, and specificity of ARC were calculated.

RESULTS: A total of 1138 patients with HCC were identified by ICD-9 codes. On the basis of manual review, 773 had HCC. The HCC ICD-9 code algorithm had a PPV of 0.67, sensitivity of 0.95, and specificity of 0.93. For a random subset of 619 patients, we identified 471 pathology reports for 323 patients and 943 radiology reports for 557 patients. The pathology ARC algorithm had PPV of 0.96, sensitivity of 0.96, and specificity of 0.97. The radiology ARC algorithm had PPV of 0.75, sensitivity of 0.94, and specificity of 0.68.

CONCLUSIONS: A combined approach of ICD-9 codes and natural language processing of pathology and radiology reports improves HCC case identification in automated data.
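The abstract reports PPV, sensitivity, and specificity for each classifier against manual review. As a reminder of how these three measures relate to a 2x2 comparison table, here is a minimal Python sketch using the standard definitions. The class and function names are hypothetical, and the counts in the example are invented for illustration; they are not the study's data, and this is not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a classifier with manual review (the reference standard)."""
    tp: int  # flagged as HCC and confirmed on manual review
    fp: int  # flagged as HCC but not confirmed on manual review
    fn: int  # not flagged, but HCC on manual review
    tn: int  # not flagged, and no HCC on manual review


def ppv(c: ConfusionCounts) -> float:
    """Positive predictive value: share of flagged cases that are true HCC."""
    return c.tp / (c.tp + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true HCC cases that the classifier flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-HCC cases that the classifier leaves unflagged."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts for illustration only; NOT taken from the study.
example = ConfusionCounts(tp=90, fp=10, fn=5, tn=95)

if __name__ == "__main__":
    print(f"PPV:         {ppv(example):.2f}")
    print(f"Sensitivity: {sensitivity(example):.2f}")
    print(f"Specificity: {specificity(example):.2f}")
```

Note that PPV depends only on the flagged cases, whereas sensitivity and specificity also require counts for unflagged patients or reports, which the abstract does not give.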