E Hope Weissler (1), Jikai Zhang (2), Steven Lippmann (3), Shelley Rusincovitch (4), Ricardo Henao (2,4), W Schuyler Jones (3,5).
1. Division of Vascular and Endovascular Surgery (E.H.W.), Duke University School of Medicine, Durham, NC.
2. Department of Biostatistics and Bioinformatics (J.Z., R.H.), Duke University School of Medicine, Durham, NC.
3. Department of Population Health Sciences (S.L., W.S.J.), Duke University School of Medicine, Durham, NC.
4. Duke Forge (S.R., R.H.), Duke University School of Medicine, Durham, NC.
5. Division of Cardiology (W.S.J.), Duke University School of Medicine, Durham, NC.
Abstract
BACKGROUND: Peripheral artery disease (PAD) is underrecognized, undertreated, and understudied; addressing each of these problems requires efficient and accurate identification of patients with PAD. Currently, PAD patient identification relies on diagnosis/procedure codes or on lists of patients diagnosed or treated by specific providers in specific locations and ways. The goal of this research was to leverage natural language processing to identify patients with PAD in an electronic health record system more accurately than a structured data-based approach.
METHODS: The clinical notes from a cohort of 6861 patients in our health system whose PAD status had previously been adjudicated were used to train, test, and validate a natural language processing model using 10-fold cross-validation. The performance of this model was described using the area under the receiver operating characteristic and average precision curves; its performance was quantitatively compared with an administrative data-based least absolute shrinkage and selection operator (LASSO) approach using the DeLong test.
RESULTS: The median (SD) of the area under the receiver operating characteristic curve for the natural language processing model was 0.888 (0.009) versus 0.801 (0.017) for the LASSO-based approach alone (DeLong P<0.0001). The median (SD) of the area under the precision curve was 0.909 (0.008) versus 0.816 (0.012) for the structured data-based approach. When sensitivity was set at 90%, the precision was 65% for LASSO and 74% for the machine learning approach, while the specificity was 41% for LASSO and 62% for the machine learning approach.
CONCLUSIONS: Using a natural language processing approach in addition to partial cohort preprocessing with a LASSO-based model, we were able to meaningfully improve our ability to identify patients with PAD compared with an approach using structured data alone. This model has potential applications both to interventions targeted at improving patient care and to efficient, large-scale PAD research.
Graphic Abstract: A graphic abstract is available for this article.
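The comparison reported above fixes sensitivity at 90% and then reads off precision and specificity for each model. The sketch below illustrates that calculation, along with the area under the receiver operating characteristic curve, in plain Python. It is not the authors' pipeline: the function names `auroc` and `operating_point` are hypothetical, and the labels and scores are invented toy values, not study data.

```python
from typing import List, Tuple


def auroc(y_true: List[int], scores: List[float]) -> float:
    """AUROC via the rank-sum identity: the share of (positive, negative)
    pairs in which the positive case receives the higher score."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Correctly ordered pairs count 1, tied pairs count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def operating_point(
    y_true: List[int], scores: List[float], target_sensitivity: float = 0.90
) -> Tuple[float, float, float, float]:
    """Return (threshold, sensitivity, specificity, precision) at the highest
    score threshold whose sensitivity reaches the target."""
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        sensitivity = tp / (tp + fn)
        if sensitivity >= target_sensitivity:
            return t, sensitivity, tn / (tn + fp), tp / (tp + fp)
    raise ValueError("no threshold reaches the target sensitivity")


# Invented toy data: 10 patients with PAD (label 1) and 10 without (label 0).
y_true = [1] * 10 + [0] * 10
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20,
          0.88, 0.72, 0.58, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.10]

area = auroc(y_true, scores)
threshold, sens, spec, prec = operating_point(y_true, scores, 0.90)
print(f"AUROC={area:.3f} threshold={threshold} "
      f"sensitivity={sens:.2f} specificity={spec:.2f} precision={prec:.2f}")
```

Applying the same function to two models' scores on the same patients gives the kind of side-by-side precision and specificity figures quoted in the RESULTS. In a cross-validated setting one would compute these per fold and then summarize across folds.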
Keywords: cohort studies; electronic health records; machine learning; natural language processing; peripheral artery disease