Surveillance of Peripheral Arterial Disease Cases Using Natural Language Processing of Clinical Notes.

Naveed Afzal1, Sunghwan Sohn1, Christopher G Scott1, Hongfang Liu1, Iftikhar J Kullo2, Adelaide M Arruda-Olson2.   

Abstract

Peripheral arterial disease (PAD) is a chronic disease that affects millions of people worldwide and yet remains underdiagnosed and undertreated. Early detection is important, because PAD is strongly associated with an increased risk of mortality and morbidity. In this study, we built a PAD surveillance system using natural language processing (NLP) for early detection of PAD from narrative clinical notes. Our NLP algorithm had excellent positive predictive value (0.93) and identified 41% of PAD cases before the initial ankle-brachial index (ABI) test date while in 12% of cases the NLP algorithm detected PAD on the same date as the ABI (the gold standard for comparison). Hence, our system ascertains PAD patients in a timely and accurate manner. In conclusion, our PAD surveillance NLP algorithm has the potential for translation to clinical practice for use in reminding clinicians to order ABI tests in patients with suspected PAD and to reinforce the implementation of guideline recommended risk modification strategies in patients diagnosed with PAD.

Year:  2017        PMID: 28815100      PMCID: PMC5543345     

Source DB:  PubMed          Journal:  AMIA Jt Summits Transl Sci Proc


Introduction

Peripheral arterial disease (PAD) is a common disease that affects 8.5 million adults in the United States. [1] PAD patients are at high risk for adverse outcomes including death, myocardial infarction, stroke and limb amputation. [2,3] Adverse vascular events often lead to poor quality of life and may also contribute to the high rate of depression in these patients. [4,5] However, PAD patients are often underdiagnosed and undertreated. [2,6,7] Lack of physician and public awareness of PAD-associated risks for adverse outcomes likely contributes to this public health problem. [6] The total annual cost associated with vascular hospitalization of PAD patients in the United States in 2004 was estimated to be in excess of 21 billion dollars, and this number will increase as the population ages. [8] Timely detection and prompt implementation of guideline-recommended therapies for risk modification in PAD may lead to reduction of the risk for adverse outcomes as well as reduced costs.

Automated surveillance of clinical notes from electronic health records (EHRs) may promptly identify PAD cases. The main objective of disease surveillance is detection of individuals with disease. [9] Manual methods for disease surveillance are costly, time-consuming and inconsistent. [10] With a computerized approach for disease surveillance, cases are detected by applying case definitions or algorithmic approaches to clinical data. [9] EHRs have the potential to enhance surveillance efforts as they contain a rich variety of information that facilitates timely and efficient surveillance. Accordingly, we built a PAD surveillance system using natural language processing (NLP) for early detection of PAD symptoms from narrative clinical notes.

Background

PAD is confirmed by measuring the ankle-brachial index (ABI), but this method remains underutilized, [11] and PAD remains an under-diagnosed condition in the primary care setting. [12,13] Delayed diagnosis of PAD contributes to high rates of morbidity, limb amputation and death. [14] ABI is the ratio of blood pressure (BP) at the ankle to BP in the arm and is the gold standard for PAD diagnosis. [2] However, results of ABI testing may not be available at the point of care, and consequently clinicians need to manually review clinical notes to seek information to support the diagnosis. Manual review of the medical record is labor intensive, time consuming and often impractical for busy clinicians evaluating patients with multiple complex health conditions.

NLP systems have previously been applied successfully to clinical notes for case identification [15] of conditions such as bipolar disorder, [16] binge eating disorder, [17] diabetes and celiac disease. [18] We previously used NLP to identify PAD cases from radiology notes [19] and also from narrative clinical notes. [20] However, NLP systems have been underused for disease surveillance. During the surveillance process, relevant data and information are collected and analyzed to generate knowledge that may be promptly distributed to healthcare providers for appropriate action, so that they can implement risk modification strategies, which may lower risks for adverse outcomes. [21] Prior surveillance studies focused on infectious diseases, birth defects, mental health issues, drug abuse and environmental exposures. [22] Most previous efforts to harness EHRs for population health surveillance have used International Classification of Diseases, Ninth Revision (ICD-9) codes to extract structured data elements. For example, in Italy, health authorities used ICD-9 codes, death certificates and pathology reports to monitor the incidence of birth defects, a main reason for infant mortality in that region. [23] Another study used an algorithm based on structured and unstructured data (clinical notes using NLP) for potential surveillance of post-operative surgical complications. [24] To the best of our knowledge, no prior study developed an NLP algorithm for PAD surveillance from narrative clinical notes. The goal of the present study was to build a PAD surveillance system using NLP for early detection of PAD from narrative clinical notes.

Methods

Study Setting and Population

This study took place at Mayo Clinic, Rochester, Minnesota, and used the resources of the Rochester Epidemiology Project (REP) [25] to compile a community-based PAD case-control cohort from Olmsted County. The institutional review boards of participating medical centers approved this study.

Gold Standard

All patients underwent ABI testing at the Mayo noninvasive vascular laboratory. [1] The ABI reports were in PDF format and were not part of clinical notes. Controls were patients with normal ABI. PAD cases were patients with abnormal ABI, defined as ABI ≤ 0.9 at rest or 1 minute after exercise, or by the presence of poorly compressible arteries (ABI ≥ 1.40 or ankle systolic blood pressure > 255 mmHg). [1] In addition to PAD status, the date of ABI testing was also recorded as an index date for all patients.
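
The gold-standard definition above can be written as a small check. The following is a minimal sketch, not the laboratory's actual procedure: the function names and the choice to apply the ≥ 1.40 threshold to the resting ABI are assumptions made for illustration.

```python
from typing import Optional


def ankle_brachial_index(ankle_sbp_mmhg: float, arm_sbp_mmhg: float) -> float:
    """ABI is the ratio of ankle blood pressure to arm blood pressure."""
    return ankle_sbp_mmhg / arm_sbp_mmhg


def is_pad_by_abi(
    rest_abi: float,
    post_exercise_abi: Optional[float] = None,
    ankle_sbp_mmhg: Optional[float] = None,
) -> bool:
    """Return True when the ABI result meets the abnormal-ABI definition.

    Hypothetical helper: abnormal means ABI <= 0.9 at rest or 1 minute
    after exercise, or poorly compressible arteries (ABI >= 1.40 or
    ankle systolic blood pressure > 255 mmHg).
    """
    low_abi = rest_abi <= 0.9 or (
        post_exercise_abi is not None and post_exercise_abi <= 0.9
    )
    # Assumption: the 1.40 threshold is checked against the resting ABI.
    poorly_compressible = rest_abi >= 1.40 or (
        ankle_sbp_mmhg is not None and ankle_sbp_mmhg > 255
    )
    return low_abi or poorly_compressible
```

Under these assumptions, a patient with a normal resting ABI of 1.0 but a post-exercise ABI of 0.8 would still be counted as a PAD case.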

Study Design

The automated NLP algorithm was validated by comprehensive manual medical record review. Figure 1 shows the overall design of the study. All retrieved clinical notes for each patient were used to ascertain patient PAD status as an output.
Figure 1: Study Design

PAD Status by NLP Algorithm

We identified a list of PAD-related terms to build the NLP algorithm prototype. For this purpose, an expert clinician manually reviewed clinical notes of 20 patients with PAD and 20 patients without PAD. The clinician highlighted the words and phrases in each clinical note that were used to determine PAD status. Examples of sentences abstracted from clinical notes that were used to identify PAD-related terms are shown in Table 1.
Table 1: Examples of best confirmation and exclusion keywords for PAD status

Words highlighted in grey are examples of the best keywords for confirmation of PAD. Examples of the best keywords for exclusion are underlined. These keywords were used to create a list of appropriate terms of PAD status (Table 3). These notes were excluded from subsequent analysis.
Table 3: Keywords in the NLP algorithm for ascertainment of PAD status

Confirmation Key Words, Disease Location: leg/legs; lower limbs/limb; lower extremities/extremity; iliac/femoral/tibial/popliteal artery/arteries; distal/infrarenal/abdominal aorta/aorto-(bi)iliac/aorto(bi)iliac/aorto(bi)-iliac; aorto-(bi)femoral; foot, toe, toes, shin; plantar, heel, ankle, interdigital; below/above knee.

Confirmation Key Words, Diagnosis: claudication/calf pain; ischemic ulcer/ulcers; ASO/arteriosclerosis obliterans/arterial sclerosis obliterans/atherosclerotic disease; PAD/peripheral arterial disease/peripheral vascular disease/peripheral arterial occlusive disease; arterial occlusive disease/occlusion/occluded; stenosis; NCV/non compressible vessels; NCA/non compressible arteries; PCV/poorly compressible vessels; stiff vessels/arteries ischemia; positive ABI/ankle brachial index/vascular labs/extremities study/arterial studies; revascularization/recanalization/bypass/angioplasty/PTA/stenting/stent/graft; endarterectomy/endarterectomies; thrombectomy/thromboembolectomy/thrombosis/embolectomy/embolectomies.

Exclusion Key Words: family history of; upper extremities/upper extremity; arm/arms, hand(s); brachial artery, axillary artery, radial artery, ulnar artery; carotid, innominate artery, subclavian artery; mesenteric artery; celiac artery; AAA/abdominal aortic aneurysm/abd aortic aneurysm; renal arteries/artery; coronaries, coronary arteries/artery/cerebrovascular-disease/arteries/artery; pseudoclaudication/pseudoclaudicatory pain; amputation; traumatic/trauma; sarcoma/osteoma; diabetic foot, hammer toe/toes; vascular calcification; varicose veins; lower extremity/extremities edema/cellulitis/venous system; carotid artery disease/spinal ischemia.

The list of keywords was further refined by manual review of charts conducted by a board certified cardiologist during the interactive validation of this algorithm. A detailed description of this approach has been previously reported. [20] In our prior study, we split the cohort into training and testing datasets. The training dataset was used to interactively refine the PAD NLP algorithm with refinement of PAD-related keywords and rules. During this interactive refinement, we identified note types, note sections and service groups that were relevant for ascertainment of PAD. [20] Table 2 contains a list of included note types, note sections and service groups used in the present study. We retrieved clinical notes from the EHR of each patient that were created until ABI test date plus 21 days (time interval for a subsequent clinic visit to review test results).
Table 2: Note types, note sections and service groups included in this study

Note Types | Note Sections | Service Groups
Consult | Impression / Report / Plan | Primary Care
Subsequent Visit | Diagnosis | Hospital Internal Medicine
Patient Progress | Principal/primary Diagnosis | General Medicine
Supervisory | Secondary Diagnoses | Family Medicine
Limited Exam | Past Medical/Surgical History | Critical Care
Specialty Evaluation | Ongoing Care | Urgent Care
Multisystem Evaluation | Immunizations | Cardiology
Injection | Key Findings / Test Results | Vascular
Educational Visit | Pre-Procedure Information | Pulmonary
Hospital Service Transfer | Post-Procedure Information | Oncology
 | Vital Signs | Nephrology
 | Current Medications | Neurology
 | Revision History | Pathology
 | Special Instructions | Gastroenterology
 | Advance Directives | Vascular Wound Care
 | Discharge Activity | Vascular Surgery
 | Final Pathology Diagnosis | Cardiac Surgery
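
The retrieval step (notes created up to the ABI test date plus 21 days, restricted to the included note types and service groups) might look like the sketch below. The `Note` record layout and the function name are hypothetical, the sets hold only a few Table 2 entries for brevity, and filtering by note section is omitted.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

# Illustrative subsets of Table 2 only; the full lists are in the table.
INCLUDED_NOTE_TYPES = {"Consult", "Subsequent Visit", "Limited Exam"}
INCLUDED_SERVICE_GROUPS = {"Primary Care", "Cardiology", "Vascular"}


@dataclass
class Note:
    """Hypothetical note record; the real EHR schema is not described."""
    created: date
    note_type: str
    service_group: str
    text: str


def select_notes(notes: List[Note], abi_date: date, window_days: int = 21) -> List[Note]:
    """Keep notes created on or before the ABI date plus the window,
    limited to the included note types and service groups."""
    cutoff = abi_date + timedelta(days=window_days)
    return [
        n for n in notes
        if n.created <= cutoff
        and n.note_type in INCLUDED_NOTE_TYPES
        and n.service_group in INCLUDED_SERVICE_GROUPS
    ]
```

Treating the cutoff as inclusive is an assumption; the text does not say whether a note written exactly 21 days after the ABI test was retained.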

NLP algorithm

The NLP algorithm had two main components: text-processing and patient classification. The text-processing component identified concepts in text that matched specified criteria, while the patient classification component defined the PAD status on the basis of available evidence from clinical notes. The NLP algorithm used MedTagger, [26] an open source NLP pipeline which used the Apache unstructured information management architecture (UIMA) framework. The NLP algorithm used keywords described in Table 3 for patient classification. The following rules were used:

Rules to define PAD cases: any diagnostic keyword plus any disease location keyword within two sentences of the same note.

Rules for non-PAD cases: the definition for a PAD case was not satisfied, OR exclusion keywords were present in the clinical note.

Whenever the NLP algorithm classified a patient as PAD, it also provided the note type, the inception date and the part of the clinical note (+/- 2 sentences) containing the evidence used by the system to classify the patient.
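
The classification rules can be illustrated with a short, self-contained sketch. This is not MedTagger and not the study's implementation: the keyword lists are small subsets of Table 3, the sentence splitter is a naive regular expression, "within two sentences" is interpreted as a sentence-index gap of at most 2, and an exclusion keyword anywhere in a note is assumed to disqualify that note. All function names are hypothetical.

```python
import re
from typing import List

# Small illustrative subsets of the Table 3 keyword lists.
LOCATION_TERMS = ["leg", "legs", "lower extremity", "lower extremities", "foot"]
DIAGNOSIS_TERMS = ["claudication", "peripheral arterial disease", "stenosis"]
EXCLUSION_TERMS = ["family history of", "upper extremity", "pseudoclaudication"]


def _compile(terms: List[str]) -> "re.Pattern[str]":
    # Longest terms first so multi-word phrases win; \b avoids matching
    # "claudication" inside "pseudoclaudication".
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b", re.IGNORECASE)


LOCATION_RE = _compile(LOCATION_TERMS)
DIAGNOSIS_RE = _compile(DIAGNOSIS_TERMS)
EXCLUSION_RE = _compile(EXCLUSION_TERMS)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def note_has_pad_evidence(text: str, max_gap: int = 2) -> bool:
    """A diagnostic keyword and a location keyword close together,
    with no exclusion keyword in the note."""
    if EXCLUSION_RE.search(text):
        return False
    sentences = split_sentences(text)
    loc = [i for i, s in enumerate(sentences) if LOCATION_RE.search(s)]
    dx = [i for i, s in enumerate(sentences) if DIAGNOSIS_RE.search(s)]
    return any(abs(i - j) <= max_gap for i in loc for j in dx)


def classify_patient(notes: List[str]) -> str:
    """Patient is PAD if any note carries qualifying evidence."""
    return "PAD" if any(note_has_pad_evidence(n) for n in notes) else "non-PAD"
```

For example, `classify_patient(["Patient reports claudication in the left leg."])` yields `"PAD"` under these assumptions, while the same sentence prefixed with "Family history of" is rejected by the exclusion rule. The sketch does no negation or context handling, which a production pipeline would need.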

Results

The dataset processed by the NLP algorithm consisted of 1569 patients (806 cases and 763 controls). The total number of clinical notes in the dataset was 512,471, and on average each note had 386 words. The average age of patients was 71.2 years, 44% were women and 90% were white. Table 4 summarizes the results of the NLP algorithm performance presented as positive predictive value (PPV), sensitivity, negative predictive value (NPV) and specificity.
Table 4: Performance of NLP algorithm for ascertainment of PAD status compared with the gold standard (ABI results)

Metric | Value
PPV | 0.93
Sensitivity | 0.70
NPV | 0.80
Specificity | 0.95
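
For readers less familiar with these four measures, the sketch below shows how each is computed from a 2x2 comparison of algorithm output against the gold standard. The function name is hypothetical and the counts in the usage note are made-up toy numbers, not the study's confusion matrix, which is not reported here.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard definitions for a binary classifier versus a gold standard.

    tp: algorithm PAD, gold standard PAD
    fp: algorithm PAD, gold standard non-PAD
    fn: algorithm non-PAD, gold standard PAD
    tn: algorithm non-PAD, gold standard non-PAD
    """
    return {
        "ppv": tp / (tp + fp),          # of flagged patients, fraction with PAD
        "sensitivity": tp / (tp + fn),  # of PAD patients, fraction flagged
        "npv": tn / (tn + fn),          # of unflagged patients, fraction without PAD
        "specificity": tn / (tn + fp),  # of non-PAD patients, fraction unflagged
    }
```

With toy counts `tp=90, fp=10, fn=30, tn=70`, the function returns a PPV of 0.9, sensitivity of 0.75, NPV of 0.7 and specificity of 0.875.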
We compared the temporal association between NLP algorithm inception date (the date on which NLP algorithm classified the patient as PAD) with the gold standard index date for each PAD patient. For true positive cases, the difference between the NLP algorithm inception date and the gold standard index date was measured in days. We categorized whether the NLP algorithm identified PAD “before”, “at” the same time or “after” the gold standard index date. We found that in 329 cases (41%) the NLP algorithm identified PAD cases before the gold standard index date while in 93 cases (12%) NLP algorithm index date and gold standard index date were the same and in 141 cases (18%) the NLP index date was after gold standard index date but within the 21 day-window (Figure 2).
Figure 2: Temporality of NLP algorithm for PAD ascertainment
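
The temporal comparison can be sketched as follows. The function name is hypothetical. The counts are those reported in the text, and the final check assumes that the three timing groups together account for all true positives.

```python
from datetime import date


def timing_category(nlp_inception_date: date, abi_index_date: date) -> str:
    """Classify when the NLP algorithm flagged PAD relative to the ABI date."""
    delta_days = (nlp_inception_date - abi_index_date).days
    if delta_days < 0:
        return "before"
    if delta_days == 0:
        return "at"
    return "after"


# Counts reported in the text for true positive cases.
REPORTED_COUNTS = {"before": 329, "at": 93, "after": 141}
TOTAL_PAD_CASES = 806  # PAD cases by the gold standard

# Assumption: the three groups cover every true positive, so their sum
# divided by all PAD cases should reproduce the sensitivity in Table 4.
true_positives = sum(REPORTED_COUNTS.values())
implied_sensitivity = round(true_positives / TOTAL_PAD_CASES, 2)
```

Here `true_positives` is 563 and `implied_sensitivity` is 0.7, which agrees with the sensitivity of 0.70 in Table 4.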

Discussion

The extensive use of EHRs holds great promise for population health surveillance strategies, as the ability to rapidly extract information from EHRs may benefit individual health, healthcare delivery and the health of populations. [27] For an aging population with multiple coexisting chronic conditions, ascertainment of relevant characteristics at the point of care can be extremely time-consuming and challenging as data are buried within the EHR. [28] Clinically, the ABI test is used to confirm PAD, as recommended by clinical practice guidelines. [29] However, PAD remains an under-diagnosed and undertreated disease. [12,13] For the present study, we used the ABI test as the gold standard for comparison for development and validation of the NLP algorithm for PAD. We have previously applied the NLP-PAD algorithm in PAD patients who did not undergo ABI testing, [30] and the rules derived from that process have also been incorporated in the final version of the algorithm used in the present study. The novel findings of the present study were that an NLP algorithm accurately identified PAD cases from clinical notes, with high positive predictive value, and did so prior to the date of the gold standard diagnostic test in 41% of the cases from the community. There was a time interval between documentation of evidence of PAD (e.g. symptoms) in narrative clinical notes and PAD diagnosis by ABI test. Hence, our data show a delay in establishing PAD diagnosis despite the presence of PAD symptoms. Accurate and timely assessment of PAD may 1) remind clinicians to order ABI testing for patients with suspected PAD and 2) support implementation of standardized risk modification strategies in patients with diagnosed PAD. Early identification of PAD and subsequent implementation of risk modification strategies are important for the management of these patients.
The risk modification strategies recommended by clinical practice guidelines [31] include smoking cessation and therapy with aspirin, statin medications, and angiotensin-converting enzyme inhibitors. Implementation of these recommendations is associated with significant reduction in adverse outcomes in PAD patients. [31] However, despite the evidence, patients with PAD continue to receive suboptimal risk modification strategies. [6, 31] This NLP algorithm may be incorporated into EHRs to identify both patients with previously diagnosed PAD and patients with PAD symptoms (i.e. suspected PAD). The PAD surveillance NLP algorithm could be linked to clinical decision support (CDS) to remind clinicians to order the diagnostic test (ABI). This may result in diagnosis of PAD earlier in the course of disease progression. [32] In addition, after the test results are available and documented in the clinical notes, automated reminders for implementation of risk modification strategies in PAD could be generated, with the ultimate goal of preventing and reducing adverse outcomes in PAD patients. Future EHR-based studies will evaluate the impact of the NLP-based CDS system on outcomes in PAD patients. In conclusion, this PAD surveillance NLP algorithm has the potential for translation to clinical practice for use in CDS tools to remind clinicians to order ABI tests in patients with suspected PAD and to reinforce the implementation of guideline-recommended risk modification strategies in patients diagnosed with PAD.
  31 in total

1.  Automated identification of patients with a diagnosis of binge eating disorder from narrative electronic health records.

Authors:  Brandon K Bellows; Joanne LaFleur; Aaron W C Kamauu; Thomas Ginter; Tyler B Forbush; Stephen Agbor; Dylan Supina; Paul Hodgkins; Scott L DuVall
Journal:  J Am Med Inform Assoc       Date:  2013-11-07       Impact factor: 4.497

2.  Identifying Peripheral Arterial Disease Cases Using Natural Language Processing of Clinical Notes.

Authors:  Naveed Afzal; Sunghwan Sohn; Sara Abram; Hongfang Liu; Iftikhar J Kullo; Adelaide M Arruda-Olson
Journal:  IEEE EMBS Int Conf Biomed Health Inform       Date:  2016-04-21

3.  Data resource profile: the Rochester Epidemiology Project (REP) medical records-linkage system.

Authors:  Jennifer L St Sauver; Brandon R Grossardt; Barbara P Yawn; L Joseph Melton; Joshua J Pankratz; Scott M Brue; Walter A Rocca
Journal:  Int J Epidemiol       Date:  2012-11-18       Impact factor: 7.196

Review 4.  CLINICAL PRACTICE. Peripheral Artery Disease.

Authors:  Iftikhar J Kullo; Thom W Rooke
Journal:  N Engl J Med       Date:  2016-03-03       Impact factor: 91.245

5.  Peripheral arterial disease detection, awareness, and treatment in primary care.

Authors:  A T Hirsch; M H Criqui; D Treat-Jacobson; J G Regensteiner; M A Creager; J W Olin; S H Krook; D B Hunninghake; A J Comerota; M E Walsh; M M McDermott; W R Hiatt
Journal:  JAMA       Date:  2001-09-19       Impact factor: 56.272

6.  Exploring the frontier of electronic health record surveillance: the case of postoperative complications.

Authors:  Fern FitzHenry; Harvey J Murff; Michael E Matheny; Nancy Gentry; Elliot M Fielstein; Steven H Brown; Ruth M Reeves; Dominik Aronsky; Peter L Elkin; Vincent P Messina; Theodore Speroff
Journal:  Med Care       Date:  2013-06       Impact factor: 2.983

7.  ACC/AHA 2005 guidelines for the management of patients with peripheral arterial disease (lower extremity, renal, mesenteric, and abdominal aortic): executive summary a collaborative report from the American Association for Vascular Surgery/Society for Vascular Surgery, Society for Cardiovascular Angiography and Interventions, Society for Vascular Medicine and Biology, Society of Interventional Radiology, and the ACC/AHA Task Force on Practice Guidelines (Writing Committee to Develop Guidelines for the Management of Patients With Peripheral Arterial Disease) endorsed by the American Association of Cardiovascular and Pulmonary Rehabilitation; National Heart, Lung, and Blood Institute; Society for Vascular Nursing; TransAtlantic Inter-Society Consensus; and Vascular Disease Foundation.

Authors:  Alan T Hirsch; Ziv J Haskal; Norman R Hertzer; Curtis W Bakal; Mark A Creager; Jonathan L Halperin; Loren F Hiratzka; William R C Murphy; Jeffrey W Olin; Jules B Puschett; Kenneth A Rosenfield; David Sacks; James C Stanley; Lloyd M Taylor; Christopher J White; John White; Rodney A White; Elliott M Antman; Sidney C Smith; Cynthia D Adams; Jeffrey L Anderson; David P Faxon; Valentin Fuster; Raymond J Gibbons; Jonathan L Halperin; Loren F Hiratzka; Sharon A Hunt; Alice K Jacobs; Rick Nishimura; Joseph P Ornato; Richard L Page; Barbara Riegel
Journal:  J Am Coll Cardiol       Date:  2006-03-21       Impact factor: 24.094

8.  One-year costs in patients with a history of or at risk for atherothrombosis in the United States.

Authors:  Elizabeth M Mahoney; Kaijun Wang; David J Cohen; Alan T Hirsch; Mark J Alberts; Kim Eagle; Frederique Mosse; Joseph D Jackson; P Gabriel Steg; Deepak L Bhatt
Journal:  Circ Cardiovasc Qual Outcomes       Date:  2008-09

9.  Validation of electronic health record phenotyping of bipolar disorder cases and controls.

Authors:  Victor M Castro; Jessica Minnier; Shawn N Murphy; Isaac Kohane; Susanne E Churchill; Vivian Gainer; Tianxi Cai; Alison G Hoffnagle; Yael Dai; Stefanie Block; Sydney R Weill; Mireya Nadal-Vicens; Alisha R Pollastri; J Niels Rosenquist; Sergey Goryachev; Dost Ongur; Pamela Sklar; Roy H Perlis; Jordan W Smoller
Journal:  Am J Psychiatry       Date:  2014-12-12       Impact factor: 18.112

10.  Multidisciplinary approach to the diagnosis and management of patients with peripheral arterial disease.

Authors:  Craig M Walker; Frank T Bunch; Nick G Cavros; Eric J Dippel
Journal:  Clin Interv Aging       Date:  2015-07-10       Impact factor: 4.458
