Application of a machine learning algorithm for detection of atrial fibrillation in secondary care.

Kevin G Pollock¹, Sara Sekelj², Ellie Johnston², Belinda Sandler¹, Nathan R Hill¹, Fu Siong Ng³,⁴, Sadia Khan³,⁴, Ayman Nassar¹, Usman Farooqui¹.

Abstract

Atrial fibrillation (AF) is the most common sustained heart arrhythmia and significantly increases the risk of stroke. Opportunistic AF testing in high-risk patients typically requires frequent electrocardiogram tests to capture the arrhythmia. Risk-prediction algorithms may help to identify people with undiagnosed AF more accurately, and machine learning (ML) may aid in the diagnosis of AF. Here, we applied an AF-risk prediction algorithm to secondary care data linked to primary care data in the DISCOVER database in order to evaluate changes in model performance and to identify patients not previously detected in primary care. We identified an additional 5,444 patients who had an AF diagnosis only in secondary care during the data extraction period. Of these, 2,696 (49.5%) were accepted by the algorithm, and the algorithm correctly assigned 2,637 (97.8%) of them to the AF cohort. Using a risk threshold of 7.4% in patients aged ≥ 30 years, algorithm sensitivity and specificity were 38% and 95%, respectively. Approximately 15% of AF patients assigned to the AF cohort by the algorithm had a secondary care diagnosis with no record of AF in primary care. These additional patients did not substantially alter algorithm performance. The additional detection of previously undiagnosed AF patients in secondary care highlights the unexpected potential utility of this ML algorithm.

Keywords: Artificial intelligence; Atrial fibrillation; Diagnosis; Machine learning

Year: 2020 PMID: 34095444 PMCID: PMC8164133 DOI: 10.1016/j.ijcha.2020.100674

Source DB: PubMed Journal: Int J Cardiol Heart Vasc ISSN: 2352-9067

Introduction

Atrial fibrillation (AF) is the most common sustained heart arrhythmia [1] and may increase the risk of thromboembolic stroke five-fold [2]. Furthermore, patients with AF experience more severe strokes than those without AF [3]. The global incidence of AF has increased significantly over time, and prevalence is estimated to be 3% in the UK population [4], [5]. However, AF can be difficult to diagnose because it is often paroxysmal and/or asymptomatic or minimally symptomatic [5]. Currently, there is no formal screening programme for AF in the UK. Any approach to detecting undiagnosed AF must balance the associated patient burden, healthcare resource use, and diagnostic sensitivity and specificity. Opportunistic testing for AF in high-risk patients (such as those with an irregular pulse or those aged ≥ 65 years, as the risk of AF increases with age) typically requires frequent electrocardiogram (ECG) tests to capture the arrhythmia [5]. Risk-prediction algorithms may help to identify people with undiagnosed AF more accurately, and machine learning (ML) may aid in the diagnosis of AF through targeted screening modalities [6], [7]. Machine learning is particularly useful for examining non-linear associations and complex interactions between variables without having to specify these relationships a priori. Investment in, and the development and adoption of, artificial intelligence across the NHS is at the forefront of the UK government’s healthcare agenda [8]. As AF has a complex aetiology, models developed using ML methods may offer improved predictive performance compared with models built with classical statistical methods to estimate AF incidence. A recently published AF risk prediction algorithm, developed using routinely collected UK primary care data from the Clinical Practice Research Datalink (CPRD), was better able to identify patients at highest risk of AF than existing models [6].
Compared with the CHARGE-AF model, the AF-risk prediction algorithm reduced the number needed to screen (NNS), that is, the number of high-risk patients who must be screened to identify one case of AF, by 31%, from 13 to 9. The algorithm was further validated in the DISCOVER primary care database (general or medical practitioner data) in North-West London (NWL), which encompasses a population of 2.5 million [9], with a sensitivity of 75% and specificity of 99.1%. Here, we employed a machine learning algorithm that had been developed and validated in primary care datasets, with the aim of assessing its performance and utility when applied to secondary care linked data (hospital data). This machine learning model leveraged secondary care data to detect undiagnosed AF with reasonable performance.

Methods

A retrospective cohort study was undertaken using coded secondary care data from the Whole Systems Integrated Care (WSIC) dataset, which is one of Europe’s largest patient-level datasets, containing data from approximately 2.5 million patients across NWL. Study data were obtained through the DISCOVER secure environment, which was developed by Imperial College Health Partners, the Academic Health Science Network for NWL. Favourable ethical opinion was secured in October 2018 to use the Discover Research Platform for research purposes for a period of five years. Local Research and Development Department approval was obtained from the NWL Data Research Access Group on 18 October 2018. Patient consent was not required because this was a retrospective study using anonymised data. The development, application and validation of the machine learning algorithm in primary care systems have been described in detail elsewhere [5], [9]. De-identified secondary care hospital data for all patients were extracted via DISCOVER for the period 01 January 2001 to 31 December 2016. In order to assess the model’s ability to distinguish between patients at high and relatively lower risk of AF, the first step was to generate a threshold for the risk of AF among patients not flagged by the algorithm for screening. The threshold was derived from baseline risk factors (age, previous cardiovascular disease, antihypertensive medication usage) and additional time-varying predictors (proximity of cardiovascular events, body mass index [levels and changes], pulse pressure, and frequency of blood pressure measurements). The risk-prediction model did not use ECG data, as it was initially intended to identify patients at risk of AF in primary care, before they developed disease that might lead to an ECG being performed.
Based on the per-patient risk scores returned by the algorithm in the original primary care dataset (CPRD) [6], a risk threshold of 7.4% was set for AF versus not AF. Thus, all patients with a risk score ≥ 7.4% were categorised as being at high risk of developing AF, and those with a risk score < 7.4% were categorised as being at low risk. The predictive performance of the model was then assessed using the following metrics: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and number needed to screen (NNS). All analyses were performed using Microsoft Excel 2013, Stata version 15.0 and R version 3.6.0.
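The thresholding step described above can be illustrated with a minimal Python sketch. This is not the authors' code (the analyses were run in Excel, Stata and R); the function names `classify` and `confusion_counts` and the toy scores are hypothetical and serve only to show how a 7.4% cut-off turns per-patient risk scores into the 2×2 counts from which the listed metrics are derived.

```python
from typing import Iterable, Tuple

RISK_THRESHOLD = 0.074  # the 7.4% cut-off stated in the text


def classify(risk_score: float, threshold: float = RISK_THRESHOLD) -> bool:
    """Return True (high risk) when the score is at or above the threshold."""
    return risk_score >= threshold


def confusion_counts(
    scores: Iterable[float],
    has_af: Iterable[bool],
    threshold: float = RISK_THRESHOLD,
) -> Tuple[int, int, int, int]:
    """Tally (true positives, false positives, false negatives, true negatives)."""
    tp = fp = fn = tn = 0
    for score, af in zip(scores, has_af):
        flagged = classify(score, threshold)
        if flagged and af:
            tp += 1
        elif flagged and not af:
            fp += 1
        elif not flagged and af:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical toy data, NOT taken from the study.
toy_scores = [0.12, 0.074, 0.05, 0.02, 0.09, 0.01]
toy_has_af = [True, True, True, False, False, False]
toy_counts = confusion_counts(toy_scores, toy_has_af)
print(toy_counts)
```

Lowering `threshold` flags more patients, which can only raise the true-positive and false-positive counts; that is the trade-off explored when the cut-off is moved from 7.4% to 5.5%.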

Results

We identified an additional 5,444 patients who had an AF diagnosis only in secondary care during the data extraction period. These patients were not diagnosed with AF in primary care. Of these, 2,696 (49.5%) were accepted by the algorithm and 2,748 (50.5%) were excluded. Among the 2,696 patients included, the algorithm correctly assigned 2,637 (97.8%) patients to the AF cohort and incorrectly assigned 59 patients (2.2%) to the non-AF cohort. These additional patients did not substantially alter algorithm performance (Table 1). Using a risk threshold of 7.4% in patients aged ≥ 30 years, algorithm sensitivity and specificity were 38% and 95%, respectively. Altering the risk threshold to 5.5% increased the sensitivity to 50%, with specificity remaining at 95%.
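As a quick arithmetic check, the cohort percentages in this paragraph follow directly from the reported counts (no new data are introduced; only the figures stated above are used):

```latex
\begin{align*}
\text{accepted} &= \frac{2{,}696}{5{,}444} \approx 49.5\%, &
\text{excluded} &= \frac{2{,}748}{5{,}444} \approx 50.5\%,\\[4pt]
\text{assigned to AF cohort} &= \frac{2{,}637}{2{,}696} \approx 97.8\%, &
\text{assigned to non-AF cohort} &= \frac{59}{2{,}696} \approx 2.2\%.
\end{align*}
```

The two pairs are complementary: 2,696 + 2,748 = 5,444 and 2,637 + 59 = 2,696.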

Table 1

Algorithm performance in WSIC dataset with the CPRD risk threshold of 7.4% and 5.5% after addition of patients with AF diagnosis in secondary care only.

Patients aged ≥ 30 years
	Risk threshold 7.4%		Risk threshold 5.5%
	AF: Yes	AF: No	AF: Yes	AF: No
Algorithm: Yes	8,737	27,700	11,558	45,090
Algorithm: No	14,355	558,659	11,534	541,269
Sensitivity	38%		50%
Specificity	95%		95%
PPV	24%		20%
NPV	97%		98%
NNS (1/PPV)	4		5

Patients aged ≥ 65 years
	Risk threshold 7.4%		Risk threshold 5.5%
	AF: Yes	AF: No	AF: Yes	AF: No
Algorithm: Yes	7,797	23,169	10,252	37,143
Algorithm: No	8,182	82,298	5,727	68,324
Sensitivity	49%		64%
Specificity	78%		65%
PPV	25%		22%
NPV	91%		92%
NNS (1/PPV)	4		5

Approximately 15% of AF patients assigned to the AF cohort by the algorithm had only a secondary care diagnosis, with no READ codes relating to AF in their linked primary care record.
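To make explicit how the summary rows of Table 1 relate to its 2×2 counts, the following Python sketch recomputes the metrics for the 7.4% threshold in both age groups. The class name `TwoByTwo` and the variable names are illustrative assumptions; the counts are copied from Table 1, and the formulas are the standard definitions (with NNS taken as 1/PPV, as labelled in the table).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 table: algorithm flag (rows) vs AF diagnosis (columns)."""
    tp: int  # algorithm yes, AF yes
    fp: int  # algorithm yes, AF no
    fn: int  # algorithm no,  AF yes
    tn: int  # algorithm no,  AF no

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def nns(self) -> float:
        # Number needed to screen, defined in Table 1 as 1/PPV.
        return 1 / self.ppv


# Counts from Table 1, risk threshold 7.4%.
AGED_30_PLUS = TwoByTwo(tp=8737, fp=27700, fn=14355, tn=558659)
AGED_65_PLUS = TwoByTwo(tp=7797, fp=23169, fn=8182, tn=82298)

for label, table in [("≥ 30 years", AGED_30_PLUS), ("≥ 65 years", AGED_65_PLUS)]:
    print(
        f"{label}: sensitivity={table.sensitivity:.1%} "
        f"specificity={table.specificity:.1%} "
        f"PPV={table.ppv:.1%} NPV={table.npv:.1%} NNS={table.nns:.1f}"
    )
```

Rounded to whole percentages, the recomputed values agree with the 7.4% columns of Table 1.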

Discussion

Granular data collected from secondary care may represent an under-utilised resource for improving the diagnosis of many health conditions [10]. Here, we employed a machine learning algorithm that had been developed and validated in primary care datasets, with the aim of assessing its performance and utility when applied to secondary care linked data. This machine learning model leveraged secondary care data to detect undiagnosed AF with reasonable performance. The low prevalence of the disease resulted in a low positive predictive value, however, so for clinically meaningful sensitivity thresholds to be actionable, confirmatory testing with high specificity (e.g. electrocardiogram) would be required following model detection. Nevertheless, the additional detection of AF in previously undiagnosed patients in secondary care highlights the unexpected potential utility of this machine learning algorithm, with a number needed to screen (NNS) of 5. Further studies are required to externally validate algorithm performance in other secondary care datasets and to explore the feasibility of embedding algorithms into the perioperative electronic health record for real-world use by clinicians.
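The link between low prevalence and low positive predictive value noted above can be sketched with Bayes' rule. In the Python example below, the function name `ppv_from_prevalence` is hypothetical, the sensitivity and specificity are the rounded 38% and 95% reported for patients aged ≥ 30 years, and the prevalence is derived from the Table 1 counts at the 7.4% threshold; the other prevalence values are illustrative assumptions, not study data.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# AF prevalence among patients aged >= 30 years, from the Table 1 counts (7.4% threshold).
af_yes = 8737 + 14355
af_no = 27700 + 558659
prevalence_30_plus = af_yes / (af_yes + af_no)

SENSITIVITY = 0.38  # rounded value reported in the text
SPECIFICITY = 0.95  # rounded value reported in the text

# 0.01, 0.10 and 0.30 are illustrative prevalences, not taken from the study.
for prevalence in (0.01, prevalence_30_plus, 0.10, 0.30):
    ppv = ppv_from_prevalence(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.1%} -> PPV={ppv:.1%}")
```

Because the inputs are rounded, the PPV computed at the study prevalence is only approximately the 24% shown in Table 1. The point of the sketch is the trend: with sensitivity and specificity held fixed, PPV rises steeply as prevalence increases, which is why a confirmatory test is needed when the condition is uncommon.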

Summary declarations of interest

KGP, BS, NRH, AN and UF are employees of Bristol-Myers Squibb Pharmaceuticals Ltd. SS and EJ are employees of Imperial College Health Partners, which received funding from BMS to undertake this study. FSN acknowledges funding from the British Heart Foundation (RG/16/3/32175) and the National Institute for Health Research (NIHR) Imperial Biomedical Research Centre (BRC).

Authors’ contributions

BS, NRH, and UF contributed to the conception or design of the work. All authors contributed to the acquisition, formal analysis, or interpretation of data for the work. KGP drafted the manuscript. All authors critically revised the manuscript, gave final approval and agree to be accountable for all aspects of the work, ensuring its integrity and accuracy.

Role of the funding source

This work was supported by Bristol-Myers Squibb Pharmaceuticals Ltd who provided funding for data collection and data analysis.

Data sharing statement

The datasets analysed in this study are available in the DISCOVER database and available on request from Imperial College Health Partners at https://www.registerfordiscover.org.uk/researchers/how-to-access-discover.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

7 in total

1. Early Detection of Heart Failure With Reduced Ejection Fraction Using Perioperative Data Among Noncardiac Surgical Patients: A Machine-Learning Approach.

Authors: Michael R Mathis; Milo C Engoren; Hyeon Joo; Michael D Maile; Keith D Aaronson; Michael L Burns; Michael W Sjoding; Nicholas J Douville; Allison M Janda; Yaokun Hu; Kayvan Najarian; Sachin Kheterpal
Journal: Anesth Analg Date: 2020-05 Impact factor: 5.108

2. Stroke severity in atrial fibrillation. The Framingham Study.

Authors: H J Lin; P A Wolf; M Kelly-Hayes; A S Beiser; C S Kase; E J Benjamin; R B D'Agostino
Journal: Stroke Date: 1996-10 Impact factor: 7.914

3. Atrial fibrillation as an independent risk factor for stroke: the Framingham Study.

Authors: P A Wolf; R D Abbott; W B Kannel
Journal: Stroke Date: 1991-08 Impact factor: 7.914

4. Worldwide epidemiology of atrial fibrillation: a Global Burden of Disease 2010 Study.

Authors: Sumeet S Chugh; Rasmus Havmoeller; Kumar Narayanan; David Singh; Michiel Rienstra; Emelia J Benjamin; Richard F Gillum; Young-Hoon Kim; John H McAnulty; Zhi-Jie Zheng; Mohammad H Forouzanfar; Mohsen Naghavi; George A Mensah; Majid Ezzati; Christopher J L Murray
Journal: Circulation Date: 2013-12-17 Impact factor: 29.690

5. Identification of patients with atrial fibrillation: a big data exploratory analysis of the UK Biobank.

Authors: Julien Oster; Jemma C Hopewell; Klemen Ziberna; Rohan Wijesurendra; Christian F Camm; Barbara Casadei; Lionel Tarassenko
Journal: Physiol Meas Date: 2020-03-06 Impact factor: 2.833

6. Detecting undiagnosed atrial fibrillation in UK primary care: Validation of a machine learning prediction algorithm in a retrospective cohort study.

Authors: Sara Sekelj; Belinda Sandler; Ellie Johnston; Kevin G Pollock; Nathan R Hill; Jason Gordon; Carmen Tsang; Sadia Khan; Fu Siong Ng; Usman Farooqui
Journal: Eur J Prev Cardiol Date: 2021-05-22 Impact factor: 7.804

7. Predicting atrial fibrillation in primary care using machine learning.

Authors: Nathan R Hill; Daniel Ayoubkhani; Phil McEwan; Daniel M Sugrue; Usman Farooqui; Steven Lister; Matthew Lumley; Ameet Bakhai; Alexander T Cohen; Mark O'Neill; David Clifton; Jason Gordon
Journal: PLoS One Date: 2019-11-01 Impact factor: 3.240
