Explainable Machine Learning for Atrial Fibrillation in the General Population Using a Generalized Additive Model - A Cross-Sectional Study.

Masaki Kawakami¹, Shigehiro Karashima², Kento Morita¹, Hayato Tada³, Hirofumi Okada³, Daisuke Aono⁴, Mitsuhiro Kometani⁴, Akihiro Nomura³, Masashi Demura⁵, Kenji Furukawa⁶, Takashi Yoneda⁷,⁴,⁸, Hidetaka Nambo¹, Masa-Aki Kawashiri³.

Abstract

Background: Atrial fibrillation (AF) is the most common arrhythmia and is associated with increased thromboembolic stroke risk and heart failure. Although various prediction models for AF risk have been developed using machine learning, their output cannot be accurately explained to doctors and patients. Therefore, we developed an explainable model with high interpretability and accuracy accounting for the non-linear effects of clinical characteristics on AF incidence.

Methods and Results: Of the 489,073 residents who underwent specific health checkups between 2009 and 2018 and were registered in the Kanazawa Medical Association database, data were used for 5,378 subjects with AF and 167,950 subjects with normal electrocardiogram readings. Forty-seven clinical parameters were combined using a generalized additive model algorithm. We validated the model and found that the area under the curve, sensitivity, and specificity were 0.964, 0.879, and 0.920, respectively. The 9 most important variables were the physical examination of arrhythmia, a medical history of coronary artery disease, age, hematocrit, γ-glutamyl transpeptidase, creatinine, hemoglobin, systolic blood pressure, and HbA1c. Further, non-linear relationships of clinical variables to the probability of AF diagnosis were visualized.

Conclusions: We established a novel AF risk explanation model with high interpretability and accuracy accounting for non-linear information obtained at general health checkups. This model contributes not only to more accurate AF risk prediction, but also to a greater understanding of the effects of each characteristic.

Keywords: Atrial fibrillation; General population; Generalized additive model; Machine learning; Prediction

Year: 2021 PMID: 35178483 PMCID: PMC8811230 DOI: 10.1253/circrep.CR-21-0151

Source DB: PubMed Journal: Circ Rep ISSN: 2434-0790

Atrial fibrillation (AF) is the most common arrhythmia, with a global prevalence estimated to be as high as 46.3 million and annual increases of 3.8 million cases in recent years. AF is associated with an increased risk of thromboembolic stroke and heart failure, which are not only life threatening, but can also result in severe sequelae, leaving patients with severe mobility impairments or requiring nursing care. Individuals with AF have an approximately 5-fold higher risk of stroke than those without AF. Therefore, the development of AF should be predicted at an early stage. Artificial intelligence (AI) is gaining attention in preventive medicine, and there is an urgent need for the detection and prediction of AF. AI methods can automatically identify obscure but significant patterns from individual clinical data to predict the clinical outcomes of a patient. Some studies have demonstrated the high accuracy of AI models in the field of AF. However, the internal workings or the basis for decisions on inference results are not known by most users of such systems. The difficulty in explaining the results of and reasoning behind AI analysis is commonly referred to as the “black box” problem. Consequently, most models cannot explain their predictions and evidence to doctors and patients in clinical settings. However, doctors must use models that have both high accuracy and high interpretability when they explain diagnosis or treatment plans to patients. Several algorithms have been proposed to explain the black box model. The simplest and most easily understood algorithm is the generalized linear model (GLM). The GLM extends the linear model to the exponential family of distributions and expresses the relationship between the independent variables and the outcome in linear form. Meanwhile, the generalized additive model (GAM) is an extension of the GLM that allows non-parametric forms of independent variables.
The GAM is trained on a linear predictor formed as the sum of non-linear functions of each variable. Some AF prediction models have been proposed in prior studies, but few were developed with interpretable methodology. In addition, in Japan, specific health checkups started in 2008 and are mainly conducted at clinics. However, an electrocardiogram (ECG) is considered an additional rather than mandatory item in these health checkups. An interpretable algorithm for the prediction of AF using the information obtained from such checkups would help the general population to think about lifestyle or other factors related to the development of AF. The aim of this study was to develop and validate an interpretable model for the detection of AF, using data from specific health checkups, and to examine how clinical characteristics affect the risk of AF.
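The difference between the two model families described above can be written compactly. The notation below is an illustrative assumption rather than the authors' own: g is the link function, x_j the j-th clinical variable (p = 47 in this study), β_j a GLM coefficient, and f_j a smooth function estimated by the GAM. The last line assumes a logit link, which is consistent with the sigmoid-transformed risk mentioned in the Figure 3 caption.

```latex
% GLM: the link-transformed mean is linear in each variable
g\bigl(\mathbb{E}[y \mid x]\bigr) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j

% GAM: each linear term is replaced by a smooth function f_j
g\bigl(\mathbb{E}[y \mid x]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)

% With a logit link, the AF probability is the sigmoid of the additive predictor
P(\mathrm{AF} \mid x) = \frac{1}{1 + \exp\bigl(-\beta_0 - \sum_{j=1}^{p} f_j(x_j)\bigr)}
```

Because the predictor stays additive, each f_j can be plotted on its own, which is what makes the per-variable risk curves interpretable.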

Methods

Study Participants

The study was a cross-sectional observational study that included 489,073 subjects aged ≥40 years who underwent community-based medical checkups between 2009 and 2018 in Kanazawa City. All clinicians were sent a manual that followed the guidelines of each academic society, and the checkups were performed accordingly. During the checkups, clinicians conducted standard medical examinations, recording data such as height, weight, waist circumference, blood pressure, results of biochemical examinations, urinalysis, and resting 12-lead ECG. AF was diagnosed on the basis of the baseline ECG, according to the Minnesota Code. Specifically, subjects diagnosed with Minnesota Code 8-3 were classified as having AF, and those diagnosed with Minnesota Code 1-0-0 were classified as having a normal ECG. All ECG findings were recorded and stored by the Kanazawa Medical Association (KMA). Subjects with partially missing data or with Minnesota Codes other than 8-3 or 1-0-0 were excluded from the study. The KMA collected and anonymized the data. This study was approved by the Ethics Committee of KMA (No. 16000003) and Kanazawa University (No. 2019-080), and was conducted in accordance with the Declaration of Helsinki and the ethical guidelines for human medical research. Because the data were secondary data, the requirement for informed consent was waived, and an opt-out-style announcement regarding this study was issued on the KMA website (http://www.kma.jp/kenkyu/kenkyu_index.html).

Dataset and Features

The KMA database contains information on several clinical parameters, including physical observations and medical histories. Forty-seven variables, including age, sex, body mass index (BMI), waist circumference, systolic blood pressure (SBP), and diastolic blood pressure (DBP), were collected in the dataset. Moreover, the following were measured in blood samples within 24 h of collection using an automated clinical chemistry analyzer: serum creatinine (s-Cr), estimated glomerular filtration rate (eGFR), serum uric acid, HbA1c, total cholesterol, triglycerides, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol (HDL-C), red blood cell (RBC) count, white blood cell (WBC) count, hemoglobin, hematocrit, platelet count, aspartate aminotransferase (AST), alanine aminotransferase (ALT), γ-glutamyl transpeptidase (γ-GTP), and plasma glucose. Although the methods used for blood analyses were not calibrated between laboratories, the analyses were performed according to the methods for laboratory tests recommended by the Japan Society of Clinical Chemistry, which have been widely adopted by laboratories across Japan. Urinary protein, glucose, and occult blood were semiquantitatively examined and graded using 5 levels: “negative”, “trace”, “1+”, “2+”, and “3+”. In this study, urinary protein was analyzed as 3 levels (“negative”, “trace”, and “others”), whereas urinary glucose and urine occult blood were analyzed as 2 levels (“negative” and “others”). A medical history of stroke, coronary artery disease (CAD), chronic kidney disease (CKD), and anemia was analyzed as past history based on a self-reported questionnaire. A history of hypertension, diabetes, or dyslipidemia recorded as “under treatment” or “untreated” was assigned a value of 1, and all other responses were assigned a value of 0. Physical findings such as jaundice, arrhythmia, heart murmur, crackles, hepatomegaly, edema, cervical tumors, neuropathy, malnutrition, and anemia were also included in the model.
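The recoding rules for the urinalysis grades and the treatment-status histories can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names and the exact string labels stored in the database are assumptions.

```python
# Hypothetical helpers; the label strings are assumed to match the text above.
URINE_POSITIVE = {"1+", "2+", "3+"}


def encode_urinary_protein(grade):
    """Collapse the 5 dipstick grades into 3 levels: negative, trace, others."""
    if grade in ("negative", "trace"):
        return grade
    if grade in URINE_POSITIVE:
        return "others"
    raise ValueError(f"unknown grade: {grade!r}")


def encode_urinary_binary(grade):
    """Urinary glucose and occult blood: 'negative' versus everything else."""
    if grade == "negative":
        return "negative"
    if grade == "trace" or grade in URINE_POSITIVE:
        return "others"
    raise ValueError(f"unknown grade: {grade!r}")


def encode_history(status):
    """Hypertension, diabetes, dyslipidemia: 1 if treated or untreated disease, else 0."""
    return 1 if status in ("under treatment", "untreated") else 0
```

Raising on unknown grades, rather than silently mapping them to "others", keeps data-entry errors visible before modeling.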

Statistical Analysis for Clinical Background

Data are expressed as the mean±SD or as percentages. Parameters were compared between the AF and normal ECG groups. Normality was evaluated using the Shapiro-Wilk test. Data that were not normally distributed were compared using the Mann-Whitney U test. The variance of normally distributed data was evaluated using the Bartlett test; normally distributed data with equal variance were compared using Student’s t-test, whereas data without equal variance were compared using Welch’s t-test. Two-sided P<0.05 was considered significant. Statistical analyses were performed using Python 3.8.3 (Python Software Foundation, Wilmington, DE, USA) and SciPy 1.5.2.
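The test-selection path described above can be sketched with SciPy. This is an illustrative reading of the text, not the authors' script: the function name `compare_groups` is hypothetical, and applying the 0.05 threshold to the normality and variance checks is an assumption, because the paper states it only for the final comparison.

```python
from scipy import stats


def compare_groups(a, b, alpha=0.05):
    """Choose and run a two-sample test following the decision path in the text.

    Returns (test_name, p_value). `alpha` for the normality and variance
    checks is an assumption, not a value reported by the authors.
    """
    # Shapiro-Wilk on each group; both must look normal to use a t-test.
    normal = stats.shapiro(a)[1] >= alpha and stats.shapiro(b)[1] >= alpha
    if not normal:
        p = stats.mannwhitneyu(a, b, alternative="two-sided")[1]
        return "mann-whitney", p
    # Bartlett test for equal variances decides Student vs Welch.
    equal_var = stats.bartlett(a, b)[1] >= alpha
    name = "student-t" if equal_var else "welch-t"
    return name, stats.ttest_ind(a, b, equal_var=equal_var)[1]
```

One caveat: the SciPy documentation notes that the Shapiro-Wilk p-value may be inaccurate for samples larger than 5,000, which applies to groups of the size analyzed here.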

General Process of Model Construction and Evaluation

This study adopted the ideas of the GAMensPlus algorithm, which combines the GAM and ensemble learning. Ensemble learning is a method of estimating the final prediction result by building several small models and averaging the class probability output. We used either the GAM or the GLM as the small model so that the two could be compared in terms of the difference between non-linear and linear coefficients (Figure 1).

Figure 1.

Process used to develop the machine learning model. The Kanazawa Medical Association (KMA) database containing 4,386 subjects with atrial fibrillation (AF) and 133,535 subjects with a normal electrocardiogram (ECG) was analyzed. The training dataset comprised subjects from 2009 to 2017, whereas the test dataset comprised subjects for whom data were obtained in 2018. After under-sampling and bootstrap sampling, the dataset was split into a training and out-of-bag (OOB) subset. A generalized additive model (GAM) or generalized linear model (GLM) was constructed using the training data, and the importance score was calculated using the OOB data. In all, 100 small models were developed as an ensemble, and their generalization performance was evaluated by using the test dataset.

The database was divided into 2 different datasets: a training set, comprising subject data collected from 2009 to 2017, and a validation set, comprising data collected in 2018. Data for the 2018 study participants were removed from the training dataset to avoid using information from the same subjects twice. The model was constructed by repeating the following 4 steps once per ensemble member: (1) under-sampling and bootstrap sampling; (2) training the small model; (3) calculating the generalized importance score with an out-of-bag (OOB) dataset; and (4) external validation with the validation dataset. First, a sub-dataset was constructed by using the under-sampling method to address class imbalance. After under-sampling, the sub-dataset was split into training and OOB datasets by bootstrap sampling (repeated sampling with replacement): sampled subjects formed the training dataset and subjects that were never sampled formed the OOB dataset. Second, a small prediction model was constructed with the training dataset. Third, permutation importance scores were obtained with the OOB dataset to evaluate how much each feature contributed to the predictions of the model. Finally, the overall prediction and generalized importance scores were obtained by averaging the AF probability and the permutation importance score, respectively, across all small models. The generalization performance of the ensemble model was evaluated in terms of the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Detailed methods are provided in the supplementary material.
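The sampling, training, importance, and averaging steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the GAMensPlus implementation: `fit_small_model` is a nearest-centroid stand-in for the GAM or GLM, importance is measured as the drop in OOB accuracy (the paper scores AUC, sensitivity, and specificity), and all function names are hypothetical. Rows are `(features, label)` pairs with `features` a list of floats and `label` 1 for AF.

```python
import math
import random
from statistics import mean


def undersample(rows, rng):
    """Keep every AF row and an equally sized random subset of normal rows."""
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    return pos + rng.sample(neg, k=min(len(pos), len(neg)))


def bootstrap_split(rows, rng):
    """Draw len(rows) rows with replacement; rows never drawn form the OOB set."""
    n = len(rows)
    drawn = [rng.randrange(n) for _ in range(n)]
    seen = set(drawn)
    return [rows[i] for i in drawn], [rows[i] for i in range(n) if i not in seen]


def fit_small_model(train):
    """Stand-in for a GAM/GLM: sigmoid of the difference in centroid distances."""
    n_feat = len(train[0][0])
    cent = {c: [mean(x[j] for x, y in train if y == c) for j in range(n_feat)]
            for c in (0, 1)}

    def predict_proba(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, cent[0]))
        d1 = sum((a - b) ** 2 for a, b in zip(x, cent[1]))
        z = max(-30.0, min(30.0, d0 - d1))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))   # nearer the AF centroid -> higher p
    return predict_proba


def accuracy(model, rows):
    return mean(1.0 if (model(x) >= 0.5) == (y == 1) else 0.0 for x, y in rows)


def permutation_importance(model, oob, rng):
    """Drop in OOB accuracy after shuffling one feature column at a time."""
    base = accuracy(model, oob)
    scores = []
    for j in range(len(oob[0][0])):
        col = [x[j] for x, _ in oob]
        rng.shuffle(col)
        permuted = [(x[:j] + [v] + x[j + 1:], y) for (x, y), v in zip(oob, col)]
        scores.append(base - accuracy(model, permuted))
    return scores


def build_ensemble(train_rows, n_members=100, seed=0):
    """Repeat steps 1-3 per member and average the importance scores."""
    rng = random.Random(seed)
    models, importances = [], []
    for _ in range(n_members):
        balanced = undersample(train_rows, rng)       # step 1: under-sampling
        train, oob = bootstrap_split(balanced, rng)   # step 1: bootstrap / OOB
        model = fit_small_model(train)                # step 2: small model
        importances.append(permutation_importance(model, oob, rng))  # step 3
        models.append(model)
    n_feat = len(importances[0])
    return models, [mean(m[j] for m in importances) for j in range(n_feat)]


def ensemble_proba(models, x):
    """Final prediction: mean AF probability over all ensemble members."""
    return mean(m(x) for m in models)
```

Evaluating importance on the OOB rows, which the member never saw during fitting, keeps the score from rewarding features the small model merely memorized.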

Results

Subject Characteristics

Residents who underwent health checkups (n=489,073) were registered in the KMA database. After applying the inclusion and exclusion criteria, the study subjects were classified into AF (n=5,378) and normal ECG (n=167,950) groups. Table 1 shows the baseline characteristics of subjects with AF and normal ECGs, divided into the training and validation subsets. Significant differences were observed between the AF and normal ECG groups for all variables, except for the RBC count in the training subset and DBP, WBC count, RBC count, and ALT in the validation subset.

Table 1.

Baseline Characteristics of Subjects in the Atrial Fibrillation (AF) and Normal Electrocardiogram (ECG) Groups in the Training and Test Datasets

	Training (n=137,921)		Test (n=35,407)
	AF (n=4,386)	Normal ECG (n=133,535)	AF (n=992)	Normal ECG (n=34,415)
Age (years)	80±8**	70±12	79±8**	71±11
Male sex (%)	59.9	31.2	63.0	32.1
BMI (kg/m²)	23.1±3.6**	22.6±3.4	23.6±3.6**	22.8±3.4
WC (cm)	84.7±9.8**	82.8±9.7	86.1±10.3**	83.4±9.6
SBP (mmHg)	126±16**	127±16	125±16**	127±16
DBP (mmHg)	72±11**	73±10	73±11	73±10
Laboratory tests
WBC (/μL)	5,506±2,027**	5,384±1,727	5,290±1,429	5,264±1,449
RBC (×10⁴/μL)	430±57	431±46	437±52	439±44
Hemoglobin (g/dL)	13.3±1.9**	13.1±1.5	13.5±1.7**	13.4±1.4
Hematocrit (%)	40.3±5.1**	39.7±4.0	40.9±4.7**	40.5±3.8
PLT (×10⁴/μL)	18.9±5.5**	22.5±5.9	19.8±5.4**	23.3±5.7
Cr (mg/dL)	1.0±0.5**	0.7±0.3	1.0±0.4**	0.8±0.2
eGFR (mL/min/1.73 m²)	58.3±17.6**	71.5±17.2	56.3±15.7**	67.5±15.3
UA (mg/dL)	5.8±1.5**	4.9±1.3	5.7±1.5**	5.0±1.3
AST (U/L)	26.3±12.4**	23.8±12.7	25.5±10.1**	23.7±10.8
ALT (U/L)	20.5±12.5*	20.4±14.5	20.1±11.0	20.7±13.0
γ-GTP (U/L)	52.7±82.5**	32.6±50.8	45.9±81.6**	31.9±46.1
HbA1c (NGSP; %)	5.7±0.7**	5.5±0.6	6.0±0.7**	5.8±0.6
PG (mg/dL)	112±40**	102±30	112±37**	101±27
TC (mg/dL)	180±35**	201±34	180±32**	202±34
LDL-C (mg/dL)	104±29**	118±30	104±28**	117±29
HDL-C (mg/dL)	54±16**	60±15	56±14**	62±16
TG (mg/dL)	114±73**	119±74	107±60**	119±76
Urine tests
Protein (%)
(−)/(±)	62.6/16.3	80.5/11.9	65.4/16.9	81.8/11.3
(+)/(2+)/(3+)	12.6/6.5/0.5	5.1/1.9/0.1	11.2/3.8/1.6	4.7/1.4/0.4
Glucose (%)
(−)/(±)	86.7/4.1	93.3/1.9	87.5/2.9	93.6/1.6
(+)/(2+)/(3+)	3.2/4.3/0.4	1.8/2.1/0.3	3.7/2.0/2.8	1.7/1.1/1.6
Occult blood (%)
(−)/(±)	60.0/18.9	66.4/16.7	57.7/20.4	65.0/17.1
(+)/(2+)/(3+)	11.6/7.7/0.4	9.5/6.4/0.4	12.1/6.0/2.6	10.1/5.6/1.8
Past
Hypertension (%)	59.3	38.8	63.6	41.4
Diabetes (%)	16.9	10.7	18.4	11.2
Dyslipidemia (%)	19.5	24.6	26.9	29.9
Stroke (%)	23.4	6.6	19.8	5.5
CAD (%)	63.0	8.7	63.9	7.6
CKD (%)	2.9	0.8	3.1	1.1
Anemia (%)	15.9	17.6	15.2	15.6
Exam
Anemia (%)	1.1	0.6	1.2	0.3
Jaundice (%)	0.1	0.0	0.0	0.0
Arrhythmia (%)	57.3	0.3	57.6	0.3
Heart murmur (%)	6.2	1.0	3.5	0.8
Crackles (%)	0.8	0.5	1.2	0.3
Hepatomegaly (%)	0.3	0.1	0.3	0.1
Edema (%)	9.4	2.3	8.2	2.0
Cervical tumors (%)	0.4	1.0	1.0	1.0
Neuropathy (%)	2.9	1.4	1.7	0.8
Malnutrition (%)	0.6	0.3	0.5	0.1
Other (%)	8.7	4.7	5.8	3.0

Data are given as the mean±SD or as percentages. *P<0.05, **P<0.001 compared with the normal ECG group. ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; CAD, coronary artery disease; CKD, chronic kidney disease; Cr, creatinine; DBP, diastolic blood pressure; eGFR, estimated glomerular filtration rate; Exam, physical examination; γ-GTP, γ-glutamyl transpeptidase; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; NGSP, National Glycohemoglobin Standardization Program; Past, past history; PG, plasma glucose; PLT, platelet count; RBC, red blood cell count; SBP, systolic blood pressure; TC, total cholesterol; TG, triglycerides; UA, uric acid; WBC, white blood cell count; WC, waist circumference.

Feature Importance Ranking

Table 2 shows the generalized feature importance ranking and scores calculated by the GAM based on AUC, sensitivity, and specificity. Physical examination of arrhythmia, a history of CAD, age, and hematocrit were the top 4 items in all evaluation metrics, with blood γ-GTP, s-Cr, and hemoglobin concentrations ranked among the top 10 items.

Table 2.

Feature Importance Ranking and Scores Calculated on the Basis of Area Under the Curve (AUC), Sensitivity, and Specificity

Ranking	AUC	Score	Sensitivity	Score	Specificity	Score
1	Exam arrhythmia	0.429	Exam arrhythmia	0.149	Exam arrhythmia	0.454
2	Past CAD	0.177	Past CAD	0.121	Past CAD	0.144
3	Age	0.073	Age	0.088	Hematocrit	0.051
4	Hematocrit	0.054	Hematocrit	0.041	Age	0.034
5	γ-GTP	0.030	γ-GTP	0.035	Hemoglobin	0.029
6	Cr	0.029	Cr	0.032	ALT	0.024
7	Hemoglobin	0.028	Hemoglobin	0.024	SBP	0.023
8	SBP	0.021	HbA1c	0.023	Cr	0.022
9	ALT	0.018	UA	0.020	γ-GTP	0.021
10	HbA1c	0.012	TC	0.019	Exam no symptoms	0.016
11	TG	0.011	SBP	0.018	TG	0.014
12	TC	0.010	Past stroke	0.018	DBP	0.011
13	UA	0.010	AST	0.016	eGFR	0.010
14	AST	0.009	PLT	0.016	TC	0.010
15	DBP	0.008	HDL-C	0.016	AST	0.009
16	eGFR	0.008	eGFR	0.015	UA	0.008
17	Past stroke	0.007	UP (1+, 2+, 3+)	0.015	Sex	0.008
18	RBC	0.006	ALT	0.015	WC	0.008
19	PLT	0.006	RBC	0.014	LDL-C	0.008
20	UP (1+, 2+, 3+)	0.006	BMI	0.014	HbA1c	0.007
21	Exam no symptoms	0.005	Past hypertension	0.014	RBC	0.006
22	HDL-C	0.005	DBP	0.013	PLT	0.006
23	BMI	0.004	UG	0.013	UP (1+, 2+, 3+)	0.006
24	UP(−)	0.004	UP(±)	0.013	PG	0.006
25	LDL-C	0.004	TG	0.013	UP(−)	0.006
26	Past dyslipidemia	0.004	Past dyslipidemia	0.012	Past stroke	0.005
27	WC	0.003	UP(−)	0.012	Past diabetes	0.005
28	UP(±)	0.003	UOB	0.012	Past dyslipidemia	0.005
29	Sex	0.003	LDL-C	0.011	UOB	0.004
30	PG	0.002	WBC	0.011	UP(±)	0.004
31	UOB	0.002	Past anemia	0.011	BMI	0.004
32	Past diabetes	0.002	Exam anemia	0.011	HDL-C	0.004
33	UG	0.001	Exam cervical	0.011	Exam edema	0.003
34	Past hypertension	0.001	Exam crackles	0.011	Exam heart murmur	0.002
35	Exam edema	0.001	Exam malnutrition	0.011	WBC	0.002
36	Exam heart murmur	<0.001	Exam jaundice	0.011	Exam others	0.002
37	Exam anemia	<0.001	Exam neuropathy	0.011	Exam crackles	0.002
38	Exam crackles	<0.001	Exam hepatomegaly	0.011	Exam anemia	0.002
39	WBC	<0.001	Exam heart murmur	0.011	Exam jaundice	0.002
40	Past anemia	<0.001	Past CKD	0.010	Exam neuropathy	0.002
41	Exam jaundice	<0.001	Past diabetes	0.010	Exam malnutrition	0.002
42	Exam neuropathy	<0.001	Exam edema	0.010	Past anemia	0.002
43	Exam others	<0.001	Exam others	0.010	Exam cervical tumors	0.002
44	Exam malnutrition	<0.001	Sex	0.009	Past CKD	0.002
45	Exam cervical tumors	<0.001	WC	0.009	Exam hepatomegaly	0.002
46	Past CKD	<0.001	PG	0.009	UG	0.002
47	Exam hepatomegaly	<0.001	Exam no symptoms	<0.001	Past hypertension	<0.001

UG, urinary glucose; UOB, urine occult blood; UP, urinary protein. Other abbreviations as in Table 1.

Predictive Performance

In this study, we developed an explainable prediction model with the number of ensemble members set at 100. Figure 2 shows the receiver operating characteristic curves for the final prediction in the validation dataset. The GAM (AUC, 0.964; sensitivity, 0.879; specificity, 0.920; PPV, 0.242; NPV, 0.996) was as accurate as the GLM (AUC, 0.962; sensitivity, 0.876; specificity, 0.924; PPV, 0.249; NPV, 0.996).
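The low PPV alongside high sensitivity and specificity follows from the low prevalence of AF in the test dataset (992 of 35,407 subjects in Table 1). The short calculation below applies the standard relationship between these quantities to the reported GAM values; the function name is illustrative, and small differences from the published figures are expected because the inputs are rounded.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


prevalence = 992 / 35_407  # AF subjects / all subjects in the test dataset
ppv, npv = predictive_values(0.879, 0.920, prevalence)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

The result lands close to the reported PPV of 0.242 and NPV of 0.996: with about 3% prevalence, even an 8% false-positive rate among the many normal subjects outnumbers the true positives roughly 3 to 1.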

Figure 2.

Receiver operating characteristic curves, with the area under the curve (AUC), sensitivity, and specificity, for the prediction of atrial fibrillation (AF) with the generalized additive model (GAM; blue line) and the generalized linear model (GLM; orange line).

Interpretation of Features

Figure 3 shows the relationship between 9 variables selected on the basis of the generalized importance score and the risk of having AF, including average trends for the GAM and GLM, the data distribution, and mean trends. Physical examination of arrhythmia, history of CAD, age, hematocrit, γ-GTP, s-Cr, hemoglobin, SBP, and HbA1c were selected as the 9 features with a high contribution to AUC, sensitivity, and specificity (Table 2). The relationship between the remaining 38 variables and the risk of AF is shown in the supplementary figures. Both a physical finding of arrhythmia and a past history of CAD exhibited strong relationships with AF incidence compared with no arrhythmia on physical examination and no medical history of CAD, respectively. In addition, the trends for age and hematocrit revealed an overall positive relationship between the variable and AF incidence; in particular, the magnitude of the increase in risk with increasing age was high. The relationship between HbA1c and AF showed a J-shaped curve, with a minimum around 5%, with HbA1c values >5% positively associated with the development of AF. In contrast, the relationship between AF and both hemoglobin and SBP exhibited a negative trend. Finally, the relationship between γ-GTP or s-Cr and the probability of AF was almost parallel between the GAM and GLM.

Figure 3.

The probability of atrial fibrillation (AF), as determined by the generalized additive model (GAM; red lines) or generalized linear model (GLM; blue lines), according to 9 clinical variables, namely: (A) arrhythmia on physical examination (Exam), (B) past coronary artery disease (CAD), (C) age, (D) hematocrit, (E) γ-glutamyl transpeptidase (γ-GTP), (F) creatinine, (G) hemoglobin, (H) systolic blood pressure (SBP), and (I) HbA1c. Arrhythmia on Exam (A) and past CAD (B) are regarded as binary variables, with trends shown as the mean±SD. The remaining parameters (C–I) are regarded as continuous variables, with trends indicated by solid lines. The histograms show data distribution for each parameter (lighter shading indicating subjects with a normal electrocardiogram [ECG] and darker shading indicating AF subjects). The left axes show the AF risk transformed by the sigmoid function, and the right axes show data distribution. The closer the AF risk is to 1, the higher the likelihood of subclinical AF; conversely, the closer the risk is to 0, the less likely the parameter is associated with AF. Trends within a small distribution of data may be unreliable. In the GAM, suspicion of arrhythmia on Exam or having a medical history of CAD significantly increases the probability of AF compared with no arrhythmia on Exam and no medical history of CAD, respectively. The probability of AF increased with increasing age and hematocrit. The relationship between the probability of AF and γ-GTP, creatinine, and HbA1c values was almost parallel between the GAM and GLM, whereas there was a steady downward trend in the relationship between AF probability and increasing hemoglobin and SBP.

Discussion

We have developed and validated an explainable machine learning algorithm to predict AF, using clinical parameters obtained during health checkups. Using our model, we were able to detect incidents of AF with both high interpretability and high accuracy based on the easily obtained clinical findings. Based on AUC, sensitivity, and specificity, the 9 most essential elements were found to be a physical examination of arrhythmia, a medical history of CAD, age, hematocrit, γ-GTP, s-Cr, hemoglobin, SBP, and HbA1c. Notably, we revealed non-linear relationships between these clinical parameters and the probability of AF. The developed model has high detection ability, with correct diagnoses in 87.9% of AF subjects and 92.0% of subjects with normal ECGs. Because the NPV was 99.6%, ECGs may no longer need to be performed in cases that the model predicts to be “normal”. In addition, compared with complex methods, such as deep learning models or random forest classifiers, our model has the advantage of explaining the non-linear association between the occurrence of AF and clinical parameters. Therefore, the highly accurate predictions made by our model may be explained in detail, using clinical parameters, to the general population who participated in the medical examinations. 
Several AF risk score prediction models have been reported previously. Those studies identified the following clinical variables as risk factors: age, BMI (height or weight), blood pressure (SBP, DBP, or the use of antihypertensive medication), lifestyle habits (smoking, drinking status), lipids (non-HDL-C), medication for the treatment of diabetes, eGFR, hemoglobin, heart disease (myocardial infarction and heart failure), CAD, cardiac murmur, and ECG findings (PR interval, left atrial enlargement, and left ventricular hypertrophy). Our model also extracted several variables (i.e., age, SBP, γ-GTP [liver function], HbA1c, a past history of CAD, kidney function [creatinine], and anemia [hematocrit, hemoglobin]) that are similar to the factors derived from previous studies. These previous studies used Cox proportional hazard regression models, in which the significance of the influence of different factors is determined on the basis of the β coefficients, hazard ratios, and confidence intervals. Results expressed as coefficients of a linear function are easy to understand, but how the impact changes with changes in the clinical parameters remains unclear. In our model, however, the contribution of each of the 9 important features could be better understood by considering each feature as a non-linear function. Arrhythmia on physical examination, a medical history of CAD, and age were recognized as significant risk factors, as reported previously. Lower hemoglobin levels and anemia were previously reported to contribute to the prediction of AF, but hematocrit also seems to be important. Although hematocrit is an index of the same RBC properties, a high hematocrit may be caused by relative polycythemia due to dehydration, which can also be assessed by s-Cr. The present study has shown that the risk of AF increased with increasing s-Cr. Both hematocrit and s-Cr may be explained as predictors of AF via dehydration.
In our database, we did not include whether the subjects drank alcohol as a variable. However, γ-GTP, which was detected as a risk factor for AF in the present study, could be used as a surrogate for alcohol consumption. HbA1c ranging from 5% to 7% appeared to increase the risk of AF, and the effect of diabetes as a risk factor for AF was significant. Sandhu et al examined the differential associations between risk factors and the development of paroxysmal vs. non-paroxysmal AF. In that study, HbA1c quartiles (≤4.84%, 4.84 to ≤5.00%, 5.00 to ≤5.19%, >5.19%) and the occurrence of AF were analyzed, with higher HbA1c levels found to be preferentially associated with the early development of non-paroxysmal AF. Iguchi et al also investigated whether the prevalence of AF was associated with HbA1c levels in Japanese adults. In that study, the presence of AF was associated with the HbA1c level, especially in subjects with HbA1c <6.5%. The results of the present study show that high HbA1c increases the risk of AF with respect to glycemic control. Hypertension has been reported as a risk factor for AF in previous reports. A cross-sectional study enrolling elderly subjects suggested a negative relationship between SBP and AF prevalence. A similar trend has been observed in other studies that enrolled and followed older subjects who started hemodialysis. In subjects receiving antihypertensive treatment, a J-shaped relationship between SBP and AF was observed. In the present study, SBP in the range 100–120 mmHg was more likely to be associated with AF than SBP above that range, and there was no J-shaped relationship between SBP and AF.
We present 3 hypotheses regarding the negative correlation between SBP and AF: (1) low SBP may, indeed, increase the risk of AF via chronic coronary ischemia, myocardial proliferation, and fibrosis induced by inadequate coronary perfusion; (2) low SBP may be the result of AF or AF-related cardiac structural and functional abnormalities; and (3) subjects with higher SBP are more likely to receive better clinical care and more antihypertensive medications, which play an important role in the mutually affected relationship between hypertension and AF. In terms of interpretability, the GAM captured more detailed trends for the clinical parameters than the GLM. For example, the slope of the relationship between age and AF risk was steep for subjects in their 60s, but the trend changed in subjects in their 70s and 80s. BMI was associated with a higher probability of AF starting from a BMI around 25 kg/m², which was defined as “overweight” in the Suita City survey. HbA1c also exhibited a J-shaped relationship with AF risk, with a stronger positive relationship for HbA1c 5–6.5%. The non-linear model for each clinical parameter developed using the GAM revealed part of the “black box” and helped to explain the results of the prediction algorithm. This study has several limitations. First, subjects with Minnesota Codes other than 8-3 and 1-0-0 were excluded (i.e., some subjects were classified as neither AF [Minnesota Code 8-3] nor normal ECG [Minnesota Code 1-0-0]). Indeed, approximately 150,000 subjects were excluded. The algorithm constructed by excluding that group may be less accurate when applied to the actual general population. It may be desirable to re-evaluate the true accuracy of our model in specific health checkups without an ECG test in the future as part of a prospective study. Second, we achieved high accuracy in terms of AUC, sensitivity, and specificity.
However, the PPV, representing the percentage of cases diagnosed as positive that are actually positive, was assessed as 0.242 in the GAM. This indicates that approximately three-quarters of the predicted AF subjects were normal ECG subjects who were misclassified. Hill et al developed machine learning models for risk prediction of AF using routinely collected data. In that study, the final neural network model achieved a PPV of 0.295, 0.183, and 0.115 to identify 25%, 50%, and 75% of diagnosed AF cases, respectively, showing a trade-off between sensitivity and PPV. Nevertheless, we recommend using our model as a screening test. We would strongly recommend that eligible subjects who test positive using our algorithm undergo a thorough ECG examination. This is because approximately 25% of these subjects may already have AF and could have a stroke at any time. This explainable model is expected to make physicians aware of AF complications even in the absence of ECG findings and prevent subjects from experiencing stroke and heart failure. Our model could detect persistent or chronic AF that appeared during medical examinations, but paroxysmal AF may be difficult to detect. However, high-risk subjects who are predicted to fall in the AF category in our model could undergo secondary examinations such as ECG and Holter ECG to confirm the presence of AF. We constructed an explainable AI system to detect current AF or normal ECG status and visualize the impact of clinical parameters, but the model cannot predict the future. The model is only designed to identify subjects with current AF in the general population. A useful method in terms of prevention could be, for example, a risk scoring system to predict the onset of AF within 3 years. An explainable AF future prediction and diagnosis system would be even better if it could visualize, in an understandable way, which clinical parameters could be improved in high-risk subjects to prevent AF and by how much.
In summary, we established a novel AF risk prediction model by using data from physical observations, past histories, and general laboratory test results obtained during health checkups. Further, we expressed how clinical characteristics affect the incidence of AF with non-linear trends. Our model is expected to contribute not only to more accurate AF risk prediction, but also to a greater understanding of the effects of each parameter, potentially leading to personalized medicine.

Sources of Funding

This study did not receive any specific funding.

Disclosures

K. Furukawa has received lecture fees from Sanofi K.K., Eli Lilly Japan K.K., and Ono Pharmaceutical Co. Ltd. A. Nomura has received consulting fees from CureApp, Inc. The remaining authors have no conflicts of interest to declare.

IRB Information

This study was approved by the Ethics Committee of KMA (No. 16000003) and Kanazawa University (No. 2019-080).

Supplementary Figure 1. Supplementary Figure 2.

27 in total

1. Heart Disease and Stroke Statistics-2016 Update: A Report From the American Heart Association.

Authors: Dariush Mozaffarian; Emelia J Benjamin; Alan S Go; Donna K Arnett; Michael J Blaha; Mary Cushman; Sandeep R Das; Sarah de Ferranti; Jean-Pierre Després; Heather J Fullerton; Virginia J Howard; Mark D Huffman; Carmen R Isasi; Monik C Jiménez; Suzanne E Judd; Brett M Kissela; Judith H Lichtman; Lynda D Lisabeth; Simin Liu; Rachel H Mackey; David J Magid; Darren K McGuire; Emile R Mohler; Claudia S Moy; Paul Muntner; Michael E Mussolino; Khurram Nasir; Robert W Neumar; Graham Nichol; Latha Palaniappan; Dilip K Pandey; Mathew J Reeves; Carlos J Rodriguez; Wayne Rosamond; Paul D Sorlie; Joel Stein; Amytis Towfighi; Tanya N Turan; Salim S Virani; Daniel Woo; Robert W Yeh; Melanie B Turner
Journal: Circulation Date: 2015-12-16 Impact factor: 29.690

2. HbA1c and atrial fibrillation: a cross-sectional study in Japan.

Authors: Yasuyuki Iguchi; Kazumi Kimura; Kensaku Shibazaki; Junya Aoki; Kenichiro Sakai; Yuki Sakamoto; Junichi Uemura; Shinji Yamashita
Journal: Int J Cardiol Date: 2010-11-19 Impact factor: 4.164

3. Development of a Basic Risk Score for Incident Atrial Fibrillation in a Japanese General Population - The Suita Study.

Authors: Yoshihiro Kokubo; Makoto Watanabe; Aya Higashiyama; Yoko M Nakao; Kengo Kusano; Yoshihiro Miyamoto
Journal: Circ J Date: 2017-05-25 Impact factor: 2.993

4. Impact of atrial fibrillation on the risk of death: the Framingham Heart Study.

Authors: E J Benjamin; P A Wolf; R B D'Agostino; H Silbershatz; W B Kannel; D Levy
Journal: Circulation Date: 1998-09-08 Impact factor: 29.690

5. Cross-sectional Association Between Blood Pressure Status and Atrial Fibrillation in an Elderly Chinese Population.

Authors: Yi Chen; Qi-Fang Huang; Chang-Sheng Sheng; Lei Lei; Shao-Kun Xu; Wei Zhang; Shuai Shao; Dian Wang; Yi-Bang Cheng; Ying Wang; Qian-Hui Guo; Dong-Yan Zhang; Yan Li; Yong Li; S Ben Freedman; Ji-Guang Wang
Journal: Am J Hypertens Date: 2019-07-17 Impact factor: 2.689

6. Prevalence, self-awareness, and LDL cholesterol levels among patients highly suspected as familial hypercholesterolemia in a Japanese community.

Authors: Hayato Tada; Junichi Shibayama; Tetsuo Nishikawa; Hirofumi Okada; Akihiro Nomura; Soichiro Usui; Kenji Sakata; Atsushi Hashiba; Akihiro Inazu; Masayuki Takamura; Masa-Aki Kawashiri
Journal: Pract Lab Med Date: 2020-10-19

7. Worldwide epidemiology of atrial fibrillation: a Global Burden of Disease 2010 Study.

Authors: Sumeet S Chugh; Rasmus Havmoeller; Kumar Narayanan; David Singh; Michiel Rienstra; Emelia J Benjamin; Richard F Gillum; Young-Hoon Kim; John H McAnulty; Zhi-Jie Zheng; Mohammad H Forouzanfar; Mohsen Naghavi; George A Mensah; Majid Ezzati; Christopher J L Murray
Journal: Circulation Date: 2013-12-17 Impact factor: 29.690

8. Independent risk factors for atrial fibrillation in a population-based cohort. The Framingham Heart Study.

Authors: E J Benjamin; D Levy; S M Vaziri; R B D'Agostino; A J Belanger; P A Wolf
Journal: JAMA Date: 1994-03-16 Impact factor: 56.272

9. Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016.

Authors:
Journal: Lancet Date: 2017-09-16 Impact factor: 79.321

10. Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: the CHARGE-AF consortium.

Authors: Alvaro Alonso; Bouwe P Krijthe; Thor Aspelund; Katherine A Stepas; Michael J Pencina; Carlee B Moser; Moritz F Sinner; Nona Sotoodehnia; João D Fontes; A Cecile J W Janssens; Richard A Kronmal; Jared W Magnani; Jacqueline C Witteman; Alanna M Chamberlain; Steven A Lubitz; Renate B Schnabel; Sunil K Agarwal; David D McManus; Patrick T Ellinor; Martin G Larson; Gregory L Burke; Lenore J Launer; Albert Hofman; Daniel Levy; John S Gottdiener; Stefan Kääb; David Couper; Tamara B Harris; Elsayed Z Soliman; Bruno H C Stricker; Vilmundur Gudnason; Susan R Heckbert; Emelia J Benjamin
Journal: J Am Heart Assoc Date: 2013-03-18 Impact factor: 5.501