
Effects of data preprocessing on results of the epidemiological analysis of coronary heart disease and behaviour-related risk factors.

Ari Voutilainen, Christina Brester, Mikko Kolehmainen, Tomi-Pekka Tuomainen.

Abstract

BACKGROUND: We carried out this study to demonstrate the effects of outcome sensitivity, participant exclusions, and covariate manipulations on results of the epidemiological analysis of coronary heart disease (CHD) and its behaviour-related risk factors.
MATERIAL AND METHODS: Our study population consisted of 1592 54-year-old men, who participated in the Kuopio Ischaemic Heart Disease Risk Factor (KIHD) Study. We used the Cox proportional-hazards model to predict the hazard of CHD and applied different sets of outcomes concerning outcome sensitivity and data preprocessing procedures regarding participant exclusions and covariate manipulations.
RESULTS: The mean follow-up time was 23 years, and 730 men received the CHD diagnosis. Cox regressions based on data with no participant exclusions most often discovered statistically significant associations. Loose inclusion criteria for study participants with any CVD during the follow-up and strict exclusion criteria for participants with no CVD were best in discovering the associations between risk factors and CHD. Outcome sensitivity affected the associations, whereas the covariate type, continuous or categorical, did not.
CONCLUSIONS: This study suggests that excluding study participants who are not disease-free at baseline is probably unnecessary for epidemiological analyses. Epidemiological research reports should present results based on no data exclusions together with results based on reasoned exclusions.


Keywords:  Categorical covariate; continuous covariate; coronary heart disease; exclusion criterion; outcome sensitivity

Year:  2021        PMID: 34159863      PMCID: PMC8231358          DOI: 10.1080/07853890.2021.1921838

Source DB:  PubMed          Journal:  Ann Med        ISSN: 0785-3890            Impact factor:   4.709


Introduction

Epidemiological research typically produces at least partly contradictory results. Some reasons for this incoherence, i.e. unexpectedly large variation in results across closely related studies, are only indirectly related to research, such as clinical factors and healthcare systems. Many reasons, however, originate from study designs, methodological choices, concept definitions, and observed data [1,2]. Reasons related to datasets include at least differences in sample size and in the representativeness of covariates. In prospective cohort studies, the length of follow-up relative to the participants’ age at baseline, as in the risk of coronary heart disease (CHD) associated with high levels of C-reactive protein [3], and possible competing events also affect the interpretation of study results [4]. Research on the epidemiological relationship between CHD and its risk factors has reached a consensus during the past decades. There are six indisputable behaviour-related risk factors for CHD: tobacco smoke [5], overweight [6], physical inactivity [7], hypertension [8], diabetes [9], and hypercholesterolaemia [10]. Other behaviour-related factors, such as alcohol consumption and stress, may also increase the risk of CHD, but their associations with CHD vary across studies. The association between alcohol and CHD is nonlinear [11], and stress is a symptom of different conditions, such as psychosocial aspects of work [12], which may or may not be associated with the risk of CHD. Further risk factors of CHD that relate at least indirectly to behaviour through diet are homocysteine, fibrinogen, and inflammation [13]. Moreover, there may be a weak association between iron status and CHD [14]. In addition to the behaviour-related factors, non-modifiable factors including age, male gender, genetics, and a family history of CHD increase the risk of CHD [13,15,16].
Differences between men and women regarding the risk of CHD relate mainly to oestrogens and, thus, to premenopausal women [13]. The role played by personality in the development of CHD is controversial [17]. The purpose of this study was to demonstrate the effects of data exclusions, outcome variable selection, and covariate manipulations on the interpretation of the epidemiological relationship between CHD and its traditional risk factors. These are predominantly subjective, researcher-related actions, unlike more technical questions such as whether to consider competing events in statistical analyses or whether to use non-conventional statistical methods, such as neural networks [18], to deal with data-related matters. From this study, we expected to identify the combination of outcome variable selection, participant exclusion, and covariate manipulation procedures that best reveals the presumed associations between CHD and its risk factors.

Material and methods

Material

Men (n = 1592) from the Kuopio Ischaemic Heart Disease Risk Factor (KIHD) Study served as the study material. The KIHD Study is an ongoing prospective cohort study originally established to discover previously unestablished reasons for the extremely high AMI prevalence among eastern Finnish men [19]. To control for the effect of age on CHD, we selected men representing the same age cohort, 54 years old at baseline between March 1984 and December 1989. Briefly, 778 of them had one or more CVDs at baseline based on self-reports to the question: Has your doctor told you that you have ‘the name of CVD’, and 1181 of them were diagnosed, during an inpatient special health care admission, as having CVDs, ICD-10 codes I00-I99 [20], by the end of 2017. Moreover, 381 men used medication for hypertension, 77 had insulin or non-insulin treated diabetes, and nine used medication for hypercholesterolaemia at baseline. The mean (SD) follow-up time was 23.4 (9.3) years. Table 1 presents the study participants’ baseline characteristics with respect to variables used as exclusion criteria, covariates, and conditions and events diagnosed during the follow-up. All KIHD participants had given written informed consent, and the ethical committee of the Kuopio University had approved the KIHD Study (December 1, 1983). In the 1980s, the committee did not necessarily provide study numbers but identified studies by date.
Table 1.

Baseline characteristics (the total column) and numbers of study participants with the following conditions diagnosed during the follow-up: any cardiovascular disease (CVD), coronary heart disease (CHD), a myocardial infarction (MI) or unstable angina (UA), and a fatal acute myocardial infarction (AMI).

Columns CVD, CHD, MI or UA, and AMI: conditions and events diagnosed during the follow-up.

Characteristic | Total | CVD | CHD | MI or UA | AMI
n | 1592 | 1181 | 730 | 502 | 136
CVD, excluding hypertension, n (%) | 672 (42) | 542 (46) | 375 (51) | 260 (52) | 83 (61)
Use of hypertension medication, n (%) | 381 (24) | 318 (27) | 240 (33) | 168 (34) | 58 (43)
Diabetes, n (%) | 77 (4.8) | 61 (5.2) | 47 (6.4) | 41 (8.2) | 10 (7.4)
Use of cholesterol medication, n (%) | 9 (0.6) | 9 (0.8) | 8 (1.1) | 5 (1.0) | 2 (1.5)
Cigarette-years (a), mean (SD) | 336 (392) | 339 (402) | 356 (403) | 387 (428) | 431 (478)
Never-smokers, n (%) | 517 (33) | 377 (32) | 217 (30) | 133 (27) | 34 (25)
Former smokers, n (%) | 572 (36) | 443 (38) | 274 (38) | 196 (39) | 42 (31)
Current smokers, n (%) | 503 (32) | 361 (31) | 239 (33) | 173 (35) | 60 (44)
Alcohol, grams/week, mean (SD) | 71 (141) | 66 (105) | 62 (92) | 64 (96) | 75 (100)
No risk, ≤ 1 portion/week, n (%) | 604 (38) | 449 (38) | 285 (39) | 186 (37) | 53 (39)
Moderate risk, ≤ 3 portions/day, n (%) | 906 (57) | 676 (57) | 415 (57) | 295 (59) | 75 (55)
High risk, > 3 portions/day, n (%) | 82 (5.2) | 56 (4.7) | 30 (4.1) | 21 (4.2) | 8 (5.9)
Body Mass Index (BMI), mean (SD) | 27 (3.7) | 27 (3.7) | 27 (3.7) | 27 (3.7) | 28 (4.5)
Normal weight, BMI < 25.0 kg/m2, n (%) | 480 (30) | 324 (27) | 183 (25) | 118 (24) | 35 (26)
Overweight, BMI 25.0 − 29.9 kg/m2, n (%) | 830 (52) | 622 (53) | 400 (55) | 278 (55) | 60 (44)
Obese, BMI ≥ 30.0 kg/m2, n (%) | 282 (18) | 235 (20) | 147 (20) | 106 (21) | 41 (30)
Physical activity (b), kcal/day, mean (SD) | 2380 (899) | 2377 (886) | 2349 (888) | 2326 (877) | 2381 (987)
Moderate, PAL (c) < 2.00, n (%) | 293 (18) | 222 (19) | 144 (20) | 101 (20) | 29 (21)
Vigorous, PAL 2.00 − 2.40, n (%) | 507 (32) | 377 (32) | 236 (33) | 165 (33) | 41 (30)
Extreme, PAL > 2.40, n (%) | 774 (49) | 571 (49) | 341 (47) | 229 (46) | 65 (48)
No data available, n | 18 | 11 | 9 | 7 | 1
Systolic blood pressure, mean (SD) | 136 (18) | 137 (18) | 136 (18) | 137 (19) | 139 (18)
Desirable, < 120 mmHg, n (%) | 269 (17) | 186 (16) | 122 (17) | 89 (18) | 17 (13)
Borderline, 120 − 139 mmHg, n (%) | 763 (48) | 552 (47) | 331 (45) | 213 (42) | 61 (45)
High, ≥ 140 mmHg, n (%) | 560 (35) | 443 (38) | 277 (38) | 200 (40) | 58 (43)
Fasting blood glucose, mean (SD) | 4.8 (1.2) | 4.9 (1.3) | 4.9 (1.4) | 5.0 (1.5) | 5.1 (1.6)
Desirable, < 5.6 mmol/L, n (%) | 1435 (90) | 1056 (89) | 636 (87) | 432 (86) | 112 (82)
Borderline, 5.6 − 6.9 mmol/L, n (%) | 96 (6.0) | 71 (6.0) | 53 (7.3) | 40 (8.0) | 12 (8.8)
High, > 6.9 mmol/L, n (%) | 61 (3.8) | 54 (4.6) | 41 (5.6) | 30 (6.0) | 12 (8.8)
Serum total cholesterol, mean (SD) | 6.0 (1.1) | 6.0 (1.1) | 6.1 (1.2) | 6.2 (1.2) | 6.3 (1.2)
Desirable, < 5.2 mmol/L, n (%) | 383 (24) | 270 (23) | 151 (21) | 93 (19) | 25 (19)
Borderline, 5.2 − 6.2 mmol/L, n (%) | 607 (38) | 443 (38) | 274 (38) | 190 (38) | 46 (34)
High, > 6.2 mmol/L, n (%) | 602 (38) | 468 (40) | 305 (42) | 219 (44) | 65 (48)

(a) Cigarettes per day times years of smoking.
(b) Total energy expenditure (TEE) minus basal energy expenditure (BEE).
(c) Physical activity level, TEE divided by BEE.


Outcome variables

The KIHD Study includes annually updated data from the Care Register for Health Care of the Finnish Institute for Health and Welfare regarding diagnoses given during special health care admissions (License THL/93/5.05.00/2013) and from the Causes of Death Register of Statistics Finland (License TK-53-1770-16). To study the effects of outcome sensitivity on model results, we constructed four different outcome variables based on these register linkages. The first outcome, ‘CVD’, referred to ICD-10 codes I00 − I99. The second outcome, ‘CHD’, referred to codes I20 − I25. The third outcome, ‘MI or UA’, referred to codes I20.0 and I21 − I22. The fourth outcome, ‘a fatal AMI’, referred to code I21.
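The nesting of the four outcomes can be made concrete with a small sketch. The function name, the `fatal` flag, and the code-parsing pattern below are illustrative assumptions, not part of the KIHD register linkage; only the ICD-10 ranges come from the text.

```python
import re


def icd10_outcomes(code: str, fatal: bool = False) -> set[str]:
    """Return the outcome labels (as defined in the text) that an ICD-10 code belongs to.

    `fatal` is a hypothetical flag meaning the event was recorded as a cause of death.
    """
    match = re.fullmatch(r"I(\d{2})(?:\.(\d)\w*)?", code.strip().upper())
    if not match:
        return set()  # outside ICD-10 chapter I00-I99, so not a CVD outcome
    block = int(match.group(1))
    first_subdigit = match.group(2)

    outcomes = {"CVD"}  # I00-I99
    if 20 <= block <= 25:
        outcomes.add("CHD")  # I20-I25
    if block in (21, 22) or (block == 20 and first_subdigit == "0"):
        outcomes.add("MI or UA")  # I20.0 and I21-I22
    if block == 21 and fatal:
        outcomes.add("fatal AMI")  # I21 recorded as a fatal event
    return outcomes


example = icd10_outcomes("I20.0")  # unstable angina falls into three of the four outcomes
```

Because each narrower outcome is a subset of the broader one, every fatal AMI also counts as an MI or UA, a CHD, and a CVD event.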

Covariates

First, we selected the most common risk factors of CHD based on the literature and, second, we searched the KIHD Study database for variables that represent these risk factors. The chosen risk factors were smoking, obesity, physical inactivity, hypertension, diabetes, and hypercholesterolaemia. Hajar [21], for example, summarises the association between these six risk factors and CHD. In addition to the indisputable risk factors of CHD, we included alcohol consumption as a covariate in the analyses. Alcohol, in general, increases mortality and morbidity [22], but the association between alcohol consumption and CHD follows a J-shaped curve; light-to-moderate drinking acts as a protective factor, whereas heavy drinking increases the risk of CHD [11]. We expected that our analyses would, at best, demonstrate this nonlinear relationship between alcohol consumption and CHD. In the KIHD Study, participants self-reported their smoking behaviour, alcohol consumption, and physical activity at baseline. As a continuous smoking variable, we chose cigarette-years, i.e. the number of cigarettes per day multiplied by the number of years smoked. Moreover, we classified the participants as never-smokers, former smokers, and current smokers. Former smokers reported that they had not smoked within a month. The KIHD continuous alcohol consumption variable indicates the amount of alcohol in grams per week. For this study, we categorized the participants into those with no health risk due to alcohol consumption (one portion, 12 grams of pure alcohol according to Finnish standards, per week at most), those with a moderate health risk (three portions per day at most), and those with a high health risk. This categorization is mainly data-specific, although it loosely follows the Finnish current care guidelines, which are published only in Finnish.
Broadly, alcohol increases mortality and morbidity and, in men, more than three to four portions (40 grams of pure alcohol) per day increase them significantly [22]. To determine the study participants’ physical activity, we first calculated the basal energy expenditure (BEE) from body weight, body height, and age by applying the Mifflin-St Jeor equation [23]. Second, we subtracted BEE from the total energy expenditure (TEE) and used this TEE − BEE variable in the analyses as a continuous variable. To create activity ranks, we computed the physical activity level (PAL) by dividing TEE by BEE and classified the participants as follows: moderately active, PAL < 2.00; vigorously active, PAL 2.00 − 2.40; and extremely active, PAL > 2.40 [24]. In the KIHD cohort, practically all participants were at least moderately active at baseline. Eight participants of this study had not reported their physical activity. Body weights and heights were not self-reported but measured by a research nurse during the baseline examination. Based on these measures, we calculated the Body Mass Index (BMI) by dividing the weight in kilograms by the square of the height in metres. In the analyses, we followed the standard guidelines for BMI, in which <25.0 kg/m2 refers to normal weight, 25.0−29.9 kg/m2 to overweight, and ≥30.0 kg/m2 to obesity [25], and classified the participants accordingly. On the first baseline examination day, one research nurse measured the participant’s blood pressure six times with a random-zero mercury sphygmomanometer. After a supine rest of five minutes, the nurse took three measurements in a supine, two in a sitting, and one in a standing position at 5-min intervals. In the present analyses, we used the mean of the six systolic blood pressure (SBP) values as a continuous variable. To distribute the study participants into groups according to SBP, we followed the thresholds suggested by Mayo Clinic: SBP < 120 mmHg is a desirable level and SBP > 139 mmHg indicates hypertension [26].
Study participants gave blood samples between 8 and 10 a.m. after abstaining from alcohol for three days and from smoking and eating for 12 h. After a supine rest of 30 min, a research nurse drew blood with Terumo Venoject VT-100PZ vacuum tubes (Terumo Corp., Tokyo, Japan) using no tourniquet. The laboratory of our institute measured serum total cholesterol (STC) concentrations with an enzymatic method (CHOD-PAP, Boehringer Mannheim, Mannheim, West Germany) and fasting blood glucose (FBG) concentrations with a glucose dehydrogenase method (Merck, Darmstadt, West Germany) after protein precipitation with TCA, using a clinical chemistry analyzer (Kone Specific, KONE Instruments Oy, Espoo, Finland). Salonen et al. [27] describe the lipid analysis in detail. For the present analyses, we classified the participants according to STC as follows: <5.2 mmol/L is a desirable level and >6.2 mmol/L indicates hypercholesterolaemia [28]. Correspondingly, we distributed the participants into groups according to FBG as follows: < 5.6 mmol/L is a desirable level and > 6.9 mmol/L indicates diabetes [29].
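The covariate derivations described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' SPSS procedure: the function names are hypothetical, the Mifflin-St Jeor coefficients are the standard published form for men, the upper alcohol cut-off of 252 g/week (3 portions × 12 g × 7 days) is taken from the category labels in Table 3, and the SBP boundary follows Table 1 (≥ 140 mmHg is high), which is an assumption for non-integer means between 139 and 140.

```python
def bee_mifflin_st_jeor_male(weight_kg: float, height_cm: float, age_years: float) -> float:
    """Basal energy expenditure in kcal/day (Mifflin-St Jeor equation, male form)."""
    return 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years + 5.0


def pal_category(tee_kcal: float, bee_kcal: float) -> str:
    """Physical activity level (TEE / BEE) mapped to the three activity ranks."""
    pal = tee_kcal / bee_kcal
    if pal < 2.00:
        return "moderate"
    return "vigorous" if pal <= 2.40 else "extreme"


def bmi_category(weight_kg: float, height_m: float) -> str:
    """BMI = weight / height squared, classified with the standard cut-offs."""
    bmi = weight_kg / height_m ** 2
    if bmi < 25.0:
        return "normal weight"
    return "overweight" if bmi < 30.0 else "obese"


def sbp_category(mean_sbp_mmhg: float) -> str:
    """Mean of six SBP readings; boundary at 140 mmHg is an assumption (see lead-in)."""
    if mean_sbp_mmhg < 120:
        return "desirable"
    return "borderline" if mean_sbp_mmhg < 140 else "high"


def fbg_category(fbg_mmol_l: float) -> str:
    if fbg_mmol_l < 5.6:
        return "desirable"
    return "borderline" if fbg_mmol_l <= 6.9 else "high"


def stc_category(stc_mmol_l: float) -> str:
    if stc_mmol_l < 5.2:
        return "desirable"
    return "borderline" if stc_mmol_l <= 6.2 else "high"


def alcohol_category(grams_per_week: float) -> str:
    """One portion = 12 g of pure alcohol; at most one per week means no risk."""
    if grams_per_week <= 12:
        return "no risk"
    return "moderate risk" if grams_per_week <= 252 else "high risk"


# Hypothetical participant: 80 kg, 175 cm, 54 years, TEE 3500 kcal/day.
bee = bee_mifflin_st_jeor_male(80, 175, 54)
activity_kcal = 3500 - bee  # continuous TEE - BEE covariate
```

The continuous covariate (TEE − BEE) and the categorical one (PAL rank) are derived from the same two quantities, which is why the two forms can be compared directly in the analyses.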

Statistical analyses

The Cox proportional-hazards model [30] served as the analysis method and IBM® SPSS® Statistics Version 25 served as the statistical platform. In all analyses, we applied three different data exclusion criteria (Figure 1). The first criterion, termed Criterion A later in the text, excluded study participants according to conditions. Specifically, we excluded participants who reported that they had any CVD or diabetes at baseline or that they used hypercholesterolaemia medication. This exclusion criterion reduced the number of study participants from 1592 to 794. The second criterion, Criterion B, excluded study participants who reported that they had a CVD, except for hypertension, at baseline. This criterion resulted in 920 participants. The third criterion, Criterion C, meant no exclusions. Correspondingly, in all analyses, we used CVD, CHD, MI or UA, and a fatal AMI as dependent variables. These four “nested” outcomes demonstrate the outcome variable selection process with respect to outcome sensitivity. Moreover, to study the effect of covariate manipulations on the Cox model results, we executed Cox regressions adjusted for seven covariates, the six traditional risk factors and alcohol consumption, which were either in their original continuous form or distributed into predetermined categories.
Figure 1.

Procedure for statistical analyses. First, analysis sets (AS) 1 and 2 studied effects of participant exclusions on analysis results prospectively; the exclusions were based on baseline characteristics (Criteria A − C). Second, AS 1 and 2 studied effects of covariate manipulations (continuous vs. categorical) on analysis results. Third, AS 3 studied effects of participant exclusions on analysis results retrospectively; the exclusions were based on outcomes (Scenarios Y and Z). Fourth, all AS studied effects of outcome sensitivity on analysis results (see Tables 2–4).

Altogether, we performed three analysis sets (Figure 1). The first set included covariates as continuous variables and tested their associations with CVD, CHD, MI or UA, and a fatal AMI separately for each data exclusion criterion, A, B, and C. The second set included covariates as categorical variables. The reference categories were as follows: never-smoker, no health risk due to alcohol consumption, normal weight, moderately physically active, desirable SBP, desirable FBG, and desirable STC. Like the first set, the second set tested associations of covariates with CVD, CHD, MI or UA, and a fatal AMI separately for each data exclusion criterion, A, B, and C. The third set also included covariates as categorical variables but used different data exclusion criteria for study participants who received a CVD diagnosis during the follow-up and for those who did not. The third analysis set comprised two analysis scenarios (Figure 1). In the first scenario, termed Scenario Y later in the text, the exclusion of men with CVD during the follow-up was based on Criterion A and that of men with no CVD during the follow-up was based on Criterion C, i.e. no exclusions. This resulted in 957 study participants eligible for the analysis.
In the second scenario, Scenario Z, the exclusion of men with CVD during the follow-up was based on Criterion C and that of men with no CVD during the follow-up on Criterion A. Scenario Z resulted in 1430 study participants.
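The exclusion logic of Criteria A to C and Scenarios Y and Z can be written down compactly. The sketch below is an illustration under stated assumptions: the `Participant` fields and function names are hypothetical, and "any CVD" in Criterion A is taken to include hypertension, as the contrast with Criterion B suggests.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are hypothetical stand-ins for KIHD variables.
    any_cvd_at_baseline: bool        # self-reported CVD, hypertension included
    cvd_excl_hypertension: bool      # self-reported CVD other than hypertension
    diabetes_at_baseline: bool
    cholesterol_medication: bool
    cvd_during_followup: bool        # register-based CVD diagnosis (I00-I99)


def passes_criterion(p: Participant, criterion: str) -> bool:
    """True if the participant stays in the dataset under Criterion A, B, or C."""
    if criterion == "A":  # strict: free of CVD, diabetes, and cholesterol medication
        return not (p.any_cvd_at_baseline or p.diabetes_at_baseline or p.cholesterol_medication)
    if criterion == "B":  # excludes baseline CVD other than hypertension
        return not p.cvd_excl_hypertension
    if criterion == "C":  # no exclusions
        return True
    raise ValueError(f"unknown criterion: {criterion}")


def in_scenario(p: Participant, scenario: str) -> bool:
    """Scenarios apply different criteria depending on the follow-up outcome."""
    if scenario == "Y":  # strict cases, loose controls
        return passes_criterion(p, "A" if p.cvd_during_followup else "C")
    if scenario == "Z":  # loose cases, strict controls
        return passes_criterion(p, "C" if p.cvd_during_followup else "A")
    raise ValueError(f"unknown scenario: {scenario}")


# A man with baseline hypertension only who later received a CVD diagnosis.
case = Participant(True, False, False, False, True)
# A man with baseline diabetes who stayed free of CVD during the follow-up.
control = Participant(False, False, True, False, False)
```

Because the scenarios condition on the follow-up outcome, they are retrospective selections, which is the distinction Figure 1 draws between analysis sets 1 and 2 and analysis set 3.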

Results

Outcome sensitivity

CVD and a fatal AMI were associated with the covariates differently from each other and from CHD and MI or UA (Tables 2–4). CVD was the outcome most evidently associated with SBP; a high SBP increased the risk of CVD. A fatal AMI, in turn, was the only outcome that showed only statistically non-significant associations with SBP and physical activity. CHD and MI or UA highlighted the same risk factors. Specifically, they were associated with STC more strongly than CVD and a fatal AMI were.

Participant exclusions

Cox regressions based on data with no exclusions most often discovered statistically significant associations of CHD with its risk factors, irrespective of covariate manipulations (Tables 2 and 3). In all these associations, the direction of the association was as expected, i.e. the risk factors had hazard ratios (HR) above one and the protective factors had HRs below one. Only regressions based on data with no exclusions identified, with statistical significance, the protective effect of physical activity (the highest category versus the lowest one). The Appendix presents sample size calculations regarding the main outcome of this study, CHD, and Criterion A, which excluded study participants according to conditions at baseline.
Table 2.

Hazard ratios and corresponding p-values of any cardiovascular disease (CVD), coronary heart disease (CHD), a myocardial infarction (MI) or unstable angina (UA), and a fatal acute myocardial infarction (AMI) with respect to one unit (1 U) or one standard deviation (1 D) increase in seven factors used as continuous covariates in the Cox proportional-hazards model.

Covariate | CVD 1U | CVD 1D | CVD p | CHD 1U | CHD 1D | CHD p | MI or UA 1U | MI or UA 1D | MI or UA p | AMI 1U | AMI 1D | AMI p
A. Smoking (a) | 1.00 | 1.20 | <.01 | 1.00 | 1.26 | <.01 | 1.00 | 1.36 | <.01 | 1.00 | 1.27 | .16
B. Smoking | 1.00 | 1.17 | <.01 | 1.00 | 1.23 | <.01 | 1.00 | 1.32 | <.01 | 1.00 | 1.18 | .28
C. Smoking | 1.00 | 1.18 | <.01 | 1.00 | 1.22 | <.01 | 1.00 | 1.33 | <.01 | 1.00 | 1.42 | <.01
A. AC (b) | 1.00 | 1.05 | .32 | 1.00 | 1.01 | .89 | 1.00 | 1.01 | .87 | 1.00 | 1.15 | .39
B. AC | 1.00 | 1.05 | .30 | 1.00 | 1.00 | .96 | 1.00 | 0.97 | .68 | 1.00 | 1.11 | .48
C. AC | 1.00 | 1.00 | .95 | 1.00 | 0.94 | .24 | 1.00 | 0.95 | .39 | 1.00 | 1.06 | .58
A. BMI (c) | 1.03 | 1.12 | .02 | 1.04 | 1.13 | .05 | 1.03 | 1.12 | .14 | 1.08 | 1.29 | .06
B. BMI | 1.04 | 1.14 | <.01 | 1.05 | 1.19 | <.01 | 1.05 | 1.19 | .01 | 1.08 | 1.31 | .04
C. BMI | 1.05 | 1.20 | <.01 | 1.05 | 1.20 | <.01 | 1.05 | 1.21 | <.01 | 1.08 | 1.32 | <.01
A. PAL (d) | 1.00 | 0.99 | .78 | 1.00 | 0.95 | .43 | 1.00 | 0.96 | .64 | 1.00 | 1.14 | .42
B. PAL | 1.00 | 0.98 | .66 | 1.00 | 0.93 | .25 | 1.00 | 0.94 | .43 | 1.00 | 1.15 | .36
C. PAL | 1.00 | 0.93 | .04 | 1.00 | 0.90 | .02 | 1.00 | 0.89 | .02 | 1.00 | 0.90 | .28
A. SBP (e) | 1.01 | 1.18 | <.01 | 1.00 | 1.05 | .44 | 1.00 | 1.03 | .73 | 1.01 | 1.18 | .32
B. SBP | 1.01 | 1.17 | <.01 | 1.01 | 1.09 | .15 | 1.01 | 1.09 | .21 | 1.01 | 1.24 | .14
C. SBP | 1.01 | 1.18 | <.01 | 1.01 | 1.10 | .02 | 1.01 | 1.13 | .01 | 1.01 | 1.16 | .09
A. FBG (f) | 1.19 | 1.13 | .03 | 1.39 | 1.26 | <.01 | 1.51 | 1.32 | <.01 | 1.55 | 1.35 | .01
B. FBG | 1.18 | 1.19 | <.01 | 1.14 | 1.15 | <.01 | 1.19 | 1.20 | <.01 | 1.25 | 1.27 | .01
C. FBG | 1.13 | 1.16 | <.01 | 1.10 | 1.12 | <.01 | 1.09 | 1.11 | .01 | 1.12 | 1.15 | .02
A. STC (g) | 1.04 | 1.04 | .34 | 1.17 | 1.18 | .01 | 1.20 | 1.21 | .01 | 1.18 | 1.19 | .30
B. STC | 1.03 | 1.04 | .42 | 1.15 | 1.17 | .01 | 1.22 | 1.24 | <.01 | 1.22 | 1.24 | .11
C. STC | 1.08 | 1.09 | .01 | 1.18 | 1.20 | <.01 | 1.20 | 1.23 | <.01 | 1.25 | 1.28 | <.01

Note. A refers to a dataset excluding CVD, diabetes, and high total cholesterol at baseline (n = 794). B refers to a dataset excluding CVD, except for hypertension, at baseline (n = 920). C refers to a dataset with no exclusions (n = 1592). Bold font indicates a statistically significant HR.

(a) Cigarettes per day times years of smoking.
(b) Alcohol consumption grams/week.
(c) Body Mass Index, weight in kg divided by the square of height in m.
(d) Physical activity level, total energy expenditure minus basal, in kcal per day.
(e) Systolic blood pressure in mmHg.
(f) Fasting blood glucose in mmol/L.
(g) Serum total cholesterol in mmol/L.
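The relationship between the 1 U and 1 D columns of Table 2 can be illustrated with a short sketch. In a Cox model with a linear covariate term, the hazard ratio for a c-unit increase is exp(βc), so the HR per standard deviation equals the per-unit HR raised to the power of the SD. The function name is hypothetical, and it is an assumption that the 1 D column was obtained this way; the example pairs Criterion C (no exclusions) rows for CHD with the total-column SDs from Table 1, and agreement is only approximate because the published inputs are rounded.

```python
import math


def hr_per_sd(hr_per_unit: float, sd: float) -> float:
    """Rescale a per-unit hazard ratio to a per-standard-deviation hazard ratio."""
    beta = math.log(hr_per_unit)  # Cox regression coefficient per unit
    return math.exp(beta * sd)    # equivalent to hr_per_unit ** sd


# Criterion C, outcome CHD: per-unit HRs from Table 2, SDs from Table 1 (total column).
stc_chd = hr_per_sd(1.18, 1.1)   # serum total cholesterol, SD 1.1 mmol/L
fbg_chd = hr_per_sd(1.10, 1.2)   # fasting blood glucose, SD 1.2 mmol/L
bmi_chd = hr_per_sd(1.05, 3.7)   # body mass index, SD 3.7 kg/m2
```

For covariates measured on a fine scale, such as cigarette-years or kcal/day, the per-unit HR rounds to 1.00, so only the per-SD column is informative there. The SDs of the Criterion A and B subsets are not reported in this section, so the same check cannot be repeated for those rows.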

Table 3.

Hazard ratios (HR), probabilities (P), and corresponding p-values of any cardiovascular disease (CVD), coronary heart disease (CHD), a myocardial infarction (MI) or unstable angina (UA), and a fatal acute myocardial infarction (AMI) with respect to seven factors used as categorical covariates in the Cox proportional-hazards model.

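Covariate | CVD HR | CVD P | CVD p | CHD HR | CHD P | CHD p | MI or UA HR | MI or UA P | MI or UA p | AMI HR | AMI P | AMI p
A. Former smoker | 1.07 | 0.52 | .53 | 1.15 | 0.53 | .35 | 1.49 | 0.60 | .03 | 0.76 | 0.43 | .51
B. Former smoker | 1.11 | 0.53 | .30 | 1.19 | 0.54 | .20 | 1.42 | 0.59 | .04 | 0.72 | 0.42 | .38
C. Former smoker | 1.19 | 0.54 | .02 | 1.24 | 0.55 | .02 | 1.40 | 0.58 | <.01 | 1.09 | 0.52 | .71
A. Current smoker | 1.50 | 0.60 | <.01 | 1.78 | 0.64 | <.01 | 2.31 | 0.70 | <.01 | 2.36 | 0.70 | .03
B. Current smoker | 1.49 | 0.60 | <.01 | 1.64 | 0.62 | <.01 | 2.05 | 0.67 | <.01 | 2.00 | 0.67 | .04
C. Current smoker | 1.50 | 0.60 | <.01 | 1.69 | 0.63 | <.01 | 1.94 | 0.66 | <.01 | 2.74 | 0.73 | <.01
A. AC (a) 13 − 252 | 1.05 | 0.51 | .64 | 0.95 | 0.49 | .68 | 0.98 | 0.49 | .92 | 0.63 | 0.39 | .18
B. AC 13 − 252 | 1.04 | 0.51 | .68 | 0.90 | 0.47 | .35 | 0.97 | 0.49 | .82 | 0.84 | 0.46 | .58
C. AC 13 − 252 | 1.03 | 0.51 | .64 | 0.90 | 0.47 | .17 | 0.96 | 0.49 | .66 | 0.80 | 0.44 | .22
A. AC >252 | 1.29 | 0.56 | .30 | 1.13 | 0.53 | .73 | 1.05 | 0.51 | .92 | 1.48 | 0.60 | .60
B. AC >252 | 1.20 | 0.55 | .41 | 0.94 | 0.48 | .85 | 0.86 | 0.46 | .72 | 1.19 | 0.54 | .82
C. AC >252 | 1.16 | 0.54 | .31 | 0.91 | 0.48 | .64 | 0.98 | 0.49 | .93 | 1.23 | 0.55 | .59
A. BMI (b) 25.0 − 29.9 | 1.14 | 0.53 | .20 | 1.19 | 0.54 | .21 | 1.33 | 0.57 | .09 | 0.90 | 0.47 | .78
B. BMI 25.0 − 29.9 | 1.18 | 0.54 | .07 | 1.27 | 0.56 | .06 | 1.41 | 0.59 | .03 | 0.93 | 0.48 | .84
C. BMI 25.0 − 29.9 | 1.26 | 0.56 | <.01 | 1.42 | 0.59 | <.01 | 1.49 | 0.60 | <.01 | 1.04 | 0.51 | .86
A. BMI ≥30.0 | 1.41 | 0.59 | .02 | 1.51 | 0.60 | .03 | 1.47 | 0.60 | .12 | 2.85 | 0.74 | .02
B. BMI ≥30.0 | 1.54 | 0.61 | <.01 | 1.64 | 0.62 | .01 | 1.70 | 0.63 | .02 | 2.51 | 0.72 | .02
C. BMI ≥30.0 | 1.75 | 0.64 | <.01 | 1.79 | 0.64 | <.01 | 1.90 | 0.66 | <.01 | 2.37 | 0.70 | <.01
A. PAL (c) Vigorous | 1.05 | 0.51 | .73 | 0.94 | 0.48 | .73 | 0.93 | 0.48 | .74 | 0.79 | 0.44 | .63
B. PAL Vigorous | 1.07 | 0.52 | .59 | 0.98 | 0.49 | .89 | 0.99 | 0.50 | .96 | 1.14 | 0.53 | .78
C. PAL Vigorous | 0.91 | 0.48 | .24 | 0.86 | 0.46 | .17 | 0.84 | 0.46 | .19 | 0.72 | 0.42 | .18
A. PAL Extreme | 1.02 | 0.50 | .85 | 0.89 | 0.47 | .46 | 0.89 | 0.47 | .56 | 1.06 | 0.51 | .90
B. PAL Extreme | 1.03 | 0.51 | .79 | 0.87 | 0.47 | .34 | 0.89 | 0.47 | .52 | 1.31 | 0.57 | .54
C. PAL Extreme | 0.84 | 0.46 | .03 | 0.78 | 0.44 | .01 | 0.74 | 0.43 | .02 | 0.72 | 0.42 | .16
A. SBP (d) 120 − 139 | 1.11 | 0.53 | .38 | 0.99 | 0.50 | .96 | 0.89 | 0.47 | .57 | 1.98 | 0.66 | .22
B. SBP 120 − 139 | 1.16 | 0.54 | .20 | 1.02 | 0.50 | .90 | 0.87 | 0.47 | .45 | 2.26 | 0.69 | .13
C. SBP 120 − 139 | 1.12 | 0.53 | .20 | 0.96 | 0.49 | .68 | 0.84 | 0.46 | .18 | 1.24 | 0.55 | .44
A. SBP ≥140 | 1.50 | 0.60 | <.01 | 1.17 | 0.54 | .39 | 1.02 | 0.50 | .95 | 2.15 | 0.68 | .19
B. SBP ≥140 | 1.50 | 0.60 | <.01 | 1.20 | 0.55 | .27 | 1.06 | 0.51 | .79 | 2.71 | 0.73 | .07
C. SBP ≥140 | 1.42 | 0.59 | <.01 | 1.13 | 0.53 | .28 | 1.09 | 0.52 | .52 | 1.48 | 0.60 | .18
A. FBG (e) 5.6 − 6.9 | 1.51 | 0.60 | .07 | 1.67 | 0.63 | .08 | 1.91 | 0.66 | .05 | 0.78 | 0.44 | .81
B. FBG 5.6 − 6.9 | 1.40 | 0.58 | .08 | 1.68 | 0.63 | .03 | 2.11 | 0.68 | .01 | 1.12 | 0.53 | .88
C. FBG 5.6 − 6.9 | 1.30 | 0.57 | .04 | 1.73 | 0.63 | <.01 | 1.78 | 0.64 | <.01 | 2.05 | 0.67 | .02
A. FBG >6.9 | 3.07 | 0.75 | <.01 | 7.21 | 0.88 | <.01 | 5.29 | 0.84 | <.01 | 2.95 | 0.75 | .16
B. FBG >6.9 | 3.03 | 0.75 | <.01 | 3.64 | 0.78 | <.01 | 3.90 | 0.80 | <.01 | 2.73 | 0.73 | .07
C. FBG >6.9 | 2.53 | 0.72 | <.01 | 2.61 | 0.72 | <.01 | 2.28 | 0.70 | <.01 | 2.95 | 0.75 | <.01
A. STC (f) 5.2 − 6.2 | 1.16 | 0.54 | .20 | 1.42 | 0.59 | .04 | 1.32 | 0.57 | .19 | 1.41 | 0.59 | .47
B. STC 5.2 − 6.2 | 1.07 | 0.52 | .52 | 1.23 | 0.55 | .16 | 1.17 | 0.54 | .38 | 0.91 | 0.48 | .82
C. STC 5.2 − 6.2 | 1.09 | 0.52 | .29 | 1.24 | 0.55 | .03 | 1.35 | 0.57 | .02 | 1.21 | 0.55 | .45
A. STC >6.2 | 1.19 | 0.54 | .14 | 1.47 | 0.60 | .02 | 1.57 | 0.61 | .03 | 1.91 | 0.66 | .15
B. STC >6.2 | 1.07 | 0.52 | .52 | 1.27 | 0.56 | .10 | 1.41 | 0.59 | .04 | 1.48 | 0.60 | .28
C. STC >6.2 | 1.20 | 0.55 | .02 | 1.39 | 0.58 | <.01 | 1.56 | 0.61 | <.01 | 1.62 | 0.62 | .04

Note. A refers to a dataset excluding CVD, diabetes, and high total cholesterol at baseline (n = 794). B refers to a dataset excluding CVD, except for hypertension, at baseline (n = 920). C refers to a dataset with no exclusions (n = 1592). Bold font indicates a statistically significant HR.

(a) Alcohol consumption in g/week.
(b) Body Mass Index, kg/m2.
(c) Physical activity level, the total energy expenditure divided by the basal energy expenditure, moderate <2.00, extreme >2.40.
(d) Systolic blood pressure in mmHg.
(e) Fasting blood glucose in mmol/L.
(f) Serum total cholesterol in mmol/L.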
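This section does not define the probability column (P) of Table 3. The tabulated values are, however, consistent with the common conversion P = HR / (1 + HR), which can be read as the probability that a participant in the exposed category experiences the event before a participant in the reference category. The sketch below assumes that conversion; the function name is hypothetical.

```python
def hr_to_probability(hr: float) -> float:
    """Convert a hazard ratio to P = HR / (1 + HR) (assumed definition of the P column)."""
    if hr < 0:
        raise ValueError("a hazard ratio cannot be negative")
    return hr / (1.0 + hr)


# Values taken from Table 3 (outcome CHD).
p_current_smoker_a = hr_to_probability(1.78)  # Criterion A, current smoker
p_high_fbg_a = hr_to_probability(7.21)        # Criterion A, FBG > 6.9 mmol/L
p_extreme_pal_c = hr_to_probability(0.78)     # Criterion C, extreme PAL
```

Under this reading, an HR of 1.00 corresponds to P = 0.50, i.e. no difference between the exposed and the reference category.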

The comparison between Scenarios Y and Z showed that strict data exclusions regarding men with no CVD during the follow-up combined with no exclusions regarding men with CVD during the follow-up (Scenario Z) more often yielded statistically significant and plausible results than no data exclusions concerning men with no CVD combined with strict exclusions regarding men with CVD (Scenario Y) (Table 4).
Table 4.

Hazard ratios (HR), probabilities (P), and corresponding p-values of any cardiovascular disease (CVD), coronary heart disease (CHD), a myocardial infarction (MI) or unstable angina (UA), and a fatal acute myocardial infarction (AMI) with respect to seven factors used as categorical covariates in the Cox proportional-hazards model.

Covariate | CVD HR | CVD P | CVD p | CHD HR | CHD P | CHD p | MI or UA HR | MI or UA P | MI or UA p | AMI HR | AMI P | AMI p
Y. Former smoker | 1.06 | 0.51 | .56 | 1.01 | 0.50 | .52 | 1.37 | 0.58 | .09 | 0.75 | 0.43 | .48
Z. Former smoker | 1.18 | 0.54 | .02 | 1.25 | 0.56 | .02 | 1.45 | 0.59 | <.01 | 1.17 | 0.54 | .52
Y. Current smoker | 1.36 | 0.58 | .01 | 1.56 | 0.61 | <.01 | 1.96 | 0.66 | <.01 | 1.96 | 0.66 | .07
Z. Current smoker | 1.59 | 0.61 | <.01 | 1.79 | 0.64 | <.01 | 2.14 | 0.68 | <.01 | 3.16 | 0.76 | <.01
Y. AC (a) 13 − 252 | 1.08 | 0.52 | .42 | 0.96 | 0.49 | .75 | 1.03 | 0.51 | .88 | 0.69 | 0.41 | .26
Z. AC 13 − 252 | 1.00 | 0.50 | .98 | 0.89 | 0.47 | .14 | 0.90 | 0.47 | .31 | 0.73 | 0.42 | .10
Y. AC > 252 | 1.22 | 0.55 | .42 | 0.86 | 0.46 | .67 | 0.87 | 0.47 | .74 | 1.25 | 0.56 | .77
Z. AC > 252 | 1.19 | 0.54 | .24 | 1.13 | 0.53 | .52 | 1.16 | 0.54 | .53 | 1.42 | 0.59 | .37
Y. BMI (b) 25.0 − 29.9 | 1.12 | 0.53 | .23 | 1.13 | 0.53 | .36 | 1.21 | 0.55 | .26 | 0.83 | 0.45 | .62
Z. BMI 25.0 − 29.9 | 1.25 | 0.56 | <.01 | 1.42 | 0.59 | <.01 | 1.55 | 0.61 | <.01 | 1.13 | 0.53 | .59
Y. BMI ≥ 30.0 | 1.46 | 0.59 | .01 | 1.23 | 0.55 | .29 | 1.17 | 0.54 | .52 | 1.97 | 0.66 | .13
Z. BMI ≥ 30.0 | 1.66 | 0.62 | <.01 | 1.94 | 0.66 | <.01 | 2.16 | 0.68 | <.01 | 3.08 | 0.75 | <.01
Y. PAL (c) Vigorous | 1.01 | 0.50 | .95 | 0.94 | 0.48 | .70 | 0.92 | 0.48 | .68 | 0.82 | 0.45 | .68
Z. PAL Vigorous | 0.92 | 0.48 | .32 | 0.87 | 0.47 | .20 | 0.85 | 0.46 | .20 | 0.71 | 0.42 | .16
Y. PAL Extreme | 0.97 | 0.49 | .83 | 0.87 | 0.47 | .41 | 0.86 | 0.46 | .44 | 1.09 | 0.52 | .85
Z. PAL Extreme | 0.86 | 0.46 | .07 | 0.79 | 0.44 | .02 | 0.76 | 0.43 | .02 | 0.70 | 0.41 | .12
Y. SBP (d) 120 − 139 | 1.16 | 0.54 | .21 | 0.97 | 0.49 | .84 | 0.86 | 0.46 | .46 | 1.97 | 0.66 | .22
Z. SBP 120 − 139 | 1.08 | 0.52 | .36 | 0.98 | 0.49 | .86 | 0.88 | 0.47 | .33 | 1.30 | 0.57 | .35
Y. SBP ≥ 140 | 1.57 | 0.61 | <.01 | 1.19 | 0.54 | .33 | 1.05 | 0.51 | .82 | 2.02 | 0.67 | .22
Z. SBP ≥ 140 | 1.38 | 0.58 | <.01 | 1.14 | 0.53 | .26 | 1.09 | 0.52 | .54 | 1.54 | 0.61 | .14
Y. FBG (e) 5.6 − 6.9 | 1.12 | 0.53 | .61 | 1.34 | 0.57 | .31 | 1.46 | 0.59 | .25 | 0.57 | 0.36 | .59
Z. FBG 5.6 − 6.9 | 1.52 | 0.60 | <.01 | 1.93 | 0.66 | <.01 | 2.15 | 0.68 | <.01 | 2.55 | 0.72 | <.01
Y. FBG > 6.9 | 1.86 | 0.65 | .07 | 2.86 | 0.74 | <.01 | 2.21 | 0.69 | .04 | 1.89 | 0.65 | .39
Z. FBG > 6.9 | 2.96 | 0.75 | <.01 | 3.27 | 0.77 | <.01 | 2.99 | 0.75 | <.01 | 4.36 | 0.81 | <.01
Y. STC (f) 5.2 − 6.2 | 1.19 | 0.54 | .14 | 1.49 | 0.60 | .02 | 1.48 | 0.60 | .06 | 1.50 | 0.60 | .38
Z. STC 5.2 − 6.2 | 1.07 | 0.52 | .36 | 1.26 | 0.56 | .02 | 1.25 | 0.56 | .08 | 1.11 | 0.53 | .68
Y. STC > 6.2 | 1.20 | 0.55 | .12 | 1.48 | 0.60 | .02 | 1.66 | 0.62 | .01 | 1.93 | 0.66 | .14
Z. STC > 6.2 | 1.20 | 0.55 | .02 | 1.43 | 0.59 | <.01 | 1.50 | 0.60 | <.01 | 1.54 | 0.61 | .07

Note. Y refers to a dataset with no exclusions for study participants with no CVD during the follow-up (n = 411) and excluding CVD, diabetes, and high total cholesterol at baseline for study participants with CVD during the follow-up (n = 546). Z refers to a dataset excluding CVD, diabetes, and high total cholesterol at baseline for study participants with no CVD during the follow-up (n = 248) and no exclusions for study participants with CVD during the follow-up (n = 1182). Bold font indicates a statistically significant HR.

(a) Alcohol consumption (g/week).
(b) Body Mass Index (kg/m2).
(c) Physical activity level, the total energy expenditure divided by the basal energy expenditure, moderate <2.00, extreme >2.40.
(d) Systolic blood pressure (mmHg).
(e) Fasting blood glucose (mmol/L).
(f) Serum total cholesterol (mmol/L).


Covariate manipulations

There were only minor differences in Cox model results between analyses including covariates as continuous variables and those including covariates as categorical variables (Tables 2–4). Continuous and categorical covariates led to the same conclusions regarding the association of CHD with its risk factors. Being a former or current smoker, being overweight or obese, and having borderline high or high FBG or STC levels significantly increased the risk of CHD. The effect of high SBP levels on the risk of CHD was uncertain, as was the protective effect of physical activity. Our analyses found no statistically significant association between CHD and alcohol consumption.
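To illustrate what switching a covariate from continuous to categorical involves, here is a minimal sketch that bins the physical activity level (PAL) using the cut-offs given in the table footnote (moderate < 2.00, extreme > 2.40). The middle label "vigorous" follows Table A1; the handling of values exactly at the cut-offs and the function name `categorize_pal` are assumptions made for illustration.

```python
def categorize_pal(pal: float) -> str:
    """Bin a continuous PAL value into three ordered categories."""
    if pal < 2.00:
        return "moderate"
    if pal > 2.40:
        return "extreme"
    # Assumption: values from 2.00 to 2.40 inclusive fall in the middle band.
    return "vigorous"


pal_values = [1.85, 2.00, 2.25, 2.40, 2.65]
categories = [categorize_pal(value) for value in pal_values]
print(categories)
```

The continuous variant would pass the PAL value to the model unchanged, while the categorical variant would pass the label, typically encoded against a reference category.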

Discussion

Traditionally, epidemiological studies analyse only study participants who are free of the disease of interest at baseline. Our study suggests that excluding study participants who already have the disease at baseline is probably unnecessary. Specifically, our analyses led to the best results when we included all study participants who received the diagnosis during the follow-up, irrespective of their self-reported baseline status, but excluded all study participants who did not receive the diagnosis during the follow-up yet had self-reported the disease at baseline. Moreover, our study does not unconditionally support participant exclusions with respect to covariates either. Excluding participants who are already at risk at baseline may help to discover the strongest associations, such as the relationship between diabetes and CHD, but it may simultaneously mask weaker, although relevant, associations, such as the relationship between physical activity and CHD. In other words, a combination of “loose cases” and “strict controls” may yield the best results.

In the next paragraphs, we evaluate our results from the viewpoint of CHD risk factors. In our study, smoking, overweight, and high blood glucose levels were clearly associated with CHD. Outcome variable selection, participant exclusion, and covariate manipulation procedures had no effects on conclusions drawn from results related to these three cornerstone risk factors. Being a current smoker or being obese (BMI ≥ 30.0) resulted in a 1.5 times higher hazard compared to never smokers and normal-weight study participants, whereas diabetes (FBG > 6.9 mmol/L) approximately tripled the hazard of CHD. Large prospective cohort studies reported even stronger effects of smoking and obesity on CHD as early as the 1960s [31]. The three times higher hazard of CHD among diabetic men seems to be a rule of thumb [9].

Total cholesterol and blood pressure were the covariates that most clearly revealed differences related to outcome sensitivity. Total cholesterol is only one of many measures of lipid status, all of which show somewhat unique associations with CHD and other CVDs [10,13]. Total cholesterol, for example, is not associated as strongly with the risk of stroke [32] as with the risk of CHD [10]. Conversely, high blood pressure specifically increases the risk of stroke [33], which, together with reasons related to the sample size, may partly explain why SBP was statistically significantly associated with CVD but not with CHD and MIs in our study.

Irrespective of outcome variable selection, participant exclusions, and covariate manipulations, our study found no statistically significant effects of alcohol consumption on the hazard of CHD. Although alcohol, in general, increases mortality and morbidity [22], light-to-moderate drinking may protect against CHD [11], which may complicate the statistical detection of the association between alcohol consumption and CHD. Moreover, the association relates to the pattern of consumption, i.e. binge drinking, via the progression of atherosclerosis [34], which we did not consider in this study.
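The “loose cases” and “strict controls” selection discussed above (dataset Z in the table note) can be sketched as a simple filter. This is a minimal illustration, not the authors' preprocessing code: the `Participant` class, its field names, and the helper functions are all hypothetical, and the baseline conditions are represented as ready-made boolean flags.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are hypothetical; they are not KIHD variable names.
    pid: int
    cvd_during_follow_up: bool
    cvd_at_baseline: bool
    diabetes_at_baseline: bool
    high_cholesterol_at_baseline: bool


def has_baseline_condition(p: Participant) -> bool:
    """True if any of the three baseline exclusion conditions applies."""
    return (p.cvd_at_baseline
            or p.diabetes_at_baseline
            or p.high_cholesterol_at_baseline)


def loose_cases_strict_controls(cohort):
    """Keep every case; keep controls only if they had no baseline condition."""
    kept = []
    for p in cohort:
        if p.cvd_during_follow_up:
            kept.append(p)  # loose: no exclusions for cases
        elif not has_baseline_condition(p):
            kept.append(p)  # strict: controls must be condition-free
    return kept


cohort = [
    Participant(1, True, True, False, False),    # case with baseline CVD: kept
    Participant(2, True, False, False, False),   # case: kept
    Participant(3, False, False, True, False),   # control with diabetes: dropped
    Participant(4, False, False, False, False),  # condition-free control: kept
]
kept_ids = [p.pid for p in loose_cases_strict_controls(cohort)]
print(kept_ids)
```

Swapping the two branches, so that cases are filtered and controls are not, would give the opposite design (dataset Y in the table note).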

Limitations

Our results are based on one dataset and, therefore, they are not straightforwardly generalizable. Moreover, our study does not consider severity of diseases per se or diagnoses other than CVD, CHD, MI or UA, and AMI. Practically all KIHD study participants were at least moderately active at baseline, and nearly half of them were extremely active based on PAL values. This indicates the active lifestyles of the KIHD study participants; many of them were farmers or lumberjacks and highly interested in cross-country skiing, which to some extent distinguishes the KIHD cohort from otherwise similar cohorts. On the other hand, extreme physical activity levels, PAL > 2.4, are unrealistic in the long run because they lead to a negative energy balance, i.e. weight loss [35]. This contradiction is most probably due to the KIHD method of assessing physical activity. In general, self-assessment physical activity questionnaires show low validity and reliability [36]. Consequently, the present TEE and PAL values are adequate for creating data-specific activity ranks [37] but not for comparing the KIHD cohort to other cohorts as such.

Conclusions

Our Cox model example of the epidemiological relationship between CHD and its common risk factors clearly demonstrated that outcome variable selection and participant exclusions must be considered when interpreting results of epidemiological analyses. Preprocessing procedures that were loose regarding study participants with any CVD during the follow-up and strict concerning study participants with no CVD during the follow-up were best in discovering the association between risk factors and CHD. Outcome sensitivity affected associations across covariates and outcomes. For example, total cholesterol was associated specifically with CHD and MI or UA but only weakly with CVD or AMI. The covariate type, continuous or categorical, had only minor effects on Cox model results. We strongly suggest that research reports present results based on no data exclusions together with results based on reasoned exclusions.

Table A1.

Results of sample size calculations regarding the main outcome of this study, coronary heart disease (CHD), with respect to Criterion A that excluded study participants according to conditions at baseline.

| Covariate category | Observed n | Observed CHD, n (%) | Observed no CHD, n (%) | Required n | Required n |
|---|---|---|---|---|---|
| Smoking: Never-smoker (Reference) | 296 | 100 (34) | 196 (66) | 4213 | 1140 |
| Smoking: Former smoker | 270 | 101 (37) | 169 (63) | 3792 | |
| Smoking: Current smoker | 228 | 92 (40) | 136 (60) | | 912 |
| Alcohol consumption: Light | 308 | 112 (36) | 196 (64) | | 2854 |
| Alcohol consumption: Moderate | 454 | 172 (38) | 282 (62) | 1843 | |
| Alcohol consumption: Heavy (Reference) | 32 | 9 (28) | 23 (72) | 184 | 285 |
| Normal weight (Reference) | 278 | 94 (34) | 184 (66) | 1886 | 794 |
| Overweight | 415 | 156 (38) | 259 (62) | 2829 | |
| Obese | 101 | 43 (43) | 58 (57) | | 318 |
| Physical activity: Moderate | 144 | 55 (38) | 89 (62) | | 6390 |
| Physical activity: Vigorous | 251 | 93 (37) | 158 (63) | 29089 | |
| Physical activity: Extreme (Reference) | 391 | 141 (36) | 250 (64) | 48481 | 15975 |
| Systolic blood pressure: Desirable | 141 | 52 (37) | 89 (63) | 5860 | |
| Systolic blood pressure: Prehypertension (Reference) | 404 | 140 (35) | 264 (65) | 19535 | 1364 |
| Systolic blood pressure: Hypertension | 249 | 101 (41) | 148 (59) | | 818 |
| Fasting blood glucose: Desirable (Reference) | 753 | 269 (36) | 484 (64) | 5945 | 239 |
| Fasting blood glucose: Prediabetes | 31 | 14 (45) | 17 (55) | 238 | |
| Fasting blood glucose: Diabetes | 10 | 10 (100) | 0 (0) | | 2 |
| Serum total cholesterol: Desirable (Reference) | 190 | 55 (29) | 135 (71) | 432 | 177 |
| Serum total cholesterol: Pre-hypercholesterolaemia | 315 | 118 (37) | 197 (63) | 734 | |
| Serum total cholesterol: Hypercholesterolaemia | 289 | 120 (42) | 169 (58) | | 266 |

Note. Required n refers to comparisons across the category in which the proportion of CHD diagnoses during the follow-up was lowest (reference) and other categories applying a predetermined p-value of .05 (α) and a predetermined power of 0.80 (1 − β).
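For readers who want to see how a required n of this kind can arise, the sketch below uses a standard normal-approximation formula for comparing two proportions with unequal group sizes, at α = .05 (two-sided) and power 0.80. It is an illustration under stated assumptions, not necessarily the authors' calculation: the unpooled-variance formula, the allocation ratio of 0.8, and the function name `required_n` are all assumptions.

```python
import math
from statistics import NormalDist


def required_n(p_ref: float, p_other: float, ratio: float = 1.0,
               alpha: float = 0.05, power: float = 0.80):
    """Return (n_reference, n_other) for a two-sided test of two proportions.

    ratio is n_other / n_reference. Uses the unpooled normal approximation.
    """
    if p_ref == p_other:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    variance = p_ref * (1 - p_ref) + p_other * (1 - p_other) / ratio
    n_ref = (z_alpha + z_beta) ** 2 * variance / (p_ref - p_other) ** 2
    return math.ceil(n_ref), math.ceil(ratio * n_ref)


# Never-smokers (34% CHD, reference) versus current smokers (40% CHD).
# Assumption: 0.8 current smokers per never-smoker, roughly the observed 228/296.
n_never, n_current = required_n(0.34, 0.40, ratio=0.8)
print(n_never, n_current)
```

With these inputs the result is of the same order as the tabulated 1140 and 912 for this comparison; an exact match should not be expected because the formula and allocation ratio used for Table A1 are not stated here. The formula also shows why small differences in CHD proportions, such as 36% versus 37% for physical activity, lead to very large required sample sizes: the squared difference sits in the denominator.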

  30 in total

Review 1.  Limits to the measurement of habitual physical activity by questionnaires.

Authors:  R J Shephard
Journal:  Br J Sports Med       Date:  2003-06       Impact factor: 13.800

Review 2.  Assessment of physical activity: a critical appraisal.

Authors:  Klaas R Westerterp
Journal:  Eur J Appl Physiol       Date:  2009-02-11       Impact factor: 3.078

Review 3.  Association of overweight with increased risk of coronary heart disease partly independent of blood pressure and cholesterol levels: a meta-analysis of 21 cohort studies including more than 300 000 persons.

Authors:  Rik P Bogers; Wanda J E Bemelmans; Rudolf T Hoogenveen; Hendriek C Boshuizen; Mark Woodward; Paul Knekt; Rob M van Dam; Frank B Hu; Tommy L S Visscher; Alessandro Menotti; Roland J Thorpe; Konrad Jamrozik; Susanna Calling; Bjørn Heine Strand; Martin J Shipley
Journal:  Arch Intern Med       Date:  2007-09-10

4.  Coronary heart disease, stroke, and aortic aneurysm. Factors in the etiology.

Authors:  E C Hammond; L Garfinkel
Journal:  Arch Environ Health       Date:  1969-08

Review 5.  Risk factors for coronary heart disease: implications of gender.

Authors:  Jeanine E Roeters van Lennep; H Tineke Westerveld; D Willem Erkelens; Ernst E van der Wall
Journal:  Cardiovasc Res       Date:  2002-02-15       Impact factor: 10.787

Review 6.  Iron status and its association with coronary heart disease: systematic review and meta-analysis of prospective studies.

Authors:  Sudeep Das De; Sreedhar Krishna; Ankeet Jethwa
Journal:  Atherosclerosis       Date:  2014-12-20       Impact factor: 5.162

7.  Total and high-density lipoprotein cholesterol and stroke risk.

Authors:  Yurong Zhang; Jaakko Tuomilehto; Pekka Jousilahti; Yujie Wang; Riitta Antikainen; Gang Hu
Journal:  Stroke       Date:  2012-04-10       Impact factor: 7.914

8.  Age and duration of follow-up as modulators of the risk for ischemic heart disease associated with high plasma C-reactive protein levels in men.

Authors:  M Pirro; J Bergeron; G R Dagenais; P M Bernard; B Cantin; J P Després; B Lamarche
Journal:  Arch Intern Med       Date:  2001-11-12

9.  Risk Factors for Coronary Artery Disease: Historical Perspectives.

Authors:  Rachel Hajar
Journal:  Heart Views       Date:  2017 Jul-Sep

10.  Understanding inconsistency in the results from observational pharmacoepidemiological studies: the case of antidepressant use and risk of hip/femur fractures.

Authors:  Patrick C Souverein; Victoria Abbing-Karahagopian; Elisa Martin; Consuelo Huerta; Francisco de Abajo; Hubert G M Leufkens; Gianmario Candore; Yolanda Alvarez; Jim Slattery; Montserrat Miret; Gema Requena; Miguel J Gil; Rolf H H Groenwold; Robert Reynolds; Raymond G Schlienger; John W Logie; Mark C H de Groot; Olaf H Klungel; Tjeerd P van Staa; Toine C G Egberts; Marie L De Bruin; Helga Gardarsdottir
Journal:  Pharmacoepidemiol Drug Saf       Date:  2016-03       Impact factor: 2.890

  2 in total

1.  Epidemiological analysis of coronary heart disease and its main risk factors: are their associations multiplicative, additive, or interactive?

Authors:  Ari Voutilainen; Christina Brester; Mikko Kolehmainen; Tomi-Pekka Tuomainen
Journal:  Ann Med       Date:  2022-12       Impact factor: 5.348

2.  Lipid profile, lipid ratios, apolipoproteins, and risk of cardiometabolic multimorbidity in men: The Kuopio Ischaemic Heart Disease Risk Factor Study.

Authors:  Behnam Tajik; Ari Voutilainen; Jussi Kauhanen; Moshen Mazidi; Gregory Y H Lip; Tomi-Pekka Tuomainen; Masoud Isanejad
Journal:  Lipids       Date:  2022-01-20       Impact factor: 1.646
