
Birth Month and Cardiovascular Disease Risk Association: Is meaningfulness in the eye of the beholder?

Eduard Poltavskiy¹, J David Spence², Jeehyoung Kim³, Heejung Bang⁴.

Abstract

In the modern era, with high-throughput technology and large data size, associational studies are actively being generated. Some have statistical and clinical validity and utility, or at least have biologically plausible relationships, while others may not. Recently, the potential effect of birth month on lifetime disease risks has been studied in a phenome-wide model. We evaluated the associations between birth month and 5 cardiovascular disease-related outcomes in an independent registry of 8,346 patients from Ontario, Canada in 1977-2014. We used descriptive statistics and logistic regression, along with model-fit and discrimination statistics. Hypertension and coronary heart disease (of primary interest) were most prevalent in those who were born in January and April, respectively, as observed in the previous study. Other outcomes showed weak or opposite associations. Ancillary analyses (based on raw blood pressures and subgroup analyses by sex) demonstrated inconsistent patterns and high randomness. Our study was based on a high risk population and could not provide scientific explanations. As scientific values and clinical implications can be different, readers are encouraged to read the original and our papers together for more objective interpretations of the potential impact of birth month on individual and public health as well as toward cumulative/total evidence in general.


Keywords: birth month; cardiovascular disease; electronic medical record

Year: 2016 PMID: 27752296 PMCID: PMC5065521 DOI: 10.5210/ojphi.v8i2.6643

Source DB: PubMed Journal: Online J Public Health Inform ISSN: 1947-2579

INTRODUCTION

Possibly beginning with Hippocrates, the environment, including air, water and place, has been suggested to influence human health [1]. Numerous researchers have hypothesized that prenatal, perinatal or early-life conditions can affect the occurrence of various diseases over a lifetime. Non-negligible evidence has been found for some respiratory illnesses (e.g., asthma) and mental illnesses (e.g., attention deficit hyperactivity disorder, schizophrenia). For example, in the 1970s there was evidence that being born in the winter increased the risk of schizophrenia by 10% [2-6]. On the other hand, some findings have been claimed and then refuted by re-analysis of the same data, and are now used as examples in statistical education [7-9]. Nowadays, the volume of related literature is growing, and the emergence of large, convenient and easily gathered datasets facilitates the analysis of many events and potential determinants. So far, the track record has been mixed [10,11].

A recent study assessed whether birth month affects lifetime disease risk for 1,688 conditions in a phenome-wide model based on an electronic medical record (EMR) database, including about 1.75 million individuals, from an institution in New York City from 1900-2000 [12]. The interesting findings generated from the statistical analyses of a huge database received great attention from the scientific community and the media. The authors reported that cardiovascular disease (CVD) was significantly dependent on birth month, asserting that this association was newly discovered in their study. An earlier review of 246 suggested coronary risk factors, including constitutional, demographic and environmental factors, did not include birth month [13]. In contrast, a general tendency has been reported in Austria for people born in the first half of the year to die at a younger age, and more often from heart disease and cerebrovascular disease, than those born in the second half of the year [14].

The authors also performed an extensive literature review (19 of the 55 identified diseases are supported by the literature) and used rigorous methodologies, including widely accepted statistical adjustment for multiplicity and quality control; multiplicity and data quality have been common issues in similar studies based on convenient, tertiary datasets that were not collected for research or policy-making purposes. In this paper, we attempt to evaluate the validity and generalizability of their findings in an independent, external patient registry. We focused on 5 CVD-related outcomes: hypertension, coronary heart disease (CHD), stroke, diabetes, and chronic kidney disease (CKD) [15,16]. Other outcomes (such as respiratory and reproductive diseases highlighted in the original paper) were not available to us.

MATERIALS and METHODS

Study population and sample

The study was conducted using the EMR of the Stroke Prevention & Atherosclerosis Research Centre, Robarts Research Institute, London, Ontario, with patient visits occurring in 1977-2014. Before 1995, patients were referred to the Hypertension Clinic at Victoria Hospital, London, Canada. Since 1995, they were referred to one of several clinics at University Hospital: a Stroke Prevention Clinic, an Urgent TIA Clinic, and a Premature Atherosclerosis Clinic. Western University Health Science Research Ethics Board approved this study.

Exposures and outcomes

We analyzed age and birth month (both as integers), without exact dates or further details. Hypertension was defined as antihypertensive medication use, systolic blood pressure (BP)>140 mmHg, or diastolic BP>90 mmHg, where the higher value was selected from measurements in the left and right arms. We defined CHD as present if myocardial infarction or vascular surgery was recorded. A cerebrovascular disorder was defined as present if stroke or transient ischemic attack (TIA) was recorded; these were combined as Stroke. Diabetes was restricted to type-2 diabetes. CKD was defined as a glomerular filtration rate (GFR)<60 mL/min/1.73 m², estimated from the CKD-EPI formula [17].
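The hypertension and CKD definitions above can be written as simple classification rules. The following is a minimal Python sketch, not the authors' code (their analysis used SAS): the function and parameter names are hypothetical, and returning `None` when a required measurement is missing is an assumption of this sketch rather than something the paper specifies.

```python
def higher_of_arms(left, right):
    """Return the higher of the left/right arm readings, ignoring missing ones."""
    readings = [v for v in (left, right) if v is not None]
    return max(readings) if readings else None


def has_hypertension(on_antihypertensive, sbp_left=None, sbp_right=None,
                     dbp_left=None, dbp_right=None):
    """Medication use, or systolic BP > 140 mmHg, or diastolic BP > 90 mmHg.

    The higher of the two arm readings is used for each pressure.
    Returns None when the status cannot be determined (sketch assumption).
    """
    if on_antihypertensive:
        return True
    sbp = higher_of_arms(sbp_left, sbp_right)
    dbp = higher_of_arms(dbp_left, dbp_right)
    if (sbp is not None and sbp > 140) or (dbp is not None and dbp > 90):
        return True
    if sbp is None or dbp is None:
        # Assumption: without both pressures we cannot rule hypertension out.
        return None
    return False


def has_ckd(egfr):
    """CKD if estimated GFR < 60 mL/min/1.73 m^2; None if eGFR is missing."""
    if egfr is None:
        return None
    return egfr < 60


# The right-arm systolic reading (142) exceeds the threshold, so this is True.
print(has_hypertension(False, sbp_left=138, sbp_right=142,
                       dbp_left=80, dbp_right=85))
```

Both thresholds are strict inequalities, so a reading of exactly 140/90 mmHg without medication is not classified as hypertension under this rule.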

Data analysis

We used summary statistics to describe patient characteristics: mean, standard deviation and interquartile range for continuous variables, and frequency and percentage for categorical variables. We computed the frequency and (row and column) percentage for each health outcome by birth month. We indicated the highest and lowest percentages, and tested the equality of the proportions across months by the Chi-square test. Since we utilized EMR data, missing data were common. In each analysis, we included all available data for the variables involved, without imputation, and we indicated the sample size.

We fitted simple logistic regression for each health outcome with each predictor separately. We considered 3 demographics as predictors or independent variables: month, sex, and age, where age was analyzed both as a continuous and as a dichotomized variable (>50 vs. ≤50 years); we did not treat age and sex as confounders in regression. To compare the different models, we employed standard measures for evaluating models and prediction [18-20]: area under the receiver-operating-characteristic curve (AUC) and Akaike and Bayesian information criteria (AIC/BIC). AUC is a discrimination statistic; 0.5 means random and 1 means perfect discrimination between cases and non-cases. AIC is a measure of the relative quality of a statistical model for given data, and BIC might be considered a Bayesian extension. A lower value of AIC/BIC indicates improved model fit. Some interpret AIC as addressing explanation and BIC as addressing prediction [21]. Of note, AIC/BIC do not have a simple range, unlike p-value, correlation or AUC; they should be compared within the same outcome, not across outcomes, because sample sizes differ.

As ancillary analyses, we computed the distribution of 4 raw BP measurements (left vs. right, systolic vs. diastolic) over months to examine time-trends, and to check whether these measurements support the ‘January peak’ and ‘October trough’ for hypertension that were reported in the original study. Also, we fitted the event rate by penalized B-splines by sex in order to see whether patterns are similar for men vs. women. SAS 9.3 was used for analysis (SAS Institute, Cary, NC). P-values and confidence intervals (CIs) are 2-sided and unadjusted for multiplicity.
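To make the three model-comparison statistics concrete, the following is a minimal sketch in plain Python (the paper's analysis was run in SAS, so this is an illustration, not the authors' code). It relies on a standard property: when a logistic regression has a single categorical predictor entered as dummy variables, the maximum-likelihood fitted probability in each category equals the observed event proportion in that category. The function name and the returned keys are hypothetical, and the AIC/BIC formulas are the usual textbook definitions, assumed here.

```python
import math
from collections import defaultdict


def categorical_logit_summary(x, y):
    """Log-likelihood, AIC, BIC and AUC for a logistic model of a binary
    outcome y (0/1) on one categorical predictor x (e.g., birth month)."""
    n_by, events_by = defaultdict(int), defaultdict(int)
    for xi, yi in zip(x, y):
        n_by[xi] += 1
        events_by[xi] += yi

    n_total = len(y)
    total_pos = sum(events_by.values())
    total_neg = n_total - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")

    # Fitted probability per category = observed event proportion.
    p_hat = {c: events_by[c] / n_by[c] for c in n_by}

    loglik = 0.0
    for c, n in n_by.items():
        e, p = events_by[c], p_hat[c]
        if 0.0 < p < 1.0:
            loglik += e * math.log(p) + (n - e) * math.log(1.0 - p)
        # p of exactly 0 or 1 contributes 0 in the limit (a separated category).

    k = len(n_by)  # intercept plus (number of categories - 1) dummies
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n_total)

    # AUC: probability that a random case has a higher fitted probability
    # than a random non-case, counting ties as one half.
    by_score = defaultdict(lambda: [0, 0])
    for c in n_by:
        by_score[p_hat[c]][0] += events_by[c]
        by_score[p_hat[c]][1] += n_by[c] - events_by[c]
    concordant, neg_below = 0.0, 0
    for score in sorted(by_score):
        pos, neg = by_score[score]
        concordant += pos * (neg_below + 0.5 * neg)
        neg_below += neg
    auc = concordant / (total_pos * total_neg)

    return {"loglik": loglik, "k": k, "aic": aic, "bic": bic, "auc": auc}


# Toy data: group "A" has 3 events in 4 patients, group "B" has 1 in 4.
toy = categorical_logit_summary(["A"] * 4 + ["B"] * 4, [1, 1, 1, 0, 1, 0, 0, 0])
print(toy)
```

With a 12-level predictor such as birth month, `k` is 12, so the model pays a larger AIC/BIC penalty than a binary predictor such as sex (`k` = 2), even when its AUC is slightly higher.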

RESULTS

Table 1 describes the characteristics of the 8,346 patients included in our study. Patients tended to be older (with mean=63 and range=9-99 years at the first visit to the clinic) and 52% were male. Hypertension was highly prevalent (66%), compared to other outcomes (<25%). Birth months were quite evenly distributed with the null value of 8.3% (=100/12) overall (7.5-8.9%, p=0.06). Figure 1 presents the event rate of individual health outcomes for each birth month. January (69%), April (22%), July (25%), November (20%), and March (27%) showed the highest proportions for hypertension, CHD, stroke, diabetes, and CKD, respectively, and the lowest proportions were in October (63%), September (13%), September (20%), and March (18%). When we computed the percentage of different birth months among those who had the outcome (i.e., using column percent in place of row percent), the same highest months were observed.

Table 1

Patient characteristics

Variable	N of complete data	Mean (Standard deviation) [Interquartile range] or Percentage
Age, years	8346	62.6 (14.7) [52.0-74.0]
Male	8346	51.5%
Height, cm	6876	168.6 (10.2) [160.0-176.0]
Weight, kg	7197	78.8 (17.7) [66.0-89.1]
Serum creatinine, µmol/L	4002	87.5 (39.0) [69.0-96.0]
Current smoker	8217	18.3%
Hypertension	6663	66.2%
Diabetes (type 2)	7971	16.4%
Myocardial infarction	6541	11.2%
Vascular surgery	6566	9.6%
Stroke	6855	14.4%
Transient ischemic attack	6751	12.3%
Chronic kidney disease*	4002	22.5%
Birth month -	8346
1		8.4%
2		8.1%
3		8.8%
4		8.5%
5		8.6%
6		8.9%
7		8.9%
8		8.2%
9		8.2%
10		8.0%
11		8.2%
12		7.5%

*The CKD-EPI formula was used to estimate glomerular filtration rate; the threshold used to define chronic kidney disease is an eGFR<60 mL/min/1.73 m².

Figure 1

Nightingale plots of the distribution of birth month for health outcomes

When we modeled different demographic factors as independent variables and different health outcomes as dependent variables by regression, birth month was associated with CHD, diabetes and CKD (mostly for post-hoc selection of the highest month) with p=0.02-0.05. In contrast, sex was highly significant for these 3 outcomes (p≤0.003). Month (without post-hoc dichotomization) yielded slightly higher AUC than sex for hypertension, stroke and CKD, which may imply enhanced discrimination, but AIC/BIC tended to indicate the reverse for model fit/quality; see Table 2.

Table 2

Discrimination and model-fit statistics from simple logistic regression

Health outcome	Predictor	P-value	AUC	AIC	BIC
Hypertension (total N=6663)	Birth months	0.58	0.522	8539	8621
	Highest month (Jan vs. the rest)*	0.12	0.506	8526	8540
	Sex	0.60	0.503	8528	8542
	Age (continuous)	<0.0001	0.643	8124	8138
	Age >50	<0.0001	0.585	8285	8299
Coronary heart disease (total N=6472)	Birth months	0.05	0.539	6020	6102
	Highest month (April vs. the rest)*	0.03	0.510	6016	6029
	Sex	<0.0001	0.598	5873	5886
	Age (continuous)	<0.0001	0.621	5846	5859
	Age >50	<0.0001	0.572	5889	5902
Stroke (total N=6845)	Birth months	0.66	0.523	7452	7534
	Highest month (July vs. the rest)*	0.25	0.505	7440	7453
	Sex	0.94	0.509⁺	7441	7455
	Age (continuous)	<0.0001	0.598	7300	7313
	Age >50	<0.0001	0.556	7344	7358
Diabetes (total N=7971)	Birth months	0.59	0.525	7118	7202
	Highest month (Nov vs. the rest)*	0.02	0.510	7102	7116
	Sex	<0.0001	0.530	7092	7106
	Age (continuous)	<0.0001	0.597	6974	6988
	Age >50	<0.0001	0.566	6977	6991
Chronic kidney disease (total N=4002)	Birth months	0.57	0.530	4286	4361
	Highest month (March vs. the rest)*	0.04	0.511	4271	4284
	Sex	0.003	0.528	4267	4279
	Age (continuous)	<0.0001	0.796	3457	3470
	Age >50	<0.0001	0.602	3991	4003

Each predictor is separately modeled as a univariate covariate in simple logistic regression.

Birth month (1-12) is included as a categorical covariate (via 11 dummies); sex is binary; and age (in years) is included as a continuous or binary covariate (>50 vs. ≤ 50 years old).

*Highest month (vs. rest as binary variable) is selected post-hoc, so results may suffer optimism bias.

P-value is computed from Wald Chi-square test; degrees of freedom=11 for birth month and 1 for all others.

AUC, area under the ROC curve, is a discrimination statistic; 0.5 means random discrimination and 1 means perfect discrimination.

AIC, Akaike information criterion, is a measure of the relative quality of a statistical model for a given set of data: a lower value means a better model fit.

BIC, Bayesian information criterion, is a Bayesian extension of AIC: a lower value means a better model fit.

AIC and BIC should be compared within the same outcome due to different Ns and amount of information.

⁺Owing to an estimation issue, we fitted the model with Y=stroke or TIA and averaged the resulting AUCs of 0.511 and 0.507.
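As a consistency check on the AIC and BIC columns of Table 2, assume the usual definitions AIC = -2 log L + 2k and BIC = -2 log L + k ln N, where k is the number of estimated coefficients including the intercept (the paper does not spell these formulas out, so this is an assumption). Under those definitions the gap between the two columns depends only on k and N:

```latex
\[
\mathrm{BIC} - \mathrm{AIC} = k \ln N - 2k = k\,(\ln N - 2)
\]
\[
k = 12,\ N = 6663:\quad 12\,(\ln 6663 - 2) \approx 81.7
\qquad (\text{Table 2, hypertension, birth months: } 8621 - 8539 = 82)
\]
\[
k = 2,\ N = 6663:\quad 2\,(\ln 6663 - 2) \approx 13.6
\qquad (\text{Table 2, hypertension, January vs. rest: } 8540 - 8526 = 14)
\]
```

The tabulated gaps agree with these values up to rounding. The 12-coefficient month model carries a much larger penalty than the 2-coefficient models, which may partly explain why it ranks worse on AIC/BIC despite a slightly higher AUC.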

In the ancillary analysis with raw variables for hypertension, the key findings (January as highest and October as lowest) from the original study were confirmed. On the other hand, we observed that right arm BPs were highest among people who were born in January, whereas left arm BPs were highest in July, which are opposite seasons. Event rate plots by sex revealed less systematic, substantially different trends among men vs. women; see Figure 2.

Figure 2

Ancillary analyses

DISCUSSION

In the Big Data era, with advanced statistics and informatics tools and highly educated minds, many things that were once impossible are becoming possible. Many small and previously unidentified effects or associations and rare cases are being discovered and reported daily. At the same time, high standards in data quality and statistical analysis are being emphasized, similarly to Deming’s 6-sigma, which has been a gold standard in industry and quality control for decades [22,23]. Yet two issues are never answered by large sample size, statistical analysis and computing software: 1) clinical or practical meaningfulness (e.g., is the effect size large enough to be clinically meaningful or to lead to any action?) and 2) biological plausibility (why does this happen? Is an association between insect bite and birth month with adjusted p=0.001 scientifically explainable?) [12].

Our findings support some of the authors’ claims (e.g., hypertension-January and CVD-April with the highest, and September-October with the lowest), which may be regarded as external validation, particularly because London, Ontario and New York City are not very different in climate. But we also found conflicting evidence in related diseases (e.g., January, April, July, November with the highest), so coherence, consistency and plausibility from a causal viewpoint might be weakened [24]. Our analysis demonstrates a high degree of randomness, which may be natural. For example, the phenomena of ‘right arm BP highest in January and left arm BP highest in July’ and of the differential effects of birth month for males vs. females are not biologically plausible. Can sub-diseases/conditions within the same disease category be qualitatively different and be associated with different months? Small but real differences or being ‘fooled by randomness’ cannot be excluded [25].

The observed AUCs, a key measure in prediction, are tantalizing, accepting that the role of age is fully known. For hypertension and stroke, month may offer better discrimination than sex, but model fit seems to show that sex could be better. Birth month added essentially no discrimination for any outcome once age and sex were included in the model; the AUC increase was 0.001 for hypertension and 0 for the others (results not shown).

Large data sets produce impressive p-values from minor differences in biology. Are they clinically relevant [13,22,26,27]? Despite substantially different study populations and sample sizes, the dramatically different p-values for the two validated outcomes (CHD and hypertension) are noteworthy: p-values<0.001 adjusted for 1,688 comparisons (or p-value ~10⁻²² unadjusted, using our best guess from the Manhattan plot) in the original study vs. unadjusted p-values=0.03-0.58 in our study [28,29]. When a number of p-values, probably the most popular statistical measure in research, are computed, a simple ‘p-value plot’ together with AUC could be helpful for assessing overall randomness in associations [8,30,31]. Related to the recent ‘bad luck-cancer controversy’, the validity and proper interpretation of another popular statistic, R², for aggregated data have been discussed [32].

The limitations of our study and caveats for readers should be noted. First, we utilized retrospective data from a high-risk sample in a single geographical region, which could make already small associations even smaller. A very large population-based cohort or census would be ideal. Second, we are unable to explain some findings or to identify causes or underlying mechanisms; yet these issues are shared by the original study, ours and many other non-experimental studies. Third, the common goal of the long history of birth month research could be different from ours; its goal is generally one of basic or pure science, namely to find diseases that may be related to developmental effects of environmental exposures, which presumably would later be investigated to elucidate the mechanism of that association. In contrast, our goal is closer to one of clinical practice, which may be better addressed by a statistical or prediction measure such as AUC, in addition to or in place of the p-value. For example, should physicians or patients be more suspicious of, and investigate more closely for, certain conditions based on birth month (e.g., screening)? Should parents planning a pregnancy aim to have their child born in a certain month? Indeed, significant seasonality but different seasons/months for the best outcomes with high randomness have been demonstrated in infertility, autism and mortality-related research as well [2,14,33-37].

The main strengths are: a relatively large sample size covering a long period from multiple hospitals; multiple CVD-related outcomes; use of clinical data (e.g., multiple raw BPs in place of coded data where underreporting can be severe) and EMR with continuous quality checks [38-42]; and statistical measures that address different aspects of model and association, beyond the p-value. Since our cohort mostly consists of older adults, the ‘lifetime risk’ of CVD that the original study intended to address might be well captured, although representativeness is weaker.

Scientists’ and readers’ efforts to confirm important findings and their willingness to wait for more evidence should be valued more, in addition to the discovery, innovation and productivity that are currently emphasized [22]. It is well documented that the Framingham risk score, a landmark in CVD research, does not perform well in Asian populations or HIV patients [43]. We do not think this is a major weakness of the method/finding, as no method is perfect and virtually no finding is universal. Also, for every finding, we need to determine whether it is real or not (e.g., random), and if real, the next step might be to assess biological mechanisms as well as practical value and clinical implications.

CONCLUSIONS

We could validate the associations between birth month and the two primary CVD-related outcomes, but also found that randomness was high. Until a definitive answer is available, which may come from very large, representative samples with accurate outcome data covering different climates, physicians and patients need not be much concerned about birth month; modifiable factors are a more appropriate focus. When faced with reports of novel discoveries, healthy skepticism and waiting for validation and explanation in similar and different settings are crucial for citizens in the Information Age. Finally, we still believe that EMR offers invaluable resources and has opened a new chapter in research and data science. The following quotation, often attributed to Galileo Galilei, is apt: Measure what is measurable; make measurable what is not so [44]. Perhaps clinical and lab data are more suited to the first task, while administrative or self-report data try to do the latter (as the next best option).

31 in total

Review 1. Diabetes and cardiovascular disease: a statement for healthcare professionals from the American Heart Association.

Authors: S M Grundy; I J Benjamin; G L Burke; A Chait; R H Eckel; B V Howard; W Mitch; S C Smith; J R Sowers
Journal: Circulation Date: 1999-09-07 Impact factor: 29.690

2. No increased mortality in later life for cohorts born during famine.

Authors: V Kannisto; K Christensen; J W Vaupel
Journal: Am J Epidemiol Date: 1997-06-01 Impact factor: 4.897

Review 3. A survey of 246 suggested coronary risk factors.

Authors: P N Hopkins; R R Williams
Journal: Atherosclerosis Date: 1981 Aug-Sep Impact factor: 5.162

4. Lifespan depends on month of birth.

Authors: G Doblhammer; J W Vaupel
Journal: Proc Natl Acad Sci U S A Date: 2001-02-20 Impact factor: 11.205

5. Seasonal variability in fertilization and embryo quality rates in women undergoing IVF.

Authors: N Rojansky; A Benshushan; S Meirsdorf; A Lewin; N Laufer; A Safran
Journal: Fertil Steril Date: 2000-09 Impact factor: 7.329

Review 6. Seasonality of births in schizophrenia and bipolar disorder: a review of the literature.

Authors: E F Torrey; J Miller; R Rawlings; R H Yolken
Journal: Schizophr Res Date: 1997-11-07 Impact factor: 4.939

7. Is bad luck the main cause of cancer?

Authors: C R Weinberg; D Zaykin
Journal: J Natl Cancer Inst Date: 2015-05-08 Impact factor: 13.506

8. Obesity identified by discharge ICD-9 codes underestimates the true prevalence of obesity in hospitalized children.

Authors: Jessica G Woo; Meg H Zeller; Kimberly Wilson; Thomas Inge
Journal: J Pediatr Date: 2008-10-31 Impact factor: 4.406

9. Screening for kidney disease in vascular patients: SCreening for Occult REnal Disease (SCORED) experience.

Authors: Heejung Bang; Madhu Mazumdar; George Newman; Andrew S Bomback; Christie M Ballantyne; Allan S Jaffe; Phyllis A August; Abhijit V Kshirsagar
Journal: Nephrol Dial Transplant Date: 2009-03-26 Impact factor: 5.992

10. Prevalence, treatment, and control of dyslipidemia and hypertension in 4278 HIV outpatients.

Authors: Merle Myerson; Eduard Poltavskiy; Ehrin J Armstrong; Shari Kim; Victoria Sharp; Heejung Bang
Journal: J Acquir Immune Defic Syndr Date: 2014-08-01 Impact factor: 3.731
