
Birth Month and Cardiovascular Disease Risk Association: Is meaningfulness in the eye of the beholder?

Eduard Poltavskiy1, J David Spence2, Jeehyoung Kim3, Heejung Bang4.   

Abstract

In the modern era of high-throughput technology and large data sets, association studies are being generated at a rapid pace. Some have statistical and clinical validity and utility, or at least biologically plausible relationships, while others may not. Recently, the potential effect of birth month on lifetime disease risks was studied in a phenome-wide model. We evaluated the associations between birth month and 5 cardiovascular disease-related outcomes in an independent registry of 8,346 patients seen in Ontario, Canada in 1977-2014. We used descriptive statistics and logistic regression, along with model-fit and discrimination statistics. Hypertension and coronary heart disease (of primary interest) were most prevalent in those born in January and April, respectively, as observed in the previous study. Other outcomes showed weak or opposite associations. Ancillary analyses (based on raw blood pressures and subgroup analyses by sex) showed inconsistent patterns and a high degree of randomness. Our study was based on a high-risk population and could not provide scientific explanations. Because scientific value and clinical implications can differ, readers are encouraged to read the original paper and ours together for a more objective interpretation of the potential impact of birth month on individual and public health, and, more generally, to consider the cumulative evidence.

Keywords:  birth month; cardiovascular disease; electronic medical record

Year:  2016        PMID: 27752296      PMCID: PMC5065521          DOI: 10.5210/ojphi.v8i2.6643

Source DB:  PubMed          Journal:  Online J Public Health Inform        ISSN: 1947-2579


INTRODUCTION

Possibly beginning with Hippocrates, the environment, including air, water and place, has been suggested to influence human health [1]. Numerous researchers have hypothesized that prenatal, perinatal or early-life conditions can affect the occurrence of various diseases over a lifetime. Non-ignorable evidence has been found for some respiratory illnesses (e.g., asthma) and mental illnesses (e.g., attention deficit hyperactivity disorder, schizophrenia). For example, in the 1970s there was evidence that being born in winter increased the risk of schizophrenia by 10% [2-6]. On the other hand, some findings have been claimed and then refuted by re-analysis of the same data, and are now used as examples in statistical education [7-9]. The volume of related literature is growing, and the emergence of large, convenient, easily gathered datasets facilitates the analysis of many events and potential determinants. So far, the track record has been mixed [10,11]. A recent study assessed whether birth month affects the lifetime risk of 1,688 conditions in a phenome-wide model based on an electronic medical record (EMR) database of about 1.75 million individuals from an institution in New York City from 1900-2000 [12]. The findings, generated from statistical analyses of a huge database, received great attention from the scientific community and the media. The authors reported that cardiovascular disease (CVD) depended significantly on birth month, asserting that this association was newly discovered in their study. An earlier review of 246 suggested coronary risk factors, including constitutional, demographic and environmental factors, did not include birth month [13]. In contrast, a study in Austria reported a general tendency for people born in the first half of the year to die at a younger age, more often from heart disease and cerebrovascular disease, than those born in the second half of the year [14].
The authors of the original study also performed an extensive literature review (19 of the 55 identified diseases were supported by the literature) and used rigorous methods, including widely accepted statistical adjustment for multiplicity and quality control; these have been common issues in similar studies based on convenient, tertiary datasets that were not collected for research or policy-making purposes. In this paper, we evaluate the validity and generalizability of their findings in an independent, external patient registry. We focused on 5 CVD-related outcomes: hypertension, coronary heart disease (CHD), stroke, diabetes, and chronic kidney disease (CKD) [15,16]. Other outcomes (such as the respiratory and reproductive diseases highlighted in the original paper) were not available to us.

MATERIALS AND METHODS

Study population and sample

The study was conducted using the EMR of the Stroke Prevention & Atherosclerosis Research Centre, Robarts Research Institute, London, Ontario, with patient visits occurring in 1977-2014. Before 1995, patients were referred to the Hypertension Clinic at Victoria Hospital, London, Ontario. Since 1995, they have been referred to one of several clinics at University Hospital: a Stroke Prevention Clinic, an Urgent TIA Clinic, and a Premature Atherosclerosis Clinic. The Western University Health Science Research Ethics Board approved this study.

Exposures and outcomes

Age and birth month were available only as integers, without exact birth dates or other details. Hypertension was defined as antihypertensive medication use, systolic blood pressure (BP) >140 mmHg, or diastolic BP >90 mmHg, using the higher of the left- and right-arm measurements. CHD was defined as present if myocardial infarction or vascular surgery was recorded. A cerebrovascular disorder was defined as stroke or transient ischemic attack (TIA); these were combined as Stroke. Diabetes was restricted to type 2 diabetes. CKD was defined as an estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m2, estimated with the CKD-EPI formula [17].
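The outcome definitions above are threshold rules, so they can be written down directly. The Python sketch below is only an illustration, not the authors' SAS code, and its function names are hypothetical. It assumes that the CKD-EPI formula cited as [17] is the 2009 creatinine equation, that serum creatinine is recorded in µmol/L (converted to mg/dL by dividing by 88.4), and, because the text does not say whether the 2009 race coefficient was applied, it exposes that coefficient as an optional flag that defaults to off.

```python
def has_hypertension(on_antihypertensive, sbp_left, sbp_right, dbp_left, dbp_right):
    """Hypertension as defined in the text: antihypertensive use, or
    systolic BP > 140 mmHg, or diastolic BP > 90 mmHg, where each BP is
    the higher of the left- and right-arm readings (strict inequalities)."""
    sbp = max(sbp_left, sbp_right)
    dbp = max(dbp_left, dbp_right)
    return on_antihypertensive or sbp > 140 or dbp > 90


def egfr_ckd_epi_2009(scr_umol_per_l, age, female, black=False):
    """eGFR in mL/min/1.73 m2 from the 2009 CKD-EPI creatinine equation.
    Assumptions: creatinine is given in umol/L; the race coefficient is
    optional (black=False) because the source does not say it was used."""
    scr = scr_umol_per_l / 88.4            # umol/L -> mg/dL
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141.0
            * min(scr / kappa, 1.0) ** alpha
            * max(scr / kappa, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def has_ckd(scr_umol_per_l, age, female, black=False):
    """CKD as defined in the text: eGFR < 60 mL/min/1.73 m2."""
    return egfr_ckd_epi_2009(scr_umol_per_l, age, female, black) < 60.0


# Example: a 60-year-old man with serum creatinine 88.4 umol/L (1.0 mg/dL).
print(round(egfr_ckd_epi_2009(88.4, 60, female=False), 1))
print(has_hypertension(False, 138, 142, 80, 85))   # right arm exceeds 140
```

Under these assumptions the example man has an eGFR of about 81 mL/min/1.73 m2 and would not be classified as having CKD; using the higher arm means a single arm above threshold is enough to classify a patient as hypertensive.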

Data analysis

We used summary statistics to describe patient characteristics: mean, standard deviation and interquartile range for continuous variables, and frequency and percentage for categorical variables. We computed the frequency and (row and column) percentage of each health outcome by birth month, identified the highest and lowest percentages, and tested the equality of proportions across months with the chi-square test. Because we used EMR data, missing data were common. Each analysis included all available data on the variables involved, without imputation, and the sample size is reported. We fitted a simple logistic regression for each health outcome with each predictor separately. We considered 3 demographic variables as predictors (independent variables): month, sex, and age, where age was analyzed both as a continuous and as a dichotomized variable (>50 vs. ≤50 years); age and sex were not treated as confounders in the regressions. To compare the models, we used standard measures for evaluating models and predictions [18-20]: the area under the receiver-operating-characteristic curve (AUC) and the Akaike and Bayesian information criteria (AIC/BIC). AUC is a discrimination statistic; 0.5 means random and 1 means perfect discrimination between cases and non-cases. AIC is a measure of the relative quality of a statistical model for given data, and BIC may be considered a Bayesian extension; both combine the maximized log-likelihood with a penalty for the number of parameters (2 per parameter for AIC, ln N per parameter for BIC). A lower AIC/BIC indicates better model fit. Some interpret AIC as addressing explanation and BIC as addressing prediction [21]. Of note, AIC/BIC do not have a fixed range, unlike p-values, correlations or AUC; because they depend on the sample size, they should be compared within the same outcome, not across outcomes. As ancillary analyses, we examined the distribution of 4 raw BP measurements (left vs. right arm, systolic vs. diastolic) over birth months to examine trends and to check whether these measurements support the ‘January peak’ and ‘October trough’ for hypertension reported in the original study. We also smoothed event rates over birth months with penalized B-splines, separately by sex, to see whether patterns were similar in men and women. SAS 9.3 (SAS Institute, Cary, NC) was used for the analysis. P-values and confidence intervals (CIs) are 2-sided and unadjusted for multiplicity.
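For a single categorical predictor such as birth month, the simple logistic regression described above has one free parameter per category, so its maximum-likelihood fitted probabilities are simply the observed event proportions in each category. That makes AIC = 2k - 2 ln L, BIC = k ln N - 2 ln L and the AUC computable without an iterative fit. The Python sketch below assumes 0/1 outcomes and uses hypothetical function names and toy data; it does not reproduce the Wald p-values reported by SAS, and a continuous predictor such as age would still need an iterative fit.

```python
import math
from collections import defaultdict


def auc(scores, outcomes):
    """AUC as P(score of a random case > score of a random non-case),
    counting ties as 1/2 (the Mann-Whitney form of the ROC area)."""
    pos, neg = defaultdict(int), defaultdict(int)
    for s, y in zip(scores, outcomes):
        if y == 1:
            pos[s] += 1
        else:
            neg[s] += 1
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    wins, neg_below = 0.0, 0
    for s in sorted(set(pos) | set(neg)):      # ascending score levels
        wins += pos[s] * (neg_below + 0.5 * neg[s])
        neg_below += neg[s]
    return wins / (n_pos * n_neg)


def categorical_logit_fit(groups, outcomes):
    """Simple logistic regression of a 0/1 outcome on one categorical
    predictor (intercept + one dummy per non-reference level).

    With one free parameter per level, the maximum-likelihood fitted
    probability in each level equals that level's event proportion, so
    the log-likelihood, AIC, BIC and AUC follow in closed form."""
    events, totals = defaultdict(int), defaultdict(int)
    for g, y in zip(groups, outcomes):
        totals[g] += 1
        events[g] += y
    p_hat = {g: events[g] / totals[g] for g in totals}

    loglik = 0.0
    for g, n in totals.items():
        a, p = events[g], p_hat[g]
        if a > 0:                   # 0 * log(0) is treated as 0
            loglik += a * math.log(p)
        if n - a > 0:
            loglik += (n - a) * math.log(1.0 - p)

    k = len(totals)                 # number of estimated parameters
    n_obs = len(outcomes)
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n_obs) - 2 * loglik
    return aic, bic, auc([p_hat[g] for g in groups], outcomes)


# Toy data (not from the study): two categories with 6/10 and 2/10 events.
groups = ["A"] * 10 + ["B"] * 10
outcomes = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8
print(categorical_logit_fit(groups, outcomes))
```

One consequence of these definitions is BIC - AIC = k(ln N - 2). For the 12-parameter birth-month model for hypertension (N=6663) this is about 12 x (8.80 - 2) ≈ 82, which matches the gap between BIC 8621 and AIC 8539 reported in Table 2.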

RESULTS

Table 1 describes the characteristics of the 8,346 patients included in our study. Patients tended to be older (mean age 63 years, range 9-99, at the first clinic visit), and 52% were male. Hypertension was highly prevalent (66%) compared with the other outcomes (<25%). Birth months were fairly evenly distributed (7.5-8.9% per month vs. the null value of 8.3% = 100/12; p=0.06). Figure 1 presents the event rate of each health outcome by birth month. January (69%), April (22%), July (25%), November (20%), and March (27%) showed the highest proportions for hypertension, CHD, stroke, diabetes, and CKD, respectively, and the lowest proportions were in October (63%), September (13%), September (20%), and March (18%). When we computed the percentage of each birth month among those who had the outcome (i.e., column percent instead of row percent), the same months were highest.
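Two chi-square tests are involved in these results: a goodness-of-fit test of whether the 12 birth months are equally likely (expected share 1/12), and the test of equal outcome proportions across months described in the Methods. The dependency-free Python sketch below illustrates both; the function names are hypothetical, and the p-value uses a standard recurrence for the chi-square upper tail rather than a statistics library. Because Table 1 reports only rounded percentages, the monthly counts needed to reproduce p=0.06 are not available here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with an
    integer df, via Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    where Q is the regularized upper incomplete gamma and y = x / 2."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def uniform_month_test(counts):
    """Goodness-of-fit test that every birth month is equally likely;
    df = number of months - 1 (11 for 12 months)."""
    n = sum(counts)
    expected = n / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, chi2_sf(stat, len(counts) - 1)


def equal_proportions_test(cases, totals):
    """Pearson chi-square test that the event proportion is the same in
    every month (a 2 x k table of cases and non-cases); df = k - 1."""
    pooled = sum(cases) / sum(totals)
    stat = 0.0
    for a, n in zip(cases, totals):
        for observed, expected in ((a, n * pooled), (n - a, n * (1 - pooled))):
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf(stat, len(cases) - 1)
```

For 12 months the goodness-of-fit test has 11 degrees of freedom, so p=0.05 corresponds to a statistic of about 19.7.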
Table 1

Patient characteristics

Variable | N with complete data | Mean (standard deviation) [interquartile range] or percentage
Age, years | 8346 | 62.6 (14.7) [52.0-74.0]
Male | 8346 | 51.5%
Height, cm | 6876 | 168.6 (10.2) [160.0-176.0]
Weight, kg | 7197 | 78.8 (17.7) [66.0-89.1]
Serum creatinine, µmol/L | 4002 | 87.5 (39.0) [69.0-96.0]
Current smoker | 8217 | 18.3%
Hypertension | 6663 | 66.2%
Diabetes (type 2) | 7971 | 16.4%
Myocardial infarction | 6541 | 11.2%
Vascular surgery | 6566 | 9.6%
Stroke | 6855 | 14.4%
Transient ischemic attack | 6751 | 12.3%
Chronic kidney disease* | 4002 | 22.5%
Birth month | 8346 |
  1 (January) | | 8.4%
  2 (February) | | 8.1%
  3 (March) | | 8.8%
  4 (April) | | 8.5%
  5 (May) | | 8.6%
  6 (June) | | 8.9%
  7 (July) | | 8.9%
  8 (August) | | 8.2%
  9 (September) | | 8.2%
  10 (October) | | 8.0%
  11 (November) | | 8.2%
  12 (December) | | 7.5%

*The CKD-EPI formula was used to estimate glomerular filtration rate; the threshold used to define chronic kidney disease is an eGFR<60 mL/min/1.73 m2.

Figure 1

Nightingale plots of the distribution of birth month for health outcomes

When we modeled each demographic factor as the independent variable and each health outcome as the dependent variable, birth month was associated with CHD, diabetes and CKD (mostly when the highest month was selected post hoc) with p=0.02-0.05. In contrast, sex was highly significant for these 3 outcomes (p≤0.003). Birth month (without post-hoc dichotomization) yielded a slightly higher AUC than sex for hypertension, stroke and CKD, which may imply better discrimination, but AIC/BIC tended to indicate the reverse in model fit/quality; see Table 2.
Table 2

Discrimination and model-fit statistics from simple logistic regression

Health outcome (total N) | Predictor | P-value | AUC | AIC | BIC
Hypertension (N=6663) | Birth month | 0.58 | 0.522 | 8539 | 8621
 | Highest month (Jan vs. the rest)* | 0.12 | 0.506 | 8526 | 8540
 | Sex | 0.60 | 0.503 | 8528 | 8542
 | Age (continuous) | <0.0001 | 0.643 | 8124 | 8138
 | Age >50 | <0.0001 | 0.585 | 8285 | 8299
Coronary heart disease (N=6472) | Birth month | 0.05 | 0.539 | 6020 | 6102
 | Highest month (April vs. the rest)* | 0.03 | 0.510 | 6016 | 6029
 | Sex | <0.0001 | 0.598 | 5873 | 5886
 | Age (continuous) | <0.0001 | 0.621 | 5846 | 5859
 | Age >50 | <0.0001 | 0.572 | 5889 | 5902
Stroke (N=6845) | Birth month | 0.66 | 0.523 | 7452 | 7534
 | Highest month (July vs. the rest)* | 0.25 | 0.505 | 7440 | 7453
 | Sex | 0.94 | 0.509+ | 7441 | 7455
 | Age (continuous) | <0.0001 | 0.598 | 7300 | 7313
 | Age >50 | <0.0001 | 0.556 | 7344 | 7358
Diabetes (N=7971) | Birth month | 0.59 | 0.525 | 7118 | 7202
 | Highest month (Nov vs. the rest)* | 0.02 | 0.510 | 7102 | 7116
 | Sex | <0.0001 | 0.530 | 7092 | 7106
 | Age (continuous) | <0.0001 | 0.597 | 6974 | 6988
 | Age >50 | <0.0001 | 0.566 | 6977 | 6991
Chronic kidney disease (N=4002) | Birth month | 0.57 | 0.530 | 4286 | 4361
 | Highest month (March vs. the rest)* | 0.04 | 0.511 | 4271 | 4284
 | Sex | 0.003 | 0.528 | 4267 | 4279
 | Age (continuous) | <0.0001 | 0.796 | 3457 | 3470
 | Age >50 | <0.0001 | 0.602 | 3991 | 4003

Each predictor is modeled separately as a single covariate in a simple logistic regression.

Birth month (1-12) is included as a categorical covariate (via 11 dummy variables); sex is binary; and age (in years) is included as a continuous or binary covariate (>50 vs. ≤50 years old).

*The highest month (vs. the rest, as a binary variable) was selected post hoc, so these results may suffer from optimism bias.

P-values are computed from the Wald chi-square test; degrees of freedom=11 for birth month and 1 for all others.

AUC, area under the ROC curve, is a discrimination statistic; 0.5 means random discrimination and 1 means perfect discrimination.

AIC, Akaike information criterion, is a measure of the relative quality of a statistical model for a given set of data: a lower value means a better model fit.

BIC, Bayesian information criterion, is a Bayesian extension of AIC: a lower value means a better model fit.

AIC and BIC should be compared only within the same outcome, because the Ns and amounts of information differ.

+Because of an estimation issue, we fitted the model separately with Y=stroke and Y=TIA and averaged the two AUCs (0.511 and 0.507).
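The optimism bias flagged for the post-hoc "highest month" rows can be illustrated by simulation. The Python sketch below assumes no true birth-month effect (every month has the same event probability), picks the month with the most events after seeing the data, compares it with the other 11 months, and records how often p<0.05. It uses a Pearson chi-square test for the 2 x 2 comparison rather than the Wald test from logistic regression, and the function names, sample sizes and event probability are hypothetical, illustrative choices.

```python
import math
import random


def two_by_two_pvalue(a, n1, b, n2):
    """Pearson chi-square p-value (1 df, no continuity correction)
    comparing the proportions a/n1 and b/n2."""
    pooled = (a + b) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 1.0                        # no variation: nothing to test
    stat = 0.0
    for events, n in ((a, n1), (b, n2)):
        exp_events, exp_non = n * pooled, n * (1 - pooled)
        stat += (events - exp_events) ** 2 / exp_events
        stat += ((n - events) - exp_non) ** 2 / exp_non
    return math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square, 1 df


def posthoc_false_positive_rate(n_sims=400, per_month=100, p_event=0.2, seed=1):
    """Share of null simulations in which 'highest month vs. the rest'
    reaches p < 0.05, although all months share the same true risk."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        events = [sum(rng.random() < p_event for _ in range(per_month))
                  for _ in range(12)]
        top = max(range(12), key=lambda m: events[m])  # chosen after looking
        rest = sum(events) - events[top]
        if two_by_two_pvalue(events[top], per_month, rest, 11 * per_month) < 0.05:
            hits += 1
    return hits / n_sims


print(posthoc_false_positive_rate())
```

For a single month fixed in advance, the false-positive rate of this test would be close to the nominal 5%; choosing the month after looking at the data pushes the simulated rate well above that level, which is why the starred p-values in Table 2 should be read with caution.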

In the ancillary analysis of raw BP measurements for hypertension, the key findings of the original study (January highest and October lowest) were confirmed. On the other hand, right-arm BPs were highest among people born in January, whereas left-arm BPs were highest among those born in July, which are in opposite seasons. Event-rate plots by sex revealed less systematic and substantially different trends in men vs. women; see Figure 2.
Figure 2

Ancillary analyses

DISCUSSION

In the big data era, with advanced statistics and informatics tools and highly educated minds, many things that were once impossible are becoming possible. Many small, previously unidentified effects or associations and rare cases are discovered and reported daily. At the same time, high standards in data quality and statistical analysis are being emphasized, similar to Deming's quality-control principles and Six Sigma, which have been gold standards in industry and quality control for decades [22,23]. Yet two issues are never resolved by large sample sizes, statistical analysis or computing software: 1) clinical or practical meaningfulness (e.g., is the effect size large enough to be clinically meaningful or to lead to any action?) and 2) biological plausibility (why does this happen? Is an association between insect bites and birth month with adjusted p=0.001 scientifically explainable?) [12]. Our findings support some of the original authors’ claims (e.g., the highest rates of hypertension in January and of CVD in April, and the lowest in September-October), which may be regarded as external validation, particularly because London, Ontario and New York City do not differ greatly in climate. But we also found conflicting evidence in related diseases (e.g., the highest rates in January, April, July and November), so coherence, consistency and plausibility from a causal viewpoint may be weakened [24]. Our analysis shows a high degree of randomness, which may be natural. For example, the phenomena of ‘right-arm BP highest in January and left-arm BP highest in July’ and the different effects of birth month in men vs. women are not biologically plausible. Can sub-diseases or conditions within the same disease category be qualitatively different and be associated with different months? Small but real differences, or being ‘fooled by randomness’, cannot be excluded [25]. The observed AUCs for birth month (0.522-0.539 in Table 2), a key measure in prediction, are tantalizing, given that the role of age is well known.
For hypertension and stroke, birth month may offer better discrimination than sex, but model fit suggests that sex could be better. Birth month did not improve discrimination for any outcome once age and sex were included in the model (AUC increase=0.001 for hypertension and 0 for the others; results not shown). In large data sets, biologically minor differences can yield impressive p-values. Are they clinically relevant [13,22,26,27]? Despite substantially different study populations and sample sizes, the dramatically different p-values for the two validated outcomes (CHD and hypertension) are noteworthy: p-values<0.001 adjusted for 1,688 comparisons (or unadjusted p-values of ~10^-22, our best guess from the Manhattan plot) in the original study vs. unadjusted p-values=0.03-0.58 in our study [28,29]. When many p-values (probably the most popular statistical measure in research) are computed, a simple ‘p-value plot’ together with AUC could help in assessing overall randomness in associations [8,30,31]. In relation to the recent ‘bad luck-cancer controversy’, the validity and proper interpretation of another popular statistic, R², for aggregated data have been discussed [32]. The limitations of our study and caveats for readers should be noted. First, we used retrospective data from a high-risk sample in a single geographical region, which could make already small associations even smaller. A very large population-based cohort or census would be ideal. Second, we are unable to explain some findings or to identify causes or underlying mechanisms; yet these issues are shared by the original study, ours and many other non-experimental studies. Third, the common goal of the long history of birth month research may differ from ours; that goal is generally one of basic or pure science: to find diseases that may be related to developmental effects of environmental exposures, which would later be investigated to elucidate the mechanism of the association.
In contrast, our goal is closer to that of clinical practice, which may be better addressed by a statistical or prediction measure such as AUC, in addition to or in place of p-values. For example, should physicians or patients be more suspicious of, and investigate more closely, certain conditions based on birth month (e.g., screening)? Should parents planning a pregnancy aim to have their child born in a certain month? Indeed, significant seasonality, but with different seasons/months for the best outcomes and high randomness, has also been demonstrated in infertility, autism and mortality research [2,14,33-37]. The main strengths of our study are: a relatively large sample covering a long period from multiple hospitals; multiple CVD-related outcomes; use of clinical data (e.g., multiple raw BPs instead of coded data, where underreporting can be severe) and an EMR with continuous quality checks [38-42]; and statistical measures that address different aspects of models and associations beyond p-values. Since our cohort consists mostly of older adults, the ‘lifetime risk’ of CVD that the original study intended to address might be well captured, although representativeness is weaker. The efforts of scientists and readers to confirm important findings, and their willingness to wait for more evidence, should be valued more, in addition to the discovery, innovation and productivity that are currently emphasized [22]. It is well documented that the Framingham risk score, a landmark in CVD research, does not perform well in Asian populations or in HIV patients [43]. We do not think this is a major weakness of the method or finding, as no method is perfect and virtually no finding is universal. Also, for every finding, we need to determine whether it is real or not (e.g., random), and if real, the next step might be to assess biological mechanisms as well as practical value and clinical implications.
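The ‘p-value plot’ mentioned in the Discussion can take several forms. One classical version (Schweder and Spjøtvoll), assumed here and not necessarily the layout used in the cited references, plots, for each p-value p, the number of p-values larger than p against 1 - p; if all null hypotheses are true, the points scatter around a line through the origin with slope roughly equal to the number of tests. The Python sketch below (hypothetical function names) uses the five unadjusted birth-month p-values from Table 2 as input; five points are far too few for a firm judgment, so this only shows the mechanics.

```python
def pvalue_plot_points(pvalues):
    """Points for a Schweder-Spjotvoll-style p-value plot:
    x = 1 - p, y = number of p-values strictly greater than p."""
    return [(1.0 - p, sum(q > p for q in pvalues)) for p in sorted(pvalues)]


def null_reference_line(pvalues, x):
    """Approximate expected y at x if every null hypothesis were true
    (uniform p-values): m * x, with m = number of p-values."""
    return len(pvalues) * x


# Unadjusted p-values for birth month (11 df) from Table 2, in the order
# hypertension, CHD, stroke, diabetes, CKD.
month_p = [0.58, 0.05, 0.66, 0.59, 0.57]

for x, y in pvalue_plot_points(month_p):
    print(f"x = {x:.2f}  observed = {y}  null line = {null_reference_line(month_p, x):.2f}")
```

The printed pairs can be passed to any plotting tool; points falling well below the null line at the right-hand end (small p-values) would suggest more real signal than chance alone.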

CONCLUSIONS

We were able to validate the associations between birth month and the two primary CVD-related outcomes, but we also found that randomness was high. Until a definitive answer is available, possibly from very large, representative samples with accurate outcome data covering different climates, physicians and patients need not be much concerned about birth month; modifiable factors are a more appropriate focus. When faced with reports of novel discoveries, healthy skepticism and waiting for validation and explanation in similar and different settings are crucial for citizens of the Information Age. Finally, we still believe that EMRs offer invaluable resources and have opened a new chapter in research and data science. The following quotation, often attributed to Galileo Galilei, is apt: ‘Measure what is measurable, and make measurable what is not so’ [44]. Perhaps clinical and laboratory data are better suited to the first task, while administrative or self-report data attempt the latter (as the next best option).
Diabetes and cardiovascular disease: a statement for healthcare professionals from the American Heart Association. S M Grundy; I J Benjamin; G L Burke; A Chait; R H Eckel; B V Howard; W Mitch; S C Smith; J R Sowers. Circulation. 1999-09-07.

No increased mortality in later life for cohorts born during famine. V Kannisto; K Christensen; J W Vaupel. Am J Epidemiol. 1997-06-01.

A survey of 246 suggested coronary risk factors. P N Hopkins; R R Williams. Atherosclerosis. 1981 Aug-Sep.

Lifespan depends on month of birth. G Doblhammer; J W Vaupel. Proc Natl Acad Sci U S A. 2001-02-20.

Seasonal variability in fertilization and embryo quality rates in women undergoing IVF. N Rojansky; A Benshushan; S Meirsdorf; A Lewin; N Laufer; A Safran. Fertil Steril. 2000-09.

Seasonality of births in schizophrenia and bipolar disorder: a review of the literature. E F Torrey; J Miller; R Rawlings; R H Yolken. Schizophr Res. 1997-11-07.

Is bad luck the main cause of cancer? C R Weinberg; D Zaykin. J Natl Cancer Inst. 2015-05-08.

Obesity identified by discharge ICD-9 codes underestimates the true prevalence of obesity in hospitalized children. Jessica G Woo; Meg H Zeller; Kimberly Wilson; Thomas Inge. J Pediatr. 2008-10-31.

Screening for kidney disease in vascular patients: SCreening for Occult REnal Disease (SCORED) experience. Heejung Bang; Madhu Mazumdar; George Newman; Andrew S Bomback; Christie M Ballantyne; Allan S Jaffe; Phyllis A August; Abhijit V Kshirsagar. Nephrol Dial Transplant. 2009-03-26.

Prevalence, treatment, and control of dyslipidemia and hypertension in 4278 HIV outpatients. Merle Myerson; Eduard Poltavskiy; Ehrin J Armstrong; Shari Kim; Victoria Sharp; Heejung Bang. J Acquir Immune Defic Syndr. 2014-08-01.