
A Causally Naïve and Rigid Population Model of Disease Occurrence Given Two Non-Independent Risk Factors.

Olaf Dammann¹,², Kenneth Chui¹, Anselm Blumer².

Abstract

We describe a computational population model with two risk factors and one outcome variable. Its inputs are the prevalence (%) of all three variables, the association between each risk factor and the disease, and the association between the two risk factors. We briefly describe three examples: retinopathy of prematurity, diabetes in Panama, and smoking and obesity as risk factors for diabetes. We describe and discuss the simulation results in these three scenarios, including how the published information is used as input and how changes in risk factor prevalence change outcome prevalence.


Year: 2018. PMID: 30349634. PMCID: PMC6194090. DOI: 10.5210/ojphi.v10i2.9357

Source DB: PubMed. Journal: Online J Public Health Inform. ISSN: 1947-2579

Introduction

In epidemiology, the concept of multi-causality holds that the occurrence of any disease depends on a set of risk factors, not just one. The generation of virtual databases that reflect the properties of populations is called microsimulation [1]. In their simplest form, such models could involve just two risk factors and one outcome variable. One example of a microsimulation tool is Synthea, which generates a virtual population of individuals and their electronic health records (EHRs) [2]. Such an algorithm could simulate individuals with, say, three characteristics: a binary disease outcome (coded as yes/no) and two binary risk factors (yes/no). The algorithm uses as input parameters the population prevalence of the two risk factors and of the outcome variable; "yes" or "no" is allocated for each variable by a Monte Carlo simulation that draws a random number and uses the population prevalence as a threshold. This ensures that, for instance, on average 37% of the virtual population will have a certain disease if the real population prevalence of that disease is 37% and the threshold for "disease = yes" is set at 0.37. These microsimulations have one particular disadvantage: if the presence or absence of each variable in the final database is based on separate yes/no attribution processes, the variables will be independent. This, of course, is highly unlikely in reality, because the very definition of a risk factor is that it is associated with the disease under investigation. Moreover, the two risk factors will be independent of each other, which is also rarely the case in real-life situations. Microsimulations performed this way produce populations that resemble their real-life counterparts only with regard to the population prevalence of the risk factors and the outcome. Consequently, such datasets cannot be used to simulate population-wide changes in risk factors in order to study the resulting population-wide changes in the outcome (disease).
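The disadvantage described above can be made concrete with a minimal Python sketch of such naive threshold-based Monte Carlo allocation. The function names (`naive_population`, `odds_ratio`) and the prevalences used (32%, 75%, 37%) are hypothetical choices for illustration, not part of Synthea or any published tool. Because each variable is drawn separately, the marginal prevalences are reproduced, but every pairwise odds ratio stays close to 1.

```python
import random


def naive_population(n, prevalences, seed=1):
    """Give each individual an independent yes/no draw per variable:
    'yes' whenever a uniform random number falls below the prevalence."""
    rng = random.Random(seed)
    return [[rng.random() < p for p in prevalences] for _ in range(n)]


def odds_ratio(pop, i, j):
    """Odds ratio AD/BC of variable i (rows) versus variable j (columns)."""
    a = sum(1 for row in pop if row[i] and row[j])
    b = sum(1 for row in pop if row[i] and not row[j])
    c = sum(1 for row in pop if not row[i] and row[j])
    d = len(pop) - a - b - c
    return (a * d) / (b * c)


# Hypothetical prevalences: RF1 32%, RF2 75%, disease 37%.
pop = naive_population(100_000, [0.32, 0.75, 0.37])
disease_share = sum(row[2] for row in pop) / len(pop)
print(f"disease prevalence: {disease_share:.3f}")  # close to the 0.37 threshold
pairs = [(0, 2, "RF1 vs disease"), (1, 2, "RF2 vs disease"), (0, 1, "RF1 vs RF2")]
for i, j, label in pairs:
    print(f"{label}: OR = {odds_ratio(pop, i, j):.2f}")  # all near 1
```

Whatever association exists in the real population, this allocation scheme cannot reproduce it, because no draw depends on any other.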
Therefore, we wanted to design a model that requires as input the population prevalence of the outcome of interest and of two risk factors, as well as their three pairwise associations (Figure 1).

Figure 1

The associations among two non-independent risk factors and one outcome are quantified by three odds ratios.

In what follows, we describe a population model with two risk factors and one outcome variable whose inputs are the prevalence (%) of all three variables, the association between each risk factor and the disease, and the association between the two risk factors. We briefly describe three examples: (#1) retinopathy of prematurity; (#2) diabetes in Panama; and (#3) smoking and obesity as risk factors for diabetes. Next, we describe the simulation results in these three scenarios, including how the published information is used as input (Step 1) and how changes in risk factor prevalence change outcome prevalence (Step 2).

METHODS

The Model

Suppose we have a standard 2 x 2 table for an outcome against a risk factor (Figure 2). Label the cells A, B, C, D where A is the percent of the population for which both the risk factor and the outcome are positive, B is the percent where the risk factor is positive but the outcome is negative, C is the percent where the risk factor is negative but the outcome is positive, and D is the percent where both are negative. Then if RF is the percent of the population with positive risk factor and OUT is the percent of the population with positive outcome, we have

Figure 2

Fourfold table depicting the four entities defined by the presence (+) or absence (-) of a binary risk factor and an outcome.

$$B = RF - A, \qquad C = OUT - A, \qquad D = 100 - A - B - C.$$

The equation for the odds ratio is based on the quantities depicted in Figure 2:

$$OR = \frac{AD}{BC}.$$

We can substitute for B, C, and D using the first three equations (which gives $D = 100 - RF - OUT + A$), giving a quadratic equation for A with coefficients in terms of RF, OUT, and OR:

$$(1 - OR)\,A^2 + \left[100 - RF - OUT + OR\,(RF + OUT)\right]A - OR \cdot RF \cdot OUT = 0.$$

Only one root of this equation gives non-negative values for all four cells; it yields a 2 × 2 table that matches the given population values for RF and OUT and has the desired odds ratio. This is the calculation performed in "Step 1" of the JavaScript implementation of the model (available at http://www.cs.tufts.edu/~ablumer/PopStat.html).

We can also use this equation to model the effect of keeping the odds ratio fixed while changing the percentage of the population that has the risk factor. This can be done by replacing A and RF in the above equations with $rA$ and $r \cdot RF$ and solving for the value of OUT that keeps the odds ratio constant. This assumes that, within the risk-factor-positive group, the relative sizes of the outcome-positive and outcome-negative subgroups (A relative to B) stay the same when the size of that group changes. Since we have two risk factors, we can do identical calculations relating risk factor 1 to the outcome and relating risk factor 2 to the outcome. Similarly, we can find the entries of the 2 × 2 table relating risk factor 1 to risk factor 2.
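The two steps can be sketched in Python as follows. This is a hypothetical re-implementation for illustration, not the authors' JavaScript code, and the function names `fourfold` and `shift_risk_factor` are our own. For Step 2 it relies on an algebraic shortcut: scaling A and B by r while holding the odds ratio fixed also leaves the ratio C/D unchanged, so the outcome risk among those with the risk factor (A/RF) and among those without it (C/(C + D)) stays constant, and the new outcome percentage is r·A + (100 − r·RF)·C/(C + D).

```python
import math


def fourfold(rf, out, odds_ratio):
    """Step 1: cells (A, B, C, D), in percent, of the 2 x 2 table whose
    risk-factor margin is rf, outcome margin is out, and AD/BC = odds_ratio."""
    if math.isclose(odds_ratio, 1.0):
        a = rf * out / 100.0  # OR = 1: the A^2 term vanishes (independence)
    else:
        # (1 - OR) A^2 + [100 - RF - OUT + OR (RF + OUT)] A - OR RF OUT = 0
        qa = 1.0 - odds_ratio
        qb = 100.0 - rf - out + odds_ratio * (rf + out)
        qc = -odds_ratio * rf * out
        root = math.sqrt(qb * qb - 4.0 * qa * qc)
        lo, hi = max(0.0, rf + out - 100.0), min(rf, out)
        # Keep the single root that makes all four cells non-negative.
        a = next(x for x in ((-qb + root) / (2 * qa), (-qb - root) / (2 * qa))
                 if lo - 1e-9 <= x <= hi + 1e-9)
    b, c = rf - a, out - a
    return a, b, c, 100.0 - a - b - c


def shift_risk_factor(rf, out, odds_ratio, new_rf):
    """Step 2 (sketch): outcome percent after the risk factor moves from rf
    to new_rf with the odds ratio held fixed. Scaling A and B by
    r = new_rf / rf keeps the outcome risk among exposed (A / RF) and
    unexposed (C / (C + D)) unchanged."""
    a, b, c, d = fourfold(rf, out, odds_ratio)
    return new_rf * a / rf + (100.0 - new_rf) * c / (c + d)


# Example #1 inputs: sepsis 32%, oxygen 75%, ROP 47%.
print(shift_risk_factor(32, 47, 2.8, 0))  # ROP when sepsis is removed
print(shift_risk_factor(32, 75, 2.6, 0))  # oxygen exposure when sepsis is removed
```

With the Example #1 inputs, removing sepsis gives roughly 39% ROP and 70% oxygen exposure, and `shift_risk_factor(75, 47, 3.6, 0)` and `shift_risk_factor(75, 32, 2.6, 0)` give roughly 25% ROP and 18% sepsis when oxygen exposure is removed, in line with the last rows of Tables 1 and 2. Each pairwise table is shifted on its own; how the authors' tool combines both risk factors at once (for example, the 21% figure reported below) is not specified here.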

Examples

Example #1: Retinopathy of prematurity

We previously analyzed a data set of 617 very preterm newborns [3]. In that project, we found that 47% of all babies developed retinopathy of prematurity (ROP), a serious eye disorder among extremely preterm infants [4]. Systemic inflammation [5] and oxygen exposure [6] are competing pathogenetic mechanisms that interfere with normal vasculogenesis [7]. The capability to simulate interventions on one or both of these pathomechanisms in order to study changes in ROP occurrence would be a groundbreaking step towards its prevention. In our data analysis, we also found that 32% of the infants had sepsis and 75% had been exposed to high levels of oxygen. The odds ratios (OR) for the associations of sepsis and of oxygen exposure with ROP were 2.8 and 3.6, respectively, and the OR for the association between sepsis and oxygen exposure was 2.6. Figure 3 shows how these data were entered into the model.

Figure 3

Simulation results of Step 1 in example #1, retinopathy of prematurity.

Example #2: Diabetes in Panama

A second example is a study of diabetes in Panama (prevalence 5.4%) [8], with female sex (RF1: 60%) and age 50+ years (RF2: 31%) as exemplary risk factors. Female sex was associated with diabetes with an OR of 1.4, and age 50+ with an OR of 5.1. The OR for the association between female sex and age 50+ was 0.85 (see Figure 4).

Figure 4

Simulation results of Step 1 in example #2, diabetes in Panama.

In this case, the risk factors are not candidates for modification by a population intervention, as they were in the previous example. Instead, we are interested in the effect on diabetes prevalence of the discrepancy between the age distribution observed in [8] (50+ years: 31%) and national data published by the United Nations (20%) [9].

Example #3: Smoking, BMI, and Diabetes

A randomized controlled trial (RCT) of estrogen plus progestin (EP) versus placebo, the Heart and Estrogen/progestin Replacement Study (HERS), was conducted in the 1990s to explore the effect of EP on subsequent development of coronary heart disease (CHD) in postmenopausal women [10]. We wanted to use the publicly available data from this RCT to explore the influence of smoking and body mass on diabetes, and to use these data as input for simulating the effect of two interventions, smoking cessation and weight reduction, on diabetes occurrence.

Results

Example #1

In Step 1, we entered the population percentages for both risk factors and the outcome, as well as the three associations among them. The fourfold tables estimated by the model are depicted in Figure 3. In Step 2, we simulated risk factor modification. First, we reduced RF1 (sepsis) stepwise from 32% to 0% (Table 1). This resulted in a drop of RF2 (oxygen exposure) from 75% to 70% and a reduction in outcome occurrence from 47% to 39%.

Table 1

Example #1. Changes in risk factor 2 (RF2) and outcome (OUT) when RF1 declines (%).

RF1 (Sepsis)	RF2 (Oxygen)	Outcome (Retinopathy of Prematurity)
32	75	47
30	75	46
25	74	45
20	73	44
15	72	43
10	72	41
5	71	40
0	70	39

Second, we reduced RF2 stepwise from 75% to 0% (Table 2). This resulted in a drop of RF1 from 32% to 18% and a reduction in outcome occurrence from 47% to 25%. Third, we calculated that even if both risk factors were reduced to 0%, a 21% outcome rate would remain, which is probably attributable to other risk factors. It is also possible that the odds ratios change as the population statistics approach the extremes.

Example #2

The fourfold tables estimated by the model in Step 1 are depicted in Figure 4. In Step 2, simulating a reduction of the age 50+ group from the observed 31% to the 20% estimated by the UN resulted in a decrease in the population prevalence of diabetes from 5.4% to 4.4% (data not shown).
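As a hand check of this Step 2 result, the worked arithmetic below applies the Methods quadratic to the age 50+ versus diabetes table and assumes, as in the Methods, that diabetes prevalence within each age stratum stays fixed while the share aged 50+ changes. The intermediate values are our own rounded calculations from the Example #2 inputs, not figures reported in [8].

```latex
\begin{aligned}
&(1 - 5.1)A^2 + \left[100 - 31 - 5.4 + 5.1\,(31 + 5.4)\right]A - 5.1 \cdot 31 \cdot 5.4 = 0
  \;\Longleftrightarrow\; 4.1A^2 - 249.24A + 853.74 = 0,\\
&A \approx 3.64,\qquad C = 5.4 - A \approx 1.76,\qquad C + D = 100 - 31 = 69,\\
&\text{prevalence at age 50+} = A/31 \approx 11.8\%,\qquad \text{prevalence below 50} = C/69 \approx 2.5\%,\\
&OUT' = 20 \times 0.1175 + (100 - 20) \times 0.0255 \approx 2.35 + 2.04 \approx 4.4\%.
\end{aligned}
```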

Example #3

In the publicly available HERS dataset (http://www.biostat.ucsf.edu/vgsm/data.html), we looked at diabetes (on oral medication or insulin) as the outcome, and at smoking and overweight/obesity as risk factors. In an exploratory data analysis, we found that in this cohort of postmenopausal women with an average age of 67 years, 26% had diabetes, 13% were smokers, and 34% were obese (defined as BMI ≥30). Smoking was associated with a reduced risk of diabetes (OR 0.5, 95% CI 0.4, 0.7), obesity with a strong risk increase (OR 3.3, 95% CI 2.7, 3.9), and smoking was inversely associated with obesity (OR 0.6, 95% CI 0.4, 0.7) (Table 3).

Table 3

Diabetes among 2758 postmenopausal women, the association between risk factors (smoking and overweight/obese) and diabetes, and the association between risk factors. These data served as input for example #3.

	Diabetes YES	Diabetes NO	OR (95% CI)
N (row %)	728 (26)	2030 (74)	
Smoking, N (col %)	60 (8)	299 (15)	0.5 (0.4, 0.7)
Obese, N (col %)	397 (55)	545 (27)	3.3 (2.7, 3.9)

Association RF1/RF2	Smoking YES	Smoking NO	OR (95% CI)
N	359	2399	
Obese (BMI ≥30), N (col %)	85 (24)	857 (36)	0.6 (0.4, 0.7)
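The odds ratios in Table 3 can be recomputed from the cell counts. The sketch below assumes a Woolf (log-scale) 95% confidence interval; the text does not state which interval method was used, and the helper name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio a*d / (b*c) with a Woolf (log-scale) confidence interval.
    a, b: exposed with / without the outcome; c, d: unexposed with / without."""
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return est, est * math.exp(-z * se), est * math.exp(z * se)


# Counts from Table 3: 728 women with and 2030 without diabetes.
smoking = odds_ratio_ci(60, 299, 728 - 60, 2030 - 299)
obesity = odds_ratio_ci(397, 545, 728 - 397, 2030 - 545)
# Smokers (359) and non-smokers (2399), of whom 85 and 857 were obese.
smoking_obesity = odds_ratio_ci(85, 359 - 85, 857, 2399 - 857)

for label, (est, lo, hi) in [("smoking vs diabetes", smoking),
                             ("obesity vs diabetes", obesity),
                             ("smoking vs obesity", smoking_obesity)]:
    print(f"{label}: OR {est:.1f} ({lo:.1f}, {hi:.1f})")
```

Run as is, it prints OR 0.5 (0.4, 0.7), OR 3.3 (2.7, 3.9), and OR 0.6 (0.4, 0.7), matching the rounded values in Table 3. These three odds ratios, together with the three prevalences, are exactly the inputs that Step 1 needs.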

We then simulated two interventions, smoking cessation and weight reduction. Note that while obesity is associated with an increased risk of diabetes, smoking is associated with a decreased risk. The fact that the two risk factors are negatively associated (less obesity among smokers) might explain this "protective effect of smoking". Reducing smoking to zero in this population led to a minuscule increase in diabetes occurrence, from 18% to 19% (Table 4), which we confirmed in a stratified analysis restricted to non-smokers: among non-smokers, diabetes prevalence was 19.2%.

Table 4

Example #3. Changes in risk factor 2 (RF2) and outcome (OUT) when RF1 declines (%), simulating a smoking cessation intervention.

RF1 (Smoking)	RF2 (Overweight/Obesity)	Outcome (Diabetes)
13	56	18
10	57	18
8	57	18
6	57	19
4	58	19
2	58	19
0	58	19

Reducing obesity led to a marked reduction in diabetes occurrence, from 18% to 10%. At the same time, smoking increased from 13% to 17% (Table 5).

Table 5

Example #3. Changes in risk factor 1 (RF1) and outcome (OUT) when RF2 declines (%), simulating a weight reduction intervention.

RF1 (Smoking)	RF2 (Overweight/Obesity)	Outcome (Diabetes)
13	56	18
13	50	17
14	40	16
15	30	14
16	20	13
17	10	11
17	0	10

DISCUSSION

Advantages

Our model has three prominent advantages. First, it is novel: to our knowledge, no other population model exists that accounts for the association between risk factors. Second, the model is relatively simple. With only one outcome and two risk factors, the inputs are limited to the population prevalence of each variable and the associations among them. We are currently developing a tool that includes a third risk factor and that can be used for microsimulations, i.e., it outputs a data file of a virtual population, which can be used in further simulations. Third, the model is freely available online for the community to use and explore.

Drawbacks

The model is currently limited to binary (two-level) exposures and outcomes. It is also limited to two risk factors; we are currently developing a similar model for three predictors and their inter-relations. Perhaps the most prominent limitation of the model is that it is causally naïve and rigid. Much of the complex methodological toolbox of modern epidemiology is geared towards the identification of causal risk factors [11]. Our model is not helpful in this regard. The associations between risk factors and outcomes are modeled as odds ratios, which are simple measures of the strength of association that imply neither causality nor causal direction. The model is also rigid in that the input is reduced to population prevalences and association measures (odds ratios). Within the constraints of these values, the output is not probabilistic but deterministic. However, the model can be run multiple times with different input odds ratios taken from within the range defined by the observed confidence intervals.
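Such repeated runs can be sketched as a simple grid over a confidence interval. The example below is a hypothetical illustration: it reuses the Step 1 quadratic (repeated so the block runs on its own), takes the Table 3 inputs for obesity (34%) and diabetes (26%) with OR 3.3 (95% CI 2.7 to 3.9), and reports diabetes prevalence when obesity is set to 0% (Step 2 with r = 0, assuming the within-stratum risk among the non-obese stays fixed). The five evenly spaced grid points are an arbitrary choice.

```python
import math


def fourfold(rf, out, odds_ratio):
    """Step 1 solver (same quadratic as in the Methods), repeated here so the
    sketch runs on its own. Returns cells (A, B, C, D) in percent."""
    if math.isclose(odds_ratio, 1.0):
        a = rf * out / 100.0
    else:
        qa, qb = 1.0 - odds_ratio, 100.0 - rf - out + odds_ratio * (rf + out)
        root = math.sqrt(qb * qb + 4.0 * qa * odds_ratio * rf * out)
        lo, hi = max(0.0, rf + out - 100.0), min(rf, out)
        a = next(x for x in ((-qb + root) / (2 * qa), (-qb - root) / (2 * qa))
                 if lo - 1e-9 <= x <= hi + 1e-9)
    b, c = rf - a, out - a
    return a, b, c, 100.0 - a - b - c


def outcome_if_removed(rf, out, odds_ratio):
    """Outcome percent when the risk factor is set to 0% (Step 2 with r = 0)."""
    a, b, c, d = fourfold(rf, out, odds_ratio)
    return 100.0 * c / (c + d)


# Hypothetical sweep: obesity 34%, diabetes 26%, OR varied across 2.7 to 3.9.
ci_low, ci_high, steps = 2.7, 3.9, 5
grid = [ci_low + i * (ci_high - ci_low) / (steps - 1) for i in range(steps)]
sweep = [(or_, outcome_if_removed(34, 26, or_)) for or_ in grid]
for or_, diabetes in sweep:
    print(f"OR {or_:.1f}: diabetes without obesity {diabetes:.1f}%")
```

Each run is still deterministic; the spread across the grid only shows how sensitive the simulated outcome is to uncertainty in the input odds ratio. Drawing odds ratios at random from within the interval would be an alternative design.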

Conclusion

In this paper, we present a simple model of disease occurrence in populations. Based on the prevalence of a disease and of two risk factors, and on their associations with the disease and with each other, the model calculates fourfold tables for these associations (Step 1). Thereafter, the population prevalence of either risk factor can be modified to simulate population-level risk factor increases or decreases, and the resulting changes in disease occurrence can be observed (Step 2). We will now develop this model further to include three risk factors and microsimulation capabilities. In the meantime, we hope it will be helpful to others and would appreciate feedback, preferably in the form of constructive criticism.

Table 2

Example #1. Changes in risk factor 1 (RF1) and outcome (OUT) when RF2 declines (%).

RF1 (Sepsis)	RF2 (Oxygen)	Outcome (Retinopathy of Prematurity)
32	75	47
31	70	46
29	60	43
27	50	40
26	40	37
24	30	34
22	20	31
20	10	28
18	0	25

References

Ann Hellström, Lois E H Smith, Olaf Dammann. Retinopathy of prematurity. Lancet. 2013-06-17.

Anselmo J Mc Donald Posso, Ryan A Bradshaw Meza, Enrique A Mendoza Morales, Ycly Jaen, Alberto Cumbrera Ortega, Enrique Jorge Mendoza Posada. Diabetes in Panama: Epidemiology, Risk Factors, and Clinical Management. Ann Glob Health. 2015 Nov-Dec.

Carolyn M Rutter, Alan M Zaslavsky, Eric J Feuer. Dynamic microsimulation models for health outcomes: a review. Med Decis Making. 2010-05-18.

Alisse K Hauspurg, Elizabeth N Allred, Deborah K Vanderveen, Minghua Chen, Francis J Bednarek, Cynthia Cole, Richard A Ehrenkranz, Alan Leviton, Olaf Dammann. Blood gases and retinopathy of prematurity: the ELGAN Study. Neonatology. 2010-07-30.

Minghua Chen, Ayse Citil, Frank McCabe, Katherine M Leicht, John Fiascone, Christiane E L Dammann, Olaf Dammann. Infection, oxygen, and immaturity: interacting risk factors for retinopathy of prematurity. Neonatology. 2010-08-24.

Thomas A Glass, Steven N Goodman, Miguel A Hernán, Jonathan M Samet. Causal inference in public health. Annu Rev Public Health. 2013-01-07.

S Hulley, D Grady, T Bush, C Furberg, D Herrington, B Riggs, E Vittinghoff. Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. Heart and Estrogen/progestin Replacement Study (HERS) Research Group. JAMA. 1998-08-19.

Mari Holm, Tora S Morken, Raina N Fichorova, Deborah K VanderVeen, Elizabeth N Allred, Olaf Dammann, Alan Leviton. Systemic Inflammation-Associated Proteins and Retinopathy of Prematurity in Infants Born Before the 28th Week of Gestation. Invest Ophthalmol Vis Sci. 2017-12-01.

José Carlos Rivera, Mari Holm, Dordi Austeng, Tora Sund Morken, Tianwei Ellen Zhou, Alexandra Beaudry-Richard, Estefania Marin Sierra, Olaf Dammann, Sylvain Chemtob. Retinopathy of prematurity: inflammation, choroidal degeneration, and novel promising therapeutic strategies. J Neuroinflammation. 2017-08-22.

Jason Walonoski, Mark Kramer, Joseph Nichols, Andre Quina, Chris Moesel, Dylan Hall, Carlton Duffett, Kudakwashe Dube, Thomas Gallagher, Scott McLachlan. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc. 2018-03-01.
