
Simpson's Paradox: Examples.

Bokai Wang¹, Pan Wu², Brian Kwan³, Xin M Tu³, Changyong Feng¹,⁴.

Abstract

Simpson's paradox is prevalent in many areas. It characterizes the inconsistency between the conditional and marginal interpretations of the same data. In this paper, we illustrate through examples how Simpson's paradox can occur in continuous, categorical, and time-to-event data.


Keywords: conditional expectation; odds ratio; time-to-event analysis

Year: 2018 PMID: 29736137 PMCID: PMC5936043 DOI: 10.11919/j.issn.1002-0829.218026

Source DB: PubMed Journal: Shanghai Arch Psychiatry ISSN: 1002-0829

1. Introduction

Consider the following scenario. Suppose the 4th-grade students of two schools, Alpha and Beta, in the DYC school district took a national standardized math test. We want to compare the average scores of these two schools. Assume we are told that the average scores of both male and female students in Beta are higher than those in Alpha. What can we say about the overall average scores of the two schools? Is it true that School Beta has a higher average score than Alpha? The answer seems affirmative and intuitive. To be more specific, assume the average scores of male and female students in each school are as presented in Table 1.

Table 1.

Average scores of male and female students in two schools

School (X₁)	Male (X₂ = 1): n	Male (X₂ = 1): average	Female (X₂ = 2): n	Female (X₂ = 2): average
Alpha (1)	80	84	20	80
Beta (2)	20	85	80	81

It is obvious that both male and female students in School Beta have higher average scores. However, a simple calculation shows that the overall average scores of the two schools are 83.2 and 81.8, respectively. School Alpha wins on the average score! Suppose the students in School Beta received a more advanced instruction method that improves on the traditional method (which was adopted by School Alpha). Intuitively, the students in Beta should score better on average. Why is this example so counterintuitive? Is there anything wrong here? Is the average score a reasonable measure of student performance in a school?

In fact, when we talk about two schools, we usually assume that the proportions of male students in the two schools are approximately the same. It is easy to prove that if the proportions of male students in the two schools are exactly the same, and the average scores of both male and female students in Beta are higher than those of their counterparts in Alpha, then the overall average score in Beta is higher. Our example shows that a difference in gender composition can reverse the relation we want to study.

The scenario above is an example of the well-known Simpson's paradox. Loosely speaking, Simpson's paradox says that a conditional relation (here, conditional on gender within each school) does not imply the marginal relation, and vice versa. Although statisticians had long known about this 'inconsistency' between the conditional and marginal interpretations of the same data (see, for example, Yule), the influence of Simpson's paradox extends far beyond the statistical community. In fact, Simpson's paradox is prevalent in many areas, from the natural sciences to the social sciences, and even philosophy. We can even say that it is an inherent property of data from observational studies. In this paper, we discuss examples of Simpson's paradox in continuous data, categorical data, and time-to-event data.

In Section 2 we give a general statistical interpretation of Simpson's paradox using conditional expectation. In the next two sections, we show through examples how Simpson's paradox can occur in categorical data and in time-to-event data. Conclusions are given in Section 5.
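The claim that equal male proportions would guarantee Beta a higher overall average follows from a one-line calculation. As a sketch, write $p$ for a common proportion of male students in both schools (a counterfactual assumption) and $\mu_{ij}$ for the average score of gender $j$ in school $i$ (notation introduced here, with $i = 1, 2$ for Alpha and Beta and $j = 1, 2$ for male and female):

$$\bar{Y}_{\mathrm{Beta}} - \bar{Y}_{\mathrm{Alpha}} = \bigl[p\,\mu_{21} + (1-p)\,\mu_{22}\bigr] - \bigl[p\,\mu_{11} + (1-p)\,\mu_{12}\bigr] = p\,(\mu_{21} - \mu_{11}) + (1-p)\,(\mu_{22} - \mu_{12}) > 0,$$

because both differences in parentheses are positive and the weights $p$ and $1-p$ are nonnegative and sum to 1. With the Table 1 averages both differences equal 1 point, so Beta would lead by exactly 1 point for every $p$; the observed reversal comes entirely from the unequal male proportions (0.8 in Alpha versus 0.2 in Beta).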

2. Simpson’s Paradox and Conditional Expectation

We know that if $a/b = c/d$, then $a/b = (a+c)/(b+d)$ (assuming $b+d \neq 0$). Does a similar property hold for inequalities between fractions? Specifically, assume $s_{ij}, n_{ij}$ ($i = 1, 2$, $j = 1, 2$) are positive numbers with

$$\frac{s_{11}}{n_{11}} < \frac{s_{21}}{n_{21}}, \qquad \frac{s_{12}}{n_{12}} < \frac{s_{22}}{n_{22}}.$$

Is it true that

$$\frac{s_{11} + s_{12}}{n_{11} + n_{12}} < \frac{s_{21} + s_{22}}{n_{21} + n_{22}}?$$

Simpson showed that the answer may be no: positive numbers can satisfy both inequalities above while the pooled fractions satisfy the reverse inequality. In other words, the pooled data show a reversed relation. This is the original form of 'Simpson's paradox'. In this section, we construct a probability model to study why this reversal occurs.

Let $Y$ be a random variable with $E|Y| < \infty$. Suppose $X_1$ and $X_2$ are two random variables with $X_i \in \{1, 2, \ldots, k_i\}$, where $k_i \ge 2$ ($i = 1, 2$) are positive integers. Then, for any $m \in \{1, \ldots, k_1\}$,

$$E(Y \mid X_1 = m) = \sum_{j=1}^{k_2} E(Y \mid X_1 = m, X_2 = j)\,\Pr(X_2 = j \mid X_1 = m). \tag{1}$$

Let us connect equation (1) to the average-score example in Section 1. Let $X_1 = 1$ or $2$ denote schools Alpha and Beta, and $X_2 = 1$ or $2$ denote male and female, respectively. Let $Y$ denote the score of a randomly selected 4th-grade student from the two schools. Then from Table 1 we have

$$E(Y \mid X_1 = 1, X_2 = 1) = 84,\quad E(Y \mid X_1 = 2, X_2 = 1) = 85,\quad E(Y \mid X_1 = 1, X_2 = 2) = 80,\quad E(Y \mid X_1 = 2, X_2 = 2) = 81.$$

It is obvious that

$$E(Y \mid X_1 = 1, X_2 = j) < E(Y \mid X_1 = 2, X_2 = j), \qquad j = 1, 2. \tag{2}$$

Equation (2) shows that both male and female students in School Beta have higher scores. When we calculate the average score of each school, we need to account for the gender composition. From (1), the average score of each school is a weighted average of the male and female scores, with weights

$$\Pr(X_2 = 1 \mid X_1 = 1) = 0.8,\quad \Pr(X_2 = 2 \mid X_1 = 1) = 0.2,\quad \Pr(X_2 = 1 \mid X_1 = 2) = 0.2,\quad \Pr(X_2 = 2 \mid X_1 = 2) = 0.8.$$

Using (1), we find that

$$E(Y \mid X_1 = 1) = 84 \times 0.8 + 80 \times 0.2 = 83.2 \;>\; 81.8 = 85 \times 0.2 + 81 \times 0.8 = E(Y \mid X_1 = 2). \tag{3}$$

A close look at the data shows that the gender distribution plays the key role in reversing the inequalities from (2) to (3). If the inequalities in (2) hold and the two schools have the same proportion of male students, the average score in Beta will be higher than that in Alpha. In this example, gender is called a confounder in the causal inference literature. Although the new instruction method increases the scores of both boys and girls, the imbalance in the gender distribution between the two schools can confound the effect of the new method. This has been widely studied in the causal inference literature on observational studies, especially in epidemiology.

The example above shows how Simpson's paradox occurs with continuous outcomes. In the following two sections, we illustrate how the same phenomenon can occur in categorical data and time-to-event data.
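A small Python sketch can make the weighted-average identity in equation (1) and the original fraction form concrete. The helper names `school_mean` and `is_simpson_reversal` are hypothetical, introduced only for illustration. The sketch recomputes the school averages from the Table 1 counts and checks the fraction form, taking $s_{ij}$ to be the total score of cell $(i, j)$ (count times average). Exact `Fraction` arithmetic avoids floating-point noise, and groups are indexed from 0 in the code.

```python
from fractions import Fraction

# Table 1 as (school, gender) -> (number of students, average score).
# Schools: 1 = Alpha, 2 = Beta; genders: 1 = male, 2 = female.
TABLE1 = {
    (1, 1): (80, 84), (1, 2): (20, 80),
    (2, 1): (20, 85), (2, 2): (80, 81),
}


def school_mean(table, school):
    """E(Y | X1 = school) via equation (1): the gender-specific means
    weighted by Pr(X2 = j | X1 = school), estimated from the counts."""
    cells = [(cnt, mean) for (sch, _), (cnt, mean) in table.items() if sch == school]
    n_school = sum(cnt for cnt, _ in cells)
    return sum(Fraction(cnt, n_school) * mean for cnt, mean in cells)


def is_simpson_reversal(s, n):
    """Fraction form: s[i][j] / n[i][j], i = group, j = subgroup (0-based).
    True when group 0 is below group 1 in every subgroup
    but above it once the subgroups are pooled."""
    within = all(
        Fraction(s[0][j], n[0][j]) < Fraction(s[1][j], n[1][j])
        for j in range(len(s[0]))
    )
    pooled = Fraction(sum(s[0]), sum(n[0])) > Fraction(sum(s[1]), sum(n[1]))
    return within and pooled


# Total score s_ij = n_ij * average_ij puts Table 1 into the fraction form.
s = [[TABLE1[(i, j)][0] * TABLE1[(i, j)][1] for j in (1, 2)] for i in (1, 2)]
n = [[TABLE1[(i, j)][0] for j in (1, 2)] for i in (1, 2)]

print(float(school_mean(TABLE1, 1)), float(school_mean(TABLE1, 2)))  # 83.2 81.8
print(is_simpson_reversal(s, n))  # True
```

Here `s` is `[[6720, 1600], [1700, 6480]]`: Alpha trails Beta within each gender (84 < 85, 80 < 81) yet leads after pooling (8320/100 > 8180/100).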

3. Simpson’s Paradox in Categorical Data Analysis

Suppose a certain disease can be classified as less severe or more severe. Patients can go to one of two hospitals for treatment: a better hospital or a normal hospital. The outcome of the treatment is binary: success or failure. Consider the data in Table 2. For less severe patients, the success rate in the better hospital is higher than in the normal hospital (18/20 = 90% versus 64/80 = 80%). The same holds for more severe patients (32/80 = 40% versus 4/20 = 20%). We construct three more tables from Table 2. Table 3 is the cross-classification of treatment and outcome. The overall success rates of the two hospitals are 50/100 and 68/100, respectively. This seems to show that the success rate in the normal hospital is higher than in the better hospital, which is not what we expected.

Table 2.

Success rate of the treatment outcome in different severity of the disease

Hospital	Severity	Success	Failure	Total
Better	Less severe	18	2	20
Better	More severe	32	48	80
Normal	Less severe	64	16	80
Normal	More severe	4	16	20

Table 3.

Summary of the cross-classification of the treatment and outcome

Treatment	Success	Failure	Total
Better	50	50	100
Normal	68	32	100

Table 4 is the cross-classification of severity and outcome. The success rates of less severe and more severe patients are 82/100 and 36/100, respectively. As expected, less severe patients are much more likely to be treated successfully.

Table 4.

Summary of the cross-classification of the severity and outcome

Severity	Success	Failure	Total
Less severe	82	18	100
More severe	36	64	100

Table 5 is the cross-classification of treatment and severity. The proportion of more severe patients is much higher in the better hospital (80/100) than in the normal hospital (20/100).

Table 5.

Summary of the cross-classification of the treatment and severity

Treatment	Less severe	More severe	Total
Better	20	80	100
Normal	80	20	100
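Tables 3, 4, and 5 are all margins of Table 2. A minimal Python sketch (the dictionary layout and the helper `collapse` are illustrative choices, not from the paper) derives them by summing the three-way table over one factor at a time.

```python
# Table 2 as (hospital, severity) -> (successes, failures).
TABLE2 = {
    ("better", "less"): (18, 2),
    ("better", "more"): (32, 48),
    ("normal", "less"): (64, 16),
    ("normal", "more"): (4, 16),
}


def collapse(table, keep):
    """Sum the hospital x severity x outcome table over one factor.
    keep=0 keeps the hospital (Table 3); keep=1 keeps the severity (Table 4)."""
    out = {}
    for key, (succ, fail) in table.items():
        s0, f0 = out.get(key[keep], (0, 0))
        out[key[keep]] = (s0 + succ, f0 + fail)
    return out


table3 = collapse(TABLE2, keep=0)  # treatment x outcome
table4 = collapse(TABLE2, keep=1)  # severity x outcome
# Table 5 (treatment x severity) ignores the outcome: patients per cell.
table5 = {key: succ + fail for key, (succ, fail) in TABLE2.items()}

print(table3)  # {'better': (50, 50), 'normal': (68, 32)}
print(table4)  # {'less': (82, 18), 'more': (36, 64)}
print(table5)
```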

Let $O$ denote the outcome, with possible values $s$ ("success") and $f$ ("failure"); $T$ the treatment, with possible values $b$ ("better") and $n$ ("normal"); and $S$ the severity, with possible values $l$ ("less severe") and $m$ ("more severe"). Note that, for $k \in \{b, n\}$,

$$\Pr\{O=s \mid T=k\} = \Pr\{O=s \mid T=k, S=l\}\,\Pr\{S=l \mid T=k\} + \Pr\{O=s \mid T=k, S=m\}\,\Pr\{S=m \mid T=k\}.$$

Although Table 2 makes clear that $\Pr\{O=s \mid T=b, S=l\} > \Pr\{O=s \mid T=n, S=l\}$ and $\Pr\{O=s \mid T=b, S=m\} > \Pr\{O=s \mid T=n, S=m\}$, Table 3 shows that $\Pr\{O=s \mid T=b\} < \Pr\{O=s \mid T=n\}$. The reason is that the success rate of more severe patients is much lower than that of less severe patients (Table 4), and the proportion of more severe patients in the better hospital is much higher than in the normal hospital (Table 5). This imbalance reverses the direction of the treatment effect.
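The decomposition above can be checked numerically with a short Python sketch. The helper names are hypothetical. Because the paper lists odds ratios among its keywords, the sketch also computes (as our own illustration) the severity-specific and pooled odds ratios of success for the better versus the normal hospital, which show the same reversal on the odds-ratio scale.

```python
from fractions import Fraction

# Table 2 as (hospital, severity) -> (successes, failures).
TABLE2 = {
    ("better", "less"): (18, 2),
    ("better", "more"): (32, 48),
    ("normal", "less"): (64, 16),
    ("normal", "more"): (4, 16),
}
SEVERITIES = ("less", "more")


def success_rate(hospital, severity):
    """Pr{O = s | T = hospital, S = severity}, from Table 2."""
    succ, fail = TABLE2[(hospital, severity)]
    return Fraction(succ, succ + fail)


def severity_weight(hospital, severity):
    """Pr{S = severity | T = hospital}, the row proportions of Table 5."""
    total = sum(sum(TABLE2[(hospital, sev)]) for sev in SEVERITIES)
    return Fraction(sum(TABLE2[(hospital, severity)]), total)


def marginal_success(hospital):
    """Pr{O = s | T = hospital} by the law of total probability over S."""
    return sum(success_rate(hospital, sev) * severity_weight(hospital, sev)
               for sev in SEVERITIES)


def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]]:
    rows better/normal, columns success/failure."""
    return Fraction(a * d, b * c)


print(marginal_success("better"), marginal_success("normal"))  # 1/2 17/25
print(odds_ratio(18, 2, 64, 16),   # less severe: 9/4
      odds_ratio(32, 48, 4, 16),   # more severe: 8/3
      odds_ratio(50, 50, 68, 32))  # pooled (Table 3): 8/17
```

Both severity-specific odds ratios exceed 1, while the pooled odds ratio is below 1.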

4. Simpson’s Paradox in Time-to-event Data Analysis

Simpson's paradox may also occur in time-to-event data. Suppose we have two treatment groups, denoted by $X_1 = 1$ (treatment) or $X_1 = 0$ (control), and two age groups, $X_2 = 1$ if age is ≤ 65 years and $X_2 = 0$ if age is > 65 years. Suppose the hazard function of the lifetime $T$ of patients, given treatment and age category, is $\lambda(t \mid X_1 = x_1, X_2 = x_2)$, and that the distribution of age categories, $\Pr(X_2 = x_2 \mid X_1 = x_1)$, differs between the treatment groups. Within each age category, the hazard function of the treatment group is always below that of the control group. Figure 1 shows the hazard functions of the two treatment groups within each age category; clearly, treatment does better than control.

Figure 1.

Hazard functions in different age categories

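The marginal hazard function of each treatment group is obtained by mixing over the age categories:

$$\lambda(t \mid X_1 = x_1) = \frac{\sum_{x_2 \in \{0, 1\}} \lambda(t \mid x_1, x_2)\, S(t \mid x_1, x_2)\, \Pr(X_2 = x_2 \mid X_1 = x_1)}{\sum_{x_2 \in \{0, 1\}} S(t \mid x_1, x_2)\, \Pr(X_2 = x_2 \mid X_1 = x_1)},$$

where $S(t \mid x_1, x_2)$ is the survival function corresponding to $\lambda(t \mid x_1, x_2)$. Figure 2 shows the marginal hazard functions of the two treatment groups after integrating out age. In Figure 1, the hazard ratio of treatment versus control is constant within each age category; however, the marginal hazard ratio is no longer constant. This can cause confusion, especially if follow-up is censored at some time point. In that case, the estimated hazard function of the treatment group may be much higher than that of the control group, which is not what one would expect.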
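The following Python sketch illustrates the mechanism under purely hypothetical assumptions: the numbers are not the hazard functions behind Figures 1 and 2. It assumes constant (exponential) hazards, a treatment-versus-control hazard ratio of 0.5 within each age group, and an age mix in which 80% of the treatment group but only 20% of the control group is over 65. The marginal hazard is computed as the age-mixed density divided by the age-mixed survival function.

```python
import math

# HYPOTHETICAL constant hazards, (X1, X2) -> rate; within each age group
# the treatment hazard is half the control hazard.
HAZARD = {
    (1, 1): 0.05, (0, 1): 0.10,  # age <= 65
    (1, 0): 0.50, (0, 0): 1.00,  # age > 65
}
# HYPOTHETICAL age mix Pr(X2 = x2 | X1 = x1): the treated are mostly older.
AGE_MIX = {
    1: {1: 0.2, 0: 0.8},
    0: {1: 0.8, 0: 0.2},
}


def marginal_hazard(x1, t):
    """lambda(t | X1 = x1) = f(t | x1) / S(t | x1), mixing over age groups."""
    dens = surv = 0.0
    for x2, p in AGE_MIX[x1].items():
        lam = HAZARD[(x1, x2)]
        s = math.exp(-lam * t)  # S(t | x1, x2) for a constant hazard
        dens += p * lam * s     # contribution to the density f(t | x1)
        surv += p * s           # contribution to the survival S(t | x1)
    return dens / surv


for t in (0.0, 1.0, 5.0, 20.0, 50.0):
    hr = marginal_hazard(1, t) / marginal_hazard(0, t)
    print(f"t = {t:5.1f}  marginal HR (treatment / control) = {hr:.3f}")
```

In this hypothetical setup the marginal hazard ratio starts above 1 (0.41 / 0.28 at t = 0) and only approaches the within-age ratio of 0.5 late in follow-up, as the older, high-hazard patients leave the risk sets. An analysis censored early would therefore see the treatment group doing worse.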

Figure 2.

Marginal hazard functions of two treatment groups

5. Conclusion

Simpson's paradox is very common in observational studies because of confounding. In this paper, we used examples to show how this phenomenon can occur with continuous, categorical, and survival outcomes. If confounding effects are not addressed appropriately, conclusions from statistical analyses may be completely wrong. The study of Simpson's paradox, or more generally of the effects of confounders, is at the core of the theory of causal inference. This theory is especially relevant in the era of big data, since most such data are observational and confounders can obscure the relationships of interest if they are not addressed.
