Eva Ascarza, Ayelet Israeli. Marketing Unit, Harvard Business School, Harvard University, Boston, MA 02163.
Abstract
Significance
Decision makers now use algorithmic personalization for resource allocation decisions in many domains (e.g., medical treatments, hiring decisions, product recommendations, or dynamic pricing). An inherent risk of personalization is disproportionate targeting of individuals from certain protected groups. Existing solutions that firms use to avoid this bias often do not eliminate the bias and may even exacerbate it. We propose BEAT (bias-eliminating adapted trees) to ensure balanced allocation of resources across individuals, guaranteeing both group and individual fairness, while still leveraging the value of personalization. We validate our method using simulations as well as an online experiment with N = 3,146 participants. BEAT is easy to implement in practice, has desirable scalability properties, and is applicable to many personalization problems.
In the era of algorithmic personalization, resources are often allocated based on individual-level predictive models. For example, financial institutions allocate loans based on individuals’ expected risk of default, advertisers display ads based on users’ likelihood to respond to the ad, hospitals allocate organs to patients based on their chances of survival, and marketers allocate price discounts based on customers’ propensity to respond to such promotions. The rationale behind these practices is to leverage differences across individuals, such that a desired outcome can be optimized via personalized or targeted interventions. For example, a financial institution would reduce risk of default by approving loans to individuals with the lowest risk of defaulting, advertisers would increase profits when targeting ads to users who are most likely to respond to those ads, and so forth.
There are, however, individual differences that firms may not want to leverage for personalization, as they might lead to disproportionate allocation to a specific group. These individual differences may include gender, race, sexual orientation, or other protected attributes. In fact, several countries have instituted laws against discrimination based on protected attributes in certain domains (e.g., in voting rights, employment, education, and housing). However, discrimination in other domains is lawful but is often still perceived as unfair or unacceptable (1). 
For example, it is widely accepted that ride-sharing companies set higher prices during peak hours, but these companies were criticized when their prices were found to be systematically higher in non-White neighborhoods compared with White areas (2).
Intuitively, a potentially attractive solution to this broad concern of discrimination based on protected attributes may be to remove the protected attributes from the data and to generate a personalized allocation policy based on the predictions obtained from models trained using only the unprotected attributes. However, such an approach would not solve the problem, as other variables remaining in the dataset might be related to the protected attributes and would therefore still generate bias. Interestingly, as we show in our empirical section, there are cases in which removing protected attributes from the data can actually increase the degree of discrimination on the protected attributes (i.e., a firm that chooses to exclude protected attributes from its database might create a greater imbalance). This finding is particularly relevant today because companies are increasingly announcing their plans to stop using protected attributes for fear of engaging in discriminatory practices. In our empirical section, we show the conditions under which this finding applies in practice.
Personalized allocation algorithms typically use data as input to a two-stage model. First, the data are used to predict accurate outcomes based on the observed variables in the data (the “inference” stage). Then, these predictions are used to create an optimal targeting policy with a particular objective function in mind (the “allocation” stage). The (typically unintended) biases in the policies might occur because the protected attributes are often correlated with the predicted outcomes. 
Thus, using either the protected attributes themselves or variables that are correlated with those protected attributes in the inference stage may generate a biased allocation policy.
This biased personalization problem could be in principle solved using constrained optimization, focusing on the allocation stage of the algorithm (e.g., refs. 3 and 4). Using this approach, a constraint is added to the optimization problem such that individuals who are allocated to receive treatment (the “targeted” group) are not systematically different in their protected attributes from those who do not receive treatment. Although methods for constrained optimization problems often work well in low dimensions, they are sensitive to the curse of dimensionality (e.g., if there are multiple protected attributes).
Another option would be to focus on the data that are fed to the algorithm and “debias” the data before they are used: that is, transform the unprotected variables such that they become independent of the protected attributes and use the resulting data in the two-stage model (e.g., refs. 5 and 6). While doing so guarantees pairwise independence of each variable from the protected attributes, it is difficult to account for underlying dependencies between the protected attributes and interactions of the different variables (6). Most importantly, while these methods are generally effective at achieving group fairness (statistical parity), they often harm individual fairness (7–9). Finally, debiasing methods require the decision maker to collect protected attributes at all times, both when estimating the optimal policy and when applying that policy to new individuals. 
A more desirable approach would be to create a mapping between unprotected attributes and policy allocations that not only is fair (both at the group level and at the individual level) but can also be applied without the need to collect protected attributes for new individuals.
In this paper, we depart from those approaches and propose an approach that addresses the potential bias at the inference stage (rather than pre- or postprocessing the data or adding constraints to the allocation). Our focus is to infer an object of interest, “conditional balanced targetability” (CBT), that measures the adjusted treatment effect predictions, conditional on a set of unprotected variables. Essentially, we create a mapping from unprotected attributes to a continuous targetability score that leads to balanced allocation of resources with respect to the protected attributes. Previous papers that modified the inference stage (e.g., refs. 10–14) are limited in their applicability because they typically require additional assumptions and restrictions and are limited in the type of classifiers they apply to. The benefits of our approach are noteworthy. First, allocating resources based on CBT scores does, by design, achieve both group and individual fairness. Second, we leverage computationally efficient methods for inference that are easy to implement in practice and also have desirable scalability properties. Third, out-of-sample predictions for CBT do not require protected attributes as an input. In other words, firms or institutions seeking allocation decisions that do not discriminate on protected attributes only need to collect the protected attributes when calibrating the model. 
Once the model is estimated, future allocation decisions can be based on (out-of-sample) predictions, which only require the unprotected attributes of the new individuals.
We propose a practical solution where the decision maker can leverage the value of personalization without the risk of disproportionately targeting individuals based on protected attributes. The solution, which we name BEAT (bias-eliminating adapted trees), generates individual-level predictions that are independent of any preselected protected attributes. Our approach builds on generalized random forests (GRFs) (15, 16), which are designed to efficiently estimate heterogeneous outcomes. Our method preserves most of the core elements of GRF, including the use of forests as a type of adaptive nearest neighbor estimator and the use of gradient-based approximations to specify the tree-split point. Importantly, we depart from GRF in how we select the optimal split for partitioning. Rather than using divergence between children nodes as the primary objective of any partition, the BEAT algorithm combines two objectives (heterogeneity in the outcome of interest and homogeneity in the protected attributes) when choosing the optimal split. Essentially, the BEAT method only identifies individual differences in the outcome of interest (e.g., heterogeneity in response to price) that are homogeneously distributed in the protected attributes (e.g., race). As a result, not only will the protected attributes be equally distributed across policy allocations (group fairness), but the method will also ensure that individuals with the same unprotected attributes have the same allocation (individual fairness).
Using a variety of simulated scenarios, we show that our method exhibits promising empirical performance. Specifically, BEAT reduces the unintended bias while leveraging the value of personalized targeting. Further, BEAT allows the decision maker to quantify the trade-off between performance and discrimination. 
We also examine the conditions under which the intuitive approach of removing protected attributes from the data alleviates or increases the bias. Finally, we apply our solution to a marketing context in which a firm decides which customers to target with a discount coupon. Using an online sample of n = 3,146 participants, we find strong evidence of relationships between “protected” and “unprotected” attributes in real data. Moreover, applying personalized targeting to these data leads to significant bias against a protected group (in our case, older populations) due to these underlying correlations. Finally, we demonstrate that BEAT mitigates the bias, generating a balanced targeting policy that does not discriminate against individuals based on protected attributes.
Our contribution fits broadly into the vast literature on fairness and algorithmic bias (e.g., refs. 2 and 17–22). Most of this literature has focused on uncovering biases and their causes as well as on conceptualizing the algorithmic bias problem and potential solutions for researchers and practitioners. We complement this literature by providing a practical solution that prevents algorithmic bias that is caused by underlying correlations. Our work also builds on the growing literature on treatment personalization (e.g., refs. 23–28). This literature has mainly focused on the estimation of heterogeneous treatment effects and designing targeting rules accordingly, but it has largely ignored any fairness or discrimination considerations in the allocation of treatment.
Illustrative Example
We begin with a simulated example to illustrate how algorithmic personalization can create unintended biases. Consider a marketing context in which a firm runs an experiment to decide which customers should receive the treatment (e.g., coupon) in the future. That is, the results of the experiment will determine which customers are most likely to be impacted by future treatment. We recreate this situation by simulating the behavior of individuals (in training and test sets). We use $Y_i$ to denote the outcome (e.g., profitability) of individual i, $Z_i$ to denote the vector of protected attributes (e.g., race, gender), $X_i$ to denote the set of unprotected variables that the firm has collected about the individual (e.g., past purchases, channel of acquisition), and $W_i$ to denote whether customer i has received the treatment. The main patterns to highlight in the simulated data and behavior are that the protected attribute Z1 and X2 are positively correlated; that individuals respond differently to treatment depending on X1, X3, and Z1; and that all other variables have no (direct) impact on the treatment effect (τ) or on the outcome of interest (Y).
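The qualitative patterns listed above can be mimicked with a small simulation. The sketch below is only illustrative: the functional forms, coefficients, treatment probability, and the `simulate` helper are assumptions chosen to reproduce those patterns (Z1 positively correlated with X2; treatment effect driven by X1, X3, and Z1), not the data-generating process used in the paper.

```python
import math
import random


def simulate(n, seed=0):
    """Simulate n individuals with the qualitative patterns of the example.

    All coefficients and distributions here are hypothetical.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Four continuous unprotected attributes X1..X4.
        x = [rng.gauss(0.0, 1.0) for _ in range(4)]
        # Protected attribute Z1 is more likely to equal 1 when X2 is high
        # (assumed logistic link), creating a positive correlation.
        p_z1 = 1.0 / (1.0 + math.exp(-6.0 * x[1]))
        z1 = 1 if rng.random() < p_z1 else 0
        # Randomized treatment with assumed probability 0.5.
        w = 1 if rng.random() < 0.5 else 0
        # Treatment effect depends on X1, X3, and Z1 only (assumed form).
        tau = max(2.0 * x[0], 0.0) - 3.0 * x[2] + z1
        # Outcome: treatment effect when treated, plus noise.
        y = tau * w + rng.gauss(0.0, 1.0)
        rows.append({"x": x, "z1": z1, "w": w, "tau": tau, "y": y})
    return rows
```

In data generated this way, X2 has no direct effect on the treatment effect, yet it carries information about Z1, which is the mechanism behind the indirect relationship discussed in the cases that follow.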
Case 1: Maximizing Outcomes.
A firm maximizing the outcome (Y) via allocation of treatment W would estimate τ, the conditional average treatment effect (CATE), as a function of the observables X and Z and would allocate treatment to the units with the highest predicted CATE ($\hat{\tau}$). We use GRF as developed in ref. 16 to estimate the model and present the results (on test data) in Fig. 1. Fig. 1A shows the distribution of $\hat{\tau}$ across the population, showing a rich variation across individuals. Fig. 1 B–E presents the marginal effect of each of the variables X1 through X4 on $\hat{\tau}$. GRF captures the true relationships (or lack thereof) between the observables and the treatment effect very well. At first glance, the relationship shown in Fig. 1C is surprising given that, by construction, X2 does not impact τ. However, the correlation between Z1 and X2 causes this indirect relationship, which is revealed in Fig. 1.
Fig. 1.
Estimated CATE using GRF including protected and unprotected attributes. A shows the distribution of CATE; B–E show the relationships between unprotected characteristics X1 through X4 (x axis), respectively, and predicted CATE (y axis); and F shows the distribution of CATE by different values of the protected attribute Z1.
Evidence of “unintended bias” is presented in Fig. 1F, where we plot the distribution of $\hat{\tau}$ by the different values of the binary protected attribute Z1. It clearly shows the systematic difference in treatment effect between customers with Z1 = 0 and Z1 = 1, causing a disproportional representation of customers with Z1 = 1 in the targeted group. For example, if the firm were to target 50% of its customers in decreasing order of $\hat{\tau}$, 12 times more individuals with Z1 = 1 would be targeted compared with Z1 = 0 individuals (i.e., the ratio is 12:1), and the outcome would be 81% higher than that obtained from an average random policy. Specifically, to compute the relative outcome, we compute the inverse probability score (IPS) estimator of the outcome generated by each allocation policy in the test data if the firm were to target 50% of the population. We then normalize this metric with respect to the outcome generated by a random allocation, such that 0% corresponds to the outcome if the firm did not leverage any personalization. (We chose a fixed proportion of individuals to be targeted [i.e., 50%] to keep the “cost” of the intervention constant across scenarios. Our results are robust to choosing different levels of targeting.)
Finally, to explore the degree of individual fairness of the policy, we compute the fraction of individuals for whom the allocation would change had their protected attributes been different. This metric, which we denote as Δ Policy, is inspired by ref. 9. Essentially, for each individual, we compare their allocation with that of their “almost identical twin” who is identical in all unprotected attributes but different on the protected attributes. 
We find that Δ Policy = 83%, meaning that 83% of individuals would have been assigned to a different policy allocation had their protected attributes been different.
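The three quantities used in this case (the targeting ratio, the IPS estimate of a policy's outcome, and Δ Policy) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the treatment is assumed to have been randomized with a known probability `p_treat` (0.5 here), and the protected attribute is assumed to be binary so that the "twin" is obtained by flipping it.

```python
def target_top_fraction(scores, fraction=0.5):
    """Return a 0/1 policy that targets the top `fraction` of units by score."""
    n_target = int(round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    policy = [0] * len(scores)
    for i in order[:n_target]:
        policy[i] = 1
    return policy


def ips_value(policy, w, y, p_treat=0.5):
    """IPS estimate of the mean outcome if `policy` were deployed.

    Only units whose observed treatment w matches the policy contribute,
    reweighted by the (assumed known) probability of that treatment.
    """
    total = 0.0
    for pi, wi, yi in zip(policy, w, y):
        if pi == wi:
            prob = p_treat if wi == 1 else 1.0 - p_treat
            total += yi / prob
    return total / len(y)


def targeting_ratio(policy, z):
    """Targeted units with z == 1 divided by targeted units with z == 0."""
    n1 = sum(1 for pi, zi in zip(policy, z) if pi == 1 and zi == 1)
    n0 = sum(1 for pi, zi in zip(policy, z) if pi == 1 and zi == 0)
    return n1 / n0 if n0 else float("inf")


def delta_policy(score_fn, x, z, threshold):
    """Share of units whose allocation flips when binary z is flipped.

    `score_fn(x_i, z_i)` is any fitted scoring function; x is held fixed,
    so each unit is compared with its "almost identical twin".
    """
    flips = 0
    for xi, zi in zip(x, z):
        own = score_fn(xi, zi) >= threshold
        twin = score_fn(xi, 1 - zi) >= threshold
        flips += own != twin
    return flips / len(x)
```

Dividing the IPS value of a policy by the IPS value of a random allocation and subtracting one gives the relative outcome reported in the text (for example, 81% in case 1).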
Case 2: Maximizing Outcomes and Removing Protected Attributes.
A firm might try to avoid discrimination by removing protected attributes from the data. Naturally, Δ Policy = 0% in this case because the protected variables are excluded from the whole analysis, and therefore, a change in protected attributes would not change the policy allocation. To recreate that scenario, we replicate the analysis in case 1, but we exclude all Z variables from the policy analysis. The results are presented in Fig. 2 and are remarkably similar to those in which protected attributes are used. We direct attention to Fig. 2F, which shows the persistent systematic differences between individuals with Z1 = 0 and Z1 = 1, even when protected attributes Z were excluded from the estimation. Here, the presence of X2 causes the unbalanced targeting, which in this case, translates to a ratio of 5:1 and an increase of 71% with respect to the average random policy.
Fig. 2.
Estimated CATE using GRF including unprotected attributes only. A shows the distribution of CATE; B–E show the relationships between unprotected characteristics X1 through X4 (x axis), respectively, and predicted CATE (y axis); and F shows the distribution of CATE by different values of the protected attribute Z1.
Case 3: Maximizing Outcomes and Balance Jointly.
We propose to pursue both objectives jointly by using BEAT. The key idea of our approach is to be selective about which differences in the data are “usable” for policy allocation. For example, even though empirically, there is a relationship between the estimated treatment effect and X2, BEAT will not leverage individual differences on X2 when estimating the targetability scores because that would necessarily lead to unbalanced targeting with respect to Z1. Importantly, our approach does not require the user to know which unprotected variables are related to the protected attributes or the shape and strength of those relationships. Instead, it captures any kind of relationship between them. The only information that the user needs a priori is the identity of the attributes that are meant to be protected.
Fig. 3 shows the results of using BEAT. Starting from Fig. 3F, it is remarkable how the distribution of CBT is almost identical for the different values of Z1, showing BEAT’s ability to produce balanced outcomes. The ratio of targeted individuals by Z1 is 1:1, and the outcome is a 44% increase over that of the average random policy. Although Fig. 3 B and D show a clear relationship between the unprotected attributes and CBT, Fig. 3C highlights that the proposed algorithm does not capture differences by unprotected variables (in this case, X2) that lead to bias in targeting.
Fig. 3.
Estimated CBT using BEAT. A shows the distribution of CBT; B–E show the relationships between unprotected characteristics X1 through X4 (x axis), respectively, and predicted CBT (y axis); and F shows the distribution of CBT by different values of the protected attribute Z1.
Finally, looking at Fig. 3A, the distribution of CBT is narrower than those obtained for CATE in Figs. 1 and 2. This is a direct result of how BEAT leverages heterogeneity. Because the algorithm only extracts differences that are unrelated to protected attributes, it naturally captures less variability in the outcome of interest than other algorithms that are designed to achieve the single objective of exploring heterogeneity. Accordingly, the outcome in case 3 is lower than in the previous cases.
Regarding individual fairness, even though BEAT incorporates Z in the analysis (to ensure a balanced targetability), the method is designed such that the CBT scores do not depend on the protected attributes and therefore, will always achieve individual fairness (i.e., Δ Policy = 0%).
Methods
Overview.
The primary goal of BEAT is to identify (observed) heterogeneity across individuals that is unrelated to their protected attributes. That heterogeneity is then used to implement targeted allocations that are, by construction, balanced on the protected attributes. Furthermore, we want our method to be as general as possible, covering a wide variety of personalization tasks. In the illustrative example, we were interested in identifying differences in the effect of a treatment W on an outcome Y. This is commonly the case for personalized medicine, targeted firm interventions, or personalized advertising. In other cases (e.g., approving a loan based on the probability of default), the goal would be to extract heterogeneity on the outcome itself, Y (e.g., Y indicates whether the individual will default in the future).
We build on GRFs (15, 16), which are designed to efficiently estimate heterogeneous outcomes. Generally speaking, GRF uses forests as a type of adaptive nearest neighbor estimator and achieves efficiency by using gradient-based approximations to specify the tree-split point. (Ref. 16 has details on the method, and https://github.com/grf-labs/grf has details on its implementation.) The key difference between GRF and our method is the objective when partitioning the data to build the trees (and therefore, forests). 
Rather than maximizing divergence in the object of interest (either treatment effect heterogeneity for the causal forest or the outcome variable for prediction forests) between the children nodes, BEAT maximizes a quantity we define as balanced divergence (BD), which combines the divergence of the outcome of interest (object from GRF) with a penalized distance between the empirical distribution of the protected attributes across the children nodes.
Specifically, in the BEAT algorithm, trees are formed by sequentially finding the split that maximizes the BD object:
$$\mathrm{BD}(C_1, C_2) = \underbrace{\hat{\Delta}(C_1, C_2)}_{\text{GRF split criterion}} - \gamma \, \mathrm{Dist}\left(\bar{Z}_{C_1} \mid X_{k,s},\; \bar{Z}_{C_2} \mid X_{k,s}\right)$$
where C1 and C2 denote the children nodes in each split s; $\hat{\Delta}(C_1, C_2)$ denotes the divergence in the outcome of interest (as defined in equation 9 in ref. 16; for brevity, we refer to the notation and expressions introduced in ref. 16); γ is a nonnegative scalar denoting the balance penalty, which can be adjusted by the researcher (a trade-off discussion is given below); Dist is a distance function (e.g., Euclidean norm); and $X_{k,s}$ is the splitting point for C1 and C2 at dimension k. When selecting splits to build a tree, ref. 16 optimizes the first term of the BD object, indicated with “GRF split criterion,” and BEAT essentially adds a second term to the equation, introducing a penalty when there are differences in the Z attributes between the children nodes. Therefore, maximizing BD when sequentially splitting the data ensures that all resulting trees (and hence, forests) are balanced with respect to the protected attributes Z.
Once trees are grown by maximizing BD, we proceed exactly as GRF and use the trees to calculate the similarity weights used to estimate the object of interest. 
More formally, indexing the trees by $b = 1, \dots, B$ and defining $L_b(x)$ as the set of training observations falling in the same leaf as x in tree b, we calculate balanced weights as the frequency with which the ith observation falls into the same leaf as x, and we calculate the CBT score by combining the weights and the local estimate of the quantity estimated by GRF (i.e., the treatment effect heterogeneity [τ] for causal forest or expected outcome variable [μ] for prediction forests). For example, in the simplest case of a prediction forest, the CBT score can be expressed in terms of the balanced weights and a leave-one-out estimator.
Note that the only difference between the CBT scores provided by BEAT and the estimates provided by GRF is the similarity weights: the weights from BEAT are balanced with respect to Z, whereas those from GRF do not depend on Z. For the special case in which variables X and Z are independent, maximizing BD corresponds to maximizing the GRF divergence term, exactly as GRF does, and therefore, the CBT score corresponds to the outcome of GRF: $\hat{\tau}(x)$ for causal forests and $\hat{\mu}(x)$ for regression forests.
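The balanced-divergence idea described above can be illustrated with a deliberately simplified split search. In this sketch, the outcome term is a CART-style between-child divergence in the mean of y, used here only as a stand-in for GRF's gradient-based criterion (equation 9 in ref. 16), and Dist is the Euclidean distance between the child means of Z. The function names and the single-feature search are assumptions made for illustration; this is not the BEAT implementation.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def balanced_divergence(y, z_rows, left_mask, gamma):
    """Outcome divergence between children minus gamma times Z imbalance.

    y: outcomes; z_rows: one list of protected attributes per unit;
    left_mask: True for units sent to the left child.
    """
    left = [i for i, m in enumerate(left_mask) if m]
    right = [i for i, m in enumerate(left_mask) if not m]
    n, n1, n2 = len(y), len(left), len(right)
    # Stand-in outcome term (NOT the GRF criterion): size-weighted squared
    # difference in child means of y.
    diff = _mean([y[i] for i in left]) - _mean([y[i] for i in right])
    outcome_term = (n1 * n2 / n ** 2) * diff ** 2
    # Penalty term: Euclidean distance between child means of Z.
    dims = range(len(z_rows[0]))
    z_left = [_mean([z_rows[i][d] for i in left]) for d in dims]
    z_right = [_mean([z_rows[i][d] for i in right]) for d in dims]
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_left, z_right)))
    return outcome_term - gamma * dist


def best_split(x_col, y, z_rows, gamma):
    """Return (threshold, BD) of the best split on one feature column."""
    best = None
    for s in sorted(set(x_col))[:-1]:  # skip max so both children are nonempty
        mask = [xi <= s for xi in x_col]
        bd = balanced_divergence(y, z_rows, mask, gamma)
        if best is None or bd > best[1]:
            best = (s, bd)
    return best
```

With `gamma = 0` the search reduces to the unpenalized criterion; as `gamma` grows, splits that separate units by their protected attributes score lower and are less likely to be selected.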
Efficiency/Balance Trade-Off.
The penalty parameter γ in the BD object determines how much weight the algorithm gives to balance in protected attributes. Running BEAT while setting γ = 0 would yield exactly the same outcomes as running GRF without protected attributes. As γ increases, the imbalance in the protected attributes would be reduced, but the main outcome to optimize may be lower as well. Naturally, introducing an additional constraint in contexts where bias exists would necessarily reduce efficiency while improving fairness. Because the penalty parameter in BEAT is flexible, one can easily explore the efficiency/fairness trade-off of any particular context by adjusting the value of γ. Fig. 4 illustrates this trade-off using the illustrative example.
Fig. 4.
Efficiency/balance trade-off for an illustrative example. Efficiency is the proportion increase in outcome compared with random allocation. Imbalance is the Euclidean distance between the average value of the protected attribute (Z1) in treated and nontreated units of each policy. We compare CF-FD, causal forest (CF) without the protected attributes, and BEAT with a range of values of γ. Select values of γ are reported.
The y axis in the figure represents the efficiency of each method, measured as the value obtained by personalization compared with the average random policy as described above, while the x axis captures the imbalance in protected attribute Z1, which is the overallocation proportion of Z1 = 1.
As illustrated by the example, when using the causal forest (CF) with the full data (namely, CF-FD: CF with full data; case 1 above), both the outcome and imbalance are the highest: at 81% over random and 0.69, respectively. Using a causal forest without the protected attributes (case 2 above) yields lower results for both axes: 71% and 0.42, respectively. Finally, the red line captures the performance of BEAT when using different penalty weights γ and so, captures the efficiency/balance trade-off in this particular example. If one sets γ = 0, the outcome is identical to that of case 2. Then, as γ increases (γ is increased by 0.05 intervals in the figure), there is a reduction in both outcome and imbalance, getting to a minimum outcome of 44% when imbalance is reduced all the way to zero. Note that the relationship between efficiency and balance in this example is not linear: for some values of γ, BEAT can reduce imbalance significantly without sacrificing too much efficiency. This gives the decision maker a degree of freedom to allow a certain amount of imbalance while ensuring sufficient efficiency.
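The two axes of the trade-off can be computed directly from a policy. The sketch below follows the definitions in the Fig. 4 caption (imbalance as the Euclidean distance between the mean protected attributes of treated and nontreated units; efficiency as the proportional increase over random allocation). The function names are hypothetical, and `normalized_imbalance` simply expresses one imbalance as a percentage of a baseline.

```python
import math


def imbalance(policy, z_rows):
    """Euclidean distance between mean Z of treated and nontreated units."""
    treated = [z for p, z in zip(policy, z_rows) if p == 1]
    control = [z for p, z in zip(policy, z_rows) if p == 0]
    dims = range(len(z_rows[0]))
    mean_t = [sum(z[d] for z in treated) / len(treated) for d in dims]
    mean_c = [sum(z[d] for z in control) / len(control) for d in dims]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_t, mean_c)))


def efficiency(policy_value, random_value):
    """Proportional increase in outcome over a random allocation."""
    return policy_value / random_value - 1.0


def normalized_imbalance(value, baseline):
    """Imbalance expressed as a percentage of a baseline imbalance."""
    return 100.0 * value / baseline
```

As a check on the numbers quoted above, `normalized_imbalance(0.42, 0.69)` is roughly 61, in line with the 60.8% reported for the causal forest without protected attributes in scenario 1 of Table 1 (the small gap is presumably due to rounding of the inputs).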
Simulation.
We validate the effectiveness of BEAT by simulating a large variety of scenarios (the supporting information shows the full set of results). Table 1 presents a subset of results that illustrate BEAT’s performance compared with other personalization methods. Scenario 1 in Table 1 presents the illustrative example. Then, in scenario 2, we use this baseline example and reduce the strength of the relationship between the protected attributes (Z) and the unprotected attributes (X2). In scenarios 3 and 4, we further vary the empirical relationship between (X2, Z) and the variable of interest (τ). Specifically, we allow both (X2, Z) to directly affect τ, either with the same or the opposite sign. The table compares the performance of BEAT and other approaches, including running GRF with and without the protected attributes (cases 1 and 2 above) and running GRF using debiased data $X' = X - f(Z)$, with f being a (multivariate) random forest with X as outcome variables and Z as features.
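The debiasing benchmark amounts to residualizing each unprotected variable on the protected attributes. The sketch below illustrates the idea for a single discrete protected attribute, estimating the conditional mean of X given Z by within-group averages. That estimator is an assumption made to keep the example self-contained; the benchmark described above uses a (multivariate) random forest for f, and the function name is hypothetical.

```python
def debias_by_group_means(x_col, z_col):
    """Return x - E[x | z], with E[x | z] estimated by group means of x."""
    sums, counts = {}, {}
    for xi, zi in zip(x_col, z_col):
        sums[zi] = sums.get(zi, 0.0) + xi
        counts[zi] = counts.get(zi, 0) + 1
    # Subtract each unit's group mean; residuals average zero in every group.
    return [xi - sums[zi] / counts[zi] for xi, zi in zip(x_col, z_col)]
```

For example, `debias_by_group_means([1, 3, 10, 14], [0, 0, 1, 1])` returns `[-1.0, 1.0, -2.0, 2.0]`. Because the transformed value depends on the unit's own Z, two individuals with identical X but different Z receive different inputs, which is consistent with the nonzero Δ Policy of the debiased approach in Table 1 and with the need to observe Z when scoring new individuals.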
Table 1.
Simulated scenarios: Comparing outcomes across methods
Scenario 1 (S1): ↑ corr.; τ = f(Z). Scenario 2 (S2): ↓ corr.; τ = f(Z). Scenario 3 (S3): ↓ corr.; τ = f(Z, X2). Scenario 4 (S4): ↓ corr.; τ = f(Z, −X2). All values are percentages.
| Method | S1 Efficiency | S1 Imbalance | S1 Δ Policy | S2 Efficiency | S2 Imbalance | S2 Δ Policy | S3 Efficiency | S3 Imbalance | S3 Δ Policy | S4 Efficiency | S4 Imbalance | S4 Δ Policy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CF-FD | 81.2 | 100.0 | 82.8 | 78.7 | 100.0 | 82.6 | 74.6 | 100.0 | 28.8 | 76.3 | 100.0 | 1.6 |
| CF-NP | 71.1 | 60.8 | 0.0 | 50.8 | 5.6 | 0.0 | 71.8 | 30.3 | 0.0 | 76.7 | 112.0 | 0.0 |
| Debiased | 45.8 | 0.3 | 13.5 | 43.0 | 0.1 | 8.9 | 59.9 | 0.6 | 24.3 | 75.0 | 0.8 | 26.7 |
| BEAT* | 44.5 | 0.1 | 0.0 | 42.1 | 0.1 | 0.0 | 44.8 | 0.1 | 0.0 | 45.4 | 0.5 | 0.0 |
Results comparing CF-FD, CF-NP, debiased, and BEAT are reported. Debiased uses regression forests to debias the data. Additional benchmarks are reported in the supporting information. corr. indicates correlation, and the corresponding arrow sign indicates whether there is high or low correlation between Z and X2. Efficiency is measured as the percentage increase in outcome over random allocation. Imbalance is normalized to 100% for the imbalance obtained with CF-FD (e.g., the imbalance entry for CF-NP in scenario 1 should be read as CF-NP generating 60.8% of the imbalance obtained when using the full data). Δ Policy measures the percentage of individuals for whom the outcome would change if their protected attributes were different.
*Scalar γ is set to a large number, yielding the most conservative estimates for efficiency and imbalance.
These scenarios illustrate several important patterns that highlight the benefits of BEAT compared with existing solutions. First, when the unprotected attributes do not directly impact the variable of interest (scenarios 1 and 2), debiasing the X’s works almost as well as BEAT: the imbalance is completely removed. However, debiasing also suffers from individual fairness bias compared with BEAT. Second, comparing scenarios 1 and 2, when the correlation between the protected and unprotected attributes is relatively low, GRF without protected characteristics (which we name CF-NP: a CF with no protected attributes) is almost as effective as BEAT and debiasing in removing the imbalance. Third, once the correlated variables impact the variable of interest directly (scenarios 3 and 4), debiasing is almost as effective as BEAT in removing the imbalance, but the individual fairness bias becomes worse. Further, when both unprotected and protected attributes impact the variable of interest but with opposite signs (scenario 4), removing protected attributes yields more imbalance compared with the baseline approach that uses all the data. 
From a practical point of view, this insight is relevant because firms and organizations are increasingly eliminating protected attributes from their databases (or are pressured to do so). Our analysis shows that doing so could make things worse as there are cases when removing protected attributes increases—rather than decreases—the imbalance between protected and unprotected groups.
Experiment.
We demonstrate the real-world applicability of our method using experimental data. As highlighted in Table 1, the imbalance in outcomes depends on the strength of the relationships between protected and unprotected attributes as well as the underlying relationships between observables and outcomes. Therefore, the goal of this analysis is twofold: 1) investigate the extent to which underlying correlations between protected and unprotected characteristics occur in real-world data and 2) explore whether those underlying relationships lead to imbalance against a protected group (if so, demonstrate how BEAT mitigates the imbalance and generates balanced outcomes when those relationships are ingrained in high-dimensional data, as is the case in most real applications).
To that end, we ran an incentive-compatible online experiment where 3,146 participants were randomly assigned to different conditions of a marketing campaign akin to a situation where some participants were offered a $5 discount and others were not ( has details). Participants were asked to make a subsequent choice of buying from the company that offered the coupon or the competitor. In addition, we collected behavioral and self-reported measures about each participant (e.g., brand preferences, attitudes toward social issues, behavior in social media sites) as well as demographic characteristics that are generally considered protected attributes (age, gender, and race), to explore the underlying correlations between unprotected and protected attributes and the relationship between these variables and the choice outcome. All materials were reviewed and approved by Harvard University’s Institutional Review Board, and all participants provided informed consent. The experiment materials and the results are available in the .
First, we explore the correlations between the protected attributes and other customer variables, for which we run a set of (univariate) regressions of each protected attribute on the (unprotected) variables. To illustrate the correlations, we report this analysis for the “activity” variables. Specifically, we use linear regression for age and logistic regressions for gender (coded as nonmale) and race (coded as non-White). Fig. 5 shows the coefficients from those regressions, revealing that many of the unprotected attributes predict the ones that are protected. Note that most of these activity-related variables capture seemingly innocent behaviors, such as how frequently one cooks their own meals or uses a dating app, behaviors that are easily captured by information regularly collected by firms (e.g., past browsing activity). Importantly, the fact that these relationships commonly occur in real life implies that, in cases where personalization leads to imbalance in the protected attributes, simply removing those attributes from the data will not be enough to mitigate such a bias.
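The univariate regressions described above can be illustrated with a minimal sketch. It covers only the linear (age) case; the logistic regressions used for gender and race are not reproduced here. The function name `ols_slope_with_ci`, the normal-approximation critical value of 1.96, and the toy data are assumptions made for illustration and are not taken from the experiment.

```python
from math import sqrt
from statistics import mean

def ols_slope_with_ci(x, y, z_crit=1.96):
    """Slope of a one-regressor OLS fit of y on x, with an approximate 95% CI.

    The interval uses a normal critical value (an assumption; a t quantile
    would be more accurate in small samples).
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(resid_ss / (n - 2) / sxx)  # standard error of the slope
    return slope, (slope - z_crit * se, slope + z_crit * se)

# Made-up data: a hypothetical "dating app use" score (1-6) and age in years.
activity = [1, 2, 3, 4, 5, 6]
age = [52, 47, 45, 38, 36, 30]

slope, ci = ols_slope_with_ci(activity, age)
significant = ci[0] > 0 or ci[1] < 0  # CI excludes zero
```

In this toy example the interval excludes zero, which corresponds to what the figure would mark as a significant coefficient: the unprotected activity variable carries information about the protected attribute.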
Fig. 5.
Experiment results: The relationship between protected and unprotected attributes. The figure shows regression coefficients (and 95% CIs) of the protected attributes on the activity-related variables. We use OLS (ordinary least squares) regression for age and logistic regression for nonmale and non-White. Blue bars indicate which regression coefficients are significantly different from zero at the 95% CI.
Second, we explore what would have happened if the focal company used these data to design personalized marketing policies. Specifically, we are interested in exploring whether a profit-maximizing marketing campaign would lead to unbalanced policies (in the protected attributes) and if so, how much bias would still remain even when those attributes are removed from the data. Furthermore, we want to explore the performance of BEAT on the (real-world) marketing example and demonstrate how BEAT can mitigate that unintended bias. Accordingly, we analyze our experimental data by emulating what a firm designing targeted campaigns would do in practice (26, 27, 30). Specifically, we randomly split the data into train and test samples and use the training data to learn the optimal policy under each approach. This corresponds to a firm running an experiment (or a pilot test) among a subset of users to determine the optimal targeting policy. We then evaluate each policy using the test data (we use a 90/10 split for train and test samples), mimicking the step when the firm applies the chosen policy to a larger population. Because of the relatively low sample size and high variance in the real-world data, we replicate this procedure 1,000 times. We report the average and SD of each of the outcomes of the different methods in Table 2 and additional results in ().
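The replication procedure described above (random 90/10 split, learn a policy on the training sample, evaluate it on the test sample, repeat, then report the average and SD) can be sketched as follows. This is only a schematic under stated assumptions: `learn_policy` and `evaluate` are hypothetical placeholders for whichever method (CF-FD, CF-NP, debiased, or BEAT) and outcome measure is being studied, and the toy threshold rule below is not one of the methods compared in the experiment.

```python
import random
from statistics import mean, stdev

def repeated_split_evaluation(data, learn_policy, evaluate,
                              n_reps=1000, train_share=0.9, seed=0):
    """Average and SD of a test-set outcome over repeated random splits.

    learn_policy(train) -> policy and evaluate(policy, test) -> float are
    user-supplied callables (hypothetical placeholders in this sketch).
    """
    rng = random.Random(seed)
    n_train = int(round(train_share * len(data)))
    scores = []
    for _ in range(n_reps):
        shuffled = list(data)
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        policy = learn_policy(train)           # fit on the training sample only
        scores.append(evaluate(policy, test))  # score on held-out data
    return mean(scores), stdev(scores)

# Toy stand-ins: the "policy" is a threshold at the training mean, and the
# outcome is the share of test observations above that threshold.
def learn_threshold(train):
    return mean(train)

def share_above(threshold, test):
    return mean(1.0 if v > threshold else 0.0 for v in test)

toy_data = list(range(100))
avg_share, sd_share = repeated_split_evaluation(
    toy_data, learn_threshold, share_above, n_reps=50)
```

Reporting the SD alongside the average, as in Table 2, makes it possible to judge whether differences between methods are large relative to the variation across splits.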
Table 2.
Experiment results: Comparing outcomes across methods
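| Method | Efficiency | Imbalance | Δ Policy, % |
| --- | --- | --- | --- |
| CF-FD | 0.578 | 0.157 | 16.7 |
| SD | (0.050) | (0.064) | (4.0) |
| CF-NP | 0.574 | 0.057 | 0 |
| SD | (0.050) | (0.026) | — |
| Debiased | 0.569 | 0.240 | 42.3 |
| SD | (0.047) | (0.088) | (3.9) |
| BEAT (γ = 3) | 0.575 | 0.042 | 0 |
| SD | (0.050) | (0.018) | — |
| BEAT (γ = 5) | 0.573 | 0.042 | 0 |
| SD | (0.049) | (0.018) | — |
| BEAT (γ = 8) | 0.562 | 0.041 | 0 |
| SD | (0.049) | (0.017) | — |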
Results comparing CF-FD, CF-NP, debiased, and BEAT are reported. Debiased uses regression forests to debias the data; BEAT was estimated with moderate and high penalties for imbalance. The table reports the average outcomes (on the test data) of 1,000 runs of each method, and the SDs are reported in parentheses. Efficiency is measured as the proportion of users choosing the discounted product (i.e., market share of the focal brand). Imbalance calculates the average distance between standardized protected attributes of targeted and nontargeted individuals in the test data. Δ Policy measures the percentage of individuals for whom the outcome would change if their protected attributes were different. In particular, we change the age of each customer by ±1 SD. (We chose age because that was the most meaningful protected attribute as can be seen in Fig. 6.)
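To make the two fairness measures in the table note concrete, the sketch below computes them on made-up data. It rests on assumptions: "average distance" is read here as the mean absolute gap in standardized attribute means between targeted and nontargeted individuals, and the names `imbalance`, `delta_policy`, `age_policy`, and `blind_policy` are hypothetical. The exact definitions used in the analysis may differ.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-scores using the population SD (assumes the SD is nonzero)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def imbalance(protected_columns, targeted):
    """Mean absolute gap in standardized protected attributes between
    targeted and nontargeted individuals (an assumed reading of the measure).
    Assumes both groups are nonempty."""
    gaps = []
    for column in protected_columns:
        z = standardize(column)
        in_group = [zi for zi, t in zip(z, targeted) if t]
        out_group = [zi for zi, t in zip(z, targeted) if not t]
        gaps.append(abs(mean(in_group) - mean(out_group)))
    return mean(gaps)

def delta_policy(rows, policy, attribute, shift):
    """Percentage of individuals whose allocation changes when `attribute`
    is moved up or down by `shift` (e.g., 1 SD of age)."""
    changed = 0
    for row in rows:
        base = policy(row)
        for s in (shift, -shift):
            perturbed = dict(row, **{attribute: row[attribute] + s})
            if policy(perturbed) != base:
                changed += 1
                break
    return 100.0 * changed / len(rows)

# Made-up individuals and two toy policies.
ages = [20, 30, 40, 50, 60, 70]
coupon_use = [1, 0, 1, 0, 1, 0]
rows = [{"age": a, "coupon_use": c} for a, c in zip(ages, coupon_use)]
sd_age = pstdev(ages)

def age_policy(row):    # uses the protected attribute directly
    return row["age"] < 45

def blind_policy(row):  # ignores the protected attribute
    return row["coupon_use"] == 1

delta_age = delta_policy(rows, age_policy, "age", sd_age)
delta_blind = delta_policy(rows, blind_policy, "age", sd_age)
imb_age = imbalance([ages], [age_policy(r) for r in rows])
imb_blind = imbalance([ages], [blind_policy(r) for r in rows])
```

A policy that never reads the protected attribute has Δ Policy of zero by construction, as with CF-NP and BEAT in the table, yet it can still show nonzero imbalance when the variables it does use are correlated with the protected attribute.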
Fig. 6.
Experiment results: Variable importance across methods. The figure plots the variable importance of the top 20 variables for the CF-FD and the corresponding importance of these variables across the other methods: CF-NP, the debiased method, and BEAT. Details about the experiment and the different variables appear in ().
As can be seen in Table 2, BEAT performs best in removing the imbalance generated when using the CF-FD; however, its imbalance is not statistically different from that achieved by just removing protected attributes (CF-NP). Additionally, unlike in the simulated scenarios, the debiased method performs poorly and does not eliminate the imbalance. This is likely due to the real-world nature of the data, which have more variance than our simulated settings. In terms of individual fairness, the debiased method generates especially disparate outcomes, with Δ Policy of 42.3% (Table 2), while the full causal forest (CF-FD) yields a different allocation for 16.7% of individuals. Unlike in our simulated scenarios, the efficiencies of the personalization methods do not differ from each other in this real-world setting. [The finding that personalization may yield very little benefit in real settings is consistent with previous work (27, 31).] That is, in this case, a firm concerned about imbalance may be better off simply using a uniform or random policy rather than personalizing the offers based on individuals’ observable information. Therefore, another way BEAT can be used is to help a decision maker decide whether to use a uniform or a personalized policy.
Finally, we explore in more depth the attributes that cause the imbalance in the most efficient marketing allocation. We identify the 20 variables with the highest variable importance when estimating the full causal forest, and then, we examine their relative importance when applying each of the other methods (Fig. 6). (For completeness, we report the top 20 variables identified by each method in .)
In the full causal forest, age (a protected attribute) is the most important variable, with participant’s location, time spent responding to the experiment survey, and variables related to price sensitivity (e.g., coupon usage, items on the price sensitivity scale, and spending across different domains) also ranking among the high-importance variables. The relative importance of these (unprotected) variables remains very similar when removing the protected attributes (CF-NP), where age is not used for personalization and therefore, its importance is zero by construction. In contrast, when using the debiased method, where all the variables aside from age are still used (albeit transformed to be independent from protected attributes), their importance varies significantly compared with the previous two methods. Specifically, location and time spent on the survey (duration), variables likely to correlate with age and other protected attributes, are less important, albeit still relatively important. Finally, results from BEAT reveal that such variables should not be used when training the model (i.e., splitting the trees) because doing so would create an imbalance in personalized policies. In other words, consistent with the results presented in Fig. 5, many of the unprotected variables used by the other methods were deemed to have a relationship with the protected attributes, which then generated unintended imbalance in the other methods. That led BEAT to eliminate these variables as a source of heterogeneity for personalization and use other unprotected attributes instead.
Discussion
As personalized targeted policies become more prevalent, the risk of algorithmic bias and discrimination against particular groups grows. The more data used for personalization, the more likely underlying correlations between protected and unprotected attributes are to exist, thus generating unintended bias and increasing the imbalance between targeted and nontargeted groups. As we have shown, even existing methods of personalization that are supposed to tackle this issue tend to maintain (or even exacerbate) imbalance between groups or threaten individual fairness. This insight is especially important as firms implement policies that remove protected attributes, presuming that doing so will remove unintended biases or eliminate the risk of discrimination in targeted policies.
We introduce BEAT, a method to leverage unprotected attributes for personalization while achieving both group and individual fairness in protected attributes, thus eliminating bias. This flexible method can be applied to multidimensional protected attributes and to both discrete and continuous explanatory and outcome variables, making it implementable in a wide variety of empirical contexts. Through a series of simulations and using real data, we show that our method indeed reduces the unintended bias while allowing for benefits from personalization. We also demonstrate how our method can be used to evaluate the trade-off between personalization and imbalance and to decide when a uniform policy is preferred over a personalized one.
Regarding the applicability of BEAT, in our simulations and experiment we used examples in which the allocation was based on the treatment effect. These included an initial phase of random assignment into treatment, and we utilized BEAT in the second stage for inference. (Like GRF, BEAT does not require randomized assignment to be present prior to analyzing the data. It can be used with observational data as long as there is unconfoundedness.) It is important to note that BEAT can easily be applied to any type of allocation decision that is based on prediction algorithms (such as medical treatments, hiring decisions, lending decisions, recidivism prediction, product recommendations, or dynamic pricing), even without the random assignment stage. For example, BEAT can be applied to the case of lending when the decision depends on the individual’s probability of default. In that case, BEAT is applied to simple regression forests (instead of causal forests) and similarly mitigates potential discrimination in the allocation of loans. We illustrate how BEAT can be applied in prediction tasks in ().
A natural extension of this research would be to incorporate other measures of “fairness.” For example, BEAT uses the distribution of protected attributes in the sample population and assigns equal penalty weights to each attribute. A decision maker might want to apply different weights to different attributes or might be interested in mimicking the distribution of a different population. These are straightforward extensions to the proposed algorithm. Moreover, decision makers may have different fairness or optimization objectives in mind. As has been documented in the literature (32, 33), different fairness definitions often yield different outcomes, and practitioners and researchers should have a clear objective in mind when designing “fair” policies. We acknowledge that BEAT, in its current form, ensures equality of treatment among groups as well as individual parity but might not be effective for other measures of fairness (e.g., equity, equal distribution of prediction errors across protected groups). We look forward to future solutions that allow optimization of multiple goals, beyond the ones already incorporated in this research. We also acknowledge that the goal of BEAT is to leverage personalization while balancing the allocation of treatments among individuals with respect to protected attributes. In doing so, it naturally impacts the efficiency of the outcome of interest as documented in our simulation results. We hope that future research will develop methods that improve the efficiency levels while still achieving the desired goals.
To conclude, BEAT offers a practical solution to an important problem and allows a decision maker to leverage the value of personalization while ensuring individual fairness and reducing the risk of disproportionately targeting individuals based on protected attributes.