Mayuri Mahendran, Daniel Lizotte, Greta R Bauer.
Abstract
BACKGROUND: Intersectionality theoretical frameworks have been increasingly incorporated into quantitative research. A range of methods have been applied to describing outcomes and disparities across large numbers of intersections of social identities or positions, with limited evaluation.
Year: 2022 PMID: 35213512 PMCID: PMC8983950 DOI: 10.1097/EDE.0000000000001466
Source DB: PubMed Journal: Epidemiology ISSN: 1044-3983 Impact factor: 4.822
Distributions of Variables in Data Generation Models 1 and 2
| Variable | Analogous Social Position | Model 1 (Categorical Inputs): Type | Model 1 (Categorical Inputs): Distribution | Model 2 (Mixed Inputs, Categorical and Continuous): Type | Model 2 (Mixed Inputs, Categorical and Continuous): Distribution |
|---|---|---|---|---|---|
| X1 | Income | Categorical | P(X1=0) = 0.25 | Continuous (split in quartiles to create intersections for estimation) | Mean = 0, Variance = 1 |
| X2 | Racialization (person of color, non-POC) | Binary | P(X2=1) = 0.2 | Binary | P(X2=1) = 0.2 |
| X3 | Sex/gender (male, female) | Binary | P(X3=1) = 0.5 | Binary | P(X3=1) = 0.5 |
| X4 | Education (completed postsecondary, did not complete) | Binary | P(X4=1 \| X3=0) = 0.4 | Binary | P(X4=1 \| X3=0) = 0.4 |
| X5 | Immigrant status (immigrant, nonimmigrant) | Binary | P(X5=1) = 0.25 | Binary | P(X5=1) = 0.25 |
| X6 | Age | Categorical | P(X6=0) = 0.33 | Continuous (split in tertiles to create intersections for estimation) | Mean = 0, Variance = 1 |
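As a minimal sketch, inputs with the Model 1 distributions in the table above could be drawn as follows. This is not the authors' simulation code. Several details are assumptions: X1 is taken to have four equally likely levels and X6 three (inferred from P(X1=0) = 0.25 and P(X6=0) = 0.33), and `P_X4_GIVEN_X3_1 = 0.6` is a hypothetical placeholder, because the table reports only the X3=0 case. All function and variable names are illustrative.

```python
import random
from itertools import product

# Number of categories per variable under Model 1. Four levels for X1 and
# three for X6 are assumptions inferred from the reported probabilities.
LEVELS = {"X1": 4, "X2": 2, "X3": 2, "X4": 2, "X5": 2, "X6": 3}

# Hypothetical value: the table reports only P(X4=1 | X3=0) = 0.4.
P_X4_GIVEN_X3_1 = 0.6


def draw_person(rng):
    """Draw the six categorical inputs for one simulated individual."""
    x3 = int(rng.random() < 0.5)
    p_x4 = 0.4 if x3 == 0 else P_X4_GIVEN_X3_1
    return {
        "X1": rng.randrange(4),           # assumed uniform over 4 levels
        "X2": int(rng.random() < 0.2),
        "X3": x3,
        "X4": int(rng.random() < p_x4),   # depends on X3
        "X5": int(rng.random() < 0.25),
        "X6": rng.randrange(3),           # assumed uniform over 3 levels
    }


def count_intersections():
    """Count every combination of categories across the six variables."""
    return len(list(product(*(range(k) for k in LEVELS.values()))))


rng = random.Random(0)
sample = [draw_person(rng) for _ in range(2000)]
n_intersections = count_intersections()
```

Under these assumptions the six variables define 4 × 2 × 2 × 2 × 2 × 3 = 192 intersections.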
FIGURE 1. A, B, Boxplots of the MSE of intersection estimations for four different sample sizes (graph excludes outliers): (A) Categorical inputs and (B) Mixed inputs. Methods include three single-level regression models, the MAIHDA, cross-classification, and three tree-based methods: CART, CTree, and random forest. CART, classification and regression trees; CTree, conditional inference trees; MAIHDA, multilevel analysis of individual heterogeneity and discriminatory accuracy; MSE, mean squared error.
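The MSE summarized in Figure 1 compares estimated and true intersection means. The sketch below shows one plausible way to compute it for cross-classification, where each intersection's estimate is the plain average of its members. The exact averaging used in the study (for example, across replications) is not stated here, so this is an assumption. The toy data, true means, and function names are hypothetical.

```python
from collections import defaultdict


def cross_classified_means(keys, outcomes):
    """Estimate each intersection's mean as the average of its members."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for key, y in zip(keys, outcomes):
        totals[key] += y
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def intersection_mse(true_means, estimates):
    """Mean squared error over intersections that received an estimate."""
    shared = [k for k in true_means if k in estimates]
    return sum((estimates[k] - true_means[k]) ** 2 for k in shared) / len(shared)


# Toy data: two intersections with hypothetical true means of 10 and 20.
true_means = {("a", 0): 10.0, ("a", 1): 20.0}
keys = [("a", 0), ("a", 0), ("a", 1), ("a", 1)]
outcomes = [9.0, 13.0, 19.0, 21.0]

estimates = cross_classified_means(keys, outcomes)
mse = intersection_mse(true_means, estimates)
```

Here the first intersection is estimated at 11 (error 1) and the second at 20 (error 0), so the MSE is 0.5.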
CART and CTree Splitting Percentages (% of Replications Variable Is Split on in Tree) and Average Number of Leaves
| Input Model | Variable or Statistic | CART, N = 2,000 | CART, N = 5,000 | CART, N = 50,000 | CART, N = 200,000 | CTree, N = 2,000 | CTree, N = 5,000 | CTree, N = 50,000 | CTree, N = 200,000 |
|---|---|---|---|---|---|---|---|---|---|
| Categorical inputs | X1 (%) | 96.3 | 96.2 | 96.7 | 96.7 | 100 | 100 | 100 | 100 |
| | X2 (%) | 55.4 | 53.5 | 51.1 | 50.3 | 98.6 | 100 | 100 | 100 |
| | X3 (%) | 79.5 | 79.2 | 76.3 | 77.5 | 99.3 | 100 | 100 | 100 |
| | X4 (%) | 81.9 | 79.7 | 77 | 78.1 | 99.4 | 99.9 | 100 | 100 |
| | X5 (%) | 78.3 | 77.9 | 73.2 | 74.2 | 99.4 | 100 | 100 | 100 |
| | X6 (%) | 0 | 0 | 0 | 0 | 41.8 | 64.1 | 91 | 94.5 |
| | Average leaves (2.5th, 97.5th percentile) | 9 (5, 13) | 9 (5, 13) | 8 (4, 12) | 9 (5, 13) | 24 (12, 36) | 34 (19, 49) | 56 (39, 68) | 62 (50, 70) |
| Mixed inputs | X1 (%) | 93 | 92.5 | 91.3 | 92.3 | 99.6 | 99.9 | 100 | 100 |
| | X2 (%) | 56 | 54 | 49.7 | 52.8 | 98.2 | 99.7 | 100 | 100 |
| | X3 (%) | 66.5 | 67.7 | 63.5 | 64.4 | 96.4 | 99.7 | 100 | 100 |
| | X4 (%) | 67.5 | 69.8 | 64.3 | 62.7 | 98.3 | 99.6 | 100 | 100 |
| | X5 (%) | 56.9 | 57.2 | 54.6 | 53.3 | 98 | 99.9 | 100 | 100 |
| | X6 (%) | 0 | 0 | 0 | 0 | 34.6 | 55 | 92.5 | 98.8 |
| | Average leaves (2.5th, 97.5th percentile) | 10 (5, 14) | 10 (5, 14) | 9 (5, 14) | 10 (5, 14) | 34 (12, 59) | 56 (19, 95) | 162 (47, 276) | 279 (88, 449) |
CART indicates classification and regression tree; CTree, conditional inference tree.
Average VIM From Random Forests Fitted to Categorical and Mixed Input Models at N = 2,000 and N = 200,000
| Input Model | Variable | N = 2,000 | N = 200,000 |
|---|---|---|---|
| Categorical inputs | X1 | 616 | 57,416 |
| | X2 | 231 | 19,919 |
| | X3 | 651 | 65,800 |
| | X4 | 698 | 64,650 |
| | X5 | 339 | 31,567 |
| | X6 | 72 | 85 |
| Mixed inputs | X1 | 2,731 | 239,953 |
| | X2 | 308 | 31,906 |
| | X3 | 642 | 63,981 |
| | X4 | 635 | 63,664 |
| | X5 | 325 | 32,870 |
| | X6 | 552 | 9,558 |
VIM indicates variable importance measures.
NHANES VIM Results for CART, CTree, and Random Forest
| Variable | CART: Splitting Variable (Yes/No) | CTree: Splitting Variable (Yes/No) | Random Forest: Impurity-based VIM | Random Forest: Permutation-based VIM | Random Forest: Permutation-based VIM, |
|---|---|---|---|---|---|
| Age (20–39, 40–59, 60+ years) | Yes | Yes | 477,183.3 | 111.4 | 0.010 |
| Gender (male, female) | Yes | Yes | 38,325.7 | 10.5 | 0.010 |
| Race/ethnicity (Hispanic, non-Hispanic White, Black, Asian, other) | No | Yes | 52,847.6 | 13.6 | 0.020 |
| Education (high-school education or less, at least some college education) | No | Yes | 25,974.0 | 4.4 | 0.010 |
| Marital status (married, not married) | No | Yes | 16,299.8 | 2.2 | 0.010 |
| Health insurance (insured, not insured) | No | Yes | 14,982.7 | 3.5 | 0.010 |
| Immigrant (born in the United States, immigrant) | No | Yes | 12,195.0 | 6.0 | 0.792 |
| Income (above federal poverty line, below) | No | No | 12,368.3 | 1.5 | 0.188 |
CART indicates results for classification and regression trees; CTree, conditional inference trees; VIM, variable importance measure.
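For readers unfamiliar with the permutation-based VIM reported above, the following sketch illustrates the general idea: shuffle one variable, then measure how much the prediction error grows. It is a simplified, model-agnostic illustration and not the random forest procedure used in the analysis. It swaps in a toy cell-mean predictor and evaluates on the training data, so its values are not on the scale of the table. The data and all names are hypothetical.

```python
import random
from collections import defaultdict


def fit_cell_means(rows, y):
    """Toy model: predict the mean outcome of each observed covariate pattern."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row, target in zip(rows, y):
        sums[row] += target
        counts[row] += 1
    overall = sum(y) / len(y)
    means = {row: sums[row] / counts[row] for row in sums}
    # Fall back to the overall mean for patterns never seen in training.
    return lambda row: means.get(row, overall)


def mse(predict, rows, y):
    """Mean squared prediction error."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)


def permutation_importance(predict, rows, y, column, rng):
    """Increase in MSE after shuffling one column, which breaks its link to y."""
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(predict, permuted, y) - mse(predict, rows, y)


rng = random.Random(1)
# Hypothetical data: the outcome depends on column 0 only; column 1 is noise.
rows = [(rng.randrange(2), rng.randrange(2)) for _ in range(400)]
y = [10.0 * a + rng.gauss(0, 1) for a, _ in rows]

predict = fit_cell_means(rows, y)
vim = [permutation_importance(predict, rows, y, c, rng) for c in range(2)]
```

In this toy setup, shuffling the informative column raises the error sharply, while shuffling the noise column changes it very little, which is the pattern a permutation-based VIM is meant to expose.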
FIGURE 2. A–C, Estimated mean systolic blood pressure (mm Hg) by intersection. Methods include two single-level regression models, MAIHDA, cross-classification, and three tree-based methods: CART, CTree, and random forest. CART, classification and regression trees; CTree, conditional inference trees; MAIHDA, multilevel analysis of individual heterogeneity and discriminatory accuracy.
Recommendations for Methods When Assessing Continuous Outcomes
| Method | Recommended Uses | Recommended With Potential Alterations | Not Generally Recommended | Not Applicable |
|---|---|---|---|---|
| Cross-classification | Estimation at large sample sizes | Estimation at small sample sizes | Variable selection | |
| Regression (saturated model) | Estimation at large sample sizes | Estimation at small sample sizes | Variable selection (partial information) | |
| MAIHDA | Estimation at all sizes | Variable selection (partial information) | | |
| CART | Estimation at all sample sizes | | | |
| CTree | Estimation at all sample sizes | Variable selection (may be improved with cross-validation for alpha) | | |
| Random forest | Estimation at all sample sizes | Variable selection: with adjusted VIM | | |
Defining sample size as “large” or “small” is relative to the number of intersections of interest. In our scenario with 192 intersections of interest, we considered smaller sample sizes to be N = 2,000 to 5,000 and larger sample sizes as N = 50,000 and greater. A smaller number of intersections under study would allow for smaller sample sizes.
MAIHDA indicates multilevel analysis of individual heterogeneity and discriminatory accuracy; CART, classification and regression trees; CTree, conditional inference trees; VIM, variable importance measure.
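To make the sample-size footnote above concrete, the arithmetic below divides each simulated sample size by the 192 intersections of interest. It reports averages only: with unequal category probabilities, the rarest intersections hold far fewer observations than the average suggests. Variable names are illustrative.

```python
N_INTERSECTIONS = 192
SAMPLE_SIZES = [2_000, 5_000, 50_000, 200_000]

# Average number of observations available per intersection at each sample size.
per_cell = {n: n / N_INTERSECTIONS for n in SAMPLE_SIZES}

for n, avg in per_cell.items():
    print(f"N = {n:>7,}: about {avg:,.1f} observations per intersection on average")
```

At N = 2,000 this gives roughly 10 observations per intersection on average, compared with roughly 1,042 at N = 200,000.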
Summary of Method Outputs and Capabilities
| Output or Capability | Regression (Saturated Model) | Cross-Classification | MAIHDA | CART | CTree | Random Forest |
|---|---|---|---|---|---|---|
| Outcome estimation by intersection | X | X | X | X | X | X |
| Variance estimates for intersections | X | X | X | | | |
| Effect size estimates comparing outcome across intersections | X | | X | | | |
| Identification of social identity/position | | | X | X | X | |
| Identification of social identity/position | | | X | X | | |
| Identification of | X | X | X | | | |
| Ability to use continuous social identity/position variables without prior categorization | X | X | X | X | | |
| Visual subgroup identification through tree diagrams | X | X | | | | |
| Ability to control for confounding | X | X | X | X | X | |
MAIHDA indicates multilevel analysis of individual heterogeneity and discriminatory accuracy; CART, classification and regression trees; CTree, conditional inference trees.
Can be estimated using a linear regression.
No singular measure is produced that indicates overall relevance, but there is some information contained (see Discussion).
Relevance is variably defined and quantified.