Abstract
It is claimed that the hierarchical age-period-cohort (HAPC) model solves the age-period-cohort (APC) identification problem. However, this is debatable: simulations show situations where the model produces incorrect results, and proponents of the model counter that those simulations are not relevant to real-life scenarios. This paper moves beyond questioning whether the HAPC model works, to why it produces the results it does. We argue that HAPC estimates are not the result of the distinctive substantive APC processes occurring in the dataset, but are primarily an artefact of the data structure, that is, the way the data has been collected. Were the data collected differently, the results produced would be different. This is illustrated both with simulations and real data, the latter by taking a variety of samples from the National Health Interview Survey (NHIS) data used by Reither et al. (Soc Sci Med 69(10):1439-1448, 2009) in their HAPC study of obesity. When a sample based on a small range of cohorts is taken, such that the period range is much greater than the cohort range, the results produced are very different to those produced when cohort groups span a much wider range than periods, as is structurally the case with repeated cross-sectional data. The paper also addresses the latest defence of the HAPC model by its proponents (Reither et al. in Soc Sci Med 145:125-128, 2015a). The results lend further support to the view that the HAPC model is not able to accurately discern APC effects, and should be used with caution when there appear to be period or cohort near-linear trends.
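The identification problem mentioned in the abstract follows from the exact dependency cohort = period - age. The sketch below is a minimal illustration of that dependency, not the authors' simulation code: the age and period ranges and the coefficient values are assumptions chosen for the example. It shows that a design matrix with linear age, period and cohort terms is rank deficient, and that two different sets of slopes give the same fitted values.

```python
import numpy as np

# Hypothetical repeated cross-sectional layout (assumed values, not NHIS data).
ages = np.arange(20, 60)         # single-year ages 20..59
periods = np.arange(1990, 2010)  # survey years 1990..2009
age, period = (g.ravel() for g in np.meshgrid(ages, periods))
cohort = period - age            # birth year is fixed once age and period are known

# Intercept plus linear age, period and cohort terms.
X = np.column_stack([np.ones(age.size), age, period, cohort]).astype(float)
rank = np.linalg.matrix_rank(X)
print(f"columns: {X.shape[1]}, rank: {rank}")  # rank is one less than the column count

# Two different sets of slopes (intercept, age, period, cohort).
k = 0.5                                        # arbitrary shift, for illustration only
beta_1 = np.array([0.0, 0.10, 0.20, 0.30])
beta_2 = beta_1 + np.array([0.0, k, -k, k])    # shift cancels because cohort = period - age
y1, y2 = X @ beta_1, X @ beta_2
print("identical fitted values:", np.allclose(y1, y2))
```

Because any shift `k` leaves the fitted values unchanged, the data alone cannot separate the three linear trends; a model that returns a single answer must be imposing some additional constraint.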
Keywords: Age–period–cohort; Hierarchical age period cohort model; Identification; MCMC; Multilevel modelling; Obesity
Year: 2017 PMID: 29568132 PMCID: PMC5847147 DOI: 10.1007/s11135-017-0488-5
Source DB: PubMed Journal: Qual Quant ISSN: 0033-5177
Key papers (and arguments made) in the debate around the HAPC model
| Paper | Argument |
|---|---|
| Yang ( | Argues the HAPC model can be used in a Bayesian framework. Uses real data on verbal test scores, and simulations (note that the latter’s DGPs have only independent and identically distributed Normal random variation to generate the period and cohort effects). [51 cites in Google Scholar as of 8th Feb 2017] |
| Yang and Land ( | Argues the treatment of age as quadratic in the HAPC model solves the identification problem. Example using real data on verbal test scores [233] |
| Yang and Land ( | Uses the Hausman test (on multiple parameters) to test if fixed or random effects should be used for the period and cohort terms. Example using real data on verbal test scores [237] |
| Yang and Land ( | Book argues the different treatment of age (fixed) and period/cohort (random) “completely avoids” (p. 70) the identification problem. Uses various real data sources to illustrate this [132] |
| Bell and Jones ( | Argues with simulations that the HAPC model is not good at recovering DGPs in the presence of linear effects |
| Bell and Jones ( | Argues that results can be reproduced using a completely different DGP to that suggested by those results |
| Reither et al. ( | Argues that linear effects do not occur in real-life data, and thus that the model works for real data (this is illustrated, ironically, with simulations) |
| Bell and Jones ( | Argues with simulations that even when the DGP does not include exactly linear effects, the HAPC model does not work |
| Reither et al. ( | Argues that model fit statistics, and descriptive and modelled graphics, should be used to judge whether the HAPC model is appropriate for use |
| Luo and Hodges ( | Argues that grouping cohorts in different ways can produce arbitrarily different results, using simulations |
| O’Brien ( | Demonstrates why treating one or more of APC as random effects allows models to be identified, but shows that the solution that is arrived at is an artefact of the way the log likelihood is maximized |
| Fienberg et al. ( | Responding to a positive book review they contend that “Yang and Land’s approaches really are no different from previous attempts to resolve the APC identification problem insofar as they impose constraints on the estimated age, period, or cohort effects; the constraints are simply hidden in the technical details of their methodology” (p. 457) |
Papers with Reither or Yang as first author are proponents of the model; the others are for the most part critical of it
A somewhat parallel debate also exists on Yang and Land’s Intrinsic estimator as a means of tackling the problem (see Pelzer et al. 2015; Te Grotenhuis et al. 2016; Yang and Land 2013b; Luo 2013a, b; Luo et al. 2016)
Fig. 1 Hypothetical period and cohort trends with a slope of one, for data with the structure of that used in Reither et al. (2009). As can be seen, the cohort trend of necessity produces much more extreme residual values than the period trend, despite both having the same slope value
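The point made in the Fig. 1 caption can be reproduced numerically. The following is a rough sketch under assumed values: the age and period ranges are hypothetical and are not the actual design of the data in Reither et al. (2009). In a repeated cross-sectional layout the number of distinct cohorts is (number of ages) + (number of periods) - 1, so a centred trend with slope one reaches far larger deviations across cohorts than across periods.

```python
import numpy as np

# Assumed layout for illustration only (not the actual NHIS design).
ages = np.arange(20, 80)         # 60 single-year ages
periods = np.arange(1980, 2005)  # 25 annual surveys
cohorts = np.unique(periods[:, None] - ages[None, :])  # every birth year observed

def centred_unit_trend(units):
    """Linear trend with slope one, centred on zero."""
    return units - units.mean()

period_trend = centred_unit_trend(periods)
cohort_trend = centred_unit_trend(cohorts)

print("period groups:", periods.size, "max |deviation|:", np.abs(period_trend).max())
print("cohort groups:", cohorts.size, "max |deviation|:", np.abs(cohort_trend).max())
print("variance ratio (cohort / period):", cohort_trend.var() / period_trend.var())
```

With the same slope, the cohort trend therefore implies a much larger variance among the cohort values than the period trend implies among the period values, purely because of how the data are laid out.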
Fig. 2 Simulation results from the DGP in Eq. (1), with results (thin grey lines) compared to the truth (large black lines/points). Row 1: age-by-cohort data; row 2: age-by-period data
Fig. 4 Replication of Reither et al. (2009), with data included up to 2014. Model uses 5-year birth cohort groups (the results are the same with no grouping, and with 3-year period groups)
Fig. 3 Age-by-period representation of the full NHIS dataset, with the 12 samples taken shown. Samples defined by cohorts (10 years) are in black; samples defined by age (30 years) are coloured. (Color figure online)
Fig. 5 Results for models without grouping in the random effects
Fig. 6 Results with periods grouped into 3-year time intervals
Model fit statistics for an example dataset used in the simulations here (based on the models used in Reither et al. 2015a)
| Model | N | df (age-by-cohort data) | AIC (age-by-cohort data) | BIC (age-by-cohort data) | df (age-by-period data) | AIC (age-by-period data) | BIC (age-by-period data) |
|---|---|---|---|---|---|---|---|
| APC | 20,000 | 89 | 84,732.58 | 85,435.99 | 43 | 84,707.87 | 85,047.72 |
| AP | 20,000 | 78 | 86,179.2 | 86,795.68 | 23 | 88,936.46 | 89,118.24 |
| AC | 20,000 | 14 | 93,592 | 93,702.65 | 23 | 88,158.13 | 88,339.91 |
| A | 20,000 | 3 | 102,007.3 | 102,031 | 3 | 91,701.97 | 91,725.68 |
| P | 20,000 | 76 | 91,382.66 | 91,983.32 | 21 | 94,676.62 | 94,842.6 |
| C | 20,000 | 12 | 94,035.69 | 94,130.53 | 21 | 89,708.2 | 89,874.18 |
| AC (both quadratic) | 20,000 | 4 | 94,427.87 | 94,459.49 | 5 | 91,686.3 | 91,725.82 |
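The AIC and BIC columns in the table are linked by the standard identity BIC = AIC + k(ln N - 2), where k is the number of estimated parameters. The short check below is a sketch that assumes the integer column beside each AIC is k and that N = 20,000 applies to every model; the row labels and the tolerance are choices made for the example. The figures are copied from four of the table's rows.

```python
import math

n = 20_000  # sample size reported for every model in the table

# (label, assumed parameter count k, AIC, BIC) copied from the table rows.
rows = [
    ("APC, age-by-cohort", 89, 84732.58, 85435.99),
    ("APC, age-by-period", 43, 84707.87, 85047.72),
    ("A, age-by-cohort", 3, 102007.3, 102031.0),
    ("AC (both quadratic), age-by-period", 5, 91686.3, 91725.82),
]

def bic_from_aic(aic, k, n):
    """BIC implied by an AIC value: both share -2 log-likelihood."""
    return aic + k * (math.log(n) - 2.0)

for label, k, aic, bic in rows:
    implied = bic_from_aic(aic, k, n)
    # Tolerance allows for the rounding of the published figures.
    status = "ok" if abs(implied - bic) < 0.1 else "mismatch"
    print(f"{label}: implied BIC {implied:.2f}, reported {bic:.2f} ({status})")
```

Agreement on these rows supports reading the integer column as the parameter count. Within each data structure, the model with the lowest AIC and BIC is the full APC specification.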
Fig. 7 Descriptive APC plots for an example dataset used in the simulations here. Row 1: age-by-cohort data; row 2: age-by-period data