Literature DB >> 23579202

Reconstructing Past Populations With Uncertainty From Fragmentary Data.

Mark C Wheldon¹, Adrian E Raftery, Samuel J Clark, Patrick Gerland.

Abstract

Current methods for reconstructing human populations of the past by age and sex are deterministic or do not formally account for measurement error. We propose a method for simultaneously estimating age-specific population counts, fertility rates, mortality rates, and net international migration flows from fragmentary data that incorporates measurement error. Inference is based on joint posterior probability distributions that yield fully probabilistic interval estimates. It is designed for the kind of data commonly collected in modern demographic surveys and censuses. Population dynamics over the period of reconstruction are modeled by embedding formal demographic accounting relationships in a Bayesian hierarchical model. Informative priors are specified for vital rates, migration rates, population counts at baseline, and their respective measurement error variances. We investigate calibration of central posterior marginal probability intervals by simulation and demonstrate the method by reconstructing the female population of Burkina Faso from 1960 to 2005. Supplementary materials for this article are available online and the method is implemented in the R package "popReconstruct."

Entities: Chemical Disease Gene Species

Keywords: Bayesian hierarchical model; Cohort component model; Fertility rate; International migration; Markov chain Monte Carlo; Mortality rate

Year: 2013 PMID： 23579202 PMCID： PMC3613971 DOI： 10.1080/01621459.2012.737729

Source DB: PubMed Journal: J Am Stat Assoc ISSN： 0162-1459 Impact factor: 5.033

INTRODUCTION

Every two years, the United Nations Population Division (UNPD) publishes detailed estimates and projections of key demographic quantities for all countries in the world, from 1950 to 2100; they appear in the United Nations World Population Prospects (UNWPP; United Nations 2009a, b). The parameters reported include age-specific fertility and mortality rates (vital rates), population counts, and international migration rates. These numbers are used for the development and assessment of policy and are of particular importance for countries that lack their own well-resourced official statistical systems. For many of these countries, the UNPD is a key partner in the process of compiling, analyzing, and publishing internationally comparable demographic data. The UNWPP tables can be classified into two broad groups, known as estimates and projections. Projections are predictions about demographic parameters in the future while estimates concern populations of the past. We will use the term reconstruction, which we find less ambiguous. Nevertheless, use of the term “estimate” agrees with standard usage in statistics; data about demographic parameters in the past are used to estimate the true values. Between the biennial editions of the UNWPP, new data are collected and old data become available, so revisions must be made to the reconstructions to accommodate it. Demographers have long acknowledged the presence of uncertainty in their measurements and many so-called indirect methods exist for obtaining “best” point estimates (e.g., United Nations 1983). It is still common practice to produce ranges of projections of vital rates and population counts based on different scenarios; high-, medium-, and low-fertility, for example. These ranges cannot be interpreted probabilistically, but methods have been developed that do yield probabilistic projections; Lee (1998) and Booth (2006) provide reviews. In contrast, there has been relatively little work on the development of fully probabilistic methods for demographic reconstruction. Here we propose a general method for reconstruction that accounts for measurement uncertainty and works with the type of demographic data that have commonly been collected for most countries over the last 60 years or so. The article is structured as follows. Existing methods of population reconstruction are reviewed in Section 2. In Section 3, we define our notation and parameters and describe the method. In Section 4, we investigate some statistical properties of our method through simulation before applying it to real data from Burkina Faso in Section 5. We close with a summary of the results and a discussion in Section 6.

EXISTING METHODS OF POPULATION RECONSTRUCTION

Outside of official statistical agencies, demographic reconstructions have been undertaken to study historical populations of the past (e.g., Wrigley and Schofield 1981) and to estimate excess mortality in crises such as famine or social upheaval (e.g., Boyle and Ó Gráda 1986; Heuveline 1998; Merli 1998; Goodkind and West 2001). The most commonly used methods are based on the demographic balancing equations. These are the basic accounting relationships, which state that the population size at time t + δ is equal to the size at time t plus births and immigrants, minus deaths and emigrants. These relationships are encoded in the cohort component method of population projection (CCMPP; Lewis 1942; Leslie 1945, 1948; Preston, Heuveline, and Guillot 2001). Given the size and age-structure of a population at some baseline time, its size and structure at any point in the future can be determined from the baseline population and the fertility, mortality, and international migration rates that prevail over the period of reconstruction. The back projection method of reconstruction (Wrigley and Schofield 1981) attempts to apply these relationships in reverse by using an estimate of the population structure at the terminal year of the period of reconstruction. This approach is problematic since the CCMPP procedure is not formally invertible. To produce sensible results, some additional constraints have to be imposed or somewhat ad-hoc fixes applied. In response to these concerns, Lee (1971, 1974) proposed the method of inverse projection. Inverse projection enacts the reconstruction forward through time; it is named for the fact that, instead of estimating counts of births and deaths from rates, count data are used to infer rates. Both of these methods have been further developed since their inception. For example, McCaa and Barbi (2004) and Rosina (2004) described extensions of inverse projection, while Oeppen (1993) proposed generalized inverse projection (GIP). All of the extended methods remain purely deterministic. A stochastic reconstruction method was proposed by Bertino and Sonnino (2003) who modeled childbirth and death as inhomogeneous Poisson processes. The method is designed to work with counts of births and deaths aggregated over age and so requires the analyst to specify model age-patterns for fertility and mortality, although different schedules can be chosen for different subperiods. These schedules are taken as the intensity functions of the process and realizations are simulated through time to produce sequences of empirical estimates of population age structures. Hence, the only source of variation accounted for is natural variation around the demographic rates; the total numbers of deaths by year are assumed to be recorded without error. Moreover, international migration is assumed to be negligible over the period of reconstruction. Finally, all of the above methods were designed to work with individual data of the kind sometimes found in European parish registers, or aggregated summaries of them. The data available for many countries in our intended application are not as detailed. The aim of this article is to introduce a new method of reconstruction that uses measurements of demographic parameters from 50 to 60 years in the past from multiple noisy data sources, often available only for limited years or periods. Uncertainty due to measurement error is expressed as probability distributions and intervals rather than deterministic, scenario-based ranges. To our knowledge, there are no existing methods of human population reconstruction that have these features. Daponte, Kadane, and Wolfson (1997) used an approach similar to ours to construct a counterfactual history of the Iraqi Kurdish population between 1977 and 1990. They, too, represented errors in the measurement of demographic parameters as the standard deviations of probability distributions. However, they used a low-dimensional parameterization of mortality and fixed the age patterns of fertility. International and sub-national migration was assumed to be negligible. Where migration is accounted for, it has been handled in different ways. Lee's inverse projection method estimates it as a residual if censuses are available at intermediate years in the reconstruction interval. Alho (1992) added it as extra error in measuring survival. Here, migration is treated explicitly and age-time specific estimates are available as for the other parameters. Complex models have also been developed by fisheries researchers to study the population dynamics of marine life (e.g., Quinn and Deriso 1999). A large body of work also exists on the dynamics of land animal populations; Brooks, Catchpole, and Morgan (2000a) and Brooks et al. (2002) provide reviews. In most wildlife studies, data-collection methods and the type and amount of data available differ somewhat from those in human demography. Nevertheless, there are similar statistical challenges to overcome and these suggest a Bayesian approach may be appropriate. For instance, in ecology and demography, multiple sources of data informing the same demographic parameters often exist. Poole and Raftery (2000) found that Bayesian melding allowed these to be synthesized in a coherent manner (see also Givens, Raftery, and Zeh 1993; McAllister et al. 1994; Raftery, Givens, and Zeh 1995). Furthermore, population dynamics models are often over-parameterized (e.g., Lee 1985, 1993). The resulting ridges in the likelihood surface posed problems that Bayesian approaches have been able to overcome (e.g., Brooks et al. 2000b; Catchpole, Kgosi, and Morgan 2001). Here we propose a Bayesian solution to the reconstruction problem in which the dominant source of uncertainty, measurement error, is adequately accounted for through fully probabilistic estimates. We handle multiple, noisy data sources and we account for uncertainty about migration as well as fertility and mortality. We do not require individual-level data of the European parish register kind.

METHOD

Notation and Parameters of Interest

In this article, we restrict our attention to the dynamics of populations of females only. The parameters of interest are age-and time-specific vital rates, net international migration flows, and population counts. We will refer to “international migration” as simply “migration” as this is the only type we consider. We use the symbols n, s, g, and f to denote population counts, survival (a measure of mortality), net migration (immigrants minus emigrants), and fertility, respectively. All of these parameters will be indexed by five-year increments of age, denoted by a, and time, denoted by t. Reconstruction will be done over the time interval [t0, T]. The age scale runs from 0 to A > 0; in our application (Section 5) A is 80. To model fertility, we define a[fert] ≤ a[fert] where fertility is assumed to be zero at ages out-side the range [a[fert], a[fert] + 5). Throughout, a prime indicates vector transpose. Given 0 = (no,t0, n5,,…, nA,t)′, the vector of age-specific female population counts at baseline t0, and using “○” to denote entrywise product, the CCMPP gives = Q( + t ○ /2)) + ○ /2, t = t0, t0 + 5,…, T − 5. Here, Q is a K by K matrix encoding fertility and mortality and K is the number of age groups. The quantity is the net number of migrants expressed as a proportion of the population size so that ○ is the net number of migrants, a count. Projection proceeds in discrete steps; those alive at the beginning of each step are subjected to the fertility, mortality, and migration prevailing during the projection interval. This is a discrete time approximation to a continuous time process and there are standard adjustments to improve accuracy. Adding half of the migrants at the beginning of the projection step and the remainder at the end is one such adjustment. The specific form we use is: where (Preston et al. 2001). Annualized, age-specific fertility rates enter Equation (1) through the transformation defined in Equation (2). This transformation maps the rates into proportions so that projection can be compactly expressed as a matrix multiplication. We use a tilde to represent this transformation and discuss it further below. SRB is the sex ratio at birth (the ratio of males to females among all births). Multiplication by SRB in Equation (2) ensures only the female births are kept. Throughout, we take SRB to be fixed at 1.05, a demographic convention. We will use M(·) to represent CCMPP and abbreviate Equation (1) as The survival parameters, s, give the proportion of those aged a – 5 to a at time t who survive to be aged a to a + 5 at time t + 5, for a = 0, 5,…, A + 5. That is, s is the proportion of people surviving into the age range [a, a + 5) over the five years between t and t + 5. Also, s is the proportion aged [A − 5, A) at exact time t who survive to time t + 5, by which time they are in the age group [A, ∞). We allow for subsequent survival in this age group by letting s be the proportion aged [A, ∞) at time t who survive five more years. Age-specific mortality during a given time period will be summarized by life expectancy at birth (e0), where The derivation is straightforward but requires additional concepts from demography (see the online supplementary materials). The quantity e0, is the average age at death in a hypothetical cohort subjected for its entire life to the mortality conditions represented by the set of age-specific survival proportions, . The net number of migrants aged [a, a + 5) to the population during the time period [t, t + 5) is gn. The age-summarized measure of migration we use will be the average annual total net number of migrants that, for the period [t, t + 5), is (1/5) ∑ gn. We model g instead of gn, however, because we expect variation over time and age in the former to be less dependent on magnitude. The represent a combination of age-specific fertility and under-five mortality. When multiplied by n, they give the number of female births surviving to time t + 5. That is, the give the number of surviving female babies at t + 5 as a proportion of the number of females aged [a, a +5) alive at time t. We do not model , however, because data are typically gathered to estimate fertility rates, f, not the proportions, . Fertility rates are annualized occurrence/exposure rates. They give the ratio of births of both sexes to women aged a to a + 5 (occurrence) to person-years lived (exposure) during [t, t + 5). Thus, the number of births in the projection step t to t + 5 is not quite 5 f n because the denominator of fis not n (we need the factor of five because f are single-year rates but the projection intervals are five years wide). The expression in Equation (2) is more accurate. To illustrate, calculating n0, (and taking g0,n0, = 0 for clarity) gives (where we use the SRB to keep only the female births). The quantity in the right-most parentheses in Equation (5) is a better approximation for the denominator of f than n. We multiply by s0, to account for mortality of births during the projection step. Age-specific fertility rates for a given time period will be summarized by the total fertility rate (TFR) that, for the period [t, t + 5), is defined as It is the average number of children born to members of a hypothetical cohort of women who survive from age a[fert] to age a[fert] + 5 and experience the age-specific fertility rates that were in effect during the period [t, t + 5).

Data and Initial Estimates

Data on the parameters of interest come from numerous sources of varying quality and coverage. For example, estimates of fertility rates come from registers of births or from surveys such as the Demographic and Health Surveys (DHSs), depending on the country. Moreover, several data points for the same age- and time-specific parameters are often available due to multiple overlapping surveys. The UNPD often adjusts these outputs to reduce systematic biases. They apply existing demographic techniques and draw on their expert knowledge about the specific data sources. Further details about the UNPD's methodology can be found in the literature by the United Nations (2010). In many cases, this bias-reduction stage is highly specific to country and parameter, and so we do not propose a general approach to replace this part of the analysis. Instead, we take the outputs of this stage, which are single bias-reduced age-time series for all four parameters, as inputs to our model. We call the elements of these biased-reduced single series initial estimates and use an asterisk (*) to distinguish them from the true values. For example, we let f be the true (unknown) fertility rate and f* be an initial estimate of it; similarly for the other parameters. We also use an asterisk to indicate estimates of population counts based on bias-reduced census counts (e.g., adjusted census counts based on post-enumeration surveys) or similarly adjusted counts from a comprehensive demographic survey. Hence, we let n be the true population count and n* a census-based estimate of it.

Model Description

We will take t0 and T to be, respectively, the earliest and most recent years for which census-based population counts are available. These will usually be bias-adjusted census counts, and we will just call them census counts. We will denote the years after t0 for which census-based estimates of population counts are available by t0 < t[cen], …,t[cen] ≥ T. Let = (, , f + 5, …, f, , …, , , …, ) be the vector of all age- and time-specific vital rates and migration parameters, as well as population counts at t0. These are precisely the inputs required by the CCMPP, M(·) in Equation (3). Census counts for years t[cen], …, t[cen] are not included in θ because they are not inputs to M(·). Reconstruction is equivalent to estimation of . To achieve this, we propose the following hierarchical model that depends on the original data only through the initial estimates. We model the census counts after the earliest census at Level 1, conditional on the outputs of the CCMPP, which appear in Level 2. The initial estimates of fertility, mortality, and migration, as well as census counts at t0, are modeled at Level 3. Level 4 specifies informative prior distributions. We assume, a priori, that the elements of are mutually independent given *. (where t = t0, t0 + 5,…, T in Equations (9)–(12)) For 0 < x < 1, logit x = log(x/(1 — x)). We also impose the restriction that all population counts be nonnegative, so that the joint prior on at time t is multiplied by In standard Bayesian terms, Equation (7) is the likelihood of ,…, , while Equation (13) is the prior distribution of σ2, specified by the user-defined hyperparameters α and β. Population counts at baseline are treated differently because Equation (8) is essentially a standard difference equation with initial condition , which contains . These are the only population counts that M(·) takes as inputs. The remaining census counts, at times t[cen],…,t[cen], are compared with the outputs of M(·) via Equation (7). Inference will be based on the joint posterior distribution of . The quantities involved, and their dependence relations, are summarized in Figure 1.

Figure. 1.

Plot showing the relationships between parameters, initial estimates, and the cohort component method of population projection in the estimation model.

Determining the Hyperparameters

To determine plausible values of α and β, v ∊ {n, f, s, g}, we view the σ2 as representing the variance of the errors in the initial estimates of the respective demographic parameters. Although prior knowledge about these variances is unlikely to be exact, we expect that informative estimates can be derived from experts' knowledge of the data sources. Methods for eliciting prior information from experts are numerous (e.g., O'Hagan et al. 2006). Here, we use a straightforward method based on the mean absolute error (MAE) of the transformed initial estimates. Taking fertility rate as an example, note that Equation (10) implies that MAE(log f | σ2) ≡ E(| log f — log f* | | σ2) = . The prior distribution for σ2 can be specified by choosing quantiles for MAE(log f | σ2). Suppose expert opinion is that MAE(log f | σ2) is likely to be close to 0.1, but could be as high as 0.5. This suggests setting median(σ2) = (0.1)2π/2 = 0.016 and the 0.975 quantile to (0.5)2π/2 = 0.393. To find an inverse gamma distribution with these quantiles, we would fix α at a range of values between 0.3 and 6 and choose β such that median(MAE(log f | σ2)) = 0.1. The parameter α would then be chosen such that the 0.975 quantile of MAE(log f | σ2) was about 0.5. This would give α = 1 and β = 0.0109. Since demographers are more used to thinking about untransformed fertility rates, it is useful to consider what specifying MAE(log f | σ2) means on the original scale. MAE on the log scale approximates mean absolute relative error (MARE) on the original scale, where MARE(f | σ2) ≡ E(|f − f*| |σ2)/f*. This approximation is good for the MAE values used here. The MARE has been used previously by demographers as a way of quantifying measurement accuracy (Keilman 1998). For the migration parameter, which is already a proportion, we specify the MAE directly. The population count variance, σ2, is also modeled on the log scale and α, β are found in the same way as α and β. Survival is measured on the logit scale, but this should not be too difficult to interpret. Note that E(|logit – logit * | | σ2) = E(|logit s − logit s | | σ2) where = 1 − . In practice, the are close to zero, so that ≈ /(1 − ). Thus, specifying the MAE of the log-odds of survival is not that different from specifying it for the log-probability of death.

Estimation

We draw samples from the joint posterior using a Markov chain Monte Carlo (MCMC) sampler (Metropolis et al. 1953; Hastings 1970; Geman and Geman 1984). Without the restriction in Equation (14), the full conditional posterior distributions for the variance hyperparameters would be the usual conjugate inverse gamma distributions. With the restriction, the conjugate forms are not exactly correct but will probably be close to the true full conditionals. Therefore, to update these parameters we use the conjugate full conditional distributions as proposal densities in Metropolis-Hastings steps. The posterior densities of the remaining parameters are not easy to express analytically since each vital rate enters the likelihood through the map M(·). Therefore, these parameters are updated using Metropolis-Hastings steps with univariate normal proposal densities, with variances tuned by the method proposed by Raftery and Lewis (1996). We will use the term “iteration” to refer to one complete sweep through all age- and time-specific parameters and variance parameters. The R environment for statistical computing (R Development Core Team 2010) together with the CODA package (Plummer et al. 2006, 2010) were used for all data manipulation, model estimation, and output analysis. R code for the simulation study (Section 4), reconstruction of the female population of Burkina Faso, and model checks (Section 5) are available as online supplementary materials. The method is also implemented in the R package “popReconstruct.”

SIMULATION STUDY

We now describe the results of a simulation study carried out to investigate the calibration of posterior probability intervals for f, s, g, and n under repeated sampling of the initial estimates.

Inputs

The true vital and migration rates assumed to have prevailed in this population are shown in Tables 1 and 2. We denote these by f[true], s[true], g[true], and n[true]. In practice, these would be the unknown, true values of the parameters f, s, g, n that appear in Equations (9)–(12). This is not intended to be a realistic model of a human population; datasets typically encountered in human demography have up to 18 age categories and any number of time periods. However, we believe that this reduced population model is of sufficient size and complexity to explore the characteristics of the statistical model while not being too computationally expensive. The TFR and e0 were kept constant at 0.7 births per woman per year and 15.61 years, respectively, for the duration of the reconstruction. A varying pattern of migration was chosen that consisted of net out-migration in the first half of the reconstruction period followed by net in-migration in the second half. The magnitude of the flows was quite volatile, varying from 13% to 26% of the receiving population. Migration in both directions was concentrated in the two middle age groups.

Table 1.

True vital rates used in the simulation study Time period

	Time period
Age	(1960, 1965)	(1965, 1970)	(1970, 1975)	(1975, 1980)
Fertility rate
0	0.00	0.00	0.00	0.00
5	0.40	0.40	0.40	0.40
10	0.30	0.30	0.30	0.30
15	0.00	0.00	0.00	0.00
Survival proportion
0	0.90	0.90	0.90	0.90
5	0.95	0.95	0.95	0.95
10	0.85	0.85	0.85	0.85
15	0.80	0.80	0.80	0.80
20+	0.10	0.10	0.10	0.10
Migration proportion
0	−0.03	−0.05	0.03	0.05
5	−0.05	−0.10	0.05	0.10
10	−0.06	−0.11	0.06	0.11
15	−0.01	−0.01	0.01	0.01

Table 2.

True population counts used in simulation study

	Year
Age	1960	1965	1970	1975	1980
0	7500	8482	9453	11436	14504
5	6000	6886	7512	9280	11600
10	4000	4862	5293	6690	8651
15	3000	3404	3998	4762	6149

True vital rates used in the simulation study Time period True population counts used in simulation study The true population counts, denoted n[true], are shown in Table 2. Those for 1960 were chosen to represent a young population. Those for the subsequent time periods were derived by applying the CCMPP to the 1960 population using the vital rate and migration parameters in Table 1. Therefore, the underlying true population dynamics over the reconstruction period were completely and deterministically defined by Equation (1). The entries in Table 2 correspond to n, t = t0, t[cen], …, t[cen] in Equation (8). The hyperparameters of the inverse gamma distributions were determined as described in Section 3.4. The median MAEs for log-fertility rate, logit survival, and log-population counts were set to 0.1 with 0.975 quantiles of approximately 0.5. The same quantities for migration were set to 0.2 and approximately 1, respectively. The resulting values of α, β, and MAE quantiles are shown in Table 3. For fertility, MAE; similarly for population count. For survival, MAE and for migration MAE.

Table 3.

Level 4 hyperparameters and selected implied quantiles of the Mean Absolute Error (MAE) of the respective demographic parameters used in the simulation study and the application to Burkina Faso. f and n are modeled on the log scale, s is modeled on the logit scale, and g is untransformed

			Quantiles of MAE(v)
υ	α	β	0.025	0.25	0.5	0.75	0.975
f, s, n	1	0.0109	0.0433	0.0707	0.1	0.1552	0.5232
g	1	0.0436	0.0867	0.1414	0.2	0.3104	1.0465

Study Design

The values in Tables 1–3 remained fixed throughout the simulation study and the initial estimates f*, s*, g*, and n* were treated as random. We drew n* | σ2 from a logNormal(log n[true], σ2) distribution in accordance with Equation (7). The remaining initial estimates were drawn from distributions derived from Equations (9)–(12) in an analogous manner. For example, the σ* | σ2 were drawn from a logNormal(log f[true], σ2) distribution. The coverage of central marginal Bayesian confidence intervals (or credible intervals) under the model was estimated by the following experiment. For j = 1,…, J: Randomly sample, σv2[, v = n, f, s, g, from Equation (13). Generate initial estimates f*[, g*[, n*[ for a = 0,…, 15+, t = 1960, …, 1980, s*[, for a = 0, …, 20+ t = 1960,…, 1980, as described above. Check that Equation (14) is satisfied by the initial estimates; if not return to Step 1. Draw a large MCMC sample from the joint posterior and find the 0.025, 0.5, and 0.975 quantiles of the marginal distribution of each parameter. The estimated coverage is then the proportion of the J Bayesian confidence intervals containing the known, true value for each parameter. We set J = 200 and applied the estimation method described in Section 3.5. Initial values for the population counts, vital rates, and migration proportions were set to the initial estimates. Start values for the variances were arbitrarily set to 5 as they appeared to have a negligible effect on the final results.

Results and Discussion

Point estimates of the coverage of the marginal 0.95 posterior probability intervals are shown in Table 4. These are all close to 0.95. In practical applications with real datasets, where the true parameter values are unknown, interest will be in interval estimates of the demographic parameters. These should be based on the joint posterior distribution. For illustration, we have plotted central marginal Bayesian confidence intervals of a selection of age-specific and age-summarized parameters that might be of interest based on the MCMC sample from a single replicate of the simulation study (Figure 2). For comparison, we have also plotted the true parameter values used throughout the simulation and the noisy initial estimates generated under the model.

Table 4.

Estimated coverage probabilities of 95% posterior intervals for all demographic parameters of interest

		Demographic parameter
Years	Ages	Population	Fertility	Survival	Migration
1960	[0, 5)	0.930
	[5, 10)	0.940
	[10, 15)	0.945
	[15, 20)	0.920
	20+
[1960, 1965)	[0, 5)			0.940	0.940
	[5, 10)			0.945	0.935
	[10, 15)		0.965	0.950	0.950
	[15, 20)		0.935	0.955	0.960
	20+			0.960
[1965, 1970)	[0, 5)			0.950	0.940
	[5, 10)		0.920	0.945	0.940
	[10, 15)		0.960	0.935	0.965
	[15, 20)			0.925	0.955
	20+			0.935
[1970, 1975)	[0, 5)			0.915	0.960
	[5, 10)		0.920	0.955	0.950
	[10, 15)		0.930	0.950	0.935
	[15, 20)			0.965	0.940
	20+			0.890
[1975, 1980)	[0, 5)			0.950	0.930
	[5, 10)		0.920	0.950	0.970
	[10, 15)		0.955	0.945	0.950
	[15, 20)			0.950	0.955
	20+			0.975

Figure. 2.

Ninety-five percent Bayesian confidence intervals for selected parameters from a single replication of the simulation study. (a) Age-specific fertility rate. (b) Total fertility rate. (c) Life expectancy at birth. (d) Average annual total net number of migrants. The online version of this figure is in color. Estimated coverage probabilities of 95% posterior intervals for all demographic parameters of interest Bayesian confidence intervals can be plotted for age-specific parameters as has been done for age-specific fertility rates in Figure 2(a). Confidence intervals for any function of the age-specific parameters can be obtained immediately by transforming each vector of age-specific values in the MCMC sample and computing the sample quantiles. Quantities of particular interest are the summary measures defined in Section 3.1. We show TFR, e0, and the average annual total net number of migrants in Figures 2(b)–(d).

RECONSTRUCTION OF THE POPULATION OF BURKINA FASO, 1960–2005

We now illustrate the method by reconstructing the female population of Burkina Faso from 1960 to 2005. Uncertainty in this case is nonnegligible due to the fragmentary nature of the available data. This application shows how our method is able to quantify this appropriately by producing probabilistic interval estimates.

Initial Estimates

Brief descriptions of our initial estimates are given below. Further details are in the online supplementary materials, including the initial estimates themselves. The parameters α and β, v ∊ {n, f, s, g}, were set to the same values as in the simulation study (Table 3); the MAEs given there were based on expert opinions provided by UNPD analysts for the case of Burkina Faso.

Population Counts

Population counts, n*, in exact years 1960, 1975, 1985, 1995, and 2005 by sex were taken from the literature by the United Nations (2009a) that are based on a 1960–1961 demographic survey and censuses in 1975, 1985, 1996, and 2006. The United Nations (UN) figures are preferred over the raw census counts because important adjustments were made for underenumeration. This form of bias is more common in certain age-groups and efforts to reduce it are based on postcensal surveys. Burkina Faso experienced a high level of population growth between 1960 and 2005; the total female population increased from 2.3 million to 6.9 million, an average of 2.4% per year. The population has a young age structure, as illustrated by the age-specific population counts in Figure 3.

Figure. 3.

Population counts by five-year age group for the female population of Burkina Faso. The online version of this figure is in color.

Fertility Rates

Initial estimates of age-specific fertility rates, f*, were derived from multiple, overlapping series of point estimates taken from the 1960 and 1991 demographic surveys, the 1976 census post-enumeration survey, the 1985, 1996, and 2006 censuses, and the 1992–93, 1998–99, and 2003 DHSs. Alkema et al. (2012) studied estimates of TFR using these data and found evidence of bias. Therefore, we took Alkema et al.'s (2012) median bias-adjusted estimates as our initial estimates of TFR and multiplied them by age-specific fertility patterns. Age patterns sum to one and indicate the share of fertility attributable to each age-group. We obtained a separate pattern for each projection interval in the reconstruction period by smoothing the available point estimates over age, within interval, using loess (Cleveland 1979; Cleveland, Grosse, and Shyu 1992) and normalizing. The loess method performs a series of locally weighted regressions. Smoothing within five-year sub-interval (Figure 4) yielded trends that were also sensible, a priori, when viewed by five-year age group (see supplementary materials). No point estimates were available for the period 1965–1970. To generate initial estimates for this period, we multiplied the 1960–1965 age-pattern by Alkema et al.'s (2012) adjusted TFR estimate for 1965–1970.

Figure. 4.

Data points for the initial estimates of age-specific fertility patterns of Burkina Faso women, 1960–2005, grouped by five-year sub-interval. The lines are the within-time loess smooths. The online version of this figure is in color.

Survival Proportions

Initial estimates of survival proportions, s*, were derived from abridged demographic life tables constructed using data on under five and adult mortality from the 1960–1961 and 1991 national demographic surveys, the 1992–1993, 1998–1999, and 2003 DHSs, and UNICEF's Multiple Indicator Cluster Survey—Round 3 conducted in 2006. The tables were created by applying the Brass two-parameter relational logit model (Brass 1971) with the Timæus Sahelian standard (Timæus 1999). Standard methods were used to derive survival proportions from the life tables.

Migration Proportions

Estimates of migration of comparable detail to those for vital rates are seldom available for many countries. Some estimates for Burkina Faso by broad age-group and sex are given by Condé (1980) for the period 1960–1975 who concluded that migration during this period was primarily labor migration. In addition, whole-population estimates for 1960–2005 are available from the literature by the United States Census Bureau (2008) and United Nations (2009a) that indicate sustained net out-migration over the period 1960–2000. The latter source suggests a reversal to net in-migration over 2000–2005. We designed our initial estimates to reflect the direction and approximate magnitude suggested by these sources. The g* were set to −0.055 for 15 ≥ a ≥ 50, 1960 ≥ t ≥ 1995, 0.055 for 15 ≥ a ≥ 50, t = 2000, and zero otherwise. Thus, for example, our initial estimate for the net number of migrants over the five-year period 1960–1965 is centered at −5.5% of the 1960 population, an average of −1.1% of the 1960 population per year for the five years in the projection interval. The direction is reversed over 2000–2005, since Burkina Faso likely received refugees from the civil war that broke out in neighboring Côte D'Ivoire during this period. The greater uncertainty about these initial estimates than about those for the other parameters is accounted for by setting α, β such that the median and the 0.975 quantile of the MAE were twice as large as the values used for the other parameters (Table 3).

Results

We summarize the full joint posterior distribution with posterior medians and 95% central Bayesian confidence intervals based on the marginal distributions of age-specific input parameters and age-summarized versions such as TFR and e0. We use half the width of the confidence intervals (“half-widths”) to indicate the magnitude of posterior uncertainty. Summarizing the joint posterior distribution of θ by quantiles of marginal distributions provides coherent and probabilistic interval estimates. They are coherent because uncertainty about all other parameters is accounted for. Our simulation study suggests that the intervals are also well calibrated in that they achieve their nominal coverage over repeated random sampling of the initial estimates and census data, under the model. The posterior medians for population counts at baseline (Figure 5) are very close to their initial estimates; across age groups, the maximum absolute difference is 2.3% of the initial estimate. All of the intervals have half-widths less than 7.4% of the median, indicating that posterior uncertainty about this quantity is low.

Figure. 5.

Ninety-five percent Bayesian confidence intervals for the female population of Burkina Faso by age in 1960. Also shown are the initial estimates. The online version of this figure is in color.

Fertility

A similar plot for age-specific fertility rates is shown in Figure 6. Again, the posterior medians are close to their respective initial estimates, but the half-widths of the posterior intervals are wider, ranging from 18% to 21% of the median. This is because most of the information about fertility in the nonfertility parameters comes from the population counts in the age range 0–5, and this depends mainly on the level of fertility, not how it is distributed across age group of mother. Information about the age pattern of fertility in our posterior distribution comes mostly from the data-derived initial estimates. The interval widths narrow as age-specific fertility approaches zero, which occurs at the extremes of the age range of nonzero fertility. Due mainly to biology, human fertility is known to be low at the extremes of this range with a high degree of certainty. The shape of our posterior intervals reflects this.

Figure. 6.

Ninety-five percent Bayesian confidence intervals for age-specific fertility rates for the female population of Burkina Faso, 1960–2005. Also shown are the initial estimates. The online version of this figure is in color. Posterior median estimates of TFR (Figure 7(a)) increased from 7.1 children per woman in 1960–1965 to about 7.4 over the period 1965–1980, and then decreased to 6.4 children per woman in 2000–2005. Over the entire reconstruction period, the limits of the 95% intervals are equivalent to about plus or minus half a child. The posterior medians are slightly lower than the initial estimates, which were based only on information about fertility collected mainly in surveys. Our posterior distribution for TFR also takes population counts, mortality, and migration into account. There is more information about TFR in these parameters than there is about the fertility age pattern, explaining why the interval half-widths for TFR are narrower, relatively, than those for the age-specific rates; all are less than 7.5% of the posterior median.

Figure. 7.

Ninety-five percent Bayesian confidence intervals for selected age-summarized parameters for the female population of Burkina Faso, 1960–2005. (a) Total fertility rate. (b) Average annual total net number of migrants. (c) Under-5 mortality. (d) Life expectancy at birth. The online version of this figure is in color.

Mortality

Posterior estimates of mortality are presented in terms of e0 and under-five mortality, 1 – s0, (Figures 7(c) and (d)). Under-five mortality is often viewed as an indicator of a country's level of development. The posterior marginal distributions for these two parameters reflect a sustained improvement in mortality conditions in Burkina Faso over the period. Posterior median estimates of e0 for 1960–1965 and 2000–2005 are 35 and 52 years, respectively. The interval half-widths decrease from 2.2 years in 1960–1965 to 1.8 years in 2000–2005, indicating a decrease in uncertainty about this parameter over the interval of reconstruction. Posterior median estimates of under-five mortality declined from 0.25 to 0.12 over the period (95% intervals: [0.21, 0.29] and [0.1, 0.15], respectively).

Migration

We summarize age-specific migration by the average annual net number of migrants added to the population, in units of 1000 (Figure 7(b)). The average of the posterior median estimates between 1960 and 2000 is −18 (thousand women) per year. Importantly, the 95% intervals contain zero at all time periods and, moreover, they are very wide. The interval for 1960–1965 is [−28, 13], while it is [−73, 11], for 1995–2000. Burkina Faso has been characterized as a country of emigration between 1960 and 2000, with migration dominated by people moving to find work in neighboring countries. This view is consistent with knowledge about the labor market during that period, as well as with data collected in neighboring countries (Condé 1980; United States Census Bureau 2008). These data are fragmentary, however, and there are no reliable estimates of the magnitude of the migratory flow. Our results reflect this situation. Our marginal posterior distribution is centered below zero, which suggests that, on balance, the available data are more consistent with a net outflow up to 2000. Nevertheless, there is insufficient information to rule out a zero, or even a small positive, net flow. For 2000–2005, the 95% interval is [−29, 70] with posterior median 22. While a positive net flow is suggested, the interval is very wide and contains zero. Clearly, there is a great deal of uncertainty about migration during this period, even after taking information about all the other parameters into account.

Model Checking and Sensitivity Analysis

The census counts, n*, play a central role in our model; all vital rate and migration parameters are related to one another a posteriori through these counts. We conducted two checks; one to assess sensitivity to the form of the likelihood and another to assess out-of-sample predictive performance. We assessed sensitivity to the use of the relatively light-tailed normal likelihood with constant variance using the approach by Carlin and Polson (1991). Replacing σ2 with λσ2, λ ∼ InvGamma(1, 1), t = 1960, 1965, …, 2000, a = 0, 5, … 80+, in Equations (7) and (9) relaxes the assumption of constant variance. Moreover, in comparison with Equation (7), the (marginal) likelihood under this formulation is the heavier-tailed Student's t(mean = log n, scale = σ, df = 2) distribution. We refer to this modification as the “t2 likelihood model.” Posterior marginal quantiles of the demographic parameters were examined to assess the need for this additional flexibility. We compared the empirical 0.025,0.5, and 0.975 quantiles of the posterior marginal distributions of each age- and time-specific parameter using the MCMC samples from the runs under the modified and original models. This involved comparing two sets of multivariate distributions. Within each parameter, except migration, we calculated the absolute relative differences (ARDs) for each age and time, expressed as a percentage, and summarized them by their averages and maxima, taken over all ages and times. For example, the mean of the ARDs of the 0.025 quantiles for fertility rate was calculated as where f[0.025,orig] and f[0.025, are the 0.025 quantiles of the posterior distribution for f under the original and t likelihood models. The numerals 17 and 9 correspond to the number of age groups and time periods, respectively. The maximum ARDs were computed similarly. Migration is expressed as a proportion so we calculated absolute (nonrelative) differences for this parameter. Posterior distributions of the λ were also examined to assess the need for the additional flexibility of the t2 likelihood model. Posterior distributions of the λ concentrated away from 1 would indicate that the original normal-based model fits poorly (Carlin and Louis 2009, chap. 4). The predictive ability of the original model was tested by rerunning the analysis four times, each time omitting one of the census datasets. For t = 1975, 1985, 1995, 2005, we compared the posterior predictive distributions n | n* with the point values n*, where n* is the set of census counts for age a for all years except t. We call this the “out-of-sample validation.” For each of the four runs, we summarized the difference between the posterior predictive median and the census counts using MARE, expressed as a percentage, where ñ is the sample median of the n based on the MCMC sample and the numerals 17 and 4 refer to the number of age groups and censuses, respectively. Small values of MARE suggest that the model predicts the observed counts well.

Results

Quantiles of the posterior marginal distributions from the t2 likelihood model were close to those from the original model (Table 5). All mean ARDs were below 6% and most maximum ARDs were below 5%. Large ARDs were found between the extreme quantiles of some population counts, the largest for the 0.975 quantile in 1960, age group 65–69 (ARD = 36%). However, on the raw scale, this represented a difference of less than 10,000 people, which is small relative to the total population size, the median estimate of which was slightly over two million in 1960 under both models.

Table 5.

Parameter	Quantile	Mean ARD (%)	Max ARD (%)
Fertility rate	0.025	0.79	4.09
	0.500	0.22	0.65
	0.975	0.52	1.87
Survival proportion	0.025	0.38	2.67
	0.500	0.06	1.67
	0.975	0.30	2.39
Migration proportion	0.025	0.68	0.29
	0.500	0.72	2.36
	0.975	1.81	2.20
Population count, 1960	0.025	0.92	1.73
	0.500	0.20	0.48
	0.975	5.38	36.48
Population count, 1965–2005	0.025	0.92	19.08
	0.500	0.24	1.87
	0.975	1.07	24.36

Mean and maximum Absolute Relative Differences (ARDs) between posterior marginal quantiles of the posterior distributions from the t2 likelihood and original models. Absolute differences (ADs) are given for migration (see Section 5.3) The posterior distributions of the λ were virtually all centered around 1. That of λ80,1975 had the median furthest from 1, but was still very spread out; its 0.025, 0.25, 0.5, 0.75, and 0.975 quantiles were, respectively, 0.30, 0.93, 2.4, 9.4, and 167. This suggests that the extra flexibility of the t2 likelihood model is not needed. The MARE from the out-of-sample validation was 3.9%, indicating that the posterior predictive estimates of the census counts were close to the observed census counts. Overall, these results indicate that the normal-based model as originally formulated in Section 3.3 fits the data reasonably well.

DISCUSSION

We have described a method for reconstructing past populations by age and sex that is designed to work with the type of data commonly collected in modern demographic surveys and censuses, especially in developing countries. Population dynamics are modeled by the well-known cohort-component method of population projection and measurement error is accounted for in a coherent, fully probabilistic manner through a Bayesian hierarchical model. Inference is based on the joint posterior distribution of all parameters, which depends on data through bias-reduced initial estimates. We applied our method to a real dataset and found that the widths of our posterior intervals indicated a nonnegligible amount of uncertainty about all parameters, but were of magnitudes within the range of variation demographers are used to when working with estimates and projections. The method is implemented in the R package “popReconstruct.” Inferences are likely to be sensitive to changes in the initial estimates. However, these are primarily data-derived and our goal is to synthesize all of the available information; changes in these parameters should have an influence on the posteriors. In our application to the female population of Burkina Faso, the posterior medians were similar to the initial estimates in many cases. This suggests that the initial estimates of each parameter agreed closely with the census counts and the initial estimates of the other parameters. We would expect to see a greater difference between the posterior medians and initial estimates of parameters for which this agreement is weaker. Lee (1971, 1974) and Oeppen (1993) proposed deterministic methods of population reconstruction. We have assumed a deterministic model only for the population dynamics. That is, given the true vital and migration rates, the evolution of the population is modeled deterministically. In contrast, Bertino and Sonnino (2003) gave a method in which the population dynamics were stochastic and the vital rates functioned as mean parameters. We agree with Pollard (1968; see also Cohen 2006) and expect that, in our applications, variation due to measurement error will overwhelm any additional variation arising from a stochastic population dynamics model. Moreover, Bertino and Sonnino's (2003) method cannot be easily applied in our context because it was designed to take counts of baptisms and deaths by year as inputs. Information of this kind is seldom available in most of the cases where we wish to apply our method. Key features of our approach, therefore, are that the major source of uncertainty in population reconstruction, namely measurement error, is appropriately accounted for through fully probabilistic estimates and the method can be used with the kind of data available for most countries over the past 60 years. We made the simplifying assumption of constant variance across age and time for each demographic parameter on the log or logit scale. This might seem restrictive; more complex variance structures for the other parameters could be proposed to allow for the fact that more is typically known about infant and child mortality than old-age mortality, for example. However, the benefits of this extension would have to be weighed against the additional complexity of having to estimate more parameters. As our checks indicated, using separate age- and time-specific variance parameters for population counts made little difference to the final results, probably because there was not enough information in the initial estimates to estimate them. The story might be different in a multicountry context, however. For instance, with demographically similar countries, it might be realistic to set some of the variance parameters constant across countries but allow for variation over time or age. Census counts, or data of comparable reliability, are assumed available for the baseline year. This requirement could be removed by modifying Equation (9) so as to accommodate values of n* derived from noncensus sources. Since these would probably be less reliable than census-derived estimates, this would mean replacing σ2 in Equation (9) with a new variance parameter to account for the extra uncertainty. This could be modeled in the same way as the existing variance parameters per Equation (13). The period of reconstruction in our example was delimited by the years of the earliest and most recent data on population counts. This modification would permit the period of reconstruction to begin earlier if some initial estimate of population counts at an earlier baseline year were to be obtained. No modifications are required to continue reconstruction beyond the year of the most recent census where initial estimates of vital rates and migration are available. However, the posterior distribution of the vital rates and migration for years beyond the most recent census will be based entirely on the initial estimates. The structure of our model could be extended to produce more detailed reconstructions in several ways. An extension to two-sex populations is an obvious example. While we expect this to be feasible, dependencies between female- and male-specific parameters must be appropriately accounted for. We used our method to reconstruct national populations. There are at least two ways in which subnational reconstructions could be obtained. The most straightforward would be to obtain sets of initial estimates for each subnational region of interest, and apply the model separately to each. Subnational estimates of population counts by age can be obtained from many censuses. In developed countries, vital registration systems would likely provide initial estimates of regional fertility and mortality. Less developed countries often lack these systems and such data might be difficult to obtain. Obtaining subnational estimates of net migration is likely to be difficult for most countries. A more feasible approach is to use subnational initial estimates of population counts and set initial estimates of sub-national vital rates to their national-level estimates. A minimal set of modifications to allow this could involve replacing the national-level population count vectors, nt, in Equation (1) with vectors of stacked sub-national counts. For R regions, the population count vector at time t, ., ≡ (n0,, …, n)′, would be (′.,.,1,…,′.,.,)′. The projection matrix would take a block diagonal form with R blocks, one for each region. Similarly, the g would be national-level initial estimates of migration proportions. Level 1, Equation (7), and Level 3, Equations (9)–(12), of the hierarchical model would be extended to include additional, similar terms for each region. For example, Equation (11) could be replaced by logit s | s*,σ2 ∼ Normal(logit s,σ2) for regions r,…, R. Level 4, Equation (13), would be unchanged for v = f, s, g since we only have national-level initial estimates, but models with separate variance parameters for each region (i.e., σ2, r = 1, …, R) would be worth investigating. An advantage of this modification is that “hybrid” cases where subnational data are available for some time periods and/or parameters could be accommodated by using them to derive specific initial estimates and substituting these for the national-level estimates in the modified Level 3. Corresponding, additional variance parameters could also be added at Level 4. A possible shortcoming of this approach is lack of flexibility in cases where a few regional vital rates differ greatly from the national-level rates since the national level initial-estimates are used as fixed medians at Level 3. This might be detected by applying the approach by Carlin and Polson (1991) to the vital rate variance parameters, as was done in Section 5.3 for σ2. Alternatively, dependence among subnational vital rates could be explicitly modeled, perhaps using a spatial model in Level 3. Migration is not split into its constituent inflows and outflows since the projection model (1) requires net migration as an input. Reliable data on migration over long periods are not usually available, especially for developing countries. Even where data on flows are collected, such as in the European Union, there can be significant disagreement in estimates of binational flows between the records of the sending and receiving countries (e.g., Raymer, de Beer, and van der Erf 2011). The UNPD has considered this issue and net migration is currently their preferred measure. However, in cases where good information about both in- and out-migration is collected by means of population registers or border control agencies, it might be reasonable to model the two flows separately on a country-by-country basis. To allow this, the g would be replaced with g and g, say, for immigration and emigration, respectively. A minimal modification would then be to replace g with g − g in Equation (1) and replace Equation (12) with two similar terms for g and g. If good information about the accuracy of the immigration and emigration data were available, then σ2 could be replaced with two separate parameters; otherwise it would be unchanged.

SUPPLEMENTARY MATERIALS

The derivation of Equation (4); further information about the derivation of the initial estimates discussed in Section 5.1; the R code required to perform both the simulation study and the reconstruction of the female population of Burkina Faso (together with sensitivity analyzes); results of the Burkina Faso reconstruction as .csv_les.

13 in total

1. On the Bayesian analysis of ring-recovery data.

Authors: S P Brooks; E A Catchpole; B J Morgan; S C Barry
Journal: Biometrics Date: 2000-09 Impact factor: 2.571

2. Bayesian demography: projecting the Iraqi Kurdish population, 1977-1990.

Authors: B O Daponte; J B Kadane; L J Wolfson
Journal: J Am Stat Assoc Date: 1997-12 Impact factor: 5.033

3. Inverse projection and back projection: a critical appraisal, and comparative results for England, 1539 to 1871.

Authors: R D Lee
Journal: Popul Stud (Camb) Date: 1985-07

4. 'Between one and three million': towards the demographic reconstruction of a decade of Cambodian history (1970-79).

Authors: P Heuveline
Journal: Popul Stud (Camb) Date: 1998-03

5. Estimating series of vital rates and age structures from Baptisms and burials: a new technique, with applications to pre-industrial England.

Authors: R Lee
Journal: Popul Stud (Camb) Date: 1974-11

6. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.

Authors: S Geman; D Geman
Journal: IEEE Trans Pattern Anal Mach Intell Date: 1984-06 Impact factor: 6.226

7. On the use of matrices in certain population mathematics.

Authors: P H LESLIE
Journal: Biometrika Date: 1945-11 Impact factor: 2.445

8. The North Korean famine and its demographic impact.

Authors: D Goodkind; L West
Journal: Popul Dev Rev Date: 2001

9. Mortality in Vietnam, 1979-1989.

Authors: M G Merli
Journal: Demography Date: 1998-08

10. Estimating trends in the total fertility rate with uncertainty using imperfect data: Examples from West Africa.

Authors: Leontine Alkema; Adrian E Raftery; Patrick Gerland; Samuel J Clark; François Pelletier
Journal: Demogr Res Date: 2012-04-25

15 in total

1. Bayesian population reconstruction of female populations for less developed and more developed countries.

Authors: Mark C Wheldon; Adrian E Raftery; Samuel J Clark; Patrick Gerland
Journal: Popul Stud (Camb) Date: 2016-02-23

2. Bayesian Population Forecasting: Extending the Lee-Carter Method.

Authors: Arkadiusz Wiśniowski; Peter W F Smith; Jakub Bijak; James Raymer; Jonathan J Forster
Journal: Demography Date: 2015-06

3. bayesPop: Probabilistic Population Projections.

Authors: Hana Ševčíková; Adrian E Raftery
Journal: J Stat Softw Date: 2016-12-06 Impact factor: 6.440

4. The Causes of Death and Their Influence in Life Expectancy of Children Aged 5-14 Years in Low- and Middle-Income Countries From 1990 to 2019.

Authors: Juanjuan Liang; Yuanze Du; Xiang Qu; Changrong Ke; Guipeng Yi; Mi Liu; Juncheng Lyu; Yanfeng Ren; Jie Xing; Chunping Wang; Shiwei Liu
Journal: Front Pediatr Date: 2022-05-20 Impact factor: 3.569

5. The boundaries of genocide: Quantifying the uncertainty of the death toll during the Pol Pot regime in Cambodia (1975-79).

Authors: Patrick Heuveline
Journal: Popul Stud (Camb) Date: 2015-07-28

6. Population and fertility by age and sex for 195 countries and territories, 1950-2017: a systematic analysis for the Global Burden of Disease Study 2017.

Authors:
Journal: Lancet Date: 2018-11-08 Impact factor: 79.321

7. Global, regional, and national age-sex specific all-cause and cause-specific mortality for 240 causes of death, 1990-2013: a systematic analysis for the Global Burden of Disease Study 2013.

Authors:
Journal: Lancet Date: 2014-12-18 Impact factor: 79.321

8. Global age-sex-specific fertility, mortality, healthy life expectancy (HALE), and population estimates in 204 countries and territories, 1950-2019: a comprehensive demographic analysis for the Global Burden of Disease Study 2019.

Authors:
Journal: Lancet Date: 2020-10-17 Impact factor: 202.731

9. Bayesian Probabilistic Projection of International Migration.

Authors: Jonathan J Azose; Adrian E Raftery
Journal: Demography Date: 2015-10

10. Modeling age-specific mortality for countries with generalized HIV epidemics.

Authors: David J Sharrow; Samuel J Clark; Adrian E Raftery
Journal: PLoS One Date: 2014-05-22 Impact factor: 3.240