Current methods for reconstructing human populations of the past by age and sex are deterministic or do not formally account for measurement error. We propose a method for simultaneously estimating age-specific population counts, fertility rates, mortality rates, and net international migration flows from fragmentary data that incorporates measurement error. Inference is based on joint posterior probability distributions that yield fully probabilistic interval estimates. It is designed for the kind of data commonly collected in modern demographic surveys and censuses. Population dynamics over the period of reconstruction are modeled by embedding formal demographic accounting relationships in a Bayesian hierarchical model. Informative priors are specified for vital rates, migration rates, population counts at baseline, and their respective measurement error variances. We investigate calibration of central posterior marginal probability intervals by simulation and demonstrate the method by reconstructing the female population of Burkina Faso from 1960 to 2005. Supplementary materials for this article are available online and the method is implemented in the R package "popReconstruct."
Current methods for reconstructing human populations of the past by age and sex are deterministic or do not formally account for measurement error. We propose a method for simultaneously estimating age-specific population counts, fertility rates, mortality rates, and net international migration flows from fragmentary data that incorporates measurement error. Inference is based on joint posterior probability distributions that yield fully probabilistic interval estimates. It is designed for the kind of data commonly collected in modern demographic surveys and censuses. Population dynamics over the period of reconstruction are modeled by embedding formal demographic accounting relationships in a Bayesian hierarchical model. Informative priors are specified for vital rates, migration rates, population counts at baseline, and their respective measurement error variances. We investigate calibration of central posterior marginal probability intervals by simulation and demonstrate the method by reconstructing the female population of Burkina Faso from 1960 to 2005. Supplementary materials for this article are available online and the method is implemented in the R package "popReconstruct."
Entities:
Keywords:
Bayesian hierarchical model; Cohort component model; Fertility rate; International migration; Markov chain Monte Carlo; Mortality rate
Every two years, the United Nations Population Division (UNPD) publishes detailed
estimates and projections of key demographic quantities for all countries in the
world, from 1950 to 2100; they appear in the United Nations World Population
Prospects (UNWPP; United Nations
2009a, b). The parameters reported
include age-specific fertility and mortality rates (vital rates), population counts,
and international migration rates. These numbers are used for the development and
assessment of policy and are of particular importance for countries that lack their
own well-resourced official statistical systems. For many of these countries, the
UNPD is a key partner in the process of compiling, analyzing, and publishing
internationally comparable demographic data.The UNWPP tables can be classified into two broad groups, known as estimates and
projections. Projections are predictions about demographic parameters in the future
while estimates concern populations of the past. We will use the term
reconstruction, which we find less ambiguous. Nevertheless, use of the term
“estimate” agrees with standard usage in statistics; data about
demographic parameters in the past are used to estimate the true values. Between the
biennial editions of the UNWPP, new data are collected and old data become
available, so revisions must be made to the reconstructions to accommodate it.Demographers have long acknowledged the presence of uncertainty in their measurements
and many so-called indirect methods exist for obtaining “best” point
estimates (e.g., United Nations 1983). It is
still common practice to produce ranges of projections of vital rates and population
counts based on different scenarios; high-, medium-, and low-fertility, for example.
These ranges cannot be interpreted probabilistically, but methods have been
developed that do yield probabilistic projections; Lee (1998) and Booth (2006)
provide reviews. In contrast, there has been relatively little work on the
development of fully probabilistic methods for demographic reconstruction. Here we
propose a general method for reconstruction that accounts for measurement
uncertainty and works with the type of demographic data that have commonly been
collected for most countries over the last 60 years or so.The article is structured as follows. Existing methods of population reconstruction
are reviewed in Section 2. In Section 3, we define our notation and parameters
and describe the method. In Section 4, we
investigate some statistical properties of our method through simulation before
applying it to real data from Burkina Faso in Section
5. We close with a summary of the results and a discussion in Section 6.
EXISTING METHODS OF POPULATION RECONSTRUCTION
Outside of official statistical agencies, demographic reconstructions have been
undertaken to study historical populations of the past (e.g., Wrigley and Schofield 1981) and to estimate excess mortality in
crises such as famine or social upheaval (e.g., Boyle
and Ó Gráda 1986; Heuveline
1998; Merli 1998; Goodkind and West 2001). The most commonly used
methods are based on the demographic balancing equations. These are the basic
accounting relationships, which state that the population size at time
t + δ is equal to the size at time
t plus births and immigrants, minus deaths and emigrants. These
relationships are encoded in the cohort component method of population projection
(CCMPP; Lewis 1942; Leslie 1945, 1948; Preston, Heuveline, and Guillot 2001). Given
the size and age-structure of a population at some baseline time, its size and
structure at any point in the future can be determined from the baseline population
and the fertility, mortality, and international migration rates that prevail over
the period of reconstruction.The back projection method of reconstruction (Wrigley
and Schofield 1981) attempts to apply these relationships in reverse by
using an estimate of the population structure at the terminal year of the period of
reconstruction. This approach is problematic since the CCMPP procedure is not
formally invertible. To produce sensible results, some additional constraints have
to be imposed or somewhat ad-hoc fixes applied. In response to these concerns, Lee (1971, 1974) proposed the method of inverse projection. Inverse projection
enacts the reconstruction forward through time; it is named for the fact that,
instead of estimating counts of births and deaths from rates, count data are used to
infer rates. Both of these methods have been further developed since their
inception. For example, McCaa and Barbi
(2004) and Rosina (2004) described
extensions of inverse projection, while Oeppen
(1993) proposed generalized inverse projection (GIP). All of the extended
methods remain purely deterministic.A stochastic reconstruction method was proposed by Bertino and Sonnino (2003) who modeled childbirth and death as
inhomogeneous Poisson processes. The method is designed to work with counts of
births and deaths aggregated over age and so requires the analyst to specify model
age-patterns for fertility and mortality, although different schedules can be chosen
for different subperiods. These schedules are taken as the intensity functions of
the process and realizations are simulated through time to produce sequences of
empirical estimates of population age structures. Hence, the only source of
variation accounted for is natural variation around the demographic rates; the total
numbers of deaths by year are assumed to be recorded without error. Moreover,
international migration is assumed to be negligible over the period of
reconstruction.Finally, all of the above methods were designed to work with individual data of the
kind sometimes found in European parish registers, or aggregated summaries of them.
The data available for many countries in our intended application are not as
detailed. The aim of this article is to introduce a new method of reconstruction
that uses measurements of demographic parameters from 50 to 60 years in the past
from multiple noisy data sources, often available only for limited years or periods.
Uncertainty due to measurement error is expressed as probability distributions and
intervals rather than deterministic, scenario-based ranges. To our knowledge, there
are no existing methods of human population reconstruction that have these
features.Daponte, Kadane, and Wolfson (1997) used an
approach similar to ours to construct a counterfactual history of the Iraqi Kurdish
population between 1977 and 1990. They, too, represented errors in the measurement
of demographic parameters as the standard deviations of probability distributions.
However, they used a low-dimensional parameterization of mortality and fixed the age
patterns of fertility. International and sub-national migration was assumed to be
negligible.Where migration is accounted for, it has been handled in different ways. Lee's
inverse projection method estimates it as a residual if censuses are available at
intermediate years in the reconstruction interval. Alho (1992) added it as extra error in measuring survival. Here,
migration is treated explicitly and age-time specific estimates are available as for
the other parameters.Complex models have also been developed by fisheries researchers to study the
population dynamics of marine life (e.g., Quinn and
Deriso 1999). A large body of work also exists on the dynamics of land
animal populations; Brooks, Catchpole, and Morgan
(2000a) and Brooks et al. (2002)
provide reviews. In most wildlife studies, data-collection methods and the type and
amount of data available differ somewhat from those in human demography.
Nevertheless, there are similar statistical challenges to overcome and these suggest
a Bayesian approach may be appropriate. For instance, in ecology and demography,
multiple sources of data informing the same demographic parameters often exist.
Poole and Raftery (2000) found that
Bayesian melding allowed these to be synthesized in a coherent manner (see also
Givens, Raftery, and Zeh 1993; McAllister et al. 1994; Raftery, Givens, and Zeh 1995). Furthermore, population
dynamics models are often over-parameterized (e.g., Lee 1985, 1993). The resulting
ridges in the likelihood surface posed problems that Bayesian approaches have been
able to overcome (e.g., Brooks et al. 2000b;
Catchpole, Kgosi, and Morgan 2001).Here we propose a Bayesian solution to the reconstruction problem in which the
dominant source of uncertainty, measurement error, is adequately accounted for
through fully probabilistic estimates. We handle multiple, noisy data sources and we
account for uncertainty about migration as well as fertility and mortality. We do
not require individual-level data of the European parish register kind.
METHOD
Notation and Parameters of Interest
In this article, we restrict our attention to the dynamics of populations of
females only. The parameters of interest are age-and time-specific vital rates,
net international migration flows, and population counts. We will refer to
“international migration” as simply “migration” as
this is the only type we consider. We use the symbols n,
s, g, and f to denote
population counts, survival (a measure of mortality), net migration (immigrants
minus emigrants), and fertility, respectively. All of these parameters will be
indexed by five-year increments of age, denoted by a, and time,
denoted by t. Reconstruction will be done over the time
interval [t0, T]. The age scale runs from 0 to
A > 0; in our application (Section 5) A is 80. To model fertility, we
define a[fert] ≤
a[fert] where
fertility is assumed to be zero at ages out-side the range
[a[fert],
a[fert] +
5). Throughout, a prime indicates vector transpose.Given 0
= (no,t0,
n5,,…,
nA,t)′, the vector of
age-specific female population counts at baseline
t0, and using “○” to denote
entrywise product, the CCMPP gives
=
Q(
+ t ○
/2))
+ ○
/2,
t = t0,
t0 + 5,…, T
− 5. Here, Q is a
K by K matrix encoding fertility and
mortality and K is the number of age groups. The quantity
is the net
number of migrants expressed as a proportion of the population size so that
○
is the net
number of migrants, a count. Projection proceeds in discrete steps; those alive
at the beginning of each step are subjected to the fertility, mortality, and
migration prevailing during the projection interval. This is a discrete time
approximation to a continuous time process and there are standard adjustments to
improve accuracy. Adding half of the migrants at the beginning of the projection
step and the remainder at the end is one such adjustment. The specific form we
use is: where (Preston et al. 2001).
Annualized, age-specific fertility rates enter Equation (1) through the transformation defined in Equation (2). This transformation
maps the rates into proportions so that projection can be compactly expressed as
a matrix multiplication. We use a tilde to represent this transformation and
discuss it further below. SRB is the sex ratio at birth (the ratio of males to
females among all births). Multiplication by SRB in Equation (2) ensures only the female
births are kept. Throughout, we take SRB to be fixed at 1.05, a demographic
convention. We will use M(·) to represent CCMPP and
abbreviate Equation (1) asThe survival parameters, s, give
the proportion of those aged a – 5 to a
at time t who survive to be aged a to
a + 5 at time t + 5, for
a = 0, 5,…, A + 5.
That is, s is the proportion of
people surviving into the age range [a, a
+ 5) over the five years between t and
t + 5. Also,
s is the proportion aged
[A − 5, A) at exact time
t who survive to time t + 5, by
which time they are in the age group [A, ∞). We allow
for subsequent survival in this age group by letting
s be the
proportion aged [A, ∞) at time t who
survive five more years.Age-specific mortality during a given time period will be summarized by life
expectancy at birth (e0), where The derivation is straightforward but requires additional
concepts from demography (see the online supplementary materials). The quantity
e0, is the average age at
death in a hypothetical cohort subjected for its entire life to the mortality
conditions represented by the set of age-specific survival proportions,
.The net number of migrants aged [a, a +
5) to the population during the time period [t, t + 5)
is
gn.
The age-summarized measure of migration we use will be the average annual total
net number of migrants that, for the period [t, t + 5),
is (1/5) ∑
gn.
We model g instead of
gn,
however, because we expect variation over time and age in the former to be less
dependent on magnitude.The represent a combination of age-specific
fertility and under-five mortality. When multiplied by
n, they give the number
of female births surviving to time t + 5. That is, the
give the number of
surviving female babies at t + 5 as a proportion of the
number of females aged [a, a +5) alive at time
t. We do not model , however, because data are typically
gathered to estimate fertility rates,
f, not the proportions,
. Fertility rates are
annualized occurrence/exposure rates. They give the ratio of births of both
sexes to women aged a to a + 5
(occurrence) to person-years lived (exposure) during [t, t
+ 5). Thus, the number of births in the projection step
t to t + 5 is not quite 5
f
n because the denominator of
fis not
n (we need the factor of
five because f are single-year
rates but the projection intervals are five years wide). The expression in Equation (2) is more accurate. To
illustrate, calculating
n0, (and taking
g0,n0,
= 0 for clarity) gives (where we use the SRB to keep only the female births). The
quantity in the right-most parentheses in Equation (5) is a better approximation for the denominator of
f than
n. We multiply by
s0, to account for mortality
of births during the projection step.Age-specific fertility rates for a given time period will be summarized by the
total fertility rate (TFR) that, for the period [t, t +
5), is defined asIt is the average number of children born to members of a hypothetical cohort of
women who survive from age
a[fert] to age
a[fert] + 5
and experience the age-specific fertility rates that were in effect during the
period [t, t + 5).
Data and Initial Estimates
Data on the parameters of interest come from numerous sources of varying quality
and coverage. For example, estimates of fertility rates come from registers of
births or from surveys such as the Demographic and Health Surveys (DHSs),
depending on the country. Moreover, several data points for the same age- and
time-specific parameters are often available due to multiple overlapping
surveys. The UNPD often adjusts these outputs to reduce systematic biases. They
apply existing demographic techniques and draw on their expert knowledge about
the specific data sources. Further details about the UNPD's methodology
can be found in the literature by the United
Nations (2010). In many cases, this bias-reduction stage is highly
specific to country and parameter, and so we do not propose a general approach
to replace this part of the analysis.Instead, we take the outputs of this stage, which are single bias-reduced
age-time series for all four parameters, as inputs to our model. We call the
elements of these biased-reduced single series initial
estimates and use an asterisk (*) to distinguish them from
the true values. For example, we let
f be the true (unknown)
fertility rate and f* be
an initial estimate of it; similarly for the other parameters. We also use an
asterisk to indicate estimates of population counts based on bias-reduced census
counts (e.g., adjusted census counts based on post-enumeration surveys) or
similarly adjusted counts from a comprehensive demographic survey. Hence, we let
n be the true population
count and n* a
census-based estimate of it.
Model Description
We will take t0 and T to be,
respectively, the earliest and most recent years for which census-based
population counts are available. These will usually be bias-adjusted census
counts, and we will just call them census counts. We will denote the years after
t0 for which census-based estimates of
population counts are available by t0 <
t[cen],
…,t[cen]
≥ T. Let =
(,
,
f + 5,
…, f,
,
…, ,
,
…, ) be the
vector of all age- and time-specific vital rates and migration parameters, as
well as population counts at t0. These are precisely
the inputs required by the CCMPP, M(·) in Equation (3). Census counts for
years t[cen], …,
t[cen] are not
included in θ because they are not inputs to
M(·).Reconstruction is equivalent to estimation of
. To achieve this, we propose the
following hierarchical model that depends on the original data only through the
initial estimates. We model the census counts after the earliest census at Level
1, conditional on the outputs of the CCMPP, which appear in Level 2. The initial
estimates of fertility, mortality, and migration, as well as census counts at
t0, are modeled at Level 3. Level 4 specifies
informative prior distributions. We assume, a priori, that the elements of
are mutually independent given
*.(where t =
t0, t0 +
5,…, T in Equations (9)–(12))For 0 < x < 1, logit x =
log(x/(1 — x)). We also impose the
restriction that all population counts be nonnegative, so that the joint prior
on at time t is
multiplied byIn standard Bayesian terms, Equation
(7) is the likelihood of
,…,
,
while Equation (13) is the prior
distribution of
σ2,
specified by the user-defined hyperparameters
α and
β. Population counts at
baseline are treated differently because Equation (8) is essentially a standard difference equation with
initial condition , which contains
.
These are the only population counts that M(·) takes as
inputs. The remaining census counts, at times
t[cen],…,t[cen],
are compared with the outputs of M(·) via Equation (7). Inference will be
based on the joint posterior distribution of
.The quantities involved, and their dependence relations, are summarized in Figure 1.
Figure. 1.
Plot showing the relationships between parameters, initial estimates, and
the cohort component method of population projection in the estimation
model.
Plot showing the relationships between parameters, initial estimates, and
the cohort component method of population projection in the estimation
model.
Determining the Hyperparameters
To determine plausible values of
α and
β, v
∊ {n, f, s, g}, we view the
σ2 as
representing the variance of the errors in the initial estimates of the
respective demographic parameters. Although prior knowledge about these
variances is unlikely to be exact, we expect that informative estimates can be
derived from experts' knowledge of the data sources. Methods for
eliciting prior information from experts are numerous (e.g., O'Hagan et al. 2006). Here, we use a
straightforward method based on the mean absolute error (MAE) of the transformed
initial estimates.Taking fertility rate as an example, note that Equation (10) implies that MAE(log
f |
σ2) ≡
E(| log f — log
f* | |
σ2) =
. The prior distribution
for σ2 can be
specified by choosing quantiles for MAE(log
f |
σ2). Suppose
expert opinion is that MAE(log f
| σ2) is likely
to be close to 0.1, but could be as high as 0.5. This suggests setting
median(σ2)
= (0.1)2π/2 = 0.016 and the
0.975 quantile to (0.5)2π/2 = 0.393.
To find an inverse gamma distribution with these quantiles, we would fix
α at a range of values
between 0.3 and 6 and choose
β such that
median(MAE(log f |
σ2))
= 0.1. The parameter
α would then be chosen
such that the 0.975 quantile of MAE(log
f |
σ2) was
about 0.5. This would give
α = 1 and
β = 0.0109.Since demographers are more used to thinking about untransformed fertility rates,
it is useful to consider what specifying MAE(log
f |
σ2) means on
the original scale. MAE on the log scale approximates mean absolute relative
error (MARE) on the original scale, where
MARE(f |
σ2) ≡
E(|f −
f*|
|σ2)/f*.
This approximation is good for the MAE values used here. The MARE has been used
previously by demographers as a way of quantifying measurement accuracy (Keilman 1998). For the migration parameter,
which is already a proportion, we specify the MAE directly.The population count variance,
σ2, is also
modeled on the log scale and
α,
β are found in the same
way as α and
β. Survival is measured
on the logit scale, but this should not be too difficult to interpret. Note that
E(|logit
– logit
*
| | σ2) =
E(|logit s − logit
s | |
σ2) where
= 1 −
.
In practice, the
are close to zero, so that
≈
/(1
−
). Thus,
specifying the MAE of the log-odds of survival is not that different from
specifying it for the log-probability of death.
Estimation
We draw samples from the joint posterior using a Markov chain Monte Carlo (MCMC)
sampler (Metropolis et al. 1953; Hastings 1970; Geman and Geman 1984). Without the restriction in Equation (14), the full conditional
posterior distributions for the variance hyperparameters would be the usual
conjugate inverse gamma distributions. With the restriction, the conjugate forms
are not exactly correct but will probably be close to the true full
conditionals. Therefore, to update these parameters we use the conjugate full
conditional distributions as proposal densities in Metropolis-Hastings steps.
The posterior densities of the remaining parameters are not easy to express
analytically since each vital rate enters the likelihood through the map
M(·). Therefore, these parameters are updated using
Metropolis-Hastings steps with univariate normal proposal densities, with
variances tuned by the method proposed by Raftery and Lewis (1996). We will use the term
“iteration” to refer to one complete sweep through all age- and
time-specific parameters and variance parameters.The R environment for statistical computing (R
Development Core Team 2010) together with the CODA package (Plummer et al. 2006, 2010) were used for all data manipulation, model
estimation, and output analysis. R code for the simulation study (Section 4), reconstruction of the female
population of Burkina Faso, and model checks (Section 5) are available as online supplementary materials. The
method is also implemented in the R package “popReconstruct.”
SIMULATION STUDY
We now describe the results of a simulation study carried out to investigate the
calibration of posterior probability intervals for
f,
s,
g, and
n under repeated
sampling of the initial estimates.
Inputs
The true vital and migration rates assumed to have prevailed in this population
are shown in Tables 1 and 2. We denote these by
f[true],
s[true],
g[true], and
n[true]. In
practice, these would be the unknown, true values of the parameters
f,
s,
g,
n that
appear in Equations
(9)–(12). This
is not intended to be a realistic model of a human population; datasets
typically encountered in human demography have up to 18 age categories and any
number of time periods. However, we believe that this reduced population model
is of sufficient size and complexity to explore the characteristics of the
statistical model while not being too computationally expensive. The TFR and
e0 were kept constant at 0.7 births per woman
per year and 15.61 years, respectively, for the duration of the reconstruction.
A varying pattern of migration was chosen that consisted of net out-migration in
the first half of the reconstruction period followed by net in-migration in the
second half. The magnitude of the flows was quite volatile, varying from
13% to 26% of the receiving population. Migration in both
directions was concentrated in the two middle age groups.
Table 1.
True vital rates used in the simulation study Time period
Time period
Age
(1960, 1965)
(1965, 1970)
(1970, 1975)
(1975, 1980)
Fertility rate
0
0.00
0.00
0.00
0.00
5
0.40
0.40
0.40
0.40
10
0.30
0.30
0.30
0.30
15
0.00
0.00
0.00
0.00
Survival proportion
0
0.90
0.90
0.90
0.90
5
0.95
0.95
0.95
0.95
10
0.85
0.85
0.85
0.85
15
0.80
0.80
0.80
0.80
20+
0.10
0.10
0.10
0.10
Migration proportion
0
−0.03
−0.05
0.03
0.05
5
−0.05
−0.10
0.05
0.10
10
−0.06
−0.11
0.06
0.11
15
−0.01
−0.01
0.01
0.01
Table 2.
True population counts used in simulation study
Year
Age
1960
1965
1970
1975
1980
0
7500
8482
9453
11436
14504
5
6000
6886
7512
9280
11600
10
4000
4862
5293
6690
8651
15
3000
3404
3998
4762
6149
True vital rates used in the simulation study Time periodTrue population counts used in simulation studyThe true population counts, denoted
n[true], are
shown in Table 2. Those for 1960 were
chosen to represent a young population. Those for the subsequent time periods
were derived by applying the CCMPP to the 1960 population using the vital rate
and migration parameters in Table 1.
Therefore, the underlying true population dynamics over the reconstruction
period were completely and deterministically defined by Equation (1). The entries in Table 2 correspond to
n, t
= t0,
t[cen],
…, t[cen] in
Equation (8).The hyperparameters of the inverse gamma distributions were determined as
described in Section 3.4. The median MAEs
for log-fertility rate, logit survival, and log-population counts were set to
0.1 with 0.975 quantiles of approximately 0.5. The same quantities for migration
were set to 0.2 and approximately 1, respectively. The resulting values of
α,
β, and MAE quantiles
are shown in Table 3. For fertility,
MAE; similarly for
population count. For survival, MAE and for migration MAE.
Table 3.
Level 4 hyperparameters and selected implied quantiles of the Mean
Absolute Error (MAE) of the respective demographic parameters used in
the simulation study and the application to Burkina Faso.
f and n are modeled on the log
scale, s is modeled on the logit scale, and
g is untransformed
Quantiles of
MAE(v)
υ
α
β
0.025
0.25
0.5
0.75
0.975
f, s, n
1
0.0109
0.0433
0.0707
0.1
0.1552
0.5232
g
1
0.0436
0.0867
0.1414
0.2
0.3104
1.0465
Level 4 hyperparameters and selected implied quantiles of the Mean
Absolute Error (MAE) of the respective demographic parameters used in
the simulation study and the application to Burkina Faso.
f and n are modeled on the log
scale, s is modeled on the logit scale, and
g is untransformed
Study Design
The values in Tables 1–3 remained fixed throughout the simulation
study and the initial estimates
f*,
s*,
g*, and
n* were treated as
random. We drew n* |
σ2 from a
logNormal(log
n[true],
σ2)
distribution in accordance with Equation
(7). The remaining initial estimates were drawn from distributions
derived from Equations
(9)–(12) in an
analogous manner. For example, the
σ* |
σ2 were
drawn from a logNormal(log
f[true],
σ2)
distribution.The coverage of central marginal Bayesian confidence intervals (or credible
intervals) under the model was estimated by the following experiment. For
j = 1,…, J:Randomly sample,
σv2[,
v = n, f, s, g, from Equation (13).Generate initial estimates
f*[,
g*[,
n*[
for a = 0,…, 15+,
t = 1960, …, 1980,
s*[,
for a = 0, …, 20+
t = 1960,…, 1980, as described
above.Check that Equation (14)
is satisfied by the initial estimates; if not return to Step 1.Draw a large MCMC sample from the joint posterior and find the 0.025,
0.5, and 0.975 quantiles of the marginal distribution of each
parameter.The estimated coverage is then the proportion of the J Bayesian
confidence intervals containing the known, true value for each parameter.We set J = 200 and applied the estimation method
described in Section 3.5. Initial values
for the population counts, vital rates, and migration proportions were set to
the initial estimates. Start values for the variances were arbitrarily set to 5
as they appeared to have a negligible effect on the final results.
Results and Discussion
Point estimates of the coverage of the marginal 0.95 posterior probability
intervals are shown in Table 4. These are
all close to 0.95. In practical applications with real datasets, where the true
parameter values are unknown, interest will be in interval estimates of the
demographic parameters. These should be based on the joint posterior
distribution. For illustration, we have plotted central marginal Bayesian
confidence intervals of a selection of age-specific and age-summarized
parameters that might be of interest based on the MCMC sample from a single
replicate of the simulation study (Figure
2). For comparison, we have also plotted the true parameter values used
throughout the simulation and the noisy initial estimates generated under the
model.
Table 4.
Estimated coverage probabilities of 95% posterior intervals for
all demographic parameters of interest
Demographic parameter
Years
Ages
Population
Fertility
Survival
Migration
1960
[0, 5)
0.930
[5, 10)
0.940
[10, 15)
0.945
[15, 20)
0.920
20+
[1960, 1965)
[0, 5)
0.940
0.940
[5, 10)
0.945
0.935
[10, 15)
0.965
0.950
0.950
[15, 20)
0.935
0.955
0.960
20+
0.960
[1965, 1970)
[0, 5)
0.950
0.940
[5, 10)
0.920
0.945
0.940
[10, 15)
0.960
0.935
0.965
[15, 20)
0.925
0.955
20+
0.935
[1970, 1975)
[0, 5)
0.915
0.960
[5, 10)
0.920
0.955
0.950
[10, 15)
0.930
0.950
0.935
[15, 20)
0.965
0.940
20+
0.890
[1975, 1980)
[0, 5)
0.950
0.930
[5, 10)
0.920
0.950
0.970
[10, 15)
0.955
0.945
0.950
[15, 20)
0.950
0.955
20+
0.975
Figure. 2.
Ninety-five percent Bayesian confidence intervals for selected parameters
from a single replication of the simulation study. (a) Age-specific
fertility rate. (b) Total fertility rate. (c) Life expectancy at birth.
(d) Average annual total net number of migrants. The online version of
this figure is in color.
Ninety-five percent Bayesian confidence intervals for selected parameters
from a single replication of the simulation study. (a) Age-specific
fertility rate. (b) Total fertility rate. (c) Life expectancy at birth.
(d) Average annual total net number of migrants. The online version of
this figure is in color.Estimated coverage probabilities of 95% posterior intervals for
all demographic parameters of interestBayesian confidence intervals can be plotted for age-specific parameters as has
been done for age-specific fertility rates in Figure 2(a). Confidence intervals for any function of the
age-specific parameters can be obtained immediately by transforming each vector
of age-specific values in the MCMC sample and computing the sample quantiles.
Quantities of particular interest are the summary measures defined in Section 3.1. We show TFR,
e0, and the average annual total net number of
migrants in Figures 2(b)–(d).
RECONSTRUCTION OF THE POPULATION OF BURKINA FASO, 1960–2005
We now illustrate the method by reconstructing the female population of Burkina Faso
from 1960 to 2005. Uncertainty in this case is nonnegligible due to the fragmentary
nature of the available data. This application shows how our method is able to
quantify this appropriately by producing probabilistic interval estimates.
Initial Estimates
Brief descriptions of our initial estimates are given below. Further details are
in the online supplementary materials, including the initial estimates
themselves. The parameters
α and
β, v
∊ {n, f, s, g}, were set to the same values as in the
simulation study (Table 3); the MAEs
given there were based on expert opinions provided by UNPD analysts for the case
of Burkina Faso.
Population Counts
Population counts, n*,
in exact years 1960, 1975, 1985, 1995, and 2005 by sex were taken from the
literature by the United Nations
(2009a) that are based on a 1960–1961 demographic survey
and censuses in 1975, 1985, 1996, and 2006. The United Nations (UN) figures
are preferred over the raw census counts because important adjustments were
made for underenumeration. This form of bias is more common in certain
age-groups and efforts to reduce it are based on postcensal surveys.Burkina Faso experienced a high level of population growth between 1960 and
2005; the total female population increased from 2.3 million to 6.9 million,
an average of 2.4% per year. The population has a young age
structure, as illustrated by the age-specific population counts in Figure 3.
Figure. 3.
Population counts by five-year age group for the female population of
Burkina Faso. The online version of this figure is in color.
Population counts by five-year age group for the female population of
Burkina Faso. The online version of this figure is in color.
Fertility Rates
Initial estimates of age-specific fertility rates,
f*, were
derived from multiple, overlapping series of point estimates taken from the
1960 and 1991 demographic surveys, the 1976 census post-enumeration survey,
the 1985, 1996, and 2006 censuses, and the 1992–93, 1998–99,
and 2003 DHSs. Alkema et al. (2012)
studied estimates of TFR using these data and found evidence of bias.
Therefore, we took Alkema et al.'s
(2012) median bias-adjusted estimates as our initial estimates of
TFR and multiplied them by age-specific fertility patterns.Age patterns sum to one and indicate the share of fertility attributable to
each age-group. We obtained a separate pattern for each projection interval
in the reconstruction period by smoothing the available point estimates over
age, within interval, using loess (Cleveland 1979; Cleveland, Grosse, and Shyu 1992) and normalizing. The loess
method performs a series of locally weighted regressions. Smoothing within
five-year sub-interval (Figure 4)
yielded trends that were also sensible, a priori, when viewed by five-year
age group (see supplementary materials). No point estimates were available
for the period 1965–1970. To generate initial estimates for this
period, we multiplied the 1960–1965 age-pattern by Alkema et al.'s (2012) adjusted
TFR estimate for 1965–1970.
Figure. 4.
Data points for the initial estimates of age-specific fertility
patterns of Burkina Faso women, 1960–2005, grouped by
five-year sub-interval. The lines are the within-time loess smooths.
The online version of this figure is in color.
Data points for the initial estimates of age-specific fertility
patterns of Burkina Faso women, 1960–2005, grouped by
five-year sub-interval. The lines are the within-time loess smooths.
The online version of this figure is in color.
Survival Proportions
Initial estimates of survival proportions,
s*, were
derived from abridged demographic life tables constructed using data on
under five and adult mortality from the 1960–1961 and 1991 national
demographic surveys, the 1992–1993, 1998–1999, and 2003 DHSs,
and UNICEF's Multiple Indicator Cluster Survey—Round 3
conducted in 2006. The tables were created by applying the Brass
two-parameter relational logit model (Brass
1971) with the Timæus Sahelian standard (Timæus 1999). Standard methods
were used to derive survival proportions from the life tables.
Migration Proportions
Estimates of migration of comparable detail to those for vital rates are
seldom available for many countries. Some estimates for Burkina Faso by
broad age-group and sex are given by Condé (1980) for the period 1960–1975 who
concluded that migration during this period was primarily labor migration.
In addition, whole-population estimates for 1960–2005 are available
from the literature by the United States
Census Bureau (2008) and United
Nations (2009a) that indicate sustained net out-migration over
the period 1960–2000. The latter source suggests a reversal to net
in-migration over 2000–2005.We designed our initial estimates to reflect the direction and approximate
magnitude suggested by these sources. The
g* were set
to −0.055 for 15 ≥ a ≥ 50, 1960
≥ t ≥ 1995, 0.055 for 15 ≥
a ≥ 50, t = 2000, and
zero otherwise. Thus, for example, our initial estimate for the net number
of migrants over the five-year period 1960–1965 is centered at
−5.5% of the 1960 population, an average of
−1.1% of the 1960 population per year for the five years in
the projection interval. The direction is reversed over 2000–2005,
since Burkina Faso likely received refugees from the civil war that broke
out in neighboring Côte D'Ivoire during this period. The
greater uncertainty about these initial estimates than about those for the
other parameters is accounted for by setting
α,
β such that the
median and the 0.975 quantile of the MAE were twice as large as the values
used for the other parameters (Table
3).
Results
We summarize the full joint posterior distribution with posterior medians and
95% central Bayesian confidence intervals based on the marginal
distributions of age-specific input parameters and age-summarized versions such
as TFR and e0. We use half the width of the
confidence intervals (“half-widths”) to indicate the magnitude of
posterior uncertainty. Summarizing the joint posterior distribution of θ
by quantiles of marginal distributions provides coherent and probabilistic
interval estimates. They are coherent because uncertainty about all other
parameters is accounted for. Our simulation study suggests that the intervals
are also well calibrated in that they achieve their nominal coverage over
repeated random sampling of the initial estimates and census data, under the
model.The posterior medians for population counts at baseline (Figure 5) are very close to their initial estimates;
across age groups, the maximum absolute difference is 2.3% of the
initial estimate. All of the intervals have half-widths less than
7.4% of the median, indicating that posterior uncertainty about this
quantity is low.
Figure. 5.
Ninety-five percent Bayesian confidence intervals for the female
population of Burkina Faso by age in 1960. Also shown are the
initial estimates. The online version of this figure is in
color.
Ninety-five percent Bayesian confidence intervals for the female
population of Burkina Faso by age in 1960. Also shown are the
initial estimates. The online version of this figure is in
color.
Fertility
A similar plot for age-specific fertility rates is shown in Figure 6. Again, the posterior medians
are close to their respective initial estimates, but the half-widths of the
posterior intervals are wider, ranging from 18% to 21% of the
median. This is because most of the information about fertility in the
nonfertility parameters comes from the population counts in the age range
0–5, and this depends mainly on the level of fertility, not how it is
distributed across age group of mother. Information about the age pattern of
fertility in our posterior distribution comes mostly from the data-derived
initial estimates. The interval widths narrow as age-specific fertility
approaches zero, which occurs at the extremes of the age range of nonzero
fertility. Due mainly to biology, human fertility is known to be low at the
extremes of this range with a high degree of certainty. The shape of our
posterior intervals reflects this.
Figure. 6.
Ninety-five percent Bayesian confidence intervals for age-specific
fertility rates for the female population of Burkina Faso,
1960–2005. Also shown are the initial estimates. The online
version of this figure is in color.
Ninety-five percent Bayesian confidence intervals for age-specific
fertility rates for the female population of Burkina Faso,
1960–2005. Also shown are the initial estimates. The online
version of this figure is in color.Posterior median estimates of TFR (Figure
7(a)) increased from 7.1 children per woman in 1960–1965
to about 7.4 over the period 1965–1980, and then decreased to 6.4
children per woman in 2000–2005. Over the entire reconstruction
period, the limits of the 95% intervals are equivalent to about plus
or minus half a child. The posterior medians are slightly lower than the
initial estimates, which were based only on information about fertility
collected mainly in surveys. Our posterior distribution for TFR also takes
population counts, mortality, and migration into account. There is more
information about TFR in these parameters than there is about the fertility
age pattern, explaining why the interval half-widths for TFR are narrower,
relatively, than those for the age-specific rates; all are less than
7.5% of the posterior median.
Figure. 7.
Ninety-five percent Bayesian confidence intervals for selected
age-summarized parameters for the female population of Burkina Faso,
1960–2005. (a) Total fertility rate. (b) Average annual total
net number of migrants. (c) Under-5 mortality. (d) Life expectancy
at birth. The online version of this figure is in color.
Ninety-five percent Bayesian confidence intervals for selected
age-summarized parameters for the female population of Burkina Faso,
1960–2005. (a) Total fertility rate. (b) Average annual total
net number of migrants. (c) Under-5 mortality. (d) Life expectancy
at birth. The online version of this figure is in color.
Mortality
Posterior estimates of mortality are presented in terms of
e0 and under-five mortality, 1 –
s0, (Figures 7(c) and (d)). Under-five mortality is often viewed as an indicator of a
country's level of development. The posterior marginal distributions
for these two parameters reflect a sustained improvement in mortality
conditions in Burkina Faso over the period. Posterior median estimates of
e0 for 1960–1965 and 2000–2005
are 35 and 52 years, respectively. The interval half-widths decrease from
2.2 years in 1960–1965 to 1.8 years in 2000–2005, indicating a
decrease in uncertainty about this parameter over the interval of
reconstruction. Posterior median estimates of under-five mortality declined
from 0.25 to 0.12 over the period (95% intervals: [0.21, 0.29] and
[0.1, 0.15], respectively).
Migration
We summarize age-specific migration by the average annual net number of
migrants added to the population, in units of 1000 (Figure 7(b)). The average of the posterior median
estimates between 1960 and 2000 is −18 (thousand women) per year.
Importantly, the 95% intervals contain zero at all time periods and,
moreover, they are very wide. The interval for 1960–1965 is
[−28, 13], while it is [−73, 11], for 1995–2000.Burkina Faso has been characterized as a country of emigration between 1960
and 2000, with migration dominated by people moving to find work in
neighboring countries. This view is consistent with knowledge about the
labor market during that period, as well as with data collected in
neighboring countries (Condé
1980; United States Census Bureau
2008). These data are fragmentary, however, and there are no
reliable estimates of the magnitude of the migratory flow. Our results
reflect this situation. Our marginal posterior distribution is centered
below zero, which suggests that, on balance, the available data are more
consistent with a net outflow up to 2000. Nevertheless, there is
insufficient information to rule out a zero, or even a small positive, net
flow.For 2000–2005, the 95% interval is [−29, 70] with
posterior median 22. While a positive net flow is suggested, the interval is
very wide and contains zero. Clearly, there is a great deal of uncertainty
about migration during this period, even after taking information about all
the other parameters into account.
Model Checking and Sensitivity Analysis
The census counts, n*,
play a central role in our model; all vital rate and migration parameters are
related to one another a posteriori through these counts. We conducted two
checks; one to assess sensitivity to the form of the likelihood and another to
assess out-of-sample predictive performance.We assessed sensitivity to the use of the relatively light-tailed normal
likelihood with constant variance using the approach by Carlin and Polson (1991). Replacing
σ2 with
λσ2,
λ ∼
InvGamma(1, 1), t = 1960, 1965,
…, 2000, a = 0, 5,
… 80+, in Equations (7) and (9) relaxes the assumption of constant variance. Moreover, in
comparison with Equation (7), the
(marginal) likelihood under this formulation is the heavier-tailed
Student's t(mean = log
n, scale =
σ, df = 2)
distribution. We refer to this modification as the
“t2 likelihood model.”Posterior marginal quantiles of the demographic parameters were examined to
assess the need for this additional flexibility. We compared the empirical
0.025,0.5, and 0.975 quantiles of the posterior marginal distributions of each
age- and time-specific parameter using the MCMC samples from the runs under the
modified and original models. This involved comparing two sets of multivariate
distributions. Within each parameter, except migration, we calculated the
absolute relative differences (ARDs) for each age and time, expressed as a
percentage, and summarized them by their averages and maxima, taken over all
ages and times. For example, the mean of the ARDs of the 0.025 quantiles for
fertility rate was calculated as where
f[0.025,orig] and
f[0.025,
are the 0.025 quantiles of the posterior distribution for
f under the original and
t likelihood models. The numerals 17 and 9
correspond to the number of age groups and time periods, respectively. The
maximum ARDs were computed similarly. Migration is expressed as a proportion so
we calculated absolute (nonrelative) differences for this parameter.Posterior distributions of the
λ were also examined
to assess the need for the additional flexibility of the
t2 likelihood model. Posterior distributions of
the λ concentrated away
from 1 would indicate that the original normal-based model fits poorly (Carlin and Louis 2009, chap. 4).The predictive ability of the original model was tested by rerunning the analysis
four times, each time omitting one of the census datasets. For
t = 1975, 1985, 1995, 2005, we compared the
posterior predictive distributions
n |
n* with the point
values n*, where
n* is the set
of census counts for age a for all years except
t. We call this the “out-of-sample
validation.” For each of the four runs, we summarized the difference
between the posterior predictive median and the census counts using MARE,
expressed as a percentage, where
ñ is the sample median of
the n based on the MCMC sample
and the numerals 17 and 4 refer to the number of age groups and censuses,
respectively. Small values of MARE suggest that the model predicts the observed
counts well.
Results
Quantiles of the posterior marginal distributions from the
t2 likelihood model were close to those from
the original model (Table 5). All
mean ARDs were below 6% and most maximum ARDs were below 5%.
Large ARDs were found between the extreme quantiles of some population
counts, the largest for the 0.975 quantile in 1960, age group 65–69
(ARD = 36%). However, on the raw scale, this represented a
difference of less than 10,000 people, which is small relative to the total
population size, the median estimate of which was slightly over two million
in 1960 under both models.
Table 5.
Mean and maximum Absolute Relative Differences (ARDs) between
posterior marginal quantiles of the posterior distributions from the
t2 likelihood and original models.
Absolute differences (ADs) are given for migration (see Section 5.3)
Parameter
Quantile
Mean ARD (%)
Max ARD (%)
Fertility rate
0.025
0.79
4.09
0.500
0.22
0.65
0.975
0.52
1.87
Survival proportion
0.025
0.38
2.67
0.500
0.06
1.67
0.975
0.30
2.39
Migration proportion
0.025
0.68
0.29
0.500
0.72
2.36
0.975
1.81
2.20
Population count, 1960
0.025
0.92
1.73
0.500
0.20
0.48
0.975
5.38
36.48
Population count, 1965–2005
0.025
0.92
19.08
0.500
0.24
1.87
0.975
1.07
24.36
Mean and maximum Absolute Relative Differences (ARDs) between
posterior marginal quantiles of the posterior distributions from the
t2 likelihood and original models.
Absolute differences (ADs) are given for migration (see Section 5.3)The posterior distributions of the
λ were
virtually all centered around 1. That of
λ80,1975 had the median furthest from
1, but was still very spread out; its 0.025, 0.25, 0.5, 0.75, and 0.975
quantiles were, respectively, 0.30, 0.93, 2.4, 9.4, and 167. This suggests
that the extra flexibility of the t2 likelihood
model is not needed.The MARE from the out-of-sample validation was 3.9%, indicating that
the posterior predictive estimates of the census counts were close to the
observed census counts.Overall, these results indicate that the normal-based model as originally
formulated in Section 3.3 fits the data
reasonably well.
DISCUSSION
We have described a method for reconstructing past populations by age and sex that is
designed to work with the type of data commonly collected in modern demographic
surveys and censuses, especially in developing countries. Population dynamics are
modeled by the well-known cohort-component method of population projection and
measurement error is accounted for in a coherent, fully probabilistic manner through
a Bayesian hierarchical model. Inference is based on the joint posterior
distribution of all parameters, which depends on data through bias-reduced initial
estimates. We applied our method to a real dataset and found that the widths of our
posterior intervals indicated a nonnegligible amount of uncertainty about all
parameters, but were of magnitudes within the range of variation demographers are
used to when working with estimates and projections. The method is implemented in
the R package “popReconstruct.”Inferences are likely to be sensitive to changes in the initial estimates. However,
these are primarily data-derived and our goal is to synthesize all of the available
information; changes in these parameters should have an influence on the posteriors.
In our application to the female population of Burkina Faso, the posterior medians
were similar to the initial estimates in many cases. This suggests that the initial
estimates of each parameter agreed closely with the census counts and the initial
estimates of the other parameters. We would expect to see a greater difference
between the posterior medians and initial estimates of parameters for which this
agreement is weaker.Lee (1971, 1974) and Oeppen (1993) proposed
deterministic methods of population reconstruction. We have assumed a deterministic
model only for the population dynamics. That is, given the true vital and migration
rates, the evolution of the population is modeled deterministically. In contrast,
Bertino and Sonnino (2003) gave a method
in which the population dynamics were stochastic and the vital rates functioned as
mean parameters. We agree with Pollard (1968;
see also Cohen 2006) and expect that, in our
applications, variation due to measurement error will overwhelm any additional
variation arising from a stochastic population dynamics model. Moreover, Bertino and Sonnino's (2003) method
cannot be easily applied in our context because it was designed to take counts of
baptisms and deaths by year as inputs. Information of this kind is seldom available
in most of the cases where we wish to apply our method. Key features of our
approach, therefore, are that the major source of uncertainty in population
reconstruction, namely measurement error, is appropriately accounted for through
fully probabilistic estimates and the method can be used with the kind of data
available for most countries over the past 60 years.We made the simplifying assumption of constant variance across age and time for each
demographic parameter on the log or logit scale. This might seem restrictive; more
complex variance structures for the other parameters could be proposed to allow for
the fact that more is typically known about infant and child mortality than old-age
mortality, for example. However, the benefits of this extension would have to be
weighed against the additional complexity of having to estimate more parameters. As
our checks indicated, using separate age- and time-specific variance parameters for
population counts made little difference to the final results, probably because
there was not enough information in the initial estimates to estimate them. The
story might be different in a multicountry context, however. For instance, with
demographically similar countries, it might be realistic to set some of the variance
parameters constant across countries but allow for variation over time or age.Census counts, or data of comparable reliability, are assumed available for the
baseline year. This requirement could be removed by modifying Equation (9) so as to accommodate values
of n* derived
from noncensus sources. Since these would probably be less reliable than
census-derived estimates, this would mean replacing
σ2 in Equation (9) with a new variance
parameter to account for the extra uncertainty. This could be modeled in the same
way as the existing variance parameters per Equation (13). The period of reconstruction in our example was delimited
by the years of the earliest and most recent data on population counts. This
modification would permit the period of reconstruction to begin earlier if some
initial estimate of population counts at an earlier baseline year were to be
obtained.No modifications are required to continue reconstruction beyond the year of the most
recent census where initial estimates of vital rates and migration are available.
However, the posterior distribution of the vital rates and migration for years
beyond the most recent census will be based entirely on the initial estimates.The structure of our model could be extended to produce more detailed reconstructions
in several ways. An extension to two-sex populations is an obvious example. While we
expect this to be feasible, dependencies between female- and male-specific
parameters must be appropriately accounted for.We used our method to reconstruct national populations. There are at least two ways
in which subnational reconstructions could be obtained. The most straightforward
would be to obtain sets of initial estimates for each subnational region of
interest, and apply the model separately to each. Subnational estimates of
population counts by age can be obtained from many censuses. In developed countries,
vital registration systems would likely provide initial estimates of regional
fertility and mortality. Less developed countries often lack these systems and such
data might be difficult to obtain. Obtaining subnational estimates of net migration
is likely to be difficult for most countries.A more feasible approach is to use subnational initial estimates of population counts
and set initial estimates of sub-national vital rates to their national-level
estimates. A minimal set of modifications to allow this could involve replacing the
national-level population count vectors, nt, in Equation (1) with vectors of stacked
sub-national counts. For R regions, the population count vector at
time t,
., ≡
(n0,, …,
n)′, would be
(′.,.,1,…,′.,.,)′.
The projection matrix would take a block diagonal form with R
blocks, one for each region. Similarly, the
g would be national-level
initial estimates of migration proportions. Level 1, Equation (7), and Level 3, Equations (9)–(12), of the hierarchical model would be extended to include additional,
similar terms for each region. For example, Equation (11) could be replaced by logit
s |
s*,σ2
∼ Normal(logit
s,σ2)
for regions r,…, R. Level 4, Equation (13), would be unchanged for
v = f, s, g since we only have
national-level initial estimates, but models with separate variance parameters for
each region (i.e.,
σ2, r =
1, …, R) would be worth investigating.An advantage of this modification is that “hybrid” cases where
subnational data are available for some time periods and/or parameters could be
accommodated by using them to derive specific initial estimates and substituting
these for the national-level estimates in the modified Level 3. Corresponding,
additional variance parameters could also be added at Level 4. A possible
shortcoming of this approach is lack of flexibility in cases where a few regional
vital rates differ greatly from the national-level rates since the national level
initial-estimates are used as fixed medians at Level 3. This might be detected by
applying the approach by Carlin and Polson
(1991) to the vital rate variance parameters, as was done in Section 5.3 for
σ2.
Alternatively, dependence among subnational vital rates could be explicitly modeled,
perhaps using a spatial model in Level 3.Migration is not split into its constituent inflows and outflows since the projection
model (1) requires net migration as an input. Reliable data on migration over long
periods are not usually available, especially for developing countries. Even where
data on flows are collected, such as in the European Union, there can be significant
disagreement in estimates of binational flows between the records of the sending and
receiving countries (e.g., Raymer, de Beer, and van
der Erf 2011). The UNPD has considered this issue and net migration is
currently their preferred measure.However, in cases where good information about both in- and out-migration is
collected by means of population registers or border control agencies, it might be
reasonable to model the two flows separately on a country-by-country basis. To allow
this, the g would be replaced with
g
and g,
say, for immigration and emigration, respectively. A minimal modification would then
be to replace g with
g
−
g
in Equation (1) and replace Equation (12) with two similar terms
for g
and g.
If good information about the accuracy of the immigration and emigration data were
available, then σ2
could be replaced with two separate parameters; otherwise it would be unchanged.
SUPPLEMENTARY MATERIALS
The derivation of Equation (4); further
information about the derivation of the initial estimates discussed in Section 5.1; the R code required to perform both
the simulation study and the reconstruction of the female population of Burkina Faso
(together with sensitivity analyzes); results of the Burkina Faso reconstruction as
.csv_les.