
Bayesian Reconstruction of Two-Sex Populations by Age: Estimating Sex Ratios at Birth and Sex Ratios of Mortality.

Mark C Wheldon¹, Adrian E Raftery², Samuel J Clark², Patrick Gerland³.

Abstract

The original version of Bayesian reconstruction, a method for estimating age-specific fertility, mortality, migration and population counts of the recent past with uncertainty, produced estimates for female-only populations. Here we show how two-sex populations can be similarly reconstructed and probabilistic estimates of various sex ratio quantities obtained. We demonstrate the method by reconstructing the populations of India from 1971 to 2001, Thailand from 1960 to 2000, and Laos from 1985 to 2005. We found evidence that in India, sex ratio at birth exceeded its conventional upper limit of 1.06, and, further, increased over the period of study, with posterior probability above 0.9. In addition, almost uniquely, we found evidence that life expectancy at birth (e0) was lower for females than for males in India (posterior probability for 1971-1976 equal to 0.79), although there was strong evidence for a narrowing of the gap through to 2001. In both Thailand and Laos, we found strong evidence for the more usual result that e0 was greater for females and, in Thailand, that the difference increased over the period of study.


Keywords: Bayesian hierarchical model; Population projection; Sex ratio at birth; Sex ratio of mortality; Two-sex model; Vital Rates

Year: 2015 PMID: 26612972 PMCID: PMC4657758 DOI: 10.1111/rssa.12104

Source DB: PubMed Journal: J R Stat Soc Ser A Stat Soc ISSN: 0964-1998 Impact factor: 2.483

Introduction

The past, present and future dynamics of human populations at the country level are highly relevant to the work of social scientists in many disciplines as well as planners and evaluators of public policy. These dynamics are driven by population counts, fertility and mortality rates (vital rates), and net international migration. Demographers at the United Nations Population Division (UNPD) are tasked with producing detailed information on these quantities, which is published biennially in the World Population Prospects (WPP) (e.g. United Nations (2011)). Estimates for each country are provided, for periods stretching back from the present to about 1950. Currently, however, WPP estimates are not accompanied by any quantitative estimate of uncertainty. Uncertainty should be measured because the availability, coverage and reliability of data that are used to derive the estimates differ greatly among countries. Developing countries, in particular, often lack the extensive registration and census‐taking systems that developed countries maintain, so estimates in these cases are subject to greater uncertainty. Estimates are not error free even for developed countries. Whereas estimates of population counts and vital rates are likely to be very accurate there, uncertainty about net international migration can be quite substantial, even in places with well‐resourced statistical systems such as Europe (e.g. Poulain (1993), Willekens (1994) and de Beer et al. (2010)). Several methods have been proposed for quantifying uncertainty in estimates of the key parameters driving human population dynamics (e.g. Daponte et al. (1997), Bertino and Sonnino (2003) and Bryant and Graham (2013)). Wheldon et al. (2010, 2012, 2013a) proposed Bayesian population reconstruction (Bayesian reconstruction for short), which is a method of simultaneously estimating population counts, vital rates and net international migration at the country level, by age, together with uncertainty.
The original formulation could reconstruct female‐only populations. In this paper, we describe a major extension to two‐sex populations. This allows us to estimate age‐ and time‐specific indicators of fertility, mortality and migration separately for females and males and, importantly, sex ratios of these quantities, all with probabilistic measures of uncertainty. In addition, we also show how Bayesian reconstruction can be used to derive probabilities of change over time in these quantities. To demonstrate the method, we reconstruct the full populations of India from 1971 to 2001, Thailand from 1960 to 2000 and Laos from 1985 to 2005. These countries were selected because, in all cases, the available data are fragmentary, which makes population reconstruction challenging. Bayesian reconstruction embeds a standard demographic projection model in a hierarchical statistical model. As inputs, it takes bias‐reduced initial estimates of age‐specific fertility rates, survival proportions (a measure of mortality), net international migration and census‐based population counts. Also required is expert opinion about the measurement error of these quantities, informed by data if available. The output is a joint posterior probability distribution on the inputs, allowing all parameters to be estimated simultaneously, together with fully probabilistic posterior estimates of measurement error. Wheldon et al. (2013a) showed that marginal credible intervals were well calibrated. They demonstrated the method by reconstructing the female population of Burkina Faso from 1960 to 2000. Wheldon et al. (2013b) extended Bayesian reconstruction to countries with censuses at irregular intervals and showed that it works well across a wide range of data quality contexts by reconstructing the female populations of Laos, Sri Lanka and New Zealand. 
Laos is a country with very little vital registration data where population estimation depends largely on surveys, Sri Lanka has some vital registration data, and New Zealand is a country with high quality vital registration data. In this paper we focus on countries which lack good vital registration data or for which there are gaps in, or inconsistencies among, the available data sources. Guilmoto (2007a) claimed that the population of Asia underwent ‘masculinization’ during the latter half of the 20th century. In Eastern Asia, which includes China, and in Southern Asia, which includes India, the sex ratio in the total population (SRTP), which is defined as the number of males per female, ranged from 1.05 to 1.06 and from 1.09 to 1.06 respectively between 1950 and 2010. Globally, over the same period, it ranged from 1.00 to 1.02 (United Nations, 2011). Imbalances in population sex ratios are caused by imbalances in sex ratios at birth (SRBs) and sex ratios of mortality (SRMs) (Guillot, 2002). These quantities have received considerable attention in the literature on the demography of Asia (e.g. Sen (1990), Coale (1991), Mayer (1999), Bongaarts (2001), Bhat (2002a, b), Das Gupta (2005) and Guilmoto (2009)). Sawyer (2012) called for further work to quantify uncertainty in estimates of SRMs. Estimates of the SRB are subject to a large amount of uncertainty, especially in India (Bhat, 2002a, b; Guillot, 2002; Guilmoto, 2009). Here, we respond by quantifying uncertainty in these parameters. The paper is organized as follows. In the remainder of this section we provide some background on existing methods of population reconstruction and the demography of sex ratios in Asia. In Section 2 we describe the two‐sex version of Bayesian reconstruction. In Section 3 we present results from our case‐studies of India, Thailand and Laos. We focus mainly on posterior distributions of the total fertility rate (TFR), SRBs and the sex difference in life expectancy.
Certain sex ratios in India are widely believed to be atypical so we devote more attention to this case. We end with a discussion in Section 4 which provides further demographic context and an overall conclusion. Selected mathematical derivations are given in the Web‐based supplementary materials, which also contain further details about sources of data and give results for additional parameters such as the sex ratio of under‐5‐years mortality rate, population sex ratios and net international migration. Bayesian reconstruction is implemented in the popReconstruct package (Wheldon, 2013) for the R environment for statistical computing (R Core Team, 2013).

Methods of population reconstruction

In human demography, population reconstruction is often referred to simply as ‘estimation’ to distinguish it from projections or forecasts of future population counts and vital rates. Use of estimation agrees with its meaning in statistics, namely the estimation of unknown quantities from data, but we use ‘reconstruction’ here to avoid ambiguity. Methods of demographic forecasting were comprehensively reviewed by Booth (2006) and some subsequent developments using Bayesian approaches are covered in Raftery et al. (2014a). Here, we focus exclusively on reconstruction; potential implications of our work for population forecasting are mentioned in Section 4. Reviews of existing methods of population reconstruction are given by Oeppen (1993a), Barbi et al. (2004) and Wheldon et al. (2013a). Many were developed for reconstructing populations of the distant past from data on births, deaths and marriages recorded in parish registers (e.g. Wrigley and Schofield (1981), Bertino and Sonnino (2003) and Walters (2008)), or to estimate the excess of mortality due to extreme events such as famine or genocide (e.g. Boyle and Ó Gráda (1986), Daponte et al. (1997), Heuveline (1998), Merli (1998) and Goodkind and West (2001)) or the number of ‘missing women’ due to male‐dominated sex ratios in Asia (Sen, 1990; Coale, 1991; Das Gupta 2005). Purely deterministic reconstruction methods that were used in some of these studies include ‘inverse projection’ (Lee, 1971, 1974), ‘back‐projection’ (Wrigley and Schofield, 1981) and ‘generalized back‐projection’ (Oeppen, 1993b). Bertino and Sonnino (2003) proposed ‘stochastic inverse projection’. This is a non‐deterministic method but the only form of uncertainty is that which comes from treating birth and death as stochastic processes at the individual level. Counts of births and deaths are assumed to be known without error and age patterns are fixed. 
In the cases that we treat, accurate data on births and deaths of the parish register kind are often unavailable and uncertainty due to stochastic vital rates is likely to be small relative to uncertainty due to measurement error (Pollard, 1968; Lee, 2004; Cohen, 2006). Moreover, Bayesian reconstruction is designed to work with the kind of data that are commonly available for developing countries and does not rely on the existence of detailed births and deaths registers, although this information can be used when available (Wheldon et al., 2013b). Daponte et al. (1997) took a fully Bayesian approach to constructing a history of the Iraqi Kurdish population from 1977 to 1990. They constructed prior distributions for fertility and mortality rates by using survey data and expert opinion about uncertainty based on historical information and knowledge of demographic processes. Measurement error in the available fragmentary data was accounted for. However, there were some restrictions such as holding the age pattern of fertility fixed and allowing for mortality variation only through infant mortality. Rural‐to‐urban migration was accounted for by treating these populations separately; international migration was assumed to be negligible. Bayesian reconstruction is similar, but no model age patterns are assumed to hold and international migration is explicitly estimated along with fertility, mortality and population counts. A Bayesian model for estimating recent annual migration flows among European countries from multiple data sources was proposed by Raymer et al. (2013). Data on migration for the countries that they considered are spread across many inconsistent data sets collected by different statistical agencies, which also suffer from incompleteness and inaccuracy. They used a theoretical model of migration, a model for the available data and expert opinion on key parameters to harmonize the information into a complete set of international flows, which are not broken down by age or sex.
Our method also involves combining information from different sources by using a Bayesian model, but we estimate vital rates and population counts by age and sex. Raymer et al. (2013) estimated immigration and emigration separately, however, whereas we estimate net migration. This makes our method easier to apply in situations where very little information about immigration and emigration is available, as is the case in many developing countries. Bryant and Graham (2013) proposed a Bayesian method for estimating subnational population counts by combining official statistics with administrative data such as tax records, electoral rolls and school rolls, and applied it to six regions of New Zealand over the period 1996–2011. They used probabilistic models for both the sources of data and counts of births, deaths, immigrants and emigrants. Uncertainty in the estimated counts comes from coverage errors in the sources of data and the probabilistic models for the counts. Models for the counts include structural parameters for region, age, sex, time period and selected interactions, depending on the type of count. Models for the sources of data are based on a common form but are tailored to the specific source in some cases. These features, and the incorporation of many different data sources, result in quite a complex model. The model that we propose here is simpler because it was developed primarily for use in less‐developed countries. Like the UNPD we focus on national level estimates. We do not directly use administrative data. Most methods of population reconstruction in demography use the cohort component model of population projection (CCMPP) in some form. Population projection uses vital rates and migration to project a set of age‐specific population counts in the baseline year, denoted t0, forwards in time to the end year, denoted T.
In its simplest form, the population in year t+δ equals the population in year t plus the intervening births and net migration, minus the intervening deaths (e.g. Whelpton (1936) and Preston et al. (2001)). This is known as the demographic balancing equation. Population projection is distinct from population forecasting since it merely entails evolving a population forwards in time from some given baseline under assumptions about prevailing vital rates and migration (Keyfitz and Caswell, 2005). The period of projection may be in the future or in the past. Wheldon et al. (2013a, b) employed single‐sex population projection in this way to reconstruct female‐only populations by taking account of measurement error. In this paper we show how two‐sex population projection can be used to reconstruct full populations, thereby providing estimates of population sex ratios, SRBs and SRMs simultaneously while accounting for measurement error.
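The demographic balancing equation described above can be illustrated with a minimal sketch. The function name and all numbers are hypothetical and serve only to show the bookkeeping; they are not taken from the paper's data.

```python
def balance(pop_start, births, deaths, net_migration):
    """Demographic balancing equation for a single interval.

    End population = start population + births - deaths + net migration.
    """
    return pop_start + births - deaths + net_migration


# Hypothetical totals for one five-year interval.
pop_1990 = 1_000_000
pop_1995 = balance(pop_1990, births=120_000, deaths=45_000, net_migration=-5_000)
print(pop_1995)
```

The CCMPP applies the same accounting separately to each age and sex group rather than to the total.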

Estimating sex ratios

Methods of reporting sex ratios are not standardized. Here we adopt the convention that all ratios are ‘males per female’; in the Indian literature the inverse is more common. Hence, the SRTP is the total number of males per female in the population and the SRB is the number of male births per female birth. The SRM can be expressed by using various mortality indicators. We shall use the under‐5‐years mortality rate U5MR exclusively (see Section 3 for the definition). A low SRM means that mortality is lower among males than among females. All‐age mortality is summarized by life expectancy at birth, for which the standard demographic abbreviation is e0. A comparison of e0 by sex is more commonly done by using the difference than by the ratio and we adopt that convention here. Male e0 is subtracted from female e0 to obtain the difference. The preferred way of estimating SRBs and SRMs at the national level is from counts of births and deaths recorded in official registers (vital registration) together with total population counts from censuses. In many countries where such registers are not kept, surveys such as demographic and health surveys and world fertility surveys must be used. These typically ask a sample of women about their birth histories. Full birth histories collect information about the times of each birth and, if the child subsequently died, the time of the death. Summary birth histories ask only about the total number of births and child deaths that the respondent has ever experienced (United Nations, 1983; Preston et al., 2001). Guilmoto (2009) also suggested using the males‐to‐female ratio among those aged 0–4 years (the ‘child sex ratio’) as a proxy for the SRB when census data are thought to be more reliable than survey data. Under typical conditions, SRBs for most countries are in the range 1.04–1.06 (United Nations Population Fund, 2010), but estimates of the SRB in some regions in Asia are higher.
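As a small illustration of the reporting conventions above, the sketch below computes a ‘males per female’ ratio, converts it to a ‘females per 1000 males’ figure (one common form of the inverse convention), and forms the female‐minus‐male difference in e0. The helper names and all numbers are hypothetical.

```python
def males_per_female(males, females):
    """Sex ratio under the 'males per female' convention used here."""
    return males / females


def females_per_1000_males(ratio_males_per_female):
    """Convert to the inverse convention, scaled per 1000 males."""
    return 1000.0 / ratio_males_per_female


# Hypothetical counts of male and female births in one period.
srb = males_per_female(males=105_000, females=100_000)
srb_inverse = females_per_1000_males(srb)

# Sex difference in life expectancy at birth: female e0 minus male e0.
e0_female, e0_male = 66.0, 63.5
e0_difference = e0_female - e0_male

print(round(srb, 3), round(srb_inverse, 1), e0_difference)
```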
For almost all countries, e0 is higher for females than for males. Estimates based on both vital registration and surveys are susceptible to systematic biases and non‐systematic measurement error. Counts of births or deaths from vital registration may be biased downwards by the omission of events from the register or undercoverage of the target population. Full birth histories are susceptible to biases caused by omission of births or misreporting of the timing of events. Some omissions may be deliberate to avoid lengthy subsections of the survey (Hill et al., 2012). Fertility and mortality estimates from summary birth histories are derived by using so‐called indirect techniques such as the Brass P/F‐ratio method (Brass, 1964; United Nations, 1983; Feeney, 1996). In addition to the biases affecting full birth histories, estimates based on summary birth histories can also be affected when the assumptions behind the indirect methods are not satisfied. These assumptions concern the pattern of mortality by age and the association between mother and child mortality. They often do not hold, for example, in populations experiencing a rapid decline in mortality (Silva, 2012). In the absence of vital registration, estimates of adult mortality may be based on reports of sibling survival histories collected in surveys. Often, however, the only data that are available are on child mortality collected from surveys of women. In such cases, estimates of adult mortality are extrapolations based on model life tables or relational models (United Nations, 1983; Preston et al., 2001). Model life tables are families of life tables generated from mortality data collected from a wide range of countries over a long period of time. They are indexed by one or more summary parameters such as e0 or U5MR and are grouped into regions. The Coale and Demeny system (Coale et al., 1983) and the UN system for developing countries (United Nations, 1982) have four and five families respectively.
Relational models use a parsimonious parameterization of a single life table. Different age‐specific mortality patterns are obtained by varying the parameters. Available data are used to select parameter values that yield the best‐fitting model. For example, in the Brass two‐parameter relational logit model (Brass, 1971), the two parameters are estimated by regressing the logit observed survival proportions on the logit survival proportions from the standard life table. Errors in estimates of adult mortality that are derived in these ways come from errors in the survey‐based estimates of under 5 years mortality and the inability of the model life table family, or relational model, to capture the true patterns of mortality in the population of interest. Estimates of fertility, mortality, migration and population counts, and the implied sex ratios for successive quinquennia are all related to one another via the demographic balancing equation that underlies the CCMPP. The estimates of these quantities published in the WPP must be ‘projection consistent’ in that the age‐specific population counts for year t must be the counts that one obtains by projecting the published counts for year t−5 forwards by using the published fertility, mortality and migration rates. Bias reduction techniques are source and parameter specific. For birth history data, these might involve omitting responses of very old women, or responses pertaining to events in the distant past. For census counts, adjustments may be made to compensate for well‐known undercounts in certain age–sex groups. In other cases, parametric models of life tables, or specially constructed life tables, may be used if available. For this reason we do not propose a generally applicable method of bias reduction, one which would work well for all parameters and sources of data, since many specialized ones already exist (e.g. United Nations (1983), Murray et al. (2003, 2010) and Alkema et al. (2012)). 
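The relational logit idea mentioned above can be sketched as a simple least‐squares fit. This is an illustration only: it assumes Brass's logit convention, Y = 0.5 ln((1 − l)/l), applied to survivorship values, and the ‘standard’ values and the parameters used to generate the ‘observed’ values are hypothetical.

```python
import math


def brass_logit(l):
    """Brass's logit of a survivorship value l in (0, 1)."""
    return 0.5 * math.log((1.0 - l) / l)


def inverse_brass_logit(y):
    """Inverse of brass_logit."""
    return 1.0 / (1.0 + math.exp(2.0 * y))


def fit_line(x, y):
    """Ordinary least squares for y = alpha + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


# Hypothetical standard survivorship values at a few ages.
l_standard = [0.90, 0.85, 0.80, 0.70, 0.50]
y_standard = [brass_logit(l) for l in l_standard]

# Synthetic "observed" values generated with alpha = -0.2, beta = 1.1,
# so that the fit can be checked against known parameters.
l_observed = [inverse_brass_logit(-0.2 + 1.1 * y) for y in y_standard]
y_observed = [brass_logit(l) for l in l_observed]

alpha_hat, beta_hat = fit_line(y_standard, y_observed)
print(round(alpha_hat, 6), round(beta_hat, 6))
```

With real data the observed logits would not lie exactly on a line, and the residuals are one source of the model error discussed above.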
Bayesian reconstruction takes as input bias‐reduced initial estimates of age‐specific fertility rates and age‐ and sex‐specific initial estimates of mortality, international migration and population counts. Measurement error is accounted for by modelling these quantities as probability distributions. Projection consistency is achieved by embedding the CCMPP in a Bayesian hierarchical model. Inference is based on the joint posterior distribution of the input parameters, which is sampled from by using Markov chain Monte Carlo (MCMC) sampling. Under the current UNPD procedure, all available representative sources of data for a given country are considered and techniques to reduce bias are applied where UNPD analysts deem them appropriate. Projection consistency is achieved through an iterative ‘project‐and‐adjust’ process. SRBs and SRMs are inputs to the procedure, whereas population sex ratios are calculated by using estimated population counts, which are an output.

Method

Notation and parameters

The parameters of interest are age‐ and time‐specific vital rates, net international migration flows, population counts and the SRB. The symbols n, s, g and f denote population counts, survival probabilities (a measure of mortality), net migration proportions (immigrants minus emigrants) and fertility rates respectively. All these parameters will be indexed by 5‐year increments of age, denoted by a, and time, denoted by t. The parameters n, s and g will also be indexed by sex, denoted by l ∈ {F,M}, where F and M indicate female and male respectively. The SRB is defined as the number of male births for every female birth. It will be indexed by time. Reconstruction will be done over the time interval [t0,T). The age scale runs from 0 to A>0; in our applications A is 80 years. The total number of age groups is denoted K. To model fertility, we define a reproductive age range; fertility is assumed to be 0 at ages outside the range. Throughout, a prime indicates vector transpose. We shall use bold for vectors and a dot to indicate the indices whose entire range is contained therein. Multiple indices are stacked in the order a, t, l. For example, the vector of age‐specific female population counts in a given exact year is written with a dot in place of the age index. The parameters are the standard demographic parameters that are used for projection. The fertility parameters are age‐ and time‐specific occurrence/exposure rates. They give the ratio of the average annual number of babies born over the period [t,t+5) to the number of person‐years lived over this period by women in the age range [a,a+5). If a woman survives for the whole quinquennium she contributes 5 person‐years to the denominator; if she survives only for the first year and a half she contributes 1.5 person‐years, and so forth. The survival parameters are age‐, time‐ and sex‐specific proportions. They give the proportion of those alive at time t who survive for 5 years.
The age subscript on the survival parameters indicates the age range that the individuals will survive into. For example, the number of females aged [15,20) who were alive in 1965 would be the product n_{10,1960,F} s_{15,1960,F} (ignoring migration for simplicity). It also means that s_{0,1960,F} is the proportion of female births during 1960–1965 who were alive in 1965 and hence aged 0–5 years. The oldest age group is open ended and we must allow for survival in this age group. Thus, a separate survival proportion is defined for those aged [A,∞) at time t who survive through the interval [t,t+5). Migration over the interval [t,t+5) is also expressed as a proportion, the denominator of which is the size of the receiving population at time t.

Projection of two‐sex populations

The CCMPP allows us to calculate the number alive by age and sex at any time t from the vector of age‐ and sex‐specific female and male population counts at baseline t0 and the age‐, time‐ and sex‐specific vital rates and migration up to time t. Simply put, the vector of counts at time t is the baseline vector plus the intervening births, minus the deaths, plus net migration. Projection of females and males is essentially done separately; the only link between the sexes is in the computation of births. The CCMPP is a discrete time approximation to a continuous time process, and several adjustments are made to improve accuracy. We use the standard form as described by Preston et al. (2001). It can be compactly expressed by using matrices, as in equation (1). The fertility rates in equation (1) have only two subscripts; they are the age‐specific (female) fertility rates that were introduced above. Thus the total number of births in the projection interval is a function of the number of females of reproductive age, but not of the number of males of any age, or of females of other ages. This is called ‘female dominant projection’. This approach is preferred to alternatives, such as basing fertility on the number of male person‐years lived, because survey‐based fertility data are often collected by interviewing mothers, not fathers. The survival proportions applied to births are sex specific because they concern the survival of sex‐specific births. All‐sex births are computed first and then decomposed because the SRB is often a parameter of interest to demographers, as it is to us here. Splitting migration in half and adding the first half at the beginning of the projection interval and the second half at the end is a standard device to improve the discrete time approximation. Further details are in the Web‐based supplementary materials.
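A simplified sketch of one female dominant projection step follows. It is not the paper's equation (1) or the popReconstruct implementation. The function name, the argument layout and the approximation of female person‐years by the average of start and end counts are assumptions made for illustration, and the inputs are hypothetical. Survival lists have one more entry than the count lists: entry 0 applies to births, and the last entry applies to the open‐ended age group.

```python
def ccmpp_step(n_f, n_m, f, s_f, s_m, g_f, g_m, srb, width=5):
    """Project female and male counts forward by one interval (sketch)."""

    def add_half_migration(n, g):
        # First half of net migration is added at the start of the interval.
        return [nk * (1.0 + gk / 2.0) for nk, gk in zip(n, g)]

    def survive(n, s):
        # s[k] is the proportion surviving INTO age group k; s[len(n)]
        # is survival within the open-ended oldest group.
        groups = len(n)
        out = [0.0] * groups
        for k in range(1, groups):
            out[k] = n[k - 1] * s[k]
        out[-1] += n[-1] * s[groups]
        return out

    start_f, start_m = add_half_migration(n_f, g_f), add_half_migration(n_m, g_m)
    end_f, end_m = survive(start_f, s_f), survive(start_m, s_m)

    # Births depend on females only. end_f[0] is still 0 here, which is
    # harmless because fertility in the youngest age group is 0.
    births = width * sum(
        fk * 0.5 * (a + b) for fk, a, b in zip(f, start_f, end_f)
    )

    # Split all-sex births by the SRB (males per female), then survive them.
    end_f[0] = births / (1.0 + srb) * s_f[0]
    end_m[0] = births * srb / (1.0 + srb) * s_m[0]

    # Second half of net migration is added at the end of the interval.
    end_f = [e + nk * gk / 2.0 for e, nk, gk in zip(end_f, n_f, g_f)]
    end_m = [e + nk * gk / 2.0 for e, nk, gk in zip(end_m, n_m, g_m)]
    return end_f, end_m


# Hypothetical three-group population: young, reproductive, open-ended old.
counts = [100.0, 100.0, 100.0]
no_migration = [0.0, 0.0, 0.0]
all_survive = [1.0, 1.0, 1.0, 1.0]
new_f, new_m = ccmpp_step(
    counts, counts, f=[0.0, 0.1, 0.0],
    s_f=all_survive, s_m=all_survive,
    g_f=no_migration, g_m=no_migration, srb=1.05,
)
print(new_f, new_m)
```

In this sketch male counts never enter the birth calculation, which is the sense in which the projection is female dominant.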

Modelling uncertainty

In many countries, the available data on vital rates and migration are fragmentary and subject to systematic biases and non‐systematic measurement error. Wheldon et al. (2013a) proposed Bayesian reconstruction as a way of estimating past vital rates, migration and population counts for a single‐sex population, which accounts for measurement error. Systematic biases are treated in a preprocessing step which yields a set of bias‐reduced ‘initial estimates’ for each age‐ and time‐specific fertility rate, survival and migration proportion and population count. We use an asterisk to denote initial estimates; for example, f* is the initial estimate of f. At the heart of Bayesian reconstruction is a hierarchical model which takes the initial estimates as inputs. Here, we present a substantial development of the model that was given in Wheldon et al. (2013a) which allows estimation of two‐sex populations. Set t0 and T as the beginning and end years of the reconstruction. These might be the years for which the earliest and most recent bias‐adjusted census‐based population counts are available, but this is not a requirement. Henceforth, we refer to such counts simply as census counts. Collect in a single vector all age‐, time‐ and sex‐specific fertility rates, survival and migration proportions over the period [t0,T), the SRBs and the age‐ and sex‐specific census counts in year t0. These are the inputs that are required by the CCMPP, which is treated as a deterministic function of them. Reconstruction requires estimation of this input vector, which we do by using a hierarchical model with four levels, given in expressions (2)–(10), where a=0,5,…,A and l=F,M in expressions (2)–(8) unless otherwise specified. For 0SRB fixed at 1.05 and l≡ F. The SRB can be interpreted as the odds that a birth is male, so distribution (5) is a model for the log‐odds that a birth is male.
In Bayesian terminology, level 1 can be viewed as a likelihood for the census counts, level 3 as a set of priors on the CCMPP input parameters and level 4 as a set of hyperparameters. Level 2 contains the deterministic projection model which maps the CCMPP inputs to projected population counts on which the likelihood in level 1 is conditioned. The measurement errors are a priori statistically independent but this will not be true of the posterior, in general. This is further discussed in Section 4.2. In addition to the modifications for two‐sex reconstruction, the formulation in expressions (2)–(10) differs in its treatment of the population counts at t0. Wheldon et al. (2013a) used an informative prior for the baseline counts at level 3 and excluded them from level 1. Here, we include them at level 1 and use a non‐informative prior at level 3. For any K larger than any age‐, time‐ and sex‐specific population count, this formulation is equivalent (see the Web‐based supplementary materials). Setting K=40 million would ensure this, for example. The new formulation is clearer than the original as it treats all population count data in the same way. The hyperparameters α_v and β_v, v ∈ {n,f,s,g,SRB}, define the distribution of the variance parameters that represent measurement error in the initial estimates. We set these parameters by using expert opinion, eliciting liberal, but realistic, estimates of initial estimate accuracy. We elicit on the observable marginal quantities. On their respective transformed scales, these have Student t‐distributions centred at the initial point estimates, with variance and degrees of freedom dependent on α and β. We set the α_v, v ∈ {f,s,n,g,SRB}, to a value which gives the prior a weight that is equivalent to a single data point. The β_v are determined by elicitation, but not directly. Instead, we elicit the limits of the central 90% probability intervals of the observable marginal quantities in the following way.
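For orientation, one way to write a four‐level model that matches the verbal description above is sketched below. This is a reading of the text, not a transcription of expressions (2)–(10): the normal and inverse gamma forms are assumptions chosen because they yield Student t marginals, the scales (log for counts, fertility and SRB; logit for survival; untransformed for migration) follow the description of the elicitation, and the prior on the baseline counts is omitted.

```latex
\begin{align*}
\text{Level 1:}\quad
  & \log n^{*}_{a,t,l} \mid n_{a,t,l}, \sigma^{2}_{n}
    \sim \mathrm{Normal}\!\left(\log n_{a,t,l},\, \sigma^{2}_{n}\right) \\
\text{Level 2:}\quad
  & n_{a,t,l} \text{ is the CCMPP projection of the baseline counts under }
    f,\ s,\ g \text{ and } \mathrm{SRB} \\
\text{Level 3:}\quad
  & \log f_{a,t} \mid f^{*}_{a,t}, \sigma^{2}_{f}
    \sim \mathrm{Normal}\!\left(\log f^{*}_{a,t},\, \sigma^{2}_{f}\right) \\
  & \log \mathrm{SRB}_{t} \mid \mathrm{SRB}^{*}_{t}, \sigma^{2}_{\mathrm{SRB}}
    \sim \mathrm{Normal}\!\left(\log \mathrm{SRB}^{*}_{t},\, \sigma^{2}_{\mathrm{SRB}}\right) \\
  & \operatorname{logit} s_{a,t,l} \mid s^{*}_{a,t,l}, \sigma^{2}_{s}
    \sim \mathrm{Normal}\!\left(\operatorname{logit} s^{*}_{a,t,l},\, \sigma^{2}_{s}\right) \\
  & g_{a,t,l} \mid g^{*}_{a,t,l}, \sigma^{2}_{g}
    \sim \mathrm{Normal}\!\left(g^{*}_{a,t,l},\, \sigma^{2}_{g}\right) \\
\text{Level 4:}\quad
  & \sigma^{2}_{v} \sim \mathrm{InvGamma}\!\left(\alpha_{v},\, \beta_{v}\right),
    \qquad v \in \{n, f, s, g, \mathrm{SRB}\}
\end{align*}
```

Under these assumed forms, integrating out each variance gives a Student t marginal with 2α_v degrees of freedom centred at the transformed initial estimate, which is consistent with the statement about the marginal distributions.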
For fertility rates, which are modelled on the log‐scale, we elicit in the statement ‘there is a 90% probability that the true fertility rates are within % of the initial point estimates'. The corresponding value of can then be determined from the marginal distribution of the . The process is the same for because population counts are also modelled on the log‐scale. Migration is explicitly modelled as a proportion so elicitation of is done by using the statement ‘there is a 90% probability that the true migration proportions are within percentage points of the initial point estimates'. Survival proportions are modelled on the logit (or log‐odds) scale. In demography, mortality tends to be more commonly dealt with than survival, so we elicit on the accuracy of . The conversion from to is, again, done by using the marginal distribution for . Further details are given in the Web‐based supplementary materials. In all cases, we asked UNPD analysts to provide the , which we refer to as elicited relative errors. Some methods of estimating migration, such as the survival ratio methods (Rowland (2003), chapter 11), rely on ‘residual’ counts. Here, projected counts based only on vital rates are compared with census counts and the difference is attributed to international migration. Methods of adjusting vital rates and census counts to ensure mutual consistency have also been proposed that use a similar approach (e.g. Luther et al. (1986) and Luther and Retherford (1988)). Initial estimates of , and should not be based on such methods since this would amount to using the data twice and uncertainty would be underestimated in the posterior.
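The conversion from an elicited relative error to a scale hyperparameter can be sketched as follows. The sketch rests on assumptions that the text does not confirm. It assumes the transformed quantity is normal with an inverse gamma variance, so that its marginal is Student t with 2α degrees of freedom and scale sqrt(β/α). It assumes a shape value of α = 0.5, for which the marginal is Cauchy and its quantiles have a closed form. It also takes the half‐width on the log scale as log(1 + p/100). The function names and the 10% figure are hypothetical.

```python
import math


def cauchy_quantile(q):
    """Quantile function of the standard Cauchy (Student t, 1 d.f.)."""
    return math.tan(math.pi * (q - 0.5))


def beta_from_relative_error(pct, alpha=0.5, coverage=0.90):
    """Scale hyperparameter implied by 'within pct% with 90% probability'.

    Only alpha = 0.5 is handled, because the closed-form quantile above
    applies to 1 degree of freedom (2 * alpha = 1) only.
    """
    if alpha != 0.5:
        raise ValueError("this sketch only handles alpha = 0.5")
    half_width = math.log(1.0 + pct / 100.0)       # assumed log-scale half-width
    t_quantile = cauchy_quantile(0.5 + coverage / 2.0)
    scale = half_width / t_quantile                # equals sqrt(beta / alpha)
    return alpha * scale ** 2


# Hypothetical elicited relative error of 10% for fertility rates.
beta_f = beta_from_relative_error(10.0)

# Round trip: the implied 90% half-width should equal log(1.1).
implied_half_width = cauchy_quantile(0.95) * math.sqrt(beta_f / 0.5)
print(beta_f, implied_half_width)
```

For other shape values a general Student t quantile function would be needed in place of the Cauchy closed form.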

Application

We apply two‐sex Bayesian reconstruction to the populations of India from 1971 to 2001, Thailand from 1960 to 2000 and Laos from 1985 to 2005. The periods of reconstruction are determined by the available data. Laos has no vital registration data. Initial estimates of fertility are based on surveys of women and the only mortality estimates are for ages under 5 years derived from these same surveys. Thailand and India have acceptable vital registration data for these periods which provide information about fertility and mortality at all ages. Nevertheless, adjustments are necessary to reduce bias due to undercount of certain groups. For example, vital registration is thought to have underestimated U5MR in Thailand (Hill et al., 2007; Vapattanawong and Prasartkul, 2011) and in India 50–60% of children are born at home, which increases the likelihood of omission from the register (United Nations Population Fund, 2010). Estimates of population sex ratios in India have been relatively high throughout the 20th century. Before the late 1970s, these were thought to have been caused by an excess of female mortality (low SRMs), and from the late 1970s onwards by high SRBs. Both of these phenomena have been linked to cultural preferences for sons over daughters which were intensified by a rapid fall in fertility rates (Visaria, 1971; Bhat, 2002a, b; Das Gupta, 2005; Guilmoto, 2007b). Concern over the accuracy of certain estimates of the SRB has led some researchers to suggest using the SRTP and sex ratios for young age groups as proxies for the SRB and SRMs (Bhat, 2002a, b; Guilmoto, 2009). We use Bayesian reconstruction to derive credible intervals for the SRTP and the sex ratio among those aged 0–5 years for India. Thailand experienced an even more rapid decline in TFR between 1960 and about 1980 (Kamnuansilpa et al., 1982). Estimates of Thailand's SRB between 1960 and about 1970 are relatively high but are within the typical range from about 1970 to 2000.
Surveys of Thai families in the 1970s found that girls and boys were desired about equally (Knodel et al., 1996; Guilmoto, 2009).

Fertility rates in Laos have fallen since 1985 but remain high relative to other Asian countries. Very little has been written about sex ratios for Laos (but see Frisen (1991)).

In the remainder of this section, we briefly describe the sources of data for each country and the method that was used to derive initial estimates. These are followed by results for selected parameters. We focus on key details and the most interesting outputs; more details are in the Web‐based supporting materials. For the most part, we focus on age‐aggregated summary parameters instead of the age‐specific CCMPP input parameters.

For fertility we report the TFR which, for the period [t,t+5), is the average number of children born to women of a hypothetical cohort who survive through the reproductive ages, all the way experiencing the age‐specific average annual fertility rates of that period. In terms of the input parameters, it is the sum of the age‐specific fertility rates over the reproductive age groups, multiplied by the 5‐year width of the age groups.

For mortality we report life expectancy at birth (e0) and the under‐5‐years mortality rate (U5MR). Life expectancy at birth is the average age at death for members of a hypothetical cohort which, at each age, experience the age‐specific survival proportions of the period. Its definition in terms of the survival proportions, with a straightforward derivation, can be found in Wheldon et al. (2013a). U5MR is constructed in the same way as the standard ‘infant mortality rate’ (e.g. Preston et al. (2001), chapter 2), except that it refers to the age interval [0,5): deaths at ages under 5 years in a period are divided by births in the same period. It is neither a true demographic rate nor a probability but, nevertheless, is in common use.

Posterior distributions for all these summary parameters can be computed easily from the MCMC sample from the joint posterior. This is a particular advantage afforded by taking a Bayesian approach to population reconstruction. All computations were done by using the R environment for statistical computing (R Core Team, 2013).
Bayesian reconstruction is implemented in the package popReconstruct. The method of Raftery and Lewis (1996) was used to select the length of the MCMC chains.
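As a rough illustration of how a summary parameter such as the TFR can be obtained from a posterior sample, the following sketch computes the TFR for each draw of age‐specific fertility rates and then reads off the median and a central 95% interval. It is written in Python rather than R for self‐containment, it does not use popReconstruct, and the fertility schedule and the "posterior" draws are simulated placeholders, not results from the paper.

```python
import random
from math import exp
from statistics import median, quantiles

AGE_WIDTH = 5  # years; the reconstruction uses 5-year age groups


def tfr(age_specific_rates):
    """TFR for one period: interval width times the sum of the
    age-specific average annual fertility rates."""
    return AGE_WIDTH * sum(age_specific_rates)


# Hypothetical fertility schedule for ages 15-19, ..., 45-49 (made-up values).
base_rates = [0.05, 0.20, 0.22, 0.15, 0.09, 0.04, 0.01]

# Stand-in for an MCMC sample: each draw perturbs the schedule on the log scale.
random.seed(1)
draws = [[r * exp(random.gauss(0, 0.05)) for r in base_rates]
         for _ in range(2000)]

# Transform every draw, then summarise the transformed sample.
tfr_draws = [tfr(d) for d in draws]
cuts = quantiles(tfr_draws, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
lower, upper = cuts[0], cuts[-1]    # limits of the central 95% interval
print(round(lower, 2), round(median(tfr_draws), 2), round(upper, 2))
```

The same pattern (transform each draw, then summarise) applies to any function of the input parameters, which is why no separate model is needed for the summary quantities.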

Sources of data and initial estimates

India, 1971–2001

Censuses have been taken roughly every 10 years in India since 1871. We begin our period of reconstruction in 1971. This is the first census year for which vital rate data that are independent of the censuses are available, collected by the Indian sample registration system. Subsequent censuses were taken in 1981, 1991 and 2001 (sufficiently detailed results from the 2011 census were not available at the time of writing). Counts in the 2010 WPPs were used as these were adjusted to reduce bias. Estimates of the SRB, fertility and survival were based on data from the sample registration system (Registrar General and Census Commissioner of India, 2011), the National Family Health Surveys conducted between 1992 and 2006 (Registrar General and Census Commissioner of India, 2009) and the 2002–2004 Reproductive Child Health Survey. Weighted cubic splines were used to smooth estimates of the SRB and fertility. The same initial estimates for migration were used for India as for Laos and Thailand and the relative errors elicited were also the same; see below. The elicited relative error of 10% for the vital rates and the SRB is consistent with independent assessments of the coverage of the sample registration system (Bhat, 2002c; Mahapatra, 2010).

Thailand, 1960–2000

Censuses in Thailand were conducted in 1960, 1970, 1980, 1990 and 2000 (detailed results from the census that was conducted in 2010 were not available at the time of writing). We used the counts in WPP 2010 which were adjusted for known biases such as undercount. Initial estimates of the SRB were taken from current fertility based on vital registration. The relative error elicited was set to 10%. Initial estimates of age‐specific fertility were based on direct and indirect estimates of current fertility and children ever born based on the available data including surveys and vital registration. Each data series was normalized to give the age pattern and summed to give the TFR. These were smoothed separately by using weighted cubic splines and the resulting estimates were combined to yield a single series of initial estimates of age‐specific fertility rates, in the same manner as for India. The weights were determined by UNPD analysts on the basis of their expert judgement about the relative reliability of each source. The relative error elicited was set to 10%. Initial estimates of survival for both sexes were based on life tables calculated from vital registration, adjusted for undercount by using data from surveys. We used the same initial estimates of international migration as for Laos; see below.

Laos, 1985–2004

National censuses in Laos were conducted in 1985, 1995 and 2005, so we reconstruct the whole population between 1985 and 2005. We used the initial estimates of Wheldon et al. (2013b) for fertility, female mortality, migration and population counts. In these, migration was centred at zero for all sexes, ages and quinquennia, with a large relative error of 20%. Initial estimates for males were derived in an analogous manner. There was very little information about the SRB, so initial estimates were set at 1.05, which is a demographic convention (Preston et al., 2001), with a large elicited relative error of 20%.

Results

Key results are given by country; more results are presented in the Web‐based supporting materials. We show the limits of central 95% credible intervals for the marginal prior and posterior distributions of selected parameters. The magnitude of uncertainty will be summarized by using halfwidths of these intervals, averaged over age, time and sex. We compare our results with those published in the 2010 WPP for years with comparable estimates. The 2010 WPP did not use Bayesian reconstruction but was based on the same data, so the comparison is useful. Figs 1(a) and 1(b) show posterior 95% intervals for the TFR and SRB for India. The median TFR decreased consistently and the posterior intervals have halfwidth 0.11 children per woman, on average. The marginal posterior for the SRB is centred above the range 1.04–1.06 from 1976–2001, which suggests that SRBs might have been atypically high over this period. There also appears to have been an increase in the SRB over the same period. Under Bayesian reconstruction, the posterior probabilities of these events can be estimated in a straightforward manner from the posterior sample. The posterior probabilities that the SRB exceeded 1.06 in each of the quinquennia are in Table 1, part (a). Strong evidence for a high SRB was found for the period 1991–2001.
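The two summaries used throughout the results, the halfwidth of a central 95% credible interval and the posterior probability of exceeding a threshold, can both be read directly off a posterior sample. The sketch below shows one way to do this; the SRB draws are simulated from an arbitrary normal distribution and are only placeholders for an MCMC sample, and the helper names are hypothetical.

```python
import random
from statistics import quantiles


def prob_exceeds(draws, threshold):
    """Posterior probability estimated as the share of draws above threshold."""
    return sum(d > threshold for d in draws) / len(draws)


def halfwidth_95(draws):
    """Half the distance between the 2.5% and 97.5% sample quantiles."""
    cuts = quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return (cuts[-1] - cuts[0]) / 2


# Placeholder for posterior draws of the SRB in one quinquennium.
random.seed(2)
srb_draws = [random.gauss(1.08, 0.015) for _ in range(5000)]

p_high = prob_exceeds(srb_draws, 1.06)  # analogue of Pr(SRB > 1.06)
hw = halfwidth_95(srb_draws)
print(round(p_high, 2), round(hw, 3))
```

Averaging `halfwidth_95` over age, time and sex would give the kind of mean halfwidth reported in the text.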

Figure 1

Prior and posterior medians and 95% credible intervals for the reconstructed population of India, 1971–2001, shown with 2010 WPP estimates: (a) TFR; (b) SRB (four trajectories from the MCMC sample are also shown)

Table 1

Probability that sex ratios and differences exceeded certain thresholds for the reconstructed population of India, 1971–2001, by quinquennium

Probabilities for the following years:	1971	1976	1981	1986	1991	1996
(a) Pr(SRB > 1.06)	0.44	0.66	0.83	0.86	0.93	0.96
(b) Pr(female e0 − male e0 > 0)	0.21	0.58	0.74	0.78	0.91	0.99

To investigate the trend further we looked at the posterior distributions of two measures of linear increase: (i) the difference between SRBs in the first and last quinquennia; and (ii) the slope coefficient in the ordinary least squares (OLS) regression of the SRB on the start year of each quinquennium. Each quantity was calculated separately for each SRB trajectory in the posterior sample. Some actual trajectories are shown in Fig. 1(b). These measures summarize the posterior distribution that was obtained from the reconstruction in simple ways; linear regression models were not used to obtain the sample from the posterior. The probabilities that the simple difference and slope coefficient were greater than 0 are 0.92 and 0.93 respectively (Table 2, part (a)).
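The two trend measures described above can be sketched as follows: for each posterior trajectory, compute the last‐minus‐first difference and the OLS slope on the start years, then report the share of trajectories for which each is positive. The trajectories below are simulated with an arbitrary upward drift and noise level; they are assumptions for illustration and do not reproduce the paper's posterior sample.

```python
import random

START_YEARS = [1971, 1976, 1981, 1986, 1991, 1996]


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def trend_probabilities(trajectories, years):
    """Share of trajectories with a positive last-minus-first difference,
    and share with a positive OLS slope."""
    n = len(trajectories)
    diffs = [t[-1] - t[0] for t in trajectories]
    slopes = [ols_slope(years, t) for t in trajectories]
    return sum(d > 0 for d in diffs) / n, sum(s > 0 for s in slopes) / n


# Placeholder trajectories: a small upward drift plus independent noise.
random.seed(3)
trajectories = [
    [1.05 + 0.001 * (yr - START_YEARS[0]) + random.gauss(0, 0.015)
     for yr in START_YEARS]
    for _ in range(2000)
]

p_diff, p_slope = trend_probabilities(trajectories, START_YEARS)
print(round(p_diff, 2), round(p_slope, 2))
```

The regression here is only a per‐trajectory summary statistic; as the text stresses, no regression model is fitted to obtain the posterior sample itself.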

Table 2

Probabilities of increasing linear trends for the SRB, sex difference in life expectancy at birth (sex diff. e0), SRTP and sex ratio in the population under 5 years (SRU5) for the reconstructed population of India, 1971–2001

Measure of trend	95% credible interval	Prob > 0
(a) SRB
SRB_1996 − SRB_1971	[−0.011, 0.054]	0.92
OLS slope (SRB ∼ year)	[−0.00034, 0.0021]	0.93
(b) Sex difference in life expectancy at birth (sex diff. e0)
(sex diff. e0)_1996 − (sex diff. e0)_1971	[0.12, 5]	0.98
OLS slope (sex diff. e0 ∼ year)	[0.0066, 0.17]	0.98
(c) SRTP
SRTP_1996 − SRTP_1971	[−0.034, −0.017]	0.000027
OLS slope (SRTP ∼ year)	[−0.0013, −0.00045]	0.00035
(d) Sex ratio in the population under 5 years (SRU5)
SRU5_1996 − SRU5_1971	[−0.037, 0.017]	0.2
OLS slope (SRU5 ∼ year)	[−0.0011, 0.00062]	0.29

Two measures of trend are used: the difference over the period of reconstruction and the slope coefficient from the OLS regression on the start year of each quinquennium. 95% credible intervals for the measures are also given.

Results for e0 are shown in Fig. 2. Life expectancy at birth increased for both sexes over the period of reconstruction (Fig. 2(a)) but the sex difference suggests that the female e0 might have increased more rapidly than the male e0 and even exceeded it in the period 1996–2001 (Fig. 2(c)). The mean interval halfwidth for the sex difference in life expectancies is 1.7 years. The posterior probabilities that the female e0 exceeded the male e0 support this (Table 1, part (b)). The possibility of an increase in the female–male difference in e0 was investigated by using the same method applied to the SRB. The probability of an increase between 1971 and 2001 is 0.98 and the probability that the slope is greater than 0 is 0.98 (Table 2, part (b)), which is strong evidence of a positive time trend.

Figure 2

Prior and posterior medians and 95% credible intervals for life expectancy at birth (e0) for the reconstructed population of India, 1971–2001: (a) female and male posterior quantiles with 2010 WPP estimates for females and males; (b) female and male prior quantiles only; (c) sex difference, female − male (prior and posterior)

Population sex ratios are shown in Fig. 3(a). The probability of a decrease in SRTP and the probability that the OLS slope was less than 0 are both greater than 0.999 (Table 2, part (c)), which is very strong evidence for a decline over the period of reconstruction. Sex ratios in the population under 5 years, SRU5, increased in the WPP population counts, but our posterior median remained relatively constant after an initial decline. The probability that the sex ratio declined is 0.8; the probability that the OLS coefficient was negative is 0.71 (Table 2, part (d)). Mean halfwidths of the intervals for the SRTP and the sex ratio in ages 0–5 years are 0.01 and 0.027 respectively. Uncertainty is higher in years without a census.

Figure 3

Posterior medians and 95% credible intervals for sex ratios in the reconstructed population of India, 1971–2001 (posterior and WPP census values): (a) total population; (b) population aged 0–5 years

Using the sex ratio of U5MR, the probability that the female U5MR exceeded that of males ranged from 0.72 to 0.89 over the period of reconstruction. Evidence of a linear trend was not found. Further results for U5MR and net international migration are given in the Web‐based supporting materials.

The TFR fell steeply in Thailand from 1960 to 2000 (Fig. 4(a)). Posterior uncertainty about this parameter is small; the mean halfwidth of the posterior intervals is 0.07 children per woman.

Figure 4

Prior and posterior medians and 95% credible intervals for the reconstructed population of Thailand, 1960–2000, shown with 2010 WPP estimates: (a) TFR; (b) SRB

The 95% credible intervals for the SRB contain the typical range of 1.04–1.06 for all except the first two quinquennia (Fig. 4(b)). The probability that the SRB exceeded 1.06 in each period is given in Table 3. There is strong evidence that SRBs were atypically high in the period 1960–1969. The time trend for SRBs appears to be curvilinear, so the simple linear summaries that were used to analyse the trend in Indian SRB are not appropriate here. Piecewise linear regression models are available (e.g. Hinkley (1969, 1971)), but each trajectory in the posterior sample consists of only eight values and they are quite volatile. These characteristics make identification of a change point difficult. Instead we partition the period of reconstruction into two subperiods, 1960–1984 and 1985–2000, and summarize the time trend with two difference quantities, one for each subperiod, each defined in terms of the SRBs indexed by the start years of the quinquennia. The posterior joint probability of a decrease from 1960 to 1984 followed by an increase to 1995–1999 (i.e. that the first difference is negative and the second is positive) is 0.84. This type of analysis is straightforward given the MCMC sample from the joint posterior, and it illustrates the advantage of using Bayesian inference.
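A joint probability of this kind is estimated by checking both conditions on the same trajectory and counting how often they hold together. The sketch below is a minimal illustration with three hand‐made trajectories. The exact end points of the two differences are not recoverable from the text here, so the choice made in the code (change from the first to the last quinquennium within each subperiod) is an assumption, as are the function and variable names.

```python
START_YEARS = [1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995]


def prob_decrease_then_increase(trajectories, first, split, last):
    """Share of trajectories that fall from index `first` to index `split`
    and then rise from index `split + 1` to index `last`.

    Assumption: each subperiod is summarised by its last-minus-first change.
    """
    hits = 0
    for t in trajectories:
        d1 = t[split] - t[first]      # change within the first subperiod
        d2 = t[last] - t[split + 1]   # change within the second subperiod
        if d1 < 0 and d2 > 0:         # both conditions on the same draw
            hits += 1
    return hits / len(trajectories)


# Three toy SRB trajectories (made-up numbers, one value per quinquennium).
toy = [
    [1.10, 1.09, 1.08, 1.07, 1.06, 1.05, 1.06, 1.07],  # falls, then rises
    [1.05, 1.05, 1.06, 1.06, 1.07, 1.07, 1.08, 1.08],  # rises throughout
    [1.09, 1.08, 1.07, 1.06, 1.05, 1.06, 1.05, 1.04],  # falls, then falls
]

split = START_YEARS.index(1980)  # last quinquennium of the first subperiod
p_joint = prob_decrease_then_increase(toy, first=0, split=split, last=7)
print(p_joint)
```

Evaluating the two conditions per draw respects any posterior dependence between the subperiods, which multiplying two marginal probabilities would ignore.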

Table 3

Probability that the SRB was greater than 1.06 for the reconstructed population of Thailand, 1960–2000, by quinquennium

Year	Probability
1960	0.99
1965	0.95
1970	0.66
1975	0.32
1980	0.11
1985	0.19
1990	0.35
1995	0.54

Results for the sex difference in e0 are shown in Fig. 5. Our posterior intervals for the sex difference in e0 lie entirely above zero in each quinquennium, suggesting that female longevity was greater than that of males in Thailand from 1960 to 2000 (mean halfwidth of the difference: 2.4 years). There is also strong evidence for a positive trend; the probability that the simple difference (1995 period minus 1960 period) in the sex differences in e0 was greater than 0 is 0.96 and the probability that the OLS slope is greater than 0 is 1 (Table 4).

Figure 5

Prior and posterior medians and 95% credible intervals for life expectancy at birth (e0) for the reconstructed population of Thailand, 1960–2000: (a) female and male posterior quantiles with 2010 WPP estimates for females and males; (b) female and male prior quantiles only; (c) sex difference, female − male (prior and posterior)

Table 4

Probabilities of an increasing linear trend and 95% credible intervals for sex difference in life expectancy at birth (sex diff. e0) for the reconstructed population of Thailand, 1960–2000

Measure of trend	95% credible interval	Prob > 0
(sex diff. e0)_1995 − (sex diff. e0)_1960	[−0.42, 7.1]	0.96
OLS slope (sex diff. e0 ∼ year)	[0.027, 0.18]	1

Two measures of trend are used: the difference over the period of reconstruction and the slope coefficient from the OLS regression on the start year of each quinquennium.

The posterior for the sex ratio of under‐5‐years mortality rate suggests that mortality at ages 0–5 years was similar for both sexes. In no quinquennium is there convincing evidence for a male‐to‐female ratio that is less than 1. Similarly, there is no strong evidence for a linear trend in this parameter over the period of reconstruction. Details are in the Web‐based supporting materials along with results for net international migration and population sex ratios.

Medians and prior and posterior credible intervals for the TFR and SRB for Laos in 1985–2004 are shown in Fig. 6. The posterior for the TFR that is obtained here (Fig. 6(a)) is very similar to that obtained by Wheldon et al. (2013b) who reconstructed the female‐only population and did not estimate the SRB; it was kept fixed at 1.05 throughout, as a demographic convention (Preston et al., 2001).

Figure 6

Prior and posterior medians and 95% credible intervals for the reconstructed population of Laos, 1985–2004, shown with 2010 WPP estimates: (a) TFR; (b) SRB

There were very few data on the SRB for Laos; therefore, in this study, the initial estimate of the SRB was fixed at 1.05 in all quinquennia but a posterior distribution was estimated by using the model. The posterior median SRB deviates very little from the initial estimate, although the uncertainty has been considerably reduced; the mean of the halfwidths of the 95% credible intervals is 0.038 compared with 0.39 for the prior (Fig. 6(b)). The probability that the SRB was above 1.06 in any of the quinquennia is low (Table 5) and the evidence for a trend over time is equivocal (Table 6).

Table 5

Probability that sex ratios and differences exceeded certain thresholds for the reconstructed population of Laos, 1985–2005, by quinquennium

Probabilities for the following years:	1985	1990	1995	2000
(a) Pr(SRB > 1.06)	0.20	0.30	0.27	0.27
(b) Pr(female e0 − male e0 > 0)	0.56	1.00	1.00	1.00

Table 6

Probabilities of an increasing linear trend and 95% credible intervals for sex difference in life expectancy at birth (sex diff. e0) for the reconstructed population of Laos, 1985–2000

Measure of trend	95% credible interval	Prob > 0
(a) SRB
SRB_2000 − SRB_1985	[−0.046, 0.061]	0.58
OLS slope (SRB ∼ year)	[−0.0034, 0.0042]	0.56
(b) Sex difference in life expectancy at birth (sex diff. e0)
(sex diff. e0)_2000 − (sex diff. e0)_1985	[0.54, 4.2]	0.99
OLS slope (sex diff. e0 ∼ year)	[0.025, 0.26]	0.99

Two measures of trend are used: the difference over the period of reconstruction and the slope coefficient from the OLS regression on the start year of each quinquennium.

There is strong evidence that e0 for females was higher from 1990 through 2005 but there appears to be no evidence for a sex difference between 1985 and 1990 (Fig. 7 and Table 5, part (b)). The posterior distributions of both trend summaries provide strong evidence for an increase in the sex difference over the period of reconstruction (Table 6), although this is due primarily to the increase immediately following the 1985–1990 period.

Figure 7

Prior and posterior medians and 95% credible intervals for life expectancy at birth (e0) for the reconstructed population of Laos, 1985–2004: (a) female and male posterior quantiles with 2010 WPP estimates for females and males; (b) female and male prior quantiles only; (c) sex difference, female − male (prior and posterior)

Results for the sex ratio of under‐5‐years mortality rate, net international migration and population sex ratios are in the Web‐based supporting materials.

Discussion

We have described Bayesian reconstruction for two‐sex populations, which is a method of reconstructing human populations of the recent past which yields probabilistic estimates of uncertainty (Wheldon et al., 2010, 2012, 2013a, b). We reconstructed the populations of Laos from 1985 to 2005, Thailand from 1960 to 2000 and India from 1971 to 2001, paying particular attention to sex ratios of fertility and mortality indicators. We estimate that the posterior probability that the SRB was above 1.06 is greater than 0.9 in India between 1991 and 2001 and the probability that it increased over this period is about 0.92. The SRB was also above 1.06 with high posterior probability in Thailand from 1960 to 1970. We estimate that the probability that it decreased between 1960 and 1980, then increased from 1985 to 2000, is 0.84. We found no evidence for atypically high SRBs, or a trend over the period of reconstruction, for Laos, a country with much less available data than Thailand and India. In both Thailand and Laos, we found strong evidence that e0 was greater for females and, in Thailand, that the difference increased over the period of reconstruction. In India, the probability that e0 for females was lower during 1971–1976 was 0.79 but there was strong evidence for a narrowing of the gap through to 2001.

In its original formulation, Bayesian reconstruction was for female‐only populations; here we show how two‐sex populations can be reconstructed by using the same framework. The method takes a set of data‐derived, bias‐reduced initial estimates of age‐specific fertility rates, and age–sex‐specific survival proportions, migration proportions and population counts from censuses, together with expert opinion on the measurement error informed by data if available. Bayesian reconstruction updates initial estimates by using adjusted census counts via a Bayesian hierarchical model.
The periods of reconstruction that were used in our applications begin in the earliest census year for which non‐census vital rate data were available, and end with the year of the most recent census. Reconstruction can be done further ahead, but without a census the results are based purely on the initial estimates. The initial estimates enter the model as fixed prior medians (Section 2.3), and so the posterior will be sensitive to changes in these inputs. This is actually desirable; because the initial estimates are based heavily on data that are specific to that estimate, the posterior should be sensitive to changes in these data.

Several previous methods of population reconstruction were purely deterministic, were not designed to work with the type of data that are commonly available for many countries over the last 60 years or did not account for measurement error (e.g. Lee (1971, 1974), Wrigley and Schofield (1981), Oeppen (1993b) and Bertino and Sonnino (2003)). Alternative Bayesian models for estimation have been proposed but they used fixed age patterns (Daponte et al., 1997), focused on migration only (Raymer et al., 2013) or appear better suited to situations in which the available data are relatively detailed and abundant (Bryant and Graham, 2013; Raymer et al., 2013). Our method was developed primarily for use in less‐developed countries where subnational and historical administrative data are unlikely to be available. Hence it has lighter data demands and can be applied in a wide range of data quality contexts.

Our applications illustrate some significant advantages to taking a Bayesian approach to population reconstruction. In many countries, the available data are fragmentary and detailed information about measurement error is not available, so expert opinion is useful. The Bayesian paradigm can formally incorporate this information through informative priors.
Bayesian inference using MCMC sampling provides the full joint posterior distribution over all input parameters; this would not be so under other methods of inference, such as maximum likelihood. With the MCMC sample from the joint posterior, we could easily derive posterior probability intervals for any transformation of the input parameters, such as the TFR and e0. We devoted particular attention to the levels and trends of the SRB. Where it seemed appropriate (India and Laos) we used simple linear summaries to summarize the trend, and we used a two‐subperiod summary where it did not (Thailand). Without the full joint posterior distribution, these types of analyses would have been more difficult.

We considered SRBs and SRMs because these are of interest to demographers and policy makers, especially since they determine the SRTP (Griffiths et al., 2000; Guillot, 2002). It is conventional to compare sex‐specific U5MRs with a ratio but sex‐specific e0s with a difference. We have not studied the associations among U5MR‐ratios, e0‐differences and population sex ratios. The SRTP is a function of lifetime cohort mortality but the U5MR and e0 that were presented here are period measures for which the relationship that was used by Guillot (2002) does not hold. Our results add to previous work on SRMs, especially that of Sawyer (2012) who studied sex ratios of U5MR and called for further work to quantify its uncertainty. Sawyer (2012) decomposed U5MR into mortality between ages 0 and 1 years (infant mortality) and mortality between ages 1 and 5 years (child mortality). We reconstructed populations in age–time intervals of width 5 years because these are the intervals for which data are most widely available across all countries.

Many methods of adjusting vital registration using census data have been proposed (e.g.
Bennett and Horiuchi (1981), Hill (1987) and Luther and Retherford (1988)), but these deal with intercensal migrations by essentially truncating the age groups that are most affected by migration (Murray et al., 2010), or ignore them altogether. Luther et al. (1986) and Hill et al. (2007) applied these methods to Thailand to estimate undercounts. The aim of these methods is to produce improved point estimates of vital rates. We have not used intercensal count data to derive our initial estimates. For example, initial estimates of survival were not based on inter‐censal cohort survival and initial estimates of migration were not based on ‘residual’ counts. Doing so would have amounted to using the census data twice and would have underestimated the uncertainty. The outputs of Bayesian reconstruction are interval estimates which quantify uncertainty probabilistically.

Sex ratios in Asia

SRTPs in Asia are the highest in the world (United Nations, 2011). Guilmoto (2007a, 2012) warned that this masculinization could lead to a marriage squeeze in which many young males will struggle to marry owing to a lack of eligible females. Formally, high SRTPs are the result of high SRBs and lifetime sex ratios of cohort mortality (Guillot, 2002). The relative effects of these two factors may vary by time and country. A normal range for the SRB is believed to be 1.04–1.06 (United Nations Population Fund, 2010). Concerns about quality, or a complete lack of data, have made it difficult to estimate SRBs accurately in many Asian countries. Bayesian reconstruction of these populations quantifies the accuracy of SRB estimates probabilistically.

With few exceptions, country level e0s for females exceed those of males. This pattern is consistent with a female survival advantage; in virtually all contemporary human populations, females age more slowly and live longer than males (Soliani and Luchetti, 2006; Vallin, 2006). Behavioural factors appear to play an important role, especially in developed countries where the prevalence of harmful activities, such as smoking, consumption of alcohol and risky behaviour leading to accidental death, tends to be lower among women than among men (Waldron, 1985, 2009; Luy, 2003; Raftery et al., 2014b). The persistence of lower male e0 across the development spectrum suggests that there are other factors as well. In less‐developed countries, e0 is determined more by infant and child mortality. Pongou (2013) studied infant mortality in Africa and found an association with parental characteristics extant before conception. Biological, as well as behavioural, determinants have also been proposed but there is little consensus on the specific mechanism (Austad, 2011). The only countries in which the female e0 is not higher are countries in Africa and some in south and central Asia.
The African countries are those with generalized human immunodeficiency virus epidemics where mortality due to acquired immune deficiency syndrome changes the typical sex difference in e0 (United Nations, 2011). Those in Asia include, most notably, India where cultural preference for males is thought to be the major cause.

India

The SRTP in India has received considerable attention, particularly as an indicator of discrimination against women (United Nations Population Fund, 2010). Griffiths et al. (2000) showed that only slightly elevated SRBs and SRMs at young ages are sufficient to produce the observed SRTP in India, if they persist for a long period of time. The relative contribution of these two factors may have varied over time. Bhat (2002b) and Guilmoto (2007b, 2009) argued that India experienced a transition in the 1970s whereby a high SRB replaced a low SRM as the cause of the high SRTPs that were observed throughout the period (recall that a low SRM is a result of lower male mortality). Evidence suggests that low SRMs were due to female neglect and infanticide. In the 1970s, these practices gave way to sex‐selective abortion which raised the SRB instead.

The transition hypothesis is based on several pieces of evidence. Data suggest a possible rise in the SRB in India above the typical range of 1.04–1.06 in the mid‐1980s (Guilmoto, 2007b). In the 1970s, amniocentesis started to become available as a method for determining the sex of a fetus and abortion was legalized. Ultrasonography, which is a less invasive way of determining fetal sex, started to become available in many parts of India in the 1980s. In certain regions, such as the northern states and highly urbanized areas, there is a long‐standing tradition of preference for sons over daughters (Mayer, 1999; Bongaarts, 2001). The arrival of these new technologies in this context appears to have led to an increase in sex‐selective abortions in India (Bhat, 2002b; Jha et al., 2006b, 2011; Guilmoto, 2007c; United Nations Population Fund, 2010). The steep decline in Indian TFR, which began in the early 1970s, could have increased the prevalence of such procedures.
Several studies have found evidence that the SRB is higher at higher parities (birth orders), both in India (Das Gupta and Bhat, 1997; Bhat, 2002b; Jha et al., 2006a, 2011) and other Asian countries (Das Gupta, 2005). The increase appears to be greater if none of the earlier births were male. As the TFR decreases so does the average family size, so the risk of having no sons increases (Guilmoto, 2009). Therefore, in cultures where sons play important economic and social functions, or where families benefit materially much more from the marriage of a son than of a daughter, the incentive to use sex‐selective abortion increases (Mayer, 1999; Guilmoto, 2009).

After combining all available data and including uncertainty, we estimate that the probability that there was an increase in the SRB between 1971 and 2001 is about 0.92. The probability that the female U5MR was higher over this period is estimated to be between 0.72 and 0.89, but there was no evidence of a trend. Therefore, our results provide support for the SRB part of the transition hypothesis but not the U5MR part.

Overall mortality decreased rapidly in India from about 1950 as infectious diseases were brought under control, food security increased and health services became more widely available (Bhat, 2002b). Our results suggest that, after taking account of uncertainty, there was an increase in e0 and a continual decrease in U5MR between 1971 and 2001 for both sexes. Using sample registration system data, Bhat (2002b) noted that the decrease was greater for females than for males and our analysis of the change in the sex difference of e0 supports this; we found that the probability that the difference increased over the period of reconstruction is about 0.98. The probability of a decline in the SRTP was found to be similarly strong. However, as with the sex ratio of U5MR, there was little evidence for a trend in SRU5.
India's large population makes it a very important case for the study of sex ratios in Asia (Guilmoto, 2007b) and, like other researchers (e.g. Guillot (2002) and United Nations (2011)), we have focused on country level estimates only. However, where available, data suggest that there are large regional variations in SRBs and population sex ratios, with estimates for urban areas and northern states being much higher than for other areas (Bhat, 2002b; Guilmoto, 2009; Jha et al., 2011). Currently, Bayesian reconstruction cannot produce subnational estimates but could be extended to do so in the future.
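Posterior probabilities of the kind reported above for India (for example, the probability that the SRB increased over the period) are typically computed as the share of joint posterior draws in which the statement holds. The following is a minimal sketch of that calculation, not the authors' code: the function name `posterior_prob_increase` is hypothetical, and the draws are synthetic values generated purely for illustration rather than output from the reconstruction.

```python
import numpy as np


def posterior_prob_increase(start_draws, end_draws):
    """Share of paired posterior draws in which the end value exceeds the start value.

    Both inputs must come from the same joint posterior sample, so that
    draw i of `start_draws` and draw i of `end_draws` belong together.
    """
    start = np.asarray(start_draws, dtype=float)
    end = np.asarray(end_draws, dtype=float)
    if start.shape != end.shape:
        raise ValueError("draws must be paired: same shape required")
    return float(np.mean(end > start))


# Synthetic, illustrative draws only (NOT the paper's posterior sample).
rng = np.random.default_rng(0)
srb_first_period = rng.normal(1.06, 0.02, size=5000)
srb_last_period = srb_first_period + rng.normal(0.03, 0.02, size=5000)

prob_increase = posterior_prob_increase(srb_first_period, srb_last_period)
print(f"Estimated posterior probability of an increase: {prob_increase:.2f}")
```

Working with paired draws, rather than comparing two marginal intervals, keeps any posterior correlation between the two periods in the calculation.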

Thailand

Like other parts of Asia, Thailand experienced rapid economic growth and a rapid fall in the TFR beginning in 1960. The TFR decline was accompanied by an increase in the widespread use of modern contraceptive methods made available through government‐supported, voluntary family planning programmes (Kamnuansilpa et al., 1982; Knodel, 1987; Knodel et al., 1996). Unlike India, Thailand is not considered to have had a high SRB (Guilmoto, 2009). Vital registration data, which formed the basis of our initial estimates, indicate that the SRB was high during the period 1960–1970 but remained within the typical range thereafter. This is consistent with studies that were conducted after the early 1970s which found that small families of two or three children consisting of at least one boy and one girl were the most commonly desired configuration in Thailand. TFR decline in Thailand may have intensified this preference (Knodel and Prachuabmoh, 1976; Knodel et al., 1996). Posterior intervals for sex ratios of mortality in Thailand reflect the typical pattern which is one of higher female life expectancy. There is no evidence that this pattern was also true for U5MR (see the Web‐based supporting materials).

Laos

Fertility in Laos remained high relative to its neighbours. For example, the estimated TFR for 1985 in Laos is comparable with the 1960 estimate for Thailand (Frisen, 1991). Posterior uncertainty about the SRB is high and our results provide no evidence to suggest that levels were atypical between 1985 and 2005. All‐age mortality, as summarized by e0, does appear to have favoured females from 1990 onwards, with e0 higher for females than for males, but, as with Thailand, there is no evidence that this advantage held for U5MR (see the Web‐based supporting materials).

Further work and extensions

Our prior distributions were constructed from initial point estimates of the CCMPP input parameters, together with information about measurement error. Measurement errors are a priori independent and depend on hyperparameters which, in the examples that were given here, were elicited from UNPD analysts who are very familiar with the sources of data and the demography of each country. In cases where good data on measurement error are available, they can be used. For example, Wheldon et al. (2013b) used information from post‐enumeration surveys and vital register coverage studies to estimate the accuracy of initial estimates for reconstruction of the New Zealand female population. Data of this kind are rarely available in developing countries for either census or vital rate data, however. Explicitly modelling the dependence structure among measurement errors a priori could be done using, for example, multivariate normal distributions in levels 1 and 3, or Wishart distributions at level 4. We have not pursued this here because a single covariance structure may not be optimal across a wide range of countries and we wish to keep the method as generalizable as possible. Such a modification would also introduce additional parameters which would require their own priors, potentially increasing the elicitation burden. If some generalizability were to be ceded in favour of a more sophisticated prior on measurement errors, it might first be worth attempting to extend the approach of Alkema et al. (2012), who did not consider covariances but, instead, developed a method for estimating the quality of survey‐based TFR estimates in west Africa by modelling bias and measurement error variance as a function of data quality covariates. In their reconstruction of the female population of Burkina Faso, Wheldon et al. (2013a) used the resulting point estimates of the TFR to help to determine initial estimates of fertility rates. 
The covariate‐based estimates of measurement error were not used, but further work could involve modifications to allow this. Incorporating additional information in the form of data quality covariates could increase the accuracy of measurement error estimates. Bayesian evidence synthesis (e.g. Eddy et al. (1992), Ades and Sutton (2006) and Presanis et al. (2008)) could also be investigated as a way of determining the initial point estimates and measurement error. This is a method for synthesizing information from multiple sources on the same parameter, or functions of parameters. For example, information about the TFR could be synthesized with information on age‐specific fertility rates from a different source. As with the data quality covariate approach, such efforts would be country or region specific because they would need to reflect the sources of data that are available.
Posterior uncertainty in estimates of U5MR was found to be substantial and we did not find strong evidence for a skewed sex ratio in any of our examples. U5MR is underidentified in the model because the only census count that it affects, that for ages 0–5 years, is also dependent on the SRB and TFR. Recent work by the UN Interagency Group for Child Mortality Estimation has focused on producing probabilistic estimates of U5MR (Hill et al., 2012; Alkema and Ann, 2011). Further work could look at ways of using these estimates in Bayesian reconstruction to improve estimation of U5MR.
The census counts that we used were not raw census output but adjusted counts published in WPP. In some cases, these counts are adjusted by the UNPD to reduce bias due to factors such as differential undercount by age (see, for example, Booth and Gerland (2015)). This may have led to an underestimate of uncertainty. The use of raw census outputs instead is worth investigating, but the effects of undercount would still need to be addressed through modifications to the method.
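The under‐identification of U5MR noted above can be illustrated with a deliberately simplified calculation. The sketch below is an assumption‐laden toy, not the CCMPP used in the paper: it treats the expected female count aged 0–5 years as births multiplied by the share of births that are female, 1/(1 + SRB), and by a single survival proportion. The function names and all numbers are hypothetical. It shows that two different combinations of SRB and survival reproduce exactly the same count, so the count alone cannot separate them.

```python
import math


def expected_girls_0_to_5(births, srb, survival_f):
    """Toy expected female count aged 0-5: births * share female * survival.

    `srb` is male births per female birth, so the female share is 1 / (1 + srb).
    """
    share_female = 1.0 / (1.0 + srb)
    return births * share_female * survival_f


def survival_matching(count, births, srb):
    """Survival proportion that reproduces `count` for the given births and SRB."""
    return count * (1.0 + srb) / births


# Hypothetical inputs chosen only for illustration.
births = 1_000_000
count_a = expected_girls_0_to_5(births, srb=1.05, survival_f=0.90)

# A higher SRB paired with higher female survival gives the same count.
survival_b = survival_matching(count_a, births, srb=1.10)
count_b = expected_girls_0_to_5(births, srb=1.10, survival_f=survival_b)

print(f"Scenario A: SRB=1.05, survival=0.900, count={count_a:.0f}")
print(f"Scenario B: SRB=1.10, survival={survival_b:.3f}, count={count_b:.0f}")
print("Counts agree:", math.isclose(count_a, count_b, rel_tol=1e-9))
```

In this toy setting, the number of births (which depends on the TFR) is a further unknown, which widens the set of parameter combinations consistent with a given count. External probabilistic estimates of U5MR, as suggested above, would narrow that set.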
Our prior distributions for international migration were centred at zero with large variances. This is a sensible default when accurate data are not available. Further work could investigate the possibility of using the change in stocks of foreign born to provide more accurate initial estimates (e.g. United Nations (2012)). Data on refugee movements, such as those gathered by the United Nations High Commissioner for Refugees, are another source that could be investigated where available.
We have reconstructed national level populations only, principally because this is the level at which the UN operates. We have already mentioned that subnational reconstructions might be of interest. Subnational reconstructions could be done without any modifications to the method if the requisite data are available; national level initial estimates and population counts would just be replaced with their subnational equivalents and the method applied as above. Reconstructing adjacent regions, or regions between which there is likely to be a large amount of migration, would require special care, however, as there is no way of accounting for dependence between migratory flows under the current approach.
Supporting information: ‘Web‐based supporting materials for “Bayesian reconstruction of two‐sex populations by age: estimating sex ratios at birth and sex ratios of mortality”’.

References (34 in total)

1. Population characteristics in the Lao People's Democratic Republic.

Authors: C M Frisen
Journal: Asia Pac Popul J Date: 1991-06

2. Thailand's reproductive revolution.

Authors: J Knodel
Journal: Soc Sci Date: 1987

3. What do we know about causes of sex differences in mortality? A review of the literature.

Authors: I Waldron
Journal: Popul Bull UN Date: 1985

4. Preferences for sex of children in Thailand: a comparison of husbands' and wives' attitudes.

Authors: J Knodel; V Prachuabmoh
Journal: Stud Fam Plann Date: 1976-05

5. Epidemiologic transition interrupted: a reassessment of mortality trends in Thailand, 1980-2000.

Authors: Kenneth Hill; Patama Vapattanawong; Pramote Prasartkul; Yawarat Porapakkham; Stephen S Lim; Alan D Lopez
Journal: Int J Epidemiol Date: 2006-12-20 Impact factor: 7.196

6. Bayesian Population Projections for the United Nations.

Authors: Adrian E Raftery; Leontine Alkema; Patrick Gerland
Journal: Stat Sci Date: 2014-02 Impact factor: 2.901

7. Completeness of India's sample registration system: an assessment using the general growth balance method.

Authors: P N Mari Bhat
Journal: Popul Stud (Camb) Date: 2002-07

8. Child mortality estimation: accelerated progress in reducing global child mortality, 1990-2010.

Authors: Kenneth Hill; Danzhen You; Mie Inoue; Mikkel Z Oestergaard
Journal: PLoS Med Date: 2012-08-28 Impact factor: 11.069

9. Reconstructing Past Populations With Uncertainty From Fragmentary Data.

Authors: Mark C Wheldon; Adrian E Raftery; Samuel J Clark; Patrick Gerland
Journal: J Am Stat Assoc Date: 2013-03-15 Impact factor: 5.033

10. Child mortality estimation: consistency of under-five mortality rate estimates using full birth histories and summary birth histories.

Authors: Romesh Silva
Journal: PLoS Med Date: 2012-08-28 Impact factor: 11.069
