
Population forecasts for Bangladesh, using a Bayesian methodology.

Md Mahsin, Syed Shahadat Hossain.

Abstract

Population projection for many developing countries can be a challenging task for demographers, mostly because enough reliable data are not available. The objective of this paper is to present an overview of the existing methods for population forecasting and to propose an alternative based on Bayesian statistics, which brings the formality of statistical inference to the forecasts. The analysis uses the Markov Chain Monte Carlo (MCMC) technique for Bayesian methodology available in the software WinBUGS. The convergence diagnostic techniques available in WinBUGS have been applied to ensure that the chains required for the MCMC implementation converge. The Bayesian approach allows observed data and expert judgement to be used through appropriate priors. This makes more realistic population forecasts, along with their associated uncertainty, possible.


Year:  2012        PMID: 23304912      PMCID: PMC3763617     

Source DB:  PubMed          Journal:  J Health Popul Nutr        ISSN: 1606-0997            Impact factor:   2.000


INTRODUCTION

A widely used method of forecasting the age- and sex-specific population for future years has three steps. It stratifies the initial population by age and sex, generates projections by applying survival ratios and birth rates, and then makes an additive adjustment for net migration. To obtain these inputs, statisticians analyse the past behaviour of the related variables and draw inferences from that analysis to forecast the variable of interest. At present, there are two major paradigms for statistical data analysis: conventional (frequentist) statistics and Bayesian statistics. Bayesian methodology is comparatively new in data analysis. It has gained wide support from experts in various disciplines over the last two decades, probably because its flexibility and generality allow it to deal with complex situations. Bayesian methods are also often preferred over classical approaches for parameter estimation when the likelihood function is intractable (1). A number of methodologies are used for population projection. One of the most popular is the cohort component method, which is based on estimates of the future levels of fertility, mortality, sex composition, migration, and other parameters. Many studies have examined the relative performance of three approaches to population forecasting: simple mathematical models, time-series extrapolation, and cohort component models. Most have found that constant-growth mathematical models or standard time-series models of population growth are at least as accurate as cohort component models (2-4). The present study is not intended to assess the relative accuracy of various projection models. Rather, it aims only to investigate the usefulness of the cohort component method, combined with a Bayesian approach, for projecting the population of Bangladesh.
In this study, Bayesian analysis is applied to the cohort component model because it provides a clear and transparent way of estimation. It yields point estimates of the parameters together with highest posterior density (HPD) intervals, or Bayesian credible intervals. A credible interval is a measure of uncertainty. Drawing on statistical theory and on data about error distributions, it gives an explicit probability that a given range will contain the future population. The approach thus produces statistical prediction intervals to accompany population forecasts (5-7). Such intervals give data users valuable information and can improve the quality of decisions based on population forecasts.
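Posterior draws make the difference between an HPD interval and a quantile-based credible interval concrete. The sketch below is a minimal illustration that assumes NumPy and a unimodal posterior. `hpd_interval` and `equal_tailed_interval` are hypothetical helper names, and the gamma draws merely stand in for MCMC output; they do not represent any quantity in this study.

```python
import numpy as np


def hpd_interval(samples, prob=0.95):
    """Shortest interval that contains a fraction `prob` of the draws.

    For a unimodal posterior this approximates the highest posterior
    density (HPD) interval.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))  # number of draws the interval must cover
    if not 1 <= k <= n:
        raise ValueError("prob must be in (0, 1]")
    # Width of every window of k consecutive sorted draws.
    widths = x[k - 1:] - x[: n - k + 1]
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])


def equal_tailed_interval(samples, prob=0.95):
    """Quantile-based credible interval, e.g. the 2.5% and 97.5% points."""
    alpha = (1.0 - prob) / 2.0
    lo, hi = np.quantile(samples, [alpha, 1.0 - alpha])
    return float(lo), float(hi)


# Right-skewed stand-in for posterior draws (not data from this study).
rng = np.random.default_rng(1)
draws = rng.gamma(shape=2.0, scale=1.0, size=50_000)
print("95% HPD interval:         ", hpd_interval(draws))
print("95% equal-tailed interval:", equal_tailed_interval(draws))
```

For a skewed posterior the HPD interval is the shorter of the two. For a symmetric posterior the two intervals nearly coincide.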

LITERATURE REVIEW

A cohort component strategy of population projection follows the logic of a general population-component methodology. This methodology examines the components of population change (fertility, mortality, and net migration) separately. The cohort-component model of population projection (CCMPP) is perhaps the iconic method in demography (8-16). This classic method projects a population classified by age forward in time, according to a specified life table and a set of age-specific fertility rates, and takes into account net migration at each age. The whole model can be summarized by a basic balancing equation:

$$P(t+n) = P(t) + \text{Births} - \text{Deaths} + \text{Immigrants} - \text{Emigrants}$$

where t is the starting point in time; n is the projection interval; P(t) is the population size at time t; and P(t+n) is the population size at time t+n. Combining immigrants and emigrants gives:

$$P(t+n) = P(t) + \text{Births} - \text{Deaths} + \text{Net migrants}, \qquad \text{Net migrants} = \text{Immigrants} - \text{Emigrants}$$

A population grows through the addition of births and in-migrants and declines through the subtraction of deaths and out-migrants. The term 'fertility' refers to the actual bearing of live births by an individual. It applies equally to a group or an entire population. Age-specific fertility rates are required to project the number of future births. Fertility projections are made by projecting the course of the total fertility rate (TFR) over time and translating it into age-specific fertility rates. In general, a TFR projection rests on two sets of assumptions: the level at which fertility eventually becomes constant in a country or region, and the path taken from current to eventual levels. Suppose fertility reaches its eventual level and mortality and migration rates are also fixed. The population will then reach a stable age structure and a constant growth rate. If the eventual fertility level is at replacement and net migration is zero, the growth rate will eventually be zero.
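One five-year step of a simplified, female-only cohort component projection illustrates the balancing equation. This is a minimal sketch, not the projection program used in this study. It makes the following assumptions:

- NumPy is available.
- Ages fall into five-year groups, with an open-ended last group.
- Survival ratios and age-specific fertility rates are hypothetical.
- Births are computed from the average number of women in each age group over the interval.
- The sex ratio at birth is 105 males per 100 females, the value adopted in Materials and Methods.

The function name `project_female_step` is hypothetical.

```python
import numpy as np


def project_female_step(pop, surv, asfr, child_surv, srb=1.05, net_mig=None, step=5):
    """One `step`-year cohort component projection of a female population.

    pop        -- women by age group at time t (last group open-ended)
    surv       -- step-year survival ratio for each age group; surv[-1] applies
                  to the open-ended group, whose survivors stay in it
    asfr       -- annual age-specific fertility rates (births per woman per year)
    child_surv -- probability that a girl born during the interval is alive at t+step
    srb        -- sex ratio at birth (males per female)
    """
    pop, surv, asfr = (np.asarray(v, dtype=float) for v in (pop, surv, asfr))
    new = np.zeros_like(pop)
    # Survivors of each age group move up one group.
    new[1:-1] = pop[:-2] * surv[:-2]
    # The open-ended group receives survivors of the last two groups.
    new[-1] = pop[-2] * surv[-2] + pop[-1] * surv[-1]
    # Births: annual rates applied to the average number of women over the interval.
    # (new[0] is still zero here; asfr is zero for the youngest group anyway.)
    births = step * np.sum(asfr * 0.5 * (pop + new))
    female_births = births / (1.0 + srb)
    new[0] = female_births * child_surv
    deaths = np.sum(pop * (1.0 - surv)) + female_births * (1.0 - child_surv)
    if net_mig is not None:
        new += np.asarray(net_mig, dtype=float)
    return new, female_births, deaths


# Hypothetical inputs (thousands of women; 0-4, 5-9, ..., 45-49, 50+).
pop = [800, 780, 760, 700, 650, 600, 550, 500, 450, 400, 1500]
surv = [0.97, 0.99, 0.99, 0.99, 0.985, 0.98, 0.98, 0.975, 0.97, 0.96, 0.80]
asfr = [0, 0, 0, 0.12, 0.16, 0.11, 0.07, 0.04, 0.015, 0.005, 0]
new, fb, deaths = project_female_step(pop, surv, asfr, child_surv=0.95)
print(new.round(1), round(fb, 1), round(deaths, 1))
```

Summing the output confirms the balancing equation. The projected total equals the starting total plus female births minus deaths, plus net migrants when they are supplied.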
Both the projected pace of fertility decline and the assumed eventual fertility level are important in determining trends in population size and age structure. The lower the assumed eventual fertility level, the more the pace of fertility decline matters for projected population size (17). Births in cohort component models are typically projected by applying projected age-specific birth rates to projections of the female population by age. In this approach, the size and age composition of the female population of childbearing ages have a major impact on the projected number of births. Most mothers for the first 25 years of the projection period are already alive when the projection is made. The size and age composition of the female population are therefore the most predictable elements of short-term fertility projections. Time-series techniques have also been used to project births or birth rates. Several authors have applied time-series methods on their own, using autoregressive integrated moving average (ARIMA) models to forecast total births (18-20). These efforts yielded some insights into the use of time-series methods for fertility, but the forecasts ignored the advantages of cohort component methods (21). Lee (22-23) partially remedied this omission by applying time-series methods to the TFR, the sum of all age-specific fertility rates in a given year. In this study, we fit a Gompertz model to the TFR using Bayesian methodology. Representing mortality data with a parametric model has attracted the attention of actuaries, demographers, and statisticians for over a century, and one of the most common models is the logistic curve (13). In this paper, we adopt a Bayesian analysis of this curve, using MCMC techniques to produce the required posterior summaries. Other Bayesian work has addressed mortality smoothing and life-table construction (24-25). Carlin (26) used MCMC methods, though not in a parametric curve-modelling context.

MATERIALS AND METHODS

Table 1 gives the TFR of Bangladesh from 1991 to 2001, which was used to fit a fertility model for projecting future fertility. Using these data and the Gompertz growth model, we developed a WinBUGS program to carry out a Bayesian analysis of the data and to project the TFR of Bangladesh. We follow the time-series tradition in developing a method to forecast the TFR. We then convert the forecast TFR into age-specific fertility rates on the basis of the base-year age-specific fertility rates. Multiplying these rates by forecasts of the age-specific female population then yields forecasts of births that draw on both the time-series and the demographic cohort component traditions. The demographic tradition accounts for the predictability of the size and age composition of the female population. The time-series techniques are more statistically rigorous in modelling the short-term variability of age-specific fertility rates. This approach combines the advantages of both.
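Proportional scaling is one way to convert a forecast TFR into age-specific rates on the basis of the base-year schedule. It keeps the base-year age pattern and changes only its level. The sketch assumes NumPy and five-year age groups from 15-19 to 45-49, so that TFR = 5 × the sum of annual rates. It uses a hypothetical base-year schedule, not the Bangladesh schedule used in the study, and the helper names are illustrative.

```python
import numpy as np

AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]


def tfr_from_asfr(asfr, width=5):
    """TFR = age-group width x sum of annual age-specific fertility rates."""
    return width * float(np.sum(asfr))


def scale_asfr(base_asfr, target_tfr, width=5):
    """Rescale a base-year schedule so that it implies `target_tfr`."""
    base_asfr = np.asarray(base_asfr, dtype=float)
    return base_asfr * (target_tfr / tfr_from_asfr(base_asfr, width))


# Hypothetical base-year rates (births per woman per year), not BBS data.
base = np.array([0.120, 0.160, 0.110, 0.070, 0.040, 0.015, 0.005])
projected = scale_asfr(base, target_tfr=2.2)
for age, old, new in zip(AGE_GROUPS, base, projected):
    print(f"{age}: {old:.3f} -> {new:.3f}")
print("TFR:", round(tfr_from_asfr(base), 2), "->", round(tfr_from_asfr(projected), 2))
```

Projected annual births follow from multiplying each projected rate by the projected number of women in that age group and summing.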
Table 1.

Total fertility rate in Bangladesh from 1991 to 2001

Year (t_i) | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001
TFR (Y_i) | 4.24 | 4.18 | 3.84 | 3.58 | 3.45 | 3.41 | 3.10 | 2.98 | 2.64 | 2.59 | 2.56

Source: Statistical Pocket Book; 2001-2007, Bangladesh Bureau of Statistics (BBS)

Let $Y_i$ denote the TFR of Bangladesh in year $t_i$ (i = 1, 2, …, 11), where i indexes successive years starting from 1991 (i = 1); the data are given in Table 1. One of the best-known growth models is that of Gompertz (27), which we use for the TFR. $Y_i$ is assumed to follow a normal distribution with mean $h_i$ and common precision τ, and non-informative priors are assigned to all model parameters. The nonlinear regression model for the TFR is:

$$Y_i = h_i + e_i, \qquad e_i \sim N(0, \tau)$$

Here $h_i$ is the deterministic (Gompertz) part, $e_i$ is the disturbance, and τ is the precision (τ = 1/variance). Thus $Y_i \sim N(h_i, \tau)$ in the precision parameterization used by WinBUGS. The Gompertz mean $h_i$ depends on four parameters:

- d, the lower asymptote
- c, the upper asymptote
- b, the rate at which fertility changes
- a, which determines the shape of the Gompertz curve

For the Bayesian analysis, prior distributions must be provided for all the parameters a, b, c, d, and τ. The BUGS manual (28) discusses the choice of priors at length.

Mortality projections are based on projecting future life expectancy at birth for males and females. Life expectancy at birth is defined as the average lifespan of a child born today if current age-specific mortality levels were held fixed in the future. In developing countries where mortality remains high, several factors will determine future life expectancy: the efficiency of local health services, the spread of traditional (e.g. malaria) and new (e.g. AIDS) diseases, and general standards of living and education. In this paper, we do not account for new epidemics such as AIDS. Life expectancy at birth is projected on the basis of its past increases.
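The definition of life expectancy at birth given above corresponds to a period life-table calculation. The sketch below makes these assumptions:

- NumPy is available.
- Age intervals are single years.
- The probabilities of dying q_x are hypothetical, with q = 1 in the last (closing) interval.
- Those who die in an interval do so, on average, halfway through it.

It illustrates the definition only. It is not how the South Asian model life tables used in this study were built, and `life_expectancy_at_birth` is a hypothetical helper.

```python
import numpy as np


def life_expectancy_at_birth(qx, a=0.5):
    """Period life expectancy at birth from age-specific probabilities of dying.

    qx -- probability of dying between exact ages x and x+1; the last value
          must be 1 so that the table closes
    a  -- average fraction of the interval lived by those who die in it
    """
    qx = np.asarray(qx, dtype=float)
    if qx[-1] != 1.0:
        raise ValueError("the last probability of dying must be 1")
    lx = np.concatenate(([1.0], np.cumprod(1.0 - qx)))  # survivors to exact age x (l0 = 1)
    dx = lx[:-1] - lx[1:]                               # deaths in each interval
    Lx = lx[1:] + a * dx                                # person-years lived in each interval
    return float(Lx.sum())                              # e0 = T0 / l0 with l0 = 1


ages = np.arange(0, 101)
qx = np.minimum(1.0, 0.0005 * np.exp(0.085 * ages))  # hypothetical adult mortality
qx[0] = 0.05                                        # hypothetical infant mortality
qx[-1] = 1.0                                        # close the table at age 100
print(round(life_expectancy_at_birth(qx), 1))
```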
A logistic curve is fitted to the trend in life expectancy at birth, on the assumption that life expectancy increases along an S-shaped curve. The logic is as follows. When life expectancy at birth is very low, it is expected to increase slowly because health facilities are poor. Once health facilities are provided and socioeconomic conditions improve, life expectancy increases at a faster rate. At higher levels of life expectancy, the rate of increase slows again, and life expectancy eventually stabilizes at the biological maximum. Projecting the population from one year to the next requires survival rates by age and sex, and future life tables may be constructed to obtain them. Several model life tables are available: those developed by the United Nations (29) or by Coale and Demeny (30), regional life tables, and South Asian model life tables. Whichever is most applicable to Bangladesh should be used; in this study, the South Asian model life table has been used. Let $Q_{ij}$ denote life expectancy at birth in Bangladesh in year $t_i$ (i = 1, 2, …, 21) for sex j (males and females). The data were collected from the Bangladesh Bureau of Statistics [Sample Vital Registration System (SVRS): 2002, 2003, 2005-06, BBS], and a logistic growth model is used. In this model, $Q_{ij}$ is assumed to follow a normal distribution with mean $p_{ij}$ and common precision τ. Non-informative priors are assigned to all model parameters. The nonlinear regression model for life expectancy is:

$$Q_{ij} = p_{ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \tau)$$

Here $p_{ij}$ is the deterministic (logistic) part, $\epsilon_{ij}$ is the random error, and τ is the precision. The logistic mean $p_{ij}$, which involves e, the base of the natural logarithm, depends on four parameters, q1 to q4 (Table 3). These are an upper asymptote, a lower asymptote, and two further parameters that define the shape of the logistic curve.
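The slow-fast-slow reasoning behind the logistic curve can be checked numerically. The sketch assumes NumPy and one common four-parameter logistic form, lower + (upper − lower) / (1 + exp(−rate·(t − midpoint))). The parameter values are hypothetical, and the text does not say which of the paper's q1 to q4 plays which role in this form.

```python
import numpy as np


def logistic4(t, lower, upper, rate, midpoint):
    """Four-parameter logistic curve rising from `lower` towards `upper`."""
    return lower + (upper - lower) / (1.0 + np.exp(-rate * (t - midpoint)))


# Hypothetical parameters: life expectancy rising from 45 towards 75 years.
t = np.arange(0, 61)  # years since an arbitrary origin
e0 = logistic4(t, lower=45.0, upper=75.0, rate=0.15, midpoint=30.0)
gain = np.diff(e0)    # year-on-year increase in life expectancy
print(f"gain in first year: {gain[0]:.3f}")
print(f"largest gain: {gain.max():.3f} (between years {gain.argmax()} and {gain.argmax() + 1})")
print(f"gain in last year: {gain[-1]:.3f}")
```

The gains are small at both ends and largest near the midpoint, which produces the S shape described above. As t grows, the curve levels off at `upper`, the analogue of the biological maximum.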
Future international migration is more difficult to project than fertility or mortality. Migration can be volatile, since short-term changes in economic, social, or political factors often play an important role. In addition, because no single, compelling theory of migration exists, projections are generally based on past trends and current policies; however, data on historical migration are sparse for Bangladesh. In this work, we therefore assume that the population is closed, i.e. that no migration takes place or, if it does, that its net effect is zero. The sex ratio at birth divides the future number of newborns into males and females. It is set at 100 females to 105 males, based on the results of the last five years, and is held constant from 2001 onward.

Diagnostics

Bayesian analysis faces serious computational difficulties because the posterior distributions are likely to involve complicated mathematical expressions. MCMC methods address many of these difficulties with relative ease and make it possible to analyse a wide range of Bayesian statistical models. More details and examples of MCMC implementation in Bayesian inference can be found elsewhere (31-35). MCMC methods are a class of iterative algorithms for sampling from probability distributions. They work by constructing a Markov chain that has the desired distribution as its equilibrium distribution (33). We used the MCMC tools in WinBUGS to obtain the posterior distribution of the unknown parameters of the model. This requires running a number of chains for a long time. Once the chains have run for a sufficiently large number of iterations and reached the stationary distribution, further samples from them can be regarded as draws from the posterior distribution of the parameters. WinBUGS provides a number of built-in diagnostics to assess the convergence of chains. For a more formal approach to convergence diagnosis, the software also implements the techniques described by Brooks and Gelman (34). It can also output monitored samples in a format compatible with the CODA software (36). In practice, WinBUGS allows multiple chains to run simultaneously, and running multiple chains is one way to check the convergence of MCMC simulations. Two chains were run for the model in this study. If the chains do not mix sufficiently even after a long run, this is evidence that they have not converged.
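The Gelman-Rubin potential scale reduction factor (R-hat) quantifies the multiple-chain check described above by comparing between-chain and within-chain variance. The sketch below assumes NumPy and implements the basic variance-based statistic. As we understand its manual, the Brooks-Gelman diagnostic plotted by WinBUGS is a related interval-based version, so its values need not match this one exactly. The synthetic chains are for illustration only, and `psrf` is a hypothetical helper name.

```python
import numpy as np


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor (R-hat).

    chains -- array of shape (m, n): m chains, n post-burn-in draws each,
              for a single scalar parameter
    """
    x = np.asarray(chains, dtype=float)
    m, n = x.shape
    if m < 2:
        raise ValueError("need at least two chains")
    W = x.var(axis=1, ddof=1).mean()        # average within-chain variance
    B = n * x.mean(axis=1).var(ddof=1)      # between-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled estimate of the posterior variance
    return float(np.sqrt(var_plus / W))


rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(2, 5000))           # two chains, same target
stuck = mixed + np.array([[0.0], [3.0]])               # second chain stuck elsewhere
print(f"R-hat, well-mixed chains: {psrf(mixed):.3f}")   # close to 1
print(f"R-hat, non-mixing chains: {psrf(stuck):.3f}")   # well above 1
```

Values close to 1 are consistent with convergence. A common informal threshold is about 1.1, although passing it does not guarantee convergence.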
Once the diagnostics indicate that the chains have converged, the simulation must be run for further iterations to obtain samples for posterior inference. The more samples are saved, the more accurate the posterior estimates. The earlier samples (the burn-in), generated before the history of the chains showed convergence, are discarded. The summary statistics are obtained only from the samples generated afterwards.
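Discarding the burn-in and summarizing the retained draws, as in Tables 2 and 3, takes only a few lines of array code. The sketch assumes NumPy and an array of shape (chains, iterations) for one parameter. `posterior_summary` is a hypothetical helper. The synthetic chains deliberately start far from the target, which shows why the burn-in is dropped.

```python
import numpy as np


def posterior_summary(chains, burn_in):
    """Summaries like those in Tables 2 and 3, from draws after the burn-in."""
    kept = np.asarray(chains, dtype=float)[:, burn_in:]  # drop burn-in from every chain
    pooled = kept.ravel()                                # pool the remaining draws
    q025, median, q975 = np.quantile(pooled, [0.025, 0.5, 0.975])
    return {
        "mean": float(pooled.mean()),
        "sd": float(pooled.std(ddof=1)),
        "2.5%": float(q025),
        "median": float(median),
        "97.5%": float(q975),
    }


rng = np.random.default_rng(2)
transient = np.full((2, 100), 50.0)               # early draws far from the target
settled = rng.normal(0.0, 1.0, size=(2, 5000))    # draws after the chains converge
chains = np.concatenate([transient, settled], axis=1)
print("without discarding:", posterior_summary(chains, burn_in=0)["mean"])
print("after burn-in:     ", posterior_summary(chains, burn_in=100)["mean"])
```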

RESULTS

Table 2 presents the summary statistics of the estimated parameters of the fertility model. They are based on 80,000 updates run after a burn-in of 10,000 initial updates, which were discarded. During these updates, none of the diagnostics indicated any sign of non-convergence of the chains. The number of iterations to run after convergence was assessed on the basis of the Monte Carlo error (MC error) of each parameter. The MC error estimates the difference between the mean of the sampled values and the true posterior mean; the sampled mean serves as our estimate of each parameter's posterior mean.
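As we understand the WinBUGS manual, the MC error that WinBUGS reports is a batch-means estimate. The sketch below assumes NumPy and a fixed number of equal-length batches (50 here, an arbitrary choice); the batching WinBUGS uses internally may differ. It compares an independent chain with an autocorrelated AR(1) chain. For the AR(1) chain, the naive standard error SD/√n understates the error. `mc_error_batch_means` is a hypothetical helper name.

```python
import numpy as np


def mc_error_batch_means(draws, n_batches=50):
    """Batch-means estimate of the Monte Carlo standard error of the mean.

    The chain is cut into `n_batches` consecutive batches; if each batch is much
    longer than the autocorrelation time, the batch means are roughly independent.
    """
    x = np.asarray(draws, dtype=float)
    b = x.size // n_batches                   # draws per batch; remainder dropped
    means = x[: b * n_batches].reshape(n_batches, b).mean(axis=1)
    return float(means.std(ddof=1) / np.sqrt(n_batches))


rng = np.random.default_rng(3)
n = 100_000
iid = rng.normal(size=n)

# AR(1) chain with strong autocorrelation, mimicking slowly mixing MCMC output.
phi = 0.9
eps = rng.normal(size=n)
ar = np.empty(n)
ar[0] = eps[0]
for i in range(1, n):
    ar[i] = phi * ar[i - 1] + eps[i]

for name, chain in [("independent", iid), ("AR(1), phi=0.9", ar)]:
    naive = chain.std(ddof=1) / np.sqrt(chain.size)
    print(f"{name}: naive SE = {naive:.5f}, batch-means MC error = {mc_error_batch_means(chain):.5f}")
```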
Table 2.

Summary statistics of the nodes of the fertility model

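Node | Mean | SD | MC error | 2.5% (HPD region) | Median | 97.5% (HPD region)
a | 3.625 | 2.095 | 0.0831 | -1.13 | 3.9 | 7.496
b | 0.1878 | 0.07114 | 0.00247 | 0.08302 | 0.1769 | 0.3508
c | -3.488 | 1.23 | 0.04591 | -6.537 | -3.226 | -1.857
d | 5.058 | 0.7275 | 0.02837 | 4.168 | 4.871 | 6.952
σ | 0.1011 | 0.03013 | 3.61E-04 | 0.06131 | 0.09513 | 0.1768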
The WinBUGS manual suggests a rule of thumb: run the simulation until the MC error for each parameter of interest is less than about 5% of the sample standard deviation. We followed this rule. Table 2 shows that the MC error of each parameter is below 5% of its sample standard deviation. Figure 1 shows the fit of the Gompertz model. The model fits the observed data closely, as the smoothed line of estimated values lies close to the dots showing the observed values. The dotted blue lines show the 95% HPD (highest posterior density) region.
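The 5% rule of thumb can be checked directly against Table 2. The snippet below simply transcribes the SD and MC error columns of Table 2 and computes their ratio. `mc_error_ratios` is a hypothetical helper name.

```python
# Posterior SD and MC error for each node, transcribed from Table 2.
TABLE_2 = {
    "a": (2.095, 0.0831),
    "b": (0.07114, 0.00247),
    "c": (1.23, 0.04591),
    "d": (0.7275, 0.02837),
    "sigma": (0.03013, 3.61e-4),
}


def mc_error_ratios(table, threshold=0.05):
    """Return {node: (MC error / SD, passes rule)} for a WinBUGS-style summary."""
    return {node: (mce / sd, mce / sd < threshold) for node, (sd, mce) in table.items()}


for node, (ratio, ok) in mc_error_ratios(TABLE_2).items():
    print(f"{node:>5}: MC error / SD = {ratio:.3f} -> {'below 5%' if ok else 'run longer'}")
```

All five ratios lie between roughly 1% and 4%, which is consistent with the statement above.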
Figure 1.

Fitted and projected values, with HPD region, under the Gompertz model

Table 3 presents the summary statistics of the estimated parameters of the mortality model for life expectancy at birth, for males and females. They are based on 70,000 updates run after a burn-in of 10,000 initial updates, which were discarded. During these updates, none of the diagnostics indicated any sign of non-convergence of the chains. When running the model in WinBUGS, we monitored five nodes: the four parameters q1 to q4 and the error term (reported as σ in Table 3).
Table 3.

Summary statistics of the nodes of the life-expectancy model for males and females

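Node | Sex | Mean | SD | MC error | 2.5% (HPD region) | Median | 97.5% (HPD region)
q1 | Male | 16.35 | 4.579 | 0.1705 | 9.589 | 15.65 | 27.05
q1 | Female | 18.21 | 4.442 | 0.1735 | 11.59 | 17.56 | 29.1
q2 | Male | 4.873 | 0.9117 | 0.03781 | 3.459 | 4.725 | 7.185
q2 | Female | 5.308 | 0.6038 | 0.02583 | 4.27 | 5.252 | 6.593
q3 | Male | 0.2561 | 0.06799 | 0.002848 | 0.1583 | 0.2424 | 0.4299
q3 | Female | 0.268 | 0.0471 | 0.00204 | 0.1887 | 0.2622 | 0.372
q4 | Male | 54.8 | 0.5047 | 0.01926 | 53.68 | 54.85 | 55.61
q4 | Female | 54.46 | 0.3151 | 0.01172 | 53.77 | 54.48 | 55.0
σ | Male | 0.6012 | 0.1136 | 0.001698 | 0.4279 | 0.5849 | 0.8674
σ | Female | 0.5063 | 0.09371 | 0.00126 | 0.3617 | 0.4934 | 0.7248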
Figures 2 and 3 show the fitted and forecast values for males and females. The models fit the observed data closely, as the smoothed line representing the estimated values lies close to the dots showing the observed values. The dotted blue lines show the 95% HPD (highest posterior density) region. The projections follow an S-shaped curve, indicating that life expectancy at birth stabilizes for both males and females.
Figure 2.

Fitted and projected values, with HPD region, under the logistic model (males)

Figure 3.

Fitted and projected values, with HPD region, under the logistic model (females)


DISCUSSION

The final calculations of the cohort component method combine the results from the mortality, migration, and fertility modules. The Appendix presents the population of Bangladesh forecast from 2006 to 2051 on the basis of the forecasts of the population growth components. The present study was an attempt to show the application and suitability of the MCMC tool in Bayesian data analysis for fitting population data and projecting future population with the cohort component model. Using a Bayesian approach to fit the growth components allows extensions beyond classical estimation methods, leading to more realistic forecasts and associated measures of uncertainty. The cohort component method follows the process of demographic change. It is viewed as more reliable than projection methods that rely primarily on census data or on information that reflects population change. In this paper, we have presented the basics of implementing Bayesian data analysis, illustrated with a population projection. We did not perform a sensitivity analysis with different prior distributions, mainly because the selected priors were non-informative. They contributed little information to the posterior distribution, but they were necessary for implementing the Bayesian analysis.

Limitations

In this study, we are unable to forecast the migration component because data for Bangladesh are sparse. To overcome this problem, we made a strong assumption (a closed population with zero net migration), and this is the major drawback of our study. Apart from this shortcoming, the fertility model projects the total fertility rate to fall to replacement level in 2010 and afterwards, which is unrealistic for Bangladesh. By contrast, Figures 2 and 3 show that the mortality component fits very well. The use of non-informative priors in both the fertility and mortality models is a further limitation of this study. We hope to explore these areas further, using Bayesian methods motivated by the arguments presented throughout this paper.

Conclusions

Applying Bayesian methods to the growth components has produced a more realistic summary of population forecasts. The approach allows expert judgement embodied in priors to be formally incorporated, and such priors alter the forecast population characteristics and their levels of uncertainty. In this paper, we applied non-informative priors to the fertility and mortality models, which resulted in a large level of uncertainty in the forecast population. This uncertainty could be reduced by including informative priors. Moreover, informative priors based purely on expert opinion about future population growth rates could have been included. Such prior information adds to what the parameter-estimation and model-choice procedures can draw on, and would further reduce the estimated uncertainty.
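The simplest conjugate case shows why informative priors narrow forecast uncertainty: for a normal mean with known data standard deviation, prior and data precisions add. This toy sketch is not the fertility or mortality model of this paper. All numbers are hypothetical, and `normal_mean_posterior` is an illustrative helper name.

```python
import math


def normal_mean_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Conjugate normal update for a mean when the data SD is known.

    Posterior precision = prior precision + n / data variance, and the
    posterior mean is the precision-weighted average of prior and data means.
    """
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / data_sd**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical setting: 10 observations with mean 2.5 and SD 0.5.
vague = normal_mean_posterior(prior_mean=0.0, prior_sd=1000.0, data_mean=2.5, data_sd=0.5, n=10)
informative = normal_mean_posterior(prior_mean=2.2, prior_sd=0.2, data_mean=2.5, data_sd=0.5, n=10)
print(f"vague prior:       mean {vague[0]:.3f}, sd {vague[1]:.3f}")
print(f"informative prior: mean {informative[0]:.3f}, sd {informative[1]:.3f}")
```

The vague prior leaves the posterior SD at essentially the data-only value, 0.5/√10 ≈ 0.158. The informative prior reduces it to about 0.124 and pulls the posterior mean towards the prior mean.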
Appendix.

Age and sex-structure of the projected population (in thousands), 2006–2051

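Age (years) | Sex | 2001 (base) | 2006 | 2011 | 2016 | 2021 | 2026 | 2031 | 2036 | 2041 | 2046 | 2051
All ages | Persons | 123,851 | 133,436 | 143,515 | 154,212 | 164,899 | 174,494 | 182,384 | 188,577 | 193,432 | 196,933 | 198,964
All ages | Males | 63,895 | 68,699 | 73,700 | 78,957 | 84,267 | 88,942 | 92,739 | 95,687 | 97,989 | 99,671 | 100,682
All ages | Females | 59,956 | 64,737 | 69,815 | 75,255 | 80,632 | 85,552 | 89,645 | 92,890 | 95,443 | 97,262 | 98,282
0-5 | Males | 8,362 | 6,711 | 6,722 | 7,158 | 7,408 | 7,173 | 6,717 | 6,386 | 6,272 | 6,210 | 6,099
0-5 | Females | 7,724 | 6,326 | 6,361 | 6,805 | 7,008 | 6,817 | 6,383 | 6,068 | 5,960 | 5,902 | 5,796
>5-10 | Males | 8,822 | 8,189 | 6,623 | 6,646 | 7,088 | 7,335 | 7,102 | 6,651 | 6,323 | 6,210 | 6,149
>5-10 | Females | 7,956 | 7,533 | 6,234 | 6,290 | 6,728 | 6,940 | 6,750 | 6,322 | 6,009 | 5,902 | 5,844
>10-15 | Males | 8,421 | 8,779 | 8,163 | 6,605 | 6,629 | 7,070 | 7,318 | 7,084 | 6,634 | 6,307 | 6,194
>10-15 | Females | 7,432 | 7,914 | 7,510 | 6,220 | 6,276 | 6,715 | 6,926 | 6,736 | 6,308 | 5,997 | 5,890
>15-20 | Males | 6,292 | 8,391 | 8,759 | 8,145 | 6,593 | 6,617 | 7,057 | 7,304 | 7,072 | 6,622 | 6,296
>15-20 | Females | 5,672 | 7,404 | 7,897 | 7,498 | 6,210 | 6,267 | 6,705 | 6,916 | 6,727 | 6,299 | 5,989
>20-25 | Males | 4,859 | 6,265 | 8,368 | 8,737 | 8,127 | 6,578 | 6,602 | 7,041 | 7,287 | 7,056 | 6,608
>20-25 | Females | 6,057 | 5,644 | 7,384 | 7,880 | 7,482 | 6,199 | 6,256 | 6,693 | 6,905 | 6,715 | 6,288
>25-30 | Males | 4,895 | 4,834 | 6,243 | 8,340 | 8,711 | 8,103 | 6,559 | 6,583 | 7,020 | 7,266 | 7,035
>25-30 | Females | 5,865 | 6,023 | 5,626 | 7,366 | 7,861 | 7,466 | 6,185 | 6,242 | 6,679 | 6,888 | 6,700
>30-35 | Males | 4,313 | 4,863 | 4,812 | 6,218 | 8,310 | 8,679 | 8,074 | 6,535 | 6,559 | 6,995 | 7,239
>30-35 | Females | 4,436 | 5,825 | 5,999 | 5,609 | 7,343 | 7,839 | 7,445 | 6,168 | 6,224 | 6,660 | 6,869
>35-40 | Males | 4,204 | 4,276 | 4,835 | 4,786 | 6,188 | 8,269 | 8,637 | 8,034 | 6,503 | 6,527 | 6,961
>35-40 | Females | 3,795 | 4,397 | 5,794 | 5,973 | 5,585 | 7,315 | 7,809 | 7,416 | 6,145 | 6,201 | 6,634
>40-45 | Males | 3,426 | 4,149 | 4,235 | 4,793 | 4,749 | 6,140 | 8,206 | 8,570 | 7,973 | 6,453 | 6,477
>40-45 | Females | 2,774 | 3,749 | 4,362 | 5,756 | 5,935 | 5,552 | 7,273 | 7,764 | 7,373 | 6,109 | 6,165
>45-50 | Males | 2,610 | 3,356 | 4,085 | 4,175 | 4,731 | 4,688 | 6,060 | 8,099 | 8,459 | 7,869 | 6,369
>45-50 | Females | 1,991 | 2,727 | 3,705 | 4,318 | 5,698 | 5,880 | 5,502 | 7,206 | 7,692 | 7,306 | 6,053
>50-55 | Males | 2,175 | 2,521 | 3,266 | 3,984 | 4,079 | 4,622 | 4,580 | 5,921 | 7,913 | 8,265 | 7,688
>50-55 | Females | 1,826 | 1,937 | 2,673 | 3,642 | 4,244 | 5,608 | 5,787 | 5,415 | 7,093 | 7,571 | 7,191
>55-60 | Males | 1,309 | 2,055 | 2,408 | 3,128 | 3,825 | 3,916 | 4,437 | 4,397 | 5,685 | 7,597 | 7,935
>55-60 | Females | 1,047 | 1,746 | 1,872 | 2,595 | 3,534 | 4,128 | 5,455 | 5,629 | 5,267 | 6,898 | 7,364
>60-65 | Males | 1,529 | 1,194 | 1,902 | 2,238 | 2,917 | 3,567 | 3,652 | 4,138 | 4,100 | 5,301 | 7,084
>60-65 | Females | 1,299 | 971 | 1,646 | 1,777 | 2,464 | 3,366 | 3,931 | 5,195 | 5,360 | 5,015 | 6,569
>65-70 | Males | 814 | 1,320 | 1,052 | 1,687 | 1,995 | 2,601 | 3,180 | 3,257 | 3,690 | 3,656 | 4,727
>65-70 | Females | 629 | 1,148 | 880 | 1,505 | 1,625 | 2,263 | 3,093 | 3,612 | 4,773 | 4,926 | 4,609
>70-75 | Males | 926 | 650 | 1,086 | 872 | 1,409 | 1,666 | 2,173 | 2,657 | 2,720 | 3,082 | 3,054
>70-75 | Females | 699 | 515 | 974 | 757 | 1,296 | 1,409 | 1,963 | 2,682 | 3,133 | 4,140 | 4,272
>75-80 | Males | 358 | 663 | 484 | 816 | 662 | 1,070 | 1,266 | 1,650 | 2,018 | 2,066 | 2,341
>75-80 | Females | 258 | 511 | 396 | 764 | 593 | 1,028 | 1,118 | 1,557 | 2,128 | 2,486 | 3,285
>80 | Males | 580 | 483 | 657 | 629 | 846 | 848 | 1,119 | 1,380 | 1,761 | 2,189 | 2,426
>80 | Females | 496 | 367 | 502 | 500 | 750 | 760 | 1,064 | 1,269 | 1,667 | 2,247 | 2,765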

1. Pflaumer P. Forecasting U.S. population totals with the Box-Jenkins approach. Int J Forecast. 1992 Nov.

2. Smith SK, Sincich T. Evaluating the forecast accuracy and bias of alternative population projections for states. Int J Forecast. 1992 Nov.

3. Lee RD, Tuljapurkar S. Stochastic population forecasts for the United States: beyond high, medium, and low. J Am Stat Assoc. 1994 Dec.

4. Long JF, Mcmillen DB. A survey of Census Bureau population projection methods. Clim Change. 1987.

5. Lee RD. Forecasting births in post-transition population: stochastic renewal with serially correlated fertility. J Am Stat Assoc. 1974 Sep.

6. Macdonald J. Modeling demographic relationships: an analysis of forecast functions for Australian births. J Am Stat Assoc. 1981 Dec.

7. Leslie PH. On the use of matrices in certain population mathematics. Biometrika. 1945 Nov.

8. Pearl R, Reed LJ. On the Rate of Growth of the Population of the United States since 1790 and Its Mathematical Representation. Proc Natl Acad Sci U S A. 1920 Jun.

9. Cohen JE. Population forecasts and confidence intervals for Sweden: a comparison of model-based and empirical approaches. Demography. 1986 Feb.

10. Kirkwood TBL. Deciphering death: a commentary on Gompertz (1825) 'On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies'. Philos Trans R Soc Lond B Biol Sci. 2015 Apr 19.
