Jeff Tayman, Stanley K Smith, Stefan Rayer.
Abstract
Many studies have evaluated the impact of differences in population size and growth rate on population forecast accuracy. Virtually all these studies have been based on aggregate data; that is, they focused on average errors for places with particular size or growth rate characteristics. In this study, we take a different approach by investigating forecast accuracy using regression models based on data for individual places. Using decennial census data from 1900 to 2000 for 2,482 counties in the US, we construct a large number of county population forecasts and calculate forecast errors for 10- and 20-year horizons. Then, we develop and evaluate several alternative functional forms of regression models relating population size and growth rate to forecast accuracy; investigate the impact of adding several other explanatory variables; and estimate the relative contributions of each variable to the discriminatory power of the models. Our results confirm several findings reported in previous studies but uncover several new findings as well. We believe regression models based on data for individual places provide powerful but under-utilized tools for investigating the determinants of population forecast accuracy.
Year: 2010 PMID: 21475704 PMCID: PMC3061008 DOI: 10.1007/s11113-010-9187-9
Source DB: PubMed Journal: Popul Res Policy Rev ISSN: 0167-5923
County population size and growth rate characteristics, 1900–2000
| | 1900 | 1910 | 1920 | 1930 | 1940 | 1950 | 1960 | 1970 | 1980 | 1990 | 2000 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Size | | | | | | | | | | | |
| Mean | 26,126 | 30,787 | 35,044 | 40,615 | 43,336 | 49,453 | 58,502 | 66,122 | 72,950 | 79,054 | 88,574 |
| Median | 16,930 | 17,975 | 18,462 | 18,570 | 19,285 | 19,269 | 19,236 | 19,454 | 22,651 | 23,376 | 25,936 |
| 10th Percentile | 4,104 | 5,530 | 5,877 | 6,514 | 6,417 | 6,151 | 5,786 | 5,574 | 6,066 | 5,827 | 6,001 |
| 90th Percentile | 44,308 | 49,178 | 56,920 | 65,812 | 72,544 | 83,852 | 99,382 | 115,342 | 136,608 | 148,605 | 167,474 |
| Growth rate (a) | | | | | | | | | | | |
| Mean | | 44.0 | 7.7 | 20.8 | 5.9 | 4.0 | 5.3 | 5.5 | 15.4 | 3.6 | 10.7 |
| Median | | 8.8 | 3.0 | 2.2 | 4.3 | 0.1 | 0.0 | 2.2 | 11.5 | 1.0 | 8.0 |
| 10th Percentile | | −8.1 | −10.8 | −11.6 | −8.8 | −16.3 | −16.4 | −12.6 | −4.0 | −11.4 | −4.0 |
| 90th Percentile | | 68.4 | 29.8 | 33.0 | 19.5 | 26.4 | 30.3 | 26.2 | 37.0 | 21.1 | 28.7 |
| % Negative | | 29.6 | 40.2 | 42.6 | 30.7 | 49.6 | 50.0 | 43.0 | 17.8 | 46.7 | 22.2 |

(a) Percentage change over previous 10 years
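
The footnote's growth-rate definition can be sketched in a few lines. The helper name `decade_growth_rate` and the two hypothetical counties are illustrative assumptions, not part of the source; only the two mean sizes (26,126 and 30,787) come from the table above. The sketch also shows why a mean of county growth rates need not match the growth rate of the mean size: every county counts equally in the former, so small, fast-growing counties pull it upward.

```python
def decade_growth_rate(previous: float, current: float) -> float:
    """Percentage change in population over the previous 10 years."""
    if previous <= 0:
        raise ValueError("previous population must be positive")
    return (current - previous) / previous * 100.0


# Growth rate implied by the mean county sizes reported for 1900 and 1910.
growth_of_mean = decade_growth_rate(26_126, 30_787)

# Two hypothetical counties (illustrative values only, not from the source).
small = decade_growth_rate(1_000, 3_000)        # small, fast-growing county
large = decade_growth_rate(100_000, 105_000)    # large, slow-growing county

# Unweighted mean of the two county rates versus the rate of their combined total.
mean_of_rates = (small + large) / 2
rate_of_total = decade_growth_rate(1_000 + 100_000, 3_000 + 105_000)

print(f"growth of mean size 1900-1910: {growth_of_mean:.1f}%")
print(f"mean of county rates: {mean_of_rates:.1f}%, rate of total: {rate_of_total:.1f}%")
```

The growth of the mean size works out to about 17.8%, well below the first entry in the Mean growth-rate row (44.0), which presumably covers the same decade; the two-county illustration shows how such a gap can arise.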
Error characteristics of counties, 10-year forecast horizons
| | MAPE | MALPE | Sample size |
|---|---|---|---|
| Size | |||
| <5,000 | 20.9 | −6.0 | 1,469 |
| 5,000–9,999 | 14.4 | −2.5 | 2,966 |
| 10,000–24,999 | 11.3 | −0.9 | 7,486 |
| 25,000–49,999 | 9.5 | −0.5 | 4,135 |
| 50,000–99,999 | 9.3 | 0.3 | 1,948 |
| 100,000+ | 8.7 | 0.8 | 1,852 |
| Growth rate (a) | | | |
| <−10% | 15.9 | −14.7 | 2,799 |
| −10.0 to 0% | 8.9 | −4.0 | 5,159 |
| 0.0–9.9% | 9.0 | 0.1 | 5,654 |
| 10.0–24.9% | 11.5 | 4.0 | 3,807 |
| 25.0–49.9% | 15.6 | 6.8 | 1,704 |
| 50.0+% | 27.4 | 15.4 | 733 |
| Prior-Abs. % error | |||
| <2.0 | 8.5 | −1.4 | 2,499 |
| 2.0–3.9 | 8.7 | −1.3 | 2,433 |
| 4.0–7.9 | 9.2 | −0.9 | 4,300 |
| 8.0–14.9 | 10.5 | −1.1 | 5,142 |
| 15.0–24.9 | 13.6 | −0.6 | 3,251 |
| 25.0+ | 23.2 | −2.1 | 2,231 |
| Prior-Alg. % error | |||
| <−15.0 | 19.4 | 9.1 | 2,646 |
| −15.0 to −8.0 | 11.4 | 2.9 | 2,643 |
| −7.9 to 0.0 | 9.0 | 0.0 | 4,746 |
| 0.0–7.9 | 8.7 | −2.4 | 4,486 |
| 8.0–14.9 | 9.6 | −5.2 | 2,499 |
| 15.0+ | 15.7 | −10.8 | 2,836 |
| Census division | |||
| New England | 6.6 | −0.8 | 456 |
| Mid Atlantic | 7.3 | −0.5 | 1,152 |
| East North Central | 8.6 | −1.1 | 3,432 |
| West North Central | 9.9 | −0.2 | 4,512 |
| South Atlantic | 10.8 | −1.9 | 3,000 |
| East South Central | 11.5 | −1.4 | 2,632 |
| West South Central | 17.3 | −1.0 | 2,872 |
| Mountain | 21.7 | −2.7 | 880 |
| Pacific | 15.6 | −2.9 | 920 |
| Launch year | |||
| 1920 | 13.2 | −0.6 | 2,482 |
| 1930 | 14.1 | 0.8 | 2,482 |
| 1940 | 13.2 | 3.0 | 2,482 |
| 1950 | 9.7 | −2.1 | 2,482 |
| 1960 | 10.6 | −2.4 | 2,482 |
| 1970 | 12.5 | −9.2 | 2,482 |
| 1980 | 10.9 | 8.7 | 2,482 |
| 1990 | 9.0 | −7.3 | 2,482 |
(a) Percentage change for 10 years prior to launch year
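
The two error measures in the table can be sketched as follows. This is a minimal illustration under stated assumptions: MAPE is read as the mean absolute percent error and MALPE as the mean algebraic percent error, with the percent error computed as (forecast − actual) / actual × 100, so that positive values indicate forecasts that were too high. That sign convention is an assumption, although it is consistent with the table, where rapidly growing counties show positive MALPE and declining counties show negative MALPE. The function names and sample values are hypothetical.

```python
from statistics import mean


def percent_error(forecast: float, actual: float) -> float:
    """Algebraic percent error; positive when the forecast is too high (assumed convention)."""
    return (forecast - actual) / actual * 100.0


def mape(forecasts, actuals) -> float:
    """Mean absolute percent error: average size of the errors, ignoring direction."""
    return mean(abs(percent_error(f, a)) for f, a in zip(forecasts, actuals))


def malpe(forecasts, actuals) -> float:
    """Mean algebraic percent error: average signed error, a measure of bias."""
    return mean(percent_error(f, a) for f, a in zip(forecasts, actuals))


# Hypothetical forecasts and census counts for three counties.
forecasts = [110.0, 90.0, 100.0]
actuals = [100.0, 100.0, 100.0]

print(f"MAPE = {mape(forecasts, actuals):.2f}, MALPE = {malpe(forecasts, actuals):.2f}")
```

In this toy case the errors of +10% and −10% cancel in MALPE but not in MAPE, which is why the table can report sizeable MAPE values alongside MALPE values close to zero.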
Simple single-variable regression models: unstandardized coefficients and adjusted R² values
| | 10-year horizon | 20-year horizon |
|---|---|---|
| Absolute % errors | | |
| Population size | | |
| Size | −0.003*** | −4.5E−06*** |
| Adjusted R² | 0.003 | 0.002 |
| Growth rate | | |
| GR-Abs | 0.023*** | 0.026*** |
| Adjusted R² | 0.031 | 0.017 |
| Algebraic % errors | | |
| Population size | | |
| Size | 2.9E−06*** | 7.8E−06*** |
| Adjusted R² | 0.001 | 0.003 |
| Growth rate | | |
| GR-Alg | 0.032*** | 0.043*** |
| Adjusted R² | 0.032 | 0.024 |
*** Significant at 0.001
Complex single-variable regression models: unstandardized coefficients and adjusted R² values
| | 10-year horizon | 20-year horizon |
|---|---|---|
| Absolute % errors | | |
| Population size | | |
| Ln Size | −20.150*** | −30.493*** |
| (Ln Size)² | 0.857*** | 1.290*** |
| Adjusted R² | 0.091 | 0.065 |
| Growth rate | | |
| Ln GR-Abs | −0.514*** | 0.673** |
| (Ln GR-Abs)² | 0.451*** | 1.434*** |
| (Ln GR-Abs)³ | 0.143*** | – |
| Adjusted R² | 0.153 | 0.126 |
| Algebraic % errors | | |
| Population size | | |
| Ln Size | 1.413*** | 2.774*** |
| Adjusted R² | 0.010 | 0.011 |
| Growth rate | | |
| GR-Alg | 0.249*** | 0.449*** |
| (GR-Alg)² | −8.3E−05*** | −1.5E−04*** |
| (GR-Alg)³ | 6.8E−09*** | 1.2E−08*** |
| Adjusted R² | 0.149 | 0.164 |
*** Significant at 0.001
** Significant at 0.01
Fig. 1 Prediction of absolute percent errors using population size (Based on the quadratic function of the natural log of population size. For ease of interpretation, population size is expressed as untransformed values.)
Fig. 2 Prediction of absolute percent errors using growth rate (Based on the cubic function of the natural log of the absolute value of the growth rate for 10-year horizons, and on the quadratic function of the natural log of the absolute value of the growth rate for 20-year horizons. For ease of interpretation, the horizontal axis is expressed as the untransformed absolute value of the growth rate.)
Fig. 3 Prediction of algebraic percent errors using population size (Based on the natural log of population size. For ease of interpretation, population size is expressed as untransformed values.)
Fig. 4 Prediction of algebraic percent errors using growth rate (Based on the cubic function of the growth rate.)
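
The quadratic in the natural log of population size that underlies Fig. 1 can be explored with a short sketch. No intercepts are reported in the tables, so absolute predictions cannot be reproduced here; what can be computed from the published coefficients is the population size at which the fitted curve bottoms out and the difference in fitted error between two sizes, since the intercept cancels in a difference. The function names are hypothetical, and the coefficients are the rounded Ln Size and (Ln Size)² values from the complex single-variable table for absolute % errors.

```python
import math


def turning_point_size(b_ln: float, b_ln_sq: float) -> float:
    """Size at which b_ln*ln(S) + b_ln_sq*ln(S)**2 is stationary (a minimum if b_ln_sq > 0)."""
    if b_ln_sq == 0:
        raise ValueError("the squared term must be non-zero")
    return math.exp(-b_ln / (2.0 * b_ln_sq))


def fitted_difference(size_a: float, size_b: float, b_ln: float, b_ln_sq: float) -> float:
    """Fitted error at size_a minus fitted error at size_b; the unknown intercept cancels."""
    def size_terms(size: float) -> float:
        x = math.log(size)
        return b_ln * x + b_ln_sq * x * x
    return size_terms(size_a) - size_terms(size_b)


# Rounded coefficients from the complex single-variable models (absolute % errors).
size_min_10 = turning_point_size(-20.150, 0.857)   # 10-year horizon
size_min_20 = turning_point_size(-30.493, 1.290)   # 20-year horizon
gap_10 = fitted_difference(5_000, 50_000, -20.150, 0.857)

print(f"10-year curve bottoms out near {size_min_10:,.0f} residents")
print(f"20-year curve bottoms out near {size_min_20:,.0f} residents")
print(f"fitted 10-year error, 5,000 vs 50,000 residents: {gap_10:+.1f} points")
```

With these rounded coefficients both curves reach their minimum at roughly 130,000 residents, and a county of 5,000 has a fitted 10-year absolute error about 8 percentage points above that of a county of 50,000. Because the published coefficients are rounded, these figures are approximate.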
Multivariate regression models: unstandardized coefficients and adjusted R² values, 10-year horizons (absolute % errors)
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|---|
| Ln Size | −14.778*** | −9.220*** | −8.764*** | −9.751*** | −9.366*** |
| (Ln Size)² | 0.582*** | 0.347*** | 0.329*** | 0.375*** | 0.362*** |
| Ln GR-Abs | −0.613*** | −0.400*** | −0.416*** | −0.305** | −0.322*** |
| (Ln GR-Abs)² | 0.487*** | 0.390*** | 0.357*** | 0.408*** | 0.375*** |
| (Ln GR-Abs)³ | 0.141*** | 0.101*** | 0.098*** | 0.097*** | 0.094*** |
| Prior-Abs | | 0.231*** | 0.209*** | 0.225*** | 0.203*** |
| Prior-Abs² | | −1.3E−04*** | −1.1E−04*** | −1.2E−04*** | −1.1E−04*** |
| New England | | | −1.073* | | −1.187* |
| Mid Atlantic | | | 0.111 | | −0.016 |
| East North Central | | | −0.170 | | −0.190 |
| West North Central | | | −1.026*** | | −0.993*** |
| East South Central | | | 1.300*** | | 1.333*** |
| West South Central | | | 2.811*** | | 2.840*** |
| Mountain | | | 4.301*** | | 4.309*** |
| Pacific | | | 1.385*** | | 1.282*** |
| 1920 | | | | 1.568*** | 1.669*** |
| 1930 | | | | 2.800*** | 2.866*** |
| 1940 | | | | 2.953*** | 2.964*** |
| 1950 | | | | −1.249** | −1.197*** |
| 1970 | | | | 2.386*** | 2.361*** |
| 1980 | | | | −0.343 | −0.273 |
| 1990 | | | | −0.362 | −0.429 |
| Adjusted R² | 0.242 | 0.299 | 0.313 | 0.315 | 0.329 |
*** Significant at 0.001
** Significant at 0.01
* Significant at 0.05
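
The census-division coefficients can be read as contrasts against an omitted reference category. The sketch below assumes that South Atlantic is that reference (it appears in the error-characteristics table but has no coefficient here) and that the second of the two coefficients listed for each division belongs to Model 5; both are inferences from the tables rather than statements in the source. The names `DIVISION_COEF` and `division_gap` are hypothetical.

```python
# Second coefficient listed for each census division in the 10-year
# absolute-error table above; assumed here to be the Model 5 estimate.
DIVISION_COEF = {
    "New England": -1.187,
    "Mid Atlantic": -0.016,
    "East North Central": -0.190,
    "West North Central": -0.993,
    "East South Central": 1.333,
    "West South Central": 2.840,
    "Mountain": 4.309,
    "Pacific": 1.282,
}

# Assumption: the division without a coefficient is the reference category.
REFERENCE_DIVISION = "South Atlantic"


def division_gap(division_a: str, division_b: str) -> float:
    """Expected difference in absolute % error between two divisions, other variables equal."""
    def coef(name: str) -> float:
        if name == REFERENCE_DIVISION:
            return 0.0  # the reference category contributes nothing by construction
        return DIVISION_COEF[name]
    return coef(division_a) - coef(division_b)


gap = division_gap("Mountain", "New England")
print(f"Mountain vs New England: {gap:+.3f} percentage points")
```

Under these assumptions, a Mountain county is expected to have a 10-year absolute percent error about 5.5 points higher than an otherwise identical New England county. The launch-year coefficients can be read the same way, with 1960 apparently serving as the omitted year.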
Multivariate regression models: unstandardized coefficients and adjusted R² values, 20-year horizons (absolute % errors)
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|---|
| Ln Size | −21.510*** | −16.589*** | −17.239*** | −18.206*** | −19.065*** |
| (Ln Size)² | 0.823*** | 0.619*** | 0.660*** | 0.705*** | 0.758*** |
| Ln GR-Abs | 0.531** | 0.414* | 0.325 | 0.652*** | 0.560** |
| (Ln GR-Abs)² | 1.512*** | 1.367*** | 1.284*** | 1.413*** | 1.333*** |
| Prior-Abs | | 0.127*** | 0.100*** | 0.122*** | 0.096*** |
| New England | | | −2.762* | | −2.942** |
| Mid Atlantic | | | −0.774 | | −1.116 |
| East North Central | | | −2.128*** | | −2.160*** |
| West North Central | | | −3.310*** | | −3.155*** |
| East South Central | | | 2.392*** | | 2.468*** |
| West South Central | | | 6.475*** | | 6.371*** |
| Mountain | | | 4.442*** | | 4.228*** |
| Pacific | | | −0.844 | | −1.417 |
| 1930 | | | | −0.504 | −0.502 |
| 1940 | | | | 5.070*** | 4.801*** |
| 1950 | | | | −4.562*** | −4.664*** |
| 1970 | | | | −4.274*** | −4.573*** |
| 1980 | | | | −9.252*** | −9.297*** |
| Adjusted R² | 0.194 | 0.214 | 0.236 | 0.257 | 0.278 |
*** Significant at 0.001
** Significant at 0.01
* Significant at 0.05
Multivariate regression models: unstandardized coefficients and adjusted R² values, 10-year horizons (algebraic % errors)
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|---|
| Ln Size | 0.157 | 0.395*** | 0.589*** | 0.546*** | 0.772*** |
| GR-Alg | 0.247*** | 0.153*** | 0.167*** | 0.138*** | 0.151*** |
| (GR-Alg)² | −8.3E−05*** | −5.2E−05*** | −5.7E−05*** | −4.7E−05*** | −5.1E−05*** |
| (GR-Alg)³ | 6.8E−09*** | 4.3E−09*** | 4.7E−09*** | 3.9E−09*** | 4.3E−09*** |
| Prior-Alg | | −0.265*** | −0.259*** | −0.259*** | −0.253*** |
| Prior-Alg² | | 1.3E−04*** | 1.3E−04*** | 1.3E−04*** | 1.3E−04*** |
| New England | | | 0.274 | | 0.093 |
| Mid Atlantic | | | 0.655 | | 0.398 |
| East North Central | | | 0.968 | | 0.853* |
| West North Central | | | 3.812*** | | 3.748*** |
| East South Central | | | 1.486*** | | 1.411*** |
| West South Central | | | 1.330*** | | 1.403*** |
| Mountain | | | −1.411* | | −1.161* |
| Pacific | | | −3.907*** | | −3.754*** |
| 1920 | | | | 4.190*** | 4.144*** |
| 1930 | | | | 2.913*** | 2.868*** |
| 1940 | | | | 6.045*** | 6.034*** |
| 1950 | | | | 1.820*** | 1.817*** |
| 1970 | | | | −6.918*** | −6.928*** |
| 1980 | | | | 7.809*** | 7.674*** |
| 1990 | | | | −1.993*** | −2.071*** |
| Adjusted R² | 0.149 | 0.221 | 0.232 | 0.289 | 0.299 |
*** Significant at 0.001
* Significant at 0.05
Multivariate regression models: unstandardized coefficients and adjusted R² values, 20-year horizons (algebraic % errors)
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|---|
| Ln Size | 0.367 | 0.365 | 0.728*** | 0.482* | 0.861*** |
| GR-Alg | 0.446*** | 0.428*** | 0.457*** | 0.406*** | 0.434*** |
| (GR-Alg)² | −1.5E−04*** | −1.4E−04*** | −1.6E−04*** | −1.4E−04*** | −1.5E−04*** |
| (GR-Alg)³ | 1.2E−08*** | 1.2E−08*** | 1.3E−08*** | 1.1E−08*** | 1.2E−08*** |
| Prior-Alg | | −0.043*** | −0.041*** | −0.040*** | −0.039*** |
| New England | | | −0.121 | | −0.225 |
| Mid Atlantic | | | 3.246** | | 3.098** |
| East North Central | | | 2.885*** | | 2.823*** |
| West North Central | | | 8.882*** | | 8.746*** |
| East South Central | | | 6.138*** | | 6.053*** |
| West South Central | | | 5.192*** | | 5.297*** |
| Mountain | | | −4.623*** | | −4.375*** |
| Pacific | | | −8.924*** | | −8.582*** |
| 1930 | | | | 14.179*** | 14.091*** |
| 1940 | | | | 18.382*** | 18.389*** |
| 1950 | | | | 7.132*** | 7.180*** |
| 1970 | | | | 2.376** | 2.366** |
| 1980 | | | | 16.551*** | 16.225*** |
| Adjusted R² | 0.164 | 0.166 | 0.187 | 0.220 | 0.240 |
*** Significant at 0.001
** Significant at 0.01
* Significant at 0.05
Adjusted R² and reduction in adjusted R² after removing explanatory variables from Model 5
| | Absolute percent error, 10-year | Absolute percent error, 20-year | Algebraic percent error, 10-year | Algebraic percent error, 20-year |
|---|---|---|---|---|
| Adjusted R² | 0.329 | 0.278 | 0.299 | 0.240 |
| Variable removed (a) | | | | |
| Population size | 0.031 | 0.033 | 0.002 | 0.001 |
| Growth rate | 0.065 | 0.089 | 0.038 | 0.122 |
| Prior error | 0.042 | 0.010 | 0.060 | 0.002 |
| Census division | 0.014 | 0.022 | 0.010 | 0.020 |
| Launch year | 0.016 | 0.042 | 0.068 | 0.053 |

(a) Includes all terms for each variable
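
The quantities in this table can be written out explicitly. The first expression is the usual textbook definition of adjusted R², given here as an assumption because the source does not state its formula; n is the number of observations and p the number of regressors excluding the intercept. The second expression is the reduction reported in each row, where k stands for one group of terms (for example, all growth-rate terms).

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1},
\qquad
\Delta \bar{R}^2_{k} = \bar{R}^2_{\text{Model 5}} - \bar{R}^2_{\text{Model 5 without } k}
```

For 10-year absolute percent errors, for example, removing the growth-rate terms lowers the adjusted R² by 0.065, which implies a value of about 0.329 − 0.065 = 0.264 for the model without them. The five reductions in that column sum to 0.168, well below 0.329, so part of the explanatory power is shared among the variables.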
Multivariate regression models: unstandardized coefficients and adjusted R² values, Model 5 (a: absolute % errors and b: algebraic % errors)
| | 10-year horizons, all | 10-year horizons, reduced | 20-year horizons, all | 20-year horizons, reduced |
|---|---|---|---|---|
| (a) Absolute % errors | | | | |
| Ln Size | −9.366*** | −8.543*** | −19.065*** | −19.876*** |
| (Ln Size)² | 0.362*** | 0.333*** | 0.758*** | 0.817*** |
| Ln GR-Abs | −0.322*** | −0.463*** | 0.560** | −0.218 |
| (Ln GR-Abs)² | 0.375*** | 0.317*** | 1.333*** | 1.296*** |
| (Ln GR-Abs)³ | 0.094*** | 0.104*** | – | – |
| Prior-Abs | 0.203*** | 0.176*** | 0.096*** | 0.114*** |
| Prior-Abs² | −1.1E−04*** | 4.8E−04*** | – | – |
| Adjusted R² | 0.329 | 0.491 | 0.278 | 0.470 |
| (b) Algebraic % errors | | | | |
| Ln Size | 0.772*** | 0.291*** | 0.861*** | −0.039 |
| GR-Alg | 0.151*** | 0.273*** | 0.434*** | 0.824*** |
| (GR-Alg)² | −5.1E−05*** | −0.003*** | −1.5E−04*** | −0.008*** |
| (GR-Alg)³ | 4.3E−09*** | 1.6E−05*** | 1.2E−08*** | 2.7E−05*** |
| Prior-Alg | −0.253*** | −0.195*** | −0.039*** | −0.02*** |
| Prior-Alg² | 1.3E−04*** | 4.7E−05 | – | – |
| Adjusted R² | 0.299 | 0.600 | 0.240 | 0.517 |
*** Significant at 0.001
** Significant at 0.01
Note: Reduced equations are those excluding influential observations
Predicted residual sums of squares (PRESS) and adjusted R² values
| Model | 10-year APE, PRESS | 10-year APE, Adj. R² | 10-year ALPE, PRESS | 10-year ALPE, Adj. R² |
|---|---|---|---|---|
| Pop. size simple | 2,874,957 | 0.003 | 5,557,136 | 0.001 |
| Growth rate simple | 2,815,647 | 0.031 | 5,441,129 | 0.032 |
| Pop. size complex | 2,623,033 | 0.091 | 5,509,648 | 0.010 |
| Growth rate complex | 2,446,680 | 0.153 | 5,874,646 | 0.149 |
| Model 1 | 2,192,807 | 0.242 | 5,886,284 | 0.149 |
| Model 2 | 2,051,180 | 0.299 | 4,727,909 | 0.221 |
| Model 3 | 2,002,676 | 0.313 | 4,806,092 | 0.232 |
| Model 4 | 2,004,838 | 0.315 | 4,233,964 | 0.289 |
| Model 5 | 1,956,223 | 0.329 | 4,286,873 | 0.299 |
| Model 5 excl. LnSize² | 1,966,905 | 0.324 | – | – |
| Model 5 excl. GR² + GR³ | – | – | 4,127,578 | 0.269 |
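
The predicted residual sum of squares (PRESS) in this table is, in its standard form, a leave-one-out statistic: each observation is predicted from a model fitted without it, and the squared prediction errors are summed. The sketch below illustrates that idea for a one-variable least-squares line using only the standard library. It is not the authors' code, the data are made up, and the function names are hypothetical. It computes PRESS by explicit refitting and by the equivalent leverage shortcut, in which each ordinary residual is divided by one minus its leverage.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def press_by_refitting(xs, ys):
    """PRESS by leaving each point out, refitting, and predicting the omitted point."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total


def press_by_leverage(xs, ys):
    """Same statistic from a single fit: residual_i / (1 - leverage_i), squared and summed."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - mean_x) ** 2 / sxx
        total += ((y - (a + b * x)) / (1.0 - leverage)) ** 2
    return total


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

press_refit = press_by_refitting(xs, ys)
press_lev = press_by_leverage(xs, ys)
print(f"PRESS (refitting) = {press_refit:.4f}, PRESS (leverage) = {press_lev:.4f}")
```

Lower PRESS indicates better out-of-sample prediction, which is how the table can be read: a model that raises adjusted R² without lowering PRESS is gaining fit more than predictive accuracy.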