Robert Stelter, Diego Alburez-Gutierrez.
Abstract
Crowdsourced online genealogies have an unprecedented potential to shed light on long-run population dynamics, if analyzed properly. We investigate whether the historical mortality dynamics of males in familinx, a popular genealogical dataset, are representative of the general population, or whether they are closer to those of an elite subpopulation in two territories. The first territory is the German Empire, with a low level of genealogical coverage relative to the total population size, while the second territory is The Netherlands, with a higher level of genealogical coverage relative to the population. We find that, for the period around the turn of the 20th century (for which benchmark national life tables are available), mortality is consistently lower and more homogeneous in familinx than in the general population. For that time period, the mortality levels in familinx resemble those of elites in the German Empire, while they are closer to those in national life tables in The Netherlands. For the period before the 19th century, the mortality levels in familinx mirror those of the elites in both territories. We identify the low coverage of the total population and the oversampling of elites in online genealogies as potential explanations for these findings. Emerging digital data may revolutionize our knowledge of historical demographic dynamics, but only if we understand their potential uses and limitations.
Keywords: bias; big data; lifespan dynamics
Year: 2022 PMID: 35238633 PMCID: PMC8915999 DOI: 10.1073/pnas.2120455119
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Male life expectancy (A and B) and lifespan inequality (C and D) conditional on survival to age 30 y in the German Empire (1500–1901) and in The Netherlands (1600–1900). The mean values and 95% CI, illustrated by ribbons, are derived from 1,000 Monte Carlo simulations. The familinx values come from ref. 2, and the scholars’ values come from ref. 9. For the scholars’ estimates, wLT is a more accurate measure, but woLT is more comparable to the familinx data. The national life tables come from the HMD and the Human Life Table Database.
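The quantities in the Fig. 1 caption can be illustrated with a minimal sketch. It is not the authors' procedure: the record does not describe their Monte Carlo design, so a simple resampling (bootstrap) of ages at death is swapped in here. It also assumes that life expectancy conditional on survival to age 30 is summarized as the mean age at death among those who reached 30. The function names and the sample ages are hypothetical.

```python
import random
import statistics


def conditional_life_expectancy(ages_at_death, min_age=30):
    """Mean age at death among individuals who survived to min_age.

    Assumption: this mean is used as the conditional life expectancy.
    """
    survivors = [a for a in ages_at_death if a >= min_age]
    if not survivors:
        raise ValueError("no individuals survived to min_age")
    return statistics.fmean(survivors)


def resampled_interval(ages_at_death, min_age=30, n_sims=1000, seed=0):
    """Mean and 95% percentile interval from n_sims bootstrap resamples.

    Assumption: a plain bootstrap stands in for the Monte Carlo
    simulations mentioned in the caption.
    """
    rng = random.Random(seed)
    survivors = [a for a in ages_at_death if a >= min_age]
    if not survivors:
        raise ValueError("no individuals survived to min_age")
    means = sorted(
        statistics.fmean(rng.choices(survivors, k=len(survivors)))
        for _ in range(n_sims)
    )
    lower = means[int(0.025 * n_sims)]        # 2.5th percentile
    upper = means[int(0.975 * n_sims) - 1]    # 97.5th percentile
    return statistics.fmean(means), lower, upper


# Invented ages at death, for illustration only.
sample_ages = [12, 35, 42, 58, 61, 67, 70, 73, 80]
point_estimate = conditional_life_expectancy(sample_ages)
mean_of_sims, ci_lower, ci_upper = resampled_interval(sample_ages)
```

The interval bounds correspond to the edges of a ribbon like those in the figure. A lifespan inequality measure could be resampled the same way, but the record does not say which measure the authors used.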
Fig. 2. Coverage: Yearly number of living individuals reported in familinx as a share of the total population size (A). Higher values indicate that the coverage in familinx is better. Oversampling of scholars: ratio of the share of scholars identified in familinx to the share of scholars in the total population (B; log scale). Values above one indicate a higher proportion of scholars than would be expected by chance. Points represent historical population size estimates taken from refs. 10 and 11, the Statistical Yearbooks of the German Empire, and the HMD (data available at ref. 15).
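The two ratios defined in the Fig. 2 caption can be written out as a short sketch. The function and parameter names are hypothetical, and the counts in the example are invented for illustration; they are not values from the paper.

```python
def coverage_share(living_in_genealogy, total_population):
    """Share of the total population that appears as living in the genealogy."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return living_in_genealogy / total_population


def oversampling_ratio(scholars_in_genealogy, living_in_genealogy,
                       scholars_in_population, total_population):
    """Share of scholars in the genealogy divided by their share in the population.

    Values above one mean scholars are overrepresented in the genealogy.
    """
    if living_in_genealogy <= 0 or total_population <= 0:
        raise ValueError("population counts must be positive")
    if scholars_in_population <= 0:
        raise ValueError("scholars_in_population must be positive")
    share_genealogy = scholars_in_genealogy / living_in_genealogy
    share_population = scholars_in_population / total_population
    return share_genealogy / share_population


# Invented counts for a single year, for illustration only.
coverage = coverage_share(10_000, 2_000_000)
ratio = oversampling_ratio(50, 10_000, 1_000, 2_000_000)
```

With these invented counts, the genealogy covers 0.5% of the population and scholars are roughly ten times as common in it as in the population at large. That pairing of low coverage with strong oversampling is the pattern the abstract names as a potential explanation for the lower mortality seen in familinx.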