Davide Risso, Luca Taglioli, Sergio De Iasio, Paola Gueresi, Guido Alfani, Sergio Nelli, Paolo Rossi, Giorgio Paoli, Sergio Tofanelli.
Abstract
This research is the first empirical attempt to calculate the various components of the hidden bias associated with the sampling strategies routinely used in human genetics, with special reference to surname-based strategies. We reconstructed surname distributions of 26 Italian communities with different demographic features across the last six centuries (years 1447-2001). The degree of overlap between "reference founding core" distributions and the distributions obtained from sampling the present-day communities by probabilistic and selective methods was quantified under different conditions and models. When taking into account only one individual per surname (low kinship model), the average discrepancy was 59.5%, with a peak of 84% by random sampling. When multiple individuals per surname were considered (high kinship model), the discrepancy decreased by 8-30% at the cost of a larger variance. Criteria aimed at maximizing locally spread patrilineages and long-term residency appeared to be affected by recent gene flows much more than expected. Selection of the more frequent family names following low kinship criteria proved to be a suitable approach only for historically stable communities. In all other cases, true random sampling, despite its high variance, did not return more biased estimates than other selective methods. Our results indicate that the sampling of individuals bearing historically documented surnames (founders' method) should be applied, especially when studying the male-specific genome, to prevent an over-stratification of ancient and recent genetic components that heavily biases inferences and statistics.
Year: 2015 PMID: 26452043 PMCID: PMC4599962 DOI: 10.1371/journal.pone.0140146
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Map of Italy showing the location of the 26 investigated communities.
This Fig is similar, although not identical, to the original image, and is therefore for illustrative purposes only.
Details of the studied communities/municipalities.
N0, number of individuals in the oldest list of surnames; Nt, number of individuals in the present-day list of surnames; altitude is given in meters above sea level; relative growth rate is calculated as (Nt - N0)/Nt.
| Community | Region (Province) | Oldest Historical Source | Recent Source | N0 | Nt | S0 | St | Altitude (m) | Growth rate |
|---|---|---|---|---|---|---|---|---|---|
| Commezzadura | Trentino-Alto Adige (TN) | Marriage acts 1700 | SEAT 1993 | 297 | 350 | 44 | 166 | 850 | 0.15 |
| Pellizzano | Trentino-Alto Adige (TN) | Marriage acts 1700 | SEAT 1993 | 548 | 324 | 85 | 134 | 925 | -0.69 |
| Rabbi | Trentino-Alto Adige (TN) | Marriage acts 1566 | SEAT 1993 | 292 | 482 | 88 | 103 | 1,095 | 0.39 |
| Vermiglio | Trentino-Alto Adige (TN) | Marriage acts 1714 | SEAT 1993 | 341 | 462 | 28 | 68 | 1,261 | 0.26 |
| Azeglio | Piedmont (TO) | Baptismal acts 1543 | SEAT 1993 | 895 | 423 | 100 | 217 | 260 | -1.12 |
| Ivrea | Piedmont (TO) | Census Paper 1613 | SEAT 1993 | 3,835 | 9,816 | 568 | 4,861 | 253 | 0.61 |
| Moncalieri | Piedmont (TO) | Census Paper 1613 | SEAT 1993 | 6,129 | 20,436 | 776 | 9,382 | 219 | 0.7 |
| Susa | Piedmont (TO) | Census Paper 1613 | SEAT 1993 | 4,447 | 2,283 | 341 | 1,276 | 503 | -0.95 |
| Levanto | Liguria (SP) | Census Paper 1662 | SEAT 1993 | 1,728 | 5,716 | 329 | 1,151 | 3 | 0.7 |
| Nonantola | Emilia-Romagna (MO) | Census Paper 1629 | SEAT 1993 | 3,451 | 3,407 | 181 | 1,092 | 24 | -0.01 |
| Careggine | Tuscany (LU) | Marriage acts 1566 | SEAT 1993 | 243 | 206 | 115 | 70 | 882 | -0.18 |
| Montecarlo | Tuscany (LU) | Baptismal acts 1527 | SEAT 1993 | 3,913 | 1,226 | 283 | 545 | 162 | -2.09 |
| Montefegatesi | Tuscany (LU) | Marriage acts 1600 | ISTAT 1991 | 398 | 270 | 108 | 55 | 842 | -0.47 |
| Pisa | Tuscany (LU) | Baptismal acts 1447 | SEAT 1993 | 17,504 | 35,921 | 1,830 | 10,913 | 4 | 0.51 |
| Roggio | Tuscany (LU) | Marriage acts 1775 | ISTAT 1991 | 115 | 175 | 30 | 41 | 858 | 0.34 |
| San Gimignano | Tuscany (LU) | Marriage acts 1700 | SEAT 1993 | 290 | 2,357 | 73 | 1,013 | 324 | 0.88 |
| Siena | Tuscany (LU) | Census Paper 1767 | SEAT 1993 | 2,941 | 56,956 | 1,373 | 5,626 | 322 | 0.95 |
| Vagli | Tuscany (LU) | Marriage acts 1700 | SEAT 1993 | 553 | 379 | 96 | 100 | 575 | -0.46 |
| Viareggio | Tuscany (LU) | Census Paper 1705 | SEAT 1993 | 290 | 57,514 | 86 | 7,263 | 2 | 0.99 |
| Pontremoli | Tuscany (MS) | Marriage acts 1559 | SEAT 1993 | 249 | 3,400 | 61 | 976 | 236 | 0.93 |
| Cerchio | Abruzzo (AQ) | Census Paper 1700 | SEAT 1993 | 932 | 1,735 | 144 | 146 | 834 | 0.46 |
| Bari | Puglia (BA) | Census Paper 1598 | SEAT 1993 | 8,872 | 111,221 | 1,065 | 12,993 | 5 | 0.92 |
| Bagaladi | Calabria (RC) | Baptismal acts 1657 | SEAT 1993 | 125 | 399 | 31 | 120 | 460 | 0.69 |
| Cannavò | Calabria (RC) | Baptismal acts 1601 | ISTAT 2001 | 994 | 3,935 | 274 | 577 | 147 | 0.75 |
| Cardeto | Calabria (RC) | Baptismal acts 1670 | SEAT 1993 | 924 | 695 | 129 | 129 | 700 | -0.33 |
| Trizzino | Calabria (RC) | Baptismal acts 1706 | ISTAT 2001 | 137 | 104 | 34 | 28 | 551 | -0.32 |
TN, Trento; TO, Turin; SP, La Spezia; MO, Modena; LU, Lucca; MS, Massa and Carrara; AQ, L’Aquila; BA, Bari; RC, Reggio Calabria.
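The growth-rate column can be checked directly from N0 and Nt. The following is a minimal sketch, assuming the parenthesized form (Nt - N0)/Nt, which is the reading that reproduces the tabulated values; the function name `relative_growth_rate` and the dictionary layout are illustrative choices, not part of the original study.

```python
def relative_growth_rate(n0: int, nt: int) -> float:
    """Relative growth rate (Nt - N0) / Nt between the oldest and present-day lists."""
    if nt == 0:
        raise ValueError("Nt must be non-zero")
    return (nt - n0) / nt


# (N0, Nt) pairs copied from three rows of the table above.
communities = {
    "Commezzadura": (297, 350),
    "Pellizzano": (548, 324),
    "Azeglio": (895, 423),
}

# Round to two decimals, matching the precision used in the table.
rates = {
    name: round(relative_growth_rate(n0, nt), 2)
    for name, (n0, nt) in communities.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate}")
```

Negative values flag communities whose present-day list is shorter than the oldest one; because the denominator is Nt, shrinkage can exceed -1 (as for Azeglio), while growth is bounded above by 1.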
Fig 2. Minimum, maximum and mean values of the sampling-dependent bias (SDB) calculated after sampling by random (R), locally spread (LS), first quartile (FQ) and grandparents (GP) strategies, under the low-kinship (A) and the high-kinship (B) models in the 26 investigated communities.
Fig 3. Average values and standard deviations of the S/N parameter (A) and isonymy (B) calculated with different sampling strategies in the present-day communities.
S/N, relative number of surnames; ISO, isonymy. Sampling strategies: R, random; FQ, first quartile; LS, locally spread; GP, grandparents; FS, founder surnames.
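As an illustration of the two quantities plotted in Fig 3, the sketch below computes S/N as the number of distinct surnames divided by the number of sampled individuals. For isonymy, this page does not state the formula used, so the code assumes one common definition: the sum of squared relative surname frequencies (the chance that two individuals drawn with replacement share a surname). The function name `surname_stats` and the toy sample are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def surname_stats(surnames: Iterable[str]) -> Tuple[float, float]:
    """Return (S/N, isonymy) for a list of sampled surnames.

    S/N is distinct surnames over individuals. Isonymy here is ASSUMED to be
    the sum of squared relative frequencies; the study may use another estimator.
    """
    counts = Counter(surnames)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("empty sample")
    s_over_n = len(counts) / n
    isonymy = sum((c / n) ** 2 for c in counts.values())
    return s_over_n, isonymy


# Toy sample (invented for illustration): 4 individuals, 3 distinct surnames.
sample = ["Rossi", "Rossi", "Bianchi", "Verdi"]
s_over_n, isonymy = surname_stats(sample)
print(s_over_n, isonymy)
```

Under this definition, a sample with one individual per surname gives S/N = 1 and the lowest possible isonymy (1/N), which is the direction one would expect when moving from a high-kinship to a low-kinship design.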
Detailed list of R² values and correlation signs (S) with nominal and Bonferroni-adjusted P-values for different sampling strategies and models.
Sampling strategies: R, random; FQ, first quartile; LS, locally spread; GP, grandparents; FS, founder surnames. Models: LK, low-kinship; HK, high-kinship.
| Sampling strategy | Model | Covariate | R² | S | P-value | Adjusted P-value |
|---|---|---|---|---|---|---|
| R | LK | Altitude | 0.42 | - | 0.00 | 0.008 |
| R | HK | Altitude | 0.58 | - | 0.01 | 0.036 |
| LS | LK | Altitude | 0.01 | - | 0.98 | 1.000 |
| LS | HK | Altitude | 0.12 | - | 0.74 | 1.000 |
| GP | LK | Altitude | 0.36 | - | 0.01 | 0.036 |
| GP | HK | Altitude | 0.41 | - | 0.00 | 0.012 |
| FQ | LK | Altitude | 0.42 | - | 0.00 | 0.012 |
| FQ | HK | Altitude | 0.43 | - | 0.00 | 0.008 |
| R | LK | Present-day N | 0.13 | + | 0.41 | 1.000 |
| R | HK | Present-day N | 0.16 | + | 0.28 | 1.000 |
| LS | LK | Present-day N | 0.01 | + | 0.96 | 1.000 |
| LS | HK | Present-day N | 0.01 | + | 0.96 | 1.000 |
| GP | LK | Present-day N | 0.21 | + | 0.16 | 0.640 |
| GP | HK | Present-day N | 0.23 | + | 0.11 | 0.440 |
| FQ | LK | Present-day N | 0.11 | + | 0.46 | 1.000 |
| FQ | HK | Present-day N | 0.12 | + | 0.42 | 1.000 |
| R | LK | Foundation year | 0.11 | - | 0.46 | 1.000 |
| R | HK | Foundation year | 0.12 | - | 0.42 | 1.000 |
| LS | LK | Foundation year | 0.01 | - | 0.99 | 1.000 |
| LS | HK | Foundation year | 0.01 | - | 0.99 | 1.000 |
| GP | LK | Foundation year | 0.11 | - | 0.46 | 1.000 |
| GP | HK | Foundation year | 0.11 | - | 0.46 | 1.000 |
| FQ | LK | Foundation year | 0.02 | - | 0.89 | 1.000 |
| FQ | HK | Foundation year | 0.02 | - | 0.89 | 1.000 |
| R | LK | Growth rate | 0.03 | + | 0.84 | 1.000 |
| R | HK | Growth rate | 0.11 | + | 0.46 | 1.000 |
| LS | LK | Growth rate | 0.01 | + | 0.99 | 1.000 |
| LS | HK | Growth rate | 0.01 | + | 0.99 | 1.000 |
| GP | LK | Growth rate | 0.01 | + | 0.99 | 1.000 |
| GP | HK | Growth rate | 0.12 | + | 0.42 | 1.000 |
| FQ | LK | Growth rate | 0.01 | + | 0.99 | 1.000 |
| FQ | HK | Growth rate | 0.09 | + | 0.54 | 1.000 |
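The Bonferroni adjustment behind the last column can be sketched as follows. This is a minimal illustration, assuming the standard correction (nominal P-value times the number of tests, capped at 1). The multiplier of 4 is inferred from the table (for example 0.16 becomes 0.640 and 0.11 becomes 0.440) and is not stated on this page; the function name `bonferroni` is illustrative.

```python
from typing import List, Optional, Sequence


def bonferroni(p_values: Sequence[float], n_tests: Optional[int] = None) -> List[float]:
    """Bonferroni-adjust P-values: multiply by the number of tests, cap at 1."""
    m = len(p_values) if n_tests is None else n_tests
    if m < 1:
        raise ValueError("n_tests must be at least 1")
    return [min(1.0, p * m) for p in p_values]


# Nominal P-values for the GP rows (LK, HK) and the R LK row against Present-day N.
nominal = [0.16, 0.11, 0.41]

# ASSUMPTION: 4 tests per family, inferred from the ratio of adjusted to nominal values.
adjusted = [round(p, 3) for p in bonferroni(nominal, n_tests=4)]
print(adjusted)
```

Rows whose nominal P-value is printed as 0.00 or 0.01 cannot be reproduced exactly this way, because the adjusted column was evidently computed from unrounded values.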