Bruce Winney, Abdelhamid Boumertit, Tammy Day, Dan Davison, Chikodi Echeta, Irina Evseeva, Katarzyna Hutnik, Stephen Leslie, Kristin Nicodemus, Ellen C Royrvik, Susan Tonks, Xiaofeng Yang, James Cheshire, Paul Longley, Pablo Mateos, Alexandra Groom, Caroline Relton, D Tim Bishop, Kathryn Black, Emma Northwood, Louise Parkinson, Timothy M Frayling, Anna Steele, Julian R Sampson, Turi King, Ron Dixon, Derek Middleton, Barbara Jennings, Rory Bowden, Peter Donnelly, Walter Bodmer.
Abstract
There is a great deal of interest in fine-scale population structure in the UK, both as a signature of historical immigration events and because of the effect population structure may have on disease association studies. Although population structure appears to have a minor impact on the current generation of genome-wide association studies, it is likely to play a significant part in the next generation of studies designed to search for rare variants. A powerful way of detecting such structure is to control and document carefully the provenance of the samples involved. In this study, we describe the collection of a cohort of rural UK samples (The People of the British Isles), aimed at providing a well-characterised UK-control population that can be used as a resource by the research community, as well as providing fine-scale genetic information on the British population. So far, some 4000 samples have been collected, the majority of which fit the criteria of coming from a rural area and having all four grandparents from approximately the same area. Analysis of the first 3865 samples that have been geocoded indicates that 75% have a mean distance between grandparental places of birth of 37.3 km, and that about 70% of grandparental places of birth can be classed as rural. Preliminary genotyping of 1057 samples demonstrates the value of these samples for investigating fine-scale population structure within the UK, and shows how this can be enhanced by the use of surnames.
Year: 2011 PMID: 21829225 PMCID: PMC3260910 DOI: 10.1038/ejhg.2011.127
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Basic information on numbers, gender and the age distribution of the total sample and, separately, of the sample used for the pilot genotyping is given in the top part of the table.
| | Total sample (n) | Proportion | Pilot sample (n) | Proportion |
| Gender | | | | |
| M | 1824 | 0.472 | 506 | 0.479 |
| F | 1982 | 0.513 | 497 | 0.470 |
| Unknown | 59 | 0.015 | 54 | 0.051 |
| Total | 3865 | | 1057 | |
| Age (years) | | | | |
| <20 | 8 | 0.002 | 0 | 0.000 |
| 20–29 | 82 | 0.021 | 13 | 0.012 |
| 30–39 | 180 | 0.047 | 33 | 0.031 |
| 40–49 | 462 | 0.120 | 66 | 0.062 |
| 50–59 | 688 | 0.178 | 172 | 0.163 |
| 60–69 | 1161 | 0.300 | 295 | 0.279 |
| 70–79 | 915 | 0.237 | 246 | 0.233 |
| 80–89 | 291 | 0.075 | 96 | 0.091 |
| 90–99 | 21 | 0.005 | 12 | 0.011 |
| >100 | 10 | 0.003 | 2 | 0.002 |
| Unknown | 47 | 0.012 | 122 | 0.115 |
| Total | 3865 | | 1057 | |
| Mean distance (MD) between grandparental birthplaces | | | | |
| Median (km) | 16.05 | | 16.31 | |
| 25% quartile (km) | 2.96 | | 3.72 | |
| 75% quartile (km) | 44.85 | | 48.92 | |
| No. with information for all four grandparents | 3646 | | 893 | |
| No. missing | 219 | | 65 | |
| Orkney | 0 | | 99 | |
The lower part of the table gives the median and the 25% and 75% quartiles of the mean distance (MD) between grandparental birthplaces for volunteers who gave information for all four grandparents.
Ninety-nine of the samples of unknown age in the pilot data are previously collected Orkney samples.[19] These are not included in the overall geocoded data set.
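The MD summary in the lower part of the table can be illustrated with a short sketch. The record does not say how MD is computed, so this example assumes it is the mean of the six pairwise great-circle (haversine) distances between the four grandparental birthplaces. The function names, the Earth radius constant and the quartile method are illustrative choices, not taken from the paper.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt
from statistics import quantiles

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def mean_distance_km(birthplaces):
    """Assumed MD: mean of all pairwise distances between four birthplaces."""
    if len(birthplaces) != 4:
        raise ValueError("need all four grandparental birthplaces")
    pairs = list(combinations(birthplaces, 2))  # six pairs for four points
    return sum(haversine_km(p, q) for p, q in pairs) / len(pairs)


def summarise(md_values):
    """Median and quartiles of MD across volunteers (needs at least two values)."""
    q25, median, q75 = quantiles(md_values, n=4)
    return {"q25": q25, "median": median, "q75": q75}
```

Volunteers with fewer than four known birthplaces raise an error here, which mirrors the table's separate count of missing cases rather than silently averaging over incomplete data.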
Figure 1. Graph of the log (MLQ) of the RD with the highest LQ for each surname (y-axis) against log (surname population size) in the 1881 census (x-axis). There are a number of surnames (circled) with a higher MLQ than might be expected for the surname sample size (Jones, Davies, Evans, Thomas, Hughes, James and Phillips), which are established Welsh surnames. The surnames from Supplementary Figure 1 are also marked.
Allele and haplotype frequency data
| SW | 0.013 | 0.150 | 0.025 | 0.750 | 0.038 | 0.025 | 80 | |||||||||
| CN | 0.053 | 0.140 | 0.018 | 0.684 | 0.105 | 0.000 | 57 | |||||||||
| E | 0.035 | 0.161 | 0.023 | 0.598 | 0.138 | 0.046 | 87 | |||||||||
| N | 0.022 | 0.202 | 0.033 | 0.656 | 0.071 | 0.016 | 183 | |||||||||
| OR | 0.342 | 0.079 | 0.000 | 0.579 | 0.000 | 0.000 | 38 | |||||||||
| | ||||||||||||||||
| SW | 0.169 | 0.307 | 0.156 | 0.055 | 326 | 0.151 | 0.170 | 0.077 | 0.071 | 0.106 | 311 | 0.123 | 0.141 | 0.083 | 0.368 | 326 |
| CN | 0.191 | 0.270 | 0.157 | 0.071 | 267 | 0.139 | 0.147 | 0.053 | 0.090 | 0.109 | 266 | 0.167 | 0.104 | 0.089 | 0.333 | 270 |
| E | 0.172 | 0.304 | 0.175 | 0.052 | 326 | 0.134 | 0.130 | 0.103 | 0.103 | 0.090 | 322 | 0.175 | 0.089 | 0.092 | 0.316 | 326 |
| N | 0.177 | 0.271 | 0.151 | 0.076 | 661 | 0.186 | 0.105 | 0.060 | 0.081 | 0.124 | 651 | 0.131 | 0.122 | 0.087 | 0.366 | 666 |
| OR | 0.183 | 0.291 | 0.091 | 0.080 | 175 | 0.222 | 0.090 | 0.084 | 0.030 | 0.204 | 167 | 0.171 | 0.114 | 0.091 | 0.381 | 176 |
| | | | | |||||||||||||
| SW | 0.106 | 0.156 | 0.240 | 0.100 | 0.065 | 0.109 | 0.122 | 321 | 0.225 | 0.338 | 0.184 | 0.225 | 320 | |||
| CN | 0.105 | 0.165 | 0.173 | 0.102 | 0.094 | 0.109 | 0.132 | 266 | 0.229 | 0.342 | 0.150 | 0.259 | 266 | |||
| E | 0.079 | 0.142 | 0.195 | 0.145 | 0.085 | 0.101 | 0.123 | 318 | 0.280 | 0.341 | 0.137 | 0.213 | 314 | |||
| N | 0.116 | 0.144 | 0.177 | 0.147 | 0.055 | 0.090 | 0.164 | 654 | 0.262 | 0.308 | 0.150 | 0.253 | 652 | |||
| OR | 0.061 | 0.141 | 0.160 | 0.184 | 0.043 | 0.086 | 0.209 | 163 | 0.267 | 0.320 | 0.111 | 0.273 | 172 | |||
| | | | | | | | | | | |||||||
| SW | 0.076 | 0.023 | 0.030 | 0.020 | 0.030 | 0.003 | 304 | |||||||||
| CN | 0.088 | 0.023 | 0.012 | 0.012 | 0.008 | 0.008 | 260 | |||||||||
| E | 0.077 | 0.026 | 0.016 | 0.026 | 0.019 | 0.016 | 310 | |||||||||
| N | 0.062 | 0.019 | 0.021 | 0.015 | 0.011 | 0.011 | 623 | |||||||||
| OR | 0.051 | 0.013 | 0.044 | 0.051 | 0.013 | 0.044 | 158 | |||||||||
| | | | | | | | | | | | ||||||
| SW | 0.906 | 340 | 0.945 | 328 | 0.929 | 328 | ||||||||||
| CN | 0.924 | 264 | 0.909 | 264 | 0.893 | 270 | ||||||||||
| E | 0.908 | 326 | 0.932 | 310 | 0.920 | 326 | ||||||||||
| N | 0.910 | 652 | 0.934 | 664 | 0.914 | 660 | ||||||||||
| OR | 0.887 | 194 | 0.906 | 192 | 0.828 | 192 | ||||||||||
The NRY haplogroups are those that are the most common in Europe, while the HLA alleles (low, allele-group resolution) are those that have a frequency of >7.5% in at least one region. The estimated frequencies of the six most common HLA-A, -B and -DRB1 haplotypes are also shown. Only the major allele frequencies are presented for the MC1R and ABO SNPs. Populations are grouped into regions as defined in the main text. The regions are: SW (Cornwall, Devon and Pembrokeshire), CN (Oxfordshire and the Forest of Dean), E (Sussex, Kent, Norfolk and Lincolnshire), N (Cumbria, Yorkshire and the North East) and OR (Orkney).
Figure 2. Distribution of the mean grandparental place of birth (MGP) of the 3646 volunteers for whom there was information for all four grandparents. Dots mark the MGP for individual volunteers. The populations from which samples were taken for the genotyping are marked on the inset map.
Figure 3. Percentage of volunteers with all four grandparents classed as rural (y-axis) according to their distance (2, 5 or 10 km) from an urban area of a given population size (x-axis). Estimates are made for all the geocoded samples (all samples) and those genotyped (pilot samples).
Proportion of surnames classified as local depending on different exclusion criteria
| Cornwall | 0.550 | 0.583 | 0.767 | 0.467 | 0.483 | 0.533 | 0.417 | 0.433 | 0.467 | 0.267 | 0.217 |
| Cumbria | 0.345 | 0.397 | 0.552 | 0.293 | 0.293 | 0.328 | 0.190 | 0.190 | 0.190 | 0.086 | 0.034 |
| Devon | 0.316 | 0.354 | 0.684 | 0.316 | 0.342 | 0.456 | 0.253 | 0.266 | 0.316 | 0.152 | 0.076 |
| Forest of Dean | 0.164 | 0.299 | 0.478 | 0.149 | 0.209 | 0.239 | 0.090 | 0.134 | 0.149 | 0.119 | 0.045 |
| Kent/Sussex | 0.469 | 0.469 | 0.653 | 0.429 | 0.429 | 0.490 | 0.388 | 0.367 | 0.408 | 0.204 | 0.122 |
| Lincolnshire | 0.367 | 0.433 | 0.667 | 0.367 | 0.400 | 0.567 | 0.267 | 0.267 | 0.333 | 0.133 | 0.067 |
| North East | 0.324 | 0.382 | 0.588 | 0.309 | 0.338 | 0.485 | 0.096 | 0.103 | 0.154 | 0.088 | 0.044 |
| Norfolk | 0.430 | 0.440 | 0.700 | 0.400 | 0.410 | 0.520 | 0.230 | 0.240 | 0.270 | 0.150 | 0.120 |
| Pembrokeshire | 0.436 | 0.487 | 0.590 | 0.231 | 0.256 | 0.359 | 0.103 | 0.103 | 0.128 | 0.051 | 0.051 |
| Oxfordshire | 0.278 | 0.316 | 0.582 | 0.241 | 0.266 | 0.380 | 0.190 | 0.203 | 0.266 | 0.165 | 0.101 |
| Yorkshire | 0.372 | 0.414 | 0.621 | 0.248 | 0.269 | 0.379 | 0.090 | 0.103 | 0.138 | 0.083 | 0.034 |
| All populations | 0.363 | 0.411 | 0.625 | 0.309 | 0.333 | 0.431 | 0.186 | 0.200 | 0.236 | 0.131 | 0.077 |
The two main criteria were a minimum location quotient (LQ) of the district with the highest LQ (MLQ) and maximum distance of the mean grandparental place of birth (MGP) from that district for each sample. When no distance is given, the distance constraint was not used. A number of samples were further excluded because of observed multiple peaks or broad geographic surname distributions (see Supplementary Table 3). These exclusions are incorporated into the proportions here.
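The two criteria can be sketched in code. The record does not give the LQ formula, so this example assumes the standard location quotient: the surname's share of a district's population divided by its share of the total population. The function names, the dictionary inputs and the threshold values in the tests are hypothetical, and the further exclusions for multiple peaks or broad distributions are not modelled.

```python
def location_quotient(n_district, pop_district, n_total, pop_total):
    """Standard LQ (assumed): district share of a surname over its overall share."""
    return (n_district / pop_district) / (n_total / pop_total)


def max_location_quotient(surname_counts, district_pops):
    """Return (district, MLQ): the district with the highest LQ and that LQ.

    surname_counts maps district -> bearers of the surname in that district.
    district_pops maps district -> total population of that district.
    """
    n_total = sum(surname_counts.values())
    pop_total = sum(district_pops.values())
    if n_total == 0:
        raise ValueError("surname has no recorded bearers")
    lqs = {
        d: location_quotient(surname_counts.get(d, 0), pop, n_total, pop_total)
        for d, pop in district_pops.items()
    }
    best = max(lqs, key=lqs.get)
    return best, lqs[best]


def is_local(mlq, distance_km, min_mlq, max_distance_km=None):
    """Apply a minimum MLQ and, optionally, a maximum MGP-to-district distance."""
    if mlq < min_mlq:
        return False
    # With no distance given, only the MLQ criterion is applied.
    return max_distance_km is None or distance_km <= max_distance_km
```

Passing `max_distance_km=None` corresponds to the cases where no distance is given and only the MLQ threshold applies.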
Maximum likelihood admixture estimates for the most stringent and the least stringent criteria used to define local and non-local surnames
| Admixed population | Ancestral population | Surnames | Estimate | Lower limit | Upper limit |
| CN | West | L | 0.945 | 0.895 | 0.995 |
| CN | West | N | 0.630 | 0.591 | 0.669 |
| OR | West | L | 0.550 | 0.488 | 0.614 |
| OR | West | N | 0.695 | 0.630 | 0.760 |
| CN | West | L | 0.900 | 0.829 | 0.971 |
| CN | West | N | 0.525 | 0.482 | 0.568 |
| OR | West | L | 0.360 | 0.265 | 0.455 |
| OR | West | N | 0.815 | 0.761 | 0.869 |
| OR | Norse | L | 0.375 | 0.331 | 0.419 |
| OR | Norse | N | 0.405 | 0.357 | 0.453 |
| OR | Norse | L | 0.315 | 0.266 | 0.364 |
| OR | Norse | N | 0.420 | 0.375 | 0.465 |
The contributions of the putative ancestral populations (East, West and Norse) to the putative admixed population (Central (CN) or Orkney (OR)) were estimated for either the local surnames (L) alone or only the non-local (N) surnames. For the Orkney analysis, all Orcadian samples were compared with either local or non-local stratified PoBI samples.
Most stringent criteria
Least stringent criteria
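As a rough illustration of a maximum likelihood admixture estimate of this kind, the sketch below models the admixed population's allele frequencies as a mixture of two ancestral populations and picks the mixing proportion that maximises the multinomial likelihood of the observed allele counts. This is a generic textbook approach and not necessarily the method used in the paper. The grid search, the step size of 0.005 and all names are assumptions, and no interval estimate is computed.

```python
from math import log


def log_likelihood(m, loci):
    """Log-likelihood of admixture proportion m from ancestral population A.

    Each locus is (freqs_a, freqs_b, counts): allele frequencies in the two
    ancestral populations and observed allele counts in the admixed sample.
    """
    total = 0.0
    for freqs_a, freqs_b, counts in loci:
        for pa, pb, k in zip(freqs_a, freqs_b, counts):
            p = m * pa + (1 - m) * pb  # expected frequency under the mixture
            if k > 0:
                if p <= 0:
                    return float("-inf")  # observed allele impossible at this m
                total += k * log(p)
    return total


def ml_admixture(loci, step=0.005):
    """Grid-search estimate of m on [0, 1]; the step size is an assumption."""
    n_steps = round(1 / step)
    grid = [i / n_steps for i in range(n_steps + 1)]
    return max(grid, key=lambda m: log_likelihood(m, loci))
```

For example, with a single biallelic locus at frequencies (0.8, 0.2) in one ancestral population and (0.2, 0.8) in the other, an admixed sample with equal counts of both alleles gives an estimate of 0.5.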