Yali Xue1, Massimo Mezzavilla1,2, Marc Haber1, Shane McCarthy1, Yuan Chen1, Vagheesh Narasimhan1, Arthur Gilly1, Qasim Ayub1, Vincenza Colonna1,3, Lorraine Southam1,4, Christopher Finan1, Andrea Massaia1,5, Himanshu Chheda6, Priit Palta6,7, Graham Ritchie1,8,9, Jennifer Asimit1, George Dedoussis10, Paolo Gasparini11, Aarno Palotie1,6,12,13,14,15,16, Samuli Ripatti1,6,17, Nicole Soranzo1,18, Daniela Toniolo19, James F Wilson9,20, Richard Durbin1, Chris Tyler-Smith1, Eleftheria Zeggini1.
Abstract
The genetic features of isolated populations can boost power in complex-trait association studies, and an in-depth understanding of how demographic history has shaped their genetic variation can help leverage these advantageous characteristics. Here, we perform a comprehensive investigation using 3,059 newly generated low-depth whole-genome sequences from eight European isolates and two matched general populations, together with published data from the 1000 Genomes Project and UK10K. Sequencing data give deeper and richer insights into population demography and genetic characteristics than genotype-chip data, distinguishing related populations more effectively and allowing their functional variants to be studied more fully. Using two novel statistics, DVxy and SVxy, we demonstrate relaxation of purifying selection in the isolates, leading to enrichment of rare and low-frequency functional variants. We also develop an isolation index (Isx) that predicts the overall level of such key genetic characteristics and can thus help guide population choice in future complex-trait association studies.
Year: 2017 PMID: 28643794 PMCID: PMC5490002 DOI: 10.1038/ncomms15927
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Figure 1. General characteristics and demographic history of isolated and matched general populations.
(a) Geographical locations of samples. The base map was plotted in R using the mapdata package and circles were added using Photoshop. (b) PCA using common variants. (c) PCA using low-frequency variants. (d) Sharing of rare variants within and between populations. Upper-left triangle: f2 variants; lower-right triangle: f3–f10 variants. (e) Effective population size (Ne) inferred with IBDNe for UKO and UKG over the past nine thousand years (KY). (f) The lowest Ne inferred by IBDNe for each population over the past three KY, plotted against the time at which it occurred.
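Panel (d) summarizes how rare variants are shared within and between populations. The sketch below is a minimal illustration, assuming that an f_k variant is one whose alternative allele is observed exactly k times across the whole sample and that sharing is tallied over pairs of allele copies. The function name `rare_variant_sharing`, the genotype-matrix layout and the absence of any normalization are all assumptions for illustration, not the authors' pipeline.

```python
from itertools import combinations

import numpy as np


def rare_variant_sharing(genotypes, pops, k_min, k_max):
    """Hypothetical sketch: pairwise sharing of f_k variants between populations.

    genotypes: (n_variants, n_individuals) array of alternative-allele counts
        (0, 1 or 2) per diploid individual.
    pops: population label for each individual (one per column).
    Returns (labels, matrix): matrix[a, b] counts pairs of alternative-allele
    copies of f_k variants (k_min <= k <= k_max) carried by populations a and b.
    No normalization is applied (an assumption of this sketch).
    """
    labels = sorted(set(pops))
    index = {p: i for i, p in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for row in np.asarray(genotypes):
        k = int(row.sum())  # total alternative-allele count across the sample
        if not k_min <= k <= k_max:
            continue
        # One population index per allele copy; a homozygote contributes two copies.
        copies = [index[pops[j]] for j in np.nonzero(row)[0] for _ in range(int(row[j]))]
        for a, b in combinations(copies, 2):
            matrix[a, b] += 1
            if a != b:
                matrix[b, a] += 1  # keep the matrix symmetric
    return labels, matrix
```

Calling it with `k_min=k_max=2` gives an f2 matrix (each f2 variant contributes exactly one pair), while `k_min=3, k_max=10` pools the f3–f10 variants shown in the other triangle of the panel.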
Summary of variants discovered in this study.
| Population | Sample size | Depth | Variants | % novel | Variants | % novel | Variants | % novel | Common in isolate, not common in closest general population* | Common in isolate, not common in any general population† |
|---|---|---|---|---|---|---|---|---|---|---|
| FIK | 377 | 4x | 4,066,373 | 10.90 | 1,553,076 | 1.20 | 6,025,077 | 0.70 | 190,527 | 70,579 |
| FIG | 1,564 | 6x | 6,548,833 | 11.80 | 1,540,915 | 0.80 | 6,053,704 | 0.70 | na | na |
| GRM | 249 | 4x | 5,129,513 | 7.20 | 1,447,981 | 1.10 | 6,111,923 | 0.80 | 513,272 | 49,884 |
| GRG | 99 | 10–30x | 3,757,110 | na | 1,321,955 | na | 5,842,537 | na | na | na |
| IF1 | 60 | 4–10x | 1,456,881 | 1.30 | 1,420,929 | 1.30 | 5,890,714 | 0.80 | 320,191 | 119,157 |
| IF2 | 45 | 4–10x | 1,063,098 | 1.30 | 1,554,145 | 1.00 | 6,001,568 | 0.80 | 273,694 | 94,496 |
| IF3 | 47 | 4–10x | 961,059 | 1.30 | 1,455,284 | 1.10 | 6,068,304 | 0.80 | 299,603 | 107,281 |
| IF4 | 36 | 4–10x | 1,030,673 | 1.30 | 1,124,789 | 1.10 | 6,001,625 | 0.80 | 308,356 | 122,254 |
| IVB | 222 | 6x | 4,857,767 | 1.60 | 1,396,799 | 0.80 | 6,112,476 | 0.80 | 188,972 | 30,284 |
| UKO | 397 | 4x | 5,963,416 | 11.70 | 1,471,782 | 0.80 | 6,047,383 | 0.80 | 193,300 | 36,512 |
| Total | 3,096 | | 12,218,797 | 10.50 | 5,503,179 | 0.70 | 8,301,524 | 0.30 | | |
‘Novel’ variants are those not found in the 1000 Genomes Project Phase 3 or UK10K data.
*Variants that are common (minor allele frequency, MAF≥5.6%, alternative allele count ≥4) in an isolated population but not common (MAF<1.4%, alternative allele count ≤1) in its closest general population.
†Variants that are common (MAF≥5.6%, alternative allele count ≥4) in an isolated population but not common (MAF<1.4%, alternative allele count ≤1) in any of the general populations.
‡Different variant calling procedure in this population.
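The footnote definitions can be expressed as simple rules. The sketch below assumes the thresholds refer to the minimum-sample-size data set of 36 individuals (72 chromosomes) per population, so that 4/72 ≈ 5.6% and 1/72 ≈ 1.4%; this pairing, the variant key format and the helper names `is_novel` and `isolate_enriched` are assumptions made for illustration.

```python
# Hypothetical sketch of the table footnotes; not the authors' code.
N_CHROM = 72       # assumption: 36 diploid individuals per population
COMMON_MIN_AC = 4  # alternative-allele count paired with MAF>=5.6% in the footnotes
RARE_MAX_AC = 1    # alternative-allele count paired with MAF<1.4% in the footnotes


def is_novel(variant, known_1kg_phase3, known_uk10k):
    """'Novel' = absent from both reference call sets.

    variant is a hashable key such as (chrom, pos, ref, alt); the key
    format is an assumption for illustration.
    """
    return variant not in known_1kg_phase3 and variant not in known_uk10k


def isolate_enriched(isolate_ac, general_acs, closest):
    """Return the (*, †) flags for one variant.

    isolate_ac: alternative-allele count in the isolate.
    general_acs: dict mapping general-population label -> alternative-allele count.
    closest: label of the isolate's closest general population.
    """
    common_in_isolate = isolate_ac >= COMMON_MIN_AC
    star = common_in_isolate and general_acs[closest] <= RARE_MAX_AC
    dagger = common_in_isolate and all(ac <= RARE_MAX_AC for ac in general_acs.values())
    return star, dagger


# The allele-count cut-offs reproduce the quoted MAF thresholds.
print(round(100 * COMMON_MIN_AC / N_CHROM, 1), round(100 * RARE_MAX_AC / N_CHROM, 1))  # prints 5.6 1.4
```

For example, a variant carried 5 times in an isolate, absent from its closest general population but seen 3 times in another, would be flagged * but not †.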
Figure 2. Isolation index (Isx) and its correlation with other genetic measures.
(a) Information summarized in Isx. (b) Example of the correlation between Isx and another statistic, here DVxy-coding. (c) Summary of the correlations between Isx and other population-genetic statistics. All correlation coefficients are high and statistically significant.
Figure 3. Purifying selection in the isolates and general populations.
(a) Rxy-missense statistic in each isolate, showing no evidence for increased genetic load in the isolates. The mean and s.d. of each Rxy value over 100 bootstraps are shown. (b) DVxy-wg (DVxy-whole genome) statistic in isolates and general populations, stratified by CADD score, showing enrichment of highly functional low-frequency variants. (c) DVxy-coding statistic in isolates and general populations, showing enrichment of low-frequency missense variants in isolates. (d) SVxy-missense statistic in each isolate, showing relaxation of purifying selection among singletons in the isolates. Standard errors for both DVxy and SVxy were calculated by randomly sampling data from 20 chromosomes 100 times. All of these analyses are based on the minimum-sample-size data set (36 individuals from each population).
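Rxy is not defined in this record. The sketch below assumes the widely used formulation from Do et al. (2015, Nature Genetics): for a site category C, L_x/y(C) = Σ f_x(1 − f_y) over derived-allele frequencies, and R_x/y(C) = L_x/y(C) / L_y/x(C), normalized by the same ratio for a reference category. Which reference category this study used, and that the 100 bootstraps resample sites with replacement, are assumptions; the function names are hypothetical.

```python
import numpy as np


def l_stat(fx, fy):
    """Sum over sites of f_x * (1 - f_y): derived alleles seen in x but not y."""
    return float(np.sum(fx * (1.0 - fy)))


def rxy(fx, fy, ref_fx, ref_fy):
    """R_x/y for a functional category (e.g. missense), normalized by a reference category.

    All inputs are arrays of per-site derived-allele frequencies in
    populations x and y. Values > 1 mean relatively more derived alleles
    of this category in x than in y.
    """
    raw = l_stat(fx, fy) / l_stat(fy, fx)
    ref = l_stat(ref_fx, ref_fy) / l_stat(ref_fy, ref_fx)
    return raw / ref


def bootstrap_rxy(fx, fy, ref_fx, ref_fy, n_boot=100, seed=0):
    """Mean and s.d. of R_x/y over bootstrap resamples of sites (an assumed scheme)."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_boot):
        i = rng.integers(0, len(fx), len(fx))          # resample category sites
        j = rng.integers(0, len(ref_fx), len(ref_fx))  # resample reference sites
        values.append(rxy(fx[i], fy[i], ref_fx[j], ref_fy[j]))
    values = np.asarray(values)
    return values.mean(), values.std(ddof=1)
```

Swapping x and y inverts the statistic, so R_x/y × R_y/x = 1; a value near 1 is what "no evidence for increased genetic load" would look like under this formulation.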