Steven Gazal, Mourad Sahbatou, Marie-Claude Babron, Emmanuelle Génin, Anne-Louise Leutenegger.
Abstract
The 1000 Genomes Project provides a unique source of whole genome sequencing data for studies of human population genetics and human diseases. The last release of this project includes more than 2,500 sequenced individuals from 26 populations. Although relationships among individuals have been investigated in some of the populations, inbreeding has never been studied. In this article, we estimated the genomic inbreeding coefficient of each individual and found an unexpectedly high level of inbreeding in 1000 Genomes data: nearly a quarter of the individuals were inbred, and around 4% of them had inbreeding coefficients similar to or greater than those expected for first-cousin offspring. Inbred individuals were found in each of the 26 populations, with some populations showing proportions of inbred individuals above 50%. We also detected 227 previously unreported pairs of close relatives (up to and including first cousins). Thus, we propose subsets of unrelated and outbred individuals for use by the scientific community. In addition, because admixed populations are present in the 1000 Genomes Project, we performed simulations to study the robustness of inbreeding coefficient estimates in the presence of admixture. We found that our multi-point approach (FSuite) was quite robust to admixture, unlike single-point methods (PLINK).
Year: 2015 PMID: 26625947 PMCID: PMC4667178 DOI: 10.1038/srep17453
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Table 1. Inbreeding detection in TGP populations.
| Population | N | Inbred individuals per inferred mating type (non-empty AV / 2 × 1C / 1C / 2C cells, left to right) | Total inbred (% of N) |
|---|---|---|---|
| African Caribbean in Barbados (ACB)* | 96 | 4 | 4 (4%) |
| African Ancestry in Southwest United States (ASW)* | 60 | 1 | 1 (2%) |
| Esan in Nigeria (ESN) | 99 | 27 | 27 (27%) |
| Gambian in Western Division, The Gambia (GWD) | 113 | 6, 22 | 28 (25%) |
| Luhya in Webuye, Kenya (LWK) | 99 | 9 | 9 (9%) |
| Mende in Sierra Leone (MSL) | 85 | 10 | 10 (12%) |
| Yoruba in Ibadan, Nigeria (YRI) | 108 | 11 | 11 (10%) |
| Utah residents with European ancestry (CEU) | 99 | 1 | 1 (1%) |
| Finnish in Finland (FIN) | 99 | 34 | 34 (34%) |
| British in England and Scotland (GBR) | 91 | 1, 15 | 16 (18%) |
| Iberian populations in Spain (IBS) | 107 | 27 | 27 (25%) |
| Toscani in Italy (TSI) | 107 | 10 | 10 (9%) |
| Chinese Dai in Xishuangbanna, China (CDX) | 93 | 1, 1, 34 | 36 (39%) |
| Han Chinese in Beijing, China (CHB) | 103 | 1, 3 | 4 (4%) |
| Southern Han Chinese, China (CHS) | 105 | 2 | 2 (2%) |
| Japanese in Tokyo, Japan (JPT) | 104 | 4 | 4 (4%) |
| Kinh in Ho Chi Minh City, Vietnam (KHV) | 99 | 8 | 8 (8%) |
| Bengali in Bangladesh (BEB) | 86 | 2, 17 | 19 (22%) |
| Gujarati Indian in Houston, Texas (GIH) | 103 | 1, 40 | 41 (40%) |
| Indian Telugu in the United Kingdom (ITU) | 100 | 4, 6, 34 | 44 (44%) |
| Punjabi in Lahore, Pakistan (PJL) | 96 | 9, 13, 33 | 55 (57%) |
| Sri Lankan Tamil in the United Kingdom (STU) | 102 | 10, 22, 30 | 62 (61%) |
| Colombian in Medellin, Colombia (CLM) | 94 | 8, 42 | 50 (53%) |
| Mexican Ancestry in Los Angeles, California (MXL) | 64 | 1, 2, 8 | 11 (17%) |
| Peruvian in Lima, Peru (PEL) | 81 | 1, 15 | 16 (20%) |
| Puerto Rican in Puerto Rico (PUR) | 104 | 1, 64 | 65 (63%) |
1 ASW, 2 ITU and 4 PEL of the 2,504 initial individuals have been removed due to Q-score ≤ 50. AV = avuncular offspring; 2 × 1C = double first-cousin offspring; 1C = first-cousin offspring; 2C = second-cousin offspring.
*These populations should be considered as Admixed African.
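For orientation, the mating-type abbreviations in the table footnote map onto expected inbreeding coefficients through standard pedigree path counting. The sketch below is illustrative only and is not taken from the article: the function name `expected_f` and the dictionary `MATING_TYPES` are hypothetical, and the common ancestors are assumed to be non-inbred.

```python
from fractions import Fraction


def expected_f(paths):
    """Expected inbreeding coefficient by pedigree path counting.

    `paths` holds one (n1, n2) pair per common ancestor of the two parents,
    where n1 and n2 are the numbers of generations separating that ancestor
    from each parent. Each ancestor contributes (1/2) ** (n1 + n2 + 1).
    Assumption: the common ancestors are not inbred themselves.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


# Hypothetical encoding of the mating types named in the table footnote.
MATING_TYPES = {
    "AV": [(1, 2)] * 2,    # uncle-niece or aunt-nephew: 2 shared ancestors
    "2x1C": [(2, 2)] * 4,  # double first cousins: 4 shared grandparents
    "1C": [(2, 2)] * 2,    # first cousins: 2 shared grandparents
    "2C": [(3, 3)] * 2,    # second cousins: 2 shared great-grandparents
}

EXPECTED_F = {name: expected_f(p) for name, p in MATING_TYPES.items()}

if __name__ == "__main__":
    for name, f in EXPECTED_F.items():
        print(f"{name}: f = {f} ({float(f):.4f})")
```

Under these assumptions, avuncular and double first-cousin offspring share the same expected coefficient, which is twice that of first-cousin offspring.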
Figure 1. Accuracy of inbreeding estimators in simulated admixed samples.
The differences between estimated and true f values (Δf) and the genomic proportions of European ancestry (ADM) were calculated on one random individual of each mating type (1C, 2C and OUT) from 100 sample replicates (100 individuals in total per mating type). Only FSuite estimates with Q > 50 were plotted, and negative single-point estimates (PLINK and REAP) were set to 0. Four sets of allele frequencies were used for FSuite and PLINK: European (CEU), African (YRI) and Asian (JPT/CHB) reference frequencies, and frequencies estimated on each sample (SAMPLE). REAP used individual allele frequencies. 1C = first-cousin offspring; 2C = second-cousin offspring; OUT = outbred individual.
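To make the quantities in this caption concrete, the following is a minimal sketch of a generic single-point (method-of-moments) estimate of f and of Δf with negative estimates set to 0. It is an assumption-laden illustration, not the implementation used by PLINK, REAP or FSuite: the function names `single_point_f` and `delta_f` are hypothetical, markers are treated as independent, and genotypes are coded as counts of one allele.

```python
def single_point_f(genotypes, ref_freqs):
    """Method-of-moments estimate f = (O - E) / (L - E).

    genotypes: per-marker allele counts (0, 1 or 2) for one individual.
    ref_freqs: per-marker frequency p of the counted allele in the chosen
               reference set (for example a reference population or the
               sample itself).
    O is the observed number of homozygous markers, E the number expected
    under Hardy-Weinberg proportions, and L the number of markers.
    """
    if len(genotypes) != len(ref_freqs):
        raise ValueError("genotypes and ref_freqs must have the same length")
    n_markers = len(genotypes)
    observed_hom = sum(1 for g in genotypes if g != 1)
    expected_hom = sum(1.0 - 2.0 * p * (1.0 - p) for p in ref_freqs)
    if n_markers == expected_hom:
        # Happens when every marker is monomorphic in the reference set.
        raise ValueError("no polymorphic markers in ref_freqs")
    return (observed_hom - expected_hom) / (n_markers - expected_hom)


def delta_f(estimate, true_f):
    """Difference between the estimate (negatives set to 0) and the true f."""
    return max(estimate, 0.0) - true_f


if __name__ == "__main__":
    # Toy data: four markers, counted-allele frequency 0.5 at each.
    freqs = [0.5, 0.5, 0.5, 0.5]
    print(single_point_f([0, 2, 1, 1], freqs))
    print(delta_f(single_point_f([1, 1, 1, 1], freqs), 0.0))
```

Because the expected homozygosity depends entirely on `ref_freqs`, an estimator of this kind is sensitive to the choice of allele-frequency set, which is the comparison the figure makes across CEU, YRI, JPT/CHB and SAMPLE frequencies.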
Figure 2. Inbreeding estimation and detection in TGP populations.
Each point represents the f estimate for one individual. Large points represent individuals inferred to be offspring of first cousins (1C) or closer relatives; medium open points, individuals inferred to be offspring of second cousins (2C); and small points, individuals inferred to be outbred. Individuals were ordered within each population according to their f values. See Table 1 for the description of the different populations.