Jacobo Pardo-Seco, Federico Martinón-Torres, Antonio Salas.
Abstract
BACKGROUND: There is a growing interest among geneticists in developing panels of Ancestry Informative Markers (AIMs) aimed at measuring the biogeographical ancestry of individual genomes. The efficiency of these panels is commonly tested empirically by contrasting self-reported ancestry with the ancestry estimated from these panels.
Year: 2014 PMID: 24981136 PMCID: PMC4101176 DOI: 10.1186/1471-2164-15-543
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Corresponding ancestry estimates in three continental HapMap groups, CEU (Europe), CHB (East Asia), and YRI (Africa) using different SNP sets
| SNPs | Training set populations | CEU % | CEU SD | CEU 95% CI | CEU Range | CHB % | CHB SD | CHB 95% CI | CHB Range | YRI % | YRI SD | YRI 95% CI | YRI Range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1,440,616 | AFR/ASI/EUR | 100 | 0.1 | 100-100 | 100-100 | 100 | 0 | 100-100 | 100-100 | 100 | 0 | 100-100 | 99.4-100 |
| 10,000 | AFR/ASI/EUR | 99.7 | 0.7 | 99.9-100 | 98.5-100 | 100 | 0.2 | 99.9-100 | 98.6-100 | 99.9 | 0.3 | 99.5-99.9 | 96.9-100 |
| 1,000 | AFR/ASI/EUR | 97.0 | 3.5 | 98.1-99.4 | 91.0-100 | 98.8 | 2.2 | 97.7-98.9 | 91.7-100 | 98.3 | 2.2 | 96.1-98.0 | 87.0-100 |
| 500 | AFR/ASI/EUR | 94.8 | 1.2 | 96.5-98.5 | 86.1-100 | 97.5 | 3.6 | 95.8-97.8 | 86.8-100 | 96.8 | 3.6 | 93.3-96.2 | 80.4-100 |
| 446 | AFR/AME/EUR | 92.7 | 3.1 | 91.9-93.6 | 86.6-100 | 96.0 | 3.0 | 95.1-96.8 | 89.5-100 | 99.1 | 1.2 | 98.8-99.4 | 95.8-100 |
| 360 (310)² | CEU/CHB + JPT/YRI | 87.8 | 4.7 | 86.5-89.1 | 73.9-98.1 | 97.0 | 2.5 | 96.3-97.7 | 90.1-100 | 98.7 | 1.7 | 98.2-99.2 | 92.9-100 |
| 176³ (162)² | AFR/AME/EUR/ASI | 87.0 | 7.2 | 85.0-89.0 | 70.2-100 | 93.9 | 5.1 | 92.4-95.3 | 77.6-100 | 96.2 | 3.7 | 95.2-97.2 | 86.6-100 |
| 128 | AFR/ASI/EUR/ASI/SAS/AME/MEX/PRI | 87.4 | 6.5 | 85.6-89.2 | 73.4-100 | 90.7 | 5.4 | 89.2-92.2 | 77.9-100 | 97.9 | 2.8 | 97.1-98.7 | 90.6-100 |
| 93 | OCE/ASI/AFR/SAM/EUR | 87.5 | 7.4 | 85.4-89.5 | 70.2-100 | 89.0 | 6.1 | 87.3-90.6 | 77.0-100 | 97.7 | 3.2 | 96.8-98.6 | 88.1-100 |
| 34 (27)² | AFR/EUR/ASI | 93.8 | 8.2 | 91.5-96.1 | 67.5-100 | 92.7 | 8.3 | 90.4-95.0 | 73.6-100 | 90.1 | 9.0 | 87.6-92.5 | 65.0-100 |
| 24 (23)² | SAM/EUR/AFR | 90.9 | 9.7 | 88.2-93.5 | 58.4-100 | 89.5 | 12.7 | 86.0-93.0 | 37.5-100 | 92.7 | 8.1 | 90.5-95.0 | 57.7-100 |
| 10 | AFR/EUR/ASI/AME | 83.6 | 14.5 | 79.6-87.7 | 48.5-100 | 85.9 | 19.0 | 80.7-91.2 | 30.4-100 | 81.0 | 16.5 | 76.4-85.5 | 33.9-100 |
The CEU columns show the percentage (%) of European ancestry in CEU, the CHB columns show the percentage of Asian ancestry in CHB, and the YRI columns show the percentage of African ancestry in YRI. For each population group the table also shows the standard deviation (SD), 95% confidence interval (95% CI), and range (minimum-maximum values). Genome ancestry refers to the ancestry measured using the full set of SNPs in HapMap. Training set populations refer to the population groups used to design the AIM panels. AFR: Africa, ASI: Asia, SAS: South Asia, EUR: Europe, AME: America, MEX: Mexico, SAM: South America, OCE: Oceania, PRI: Puerto Rico.
¹ Averaged values over all the re-samples.
² Numbers of SNPs in round brackets are those contained in the HapMap database.
³ Halder et al. [37] refer to 170 AIMs; however, their supplementary data file refers to 176 AIMs.
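The summary columns of the table above (%, SD, 95% CI, range) can be reproduced from a list of per-individual or per-re-sample ancestry estimates. The following is a minimal sketch: the function name `summarize` and the input values are hypothetical, and the normal-approximation confidence interval of the mean is an assumption, since the source does not state how its intervals were computed.

```python
import math
import statistics


def summarize(estimates):
    """Summarize a list of ancestry estimates given in percent."""
    n = len(estimates)
    mean = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)  # sample SD (n - 1 denominator)
    # Assumption: normal-approximation 95% CI of the mean;
    # the source does not say which CI method was used.
    half_width = 1.96 * sd / math.sqrt(n)
    return {
        "mean": mean,
        "sd": sd,
        "ci95": (mean - half_width, mean + half_width),
        "range": (min(estimates), max(estimates)),
    }


# Hypothetical estimates for illustration, not values from the study.
example = [98.0, 99.0, 100.0, 97.0, 96.0]
summary = summarize(example)
print(summary)
```

The returned dictionary mirrors one population block of a table row: mean percentage, SD, confidence interval, and minimum-maximum range.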
Figure 1. PCA plots of YRI, CHB and CEU based on 500 re-samples of rSNPs from HapMap, drawing at random 10,000 (1A), 1,000 (1B) and 500 (1C) SNPs. Only one of the re-samples is highlighted in color; the results for the remaining 499 re-samples are shown in grey to illustrate the variability in ancestry estimates associated with the random subsets of SNPs. The box plots (right panels) indicate the European, Asian and African ancestries in CEU, CHB, and YRI, respectively, as obtained in the different re-samples. A statistical summary of these box plots is given in Table 1.
Figure 2. Bar plots of ancestry memberships inferred for YRI, CHB, and CEU, using sets of 1,000 and 500 rSNPs (one sample each, taken at random from HapMap) and the different AIM panels.
Figure 3. PCA plots obtained for YRI, CHB, and CEU using 1,000 and 500 rSNPs (one sample each, taken at random from HapMap) as well as the different AIM panels. The inferences carried out on AA-genomes are shown in grey; note that the variation (size of the grey point cloud) increases as fewer rSNPs or AIMs are considered.
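The re-sampling scheme described in the Figure 1 caption (repeated random subsets of 10,000, 1,000 and 500 SNPs) can be illustrated with a toy sketch. It does not perform PCA or ancestry estimation: `snp_scores` is a synthetic per-SNP statistic, and `resample_means` is a hypothetical helper. The sketch only shows how variability across re-samples grows as the subset size shrinks.

```python
import random
import statistics


def resample_means(values, k, n_resamples=500, seed=0):
    """Mean of a per-SNP statistic over repeated random subsets of k SNPs."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(values, k))  # subset drawn without replacement
        for _ in range(n_resamples)
    ]


# Synthetic per-SNP scores standing in for real genotype data (hypothetical).
data_rng = random.Random(42)
snp_scores = [data_rng.random() for _ in range(20_000)]

# Spread (SD) of the subset means for each subset size.
# 200 re-samples are used here only to keep the toy example fast.
spread = {
    k: statistics.stdev(resample_means(snp_scores, k, n_resamples=200))
    for k in (10_000, 1_000, 500)
}
print(spread)
```

Under these assumptions the spread for 500 SNPs is larger than for 10,000 SNPs, which is the qualitative pattern the grey point clouds in the figures are meant to convey.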
LSBL values for the AIMs considered in the different SNP panels and when considering HapMap populations
| Panel ID | Accumulated LSBL (AFR) | Accumulated LSBL (ASI) | Accumulated LSBL (EUR) | Average LSBL (AFR) | Average LSBL (ASI) | Average LSBL (EUR) |
|---|---|---|---|---|---|---|
| HapMap | 143264.06 | 91184.56 | 50587.85 | 0.037 | 0.024 | 0.013 |
| GAL | 71.08 | 23.35 | 46.67 | 0.159 | 0.052 | 0.105 |
| ILU | 37.11 | 29.82 | 17.28 | 0.120 | 0.096 | 0.056 |
| HAL | 13.21 | 13.90 | 5.66 | 0.084 | 0.089 | 0.036 |
| KOS | 14.18 | 7.41 | 9.08 | 0.111 | 0.058 | 0.071 |
| NAS | 12.87 | 6.32 | 6.58 | 0.138 | 0.068 | 0.071 |
| PHI | 3.29 | 3.08 | 3.66 | 0.122 | 0.077 | 0.136 |
| COR | 4.96 | 2.41 | 1.74 | 0.216 | 0.105 | 0.076 |
| LAO | 1.31 | 1.30 | 0.56 | 0.131 | 0.130 | 0.056 |
| Present study (595 SNPs) | 98.38 | 98.40 | 98.35 | 0.165 | 0.165 | 0.165 |
The term “average LSBL” refers to the LSBL accumulated and standardized by the number of AIMs in each panel. LSBL in HapMap was calculated using the HapMap database that considers all African, European and Asian populations together.
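As an illustration of the note above, the sketch below computes locus-specific branch lengths (LSBL) and the per-panel average. It assumes the usual three-population definition, in which each branch length is derived from the three pairwise genetic distances (for example FST) at a locus; the source does not spell out the formula. The function names and the per-locus distances are hypothetical; only the "Present study" figures (98.38 accumulated over 595 SNPs) come from the table.

```python
def lsbl(d_ab, d_ac, d_bc):
    """Branch lengths for populations A, B and C from pairwise distances at one locus."""
    return (
        (d_ab + d_ac - d_bc) / 2,  # branch leading to A
        (d_ab + d_bc - d_ac) / 2,  # branch leading to B
        (d_ac + d_bc - d_ab) / 2,  # branch leading to C
    )


def average_lsbl(accumulated, n_aims):
    """Accumulated LSBL standardized by the number of AIMs in the panel."""
    return accumulated / n_aims


# Hypothetical pairwise distances at one locus (A-B, A-C, B-C).
branch_a, branch_b, branch_c = lsbl(0.30, 0.25, 0.15)

# Table row "Present study (595 SNPs)", AFR column.
present_study_afr = average_lsbl(98.38, 595)
print(round(present_study_afr, 3))
```

Dividing the accumulated value by the panel size is what makes panels of very different sizes comparable in the "Average LSBL" columns.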
Figure 4. Effect of sample size on the inference of ancestry using different AIM panels. The horizontal bar indicates the genome ancestry as estimated using all the HapMap individuals for each population (CEU, CHB, YRI); it therefore marks the value to which all the ancestry estimates from AIM panels should converge. As the number of individuals increases, the estimates of ancestry using the different panels approach the genome ancestry. Color codes are as follows: red, African ancestry; green, Asian ancestry; blue, European ancestry.
Figure 5. Estimation of ancestry on AA-genomes using two panels of 1,000 and 500 rSNPs from HapMap and the AIM panels. The horizontal bar represents the genomic ancestry of AA-genomes, which are assumed to have equal ancestry membership in Africa, East Asia, and Europe (~33% each). Color codes for ancestries are as indicated in the legend of Figure 4.
Figure 6. Error of the different AIM panels in the estimation of genome ancestry for CEU, CHB, and YRI, measured as the standard deviation with respect to genome ancestry (inferred using the whole HapMap SNP database). Solid circles and lines indicate errors on non-admixed genomes, while triangles and dashed lines indicate errors on admixed genomes (AA-genomes).
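One plausible reading of the error measure in Figure 6 is a standard deviation computed around the genome ancestry rather than around the panel's own mean, that is, a root-mean-square deviation from a fixed reference. The sketch below illustrates that reading only; the function name and the input values are hypothetical, and the source does not give the exact formula.

```python
import math


def deviation_from_genome(panel_estimates, genome_ancestry):
    """Root-mean-square deviation of panel estimates around the genome ancestry."""
    squared = [(x - genome_ancestry) ** 2 for x in panel_estimates]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical panel estimates (%) against a genome ancestry of 100%.
error = deviation_from_genome([96.0, 98.0, 100.0], 100.0)
print(error)
```

Unlike an ordinary SD, this quantity also penalizes systematic bias: a panel whose estimates are tightly clustered but far from the genome ancestry still receives a large error.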