María de la Puente1, Jorge Ruiz-Ramírez1, Adrián Ambroa-Conde1, Catarina Xavier2, Jacobo Pardo-Seco3, Jose Álvarez-Dios4, Ana Freire-Aradas1, Ana Mosquera-Miguel1, Theresa E Gross5,6, Elaine Y Y Cheung5, Wojciech Branicki7, Michael Nothnagel8,9, Walther Parson2,10, Peter M Schneider5, Manfred Kayser11, Ángel Carracedo1,12, Maria Victoria Lareu1, Christopher Phillips1.
Abstract
We detail the development of the ancestry informative single nucleotide polymorphisms (SNPs) panel forming part of the VISAGE Basic Tool (BT), which combines 41 appearance predictive SNPs and 112 ancestry predictive SNPs (three SNPs shared between sets) in one massively parallel sequencing (MPS) multiplex, whereas blood-based age analysis using methylation markers is run in a parallel MPS analysis pipeline. The selection of SNPs for the BT ancestry panel focused on established forensic markers that already have a proven track record of good sequencing performance in MPS, and the overall SNP multiplex scale closely matched that of existing forensic MPS assays. SNPs were chosen to differentiate individuals from the five main continental population groups of Africa, Europe, East Asia, America, and Oceania, extended to include differentiation of individuals from South Asia. From analysis of 1000 Genomes and HGDP-CEPH samples from these six population groups, the BT ancestry panel was shown to have no classification error using the Bayes likelihood calculators of the Snipper online analysis portal. The differentiation power of the component ancestry SNPs of BT was balanced as far as possible to avoid bias in the estimation of co-ancestry proportions in individuals with admixed backgrounds. The balancing process led to very similar cumulative population-specific divergence values for Africa, Europe, America, and Oceania, with East Asia being slightly below average, and South Asia an outlier from the other groups. Comparisons were made of the African, European, and Native American estimated co-ancestry proportions in the six admixed 1000 Genomes populations, using the BT ancestry panel SNPs and 572,000 Affymetrix Human Origins array SNPs. Very similar co-ancestry proportions were observed down to a minimum value of 10%, below which, low-level co-ancestry was not always reliably detected by BT SNPs. 
The Snipper analysis portal provides a comprehensive population dataset for the BT ancestry panel SNPs, comprising a 520-sample standardised reference dataset; 3445 additional samples from 1000 Genomes, HGDP-CEPH, Simons Foundation and Estonian Biocentre genome diversity projects; and 167 samples of six populations from in-house genotyping of individuals from Middle Eastern, North African and East African regions, complementing the sampling regimes of the other diversity projects.
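The Bayes likelihood classification mentioned in the abstract can be illustrated with a minimal sketch. It is not Snipper's implementation: it assumes biallelic SNPs, Hardy-Weinberg genotype probabilities, independence between SNPs, and equal priors, and the SNP names and allele frequencies in `FREQS` are invented purely for illustration.

```python
import math

# Hypothetical reference-allele frequencies per population.
# Illustrative values only: not taken from the paper or from Snipper.
# Frequencies of exactly 0 or 1 are avoided so that log() is always defined.
FREQS = {
    "AFR": {"snp_a": 0.05, "snp_b": 0.90, "snp_c": 0.50},
    "EUR": {"snp_a": 0.95, "snp_b": 0.20, "snp_c": 0.40},
    "EAS": {"snp_a": 0.10, "snp_b": 0.15, "snp_c": 0.95},
}


def genotype_likelihood(p, n_ref):
    """Hardy-Weinberg probability of carrying n_ref (0, 1 or 2) reference alleles."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[n_ref]


def log_likelihoods(profile, freqs=FREQS):
    """Sum per-SNP log likelihoods for each population (SNPs treated as independent)."""
    return {
        pop: sum(math.log(genotype_likelihood(pf[snp], n)) for snp, n in profile.items())
        for pop, pf in freqs.items()
    }


def classify(profile, freqs=FREQS):
    """Return the most likely population and its likelihood ratio over the runner-up."""
    ll = log_likelihoods(profile, freqs)
    ranked = sorted(ll, key=ll.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, math.exp(ll[best] - ll[runner_up])


if __name__ == "__main__":
    # Profile given as the number of reference alleles carried at each SNP.
    profile = {"snp_a": 2, "snp_b": 0, "snp_c": 0}
    population, ratio = classify(profile)
    print(population, round(ratio))
```

Working in log space keeps the product of many small per-SNP probabilities numerically stable, which matters once a profile spans more than a hundred markers.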
Keywords: 1000 Genomes; Human Origins SNP array; SNPs; ancestry informative markers; bio-geographical ancestry; massively parallel sequencing
Year: 2021 PMID: 34440458 PMCID: PMC8391248 DOI: 10.3390/genes12081284
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Commonality between SNPs selected for the BT ancestry panel and established forensic ancestry panels: Kiddlab 56; Thermo Fisher Precision ID Ancestry Panel (PIAP); Euroforgen Global AIMs (gAIMs); the LACE panel; the NAME panel; Eurasiaplex; and the Shriver et al. US admixture mapping panel [24]. The two markers at the lower right are BT appearance SNPs previously selected as AIMs.
| No. | Pop. | SNP | Kiddlab/PIAP | gAIMs | LACE | Other | No. | Pop. | SNP | Kiddlab/PIAP | gAIMs | LACE | Other | No. | Pop. | SNP | Kiddlab/PIAP | gAIMs | LACE | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AFR | rs10497191 | Kiddlab | - | LACE | | 1 | AME | rs10012227 | - | gAIMs | - | | 1 | Eurasia | rs1495085 | - | - | - | NAME |
| 2 | AFR | rs1197062 | - | gAIMs | LACE | | 2 | AME | rs10483251 | - | gAIMs | LACE | | 2 | Eurasia | rs1757928 | - | - | - | NAME |
| 3 | AFR | rs1369290 | - | gAIMs | - | | 3 | AME | rs12130799 | PIAP | - | - | | 3 | Eurasia | rs2337024 | - | - | - | NAME |
| 4 | AFR | rs2789823 | - | gAIMs | - | | 4 | AME | rs12498138 | Kiddlab | gAIMs | - | | 4 | Eurasia | rs6989963 | - | - | - | NAME |
| 5 | AFR | rs2814778 | Kiddlab | gAIMs | - | | 5 | AME | rs12629908 | PIAP | - | - | | 5 | Eurasia | rs6990312 | Kiddlab | - | - | |
| 6 | AFR | rs310644 | Kiddlab | gAIMs | - | | 6 | AME | rs1452501 | - | gAIMs | LACE | | 6 | Eurasia | rs7148809 | - | - | - | NAME |
| 7 | AFR | rs4737753 | - | - | - | NAME | 7 | AME | rs1557553 | - | gAIMs | LACE | | 7 | Eurasia | rs12203115 | - | - | - | NAME |
| 1 | EUR | rs11778591 | - | gAIMs | LACE | | 8 | AME | rs17130385 | - | gAIMs | LACE | | 8 | Eurasia | rs2227203 | - | - | - | NAME |
| 2 | EUR | rs12142199 | - | gAIMs | - | | 9 | AME | rs17359176 | - | gAIMs | LACE | | 9 | Eurasia | rs39897 | - | - | - | Eurasiaplex |
| 3 | EUR | rs12913832 | Kiddlab | gAIMs | - | | 10 | AME | rs174570 | Kiddlab | gAIMs | LACE | | 10 | Eurasia | rs4308478 | - | - | - | NAME |
| 4 | EUR | rs1426654 | Kiddlab | gAIMs | - | | 11 | AME | rs2302013 | - | gAIMs | - | | 11 | Eurasia | rs7570971 | - | - | - | NAME |
| 5 | EUR | rs16891982 | Kiddlab | gAIMs | - | | 12 | AME | rs2471552 | - | gAIMs | - | | 12 | Eurasia | rs984038 | - | - | - | NAME |
| 6 | EUR | rs2715883 | - | gAIMs | - | | 13 | AME | rs3737576 | Kiddlab | - | - | | 1 | SAS | rs1040934 | - | - | - | Shriver |
| 7 | EUR | rs3759171 | - | gAIMs | LACE | | 14 | AME | rs4792928 | - | gAIMs | - | | 2 | SAS | rs1063677 | - | - | - | Shriver |
| 8 | EUR | rs705308 | PIAP | - | - | | 15 | AME | rs5757362 | - | - | LACE | | 3 | SAS | rs10764919 | - | - | LACE | |
| 9 | EUR | rs7084970 | - | gAIMs | - | | 16 | AME | rs8137373 | - | gAIMs | - | | 4 | SAS | rs10962599 | - | - | - | Eurasiaplex |
| 10 | EUR | rs7531501 | - | gAIMs | - | | 17 | AME | rs870347 | Kiddlab | - | - | | 5 | SAS | rs13267318 | - | - | - | Shriver |
| 11 | EUR | rs8072587 | - | gAIMs | - | | 1 | EAS | rs10079352 | - | gAIMs | LACE | | 6 | SAS | rs13280988 | - | - | LACE | |
| 12 | EUR | rs820371 | - | gAIMs | LACE | | 2 | EAS | rs1229984 | Kiddlab | gAIMs | - | | 7 | SAS | rs17625895 | - | - | - | Eurasiaplex |
| 13 | EUR | rs862500 | - | gAIMs | LACE | | 3 | EAS | rs12594144 | - | gAIMs | - | | 8 | SAS | rs1796048 | - | - | - | Shriver |
| 14 | EUR | rs917115 | Kiddlab | gAIMs | - | | 4 | EAS | rs1371048 | - | gAIMs | - | | 9 | SAS | rs1924381 | - | gAIMs | LACE | |
| 15 | EUR | rs9522149 | Kiddlab | gAIMs | - | | 5 | EAS | rs17822931 | - | gAIMs | - | | 10 | SAS | rs2026999 | - | - | - | Shriver |
| 1 | OCE | rs10149275 | - | gAIMs | - | | 6 | EAS | rs1834619 | Kiddlab | gAIMs | - | | 11 | SAS | rs2196051 | Kiddlab | - | - | Eurasiaplex |
| 2 | OCE | rs16830500 | - | gAIMs | - | | 7 | EAS | rs2180052 | - | gAIMs | - | | 12 | SAS | rs2238151 | Kiddlab | - | - | |
| 3 | OCE | rs2139931 | - | gAIMs | - | | 8 | EAS | rs3827760 | Kiddlab | gAIMs | - | | 13 | SAS | rs2269793 | PIAP | - | - | |
| 4 | OCE | rs2274636 | - | gAIMs | - | | 9 | EAS | rs434504 | - | gAIMs | - | | 14 | SAS | rs2472304 | - | - | - | Eurasiaplex |
| 5 | OCE | rs26951 | - | gAIMs | - | | 10 | EAS | rs459920 | Kiddlab | - | - | | 15 | SAS | rs2503770 | - | gAIMs | LACE | |
| 6 | OCE | rs3751050 | - | gAIMs | - | | 11 | EAS | rs4657449 | - | gAIMs | - | | 16 | SAS | rs26247 | - | - | - | Shriver |
| 7 | OCE | rs3804030 | - | gAIMs | - | | 12 | EAS | rs4781011 | PIAP | - | - | | 17 | SAS | rs3844336 | - | - | - | Shriver |
| 8 | OCE | rs4391951 | - | gAIMs | - | | 13 | EAS | rs4918664 | Kiddlab | gAIMs | - | | 18 | SAS | rs7080350 | - | - | LACE | |
| 9 | OCE | rs4959270 | - | x | - | Pacifiplex * | 14 | EAS | rs4935501 | - | gAIMs | - | | 19 | SAS | rs7568054 | - | - | LACE | |
| 10 | OCE | rs6054465 | - | gAIMs | - | | 15 | EAS | rs7226659 | Kiddlab | - | - | | 20 | SAS | rs756913 | - | - | - | Eurasiaplex |
| 11 | OCE | rs715605 | - | gAIMs | - | | 16 | EAS | rs8104441 | - | gAIMs | - | | | | | | | | |
| 12 | OCE | rs9908046 | - | gAIMs | - | | | | | | | | | MYH15 º | EAS | rs6437783 | - | gAIMs | - | - |
| 13 | OCE | rs9934011 | - | gAIMs | - | | | | | | | | | OCA2 º | EAS | rs1800414 | Kiddlab | - | - | - |
* A single SNP developed for the Pacifiplex panel was not incorporated into gAIMs. º Gene locations of the two BT appearance SNPs previously selected for other ancestry panels.
Figure 1. Accumulating population-specific divergence values for each of the main population groups and the six sets of SNPs informative for their differentiation. The cumulative values shown are those obtained from 88 binary SNPs (i.e., excluding 12 Eurasian-informative SNPs) and from the addition of the 15 tri-allelic SNPs in BT (VISAGE Basic Tool).
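As a quick arithmetic cross-check, the numbered rows of the SNP table can be tallied per population group. The counts below are read from the table's row numbering. Treating the 15 tri-allelic SNPs as additional to those rows, and leaving out the two appearance SNPs listed without a row number, are assumptions made only for this sketch.

```python
# Highest row number per population group in the SNP table.
ROWS_PER_GROUP = {
    "AFR": 7,
    "EUR": 15,
    "OCE": 13,
    "AME": 17,
    "EAS": 16,
    "SAS": 20,
    "Eurasia": 12,
}

# Assumption for this sketch: the tri-allelic SNPs are not among the table rows.
TRI_ALLELIC = 15

# Binary SNPs informative for one of the six main groups (Eurasia excluded).
binary_non_eurasian = sum(n for group, n in ROWS_PER_GROUP.items() if group != "Eurasia")

# All binary SNPs in the table, then the total with the tri-allelic SNPs added.
binary_total = sum(ROWS_PER_GROUP.values())
ancestry_total = binary_total + TRI_ALLELIC

print(binary_non_eurasian, binary_total, ancestry_total)
```

Under these assumptions the tally gives 88 binary SNPs outside the Eurasian set and 115 ancestry SNPs overall, the same figures quoted in the figure captions.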
Figure 2. The 38 BT ancestry SNPs with the highest numbers of discordant genotypes between 1000 Genomes low coverage vs. high coverage sequence data. An additional 29 had only one discordant genotype. The full set of genotype comparisons for all BT SNPs and all 1000 Genomes samples is given in File S3B.
Figure 3. Ancestry analysis with 115 BT ancestry SNPs of the six populations of the standardised reference dataset (AFR: brown; EUR: blue; SAS: yellow; EAS: pink; OCE: green; AMR: purple), with consistent colours across three statistical analyses of A, C and D. (A) STRUCTURE cluster membership proportions at K = 6, indicated to be the optimum K number of genetic clusters by: (B) the mean L(K) (log probability of data) and ΔK plots from STRUCTURE runs, following the analyses of Evanno et al. [31]. (C) Multi-dimensional scaling (MDS) analysis showing principal component (PC) 1 vs. PC2 coordinates, PC1 vs. PC3 and PC2 vs. PC3 two-dimensional plots. (D) Neighbour joining tree (NJT) analysis. (E) Summary classification success table of cross validation of the standardised reference set.
Figure 4. Pairwise comparison of individual co-ancestry proportions (cluster membership proportions) in 504 admixed samples from 1000 Genomes, analysed against the standardised reference set using the Human Origins array, comprising >572,000 SNPs (with ADMIXTURE), and the 115 VISAGE BT ancestry SNPs (with STRUCTURE). Co-ancestry proportions in four genetic clusters representing African, European, Native American and East Asian ancestries are given in File S4. The r² correlation plots for each admixed population group are shown below the corresponding cluster patterns.
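A comparison of this kind can be sketched as a squared Pearson correlation between the two sets of per-individual proportions. The exact procedure used for the published plots is not stated here, so the function below is only one plausible reading, and the proportions in the usage example are made-up values rather than data from File S4.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences of proportions."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical European co-ancestry proportions for five individuals,
    # estimated once from a dense array and once from a small SNP panel.
    array_estimates = [0.72, 0.15, 0.48, 0.90, 0.33]
    panel_estimates = [0.70, 0.19, 0.45, 0.93, 0.30]
    print(round(r_squared(array_estimates, panel_estimates), 3))
```

A high r² shows that the two estimates rank individuals consistently, but it does not by itself reveal a systematic offset, so low-level co-ancestry near the 10% mark is better inspected on the scatter plot itself.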