
Evaluating self-declared ancestry of U.S. Americans with autosomal, Y-chromosomal and mitochondrial DNA.

Oscar Lao, Peter M Vallone, Michael D Coble, Toni M Diegoli, Mannis van Oven, Kristiaan J van der Gaag, Jeroen Pijpe, Peter de Knijff, Manfred Kayser.

Abstract

The current U.S. population represents an amalgam of individuals originating mainly from four continental regions (Africa, Europe, Asia and America). To study genetic ancestry and compare it with self-declared ancestry, we analyzed paternally, maternally and biparentally inherited DNA markers sensitive for indicating continental genetic ancestry in all four major U.S. American groups. We found that self-declared U.S. Hispanics and U.S. African Americans show variable degrees of continental genetic admixture across the three genetic systems, with evidence for a marked sex-biased admixture history. Moreover, for these two groups we observed significant regional variation in genetic admixture across the country. In contrast, self-declared U.S. European and U.S. Asian Americans were genetically more homogeneous at the continental ancestry level. Two autosomal ancestry-sensitive markers located in skin pigmentation candidate genes showed significant differences in self-declared U.S. African Americans or U.S. European Americans relative to their assumed parental populations from Africa or Europe. This provides genetic support for the importance of skin color in the complex process of ancestry identification.


Year:  2010        PMID: 20886636      PMCID: PMC3051415          DOI: 10.1002/humu.21366

Source DB:  PubMed          Journal:  Hum Mutat        ISSN: 1059-7794            Impact factor:   4.878


INTRODUCTION

The current U.S. American population is particularly interesting for studying bio-geographic ancestry, as it represents an amalgam of individuals who originate from at least four major continental regions and who (at least potentially) started to admix at different times from the first European colonization of North America onwards. According to the U.S. Census Bureau (2008), the four clusters most frequently self-assigned by U.S. Americans are White (U.S. European), Black (U.S. African), Asian (U.S. Asian) and Hispanic/Latino (U.S. Hispanic). It should be noted, however, that this classification mixes bio-geographic ancestry with sociological and cultural variables, including linguistic ones. For example, individuals self-defined as U.S. Hispanics share cultural traits, such as Spanish as a mother tongue, but can be of different bio-geographic ancestry, reflecting more than 500 years of admixture between Native Americans, Europeans and Africans in the Americas (Salzano and Bortolini, 2002). Similarly, self-declared U.S. Africans generally carry some degree of European genetic ancestry, which in particular cases can exceed 80% of the total ancestry (Sinha, et al., 2006). Finally, additional sub-continental population substructure can be detected within self-identified groups, such as U.S. Europeans (Campbell, et al., 2005), U.S. Africans (Tishkoff, et al., 2009; Zakharia, et al., 2009) and U.S. Hispanics (Wang, et al., 2008), mirroring the genetic heterogeneity also observed within the respective parental populations (Jakobsson, et al., 2008; Lao, et al., 2008; Li, et al., 2008; Novembre, et al., 2008).
In the present study we analyzed the bio-geographic ancestry of U.S. Americans with self-declared African, European, Asian and Hispanic ancestry, using single nucleotide polymorphisms (SNPs) from the uniparentally inherited non-recombining portion of the human Y chromosome (NRY) and from mitochondrial (mt) DNA, as well as from carefully ascertained biparentally inherited autosomal regions. All DNA markers used were ascertained to be sensitive for indicating bio-geographic ancestry at the level of the four continental regions (Africa, Europe, Asia and America) expected to have contributed to the current U.S. population. Very few previous studies have analyzed all three genetic systems in any of these U.S. groups (Parra, et al., 1998; Lind, et al., 2007; Stefflova, et al., 2009). As far as we know, our study is the first to combine suitable ancestry-sensitive markers from all three genetic systems to detect patrilineal, matrilineal and biparental genetic ancestry separately in all four major U.S. American groups.

MATERIALS AND METHODS

Samples

Anonymous liquid blood or buccal swab samples from a total of 664 U.S. American individuals were obtained from Interstate Blood Bank, Inc. (Memphis, TN), Millennium Biotech, Inc. (Ft. Lauderdale, FL) and DNA Diagnostics Center (Fairfield, OH). Of these, 246 were self-declared U.S. African Americans, 127 self-declared U.S. Hispanic Americans and 245 self-declared U.S. European Americans from Temple and Killeen, TX; Louisville, KY; Baltimore, MD; Philadelphia, PA; Memphis, TN; and Miami, FL, and 46 were self-declared U.S. Asian Americans from the Fairfield, OH source. Each sample was typed for 15 autosomal short tandem repeats and the amelogenin sex-typing marker using the AmpFlSTR Identifiler kit (Applied Biosystems, Foster City, CA) to verify that each sample was unique (Butler, et al., 2003; Decker, et al., 2008). In addition to the U.S. American samples, the autosomal markers were also genotyped in the Human Genome Diversity Project-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH) panel (Cann, et al., 2002). From this panel, four groups were used as parental groups in some of the statistical analyses: i) Sub-Saharan Africans (Bantu, Biaka Pygmies, Mandenka, Mbuti Pygmies, San, Yoruba); ii) East Asians (Cambodian, Dai, Daur, Han, Hazara, Hezhen, Japanese, Lahu, Miaozu, Mongola, Naxi, Oroqen, She, Tu, Tujia, Uygur, Xibo, Yakut, Yizu); iii) Eurasians (Adygei, Basque, Bergamo, French, Orcadian, Russian, Sardinian, Tuscan); and iv) Native Americans (Colombian, Karitiana, Maya, Pima, Surui).

Autosomal DNA analysis

Twenty-four autosomal SNPs (rs1876482, rs2179967, rs1048610, rs1371048, rs1478785, rs1369290, rs952718, rs1405467, rs1344870, rs1391681, rs1461227, rs1907702, rs2052760, rs714857, rs721352, rs722869, rs926774, rs1448484, rs1667751, rs1858465, rs1465648, rs16891982, rs1808089, rs3843776) were genotyped in two SNaPshot multiplex reactions, as described in detail in the Supp. Methods and Supp. Table S1. These SNPs were ascertained to be ancestry-sensitive at the continental level, as described in detail elsewhere (Lao, et al., 2006; Lao, et al., 2007; Kersbergen, et al., 2009; Corach, et al., 2010). In brief, Affymetrix 10K SNP data from 76 individuals from 21 worldwide sampling localities of the Y-Chromosome Consortium (YCC) panel were analyzed with the informativeness of ancestry statistic (I; Rosenberg, et al., 2003), and a genetic algorithm was applied to select a minimal set of markers maximizing the ancestry information for differentiating four continental populations (Sub-Saharan Africa, Eurasia, East Asia and America) (Lao, et al., 2006). In parallel, a population-specific FST strategy (Weir and Cockerham, 1984) was applied to ascertain markers that differentiate each population (Kersbergen, et al., 2009). In addition, SNPs were added from three genes that are associated with variation in skin pigmentation, that show large allele frequency differences between individuals of European, African and East Asian ancestry, and for which evidence of positive selection has been established (Lao, et al., 2007). The final set of 24 ancestry-sensitive markers (ASMs) was obtained by selecting from the pooled data the set of SNPs that maximizes the I statistic for the four continental groups.
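The I statistic used for ascertainment is computed per SNP from population allele frequencies. The following is a minimal Python sketch of Rosenberg et al.'s (2003) informativeness for assignment, I_n = Σ_j [−p̄_j ln p̄_j + Σ_i (p_ij / K) ln p_ij], where p_ij is the frequency of allele j in population i and p̄_j is its mean over the K populations. The function names and the example frequencies are hypothetical. The natural logarithm is assumed here; another base would only rescale the values. The genetic-algorithm and FST selection steps are not reproduced.

```python
import math


def _xlogx(p: float) -> float:
    # Convention: 0 * log(0) = 0
    return p * math.log(p) if p > 0 else 0.0


def informativeness(freqs: list[list[float]]) -> float:
    """I_n for one locus (Rosenberg et al., 2003).

    freqs[i][j] is the frequency of allele j in population i; each row sums to 1.
    """
    k = len(freqs)
    total = 0.0
    for j in range(len(freqs[0])):
        p_bar = sum(pop[j] for pop in freqs) / k
        total += -_xlogx(p_bar) + sum(_xlogx(pop[j]) for pop in freqs) / k
    return total


def informativeness_biallelic(p: list[float]) -> float:
    """Convenience wrapper for a SNP: p[i] is one allele's frequency in population i."""
    return informativeness([[pi, 1.0 - pi] for pi in p])


# Hypothetical frequencies in four continental groups (Africa, Eurasia, East Asia, America)
print(informativeness_biallelic([0.9, 0.1, 0.1, 0.1]))  # one group differs: higher I
print(informativeness_biallelic([0.5, 0.4, 0.5, 0.4]))  # little differentiation: I near 0
```

For K = 2, a fixed difference gives the maximum of ln 2, and identical frequencies give 0. Ranking SNPs individually by I_n is only a simple screening step. The genetic algorithm described above instead optimizes the marker set jointly.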
Supp. Table S1

Genotyping information autosomal SNPs

Multiplex A
SNP | PCR primers | SBE primer | Orientation | Length (nt) | μM
rs1048610 | F: AGGCAGGTCTCAGAACAATCC; R: GTTCAGCATCGACATAGGGC | GTGTGCTGCAGGGACCTTTC | F | 20 | 5
rs1876482 | F: GAGCTGTTGATAGAGCTTTTGTGG; R: ACGTGACACATAAAGAAAATGCCAT | ttttttGGCTGTACCCTCACTATTGGTG | R | 28 | 5
rs2179967 | F: AAGAGTGTGTTGTATGCTTTGGAAA; R: TCCTTCCAGCCCGACTAGAAC | ttttttCTTTGGAAATGGGTGTGCAACA | F | 28 | 6
rs1858465 | F: GATTTCAAAAAGTCTACAGATTTGG; R: TGACTTTGTCAAACTTCCTCTTTAA | tttttACTTCCTCTTTAATACTTCAACTGAGT | R | 32 | 7
rs1371048 | F: CTTAAATAGCCAAATAGCTCTAACT; R: ACAAACGAAATATTTGAGTATGCT | ttttttttttATTTGAGTATGCTCTGTAGATGCTTC | R | 36 | 5
rs1369290 | F: GAGGCCCTACATGACCTGTC; R: GGGCTCCTCTTTCGCTCA | tttttttttttttttACCACAGGCTCTTGATAAAGTGTCT | F | 40 | 5
rs1465648 | F: ACCAGAAGGAAAGAGAAAAAGCAC; R: AACAAACTACAGCAACAGAATCTTT | tttttttttttttttttGAAAAAGCACAGTATCAAGTTTGACTT | F | 44 | 6
rs1391681 | F: GAGTAGTTGCTCATGAAGCTGAAAA; R: GGGCAGCCAAAAATAAAACAAAACA | ttttttttttttttttttttttTGTCACCCTTTACAAAACAGTTTGCA | F | 48 | 5
rs1461227 | F: ACTGGGAAATTCTCACTGCAACT; R: TTGACAGATGGAGACACTGAAGC | tttttttttttttttttttttttttAACTACAACTAGCCCTAGGCTAATCTA | F | 52 | 5
rs1907702 | F: CCAACTCCTAATCAAGGCCTAC; R: AGGAACATAAAGGAGGCCAGT | ttttttttttttttttttttttttttttttCCTAATCAAGGCCTACAGAGACCTTC | F | 56 | 5
rs2052760 | F: ATTCAGAAAAGTGCATGCAGAAATT; R: GAGAGAGAGGAGTGAGAAAGGC | ttttttttttttttttttttttttttttttttttATTATCAATGGGTTATTTTTGCCTCA | F | 60 | 5
rs1667751 | F: CTGGTTCTTTTCCATCCAGCCTTTA; R: GAGATCACCAAGGGAGTAAGTACAG | ttttttttttttttttttttttttttttttttttttttCTTTACAAGCTACAAGACTTACGCCT | F | 64 | 5

Multiplex B
SNP | PCR primers | SBE primer | Orientation | Length (nt) | μM
rs1448484 | F: TCTCCTTCCAAGCCTTCTGAAAAAT; R: GCAACCACACAGAACACAGC | tATGAGAGCTGGCAGCTTCC | F | 20 | 6
rs714857 | F: GAAACTTCCCTAATGGGTCTTGTGA; R: CCTCCCTCACACATAAAACTTCTCA | tttCTTGTGAACCTTGGCTCCCTG | F | 24 | 6
rs16891982 | F: ATCCAAGTTGTGCTAGACCAGAA; R: AGAGGAGTCGAGGTTGGATG | ttttttGAGGAAAACACGGAGTTGATGCA | F | 29 | 5
rs1808089 | F: TGTCAGGCCTTACCACTGCATAAGA; R: AAACAACTCAGCGGCACAAA | ttttttttACAAATGAGTAATGCCGTGGTGG | R | 31 | 5
rs1478785 | F: TCCTGGAGGCTTGAGGGCTA; R: GGCTTGCTGGCTTTTTCTAGAT | tttttttttAGGGATGTTCATTTAAAATAACATCGC | F | 36 | 5
rs952718 | F: GAGCCTAGATCCTGACTTCCTTG; R: CTGTCACTGGAGATGTCATCTCAT | tttttttttttttAAAATGCAAATTTCACCTTCTTCAAAT | R | 40 | 5
rs1405467 | F: AATTTGCAACAAAGAGGAAGGGGA; R: GAGCAATAAGAGTGACTATGTCTGC | ttttttttttttttttttAAGTAGTCAGCTGAACTCACCTGAT | F | 43 | 5
rs1344870 | F: CAATCTCAGTTTTAATTGCCATGT; R: AGGATGTATTGGGGCCTTTC | ttttttttttttttttttttttTCGCTCTTAAGTATGTTTTCTTGGTC | F | 48 | 5
rs3843776 | F: AGGCCACTGTTGTGGTTTATG; R: TGAGGGCTCTACAACACTGC | tttttttttttttttttttttttttttTGTTGTGGTTTATGTTTCACTTCGAC | F | 53 | 6
rs721352 | F: TCTGTGCCCAGATGCAAATCCTTA; R: GACCCAGAACTGTGCAGG | tttttttttttttttttttttttttttttTGCTTGATGGCTCCACCTATCA | R | 51 | 6
rs722869 | F: CCTTCTGCACTTGGGCATATT; R: AGGTAGAGATCTAACAAACCACAGT | tttttttttttttttttttttttttttttttttCAAATCCTTCATTTCACAAATGAAGCT | R | 60 | 5
rs926774 | F: AATCAAGTTCAGACTTTTGCCTCAT | ttttttttttttttttttttttttttttttttttttAAGCTATTGTAGTGAGGAAGGCTAGA | R | 63 | 7

Mitochondrial DNA analysis

The entire mtDNA control region (positions 16024-576) was sequenced using an automated, high-throughput, redundant sequencing and review strategy as described elsewhere (Irwin, et al., 2007). Sequence assembly and confirmation were performed independently by two analysts, followed by electronic transfer of the data to a secured laboratory information management system (LIMS) for sequence verification. The raw data were then exported to a second laboratory (the European DNA Profiling Group (EDNAP) mtDNA Population Database, EMPOP; Parson and Dür, 2007) for additional review and quality control. Control region haplotypes for the self-declared African American (Diegoli, et al., 2009) and Hispanic (Saunier, et al., 2008) samples have been published previously; these sequences, together with those generated here for European Americans and Asian Americans, have been deposited in GenBank under accession numbers DQ906460-DQ906701 and DQ906703-DQ906708 (African Americans), DQ906175-DQ906459 (European Americans), EU014897-EU015024 (Hispanics), and HM214959-HM215005 (Asian Americans). MtDNA haplogroups were assigned using the references cited by Diegoli, et al. (2009) for the African American samples, Saunier, et al. (2008) for the Hispanic samples, Irwin, et al. (2008) for the European American samples and Irwin, et al. (2009) for the Asian American samples, and were checked against the most recent human mtDNA tree at http://www.phylotree.org (van Oven and Kayser, 2009). Where haplogroup assignment based on control region polymorphisms was ambiguous, coding region SNPs were additionally sequenced as described elsewhere (Just, et al., 2008).
The continental region of geographic origin of the mtDNA haplogroups was assumed from published mtDNA data (Richards, et al., 1998; Macaulay, et al., 1999; Finnila, et al., 2001; Kivisild, et al., 2006; Kong, et al., 2006; Achilli, et al., 2008; Behar, et al., 2008), and is provided for all mtDNA haplogroups observed in this study in Supp. Table S2.
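The text does not spell out how group-level ancestry percentages (Figure 2) are obtained from these haplogroup origin assignments. One plausible reading, assumed here, is that each individual's haplogroup contributes its origin weights and these are averaged over the group. The weights are 1 for a single origin, or fractions such as 0.5 for lineages whose origin is split, as in Supp. Tables S2 and S4. The mapping below is a hypothetical example, not the actual table, and "split_example" is an invented haplogroup name.

```python
from collections import Counter

# Hypothetical origin weights (each row sums to 1), in the spirit of Supp. Table S2.
# "split_example" is an invented haplogroup with a shared (split) assignment.
ORIGIN_WEIGHTS = {
    "L3e2": {"African": 1.0},
    "H1": {"Eurasian": 1.0},
    "A2": {"Native American": 1.0},
    "split_example": {"Eurasian": 0.5, "Asian": 0.5},
}


def ancestry_proportions(haplogroups, origin_weights):
    """Average the continental origin weights over a group's haplogroups."""
    totals = Counter()
    for hg in haplogroups:
        for region, weight in origin_weights[hg].items():
            totals[region] += weight
    return {region: total / len(haplogroups) for region, total in totals.items()}


group = ["L3e2", "L3e2", "H1", "split_example"]
print(ancestry_proportions(group, ORIGIN_WEIGHTS))
# {'African': 0.5, 'Eurasian': 0.375, 'Asian': 0.125}
```

Because every row of weights sums to 1, the group proportions also sum to 1. This holds regardless of how many haplogroups carry split assignments.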
Supp. Table S2

MtDNA haplogroups observed among U.S. Americans and their assumed geographic region of origin

Assumed continental origin
mtDNA haplogroupAsianEurasianAfricanNative American
A1
A21
A51
B21
B4a1
B4b11
B4c1
B5b1
C11
D/E/G1
D/G1
D11
D4a1
D4e1
D4i1
D4k1
D5b1
E21
F1a1
F1b1
F2a1
F3b1
G1
H1
H111
H13a1
H1a1
H1b1
H1c1
H3a1
H51
H61
HV01
I1
J1b1
J1c1
J2a1
K1
L0a1
L0a1
L1b1
L1c1
L2a11
L2b1
L2c1
L2d1
L31
L3a1
L3b1
L3d1
L3e11
L3e21
L3e31
L3e41
L3f1
L3h1
M101
M351
M7a1
M7b1
M8a1
M9a1
N1a1
N1b1
N91
R*0.50.5
T11
T21
U21
U31
U41
U5a1
U5b1
U6a1
U8a1
W1
X21
X2a1

Y-chromosomal DNA analysis

Y-chromosome variation was assessed with a total of 42 NRY-SNPs. Twenty-four NRY-SNPs were genotyped in all samples (SRY 1532, M91, M168, M145, M174, 12f2, M96, M213, M201, M69, M52, M170, M172, M9, M20, M106, M214, Tat, M175, M45, MEH2, M207, M269 and M124). To maximize the continental resolution of haplogroup origins, we genotyped 18 additional SNPs in samples assigned to haplogroup E (M33, P2, M2, M154, M191, M215, M35, M78, V12, M224, V32, V13, V22, M81, M123, M281, V6 and M75). A single multiplex PCR and SNaPshot assay based on primer extension was designed for the core set of 24 NRY-SNPs as described elsewhere (Corach, et al., 2010). The 18 additional NRY-SNPs for subtyping haplogroup E were genotyped in a multiplex designed in a similar way to that of the core set, the only exception being a final MgCl2 concentration of 3 mM in the multiplex PCR. PCR product sizes ranged from 76 to 150 bp. Sequences and concentrations of the primers used in the monoplex and multiplex PCR and extension reactions are provided in Supp. Table S3, and a phylogenetic tree of the NRY-SNPs used is shown in Supp. Figure S1. NRY haplogroups were derived from the NRY-SNP genotypes using the marker phylogeny described elsewhere (Karafet, et al., 2008). The continental region of geographic origin of the NRY haplogroups was assumed from published NRY data (Semino, et al., 2000; Bortolini, et al., 2003; Jobling and Tyler-Smith, 2003; Luis, et al., 2004; Cruciani, et al., 2007) and is provided for all NRY haplogroups observed in this study in Supp. Table S4.
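Deriving a haplogroup from SNP states amounts to walking down the marker phylogeny for as long as the defining SNPs are in the derived state. The sketch below encodes the haplogroup E sub-typing SNPs with the haplogroup labels listed in Supp. Table S3. The parent links are inferred from the Karafet et al. (2008) naming scheme, and the function name is hypothetical. Missing calls, recurrent mutations and the core 24-SNP panel are not handled. A trailing "*" marks a paragroup whose typed sub-branches are all ancestral, mirroring notation such as E1b1b1a1*(xE1b1b1a1a,E1b1b1a1b) in Supp. Table S4.

```python
# haplogroup: (defining SNP, parent haplogroup) for the hg E sub-typing panel
HG_E_TREE = {
    "E": ("M96", None),
    "E1a": ("M33", "E"),
    "E1b1": ("P2", "E"),
    "E1b1a": ("M2", "E1b1"),
    "E1b1a4": ("M154", "E1b1a"),
    "E1b1a7": ("M191", "E1b1a"),
    "E1b1b": ("M215", "E1b1"),
    "E1b1b1": ("M35", "E1b1b"),
    "E1b1b1a": ("M78", "E1b1b1"),
    "E1b1b1a1": ("V12", "E1b1b1a"),
    "E1b1b1a1a": ("M224", "E1b1b1a1"),
    "E1b1b1a1b": ("V32", "E1b1b1a1"),
    "E1b1b1a2": ("V13", "E1b1b1a"),
    "E1b1b1a3": ("V22", "E1b1b1a"),
    "E1b1b1b": ("M81", "E1b1b1"),
    "E1b1b1c": ("M123", "E1b1b1"),
    "E1b1b1d": ("M281", "E1b1b1"),
    "E1b1b1e": ("V6", "E1b1b1"),
    "E2": ("M75", "E"),
}


def call_haplogroup(derived_snps, tree=HG_E_TREE):
    """Return the most derived haplogroup supported by `derived_snps`, or None."""
    children = {}
    for hg, (_, parent) in tree.items():
        children.setdefault(parent, []).append(hg)

    def descend(node):
        hits = [c for c in children.get(node, []) if tree[c][0] in derived_snps]
        if not hits:
            # Paragroup if the node has typed sub-branches that are all ancestral
            return node + ("*" if node in children else "")
        if len(hits) > 1:
            raise ValueError(f"conflicting derived branches under {node}: {hits}")
        return descend(hits[0])

    roots = [hg for hg in children[None] if tree[hg][0] in derived_snps]
    return descend(roots[0]) if roots else None


print(call_haplogroup({"M96", "P2", "M215", "M35", "M78", "V12"}))  # E1b1b1a1*
print(call_haplogroup({"M96", "P2", "M2", "M191"}))  # E1b1a7
```

Raising an error on two derived sister branches is a design choice. Such a pattern usually points to a genotyping error rather than a real lineage, so the sketch flags it instead of guessing.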
Supp. Table S3

Genotyping information NRY SNPs

AdditionalHaplogroupSNPBibliographical sourceGenBank/dbSNP accession (if known)Position Y-chromosomeForward Amplification primer (5′–> 3′)Reverse Amplification primer (5′–> 3′)concentration in PCR (μM)Amplicon size (bp)Minisequencing primers (target-specific sequence in capitals)Orientationconcentration in miniseq reaction (μM)Primer size (nt)Mutation: Wildtype/Mutant**
hgEEM96RefsAC010889rs930684120238386GCCAGCCAAGAATGAAGAGATGAGCTGTGATGTGTAACTTGG0.1143GGAAAACAGGTCTCTCATAATAR0.0422G/C
hgEE1aM332AC00997720199838CCGTCATAGGCTGAGACAAGACCCCAAGAGAGACAACTGAC0.15150ccacgtcgtgaaagtctgacaaCAGTTACAAAAGTATAATATGTCTGAGATR0.0651C/G
hgEE1b1P23AC01013720070219GAGAATCAGCTCCAGCCATCTTTTGGATCTTCATGCTGGTT0.03100gacaaAGGTGCCCCTAGGAGGAGAAF0.225T/C
hgEE1b1aM26AC011302rs389312606580ACGGAAGGAGTTCTAAAATTCAGGAAAATACAGCTCCCCCTTTATCCT0.1147cacgtcgtgaaagtctgacaaTTCATTGTTAACAAAAGTCCR0.0641G/A
hgEE1b1a4M1542AC01088920352065AGGCTACAAATTAGTGCGACAGAGGCACAGATACTTAAACCATTG0.0677acaaGTTACATGGCCTATAATATTCAGTACAR0.0331G/A
hgEE1b1a7M1912AC004474rs203259013529007AAAAATGGAGTGTTTATCAGAGCTTCCCAGACACACCAAAATATCTC0.3122gaaagtctgacaaAAAATATCTCATATTTTCATR0.2533A/G
hgEE1b1bM2152AC006376rs203265413977218TCAAACTGTTGGTAAATTTTAGAGAAACAGAAGCATCAGCTGGAACA0.2597gtcgtgaaagtctgacaaCAGCTGGAACAGTTAGAAAGR0.1538C/T
hgEE1b1b1M352AC009977rs117918820201091AGGGCATGGTCCCTTTCTATTCCATGCAGACTTTCGGAGT0.296actgactaaactaggtgccacgtcgtgaaagtctgacaaTCGGAGTCTCTGCCTGTGTCR0.0659G/A
hgEE1b1b1aM78*2AC01088920352691TGCATTACTCCGTATGTTCGACTGGAAGCTTACCATCTTTTTATGA0.05*132aagtctgacaaCTTATTTTGAAATATTTGGAAGGGCR0.0236A/C
hgEE1b1b1a1V127AC0120686883099CTGAGTTGGATTGTTTTAAGTTGATTGGTCTCTCTTCATGTGCTG0.15150acaaTTGTGTAGATAATTCAAAGTR0.2524C/T
hgEE1b1b1a1aM2242AC01088920352687TGCATTACTCCGTATGTTCGACTGGAAGCTTACCATCTTTTTATGA0.05*132cgtgaaagtctgacaaAATTGATACACTTAACAAAGATACTTCF0.1543A/G
hgEE1b1b1a1bV327AC0120686992821GCAAATGTTCCATGAATGGTGCCAGCCAGAGAGGCACTTTA0.4111CCCaactgactaaactaggtgccacgtcgtgaaagtctgacaaCACACATGTATATACACACCR0.2563C/G
hgEE1b1b1a2V137AC0120686902263CAACAGTGGAGGACAAAGCAAAGACCAGCCTGACCAACAT0.15106cgtcgtgaaagtctgacaaGCTCAAACTTCCCTTGR0.1535A/G
hgEE1b1b1a3V227AC0120686919957TGGCAATGCCTCAACTTACAATTCCCCAAGGTTTCAGAGG0.15110CaactgactaaactaggtgccacgtcgtgaaagtctgacaaCCAAGGTTTCAGAGGTCR0.1558C/G
hgEE1b1b1bM812AC010889rs203264020351960GCACTATCATACTCAGCTACACATCTCTTGTTTCTTCTTGGTTTGTGTGA0.0399acaaCTTGGTTTGTGTGAGTATACTCTATGACR0.0332G/A
hgEE1b1b1cM1232AC01088920223974GTTGCCCAGGAATTTGCATCACAGAGCAAGTGACTCTCAAAG0.1589taaactaggtgccacgtcgtgaaagtctgacaaCATTTCTAGGTATTCAGGCGATGF0.156T/G
hgEE1b1b1dM2814AC010889rs1344737020223888AGCAAAGTTGAGGTTGCACATGGGCAACACCAGAATCTAA0.1593gtgccacgtcgtgaaagtctgacaaGCACAAACTCAGTATTATTAAACF0.0648T/C
hgEE1b1b1eV63AC0120686992007GATGGCACAGTGTTCGACAGCTTCTCTCCAAATGCCTGCT0.4102taggtgccacgtcgtgaaagtctgacaaCCTGCTGCCGCATCTGCAR0.0246T/C
hgEE2M752AC010889rs203263920349565TGACTTGTCAAAAGCCAAAACATTGAACAGAGGCATTTGTGA0.1123taggtgccacgtcgtgaaagtctgacaaGAAAAGACAATTATCAAACCACATCCF0.154C/T
Supp. Figure S1

Phylogenetic tree of NRY SNPs.

Supp. Table S4

NRY haplogroups observed among U.S. Americans and their assumed geographic region of origin

Assumed continental origin
NRY haplogroupAsianEurasianAfricanNative American
A1
B1
C1
D1
E1a1
E1b1a*(xE1b1a4,E1b1a7)1
E1b1a71
E1b1b1*(xE1b1b1a,E1b1b1b,E1b1b1c,E1b1b1d,E1b1b1e)0.50.5
E1b1b1a*(xE1b1b1a1,E1b1b1a2,E1b1b1a3)0.50.5
E1b1b1a1*(xE1b1b1a1a,E1b1b1a1b)0.50.5
E1b1b1a21
E1b1b1a30.50.5
E1b1b1b1
E1b1b1c0.80.2
E21
G1
I1
J*(xJ2)1
J21
K*(xL,M1,NO,P)0.3330.3330.333
N1c1
O1
Q1a1
R1a1
R1b1b21
R21

Statistical analyses

The suitability of the 24 ascertained SNPs for recovering continental ancestry was checked with a STRUCTURE analysis (Pritchard, et al., 2000) of the HGDP-CEPH panel. The number of groups was increased from K=2 to K=6 under the admixture model, with a burn-in of 100,000 iterations followed by 100,000 retained iterations; five runs were performed for each K. To estimate the parental ancestry of the U.S. samples, a STRUCTURE analysis was run with four parental populations (Native Americans, East Asians, Eurasians and Sub-Saharan Africans from HGDP-CEPH) chosen on the basis of the expected continental ancestry. Ten thousand iterations were used as burn-in, and the next 10,000 were retained for the admixture estimates. Reproducibility was checked by repeating the same analysis 10 times, which gave similar admixture estimates in all cases. Bar plots of the STRUCTURE estimates were drawn with Distruct 1.1 (Rosenberg, 2004). Differences in the amount of ancestry between sampling regions were tested with a Kruskal-Wallis test, considering only regions with more than 10 sampled individuals. The test was applied to the African component in U.S. Africans (Baltimore, n = 34; Louisville, n = 21; Memphis, n = 41; Miami, n = 25; Philadelphia, n = 104; Temple, n = 17) and to the Native American component in U.S. Hispanics (Miami, n = 61; Temple, n = 29; Killeen, n = 17; Philadelphia, n = 13). Additionally, we compared the genetic clustering of the U.S. individuals with their self-identified ethnicity by means of a STRUCTURE analysis assuming no admixture and K = 4 populations (Tang, et al., 2005).
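The regional comparison can be sketched with SciPy's `scipy.stats.kruskal`, applying the "more than 10 sampled individuals" filter before testing. The function name, the region labels and the per-individual ancestry values below are hypothetical.

```python
from scipy.stats import kruskal


def regional_heterogeneity(ancestry_by_region, min_size=11):
    """Kruskal-Wallis test across regions with at least `min_size` individuals.

    ancestry_by_region maps a region name to per-individual ancestry proportions
    (e.g. the STRUCTURE-estimated African component of each U.S. African sample).
    """
    kept = {r: v for r, v in ancestry_by_region.items() if len(v) >= min_size}
    if len(kept) < 2:
        raise ValueError("need at least two regions with enough samples")
    statistic, p_value = kruskal(*kept.values())
    return statistic, p_value, sorted(kept)


# Hypothetical per-individual estimates; region "C" has too few samples and is skipped
data = {
    "A": [0.80 + 0.005 * i for i in range(20)],
    "B": [0.60 + 0.005 * i for i in range(20)],
    "C": [0.9] * 5,
}
h, p, used = regional_heterogeneity(data)
print(used, h, p)
```

A rank-based test is a reasonable choice here because per-individual ancestry proportions are bounded and often skewed, so the normality assumption of a one-way ANOVA is doubtful.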
An identity-by-state (IBS) distance matrix between all pairs of individuals, including the parental HGDP-CEPH populations, was computed from the 24 SNPs. This matrix was used for a non-metric multidimensional scaling (MDS) analysis (Kruskal and Wish, 1990) in three dimensions with the isoMDS function of the MASS package in R (R Development Core Team, 2006). Because isoMDS does not accept zero distances between distinct individuals, a small value of 0.001 was added whenever the distance between two individuals was 0. The I statistic was computed for each of the 24 ASMs using the following pairs of populations: self-declared U.S. Europeans and the Sub-Saharan African HGDP-CEPH population cluster (set A); self-declared U.S. Africans and the HGDP-CEPH European group (set B); and the Sub-Saharan African HGDP-CEPH population cluster and the HGDP-CEPH European group (set C). Linear regressions of set A on set C and of set B on set C were performed with SPSS (SPSS, 2003), and SNPs falling outside the 99% prediction interval in either regression were identified. Analysis of Molecular Variance (AMOVA; Excoffier, et al., 1992) was conducted in Arlequin 3.0 (Excoffier, et al., 2005), grouping individuals by self-identified ancestry.
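The exact IBS formula is not given in the text. The sketch below assumes a common definition: per SNP, two individuals share 2 − |g_i − g_j| alleles identical by state, and the distance is 1 minus the fraction of shared alleles over the SNPs typed in both. It also applies the zero-distance adjustment described above. The function name is hypothetical. The resulting matrix is the kind of input one would pass to `MASS::isoMDS(d, k = 3)` in R.

```python
import numpy as np


def ibs_distance_matrix(genotypes, zero_offset=0.001):
    """Pairwise IBS distances from a (n_individuals, n_snps) array of 0/1/2 genotypes.

    Per SNP, two individuals share 2 - |g_i - g_j| alleles identical by state;
    distance = 1 - shared alleles / (2 * number of SNPs typed in both).
    Missing genotypes are coded as NaN and skipped pairwise.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(g[i]) & ~np.isnan(g[j])
            shared = 2.0 - np.abs(g[i, both] - g[j, both])
            d = 1.0 - shared.sum() / (2.0 * both.sum())
            # A zero distance between distinct individuals becomes `zero_offset`
            dist[i, j] = dist[j, i] = d if d > 0 else zero_offset
    return dist


geno = [[0, 1, 2, 2], [0, 1, 2, 2], [2, 1, 0, 1]]
print(ibs_distance_matrix(geno).round(3))
```

The diagonal stays at 0 because isoMDS only requires positive distances between different individuals.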

RESULTS

Autosomal DNA

The ancestry information provided by the 24 autosomal ASMs was first tested by a STRUCTURE analysis of the HGDP-CEPH samples without prior knowledge of the ancestral groups. Beyond K=4, the estimated log-likelihood of the data given the model (-19135) did not change substantially. The four clusters detected at K=4 broadly correspond to four geographic regions: America, Sub-Saharan Africa, East Asia and Eurasia (including Europe, the Middle East, South Asia and Central Asia) (Figure 1). Only small proportions of individuals were misclassified: 0.47% of Sub-Saharan Africans, 4.2% of Eurasians, 4.6% of Native Americans and 6.2% of East Asians (most of the latter, 3.6%, in the Eurasian cluster). We concluded that these 24 SNPs are suitable for inferring bio-geographic ancestry in U.S. Americans, since the four geographic regions identified represent the putative parental populations of the four major groups of U.S. Americans.
Figure 1

Genetic ancestry per individual in the global HGDP-CEPH panel as estimated by STRUCTURE using 24 autosomal ASMs (K=4).

Next, we used the Native Americans, East Asians, Eurasians and Sub-Saharan Africans from HGDP-CEPH as parental groups of the U.S. Americans in a STRUCTURE analysis (the genotype data for the 24 autosomal SNPs are given in Supp. Table S5). Self-declared U.S. Europeans showed on average 93.2% European ancestry (95% CI 73.23% to 98.09%), self-declared U.S. Asians on average 89.5% East Asian ancestry (95% CI 37.43% to 97.46%), and self-declared U.S. Africans on average 86.2% Sub-Saharan African ancestry (95% CI 47.82% to 98.5%) (Figure 2). For these three U.S. groups, only small components (between 0.8% and 8.1% on average) of continental ancestries other than the self-declared one were detected (Figure 2). In contrast, self-declared U.S. Hispanics carried on average 61.2% European (95% CI 8.33% to 95.75%), 14.9% Native American (95% CI 1.21% to 55.54%), 10.8% East Asian (95% CI 1.12% to 56.35%) and 11.6% Sub-Saharan African ancestry (95% CI 0.41% to 58.49%) (Figure 2). Furthermore, the amount of genetic ancestry varied significantly with the geographic sampling region, both for African ancestry among self-declared U.S. Africans (Kruskal-Wallis test p-value = 0.0042) and for Native American ancestry among self-declared U.S. Hispanics (Kruskal-Wallis test p-value = 1.48e-07). An AMOVA grouping individuals by self-declared ancestry explained 34.2% (two-tailed p-value < 0.0005) of the total genetic variation, suggesting strong genetic differentiation between the self-declared ancestry groups of U.S. Americans.
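STRUCTURE estimates these proportions by MCMC. As a lighter, swapped-in illustration of the same quantity, the sketch below fits one individual's ancestry proportions by maximum likelihood with an EM algorithm. It estimates the proportions given the parental populations, treats parental allele frequencies as known and assumes unlinked loci in Hardy-Weinberg proportions. It is not the STRUCTURE algorithm, and the function name and toy frequencies are hypothetical.

```python
import numpy as np


def admixture_em(genotype, parental_freqs, n_iter=1000, tol=1e-10):
    """Maximum-likelihood ancestry proportions for one individual.

    genotype: length-L array with 0/1/2 copies of the reference allele per SNP.
    parental_freqs: (K, L) reference-allele frequencies in K parental groups.
    """
    g = np.asarray(genotype, dtype=float)
    f = np.clip(np.asarray(parental_freqs, dtype=float), 1e-6, 1 - 1e-6)
    k, n_loci = f.shape
    q = np.full(k, 1.0 / k)  # start from equal proportions
    for _ in range(n_iter):
        # E-step: probability that a reference / alternative allele copy at each
        # SNP came from parental group k
        ref = q[:, None] * f
        alt = q[:, None] * (1.0 - f)
        ref /= ref.sum(axis=0)
        alt /= alt.sum(axis=0)
        # M-step: expected share of the 2L allele copies attributed to each group
        q_new = (g * ref + (2.0 - g) * alt).sum(axis=1) / (2.0 * n_loci)
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break
    return q


# Toy example with two parental groups and strongly differentiated SNPs
freqs = [[0.95, 0.95, 0.95, 0.95], [0.05, 0.05, 0.05, 0.05]]
print(admixture_em([2, 2, 0, 0], freqs).round(3))
```

With only a handful of markers the likelihood surface is flat, which is one reason the confidence intervals reported above are wide for individual estimates.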
Figure 2

Proportions of average continental genetic ancestry in four U.S. American groups of self-declared ancestry based on autosomal DNA, mtDNA and NRY DNA.

Furthermore, we performed an additional STRUCTURE analysis of the U.S. samples only, with K=4 and assuming no admixture (log-likelihood of the data given the model = -16287.9). The majority of U.S. Africans fell into one of the four clusters (K4), and almost all U.S. Asians fell into another (K1) (Table 1). In contrast, 15.7% of self-declared U.S. Hispanics were classified in the main cluster of U.S. Europeans (K3), and 19% of self-declared U.S. Europeans were classified in the main cluster of self-declared U.S. Hispanics (K2).
Table 1

Correspondence between self-declared ancestry and STRUCTURE-based genetic ancestry inferred from 24 autosomal ASMs in four major U.S. American self-declared groups

Clusters from STRUCTURE
Self-declared ancestry | K1 | K2 | K3 | K4
U.S. African | 0% | 2.2% | 1.0% | 96.8%
U.S. European | 0% | 19.0% | 80.6% | 0.4%
U.S. Hispanic | 2.4% | 77.8% | 15.7% | 4.0%
U.S. Asian | 99.9% | 0.1% | 0% | 0%
The MDS plot (Figure 3) shows that self-declared U.S. Europeans, U.S. Africans and U.S. Asians form rather discrete data clouds without strong overlap between groups, and that each tends to cluster close to its respective continental parental population (from HGDP-CEPH). Self-declared U.S. Hispanics, however, did not form a separate cluster. They either overlapped with U.S./continental Europeans or fell between the U.S./continental European and the U.S./continental Asian clusters, and some overlapped with the U.S./continental African cluster or fell between the U.S./continental African and U.S./continental European clusters.
Figure 3

Two-dimensional plots of the first, second and third dimensions obtained from an MDS analysis (stress = 0.13) of an identity-by-state (IBS) distance matrix computed between pairs of individuals. Centroids of the four continental parental populations from HGDP-CEPH are marked by crosses.

We also tested whether any of the 24 autosomal ASMs carried more or less ancestry information for self-identified U.S. Africans and U.S. Europeans than expected from the information carried by the other markers. For each SNP, we regressed the I value computed between U.S. Europeans and continental Africans (from HGDP-CEPH) on the I value computed between continental Africans and continental Europeans (from HGDP-CEPH) (see Methods for the definition of continental populations). This linear regression was highly significant (R-squared = 0.98, two-tailed p-value = 3.91e-20; slope = 1.07, p-value for a slope different from 1 = 0.0375). The I value observed for rs16891982 when comparing U.S. Europeans and continental Africans was significantly higher (outside the 99% prediction interval) than predicted by the linear regression using all 24 markers. Similarly, the regression of the I values computed between U.S. Africans and continental Europeans on those computed between continental Africans and continental Europeans was also statistically significant (R-squared = 0.97, two-tailed p-value = 1.85e-18; slope = 0.67, p-value for a slope different from 1 = 3.04e-12). For the comparison between U.S. Africans and continental Europeans, rs1448484 showed a larger and rs16891982 a smaller I value than predicted by the linear regression considering all 24 markers.
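The marker-level outlier screen can be sketched as an ordinary least-squares fit of one set of I values on another. SNPs whose observed value lies outside the 99% prediction interval are flagged, and a t-test checks whether the slope differs from 1. The analysis above used SPSS. This NumPy/SciPy version, its function name and the toy I values are hypothetical.

```python
import numpy as np
from scipy import stats


def regression_outliers(x, y, level=0.99):
    """OLS fit of y on x; flag points outside the `level` prediction interval.

    Returns (slope, intercept, indices of flagged points, p-value for slope != 1).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    s = np.sqrt((resid ** 2).sum() / (n - 2))  # residual standard error
    sxx = ((x - x.mean()) ** 2).sum()
    # Standard error for predicting a new observation at each x
    se_pred = s * np.sqrt(1.0 + 1.0 / n + (x - x.mean()) ** 2 / sxx)
    t_crit = stats.t.ppf(0.5 + level / 2.0, df=n - 2)
    flagged = np.flatnonzero(np.abs(resid) > t_crit * se_pred)
    # Two-sided t-test of H0: slope == 1
    t_slope = (slope - 1.0) / (s / np.sqrt(sxx))
    p_slope = 2.0 * stats.t.sf(abs(t_slope), df=n - 2)
    return slope, intercept, flagged, p_slope


# Hypothetical I values for 24 SNPs, with one SNP made unusually informative
x = np.linspace(0.1, 1.0, 24)
y = 1.07 * x + 0.01 * np.sin(np.arange(24) * 1.3)
y[5] += 0.5
print(regression_outliers(x, y))
```

Testing the slope against 1 rather than 0 asks whether the U.S. groups are, on average, as differentiated as the continental references. A slope below 1, as reported for U.S. Africans, indicates uniformly reduced informativeness.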

NRY DNA and mtDNA

The genetic ancestry estimates from the uniparentally inherited NRY and mtDNA markers (Figure 2) were similar to those from the autosomal ASMs for self-declared U.S. Europeans (European ancestry: NRY 94.7%, mtDNA 96.7%; Fisher exact test of equal ancestry proportions between NRY and mtDNA, test value = 4.85, two-tailed p-value = 0.19) and for U.S. Asians (East Asian ancestry: 97.8% for both NRY and mtDNA; Fisher exact test value = 1.40, two-tailed p-value = 1). In contrast, self-declared U.S. Africans showed discrepancies between the three genetic systems: 69.5% of NRY but 92.7% of mtDNA lineages were of African ancestry, and the second largest NRY ancestry component was European (29.7%). The differences in ancestry proportions between the two types of uniparental markers in U.S. Africans were highly significant (Fisher exact test value = 58.80, two-tailed p-value = 6.00e-14). In contrast to the autosomal ASMs, we did not detect any statistically significant geographic substructure in the NRY or mtDNA ancestry data within self-declared U.S. Africans (Fisher statistic for NRY = 22.82, two-tailed p-value = 0.45; for mtDNA = 19.56, two-tailed p-value = 0.39).
As with the autosomal markers, self-declared U.S. Hispanics showed the most complex ancestry pattern of all U.S. American groups studied for the uniparental markers. NRY ancestry was 69.3% European, 21.3% African, only 7.9% Native American and 1.6% East Asian. MtDNA ancestry was 48.8% Native American, 23.6% European and 11.8% East Asian. The differences in ancestry proportions between the two uniparentally inherited marker systems in U.S. Hispanics were statistically significant (Fisher exact test value = 82.41, two-tailed p-value = 3.11e-18). In contrast to the autosomal ASMs, there was no significant NRY differentiation between self-declared U.S. Hispanics from the different sampling regions (Fisher statistic for NRY = 11.69, two-tailed p-value = 0.14). The mtDNA data, however, like the autosomal ASMs, revealed statistically significant differences (Fisher statistic for mtDNA = 23.3, two-tailed p-value = 0.0024). AMOVA of the NRY and mtDNA data performed separately, grouping individuals by self-declared ancestry, explained 27.65% (two-tailed p-value < 0.000005) and 7.6% (two-tailed p-value < 0.000005) of the total genetic diversity, respectively. AMOVA of the autosomal ASM data grouping individuals by NRY ancestry and, separately, by mtDNA ancestry explained 23.3% (two-tailed p-value < 0.0005) and 30.2% (two-tailed p-value < 0.0005) of the total genetic diversity, respectively. The NRY and mtDNA haplogroups of all individual samples can be found in Supp. Table S5.
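AMOVA partitions the total squared-distance variation into among-group and within-group components (Excoffier, et al., 1992). For a single level of grouping, such as self-declared ancestry, the percentage of variation among groups can be sketched as below. The function name is hypothetical. Significance testing by permuting individuals among groups, as done in Arlequin, is omitted.

```python
import numpy as np


def amova_percent_among(dist, groups):
    """Percentage of total variation among groups for a one-level AMOVA.

    dist: (N, N) symmetric matrix of pairwise distances; squared internally.
    groups: length-N sequence of group labels (e.g. self-declared ancestry).
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    labels = np.asarray(groups)
    n = len(labels)
    names = np.unique(labels)
    n_groups = len(names)
    # Sums of squared deviations; each pair appears twice in the full matrix
    ssd_total = d2.sum() / (2.0 * n)
    ssd_within = 0.0
    sizes = []
    for name in names:
        idx = np.flatnonzero(labels == name)
        sizes.append(len(idx))
        ssd_within += d2[np.ix_(idx, idx)].sum() / (2.0 * len(idx))
    ssd_among = ssd_total - ssd_within
    sizes = np.asarray(sizes, dtype=float)
    n0 = (n - (sizes ** 2).sum() / n) / (n_groups - 1)
    sigma2_within = ssd_within / (n - n_groups)
    sigma2_among = (ssd_among / (n_groups - 1) - sigma2_within) / n0
    return 100.0 * sigma2_among / (sigma2_among + sigma2_within)


coords = np.array([0.0, 1.0, 2.0, 3.0])
distances = np.abs(coords[:, None] - coords[None, :])
print(amova_percent_among(distances, ["A", "A", "B", "B"]))
```

The among-group component is estimated by subtraction, so it can come out slightly negative when groups are less differentiated than expected by chance.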

DISCUSSION

The current U.S. population represents a mixture of groups with different bio-geographic ancestries, mainly from Europe, Sub-Saharan Africa, East Asia and the Americas. We have shown in the HGDP-CEPH samples that the ascertained autosomal ASMs are informative for detecting the ancestry of these four continental groups. Overall, the STRUCTURE, MDS and AMOVA analyses indicate that, among U.S. Americans, self-declared ancestry is on average a good proxy for the underlying autosomal genetic diversity, especially for European, African and Asian Americans. Our STRUCTURE results are in line with an earlier study reporting that ancestry self-identification corresponded well with STRUCTURE-based predictions for U.S. Americans (Tang, et al., 2005). Our autosomal ASM results also largely corroborate previous studies of self-identified U.S. Europeans (Halder, et al., 2008; Halder, et al., 2009; Kosoy, et al., 2009) and U.S. Asians (Kosoy, et al., 2009), although those studies usually applied many more markers. However, we observed discrepancies between our data and previous studies for self-declared U.S. Africans and U.S. Hispanics. For U.S. Africans we found a slightly larger percentage of African ancestry and a slightly lower percentage of European ancestry than previously reported (Tian, et al., 2006; Halder, et al., 2008; Halder, et al., 2009; Kosoy, et al., 2009; Zakharia, et al., 2009). For U.S. Hispanics, the Native American component was rather low compared with previous studies (Price, et al., 2007; Halder, et al., 2009; Kosoy, et al., 2009). Regional differences in admixture history within the U.S., as reported elsewhere (Salzano and Bortolini, 2002; Kittles and Weiss, 2003; Zakharia, et al., 2009), are likely to explain these discrepancies. This view is also supported by the considerable heterogeneity in continental genetic ancestry that we observed within these two U.S. American groups depending on the U.S. region where individuals were sampled.
An alternative explanation in the case of U.S. Hispanics could be that our set of autosomal ASMs lacks the power to distinguish Native American from East Asian ancestry (which would also explain the apparent small Native American ancestry component in U.S. Asians). Native Americans and East Asians are genetically close because of their shared population history (Jakobsson, et al., 2008; Li, et al., 2008). Repeating the STRUCTURE analysis for U.S. Hispanics without East Asians as a parental population raised the Native American ancestry component to 27.44%, a value more comparable to previous studies. However, some self-declared U.S. Hispanic individuals carried NRY haplogroups typical of East Asians, and a previous study also detected Asian ancestry in U.S. Hispanics (Guthery, et al., 2007); both observations indicate that excluding East Asian admixture a priori would be incorrect when estimating genetic ancestry in U.S. Hispanics. Ancestry estimates obtained here with uniparentally inherited markers agree well with previous NRY (Kayser, et al., 2003; Hammer, et al., 2006; Lind, et al., 2007) and mtDNA (Allard, et al., 2002; Allard, et al., 2004; Allard, et al., 2005) studies of U.S. Europeans, U.S. Africans and U.S. Hispanics. In contrast, the percentage of Native American mtDNA ancestry estimated in the U.S. Hispanics studied here appears smaller than in other studies (ranging from ∼70% to ∼85.11%) (Merriwether, et al., 1997; Allard, et al., 2006), although we observed differences between U.S. Hispanic groups from different U.S. regions, which may explain the discrepancies.
Combining the ancestry information from patrilineal, matrilineal and biparental markers, a particular strength of our study, makes it possible to study the patterns of admixture at different levels of complexity.
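The point about Native American versus East Asian ancestry can be illustrated with a toy model. The sketch below is not STRUCTURE: it swaps in a simpler supervised estimator that fits group-level admixture proportions by expectation-maximization (EM) with fixed parental allele frequencies. The function `estimate_admixture`, the six loci, all allele frequencies and the "true" mixture are hypothetical; the NAM and EAS frequencies are deliberately identical to mimic a marker panel with no power to separate the two sources.

```python
def estimate_admixture(freqs, sources, observed, n_iter=5000):
    """Fit group-level admixture proportions by EM (supervised, fixed frequencies).

    freqs:    dict source -> allele-1 frequency at each locus (treated as known)
    sources:  parental sources to include in the model
    observed: allele-1 frequency at each locus in the admixed group
    Maximises sum_l f_l*log(D_l) + (1 - f_l)*log(1 - D_l), D_l = sum_k q_k*p_kl.
    """
    n_loci = len(observed)
    q = {k: 1.0 / len(sources) for k in sources}  # uniform starting values
    for _ in range(n_iter):
        totals = {k: 0.0 for k in sources}
        for l, f in enumerate(observed):
            d1 = sum(q[k] * freqs[k][l] for k in sources)          # P(allele 1)
            d0 = sum(q[k] * (1.0 - freqs[k][l]) for k in sources)  # P(allele 0)
            for k in sources:
                # E-step: expected share of allele-1 and allele-0 copies from source k
                totals[k] += f * q[k] * freqs[k][l] / d1
                totals[k] += (1.0 - f) * q[k] * (1.0 - freqs[k][l]) / d0
        # M-step: new proportions are the responsibilities averaged over loci
        q = {k: t / n_loci for k, t in totals.items()}
    return q


# Hypothetical allele-1 frequencies at six loci (not the study's ASM panel).
# NAM and EAS are identical, mimicking markers that cannot separate them.
NAM_EAS = [0.2, 0.3, 0.9, 0.7, 0.1, 0.8]
FREQS = {
    "EUR": [0.9, 0.1, 0.2, 0.8, 0.3, 0.6],
    "AFR": [0.1, 0.8, 0.3, 0.2, 0.9, 0.1],
    "NAM": NAM_EAS,
    "EAS": list(NAM_EAS),
}
# Hypothetical "true" mixture used to generate the admixed-group frequencies
TRUE_Q = {"EUR": 0.6, "AFR": 0.1, "NAM": 0.2, "EAS": 0.1}
OBSERVED = [sum(TRUE_Q[k] * FREQS[k][l] for k in TRUE_Q) for l in range(len(NAM_EAS))]

with_eas = estimate_admixture(FREQS, ["EUR", "AFR", "NAM", "EAS"], OBSERVED)
without_eas = estimate_admixture(FREQS, ["EUR", "AFR", "NAM"], OBSERVED)
print("with EAS:   ", {k: round(v, 3) for k, v in with_eas.items()})
print("without EAS:", {k: round(v, 3) for k, v in without_eas.items()})
```

With these made-up inputs, the four-source fit recovers the combined NAM + EAS share of 0.3 but splits it evenly between the two indistinguishable sources (the split only mirrors the uniform starting values), whereas dropping EAS assigns the whole share to NAM. Real marker panels separate the two sources partially, so the shift in an actual reanalysis would be smaller and data-dependent.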
For self-identified U.S. Europeans and U.S. Asians we observed the same degree of ancestry homogeneity in all three types of genetic markers, which suggests relatively little genetic admixture with ancestry groups other than the one indicated by self-declaration. Notably, this finding for U.S. Europeans contrasts with common observations in self-declared European Americans from South America (Goncalves, et al., 2007; Corach, et al., 2010). In those South American groups, European ancestry signals are usually high for NRY DNA, intermediate for autosomal DNA and low for mtDNA, whereas Native American ancestry signals show the reverse pattern, indicating sex-biased admixture between mostly European men and mostly Native American women (Goncalves, et al., 2007; Corach, et al., 2010). This difference between European Americans from North and South America has been explained by local differences in social practices (Goncalves, et al., 2007). However, it could also arise if ancestry self-identification has different meanings depending on the region of residence. This interpretation is supported by the fact that the genetic admixture proportions of self-identified U.S. Hispanics in our study resemble those of self-declared European Americans in some South American countries, with similar evidence for a sex-biased admixture history. Our data also indicate sex-biased admixture for U.S. Africans, with considerably more European NRY than mtDNA ancestry and autosomal DNA estimates in between. Previous studies analyzing NRY and mtDNA ancestry in U.S. Africans have reported similar results (Kayser, et al., 2003; Lind, et al., 2007; see Stefflova, et al., 2009 for a review), which we complement here with concordant autosomal DNA evidence. Why did we (and others) not detect similarly strong signals of genetic admixture in U.S. Europeans, in contrast to U.S. Africans and U.S. Hispanics?
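The reasoning that links the three marker systems can be made concrete under a deliberately simple model, which is an assumption here and not the model used in this study: a single admixture pulse in which a source contributes a fraction m of the founding males and a fraction f of the founding females, followed by random mating. In expectation (ignoring drift, which is strong for uniparental markers), NRY then tracks m, mtDNA tracks f, and autosomal ancestry is (m + f)/2, which is why autosomal estimates fall between the two uniparental ones. The function names, the 0.05 threshold and the input fractions are hypothetical.

```python
def expected_autosomal(m_nry, f_mtdna):
    """Expected autosomal ancestry from one source under a single-pulse model.

    m_nry:   fraction of founding males from the source (tracked by NRY)
    f_mtdna: fraction of founding females from the source (tracked by mtDNA)
    Each autosome is inherited equally from both parents, so the expectation is
    the simple mean and stays constant under later random mating.
    """
    return (m_nry + f_mtdna) / 2


def sex_bias(m_nry, f_mtdna, tol=0.05):
    """Label the direction of sex bias; `tol` is an arbitrary, hypothetical cutoff."""
    diff = m_nry - f_mtdna
    if diff > tol:
        return "male-biased"
    if diff < -tol:
        return "female-biased"
    return "no clear bias"


def autosomal_in_between(m_nry, f_mtdna, autosomal):
    """True if an observed autosomal estimate lies between the uniparental ones."""
    return min(m_nry, f_mtdna) <= autosomal <= max(m_nry, f_mtdna)


# Hypothetical European-ancestry fractions for an admixed group
nry, mt, auto = 0.80, 0.30, 0.55
print(expected_autosomal(nry, mt))  # mean of the two uniparental fractions
print(sex_bias(nry, mt))            # 'male-biased' for these inputs
print(autosomal_in_between(nry, mt, auto))
```

Applied to European ancestry in the South American groups described above (high NRY, low mtDNA), this sketch labels the pattern male-biased; continuous gene flow or assortative mating would break the simple (m + f)/2 relation.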
One explanation may be that admixed individuals traditionally self-classify in a biased way, towards only one of the parental groups involved in the admixture process. Ancestry self-identification results from visible traits with a biological basis, such as skin color, combined with cultural and sociological factors (Bamshad and Guthery, 2007). In the present study, rs1448484 appeared to be more informative, and rs16891982 less informative, for differentiating U.S. Africans from continental Europeans than for differentiating continental Africans from continental Europeans. In contrast, rs16891982 was more informative for differentiating U.S. Europeans from continental Africans than for differentiating continental Europeans from continental Africans. The SNP rs1448484 is located within the OCA2 gene, mutations in which can cause oculocutaneous albinism type II (MIM# 203200); in addition, it has previously been associated with pigmentation differences in a pooled sample of U.S. African / African-Caribbean and U.S. European individuals (Shriver, et al., 2003). However, there is so far no evidence that rs1448484 is directly involved in pigmentation variation, although it could be in linkage disequilibrium (LD) with a functional OCA2 variant. In contrast, rs16891982 is a non-synonymous variant (F374L) in SLC45A2, a gene whose mutations cause oculocutaneous albinism type IV (MIM# 606574). Notably, the SLC45A2 374F allele of rs16891982 is almost fixed in Europeans (Soejima and Koda, 2007) and affects the amount of pigmentation (Stokowski, et al., 2007). Individuals carrying the SLC45A2 374L/L or 374L/F genotypes tend to have darker skin than 374F/F individuals (Cook, et al., 2009). Here we hypothesize that, among self-identified U.S. Europeans and U.S. Africans, individuals with the L/L or F/L genotypes would tend to declare themselves U.S. African, whereas F/F individuals would tend to declare themselves U.S. European. In that case, the presence of heterozygotes among U.S. Africans would lower the I statistic relative to continental Europeans more than expected, and the I statistic between U.S. Europeans and continental Africans would increase, as observed in our data. Although our data provide genetic evidence for the role of skin color in the complex process of ancestry self-identification, it would be extremely simplistic to reduce ancestry self-identification solely to the type of analysis performed here.
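The section does not define the I statistic. Assuming it is an informativeness-for-assignment measure such as Rosenberg et al.'s (2003) I_n (an assumption), the heterozygote argument can be checked numerically. The allele frequencies below are hypothetical placeholders, not the study's data; they only encode "F almost fixed in Europeans" and the hypothesized shifts in the two U.S. groups. For two populations, I_n equals the Jensen-Shannon divergence between their allele-frequency distributions.

```python
import math


def informativeness(freqs_by_pop):
    """Informativeness for assignment, I_n (Rosenberg et al. 2003), at one locus.

    freqs_by_pop: one allele-frequency list per population (each sums to 1).
    I_n = sum_j [ -pbar_j*ln(pbar_j) + sum_i (p_ij / K)*ln(p_ij) ]
    """
    k = len(freqs_by_pop)
    total = 0.0
    for j in range(len(freqs_by_pop[0])):
        column = [pop[j] for pop in freqs_by_pop]
        pbar = sum(column) / k
        if pbar > 0:
            total -= pbar * math.log(pbar)
        # 0 * ln(0) is taken as 0
        total += sum(p * math.log(p) for p in column if p > 0) / k
    return total


def biallelic(p_f):
    """Frequencies of the [F, L] alleles given the F-allele frequency."""
    return [p_f, 1.0 - p_f]


# Hypothetical 374F allele frequencies at rs16891982 (placeholders, not study data)
CONT_EUR = 0.97  # "almost fixed" in Europeans
CONT_AFR = 0.03
US_AFR = 0.15    # extra F alleles from hypothesized F/L heterozygotes
US_EUR = 0.99    # L alleles depleted if L carriers self-declare as African

i_cont = informativeness([biallelic(CONT_AFR), biallelic(CONT_EUR)])
i_us_afr = informativeness([biallelic(US_AFR), biallelic(CONT_EUR)])
i_us_eur = informativeness([biallelic(US_EUR), biallelic(CONT_AFR)])
print(f"continental AFR vs continental EUR: {i_cont:.3f}")
print(f"U.S. AFR vs continental EUR:        {i_us_afr:.3f}")  # lower than i_cont
print(f"U.S. EUR vs continental AFR:        {i_us_eur:.3f}")  # higher than i_cont
```

Because I_n shrinks as two frequency distributions converge, extra F alleles among U.S. Africans lower it relative to continental Europeans, while depleting L alleles among U.S. Europeans raises it relative to continental Africans, which is the direction the hypothesis predicts.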
  66 in total

1.  Inference of population structure using multilocus genotype data.

Authors:  J K Pritchard; M Stephens; P Donnelly
Journal:  Genetics       Date:  2000-06       Impact factor: 4.562

2.  A human genome diversity cell line panel.

Authors:  Howard M Cann; Claudia de Toma; Lucien Cazes; Marie-Fernande Legrand; Valerie Morel; Laurence Piouffre; Julia Bodmer; Walter F Bodmer; Batsheva Bonne-Tamir; Anne Cambon-Thomsen; Zhu Chen; J Chu; Carlo Carcassi; Licinio Contu; Ruofu Du; Laurent Excoffier; G B Ferrara; Jonathan S Friedlaender; Helena Groot; David Gurwitz; Trefor Jenkins; Rene J Herrera; Xiaoyi Huang; Judith Kidd; Kenneth K Kidd; Andre Langaney; Alice A Lin; S Qasim Mehdi; Peter Parham; Alberto Piazza; Maria Pia Pistillo; Yaping Qian; Qunfang Shu; Jiujin Xu; S Zhu; James L Weber; Henry T Greely; Marcus W Feldman; Gilles Thomas; Jean Dausset; L Luca Cavalli-Sforza
Journal:  Science       Date:  2002-04-12       Impact factor: 47.728

3.  Y chromosome STR haplotypes and the genetic structure of U.S. populations of African, European, and Hispanic ancestry.

Authors:  Manfred Kayser; Silke Brauer; Hiltrud Schädlich; Mechthild Prinz; Mark A Batzer; Peter A Zimmerman; B A Boatin; Mark Stoneking
Journal:  Genome Res       Date:  2003-04       Impact factor: 9.043

4.  The human Y chromosome: an evolutionary marker comes of age.

Authors:  Mark A Jobling; Chris Tyler-Smith
Journal:  Nat Rev Genet       Date:  2003-08       Impact factor: 53.242

5.  Control region sequences for East Asian individuals in the Scientific Working Group on DNA Analysis Methods forensic mtDNA data set.

Authors:  Marc W Allard; Mark R Wilson; Keith L Monson; Bruce Budowle
Journal:  Leg Med (Tokyo)       Date:  2004-03       Impact factor: 1.376

6.  AutoDimer: a screening tool for primer-dimer and hairpin structures.

Authors:  Peter M Vallone; John M Butler
Journal:  Biotechniques       Date:  2004-08       Impact factor: 1.993

7.  Mitochondrial versus nuclear admixture estimates demonstrate a past history of directional mating.

Authors:  D A Merriwether; S Huston; S Iyengar; R Hamman; J M Norris; S M Shetterly; M I Kamboh; R E Ferrell
Journal:  Am J Phys Anthropol       Date:  1997-02       Impact factor: 2.868

8.  Characterizing the admixed African ancestry of African Americans.

Authors:  Fouad Zakharia; Analabha Basu; Devin Absher; Themistocles L Assimes; Alan S Go; Mark A Hlatky; Carlos Iribarren; Joshua W Knowles; Jun Li; Balasubramanian Narasimhan; Steven Sidney; Audrey Southwick; Richard M Myers; Thomas Quertermous; Neil Risch; Hua Tang
Journal:  Genome Biol       Date:  2009-12-22       Impact factor: 13.583

9.  Analysis of cultured human melanocytes based on polymorphisms within the SLC45A2/MATP, SLC24A5/NCKX5, and OCA2/P loci.

Authors:  Anthony L Cook; Wei Chen; Amy E Thurber; Darren J Smit; Aaron G Smith; Timothy G Bladen; Darren L Brown; David L Duffy; Lorenza Pastorino; Giovanna Bianchi-Scarra; J Helen Leonard; Jennifer L Stow; Richard A Sturm
Journal:  J Invest Dermatol       Date:  2008-07-24       Impact factor: 8.551

10.  Evaluation of group genetic ancestry of populations from Philadelphia and Dakar in the context of sex-biased admixture in the Americas.

Authors:  Klara Stefflova; Matthew C Dulik; Athma A Pai; Amy H Walker; Charnita M Zeigler-Johnson; Serigne M Gueye; Theodore G Schurr; Timothy R Rebbeck
Journal:  PLoS One       Date:  2009-11-25       Impact factor: 3.240
