Josyf C Mychaleckyj1,2, Alexandre Havt3,4, Uma Nayak1, Relana Pinkerton5, Emily Farber1, Patrick Concannon6,7, Aldo A Lima3,4, Richard L Guerrant5.
Abstract
Despite Brazil's population, geographic size, and emerging economic importance, disproportionately little genome-scale research exists on the genetic factors that predispose Brazilians to disease, or on the population genetics of risk. After identification of suitable proxy populations and careful analysis of tri-continental admixture in 1,538 North-Eastern Brazilians to estimate individual ancestry and ancestral allele frequencies, we computed 400,000 genome-wide locus-specific branch length (LSBL) Fst statistics of Brazilian Amerindian ancestry compared to European and African; and a similar set of differentiation statistics for their Amerindian component compared with the closest Asian 1000 Genomes population (surprisingly, Bengalis in Bangladesh). After ranking SNPs by these statistics, we identified the top 10 highly differentiated SNPs in five genome regions in the LSBL tests of Brazilian Amerindian ancestry compared to European and African; and the top 10 SNPs in eight regions comparing their Amerindian component to the closest Asian 1000 Genomes population. We found that SNPs within or proximal to the genes CIITA (rs6498115), SMC6 (rs1834619), and KLHL29 (rs2288697) were most differentiated in the Amerindian-specific branch, while SNPs in the genes ADAMTS9 (rs7631391), DOCK2 (rs77594147), SLC28A1 (rs28649017), ARHGAP5 (rs7151991), and CIITA (rs45601437) were most highly differentiated in the Asian comparison. These genes are known to influence immune function, metabolic and anthropometric traits, and embryonic development. These analyses have identified candidate genes for selection within Amerindian ancestry and, by comparison of the two analyses, those for which the differentiation may have arisen during the migration from Asia to the Americas.
Keywords: Brazil; Native American; admixture; ancestry; genetic differentiation; selection
Year: 2017 PMID: 28100790 PMCID: PMC5430616 DOI: 10.1093/molbev/msw249
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Geographical map of Brazil showing the location of Fortaleza, the capital city of Ceará state, and other study centers. The inset shows the location of study recruitment centers in the North-Eastern region of Brazil. The location of Fortaleza is indicated by the yellow star icon, and other locations are color-coded by state: Picos (Piauí, dark blue); Ouricuri (Pernambuco, light blue); Crato (Ceará, green); Cajazeiras, Sousa and Patos (Paraíba, red). For scale, the distance from Fortaleza to Picos (1.) or from Fortaleza to Ouricuri (2.) is approximately 300 miles.
The Six North-Eastern Brazil Studies and Results of the Genome-Wide Genotyping Quality Control (QC)
| | Gonçalves Dias | Mal-ED Birth | Mal-ED Case Control | Recodisa Case Control | PU Zinc-Arginine Trial | PU Zinc Vitamin A Trial | Total |
|---|---|---|---|---|---|---|---|
| Study Type | Birth Cohort | Birth Cohort | Prospective Case-Control | Prospective Case-Control | Randomized Clinical Trial | Randomized Clinical Trial | |
| Location | Fortaleza, Gonçalves Dias Favela | Fortaleza | Fortaleza, IPREDE | 6 Cities in 4 North-Eastern States | Fortaleza, Parque Universitário | Fortaleza, Parque Universitário | |
| Enrollment | 1989–1993 | 2010–2014 | 2010–2014 | 2010–2014 | 2006–2010 | 2000–2006 | |
| Samples Genotyped | 172 | 300 | 368 | 1044 | 126 | 109 | 2,119 |
| Samples Post-Genotyping QC | 110 | 276 | 336 | 658 | 95 | 63 | 1,538 |
| SNP QC | All Cohorts: SNPs | ||||||
| Total SNPs on Affymetrix Axiom LAT-1 Array 4 | 818,154 | ||||||
| Affymetrix SNP QC: SNPs dropped | 62,353 | ||||||
| SNPs Remaining | 755,801 | ||||||
| Call Rate < 99% + MAF | −345,629 | ||||||
| SNPs Remaining | 410,172 | ||||||
The top half of the table shows the number of DNA samples genotyped by study and the number remaining after genome-wide genotyping QC.
The bottom half of the table shows the initial total number of SNPs on the Affymetrix array used and the number remaining after QC. The same SNP results pertain to all studies and are only shown once for clarity.
MAF: Minor Allele Frequency.
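The SNP QC step in the lower half of the table can be illustrated with a minimal sketch, assuming genotypes coded as 0, 1, or 2 copies of allele A1 with `None` for a missing call. The 99% call-rate cutoff comes from the table; the MAF cutoff is not stated there, so `min_maf` is left as a required argument rather than given a default. The function names are hypothetical and this is not the authors' pipeline.

```python
from typing import Optional, Sequence


def call_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of samples with a non-missing genotype call (None = no call)."""
    if not genotypes:
        return 0.0
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """MAF from genotypes coded as 0, 1, or 2 copies of allele A1."""
    calls = [g for g in genotypes if g is not None]
    if not calls:
        return 0.0
    freq_a1 = sum(calls) / (2 * len(calls))
    return min(freq_a1, 1.0 - freq_a1)


def passes_qc(genotypes: Sequence[Optional[int]],
              min_call_rate: float,
              min_maf: float) -> bool:
    """Keep a SNP only if it meets both thresholds.

    min_call_rate would be 0.99 per the table; min_maf is an assumption
    the caller must supply because the table does not state it.
    """
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)
```

A monomorphic SNP fails any positive MAF cutoff, and a SNP with more than 1% missing calls fails the call-rate cutoff regardless of its allele frequency.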
Fig. 2. Genome-wide principal component analysis of the Brazil samples using all 1000 Genomes as race/ethnic reference samples in the component coordinates. The principal components were solely defined by variation in the Brazil samples (shown in green glyphs, BRN group) and the reference samples plotted into these coordinates. Principal Component 1 (PC1) is plotted against 2 (PC2). The 1000 Genomes population descriptions corresponding to the three letter codes are listed in supplementary table S2, Supplementary Material online. Higher principal components (PC3–15) did not define additional ancestral structure at the granularity of the 1000 Genomes populations. European populations cluster at approx. (PC1, PC2) (−0.08, 0.01), Asian populations at (0.02, −0.08) and Indian subcontinent populations at (−0.03, −0.02). African populations and Amerindian populations cluster along the two axes projecting from Europe at the top and bottom of the plot.
Genetic Differentiation (Fst) between the Ancestral Brazilian Admixture Components from an Unsupervised ADMIXTURE Analysis and Reference 1000 Genomes Populations Containing Major African, Latin American, and European Ancestry
| 1KG Code | Population | BRN1 (Afr) | BRN2 (Amr) | BRN3 (Eur) |
|---|---|---|---|---|
| BRN | Brazilians in North-East Brazil | 0.0256 (254, 258) | 0.0192 (191, 193) | 0.0108 (108, 109) |
| Africa | ||||
| ASW | African Ancestry in Southwest US | 0.0086 (086, 088) | 0.0705 | 0.0727 |
| ACB | African Caribbean in Barbados | 0.0157 (156, 158) | 0.0908 | 0.0954 |
| LWK | Luhya in Webuye, Kenya | 0.0240 (238, 242) | – | – |
| YRI | Yoruba in Ibadan | 0.0269 (267, 271) | – | – |
| ESN | Esan in Nigeria | 0.0278 (276, 281) | – | – |
| GWD | Gambian in Western Division | 0.0281 (279, 283) | – | – |
| MSL | Mende in Sierra Leone | 0.0286 (284, 288) | – | – |
| Latin America | ||||
| MXL | Mexican Ancestry in Los Angeles | 0.0632 | 0.0123 (122, 125) | 0.0297 |
| CLM | Colombian in Medellin | 0.0482 | 0.0201 (200, 203) | 0.0110 |
| PEL | Peruvian in Lima | 0.0983 | 0.0220 (218, 222) | 0.0754 |
| PUR | Puerto Rican in Puerto Rico | 0.0396 | 0.0286 (285, 289) | 0.0066 |
| Europe | ||||
| IBS | Iberian in Spain | 0.0631 | 0.0529 | 0.0032 (032, 033) |
| TSI | Toscani in Italy | 0.0643 | 0.0538 | 0.0041 (041, 043) |
| CEU | North/Western European ancestry in Utah | 0.0667 | 0.0527 | 0.0050 (050, 051) |
| GBR | British in England and Scotland | 0.0672 | 0.0532 | 0.0053 (052, 054) |
| FIN | Finnish in Finland | 0.0701 | 0.0518 | 0.0118 (118, 120) |
1KG populations BEB, CDX, CHB, CHS, GIH, ITU, PJL, JPT, KHV, and STU are not shown since they were not in the closest population groups by ranked Fst and were not predominantly of the putative ancestral group.
The 95% confidence interval (CI) from bootstrap percentile (B = 10,000 replicates) is shown for the closest 1KG populations for each inferred component ancestry as last 3 digits only. CIs are not shown for other Fst values. – (dash) indicates Fst > 0.1 (not shown for clarity).
BRN row shows the Fst with the source Brazil samples without segregation of putative ancestral components for comparison.
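The bootstrap percentile intervals in this table can be sketched as follows. The footnote gives only the method (percentile) and the replicate count (B = 10,000). This sketch assumes that the genome-wide Fst is a ratio of summed per-SNP numerators and denominators and that SNPs are the resampling unit; neither assumption is stated in the table, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def ratio_of_sums(nums: Sequence[float], dens: Sequence[float],
                  idx: Sequence[int]) -> float:
    """Genome-wide estimate as sum(numerators) / sum(denominators)."""
    n = sum(nums[i] for i in idx)
    d = sum(dens[i] for i in idx)
    return n / d if d > 0 else 0.0


def bootstrap_percentile_ci(nums: Sequence[float], dens: Sequence[float],
                            replicates: int = 10_000, alpha: float = 0.05,
                            seed: int = 0) -> Tuple[float, float]:
    """Percentile CI from resampling SNPs with replacement (an assumption)."""
    rng = random.Random(seed)
    m = len(nums)
    stats: List[float] = []
    for _ in range(replicates):
        idx = [rng.randrange(m) for _ in range(m)]
        stats.append(ratio_of_sums(nums, dens, idx))
    stats.sort()
    # Read the alpha/2 and 1 - alpha/2 quantiles off the sorted replicates.
    lo = stats[int((alpha / 2) * replicates)]
    hi = stats[min(replicates - 1, int((1 - alpha / 2) * replicates))]
    return lo, hi
```

In the table's notation, an entry such as 0.0256 (254, 258) would correspond to a returned pair of roughly (0.0254, 0.0258).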
Estimated Genetic Differentiation (Fst) between the Brazil Ancestral Admixture Components from a Supervised ADMIXTURE Analysis and Reference 1000 Genomes Populations Containing Major African, Latin American, and European Ancestry
| 1KG Code | Population | BRN1 (Afr) | BRN2 (Amr) | BRN3 (Eur) |
|---|---|---|---|---|
| BRN | Brazilians in North-East Brazil | 0.0837 (832, 843) | 0.0801 (796, 806) | 0.0154 (153, 155) |
| Africa | ||||
| ASW | African Ancestry in Southwest US | 0.0132 (131, 134) | – | 0.0821 |
| ACB | African Caribbean in Barbados | 0.0074 (073, 075) | – | – |
| LWK | Luhya in Webuye, Kenya | 0.0102 (101, 103) | – | – |
| YRI | Yoruba in Ibadan | 0.0040 (039, 040) | – | – |
| ESN | Esan in Nigeria | 0.0036 (035, 036) | – | – |
| GWD | Gambian in Western Division | 0.0107 (106, 108) | – | – |
| MSL | Mende in Sierra Leone | 0.0084 (084, 085) | – | – |
| Latin America | ||||
| MXL | Mexican Ancestry in Los Angeles | – | 0.0375 (372, 377) | 0.0362 |
| CLM | Colombian in Medellin | – | 0.0689 (684, 693) | 0.0148 |
| PEL | Peruvian in Lima | – | 0.0076 (075, 077) | 0.0867 |
| PUR | Puerto Rican in Puerto Rico | – | 0.0911 (905, 917) | 0.0094 |
| Europe | ||||
| IBS | Iberian in Spain | – | – | 0.0013 (012, 013) |
| TSI | Toscani in Italy | – | – | 0.0030 (029, 031) |
| CEU | North/Western European ancestry in Utah | – | – | 0.0039 (038, 040) |
| GBR | British in England and Scotland | – | – | 0.0041 (040, 042) |
| FIN | Finnish in Finland | – | – | 0.0117 (115, 118) |
1KG populations BEB, CDX, CHB, CHS, GIH, ITU, PJL, JPT, KHV, and STU are not shown since they were not in the closest population groups by ranked Fst and were not predominantly of the putative ancestral group.
Analogous to table 2, the supervised ADMIXTURE analysis of the Brazil samples used N = 30 reference sample genome-wide profiles from each of the ancestral proxy groups. The 95% confidence interval from bootstrap percentile (10,000 replicates) is shown for the closest 1KG populations for each inferred component ancestry as last 3 digits only. – (dash) indicates Fst > 0.1 (not shown for clarity). CIs are not shown for other Fst values.
BRN row shows the Fst with the source Brazil samples without segregation of putative ancestral components.
Fig. 3. Genome-wide, Africa-centric, principal component analysis of the Brazil and 1000 Genomes samples. The sample three letter and color codes are identical to figure 2. Genetic variation within the African 1000 Genomes populations (ESN, GWD, LWK, MSL, and YRI) defined (supervised) the principal components, with other samples plotted onto these coordinates. Panel a shows PC1 vs. PC2 and panel b shows PC2 vs. PC3. Principal components higher than three reflected recent kinship within a 1000 Genomes African population, rather than ancestral population structure. Approximate coordinates of the supervising 1000 Genomes African clusters are (PC1, PC2): YRI/ESN (0.0, −0.05); MSL (0.03, 0.001); GWD (0.05, 0.05); LWK (−0.075, 0.04). Nonsupervising samples with majority African ancestry include the more “smeared” (admixed) and less homogeneous ACB (−0.02, 0.0) and ASW (−0.02, 0.02). PC3 segregates the MSL population (negative PC3 coordinate).
Fig. 4. The proportion of continental ancestry within the Brazil samples, estimated using supervised ADMIXTURE analysis. The K = 3 ancestral components are labeled “Eur” predominantly European; “Amr” predominantly Amerindian; “Afr” predominantly African. In each panel, each individual sample along the x-axis is a narrow vertical bar with three color intervals along the y-axis that are proportional to the percentage of the three ancestries and sum to 100% (y = 1.0). In panel a, the Brazil samples are sorted along the x-axis from lowest to highest fraction of Eur ancestry (blue); in panel b, sorted by fraction of African ancestry (grey); in panel c, by fraction of Amerindian ancestry (yellow). The table shows the mean proportion of each ancestry with 95% confidence interval, and range.
The Top 10 Most Highly Differentiated Loci for the Amerindian Admixture Branch within the Brazil Samples Compared with the Putative Ancestral Population of the Three Admixture Components
| CHR | SNP | GENPOS | POS | A1 | A2 | f(Eur) | f(Amr) | f(Afr) | Fst | Nearest Gene |
|---|---|---|---|---|---|---|---|---|---|---|
| 16 | rs6498115 | 28.170 | 10965511 | T | C | 0.000 | 0.908 | 0.000 | 0.908 | |
| 2 | rs1834619 | 39.407 | 17901485 | A | G | 0.041 | 0.942 | 0.000 | 0.899 | |
| 16 | rs77979769 | 28.358 | 11343560 | A | G | 0.073 | 0.949 | 0.035 | 0.883 | |
| 2 | rs2288697 | 47.776 | 23860168 | A | G | 0.029 | 0.906 | 0.018 | 0.877 | |
| 16 | rs35346036 | 28.164 | 10951098 | G | A | 0.065 | 0.957 | 0.088 | 0.872 | CIITA |
| 16 | rs2021760 | 28.358 | 11343992 | G | A | 0.076 | 0.947 | 0.065 | 0.869 | |
| 16 | rs45601437 | 28.180 | 10989754 | A | G | 0.006 | 0.912 | 0.050 | 0.862 | |
| 16 | rs2866065 | 91.637 | 75822042 | A | G | 0.075 | 0.929 | 0.000 | 0.849 | – |
| 16 | rs8054781 | 28.400 | 11384776 | C | T | 0.026 | 0.932 | 0.092 | 0.846 | |
| 15 | rs16964480 | 35.881 | 37284909 | G | T | 0.000 | 0.837 | 0.000 | 0.837 | |
GENPOS is the genetic map position of the marker on a chromosome (CHR) in centiMorgans, POS is the hg19 physical map position, A1 is the reference allele, A2 the alternative.
f(Afr), f(Amr), f(Eur) are the estimated reference allele frequencies for SNP A1 allele in the 3 Brazil ancestral admixture components.
This table shows the top ten ranked loci by Hudson Fst value, where Fst measures the genetic differentiation between the inferred second Amerindian admixture component (Amr) and a single ancestral population of all components. Fst is the Amerindian component branch-specific estimate of genetic differentiation.
Nearest Gene is taken from the RefSeq track in the UCSC genome browser database (http://genome.ucsc.edu; last accessed October 16, 2016). Annotated SNPs are within 100 kb of the nearest gene. A promoter SNP is within 10 kb 5′ to the transcription start site; 3′ downstream SNP is within 10 kb 3′ of the nearest gene; exon and intron are within an exon or intron of the nearest gene.
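The branch-specific Fst column can be approximated from the three tabulated allele frequencies. The sketch below assumes the standard locus-specific branch length construction (half of the two pairwise distances involving the focal group minus the distance between the other two) and the large-sample form of Hudson's Fst without sample-size correction. The source names LSBL and Hudson Fst but does not spell out the estimator, so this is an illustration, not the authors' exact calculation.

```python
def hudson_fst(p1: float, p2: float) -> float:
    """Large-sample Hudson Fst for one biallelic SNP from two allele frequencies."""
    num = (p1 - p2) ** 2
    den = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    # Both groups fixed for the same allele gives 0/0; treat it as no differentiation.
    return num / den if den > 0 else 0.0


def lsbl(p_focal: float, p_a: float, p_b: float) -> float:
    """Branch length leading to the focal group in a three-group tree."""
    return (hudson_fst(p_focal, p_a)
            + hudson_fst(p_focal, p_b)
            - hudson_fst(p_a, p_b)) / 2.0


# rs1834619 row: f(Eur) = 0.041, f(Amr) = 0.942, f(Afr) = 0.000
print(round(lsbl(0.942, 0.041, 0.000), 3))  # 0.899
```

Under these assumptions the rs1834619 and rs2288697 rows reproduce the tabulated 0.899 and 0.877. Rows where f(Eur) and f(Afr) are both zero reduce to f(Amr) itself, as in the first and last rows of the table.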
Fig. 5. Genome regional plots of the most highly differentiated region along the Amerindian branch, centered on SNP rs6498115, chromosome 16. The plots were generated using LocusZoom and show the physical region of chromosome 16, 0.9–20.9 Mb and at higher resolution, 9.9–11.9 Mb. The linkage disequilibrium between SNPs was estimated in LocusZoom using the 1000 Genomes admixed American samples (AMR).
The Top 10 Most Highly Differentiated Loci for the Amerindian Admixture Component in the Brazil Samples vs. the Closest Asian 1KG Population (BEB, Bengalis in Bangladesh)
| CHR | SNP | GENPOS | POS | A1 | A2 | f(Eur) | f(Amr) | f(Afr) | f(BEB) | Fst | Nearest Gene |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | rs7631391 | 88.546 | 64514393 | G | A | 0.002 | 0.950 | 0.113 | 0.058 | 0.885 | |
| 5 | rs77594147 | 178.800 | 169155975 | G | A | 0.079 | 0.878 | 0.328 | 0.017 | 0.857 | |
| 5 | rs73318286 | 178.809 | 169162708 | G | A | 0.042 | 0.879 | 0.297 | 0.029 | 0.843 | |
| 15 | rs28649017 | 89.197 | 85438991 | A | G | 0.353 | 0.150 | 0.515 | 0.983 | 0.827 | |
| 14 | rs7151991 | 30.033 | 32635572 | A | G | 0.148 | 0.950 | 0.158 | 0.116 | 0.821 | |
| 16 | rs45601437 | 28.180 | 10989754 | A | G | 0.006 | 0.912 | 0.050 | 0.081 | 0.816 | |
| 16 | rs6498115 | 28.170 | 10965511 | T | C | 0.000 | 0.908 | 0.000 | 0.081 | 0.811 | |
| 20 | rs6088519 | 57.089 | 33132191 | T | C | 0.298 | 0.966 | 0.249 | 0.163 | 0.791 | |
| 2 | rs4666032 | 50.639 | 28254769 | C | T | 0.000 | 0.827 | 0.000 | 0.029 | 0.788 | |
| 22 | rs117487309 | 49.210 | 41195082 | A | G | 0.039 | 0.786 | 0.000 | 0.000 | 0.786 | SLC25A17, exon |
All other columns are as in table 4, except f(BEB) contains the allele frequency of the A1 reference allele estimated in N = 86 Bengalis in Bangladesh 1KG samples.
This table shows the top ten ranked loci by Hudson Fst value, where Fst measures the genetic differentiation between the Amr Amerindian admixture component and the BEB Bangladesh 1KG population.
Nearest Gene is taken from the RefSeq track in the UCSC genome browser database (http://genome.ucsc.edu; last accessed October 16, 2016). Annotated SNPs are within 100 kb of the nearest gene. A promoter SNP is within 10 kb 5′ to the transcription start site; 3′ downstream SNP is within 10 kb 3′ of the nearest gene; exon and intron are within an exon or intron of the nearest gene. SNP rs28649017 in SLC28A1 is in an exon and intron of different splice forms of the gene transcript.
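Collapsing the ten ranked SNPs into distinct genome regions can be sketched by merging SNPs on the same chromosome that lie within a fixed distance of each other. The 1 Mb gap used here is a hypothetical choice for illustration; the table does not state how regions were delimited. The rsIDs, positions and Fst values are copied from the table rows.

```python
from typing import List, Tuple

Snp = Tuple[str, int, int, float]  # (rsID, chromosome, hg19 position, Fst)

TABLE5: List[Snp] = [
    ("rs7631391", 3, 64514393, 0.885),
    ("rs77594147", 5, 169155975, 0.857),
    ("rs73318286", 5, 169162708, 0.843),
    ("rs28649017", 15, 85438991, 0.827),
    ("rs7151991", 14, 32635572, 0.821),
    ("rs45601437", 16, 10989754, 0.816),
    ("rs6498115", 16, 10965511, 0.811),
    ("rs6088519", 20, 33132191, 0.791),
    ("rs4666032", 2, 28254769, 0.788),
    ("rs117487309", 22, 41195082, 0.786),
]


def group_into_regions(snps: List[Snp], max_gap_bp: int) -> List[List[Snp]]:
    """Merge SNPs on one chromosome whose neighbours are within max_gap_bp."""
    regions: List[List[Snp]] = []
    for snp in sorted(snps, key=lambda s: (s[1], s[2])):
        last = regions[-1][-1] if regions else None
        if last is not None and last[1] == snp[1] and snp[2] - last[2] <= max_gap_bp:
            regions[-1].append(snp)
        else:
            regions.append([snp])
    return regions


# max_gap_bp = 1 Mb is an assumed threshold, not one given in the source.
regions = group_into_regions(TABLE5, max_gap_bp=1_000_000)
lead_snps = [max(region, key=lambda s: s[3]) for region in regions]
print(len(regions))
```

With this gap the ten SNPs fall into eight regions, in line with the abstract. The two chromosome 5 SNPs merge under rs77594147, and the two chromosome 16 SNPs merge under rs45601437.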
Locus-Specific Fst and Allele Frequencies within the 1000 Genomes Populations for the Five Most Differentiated SNPs in Distinct Genome Regions in the Brazil Amerindian Component vs. Closest Asian Population (BEB)
| Population | | | LWK | TSI | BEB | PJL | CHB | JPT | CHS | GIH | KHV | CDX | BRN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N | | | 97 | 107 | 86 | 96 | 103 | 104 | 105 | 101 | 99 | 93 | 1538 |
| SNP CHR:POS | A1 | A2 | Fst LWK | Fst TSI | Fst BEB | Fst PJL | Fst CHB | Fst JPT | Fst CHS | Fst GIH | Fst KHV | Fst CDX | |
| rs7631391 3:64514393 | G | A | 0.858 | 0.950 | 0.885 | 0.909 | 0.634 | 0.685 | 0.682 | 0.889 | 0.671 | 0.694 | |
| rs77594147 5:169155975 | G | A | 0.240 | 0.826 | 0.857 | 0.801 | 0.782 | 0.777 | 0.795 | 0.817 | 0.822 | 0.785 | |
| rs28649017 15:85438991 | A | G | 0.102 | 0.242 | 0.827 | 0.843 | −0.003 | 0.024 | 0.052 | −0.001 | 0.008 | −0.003 | |
| rs7151991 14:32635572 | A | G | 0.813 | 0.754 | 0.821 | 0.749 | 0.794 | 0.791 | 0.766 | 0.677 | 0.749 | 0.707 | |
| rs45601437 16:10989754 | A | G | 0.827 | 0.901 | 0.816 | 0.820 | 0.511 | 0.586 | 0.567 | 0.865 | 0.628 | 0.629 | |
| SNP CHR:POS | A1 | A2 | f(A1) LWK | f(A1) TSI | f(A1) BEB | f(A1) PJL | f(A1) CHB | f(A1) JPT | f(A1) CHS | f(A1) GIH | f(A1) KHV | f(A1) CDX | f(A1) BRN2Amr |
| rs7631391 3:64514393 | G | A | 0.082 | 0.000 | 0.058 | 0.036 | 0.286 | 0.240 | 0.243 | 0.054 | 0.253 | 0.231 | 0.950 |
| rs77594147 5:169155975 | G | A | 0.541 | 0.042 | 0.017 | 0.063 | 0.078 | 0.082 | 0.067 | 0.050 | 0.045 | 0.075 | 0.878 |
| rs28649017 15:85438991 | A | G | 0.356 | 0.500 | 0.983 | 0.995 | 0.141 | 0.245 | 0.291 | 0.173 | 0.101 | 0.145 | 0.150 |
| rs7151991 14:32635572 | A | G | 0.124 | 0.178 | 0.116 | 0.182 | 0.141 | 0.144 | 0.167 | 0.248 | 0.182 | 0.220 | 0.950 |
| rs45601437 16:10989754 | A | G | 0.072 | 0.009 | 0.081 | 0.078 | 0.345 | 0.279 | 0.295 | 0.040 | 0.242 | 0.242 | 0.912 |
The populations are ordered by Africa (LWK), Europe (TSI), Asia (BEB, PJL, CHB, JPT, CHS, GIH, KHV, and CDX), where Asian populations are in decreasing order of similarity to BRN2(Amr) (supplementary table S5, Supplementary Material online). Population codes are LWK (Luhya in Webuye, Kenya); TSI (Toscani in Italy); BEB (Bengalis in Bangladesh); PJL (Punjabis in Lahore, Pakistan); CHB (Han Chinese in Beijing); JPT (Japanese in Tokyo, Japan); CHS (Southern Han Chinese); GIH (Gujarati in Houston, USA); KHV (Kinh in Ho Chi Minh City, Vietnam); CDX (Chinese Dai in Xishuangbanna); BRN2(Amr) (North-Eastern Brazilians, Amerindian admixture component 2).
Number of DNA samples within each population.
SNP-specific Fst value for each population compared with the Brazil Amerindian admixture component, BRN2 (Amr). In the top half of the table, the BEB Fst values are shown in bold since this is the closest Asian population and are the values in table 5. In the lower half of the table, the BEB and BRN2(Amr) columns are in bold since the difference in these frequencies is used in the FstBEB calculation. Other Fst 1KG values in the top half of the table are calculated from the 1KG population and BRN2(Amr) frequencies in the lower half.
Frequency of the A1 allele in each population.
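Several Fst entries in the upper half of the table are slightly negative, which a plain ratio of squared frequency difference to heterozygosity cannot produce. One common explanation is a finite-sample correction in the numerator, as in the widely used sample-size-corrected form of Hudson's estimator sketched below. Whether the authors used exactly this form is an assumption. The number of chromosomes attributed to the Amerindian admixture component (`n2` in the example) is a hypothetical value, since the table does not report one.

```python
def hudson_fst_corrected(p1: float, n1: int, p2: float, n2: int) -> float:
    """Hudson Fst with a finite-sample correction.

    p1, p2: allele frequencies in the two groups.
    n1, n2: numbers of sampled chromosomes (2 x individuals for diploids).
    """
    num = ((p1 - p2) ** 2
           - p1 * (1.0 - p1) / (n1 - 1)
           - p2 * (1.0 - p2) / (n2 - 1))
    den = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return num / den if den > 0 else 0.0


# rs28649017, CHB vs. BRN2(Amr): frequencies 0.141 and 0.150 from the table.
# n1 = 2 * 103 CHB individuals; n2 = 600 is a made-up effective count.
fst_chb = hudson_fst_corrected(0.141, 206, 0.150, 600)
print(fst_chb < 0)
```

When the two frequencies are nearly equal, the sampling-variance terms exceed the squared difference and the estimate drops below zero. As the sample sizes grow, the correction vanishes and the value approaches the uncorrected ratio.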