Joshua Mark Galanter, Juan Carlos Fernandez-Lopez, Christopher R Gignoux, Jill Barnholtz-Sloan, Ceres Fernandez-Rozadilla, Marc Via, Alfredo Hidalgo-Miranda, Alejandra V Contreras, Laura Uribe Figueroa, Paola Raska, Gerardo Jimenez-Sanchez, Irma Silva Zolezzi, Maria Torres, Clara Ruiz Ponte, Yarimar Ruiz, Antonio Salas, Elizabeth Nguyen, Celeste Eng, Lisbeth Borjas, William Zabala, Guillermo Barreto, Fernando Rondón González, Adriana Ibarra, Patricia Taboada, Liliana Porras, Fabián Moreno, Abigail Bigham, Gerardo Gutierrez, Tom Brutsaert, Fabiola León-Velarde, Lorna G Moore, Enrique Vargas, Miguel Cruz, Jorge Escobedo, José Rodriguez-Santana, William Rodriguez-Cintrón, Rocio Chapela, Jean G Ford, Carlos Bustamante, Daniela Seminara, Mark Shriver, Elad Ziv, Esteban Gonzalez Burchard, Robert Haile, Esteban Parra, Angel Carracedo.
Abstract
Most individuals throughout the Americas are admixed descendants of Native American, European, and African ancestors. Complex historical factors have resulted in varying proportions of ancestral contributions between individuals within and among ethnic groups. We developed a panel of 446 ancestry informative markers (AIMs) optimized to estimate ancestral proportions in individuals and populations throughout Latin America. We used genome-wide data from 953 individuals from diverse African, European, and Native American populations to select AIMs optimized for each of the three main continental populations that form the basis of modern Latin American populations. We selected markers on the basis of locus-specific branch length to be informative, well distributed throughout the genome, capable of being genotyped on widely available commercial platforms, and applicable throughout the Americas by minimizing within-continent heterogeneity. We then validated the panel in samples from four admixed populations by comparing ancestry estimates based on the AIMs panel to estimates based on genome-wide association study (GWAS) data. The panel provided balanced discriminatory power among the three ancestral populations and accurate estimates of individual ancestry proportions (R² > 0.9 for ancestral components with significant between-subject variance). Finally, we genotyped samples from 18 populations from Latin America using the AIMs panel and estimated variability in ancestry within and between these populations. This panel and its reference genotype information will be useful resources to explore population history of admixture in Latin America and to correct for the potential effects of population stratification in admixed samples in the region.
Year: 2012 PMID: 22412386 PMCID: PMC3297575 DOI: 10.1371/journal.pgen.1002554
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Characteristics of the AIMs panel.
| Population | Number of AIMs | Cumulative LSBL Fst | Cumulative LSBL In | LSBL Fst (mean ± sd; median, 25th:75th percentile) | LSBL In (mean ± sd; median, 25th:75th percentile) |
| African | 115 | 73.0 | 43.8 | 0.64 ± 0.05; 0.63, 0.61: 0.66 | 0.38 ± 0.03; 0.37, 0.36: 0.40 |
| European | 202 | 77.9 | 44.0 | 0.39 ± 0.05; 0.37, 0.35: 0.41 | 0.22 ± 0.03; 0.21, 0.20: 0.23 |
| Native American | 129 | 74.5 | 44.0 | 0.58 ± 0.05; 0.56, 0.54: 0.61 | 0.34 ± 0.03; 0.33, 0.32: 0.36 |
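The abstract states that markers were selected on the basis of locus-specific branch length (LSBL). As a minimal sketch, assuming the usual three-population definition in which the branch for one population is half of the two pairwise distances involving it minus the distance between the other two, LSBL at a single locus could be computed as below. The function name `lsbl` and the pairwise Fst values are hypothetical and are not taken from the paper.

```python
def lsbl(d_ab: float, d_ac: float, d_bc: float) -> dict:
    """Locus-specific branch lengths for populations A, B and C.

    Inputs are pairwise distances (for example Fst) at one locus.
    Assumed definition: LSBL(A) = (d_AB + d_AC - d_BC) / 2, and likewise
    for B and C by symmetry.
    """
    return {
        "A": (d_ab + d_ac - d_bc) / 2,
        "B": (d_ab + d_bc - d_ac) / 2,
        "C": (d_ac + d_bc - d_ab) / 2,
    }


# Hypothetical pairwise Fst values at one SNP
# (A = African, B = European, C = Native American).
branches = lsbl(d_ab=0.70, d_ac=0.75, d_bc=0.15)
print(branches)
```

Under this definition the branch lengths of any two populations sum to the pairwise distance between them, so a marker with a long branch for one population and short branches for the other two is informative mainly for that one population.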
Figure 1. Bland-Altman plots showing the error in individual ancestry estimates from AIMs relative to ancestry estimates from GWAS data.
The x-axis shows the ancestry estimate using GWAS data; the y-axis shows the difference in estimates between GWAS and AIMs data using the 425 AIMs genotyped in the GALA Mexican and Puerto Rican samples, 314 AIMs for the Mexico City sample, and 398 AIMs for the MGDP-INMEGEN sample.
Validation of the AIMs panel compared to ancestry estimates using GWAS data.
| Sample Ancestry | Mean ancestry estimate (with GWAS) | Correlation R² | Mean error ± sd | Mean discordance | Root mean square error |
|
| |||||
| Native American | 0.642 | 0.968 | −0.005 (±0.032) | 0.025 | 0.032 |
| European | 0.324 | 0.956 | −0.010 (±0.034) | 0.028 | 0.036 |
| African | 0.035 | 0.555 | 0.015 (±0.025) | 0.023 | 0.029 |
|
| |||||
| Native American | 0.544 | 0.966 | 0.009 (±0.031) | 0.025 | 0.032 |
| European | 0.402 | 0.964 | −0.022 (±0.031) | 0.031 | 0.038 |
| African | 0.054 | 0.722 | 0.012 (±0.023) | 0.020 | 0.026 |
|
| |||||
| Native American | 0.496 | 0.972 | 0.002 (±0.029) | 0.023 | 0.029 |
| European | 0.458 | 0.967 | −0.027 (±0.031) | 0.033 | 0.041 |
| African | 0.046 | 0.558 | 0.026 (±0.025) | 0.029 | 0.035 |
|
| |||||
| Native American | 0.124 | 0.603 | 0.027 (±0.029) | 0.033 | 0.040 |
| European | 0.670 | 0.914 | −0.059 (±0.034) | 0.060 | 0.068 |
| African | 0.206 | 0.942 | 0.032 (±0.030) | 0.036 | 0.044 |
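The columns of this table can be illustrated with a short sketch. It assumes that the error is the AIMs estimate minus the GWAS estimate (the source does not state the sign convention), that "mean discordance" is the mean absolute difference, and that R² is the squared Pearson correlation. The function name and the input values are hypothetical.

```python
import math


def validation_metrics(gwas, aims):
    """Compare AIMs-based ancestry estimates with GWAS-based estimates."""
    n = len(gwas)
    # Assumed sign convention: AIMs estimate minus GWAS estimate.
    errors = [a - g for g, a in zip(gwas, aims)]
    mean_error = sum(errors) / n
    # Assumed meaning of "mean discordance": mean absolute difference.
    mean_discordance = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Squared Pearson correlation between the two sets of estimates.
    mean_g, mean_a = sum(gwas) / n, sum(aims) / n
    cov = sum((g - mean_g) * (a - mean_a) for g, a in zip(gwas, aims))
    var_g = sum((g - mean_g) ** 2 for g in gwas)
    var_a = sum((a - mean_a) ** 2 for a in aims)
    r2 = cov * cov / (var_g * var_a)
    return {
        "mean_error": mean_error,
        "mean_discordance": mean_discordance,
        "rmse": rmse,
        "r2": r2,
    }


# Hypothetical ancestry proportions for four individuals.
gwas = [0.60, 0.70, 0.50, 0.65]
aims = [0.62, 0.68, 0.53, 0.64]
metrics = validation_metrics(gwas, aims)
print(metrics)
```

Because R² depends on how much the GWAS estimates vary between subjects, a component with little between-subject variance can show a low R² even when its root mean square error is small, as in the African rows with R² near 0.56 and root mean square error near 0.03.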
Performance of nested subsets of AIMs.
| Sample | Correlation R² | Mean error | Mean discordance | RMSE |
|
| ||||
| Native American | 0.97 | −0.005 | 0.025 | 0.032 |
| European | 0.96 | −0.010 | 0.028 | 0.036 |
| African | 0.56 | 0.015 | 0.023 | 0.029 |
|
| ||||
| Native American | 0.54 | 0.025 | 0.034 | 0.042 |
| European | 0.89 | −0.061 | 0.063 | 0.072 |
| African | 0.92 | 0.035 | 0.041 | 0.049 |
|
| ||||
| Native American | 0.95 | −0.005 | 0.031 | 0.039 |
| European | 0.94 | −0.011 | 0.033 | 0.042 |
| African | 0.48 | 0.016 | 0.026 | 0.034 |
|
| ||||
| Native American | 0.43 | 0.025 | 0.034 | 0.042 |
| European | 0.85 | −0.060 | 0.063 | 0.072 |
| African | 0.89 | 0.035 | 0.044 | 0.053 |
|
| ||||
| Native American | 0.92 | −0.006 | 0.040 | 0.051 |
| European | 0.89 | −0.014 | 0.044 | 0.056 |
| African | 0.35 | 0.020 | 0.034 | 0.044 |
|
| ||||
| Native American | 0.27 | 0.035 | 0.052 | 0.064 |
| European | 0.72 | −0.067 | 0.077 | 0.093 |
| African | 0.77 | 0.032 | 0.055 | 0.069 |
|
| ||||
| Native American | 0.85 | −0.011 | 0.056 | 0.070 |
| European | 0.80 | −0.016 | 0.061 | 0.076 |
| African | 0.21 | 0.027 | 0.044 | 0.059 |
|
| ||||
| Native American | 0.14 | 0.038 | 0.069 | 0.086 |
| European | 0.56 | −0.086 | 0.101 | 0.123 |
| African | 0.64 | 0.049 | 0.076 | 0.096 |
|
| ||||
| Native American | 0.76 | −0.011 | 0.075 | 0.094 |
| European | 0.69 | −0.027 | 0.081 | 0.103 |
| African | 0.14 | 0.038 | 0.059 | 0.081 |
|
| ||||
| Native American | 0.10 | 0.041 | 0.086 | 0.108 |
| European | 0.39 | −0.099 | 0.125 | 0.156 |
| African | 0.48 | 0.057 | 0.101 | 0.127 |
Figure 2. Performance of nested subsets of AIMs.
Ancestry of Latin American populations.
| Population | Country | Sample size | Native American Ancestry | European Ancestry | African Ancestry |
|
| Colombia (Southern) | 22 | 0.80, 0.57: 0.87 | 0.17, 0.12: 0.37 | 0.02, 0.0: 0.05 |
|
| Colombia (Central) | 19 | 0.86, 0.83: 0.89 | 0.09, 0.07: 0.13 | 0.02, 0.01: 0.05 |
|
| Colombia (Southern) | 36 | 0.83, 0.64: 0.87 | 0.16, 0.12: 0.31 | 0.02, 0.0: 0.04 |
|
| Venezuela (Amazon) | 20 | 1.00, 1.00: 1.00 | 0.0, 0.0: 0.0 | 0.0, 0.0: 0.0 |
|
| Venezuela (Amazon) | 20 | 1.00, 1.00: 1.00 | 0.0, 0.0: 0.0 | 0.0, 0.0: 0.0 |
|
| Venezuela (Amazon) | 20 | 0.99, 0.97: 1.00 | 0.01, 0.0: 0.02 | 0.0, 0.0: 0.01 |
|
| Venezuela (North) | 20 | 0.97, 0.84: 0.99 | 0.02, 0.0: 0.08 | 0.02, 0.0: 0.03 |
|
| Argentina | 14 | 0.41, 0.12: 0.84 | 0.54, 0.13: 0.81 | 0.05, 0.01: 0.08 |
|
| Venezuela | 20 | 0.28, 0.25: 0.36 | 0.60, 0.44: 0.62 | 0.12, 0.11: 0.15 |
|
| Chile | 20 | 0.46, 0.37: 0.50 | 0.51, 0.43: 0.55 | 0.05, 0.03: 0.07 |
|
| Chile | 20 | 0.51, 0.43: 0.55 | 0.45, 0.38: 0.53 | 0.06, 0.03: 0.08 |
|
| Colombia | 19 | 0.39, 0.35: 0.46 | 0.52, 0.48: 0.56 | 0.06, 0.04: 0.08 |
|
| Bolivia | 11 | 0.99, 0.98: 1.00 | 0.0, 0.0: 0.02 | 0.0, 0.0: 0.01 |
|
| Colombia | 35 | 0.13, 0.10: 0.18 | 0.10, 0.07: 0.16 | 0.76, 0.64: 0.83 |
|
| Colombia | 28 | 0.18, 0.12: 0.26 | 0.25, 0.19: 0.20 | 0.54, 0.46: 0.69 |
|
| Bolivia | 10 | 0.94, 0.78: 0.96 | 0.04, 0.03: 0.22 | 0.01, 0.0: 0.03 |
|
| Bolivia | 12 | 0.90, 0.86: 0.95 | 0.09, 0.05: 0.13 | 0.0, 0.0: 0.01 |
|
| Bolivia | 27 | 0.25, 0.13: 0.97 | 0.03, 0.0: 0.05 | 0.70, 0.01: 0.82 |
Ancestries are given in median and 25th:75th percentiles.
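As a minimal sketch of how one cell in this format could be produced, the snippet below summarizes a list of individual ancestry proportions as "median, 25th: 75th percentile". The quartile method (`inclusive`) is an assumption, since the source does not say how percentiles were computed, and the input values are hypothetical.

```python
import statistics


def summarize(values):
    """Format ancestry proportions as 'median, 25th: 75th percentile'."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"{median:.2f}, {q1:.2f}: {q3:.2f}"


# Hypothetical Native American ancestry proportions for five individuals.
native_american = [0.80, 0.85, 0.90, 0.95, 1.00]
summary = summarize(native_american)
print(summary)
```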
Figure 3. Ancestry estimates of Latin American populations.
Figure 4. Time since admixture for Mestizo and African descendent populations.
Ancestral populations used for this study.
| Population | Designation | Sample size | Platform(s) |
|
| CEU | 56 | Affymetrix 6.0/Illumina 1M |
|
| TSI | 44 | Affymetrix 6.0/Illumina 1M |
|
| SPAIN | 619 | Affymetrix 6.0 |
|
| YRI | 53 | Affymetrix 6.0/Illumina 1M |
|
| LWK | 50 | Affymetrix 6.0/Illumina 1M |
|
| AYMARA | 25 | Affymetrix 6.0 |
|
| QUECHUA | 24 | Affymetrix 6.0 |
|
| NAHUA | 14 | Affymetrix 6.0 |
|
| MAYAS | 25 | Affymetrix 500K/Illumina 550K |
|
| TEPHUANOS | 22 | Affymetrix 500K/Illumina 550K |
|
| ZAPOTECAS | 21 | Affymetrix 500K/Illumina 550K |
Figure 5. Algorithm for selecting AIMs.
Samples used for validation.
| Population | Ethnicity | Sample size | Platform(s) |
|
| Mexican | 668 | Affymetrix 6.0 |
|
| Puerto Rican | 803 | Affymetrix 6.0 |
|
| Mexican | 312 | Affymetrix 500K+Illumina 550K |
|
| Mexican | 1310 | Affymetrix 5.0 |
Latin American populations genotyped in stage III of this study.
| Population | Country | Ethnicity | Sample size |
|
| Colombia (Southern) | Indigenous | 22 |
|
| Colombia (Central) | Indigenous | 19 |
|
| Colombia (Southern) | Indigenous | 36 |
|
| Venezuela (Amazon) | Indigenous | 20 |
|
| Venezuela (Amazon) | Indigenous | 20 |
|
| Venezuela (Amazon) | Indigenous | 20 |
|
| Venezuela (North) | Indigenous | 20 |
|
| Argentina | Indigenous | 14 |
|
| Venezuela | Mestizo (admixed) | 20 |
|
| Chile | Mestizo (admixed) | 20 |
|
| Chile | Mestizo (admixed) | 20 |
|
| Colombia | Mestizo (admixed) | 19 |
|
| Bolivia | Mestizo (admixed) | 11 |
|
| Colombia | Afro-Colombian | 35 |
|
| Colombia | Afro-Colombian | 28 |
|
| Bolivia | Multi-ethnic (Mestizo and Indigenous) | 10 |
|
| Bolivia | Multi-ethnic (Mestizo and Indigenous) | 12 |
|
| Bolivia | Multi-ethnic (Indigenous, Afro-Bolivian) | 27 |
Figure 6. Origin of samples used in this study.
Labels in purple correspond to the Native American ancestral populations, labels in red to the validation samples, and labels in black to the 18 populations from throughout the Americas. MGDP-INMEGEN samples were collected throughout Mexico (see Figure S1). GALA Mexico samples were also collected in the San Francisco Bay Area, CA. GALA Puerto Rico samples were also collected in New York, NY.