Israel Aguilar-Ordoñez, Fernando Pérez-Villatoro, Humberto García-Ortiz, Francisco Barajas-Olmos, Judith Ballesteros-Villascán, Ram González-Buenfil, Cristobal Fresno, Alejandro Garcíarrubio, Juan Carlos Fernández-López, Hugo Tovar, Enrique Hernández-Lemus, Lorena Orozco, Xavier Soberón, Enrique Morett.
Abstract
There has been limited study of Native American whole genome diversity to date, which hampers both the effective implementation of personalized medicine and a detailed description of Native American demographic history. Here we report high coverage whole genome sequencing of 76 unrelated individuals from 27 indigenous groups across Mexico, with more than 97% average Native American ancestry. On average, each individual carries 3.26 million single nucleotide variants (SNVs) and short indels, which together comprise a catalog of 9,737,152 variants, 44,118 of which are novel. We report 497 common SNVs (allele frequency > 5%) mapped to drug responses and 316,577 in enhancer or promoter elements; interestingly, some of these enhancer variants lie in PPARG, a nuclear receptor involved in health problems that are highly prevalent in the Mexican population, such as obesity, diabetes, and insulin resistance. By detecting signals of positive selection, we report 24 enriched key pathways under selection, most of them related to immune mechanisms. No missense variants in ACE2, the receptor responsible for the entry of the SARS-CoV-2 virus, were found in any individual. Population genomics and phylogenetic analyses demonstrated stratification along a Northern-Central-Southern axis, with major substructure in the Central region. The Seri, a northern group with the most genetic divergence in our study, showed a distinctive genomic context, with the most novel variants and the most population-specific genotypes. Genome-wide analysis showed that the average haplotype blocks are longer in Native Mexicans than in other world populations. With this dataset we describe previously undetected population-level variation in Native Mexicans, helping to reduce the gap in genomic data representation of such groups.
Year: 2021 PMID: 33831079 PMCID: PMC8031408 DOI: 10.1371/journal.pone.0249773
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. NM sampling.
(a) Each point represents the approximate geographic origin of one of the 76 individuals, and the number in the legend indicates the number of samples per indigenous group. The legend separates individuals from the Northern (N), Central (C) and Southern (S) regions. For the exact GPS coordinates see S1 Table in S2 Material. (b) Total variants (M = millions) for each of the 76 individuals; the y axis enumerates the sampled individuals and is shared with panels c, d, and e; the shape and color of the points correspond to the indigenous groups in the map. (c) Number of singletons (K = thousands) for each sample, inferred from worldwide comparison with gnomAD and the 1000 Genomes Project. (d) Number of novel variants (K = thousands) not registered in dbSNP b152. (e) Percentage of Native American ancestry.
Fig 2. Summary of variant effect annotations in the NM catalog.
All plots depict log10 number of variants. The color legend is shared between panels. (a) Consequences from the full set of SNVs. (b) Consequences from the full set of indels. (c) Consequences in natural selection signals. (d) Consequences in novel SNVs found at an allele frequency > 5%. nc-transcript = noncoding transcript.
NM whole genome variation summary.
| | SNV, rare (AF ≤ 1%) | SNV, low (AF 1–5%) | SNV, common (AF > 5%) | Indel, rare (AF ≤ 1%) | Indel, low (AF 1–5%) | Indel, common (AF > 5%) |
|---|---|---|---|---|---|---|
| | 3,566 | 2,812 | 12,936 | 262 | 269 | 962 |
| | 25,784 | 9,118 | 28,945 | 843 | 342 | 592 |
| | 1,918,765 | 1,128,329 | 4,930,622 | 225,923 | 211,317 | 592,412 |
| Novel | 0 | 34,791 | 314 | 0 | 7,148 | 1,865 |
| Enhancer or promoter | 155,175 | 69,571 | 279,838 | 18,254 | 13,915 | 36,739 |
| | 1,623 | 2,789 | 20,037 | 28 | 58 | 371 |
| Mapped to drug responses | 28 | 66 | 497 | 0 | 0 | 0 |
AF = allele frequency.
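The frequency bins in the table header, and the way the table ties back to the abstract, can be illustrated with a short sketch. It treats the table's entries as whole-number counts. The function and variable names (`af_class`, `row_a`, and so on) are hypothetical and not taken from the paper; the three rows used are those whose entries reproduce counts quoted in the abstract (44,118 novel variants, 316,577 common variants in enhancer or promoter elements, and 497 common SNVs mapped to drug responses).

```python
# Minimal sketch; names are illustrative, not from the paper.

def af_class(af: float) -> str:
    """Bin an allele frequency (0..1) using the table's thresholds."""
    if af <= 0.01:
        return "rare"      # AF <= 1%
    if af <= 0.05:
        return "low"       # 1% < AF <= 5%
    return "common"        # AF > 5%

# 76 diploid individuals correspond to 152 autosomal chromosomes,
# so a variant seen on a single chromosome has AF = 1/152.
N_CHROMOSOMES = 2 * 76

# Column order: SNV rare, SNV low, SNV common, indel rare, indel low, indel common.
row_a = (0, 34_791, 314, 0, 7_148, 1_865)
row_b = (155_175, 69_571, 279_838, 18_254, 13_915, 36_739)
row_c = (28, 66, 497, 0, 0, 0)

novel_total = sum(row_a)                  # all novel variants
regulatory_common = row_b[2] + row_b[5]   # common SNVs + common indels
drug_common_snv = row_c[2]                # common SNVs only

print(af_class(1 / N_CHROMOSOMES))
print(novel_total, regulatory_common, drug_common_snv)
```

The sums agree with the abstract only when the 316,577 figure is read as common SNVs plus common indels, which is how the sketch computes it.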
Fig 3. GRCh38 variation overview in NM.
(a) SNVs under selection; health-related selection signals (matching a GWAS catalog or ClinVar registry) are highlighted in orange. (b) Novel SNVs with allele frequency higher than 5%. (c) SNVs altering enhancer or promoter elements. The height of the dots in a, b and c depicts the allele frequency of the variants. (d) Population-wide variant density. (e) Average NGS genome coverage.
Heterozygosity ratio and haplotype block length per population.
| Population | HET (mean) | HET (median) | Average haplotype length (kb) | Median haplotype length (kb) |
|---|---|---|---|---|
| | 0.3025 | 0.3175 | 6.2 | 2.108 |
| | 0.2939 | 0.3125 | 6.13 | 2.079 |
| | 0.2935 | 0.3165 | 1.703 | 0.669 |
| | 0.2834 | 0.301 | 7.788 | 2.484 |
| | 0.2735 | 0.2727 | 10.01 | 3.153 |
| | 0.2596 | 0.2763 | 11.89 | 3.488 |
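This excerpt does not define the heterozygosity ratio. As a rough illustration only, the sketch below assumes one common definition: the fraction of called diploid genotypes that are heterozygous, computed per individual and then summarized per population by mean and median. The genotype encoding and the function names are hypothetical, and the paper's own definition may differ.

```python
# Minimal sketch under an assumed definition of heterozygosity.
from statistics import mean, median
from typing import Iterable, Optional, Sequence, Tuple

# A diploid genotype as a pair of allele indices; None marks a missing call.
Genotype = Tuple[Optional[int], Optional[int]]

def heterozygous_fraction(genotypes: Iterable[Genotype]) -> float:
    """Fraction of called genotypes whose two alleles differ."""
    called = 0
    het = 0
    for a, b in genotypes:
        if a is None or b is None:
            continue  # skip missing calls
        called += 1
        if a != b:
            het += 1
    if called == 0:
        raise ValueError("no called genotypes")
    return het / called

def summarize(per_individual: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, median) of per-individual heterozygosity values."""
    return mean(per_individual), median(per_individual)

# Toy input: three individuals genotyped at four sites each.
individuals = [
    [(0, 1), (0, 0), (1, 1), (None, 0)],
    [(0, 1), (0, 1), (0, 0), (1, 1)],
    [(0, 0), (0, 0), (0, 1), (1, 1)],
]
values = [heterozygous_fraction(g) for g in individuals]
print(summarize(values))
```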
Fig 4. NM demography.
(a) PCA of NM including 4 Native Peruvians (NP). (b) Summarized parallel coordinate plot showing only statistically significant PCs; top panel: PC values per region, with solid lines depicting mean values and dashed lines depicting standard deviation; bottom panel: dotted parallel coordinate plot in which each dot depicts an individual. (c) ADMIXTURE analysis for different values of k; samples are ordered by geographic latitude and ethnic group. (d) Neighbor-joining tree based on FST between the 27 NM groups and NP in our study; colors indicate the regions shown in panel b.
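Panel d rests on pairwise FST values between groups, but this excerpt does not say which FST estimator was used. The sketch below shows one widely used option, the Hudson estimator computed as a ratio of sums across sites, to make concrete what such a pairwise distance measures. The function name and the toy inputs are hypothetical, and this is not claimed to be the authors' method.

```python
# Minimal sketch of a pairwise Hudson FST (ratio of averages); illustrative only.
from typing import Sequence

def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson FST between two populations.

    p1, p2: per-site allele frequencies in each population (same site order).
    n1, n2: number of sampled chromosomes in each population (must be > 1).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency vectors must have the same length")
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two sampled chromosomes per population")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no informative sites")
    return num / den

# Toy usage: two populations, three sites, 10 sampled chromosomes each.
print(hudson_fst([0.1, 0.5, 0.9], [0.2, 0.4, 0.7], n1=10, n2=10))
```

A matrix of such pairwise values between all groups would then be the input to a neighbor-joining algorithm. With small samples per group, estimates can be noisy and slightly negative values are possible.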