Kshitij Wagh, Aatish Bhatia, Gabriela Alexe, Anupama Reddy, Vijay Ravikumar, Michael Seiler, Michael Boemo, Ming Yao, Lee Cronk, Asad Naqvi, Shridar Ganesan, Arnold J Levine, Gyan Bhanot.
Abstract
The Maasai are a pastoral people in Kenya and Tanzania whose traditional diet of milk, blood and meat is rich in lactose, fat and cholesterol. In spite of this, they have low levels of blood cholesterol and seldom suffer from gallstones or cardiac disease. Field studies in the 1970s suggested that the Maasai have a genetic adaptation for cholesterol homeostasis. Analysis of HapMap 3 data using the Fixation Index (Fst) and two metrics of haplotype diversity, the integrated Haplotype Score (iHS) and the Cross Population Extended Haplotype Homozygosity (XP-EHH), identified genomic regions and single nucleotide polymorphisms (SNPs) as strong candidates for recent selection for lactase persistence and cholesterol regulation in 143–156 founder individuals from the Maasai population in Kinyawa, Kenya (MKK). The non-synonymous SNP with the highest genome-wide Fst was the T/C polymorphism at rs2241883 in Fatty Acid Binding Protein 1 (FABP1), which is known to be associated with reduced low-density lipoprotein and triglyceride levels in Europeans. The strongest signal, identified by all three metrics, was a 1.7 Mb region on Chr2q21. This region contains the genes LCT (Lactase) and MCM6 (Minichromosome Maintenance Complex Component), which are involved in lactase persistence, and the gene RAB3GAP1 (Rab3 GTPase-activating Protein Catalytic Subunit), which contains polymorphisms associated with total cholesterol levels in a genome-wide association study of >100,000 individuals of European ancestry. Sanger sequencing of DNA from six MKK samples showed that the G/C-14010 polymorphism in the MCM6 gene, known to be associated with lactase persistence in Africans, is segregating in MKK at high frequency (∼58%). The Cytochrome P450 Family 3 Subfamily A (CYP3A) cluster of genes, involved in cholesterol metabolism, was identified by Fst and iHS as a candidate locus under selection.
Overall, our study identified several specific genomic regions under selection in the Maasai that contain polymorphisms in genes associated with lactase persistence and cholesterol regulation.
Year: 2012 PMID: 23028602 PMCID: PMC3461017 DOI: 10.1371/journal.pone.0044751
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Population structure components for individuals from CEU, ASW, LWK, MKK and YRI.
Results from STRUCTURE version 2.3 on genotype data for 12,999 randomly selected SNPs in 578 founder (unrelated) individuals from the CEU, ASW, LWK, MKK and YRI HapMap populations. The no-admixture model showed that the data were best fit by 6 inferred ancestral populations. Each column represents an individual, and the colors indicate the fractions of their genotype attributable to ancestry from each of the 6 ancestral populations.
Top 20 genomic regions identified as selection candidates in MKK using the Fst statistic and clustering.
| Chr | Start location | Stop location | Genes in region | Number of high-Fst SNPs (empirical p-value <0.001) | Max Fst within cluster | Max XP-EHH score within cluster |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 135036696 | 136726567 | RAB3GAP1, ZRANB3, DARS, R3HDM1, TMEM163, YSK4, LCT, UBXN4, MCM6, MGAT5, CCNT2 | 123 | 0.382 | 12.202 |
| 2 | 78305622 | 78500655 | - | 33 | 0.311 | 3.805 |
| 12 | 56402204 | 56754137 | PAN2, OBFC2B, SLC39A5, APOF, STAT2, CS, RNF41, IKZF4, SMARCC2 | 28 | 0.283 | 3.024 |
| 3 | 191929784 | 191990575 | FGF12 | 13 | 0.272 | 5.222 |
| 5 | 115126388 | 115223035 | ATG12, AP3S1 | 7 | 0.266 | 3.870 |
| 2 | 163048404 | 163152351 | IFIH1, FAP | 19 | 0.261 | 3.108 |
| 7 | 99053816 | 99436198 | ZNF498, CYP3A4, CPSF4, CYP3A7, CYP3A43 | 17 | 0.260 | 3.290 |
| 1 | 12296232 | 12319994 | VPS13D | 4 | 0.253 | 3.060 |
| 22 | 49978502 | 50077531 | - | 4 | 0.244 | 3.732 |
| 5 | 32128179 | 32159329 | GOLPH3 | 5 | 0.242 | 3.062 |
| 5 | 14747247 | 14750823 | ANKH | 4 | 0.237 | 6.800 |
| 14 | 36033703 | 36201722 | RALGAPA1 | 4 | 0.221 | 3.517 |
| 2 | 136917330 | 136921703 | - | 2 | 0.218 | 8.549 |
| 1 | 198692364 | 198745866 | PTPRC | 2 | 0.212 | 3.138 |
| 2 | 137580234 | 137595545 | - | 4 | 0.209 | 4.871 |
| 12 | 111414527 | 111502280 | CUX2 | 5 | 0.209 | 3.393 |
| 17 | 75423198 | 75431978 | SEPT9 | 3 | 0.200 | 5.024 |
| 18 | 66714832 | 66724690 | CCDC102B | 4 | 0.200 | 5.704 |
| 1 | 74807337 | 74842787 | TNNI3K | 3 | 0.193 | 3.993 |
| 3 | 185752767 | 185805993 | ETV5 | 3 | 0.192 | 4.569 |
1,232 SNPs with significant Fst scores (pB<8.6E−6, pE<0.001) were clustered into contiguous genomic regions of linkage disequilibrium. A cluster was defined as a collection of SNPs in a genomic region where each SNP had genotype R² ≥ 0.25 with at least one other SNP in the cluster. Clusters whose maximum XP-EHH score exceeded 3 were identified as MKK-associated. The top 20 clusters are ranked by the highest Fst value of any SNP in the cluster. The complete set of clusters identified by Fst is in Table S2.
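One natural way to implement the clustering rule above (each SNP has R² ≥ 0.25 with at least one other SNP in its cluster) is single linkage: treat SNP pairs above the cutoff as edges and take connected components. The sketch below shows that reading, followed by the XP-EHH >3 filter and the ranking by maximum Fst. It is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and the dictionary inputs (including the pairwise `r2` lookup) are hypothetical, and all SNPs passed in are assumed to lie on one chromosome so that components correspond to contiguous regions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input type: pairwise genotype r^2 keyed by an (rsID, rsID) pair.
PairR2 = Dict[Tuple[str, str], float]


def ld_clusters(snps: Sequence[str], r2: PairR2,
                threshold: float = 0.25, min_size: int = 2) -> List[List[str]]:
    """Single-linkage clusters: two SNPs are joined whenever their r^2 >= threshold."""
    parent = {s: s for s in snps}

    def find(s: str) -> str:
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for i, a in enumerate(snps):
        for b in snps[i + 1:]:
            # Missing pairs are treated as r^2 = 0 (an assumption of this sketch).
            value = r2.get((a, b), r2.get((b, a), 0.0))
            if value >= threshold:
                parent[find(b)] = find(a)

    groups: Dict[str, List[str]] = {}
    for s in snps:
        groups.setdefault(find(s), []).append(s)
    # A lone SNP has no partner above the cutoff, so by default it is dropped.
    return [g for g in groups.values() if len(g) >= min_size]


def rank_mkk_clusters(clusters: List[List[str]], fst: Dict[str, float],
                      xpehh: Dict[str, float],
                      xpehh_cutoff: float = 3.0) -> List[List[str]]:
    """Keep clusters whose maximum XP-EHH exceeds the cutoff; rank by maximum Fst."""
    kept = [c for c in clusters if max(xpehh[s] for s in c) > xpehh_cutoff]
    return sorted(kept, key=lambda c: max(fst[s] for s in c), reverse=True)
```

For example, with r² values of 0.6 (rsA–rsB), 0.3 (rsB–rsC) and 0.1 (rsC–rsD), `ld_clusters` returns the single cluster `["rsA", "rsB", "rsC"]` and drops rsD. The all-pairs loop is quadratic, which is acceptable for one candidate region but would need windowing for a whole chromosome.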
The most significant non-synonymous SNPs under selection in MKK using Fst, with LWK as the reference population.
| rsID of SNP | Chr | Position | Gene | Bonferroni-corrected permutation p-value (pB) | Empirical p-value (pE) using distribution of non-coding SNPs | Fst MKK vs LWK | Fst MKK vs YRI | Fst MKK vs ASW |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rs2241883 | 2 | 88424066 | FABP1 | 1.72E−12 | 3.13E−05 | 0.250 | 0.172 | 0.152 |
| rs961360 | 2 | 136393658 | R3HDM1 | 3.13E−08 | 3.13E−04 | 0.199 | 0.288 | 0.447 |
| rs6997753 | 8 | 142487937 | FLJ43860 | 4.87E−08 | 3.59E−04 | 0.194 | 0.138 | 0.006 |
| rs531503 | 7 | 100377082 | ZAN | 3.83E−07 | 5.47E−04 | 0.182 | 0.014 | 0.073 |
| rs17014118 | 4 | 89319296 | HERC6 | 4.42E−07 | 6.06E−04 | 0.180 | 0.178 | 0.045 |
| rs2271586 | 11 | 3659993 | ART5 | 4.76E−07 | 6.06E−04 | 0.180 | 0.034 | 0.004 |
| rs10930046 | 2 | 163137983 | IFIH1 | 1.24E−06 | 6.86E−04 | 0.176 | 0.279 | 0.128 |
| rs1051334 | 12 | 71523134 | TSPAN8 | 1.36E−06 | 6.86E−04 | 0.176 | 0.173 | 0.104 |
| rs10475299 | 5 | 5461233 | KIAA0947 | 1.46E−06 | 6.86E−04 | 0.175 | 0.160 | 0.198 |
| rs1918496 | 12 | 56722060 | PAN2 | 3.06E−06 | 8.17E−04 | 0.171 | 0.296 | 0.074 |
| rs13389745 | 2 | 65298657 | CEP68 | 3.84E−06 | 8.17E−04 | 0.172 | 0.115 | 0.052 |
| rs846266 | 7 | 42088222 | GLI3 | 2.54E−06 | 9.42E−04 | 0.169 | 0.150 | 0.059 |
| rs3813227 | 2 | 73651967 | ALMS1 | 6.02E−06 | 9.82E−04 | 0.167 | 0.173 | 0.034 |
The most significant non-synonymous SNPs identified as candidates for selection by Fst. The complete list of 1,232 SNPs identified as selection candidates by Fst (pB <8.6E−6 and pE <0.001) is in Table S1.
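The table above reports per-SNP Fst between MKK and each reference population, together with an empirical p-value (pE) taken from the distribution of non-coding SNPs. This section does not name the Fst estimator, so the sketch below assumes Hudson's estimator for a biallelic SNP (sample allele frequencies p1, p2 from n1, n2 sampled chromosomes) and computes pE as the plain fraction of null (non-coding) Fst values at least as large as the observed one. Both choices, and the function names, are assumptions for illustration rather than the study's exact procedure.

```python
from typing import Sequence


def hudson_fst(p1: float, p2: float, n1: int, n2: int) -> float:
    """Per-SNP Hudson Fst estimate (assumed estimator; the source does not name one).

    p1, p2: sample allele frequencies; n1, n2: sampled chromosomes per population (> 1).
    """
    numerator = ((p1 - p2) ** 2
                 - p1 * (1 - p1) / (n1 - 1)
                 - p2 * (1 - p2) / (n2 - 1))
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        # Both samples fixed for the same allele: no differentiation to measure.
        return 0.0
    return numerator / denominator


def empirical_p(observed: float, null_values: Sequence[float]) -> float:
    """Fraction of null (e.g. non-coding SNP) Fst values that are >= observed."""
    return sum(1 for v in null_values if v >= observed) / len(null_values)
```

Because of the small-sample correction terms in the numerator, the per-SNP estimate can be slightly negative when the two frequencies are nearly equal.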
The most significant genomic regions under selection in MKK using iHS.
| Chr | Cluster start position (GRCh37) | Cluster end position (GRCh37) | Genes | Max \|iHS\| in cluster | # of SNPs in cluster with \|iHS\| >2 |
| --- | --- | --- | --- | --- | --- |
| 2 | 134221398 | 137892309 | LCT, MGAT5, NCKAP5, DARS, ZRANB3, R3HDM1, TMEM163, RAB3GAP1, THSD7B, CCNT2, YSK4, UBXN4, MCM6 | 6.339 | 545 |
| 13 | 30496779 | 30565298 | – | 5.234 | 26 |
| 7 | 20373632 | 20468718 | ITGB8 | 5.012 | 45 |
| 2 | 176089888 | 176422005 | – | 4.626 | 69 |
| 11 | 110532348 | 110663647 | ARHGAP20 | 4.480 | 36 |
| 9 | 83127968 | 83382243 | – | 4.471 | 59 |
| 5 | 14657062 | 14753764 | FAM105B, ANKH | 4.429 | 23 |
| 18 | 66652846 | 66765215 | CCDC102B | 4.402 | 33 |
| 11 | 34025053 | 34189564 | CAPRIN1, NAT10, ABTB2 | 4.375 | 22 |
| 2 | 179421694 | 179606538 | TTN | 4.289 | 28 |
| 14 | 105792959 | 105907642 | PACS2, MTA1 | 4.228 | 20 |
| 5 | 108990708 | 109217428 | MAN2A1 | 4.219 | 50 |
| 9 | 107973277 | 108067684 | SLC44A1 | 4.192 | 34 |
| 9 | 3869844 | 3919130 | GLIS3 | 4.185 | 23 |
| 7 | 99053816 | 99314986 | ZNF789, CPSF4, ATP5J2, FAM200A, ZNF655, ZNF498, CYP3A7, ZKSCAN5, CYP3A5 | 4.120 | 24 |
| 9 | 13812037 | 13867306 | – | 4.066 | 23 |
| 11 | 75470813 | 75678647 | UVRAG, DGAT2 | 4.059 | 48 |
| 2 | 12294875 | 12366781 | – | 4.041 | 24 |
| 14 | 97426813 | 97505011 | – | 4.025 | 24 |
| 8 | 145839058 | 146082167 | COMMD5, LOC100287170, LOC100129596, ARHGAP39, RPL8, ZNF7, ZNF251, ZNF34, LOC100287297, ZNF517 | 3.955 | 22 |
Using a sliding window 50 SNPs wide, genomic regions were scored by the fraction of SNPs with |iHS|>2. The top 0.02% of non-overlapping windows were identified and merged into genomic clusters based on genotype R², using the same criterion as in Table 1. Clusters are ranked by the maximum |iHS| value in the cluster. Complete lists of genome-wide significant SNPs and regions identified by iHS are in Tables S2a and S2b, respectively.
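The window scan described above can be sketched as follows: split a chromosome's position-ordered, already-standardized iHS scores into windows of 50 SNPs, score each window by the fraction of SNPs with |iHS| > 2, and keep the top 0.02% of windows. This is an illustration under assumptions: windows are taken as non-overlapping (step equal to width), SNPs without a score are assumed to have been removed beforehand, and the function names are hypothetical. The subsequent R²-based merging step is not shown.

```python
import math
from typing import List, Sequence, Tuple


def window_fractions(ihs: Sequence[float], width: int = 50, step: int = 50,
                     cutoff: float = 2.0) -> List[Tuple[int, float]]:
    """(start index, fraction of SNPs with |iHS| > cutoff) for each full window."""
    scores = []
    for start in range(0, len(ihs) - width + 1, step):
        window = ihs[start:start + width]
        frac = sum(1 for x in window if abs(x) > cutoff) / width
        scores.append((start, frac))
    return scores


def top_windows(scores: List[Tuple[int, float]],
                top_fraction: float = 0.0002) -> List[Tuple[int, float]]:
    """Highest-scoring windows (top 0.02% by default); at least one is kept."""
    k = max(1, math.ceil(len(scores) * top_fraction))
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

Scoring by the fraction of extreme SNPs, rather than by a single maximum, favors regions where many neighboring SNPs carry long haplotypes, which is the pattern expected around a swept allele.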
The most significant genomic regions under selection in MKK using XP-EHH, with LWK as the reference population.
| Chr | Start Position | End Position | Genes | Number of SNPs | Max XP-EHH |
| --- | --- | --- | --- | --- | --- |
| 2 | 135058615 | 137017060 | R3HDM1, MGAT5, RAB3GAP1, LCT, DARS, ZRANB3, MCM6, TMEM163, ACMSD, CCNT2, YSK4, UBXN4, CXCR4 | 572 | 12.182 |
| 5 | 14681797 | 14751400 | FAM105B, ANKH | 25 | 6.800 |
| 18 | 66712510 | 66731187 | CCDC102B | 12 | 5.587 |
| 5 | 115885282 | 115922669 | SEMA6A | 21 | 5.482 |
| 18 | 66768031 | 66777543 | – | 5 | 5.324 |
| 20 | 4513311 | 4522535 | – | 10 | 5.313 |
| 13 | 104870241 | 104880533 | – | 7 | 5.183 |
| 4 | 64594290 | 64639661 | – | 16 | 5.149 |
| 2 | 134507165 | 134561145 | – | 12 | 5.062 |
| 16 | 75360734 | 75364940 | CFDP1 | 2 | 5.040 |
| 17 | 75427551 | 75428021 | SEPT9 | 2 | 5.024 |
| 3 | 191943578 | 191989642 | FGF12 | 10 | 5.019 |
| 11 | 117610387 | 117620420 | DSCAML1 | 8 | 4.989 |
SNPs with positive genome-wide significant XP-EHH scores (XP-EHH ≥4.796, two-tailed Bonferroni-corrected p≤0.05) were grouped into contiguous genomic clusters on the basis of genotype R², using the same criterion as in Table 1. Overlapping clusters were merged. The Number of SNPs column (column E) lists the number of significant SNPs in each cluster. Complete lists of genome-wide significant SNPs and clusters identified by XP-EHH are in Tables S3a and S3b.
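The XP-EHH cutoff above is stated as a two-tailed, Bonferroni-corrected threshold. Assuming standardized XP-EHH scores are treated as standard normal under neutrality, the cutoff for m tests at family-wise level α is the |z| whose two-tailed p-value is α/m. The sketch below, using only Python's standard `statistics.NormalDist` and `math.erfc`, computes that cutoff and, conversely, the number of tests a given cutoff implies. The section does not state m, so no particular value is assumed here, and the function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def bonferroni_two_tailed_z(alpha: float, n_tests: int) -> float:
    """|z| cutoff whose two-tailed p-value equals alpha / n_tests."""
    per_tail = alpha / (2.0 * n_tests)
    # Inverting the small lower-tail probability avoids forming 1 - tiny.
    return -_STD_NORMAL.inv_cdf(per_tail)


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value of a standard-normal score: 2 * P(Z > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def implied_tests(z_cutoff: float, alpha: float = 0.05) -> float:
    """Number of tests for which z_cutoff is the Bonferroni two-tailed threshold."""
    return alpha / two_tailed_p(z_cutoff)
```

Calling `implied_tests(4.796)` shows how many independent tests the reported cutoff corresponds to under this normal assumption, which can then be compared with the number of SNPs actually scored.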
Concordant genomic regions identified by at least two of three metrics as candidates for selection in MKK.
| Chr | Genomic Extent | Significant by (Method) | Genes in Region | Number of SNPs identified by each Method |
| --- | --- | --- | --- | --- |
| 2 | 135058615–136726567 | Fst, iHS, XP-EHH | MGAT5, TMEM163, ACMSD, CCNT2, YSK4, RAB3GAP1, ZRANB3, R3HDM1, UBXN4, LCT, MCM6, DARS | Fst: 123, iHS: 545, XP-EHH: 572 |
| 3 | 191943578–191989642 | Fst, XP-EHH | FGF12 | Fst: 13, XP-EHH: 10 |
| 5 | 14747247–14750823 | Fst, iHS, XP-EHH | ANKH | Fst: 4, iHS: 23, XP-EHH: 25 |
| 5 | 115885574–115885672 | Fst, XP-EHH | SEMA6A | Fst: 2, XP-EHH: 21 |
| 7 | 99053816–99314986 | Fst, iHS | ZNF789, CPSF4, ATP5J2, FAM200A, ZNF655, ZNF498, CYP3A7, ZKSCAN5, CYP3A5 | Fst: 17, iHS: 24 |
| 17 | 75427551–75428021 | Fst, XP-EHH | SEPT9 | Fst: 3, XP-EHH: 2 |
| 18 | 66714832–66724690 | Fst, iHS, XP-EHH | CCDC102B | Fst: 4, iHS: 33, XP-EHH: 12 |
Genomic regions identified as genome-wide significant by at least two of the three methods (Fst, iHS and XP-EHH).
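The genomic extents in this concordance table coincide with the intersection of the corresponding per-method clusters in the Fst, iHS and XP-EHH tables. For example, on Chr 2 the Fst (135036696–136726567), iHS (134221398–137892309) and XP-EHH (135058615–137017060) clusters intersect in 135058615–136726567. A minimal sketch of that interval intersection follows. It assumes the caller has already grouped the per-method clusters that overlap at one locus; the function name and input layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, both ends inclusive


def concordant_extent(calls: Dict[str, Interval],
                      min_methods: int = 2) -> Optional[Tuple[List[str], Interval]]:
    """Intersect per-method clusters at one locus.

    Returns (methods, (start, end)), or None if fewer than min_methods
    methods are given or their intervals do not all overlap.
    """
    if len(calls) < min_methods:
        return None
    start = max(s for s, _ in calls.values())
    end = min(e for _, e in calls.values())
    if start > end:
        return None
    return list(calls), (start, end)
```

Applied to the Chr 18 clusters (Fst 66714832–66724690, iHS 66652846–66765215, XP-EHH 66712510–66731187), the sketch likewise returns 66714832–66724690, the CCDC102B extent listed in the table.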
Figure 2. (a) Genome-wide significant scores identifying candidate regions under selection on Chromosome 2.
Chromosome-wide plot of SNPs with significant scores using Fst (empirical p-value <0.001 and Bonferroni-corrected permutation test pB <8.6E−6), iHS (normalized |iHS|>2) and XP-EHH (XP-EHH ≥4.796, two-tailed Bonferroni-corrected p≤0.05). The SNPs thus identified were clustered on the basis of linkage disequilibrium to identify contiguous genomic regions that are candidates for selection (Tables 1–4). The locus containing the genes LCT and MCM6 (135–137 Mb) was identified by all three metrics as the top candidate for selection. The non-synonymous T/C polymorphism at rs2241883 in the FABP1 gene had the most significant genome-wide Fst of any non-synonymous SNP (Fst = 0.25, pE = 3.13E−5). The MKK samples have a high frequency (∼0.45) of the protective C allele, which is known to be associated with low cholesterol levels in Europeans. Plots for other chromosomes are in Appendix S6.
(b) Inset of the LCT locus on Chromosome 2. An inset of the Fst, iHS and XP-EHH scores for SNPs in the ∼1 Mb locus (135.8–136.8 Mb) on Chr 2 containing the genes LCT and MCM6. The uniformly high values for all three metrics in this region suggest that this locus has undergone strong selection pressure. The blue marker indicates the position of the lactase-associated SNP in MCM6 that we sequenced, which was polymorphic in MKK with frequency pC = 0.58 ± 0.14 (68% CI) for the protective C allele.
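The reported frequency of the protective C allele (0.58 with a ±0.14 interval at 68%) is consistent with a simple binomial estimate. Assuming the six sequenced samples contribute 12 chromosomes and the 68% interval is ±1 normal-approximation (Wald) standard error, sqrt(p(1 − p)/n), a count of 7 C alleles out of 12 reproduces both numbers. The caption does not give the underlying genotype counts, so the count of 7 used below is hypothetical and purely illustrative, as is the function name.

```python
import math
from typing import Tuple


def allele_frequency_se(allele_count: int, n_chromosomes: int) -> Tuple[float, float]:
    """Sample allele frequency and its binomial (Wald) standard error."""
    p = allele_count / n_chromosomes
    se = math.sqrt(p * (1.0 - p) / n_chromosomes)
    return p, se


# Hypothetical count: 7 C alleles among 12 chromosomes (6 diploid samples).
p_c, se_c = allele_frequency_se(7, 12)
print(f"pC = {p_c:.2f} +/- {se_c:.2f}")  # prints "pC = 0.58 +/- 0.14"
```

With only 12 chromosomes the interval is wide, which is why the MKK frequency from the sequenced samples is best read as "high" rather than as a precise estimate.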