Lactase persistence and lipid pathway selection in the Maasai.

Kshitij Wagh, Aatish Bhatia, Gabriela Alexe, Anupama Reddy, Vijay Ravikumar, Michael Seiler, Michael Boemo, Ming Yao, Lee Cronk, Asad Naqvi, Shridar Ganesan, Arnold J Levine, Gyan Bhanot.

Abstract

The Maasai are a pastoral people in Kenya and Tanzania, whose traditional diet of milk, blood and meat is rich in lactose, fat and cholesterol. In spite of this, they have low levels of blood cholesterol, and seldom suffer from gallstones or cardiac diseases. Field studies in the 1970s suggested that the Maasai have a genetic adaptation for cholesterol homeostasis. Analysis of HapMap 3 data using the Fixation Index (Fst) and two metrics of haplotype diversity, the integrated Haplotype Score (iHS) and the Cross Population Extended Haplotype Homozygosity (XP-EHH), identified genomic regions and single nucleotide polymorphisms (SNPs) as strong candidates for recent selection for lactase persistence and cholesterol regulation in 143-156 founder individuals from the Maasai population in Kinyawa, Kenya (MKK). The non-synonymous SNP with the highest genome-wide Fst was the T/C polymorphism at rs2241883 in Fatty Acid Binding Protein 1 (FABP1), known to reduce low density lipoprotein and triglyceride levels in Europeans. The strongest signal identified by all three metrics was a 1.7 Mb region on Chr2q21. This region contains the genes LCT (Lactase) and MCM6 (Minichromosome Maintenance Complex Component) involved in lactase persistence, and the gene Rab3GAP1 (Rab3 GTPase-activating Protein Catalytic Subunit), which contains polymorphisms associated with total cholesterol levels in a genome-wide association study of >100,000 individuals of European ancestry. Sanger sequencing of DNA from six MKK samples showed that the G/C-14010 polymorphism in the MCM6 gene, known to be associated with lactase persistence in Africans, is segregating in MKK at high frequency (∼58%). The Cytochrome P450 Family 3 Subfamily A (CYP3A) cluster of genes, involved in cholesterol metabolism, was identified by Fst and iHS as candidate loci under selection.
Overall, our study identified several specific genomic regions under selection in the Maasai which contain polymorphisms in genes associated with lactase persistence and cholesterol regulation.

Year:  2012        PMID: 23028602      PMCID: PMC3461017          DOI: 10.1371/journal.pone.0044751

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The Maasai are a pastoralist, Nilotic people living primarily in southern Kenya and northern Tanzania. An economy traditionally based on herds of cattle, sheep, and goats led to a diet rich in lactose, fat, and cholesterol consisting largely of milk, meat, and blood. Although their cholesterol intake is high (600–2000 mg/day), and 66% of their calories come from fat, their total serum cholesterol levels average 135 mg/100 ml [1]–[4]. In comparison, a study consisting of cohorts from seven countries (Croatia, Finland, Greece, Italy, Japan, Netherlands, USA) found that the average dietary cholesterol intakes are 141–612 mg/day and serum cholesterol levels range from 160 to 266 mg/100 ml [5]. Greenland Eskimos were found to have a high cholesterol consumption of 420–1650 mg/day [6] with average consumption of ∼700 mg/day [7], and were found to have plasma cholesterol levels of 233 mg/100 ml [8]. Although African children generally have lower cholesterol levels (115–137 mg/100 ml for 7–8 year olds) than other populations [9], the fact that adult Maasai have very low cholesterol levels, in spite of a high cholesterol diet, is quite remarkable. The Maasai also have low rates of cholelithiasis (especially cholesterol gallstones), low blood pressure, and low incidence of atherosclerotic coronary artery disease [1]–[3], [10]. Various hypotheses have been proposed to explain this puzzle, such as “physical fitness and freedom from emotional stress” [10], [11], a “hypo-cholesterolaemic factor” in milk [12] and saponins derived from herbs [13]. However, the hypo-cholesterolaemic factor was never found, and the model of [10], [11] could not explain the low frequencies of heart disease in older Maasai men who lead sedentary lives after age ∼24, when their warrior (Murran/Moran) period ends [14], [15]. Additional clues emerged from a controlled experiment [2] on 23 healthy Maasai adults (11 experimental, 12 control) between the ages of 20 and 24 years.
All study subjects were fed a basic high calorie, cholesterol-free diet for 8 weeks, including trace amounts (1 micro-curie) of radioactively labeled Cholesterol-4-14C. The eleven subjects in the treatment group were fed 2 gm of crystalline cholesterol per day in addition to the basic diet. Blood and fecal samples were collected at the start of the study, weekly for 8 weeks and at the end of 9, 16 and 24 weeks. Using the radioactive tracer to quantitate/normalize the measurements, the data were analyzed to characterize metabolic patterns, namely, the amounts of dietary cholesterol absorbed, synthesized and excreted. The study found that, in spite of the additional 2 gm/day ingestion of cholesterol in the experimental group, there were no significant differences in serum cholesterol, phospholipids, triglyceride levels and lipoprotein patterns between the experimental and control groups. Both groups had identical turnover rates for cholesterol, with no evidence for cholesterol storage in the experimental group. In a similar study in American subjects, Mattson et al [16] found that total serum cholesterol increased linearly with dietary cholesterol with 11.8 mg/100 ml increase for every 100 mg/1000 kcal increase in dietary cholesterol over the range 100–317 mg/1000 kcal. Were this relation to hold in the Maasai, an increase of 66 mg/100 ml total cholesterol levels would be expected in the above experiment, contrary to the observed cholesterol homeostasis. The observed cholesterol homeostasis could not be attributed to a “hypo-cholesterolaemic factor”, or to saponins, which were absent from the Maasai study diet. The authors concluded that “the Maasai have some basically different genetic traits that result in their having superior biologic mechanisms for protection from hypercholesterolemia” [3]. It is widely accepted that there is a strong genetic component in the risk of hypercholesterolemia, atherosclerosis and heart disease [17]–[20]. 
Typically, genome-wide association studies (GWAS) focus on markers for increased risk of disease [21]–[25] and to a lesser extent on protective polymorphisms. Such protective polymorphisms are known to arise as adaptations and can be identified in selection studies. For example, many studies have identified polymorphisms conferring lactase persistence in Northern Europeans, which arose with the advent of cattle breeding [26]. Just as in Europe, pastoralism arose in East Africa around 4,000–10,000 years ago [27] leading to selection for lactase persistence [28]. In the Maasai, pastoralism led to a lactose rich, high fat, high cholesterol diet of milk, meat and blood [4]. It is quite reasonable that, in a time span similar to that which conferred lactase persistence in Europeans, selection pressure in the Maasai from such a diet might result in genetic adaptations against diseases such as hypercholesterolemia and atherosclerosis. Motivated by this possibility, we performed a genome wide scan for selection in 143–156 founder individuals from the Maasai of Kinyawa, Kenya (MKK) using the HapMap 3 [29] SNP (single nucleotide polymorphism) data to identify genomic regions under recent selection. We also used 90–110 HapMap 3 founder individuals from the Luhya population from Webuye, Kenya (LWK) as a reference group. Three complementary metrics to detect selection were applied: the Fixation Index (Fst) [30], the Cross Population Extended Haplotype Homozygosity (XP-EHH) [31], and the Integrated Haplotype Score (iHS) [32], [33]. Note that the phased data used for iHS and XP-EHH was from HapMap3 Release 2, which has fewer individuals (143 and 90 for MKK and LWK respectively) whereas the data for Fst was from HapMap Release 3, which had more individuals (156 and 110 respectively). Our analysis consistently identified strong, recent selection in genes involved in lipid metabolism and lactase persistence in the Maasai (MKK) samples. 
Several of the regions under selection in MKK contained specific polymorphisms known to protect against hyperlipidemia in other populations. Sanger sequencing of DNA from six MKK samples showed that the GC-14010 polymorphism in the Minichromosome Maintenance Complex Component (MCM6) gene, known to confer adult lactase persistence in East Africans [28], is segregating in the Maasai at a frequency of ∼58%. These results suggest that the regions identified contain polymorphisms that confer lactase persistence and protection from hypercholesterolemia in the Maasai. The wider consequence of our study is that consistent dietary pressure can induce strong selection in complex pathways in a short time (∼150–400 generations).

Results

Population Structure

Two of the methods used to detect selection (Fst and XP-EHH) require a genetically similar reference population. A comparison of Fst among HapMap populations shows that the MKK and African-Americans from South-west USA (ASW) have the lowest average Fst (0.0145), followed by MKK and the Luhya in Webuye, Kenya (LWK) (0.017), while Fst between MKK and Yoruba from Nigeria (YRI) is significantly higher (0.027) (Table S6 in [29]). However, a plot of the first two principal components from a PCA analysis of the African populations and Utah residents with Northern and Western European ancestry from the CEPH collection (CEU) (Figure S2, (c) in [29]) shows that the MKK are genetically closer to LWK. To understand the degree of admixture in the populations ASW, CEU, LWK, MKK and YRI, we used STRUCTURE [34] on a randomly sampled subset of 12,999 SNPs from the HapMap 3 dataset. Without using any population identification information, STRUCTURE found that the data fits best to 6 ancestral populations (Figure 1, details in Appendix S1). In agreement with [29], [35], the STRUCTURE results show that whereas the CEU and YRI are genetically homogenous, the LWK, ASW and MKK are admixed, with a ∼20% CEU admixture in ASW. The LWK and ASW also have a large admixture with YRI (66% and 76% respectively), while MKK have a smaller admixture with YRI (10%). In addition, the STRUCTURE results indicate that MKK have a 15% admixture with two populations that are not sampled in the HapMap study. We also see a small admixture between MKK and LWK, which is expected, given their geographical proximity. These results are largely consistent with linguistic phylogeny; whereas the Maasai speak a Nilo-Saharan language, the Luhya and the Yoruba speak Niger-Congo languages, also spoken by African ancestors of African Americans [35].
Figure 1

Population structure components for individuals from CEU, ASW, LWK, MKK and YRI.

Results from STRUCTURE version 2.3 on genotype data for 12,999 randomly selected SNPs in 578 founder (unrelated) individuals from the CEU, ASW, LWK, MKK and YRI HapMap populations. The no-admixture model showed that the data was best fit by 6 inferred ancestral populations. Each column represents an individual, and the colors indicate the fractions of their genotype attributable to ancestry from each of the 6 ancestral populations.

To further quantify the genetic similarity of MKK, LWK, ASW and YRI to the six ancestral populations, we assigned a six component vector to each of these populations, whose coordinates were the fraction of the ancestral components represented in them. A comparison of the cosine similarity of these vectors showed that the largest overlap was between MKK and LWK (0.18), followed by MKK and ASW (0.16). Based on their closer proximity to MKK in the PCA plot, as well as closer cosine similarity, we chose the LWK as the appropriate reference population for the Fst and XP-EHH analysis.
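The cosine-similarity comparison of ancestry vectors described above can be sketched as follows. This is only an illustration: the six-component vectors and the population labels (`POP_A`, `POP_B`, `POP_C`) are made-up placeholders, not the fractions estimated by STRUCTURE, and the function names are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length ancestry vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)

def most_similar(target, ancestry):
    """Name of the population whose vector is closest in angle to `target`."""
    others = [name for name in ancestry if name != target]
    return max(others, key=lambda n: cosine_similarity(ancestry[target], ancestry[n]))

# Hypothetical ancestry fractions over six inferred ancestral populations.
# Placeholder values for illustration only; each vector sums to 1.
ancestry = {
    "POP_A": [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],
    "POP_B": [0.10, 0.70, 0.05, 0.05, 0.05, 0.05],
    "POP_C": [0.65, 0.15, 0.05, 0.05, 0.05, 0.05],
}

print(most_similar("POP_A", ancestry))
```

With real data, each vector would hold a population's average membership in the six inferred ancestral components, and the reference population would be the one with the highest similarity to MKK.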

Identifying selection in the Maasai

Selection based on Fst

We calculated Fst between MKK (n = 156) and LWK (n = 110) as in [30] for 1,175,055 SNPs common to both populations that passed filters for minor allele frequency, genotyping rate, and consistency with Hardy-Weinberg equilibrium. Statistical significance was assessed using a Bonferroni corrected permutation test p-value pB (Methods, Appendix S2). Within the SNPs that passed this filter, we identified those deviating significantly from neutral evolution using an empirical p-value (pE) based on the Fst distribution of intergenic SNPs. This identified 1,232 SNPs with pB <8.6E−6 and pE <0.001 (Table S1) which were either genic or within 50 kb of genes. In a recent selective sweep, many neighboring SNPs may remain linked due to genetic hitchhiking. To identify such regions, we grouped the genome-wide significant SNPs identified by Fst into clusters based on linkage disequilibrium using the criterion that each SNP has genotype R2≥0.25 with at least one other SNP in the cluster (Methods, Appendix S2). Each cluster so identified is a candidate for a selective sweep in one of the two populations. To identify the population in which the sweep is most likely to have occurred, we compared the local haplotype diversity in each population using the XP-EHH score [31]. For each cluster identified by Fst, we label it as a selection candidate in MKK if the maximum XP-EHH score of a SNP in the cluster is >3. A positive value for XP-EHH indicates that the MKK carry the longer-range haplotypes. This procedure identified 26 clusters (containing 318 SNPs) as candidate regions for selective sweeps in MKK (Table S2). Nine of these clusters include SNPs that exceed the genome-wide significance threshold for XP-EHH (XP-EHH >4.79580, Bonferroni corrected p<0.05, two-tailed).
The most significant genomic regions and non-synonymous SNP candidates under selection in MKK by Fst are listed in Table 1 and Table 2 respectively. Note that the isolated SNPs identified in Table 2 have high Fst with respect to at least two of the three possible reference African populations (ASW, LWK and YRI). This suggests that the results shown there are relatively independent of the reference population.
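To make the per-SNP Fst and permutation test concrete, here is a minimal sketch. It uses Wright's heterozygosity-based Fst with sample-size weights, which is a simple stand-in and not necessarily the estimator of [30]; the genotype coding (0/1/2 copies of one allele), the function names and the number of permutations are all assumptions for illustration.

```python
import random

def allele_freq(genotypes):
    """Frequency of the counted allele from diploid genotypes coded 0/1/2."""
    return sum(genotypes) / (2 * len(genotypes))

def wright_fst(geno_a, geno_b):
    """Wright's Fst = (H_T - H_S) / H_T for one biallelic SNP in two populations."""
    n_a, n_b = len(geno_a), len(geno_b)
    p_a, p_b = allele_freq(geno_a), allele_freq(geno_b)
    w_a = n_a / (n_a + n_b)          # sample-size weights
    w_b = 1.0 - w_a
    p_bar = w_a * p_a + w_b * p_b
    h_t = 2 * p_bar * (1 - p_bar)    # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                   # SNP is monomorphic in the pooled sample
    h_s = w_a * 2 * p_a * (1 - p_a) + w_b * 2 * p_b * (1 - p_b)
    return (h_t - h_s) / h_t

def permutation_pvalue(geno_a, geno_b, n_perm=1000, seed=0):
    """Share of label permutations whose Fst is at least the observed value."""
    rng = random.Random(seed)
    observed = wright_fst(geno_a, geno_b)
    pooled = list(geno_a) + list(geno_b)
    n_a = len(geno_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if wright_fst(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Toy genotypes (hypothetical): allele common in population A, rare in B.
pop_a = [2, 2, 1, 2, 1, 2, 2, 1]
pop_b = [0, 0, 1, 0, 0, 1, 0, 0]
print(wright_fst(pop_a, pop_b), permutation_pvalue(pop_a, pop_b))
```

A Bonferroni correction would then multiply each permutation p-value by the number of SNPs tested, which in practice requires far more permutations than the default used here.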
Table 1

Top 20 genomic regions identified as selection candidates in MKK using the Fst statistic and clustering.

| Chr | Start location | Stop location | Genes in region | Number of high-Fst SNPs (empirical p-value <0.001) | Max Fst within cluster | Max XP-EHH score within cluster |
|---|---|---|---|---|---|---|
| 2 | 135036696 | 136726567 | RAB3GAP1, ZRANB3, DARS, R3HDM1, TMEM163, YSK4, LCT, UBXN4, MCM6, MGAT5, CCNT2 | 123 | 0.382 | 12.202 |
| 2 | 78305622 | 78500655 | - | 33 | 0.311 | 3.805 |
| 12 | 56402204 | 56754137 | PAN2, OBFC2B, SLC39A5, APOF, STAT2, CS, RNF41, IKZF4, SMARCC2 | 28 | 0.283 | 3.024 |
| 3 | 191929784 | 191990575 | FGF12 | 13 | 0.272 | 5.222 |
| 5 | 115126388 | 115223035 | ATG12, AP3S1 | 7 | 0.266 | 3.870 |
| 2 | 163048404 | 163152351 | IFIH1, FAP | 19 | 0.261 | 3.108 |
| 7 | 99053816 | 99436198 | ZNF498, CYP3A4, CPSF4, CYP3A7, CYP3A43 | 17 | 0.260 | 3.290 |
| 1 | 12296232 | 12319994 | VPS13D | 4 | 0.253 | 3.060 |
| 22 | 49978502 | 50077531 | - | 4 | 0.244 | 3.732 |
| 5 | 32128179 | 32159329 | GOLPH3 | 5 | 0.242 | 3.062 |
| 5 | 14747247 | 14750823 | ANKH | 4 | 0.237 | 6.800 |
| 14 | 36033703 | 36201722 | RALGAPA1 | 4 | 0.221 | 3.517 |
| 2 | 136917330 | 136921703 | - | 2 | 0.218 | 8.549 |
| 1 | 198692364 | 198745866 | PTPRC | 2 | 0.212 | 3.138 |
| 2 | 137580234 | 137595545 | - | 4 | 0.209 | 4.871 |
| 12 | 111414527 | 111502280 | CUX2 | 5 | 0.209 | 3.393 |
| 17 | 75423198 | 75431978 | SEPT9 | 3 | 0.200 | 5.024 |
| 18 | 66714832 | 66724690 | CCDC102B | 4 | 0.200 | 5.704 |
| 1 | 74807337 | 74842787 | TNNI3K | 3 | 0.193 | 3.993 |
| 3 | 185752767 | 185805993 | ETV5 | 3 | 0.192 | 4.569 |

1,232 SNPs with significant Fst scores (pB<8.6E−6, pE<0.001) were clustered into contiguous genomic regions of linkage disequilibrium. A cluster was defined as a collection of SNPs in a genomic region where each SNP had genotype R2≥0.25 with at least one other SNP in the cluster. Clusters containing a SNP with maximum XP-EHH score >3 were identified as being MKK associated. The 22 top clusters are ranked by the highest Fst value for a SNP pair in a cluster. The complete set of clusters identified by Fst is in Table S2.
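The clustering criterion in this footnote (each SNP has genotype R2≥0.25 with at least one other SNP in the cluster) can be read as single-linkage grouping, i.e. connected components of a graph whose edges join SNP pairs above the threshold. The sketch below implements that reading with a union-find structure; it is an assumption about the procedure, not the authors' code, and the SNP names and R2 values are hypothetical.

```python
def ld_clusters(snps, r2, threshold=0.25):
    """Group SNPs into connected components linked by pairwise r2 >= threshold.

    snps: list of SNP identifiers.
    r2:   dict mapping (snp_a, snp_b) pairs to their genotype r-squared.
    Returns a list of sorted clusters; unlinked SNPs come back as singletons.
    """
    parent = {s: s for s in snps}

    def find(x):
        # Walk to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in r2.items():
        if value >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for s in snps:
        groups.setdefault(find(s), []).append(s)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

# Hypothetical SNPs and pairwise r2 values, for illustration only.
snps = ["snp1", "snp2", "snp3", "snp4"]
r2 = {("snp1", "snp2"): 0.60, ("snp2", "snp3"): 0.30, ("snp3", "snp4"): 0.10}
print(ld_clusters(snps, r2))  # [['snp1', 'snp2', 'snp3'], ['snp4']]
```

Note that under this reading `snp1` and `snp3` share a cluster even without a direct R2 value above the threshold, because both are linked to `snp2`.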

Table 2

The most significant non-synonymous SNPs under selection in MKK using Fst, with LWK as the reference population.

| Rsid of SNP | Chr | Position | Gene | Bonferroni corrected permutation p-value (pB) | Empirical p-value (pE) using distribution of non-coding SNPs | Fst MKK vs LWK | Fst MKK vs YRI | Fst MKK vs ASW |
|---|---|---|---|---|---|---|---|---|
| rs2241883 | 2 | 88424066 | FABP1 | 1.72E−12 | 3.13E−05 | 0.250 | 0.172 | 0.152 |
| rs961360 | 2 | 136393658 | R3HDM1 | 3.13E−08 | 3.13E−04 | 0.199 | 0.288 | 0.447 |
| rs6997753 | 8 | 142487937 | FLJ43860 | 4.87E−08 | 3.59E−04 | 0.194 | 0.138 | 0.006 |
| rs531503 | 7 | 100377082 | ZAN | 3.83E−07 | 5.47E−04 | 0.182 | 0.014 | 0.073 |
| rs17014118 | 4 | 89319296 | HERC6 | 4.42E−07 | 6.06E−04 | 0.180 | 0.178 | 0.045 |
| rs2271586 | 11 | 3659993 | ART5 | 4.76E−07 | 6.06E−04 | 0.180 | 0.034 | 0.004 |
| rs10930046 | 2 | 163137983 | IFIH1 | 1.24E−06 | 6.86E−04 | 0.176 | 0.279 | 0.128 |
| rs1051334 | 12 | 71523134 | TSPAN8 | 1.36E−06 | 6.86E−04 | 0.176 | 0.173 | 0.104 |
| rs10475299 | 5 | 5461233 | KIAA0947 | 1.46E−06 | 6.86E−04 | 0.175 | 0.160 | 0.198 |
| rs1918496 | 12 | 56722060 | PAN2 | 3.06E−06 | 8.17E−04 | 0.171 | 0.296 | 0.074 |
| rs13389745 | 2 | 65298657 | CEP68 | 3.84E−06 | 8.17E−04 | 0.172 | 0.115 | 0.052 |
| rs846266 | 7 | 42088222 | GLI3 | 2.54E−06 | 9.42E−04 | 0.169 | 0.150 | 0.059 |
| rs3813227 | 2 | 73651967 | ALMS1 | 6.02E−06 | 9.82E−04 | 0.167 | 0.173 | 0.034 |

The most significant non-synonymous SNPs identified as candidates for selection by Fst. The complete list of 1,232 SNPs identified as selection candidates by Fst (pB <8.6E−6 and pE <0.001) is in Table S1.

Selection based on iHS

Recent selective sweeps amplify beneficial mutations and reduce haplotype diversity due to the hitchhiking effect. The Extended Haplotype Homozygosity [32] (EHH) statistic identifies such events without using a reference population. EHH(x) measures the probability that two randomly selected haplotypes sharing the same allele at a SNP are identical up to genomic distance x. At each SNP, we computed the unstandardized Integrated Haplotype Score [33] (iHS), defined as the logarithm of the ratio of the integrated EHH scores for the ancestral allele and the derived allele. Stratifying the data into bins by the derived allele frequency of the SNPs, the scores within each bin were then normalized to have zero mean and unit standard deviation. The iHS statistic is less sensitive to demographic history (e.g. population bottlenecks) and to local differences in recombination rates, because such factors have similar effects on ancestral and derived alleles, and tend to cancel in the ratio [33]. If either allele is under selection, the reduced haplotype diversity around it will tend to increase the absolute value of iHS. Following the protocols in [33], raw iHS scores for 991,737 SNPs in MKK (n = 143 individuals) that passed filters (minor allele frequency cutoff, consistency with Hardy-Weinberg equilibrium) were binned by derived allele frequency and standard normalized within each bin (details in Methods and Appendix S3). Genomic regions were scored by the fraction of high scoring iHS SNPs (|iHS| >2) using a sliding window of 50 SNPs. The top 0.02% of non-overlapping SNP windows identified 196 regions likely to be under selection (Table S3). These were further grouped on the basis of linkage disequilibrium using the same criterion as for Fst (genotype R2≥0.25). The most significant regions identified as candidates for selection in MKK are in Table 3 (the complete list is in Table S4).
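The two post-processing steps described above, standardizing raw iHS within derived-allele-frequency bins and scoring windows by the fraction of SNPs with |iHS| >2, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the number of frequency bins (20), the use of the population standard deviation, the `step` parameter and the function names are all assumptions.

```python
import statistics

def standardize_by_frequency_bin(raw_scores, derived_freqs, n_bins=20):
    """Z-score raw iHS values within equal-width derived allele frequency bins."""
    bins = {}
    for i, freq in enumerate(derived_freqs):
        b = min(int(freq * n_bins), n_bins - 1)  # freq == 1.0 goes in last bin
        bins.setdefault(b, []).append(i)

    standardized = [0.0] * len(raw_scores)
    for indices in bins.values():
        values = [raw_scores[i] for i in indices]
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        for i in indices:
            # A bin with no spread cannot be scaled; leave its scores at 0.
            standardized[i] = (raw_scores[i] - mean) / sd if sd > 0 else 0.0
    return standardized

def window_fractions(ihs, window=50, step=1, cutoff=2.0):
    """Fraction of SNPs with |iHS| > cutoff in each window of `window` SNPs."""
    fractions = []
    for start in range(0, len(ihs) - window + 1, step):
        chunk = ihs[start:start + window]
        fractions.append(sum(abs(x) > cutoff for x in chunk) / window)
    return fractions

# Toy example with three SNPs in one frequency bin (hypothetical values).
z = standardize_by_frequency_bin([1.0, 2.0, 3.0], [0.11, 0.12, 0.13])
print(z)
print(window_fractions([0, 3, -3, 0, 0, 2.5], window=3, step=3))
```

Setting `step` equal to `window` gives non-overlapping windows; the highest-scoring windows would then be kept and merged by linkage disequilibrium.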
Table 3

The most significant genomic regions under selection in MKK using iHS.

| Chr | Cluster start position (GRCh37) | Cluster end position (GRCh37) | Genes | Max \|iHS\| in cluster | # of SNPs in cluster with \|iHS\| >2 |
|---|---|---|---|---|---|
| 2 | 134221398 | 137892309 | LCT, MGAT5, NCKAP5, DARS, ZRANB3, R3HDM1, TMEM163, RAB3GAP1, THSD7B, CCNT2, YSK4, UBXN4, MCM6 | 6.339 | 545 |
| 13 | 30496779 | 30565298 | - | 5.234 | 26 |
| 7 | 20373632 | 20468718 | ITGB8 | 5.012 | 45 |
| 2 | 176089888 | 176422005 | - | 4.626 | 69 |
| 11 | 110532348 | 110663647 | ARHGAP20 | 4.480 | 36 |
| 9 | 83127968 | 83382243 | - | 4.471 | 59 |
| 5 | 14657062 | 14753764 | FAM105B, ANKH | 4.429 | 23 |
| 18 | 66652846 | 66765215 | CCDC102B | 4.402 | 33 |
| 11 | 34025053 | 34189564 | CAPRIN1, NAT10, ABTB2 | 4.375 | 22 |
| 2 | 179421694 | 179606538 | TTN | 4.289 | 28 |
| 14 | 105792959 | 105907642 | PACS2, MTA1 | 4.228 | 20 |
| 5 | 108990708 | 109217428 | MAN2A1 | 4.219 | 50 |
| 9 | 107973277 | 108067684 | SLC44A1 | 4.192 | 34 |
| 9 | 3869844 | 3919130 | GLIS3 | 4.185 | 23 |
| 7 | 99053816 | 99314986 | ZNF789, CPSF4, ATP5J2, FAM200A, ZNF655, ZNF498, CYP3A7, ZKSCAN5, CYP3A5 | 4.120 | 24 |
| 9 | 13812037 | 13867306 | - | 4.066 | 23 |
| 11 | 75470813 | 75678647 | UVRAG, DGAT2 | 4.059 | 48 |
| 2 | 12294875 | 12366781 | - | 4.041 | 24 |
| 14 | 97426813 | 97505011 | - | 4.025 | 24 |
| 8 | 145839058 | 146082167 | COMMD5, LOC100287170, LOC100129596, ARHGAP39, RPL8, ZNF7, ZNF251, ZNF34, LOC100287297, ZNF517 | 3.955 | 22 |

Using a sliding window 50 SNPs wide, genomic regions were scored for the fraction of SNPs with |iHS|>2. The top 0.02% of non-overlapping windows were identified and merged into genomic clusters based on genotype R2 using the same criterion as in Table 1. Clusters are ranked by the maximum |iHS| value in the cluster. Complete lists of genome-wide significant SNPs and regions identified by iHS are in Tables S2a and S2b respectively.

Selection based on XP-EHH

The third method used to identify selective sweeps in MKK was the Cross Population Extended Haplotype Homozygosity statistic (XP-EHH) [31]. This statistic compares the EHH profiles for bi-allelic SNPs between two populations. It is defined as the log of the ratio of the integrals of the EHH profiles for a given allele between the two populations (Appendix S4). The comparison between populations normalizes the effects of large-scale variations in recombination rates on haplotype diversity, and has a higher statistical power to detect sweeps that are close to fixation [31]. Using the LWK cohort (n = 90) as the reference population for MKK (n = 143), XP-EHH was calculated for 1,373,755 SNPs that passed various filters (Methods, Appendix S4). Following [31], we assigned p-values using a Gaussian fit after standard normalizing the XP-EHH distribution. SNPs with Bonferroni corrected p-value <0.05 (two-tailed) were chosen as potentially significant candidates for selection. These are listed in Table S5. We also clustered these candidate SNPs (using the genotype R2≥0.25 criterion as before) to identify putative regions under selection in MKK (Table S6). The most significant regions thus identified are listed in Table 4.
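The definition above (log of the ratio of the integrated EHH profiles in the two populations) can be illustrated with a short sketch. The EHH decay curves, the distance grid and the function names are hypothetical, trapezoidal integration is an assumption about how the integral is taken, and the exact procedure in Appendix S4 may differ.

```python
import math

def integrate_ehh(distances, ehh):
    """Trapezoidal area under an EHH decay curve sampled at `distances`."""
    area = 0.0
    for i in range(1, len(distances)):
        area += 0.5 * (ehh[i] + ehh[i - 1]) * (distances[i] - distances[i - 1])
    return area

def unstandardized_xpehh(distances, ehh_test, ehh_ref):
    """Log ratio of integrated EHH; positive when the test population
    carries the longer-range haplotypes."""
    return math.log(integrate_ehh(distances, ehh_test) /
                    integrate_ehh(distances, ehh_ref))

def two_tailed_p(z):
    """Two-tailed p-value of a standard normal score."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical EHH profiles around one SNP (distance in kb).
distances = [0, 10, 20, 30]
ehh_test = [1.0, 0.9, 0.8, 0.7]   # slow decay: long shared haplotypes
ehh_ref = [1.0, 0.5, 0.2, 0.1]    # fast decay: diverse haplotypes
print(unstandardized_xpehh(distances, ehh_test, ehh_ref))
```

Genome-wide, the raw scores would be standardized to zero mean and unit variance before `two_tailed_p` is applied and corrected for the number of SNPs tested.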
Table 4

The most significant genomic regions under selection in MKK using XP-EHH, with LWK as the reference population.

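| Chr | Start Position | End Position | Genes | Number of SNPs | Max XP-EHH |
|---|---|---|---|---|---|
| 2 | 135058615 | 137017060 | R3HDM1, MGAT5, RAB3GAP1, LCT, DARS, ZRANB3, MCM6, TMEM163, ACMSD, CCNT2, YSK4, UBXN4, CXCR4 | 572 | 12.182 |
| 5 | 14681797 | 14751400 | FAM105B, ANKH | 25 | 6.800 |
| 18 | 66712510 | 66731187 | CCDC102B | 12 | 5.587 |
| 5 | 115885282 | 115922669 | SEMA6A | 21 | 5.482 |
| 18 | 66768031 | 66777543 | - | 5 | 5.324 |
| 20 | 4513311 | 4522535 | - | 10 | 5.313 |
| 13 | 104870241 | 104880533 | - | 7 | 5.183 |
| 4 | 64594290 | 64639661 | - | 16 | 5.149 |
| 2 | 134507165 | 134561145 | - | 12 | 5.062 |
| 16 | 75360734 | 75364940 | CFDP1 | 2 | 5.040 |
| 17 | 75427551 | 75428021 | SEPT9 | 2 | 5.024 |
| 3 | 191943578 | 191989642 | FGF12 | 10 | 5.019 |
| 11 | 117610387 | 117620420 | DSCAML1 | 8 | 4.989 |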

SNPs with positive genome-wide significant XP-EHH scores (XP-EHH ≥4.796, two-tailed Bonferroni corrected p≤0.05) were grouped into contiguous genomic clusters based on genotype R2, using the same criterion as in Table 1. Overlapping clusters were merged. Column E lists the number of significant SNPs in each cluster. Complete lists of genome-wide significant SNPs and clusters identified by XP-EHH are in Tables S3a and S3b.

Overlap of high scoring regions

The metrics we use probe for different signatures of selection, and hence, genomic regions which are identified by more than one metric are more likely to be true positives. Using a concordance between at least two of the metrics, we identified seven genomic regions as strong candidates for selection (Table 5). There was also overlap between the regions identified by our methods and those identified by the International HapMap Consortium for MKK (they used a statistic they call CMS or “Composite of Multiple Signals”) [29]. These regions of concordance are listed in Table S7. Figure 2 shows the results for all three metrics for chromosome 2. The significant selection in a region in Chr2q21 of size ∼ 1.0–1.7 Mb is clearly visible in Figure 2a. Figure 2b shows details of this region which contains a large number of polymorphisms with significant high scores by all three metrics (discussed further below). Similar figures for all chromosomes are shown in Appendix S6.
Table 5

Concordant genomic regions identified by at least two of three metrics as candidates for selection in MKK.

| Chr | Genomic Extent | Significant by (Method) | Genes in Region | Number of SNPs identified by each Method |
|---|---|---|---|---|
| 2 | 135058615–136726567 | Fst, iHS, XP-EHH | MGAT5, TMEM163, ACMSD, CCNT2, YSK4, RAB3GAP1, ZRANB3, R3HDM1, UBXN4, LCT, MCM6, DARS | Fst: 123, iHS: 545, XP-EHH: 572 |
| 3 | 191943578–191989642 | Fst, XP-EHH | FGF12 | Fst: 13, XP-EHH: 10 |
| 5 | 14747247–14750823 | Fst, iHS, XP-EHH | ANKH | Fst: 4, iHS: 23, XP-EHH: 25 |
| 5 | 115885574–115885672 | Fst, XP-EHH | SEMA6A | Fst: 2, XP-EHH: 21 |
| 7 | 99053816–99314986 | Fst, iHS | ZNF789, CPSF4, ATP5J2, FAM200A, ZNF655, ZNF498, CYP3A7, ZKSCAN5, CYP3A5 | Fst: 17, iHS: 24 |
| 17 | 75427551–75428021 | Fst, XP-EHH | SEPT9 | Fst: 3, XP-EHH: 2 |
| 18 | 66714832–66724690 | Fst, iHS, XP-EHH | CCDC102B | Fst: 4, iHS: 33, XP-EHH: 12 |

Genomic regions identified as genome-wide significant by at least two of the three methods - Fst, iHS and XP-EHH.
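The genomic extents in Table 5 appear consistent with taking the intersection of the per-method cluster intervals. The sketch below checks this for the Chr2q21 region using the coordinates listed in Tables 1, 3 and 4; that the authors computed the extents this way is an assumption, and the function name is illustrative.

```python
def intersect(intervals):
    """Common overlap of (start, end) intervals, or None if they do not all overlap."""
    intervals = list(intervals)
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start <= end else None

# Chr2q21 cluster coordinates as reported for each metric.
chr2_clusters = {
    "Fst": (135036696, 136726567),     # Table 1
    "iHS": (134221398, 137892309),     # Table 3
    "XP-EHH": (135058615, 137017060),  # Table 4
}

print(intersect(chr2_clusters.values()))  # (135058615, 136726567)
```

The result matches the Chr 2 extent listed in Table 5, and the same check can be repeated for the other concordant regions.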

Figure 2

(a) Genome-wide significant scores identifying candidate regions under selection on Chromosome 2.

Chromosome-wide plot of SNPs with significant scores using Fst (empirical p-value <0.001 and Bonferroni corrected permutation test pB <8.6E−6), iHS (normalized |iHS|>2), and XP-EHH (XP-EHH ≥4.796, two-tailed Bonferroni corrected p≤0.05). The SNPs thus identified were clustered on the basis of linkage disequilibrium to identify contiguous genomic regions that are candidates for selection (Tables 1, 2, 3, 4). The locus containing the genes LCT and MCM6 (135–137 Mb) was identified by all three metrics as the top candidate for selection. The non-synonymous T/C polymorphism at rs2241883 in the FABP1 gene had the most significant genome-wide Fst (Fst = 0.25, pE = 3.13E−5). The MKK samples have a high frequency (∼0.45) of the protective C allele, known to be associated with low cholesterol levels in Europeans (plots for other chromosomes in Appendix S6).

(b) Inset of the LCT locus on Chromosome 2.

An inset of the Fst, iHS and XP-EHH scores for SNPs in the ∼1 Mb locus (from 135.8–136.8 Mb) on Chr 2 containing the genes LCT and MCM6. The uniformly high values for all three metrics in this region suggest that this locus has undergone strong selection pressure. The blue marker indicates the position of the lactase associated SNP in MCM6 that we sequenced, which was polymorphic in MKK with frequency pC = 0.58+/−0.14 (68% CI) for the protective C allele.

The non-synonymous SNP at rs2241883 in FABP1 is a Candidate for Selection in Maasai

We found that the non-synonymous SNP with the highest genome-wide significant Fst was rs2241883 in the gene Fatty Acid Binding Protein 1, Liver (FABP1, alternative name LFABP) (Table 2 and Figure 2a). The SNP rs2241883 is a T/C non-synonymous transition which encodes a Threonine to Alanine (T94A) change in the protein LFABP, which is expressed in liver. The C allele was associated with total triglyceride and low density lipoprotein (LDL) cholesterol levels in Germans [36], and with Apolipoprotein B (ApoB) levels induced by a high fat diet in French-Canadians [37]. The MKK have high Fst at this SNP relative to all three other African populations in HapMap (Table 2). The frequency of the C allele is also highest in MKK (0.44) compared to all other HapMap 3 populations (in which the frequency ranges from 0.09 to 0.32). These results suggest that the rs2241883 polymorphism is under selection in the Maasai.

Maasai are under Selection in a 1.7 Mb Region on Chr2q21 for Lactase Persistence

The largest cluster under selection in Maasai, identified by all the metrics, was a 1.7 Mb region on Chr2q21 (Figures 2a, 2b, Tables 1, 2, 3, 4). The region includes the Lactase (LCT) gene, which encodes the Lactase protein, as well as the gene MCM6, which contains intronic regulatory regions for LCT [28], [38]–[40]. Specific polymorphisms in these regions are known to confer lactase persistence in Europeans and Africans [28], [38]. Our results are in agreement with other studies that have also shown that this region is under recent, positive selection in the Maasai [28], [31]–[33], [41], [42]. To identify specific polymorphisms for adult lactase persistence in the Maasai, we sequenced DNA from six founder MKK samples (HapMap IDs: NA21367, NA21379, NA21454, NA21519, NA21522, NA21650) at five loci in MCM6 (G/C-14010, rs41525747, rs4988235, rs41380347 and rs182549), which are known to be associated with lactase persistence in Africans and Europeans [28]. We found that the G/C-14010 polymorphism in the MCM6 gene is segregating in these samples (nGG = 1, nGC = 3, nCC = 2). We estimated the frequency of the beneficial (C) allele in the MKK samples to be pC = 0.58+/−0.14 (68% CI from finite size sampling; details in Appendix S5). This is in agreement with Tishkoff et al. [28], who showed that this allele is significantly associated with lactase persistence, has significantly reduced haplotype diversity indicative of a selective sweep, and is segregating at high frequency in the Maasai samples from Kenya.
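The reported allele frequency can be reproduced from the genotype counts given above (nGG = 1, nGC = 3, nCC = 2). The sketch below uses a plain binomial standard error over the 12 sampled chromosomes as the 68% interval; this is one simple approach that matches the quoted figure, and the exact procedure in Appendix S5 may differ. The function name is illustrative.

```python
import math

def allele_frequency_with_se(n_hom_other, n_het, n_hom_target):
    """Target-allele frequency and binomial standard error from genotype counts."""
    n_chrom = 2 * (n_hom_other + n_het + n_hom_target)  # diploid samples
    p = (n_het + 2 * n_hom_target) / n_chrom
    se = math.sqrt(p * (1 - p) / n_chrom)
    return p, se

# Genotype counts at G/C-14010 in the six sequenced MKK samples.
p_c, se = allele_frequency_with_se(n_hom_other=1, n_het=3, n_hom_target=2)
print(f"pC = {p_c:.2f} +/- {se:.2f}")  # pC = 0.58 +/- 0.14
```

Seven of the 12 chromosomes carry the C allele, so pC = 7/12 ≈ 0.58, and the wide interval reflects the small sample.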

The Selected Locus on Chr2q21 Contains Polymorphisms Associated with Cholesterol Levels

The selected locus on Chr2q21 contains polymorphisms that have been associated with cholesterol levels in various GWAS studies [43]–[45]. The SNP rs7570971 in RAB3GAP1, not found in the HapMap data for the MKK, is associated with total cholesterol levels in a GWAS of >100,000 individuals of European descent [43]. However, all six MKK samples we sequenced were homozygous at this locus for the allele associated with increased total cholesterol levels in individuals of European descent. A study in a Finnish cohort identified polymorphisms in LCT associated with total cholesterol and low-density lipoprotein cholesterol (LDL-C) levels [44]. The authors found that the lactase persistence genotype in Finns, as defined by the genotype for SNP rs4988235, was associated with lower cholesterol values. Several SNPs in and around the gene LCT were associated with total cholesterol and LDL-C levels, with stronger associations in males than in females. This study also found that the G allele at the synonymous SNP rs2304371 in the LCT gene was associated with the highest LDL-C levels in males. The same SNP was identified by our methods as a selection candidate (Tables S1, S2, S3). However, once again, the major allele in the MKK (frequency 87%) was the one associated with higher LDL-C levels.

The CYP3A Locus is a Candidate for Selection in Maasai

On Chromosome 7, a 261 kb wide region spanning the entire Cytochrome P450 Subfamily 3A (CYP3A) locus was identified as a candidate for selection by Fst and iHS (Tables 1, 2). All CYP genes in this locus contain SNPs with genome-wide significant Fst or iHS scores, including: CYP3A4 (a potent oxidizer of steroids and drugs), CYP3A5 (involved in oxidation of fatty acids and steroids in the liver), CYP3A7 (the main CYP enzyme expressed in fetal livers) and CYP3A43 (involved in testosterone metabolism). The CYP proteins play an important role in drug metabolism and in the synthesis of steroids from cholesterol [46].

Discussion

In spite of a fat and cholesterol rich diet, the Maasai have low blood cholesterol levels and low incidence of heart disease and atherosclerosis. Cholesterol challenge studies in the 1970s [2] demonstrated that the Maasai are able to maintain cholesterol homeostasis in response to elevated levels of dietary cholesterol, and suggested that the mechanism of cholesterol homeostasis may have a genetic basis. In the present study, we used HapMap 3 data to investigate this possibility. Using 90–110 unrelated LWK individuals as a reference population, three complementary metrics (Fst, iHS and XP-EHH) were used to identify SNPs and chromosomal regions under selection in 143–156 unrelated MKK (Maasai) individuals in HapMap 3. The genomic regions and genes identified as selection candidates in MKK are shown in Tables 1, 2, 3 and Tables S1, S2, S3 for the Fst, iHS and XP-EHH metrics respectively. We identified seven genomic regions as strong candidates for selection using concordance between at least two of the metrics (Table 5). We now discuss some of the most interesting SNPs and regions identified for the role they may play in lactase persistence and lipid pathway selection in the Maasai.

Using Fst, the most significant non-synonymous SNP was the polymorphism rs2241883 located at 88.42 Mb on Chromosome 2 (Figure 2a, Table 1). This is a Threonine to Alanine substitution (T94A) in exon 3 of the FABP1 (or LFABP) gene, a fatty acid binding protein expressed in liver. This locus was not detected by iHS or XP-EHH, suggesting either an increased local recombination rate or a more ancient selective sweep. The T94A polymorphism was strongly associated with lower levels of plasma triglycerides and LDL-cholesterol levels in a study of 826 individuals from Northern Germany [36].
A study of plasma concentrations of ApoB in 623 French Canadian men found that carriers of the A94 allele were protected against high ApoB levels when consuming a high fat and saturated fat diet, possibly because of diminished function of the protein LFABP due to a disruption in ligand binding [37]. LFABP knockout mice fed a high cholesterol, high saturated fat diet were protected against diet-induced obesity and had lower levels of hepatic triglycerides compared to control mice, despite the absence of discernible differences in energy levels, food intake, or mal-absorption of fat induced obesity [47], [48]. The study concluded that “LFABP may function as a metabolic sensor in regulating lipid homeostasis” [47]. The protective C allele of this SNP is segregating in the Maasai at allele-frequency 0.44, suggesting that the effect of the T94A mutation on the LFABP pathway may be partly responsible for the homeostatic regulation of blood cholesterol in Maasai [1]–[3].

We found evidence for a strong recent selective sweep in a ∼1.7 Mb region on Chr2q21 (Figure 2, Tables 1, 2, 3, 4). This region is known to harbor polymorphisms conferring lactase persistence in Kenyans, and has been shown to be under strong recent selection. Tishkoff et al [28] performed a phenotype-genotype association study for lactase persistence on 470 Tanzanians, Kenyans and Sudanese who were genotyped at 123 SNPs in a 3 Mb region surrounding the LCT and MCM6 genes. The SNP known as G/C-14010 was found to have the most significant association with the lactase persistence phenotype in Kenyan Nilo-Saharan and Tanzanian Afro-Asiatic populations, as well as in a meta-analysis of all the populations combined. Tishkoff et al observed the C-14010 allele to occur at 32% frequency in Kenyan populations. As this SNP is in the upstream regulatory region of the gene LCT, the authors also studied the effect of this polymorphism on expression using luciferase assays in intestinal cells.
They found that the C-14010 allele leads to significantly higher expression. Furthermore, an iHS analysis of the haplotype background on which the SNP occurs indicated that the SNP is under selection in Kenyans and Tanzanians. We found that in the MKK samples from HapMap the C-14010 allele is segregating at high frequency (0.58). Thus, our results confirm the findings of Tishkoff et al, that C-14010 contributes towards selection for lactase persistence in the MKK samples from HapMap.

In addition to lactase persistence, the GWAS studies of [43] and [44] indicate that, in Europeans, the locus on Chr2q21 is associated with cholesterol levels. As this locus is also identified by our analysis, it may be associated with cholesterol levels in the Maasai. However, the allelic variants of the GWAS SNPs of [43], [44] that have high frequency in MKK are associated with an increase in cholesterol levels in Europeans. This might reflect the possibility that Europeans and Maasai have different sets of functional polymorphisms at this locus responsible for lower cholesterol levels: indeed it is known that the Maasai have an African polymorphism associated with lactase persistence, different from the one found in Europeans. It could also be that in the Maasai, the SNPs identified in our study are not themselves functional, but linked to functional variants that are not genotyped. Given the extended linkage disequilibrium (LD) in this region due to a selective sweep in both Europeans and the Maasai, this last possibility is especially important. The differing effects of the SNPs identified in the Maasai, as compared with the Europeans, could arise from the effects of differing modifier alleles at different loci in this region. These possibilities emphasize the difficulties associated with identifying true functional polymorphisms because of potential population specificity of SNP based studies.
However, given the GWAS findings, and the strong signal of selection in MKK seen in our analysis, the LCT locus is a candidate region for identifying genotypic variants associated with cholesterol regulation in the Maasai.

We also identified a 261 kb locus on Chr 7 (the CYP3A locus) to be under selection using Fst and iHS (Tables 1, 2, 4). This locus has been identified in re-sequencing studies and genome-wide scans to be under positive selection in Africans and non-Africans [33], [49], [50] and is also under positive selection for salt sensitivity in equatorial populations [49], [51]. This locus contains the CYP3A (cytochrome P450, subfamily 3A) family of genes which are involved in cholesterol metabolism and steroid biosynthesis [46]. This family contains CYP3A5, a gene involved in fatty acid oxidation in liver, as well as CYP3A7, a gene encoding a CYP enzyme expressed in fetal livers. Variants in CYP3A5 have been shown to reduce the efficacy of certain statins, drugs used to lower cholesterol biosynthesis [52]. Thus, the selection pressure at this locus, as identified by our analysis, coupled with its role in cholesterol metabolism, suggests that the CYP3A locus is an important candidate for cholesterol homeostasis in the Maasai.

Several other clusters identified to be under selection in MKK contain genes related to cholesterol metabolism, cholesterol biosynthesis and atherosclerosis. On Chr12q13, we identified a region spanning many genes with one of the highest Fst signals (Table 1). This locus contains the Apolipoprotein F (APOF) gene, involved in cholesterol transport and esterification [46], whose over-expression in mice reduces high density lipoprotein (HDL) cholesterol levels [53]. A cluster identified by iHS on chromosome 11q13.5 contains the gene Diacylglycerol O-acyltransferase 2 (DGAT2) (Table 3). This gene is involved in biosynthesis of triglycerides [54], [55] and has been implicated in hyperlipidemia [56] and fatty liver disease [57].
Another cluster on Chr7p21.1, identified by iHS, contains the Integrin Beta 8 (ITGB8) gene (Table 3), implicated as a quantitative trait locus (QTL) for fibrinogen plasma levels in a study involving 3600 Native Americans [58]. Fibrinogen levels are associated with risks for several cardiovascular diseases [59], and play a role in the pathogenesis of atherosclerosis [58]. XP-EHH identified a genome-wide significant region on chromosome 16q22.2–22.3, containing the gene Craniofacial Development Protein 1 (CFDP1) (Table 4). A GWAS showed that this region is associated with low levels of HDL cholesterol in ∼400 French-Canadians [60].

Our results identified several genes and loci involved in cholesterol metabolism as selection candidates in the Maasai. Thus, our findings suggest that the Maasai are adapted for a high-cholesterol and high-fat diet.

The traditional diet of the Maasai is rich in saturated fats and cholesterol, and low in carbohydrates. Similar ketogenic diets are often used to treat epileptic seizures in children [61], [62]. Early complications of these diets include hypertriglyceridemia, hypercholesterolemia, and low levels of HDL, and late complications include osteopenia, renal stones, and cardiomyopathy [62], [63]. This suggests that a diet rich in fat and cholesterol from childhood can exert a strong diet-induced selection pressure on survival and reproductive success.

Maasai social customs may also favor genetic selection against diseases of the elderly. Maasai are both polygynous and gerontocratic, and older men routinely marry nulliparous young women [64]–[70]. Maasai women are also permitted, at their discretion, to have sex with members of their husbands’ age set, a form of open marriage that provides older men with opportunities to reproduce [15], [64]–[67], [69]. Finally, extramarital sex between older men and the wives of younger men sometimes occurs [64], [68]. Such mating practices may facilitate the spread of protective adaptations for old-age diseases.

Summary

Field studies showed that, in spite of a high fat and high cholesterol diet, the Maasai have low levels of cardiac disease and atherosclerosis. In this paper, we present results from a genome-wide scan of the HapMap 3 SNP data using the Fst, iHS and XP-EHH statistics to identify genomic regions under selection in the Maasai. We identify regions containing genes involved in lactose and lipid metabolism which are under selection in the Maasai. Our analysis suggests that the identified regions harbor known and novel genetic polymorphisms responsible for the unusual lipid metabolism, cholesterol homeostasis, protection against cardiac diseases and adult lactase persistence in the Maasai.

Methods

Ethics statement

The data analyzed was public SNP data from the HapMap Consortium http://hapmap.ncbi.nlm.nih.gov/. No consent was required.

Data used

HapMap 3 release 3 SNP genotype data for founders from the Maasai in Kinyawa, Kenya (MKK) (n = 156), the Luhya in Webuye, Kenya (LWK) (n = 110), African-Americans in Southwest USA (ASW) (n = 53), the Yoruba in Ibadan, Nigeria (YRI) (n = 147), and Utah residents of Northern and Western European ancestry (CEU) (n = 112) was downloaded from http://snp.cshl.org/. Using PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/), we filtered to retain only SNPs common to all populations. HapMap 3 release 2 autosomal haplotype data for the MKK (n = 143) and LWK (n = 90) was downloaded from http://hapmap.ncbi.nlm.nih.gov/downloads/phasing/2009-02_phaseIII/HapMap3_r2/. The data was phased using IMPUTE++ [71]. SNPs were pre-filtered for Hardy Weinberg equilibrium and for low frequency of Mendel errors (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-05_phaseIII/00README.txt). Genetic maps were downloaded from http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/2008-03_rel22_B36/rates/ to obtain the genetic map position of the SNPs in cM.

STRUCTURE computation

Using PLINK, genotype data for MKK, LWK, YRI, ASW and CEU was filtered to exclude SNPs with minor allele frequency <1% or SNPs where more than 1% of the genotype data was missing. Restricting the samples to founders resulted in 1,325,342 common SNPs for 578 individuals. We further restricted the genotype data to a random subset of ∼1% of these SNPs (12,999 SNPs) and ran the “no admixture” model in STRUCTURE [34] version 2.3. We found that k = 6 ancestral populations fit the data best. Further details are in Appendix S1.

Fst computation

Using PLINK, we retained 1,175,055 autosomal SNPs in Hardy Weinberg equilibrium (p>0.05) and with minor allele frequency >5% in either population (LWK and MKK). We then computed Fst using the method of [30]. Two tests were used to assess statistical significance, a Bonferroni corrected permutation test (p-value pB), and an empirical p-value that compared the Fst of a SNP to the Fst distribution of intergenic SNPs. Gene positions were from the human genome build 37 (GRCh37/hg19) available at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz. To avoid linkage with genes and promoter regions, we define intergenic regions as those that are at least 50 kb away from the start or stop site of a gene. For the remaining genic or near-gene SNPs, we calculated an empirical p-value (pE) given by the fraction of intergenic SNPs with greater Fst. This procedure identified 1,232 SNPs with pB <8.6E−6 and pE <0.001 that are the top candidates for selection using Fst (Table S1). These SNPs were then clustered into regions of high linkage (Table 1, Table S2) using the method described below (details of the Fst calculation are in Appendix S2).
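The empirical p-value described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code (their procedure is in Appendix S2): the function names are hypothetical, "at least 50 kb away" is read as distance from the gene body on a single chromosome, and the Fst values are toy numbers.

```python
from bisect import bisect_right

def is_intergenic(pos, gene_intervals, flank=50_000):
    """True if pos lies at least `flank` bp outside every (start, end) gene interval.

    Assumption: all intervals are on the same chromosome as pos.
    """
    return all(pos <= start - flank or pos >= end + flank
               for start, end in gene_intervals)

def empirical_p_values(genic_fst, intergenic_fst):
    """For each genic SNP, the fraction of intergenic SNPs with strictly greater Fst."""
    background = sorted(intergenic_fst)
    n = len(background)
    # bisect_right counts background values <= x, so n minus that counts values > x
    return [(n - bisect_right(background, x)) / n for x in genic_fst]

# Toy values only; not data from the study
intergenic = [0.00, 0.01, 0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.30]
genic = [0.25, 0.02, 0.40]
print(empirical_p_values(genic, intergenic))  # prints [0.1, 0.6, 0.0]
```

Sorting the intergenic background once keeps each lookup logarithmic, which matters when the background contains hundreds of thousands of SNPs.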

iHS computation

Autosomal haplotype data for 991,737 SNPs in MKK with minor allele frequency >10% were used to calculate raw iHS scores as in Voight et al [33]. These raw iHS scores were binned on the basis of derived allele-frequency, and the scores in each bin were standard normalized to zero mean and unit variance. Genomic sliding windows of 50 SNPs were ranked by the percentage of SNPs with |iHS|>2. The SNPs with |iHS|>2 that occurred in the top 0.02% of non-overlapping windows were selected as top candidates for selection by iHS (Table S3). These were then clustered into regions of high linkage (Table 3, Table S4) using the method described below (details of the iHS calculation are in Appendix S3).
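The normalization and window-scoring steps can be illustrated with a short sketch. It assumes that raw iHS scores and derived allele frequencies are already available, uses a hypothetical bin width of 0.05 (the bin width is not stated in this section), and scores consecutive non-overlapping windows; the authors' exact procedure is in Appendix S3.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_by_frequency_bin(raw_scores, derived_freqs, bin_width=0.05):
    """Standardize raw scores to zero mean and unit variance within frequency bins."""
    bins = defaultdict(list)
    for i, freq in enumerate(derived_freqs):
        bins[int(freq / bin_width)].append(i)  # bin index from derived allele frequency
    standardized = [0.0] * len(raw_scores)
    for indices in bins.values():
        values = [raw_scores[i] for i in indices]
        mu, sd = mean(values), pstdev(values)
        for i in indices:
            # Bins with no spread cannot be scaled; leave those scores at 0
            standardized[i] = (raw_scores[i] - mu) / sd if sd > 0 else 0.0
    return standardized

def window_fractions(scores, window=50, cutoff=2.0):
    """Fraction of SNPs with |score| > cutoff in consecutive non-overlapping windows."""
    fractions = []
    for start in range(0, len(scores) - window + 1, window):
        chunk = scores[start:start + window]
        fractions.append(sum(abs(s) > cutoff for s in chunk) / window)
    return fractions
```

Windows would then be ranked by this fraction and the top-ranked ones retained; the 0.02% cutoff itself is applied to the ranked list and is not shown here.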

XP-EHH computation

Autosomal haplotype data for 1,373,755 SNPs in MKK and LWK was mapped to genomic locations in the human genome, build 37 (GRCh37). XP-EHH scores were calculated using the code at http://hgdp.uchicago.edu/Software/xpehh.tar. The XP-EHH scores were fit to a normal distribution, which identified the threshold for genome-wide significance to be XP-EHH ≥4.796 (Bonferroni corrected p<0.05, two-tailed test). The SNPs that exceeded this threshold were chosen as top candidates for selection by XP-EHH (Table S5). These SNPs were clustered into regions of high linkage (Table 4, Table S6) using the method described below (details of the XP-EHH calculation are in Appendix S4).
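The general form of such a threshold can be sketched as below: a Bonferroni-corrected, two-tailed cutoff on a normal distribution with fitted mean and standard deviation. The function name and the default parameters (a standard normal) are assumptions for illustration. The reported cutoff of 4.796 depends on the authors' fitted parameters (Appendix S4), which are not given in this section, so the sketch does not attempt to reproduce that number.

```python
from statistics import NormalDist

def bonferroni_threshold(n_tests, mu=0.0, sigma=1.0, alpha=0.05):
    """Score above which a two-tailed normal p-value stays below alpha after Bonferroni."""
    per_test_tail = alpha / n_tests / 2        # split alpha across tests and two tails
    z = -NormalDist().inv_cdf(per_test_tail)   # upper-tail quantile, obtained by symmetry
    return mu + sigma * z                      # map back to the fitted distribution's scale

# Illustration with a standard normal and the number of SNPs quoted above
print(bonferroni_threshold(1_373_755))
```

Working from the lower tail and negating avoids the loss of precision that comes from evaluating a quantile extremely close to 1.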

LD clustering of SNPs

The SNPs identified as candidates for selection by each of the above methods were clustered using genotype R2 as an estimator of linkage disequilibrium. We used the criterion that for a SNP to be included in a cluster, it must have genotype R2≥0.25 with at least one other SNP in the cluster (the justification for this choice of cutoff is given in Appendix S2). More concretely, for the SNPs identified by the methods above, we used PLINK to extract a file of raw genotype data from the HapMap genotype data file for MKK. These files contained a matrix of genotype values, whose columns were labeled by SNPs and rows labeled by individuals. We imported this genotype matrix into the statistical package R, to calculate a SNP x SNP Pearson correlation matrix. This correlation matrix was then used to construct a SNP x SNP adjacency matrix whose entries are 1 if R2≥0.25 and 0 if R2<0.25. The problem of finding linked clusters of SNPs then translates to identifying the connected components of the graph described by this adjacency matrix. This computation was performed in Python using the NetworkX package (http://networkx.lanl.gov/).
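The clustering step can be sketched as follows. To keep the example self-contained, it computes the squared Pearson correlation in plain Python rather than R, and replaces the NetworkX connected-components call with a small depth-first search; the SNP identifiers and genotypes are toy values, and missing genotypes are not handled.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP: correlation undefined, treat as unlinked
        return 0.0
    return sxy * sxy / (sxx * syy)

def ld_clusters(genotypes, threshold=0.25):
    """Group SNPs into connected components of the graph with edges where r2 >= threshold."""
    snps = list(genotypes)
    neighbors = {s: set() for s in snps}
    for i, a in enumerate(snps):
        for b in snps[i + 1:]:
            if pearson_r2(genotypes[a], genotypes[b]) >= threshold:
                neighbors[a].add(b)
                neighbors[b].add(a)
    clusters, seen = [], set()
    for s in snps:
        if s in seen:
            continue
        component, stack = set(), [s]
        while stack:  # depth-first search over linked SNPs
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(component)
    return clusters

# Toy genotypes (rows are individuals); identifiers are hypothetical
toy = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [2, 1, 0, 2, 1, 0],
    "snp4": [1, 1, 1, 0, 0, 0],
}
print(ld_clusters(toy))
```

Because a SNP joins a cluster if it is linked to any one member, this is single-linkage clustering, and two SNPs in the same cluster need not be in strong LD with each other directly.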

Sequencing loci in LCT/MCM6 and RAB3GAP1

Forward and reverse primers for Sanger sequencing were chosen using Primer3 (http://frodo.wi.mit.edu/primer3/), and checked for absence of homologies to other parts of the human genome using BLAT. The details of the primers, the loci sequenced and the samples used are in Appendix S5.

Supporting Information

Table S1. 1,232 genic or near-gene SNPs identified by Fst as top candidates for selection (pB <8.6E−6 and pE <0.001). Significance was assessed using an exact permutation test (Bonferroni corrected p-value pB shown in column Q) and an empirical test based on the Fst distribution of intergenic SNPs (pE: column R). Columns H-M list the number of individuals with each genotype (A1 homozygous, heterozygous, A2 homozygous) in MKK and LWK. (XLS)

Table S2. Genomic regions identified as selection candidates in MKK using Fst and clustering. SNPs having empirical p-value <0.001 with respect to the distribution of intergenic Fst scores were clustered into regions of high linkage disequilibrium using genotype R2 between SNPs. Clusters with maximum XP-EHH score >3 were identified as being MKK associated. Also listed are the maximum Fst score and the maximum XP-EHH score of any SNP in the genomic extent of the cluster. (XLSX)

Table S3. SNPs identified as selection candidates using the iHS metric. Sliding windows of 50 SNPs each were scored for fraction of SNPs with |iHS| >2. SNPs with |iHS| >2 that occur in the top 0.02% of non-overlapping genomic windows are listed. (XLS)

Table S4. Genomic regions identified as selection candidates in MKK using the iHS statistic. Sliding windows of 50 SNPs each were scored for the fraction of SNPs with |iHS| >2. The top 0.02% of non-overlapping windows were identified as candidates for selection. These windows were then merged on the basis of linkage disequilibrium (estimated using genotype R2 between SNPs with |iHS| >2). (XLS)

Table S5. SNPs identified as candidates for selection in MKK using the XP-EHH statistic, with LWK as the reference population. All SNPs listed have scores exceeding the threshold for genome-wide significance (XP-EHH ≥4.796, two-tailed Bonferroni corrected p≤0.05). (XLSX)

Table S6. Genomic regions identified as selection candidates in MKK using the XP-EHH statistic, with LWK as the reference population. SNPs with genome-wide significant scores (XP-EHH ≥4.796, two-tailed Bonferroni corrected p≤0.05) were assigned to a cluster if they had genotype R2≥0.25 with another SNP in the cluster. This identified contiguous genomic regions as candidates for selective sweeps. Clusters that overlapped in genomic extent were merged. Columns F and G list the number of significant SNPs occurring in the genomic extent of each cluster, and their rsids. (XLSX)

Table S7. Common regions and SNPs identified to be under selection in MKK by our analysis (using Fst, iHS and XP-EHH) and by the International HapMap Consortium (using the CMS test). Only those SNPs identified by the HapMap Consortium which were also identified by our analysis (i.e. passed genome-wide significance thresholds for the Fst, iHS, and XP-EHH statistic respectively) are listed. (XLSX)

Appendix S1. Details of STRUCTURE calculation. (DOC)

Appendix S2. Details of Fst calculation, p-values and SNP clustering for Fst and XP-EHH. (DOC)

Appendix S3. Details of iHS calculation. (DOC)

Appendix S4. Details of XP-EHH calculation. (PDF)

Appendix S5. Details of sequencing in LCT/MCM6 locus. (DOC)

Appendix S6. Plots of Fst, XP-EHH and iHS for all chromosomes. (PDF)