Christelle Vangenot1, Pascal Gagneux2, Natasja G de Groot3, Adrian Baumeyer4, Médéric Mouterde1, Brigitte Crouau-Roy5, Pierre Darlu6, Alicia Sanchez-Mazas1,7, Audrey Sabbagh8, Estella S Poloni9,7.
Abstract
Among the many genes involved in the metabolism of therapeutic drugs, the human arylamine N-acetyltransferase (NAT) genes have been extensively studied because of their medical importance in both pharmacogenetics and disease epidemiology. One member of this small gene family, NAT2, is established as the locus of the classic human acetylation polymorphism in drug metabolism. Current hypotheses hold that selective processes favoring haplotypes conferring lower NAT2 activity have been operating in modern humans' recent history as an adaptation to local chemical and dietary environments. To shed new light on such hypotheses, we investigated the genetic diversity of the three members of the NAT gene family in seven hominid species, including modern humans, Neanderthals and Denisovans. Little polymorphism sharing was found among hominids, yet all species displayed high NAT diversity. This diversity was distributed in opposite fashion in chimpanzees and bonobos (Pan genus) compared to modern humans: Pan species showed higher diversity at NAT1 and lower diversity at NAT2, while the reverse is observed in humans. This pattern was also reflected in the results of selective neutrality tests, which suggest, in agreement with the predicted functional impact of mutations detected in non-human primates, stronger directional selection, presumably purifying selection, at NAT1 in modern humans and at NAT2 in chimpanzees. Overall, the results point to the evolution of divergent functions of these highly homologous genes in the different primate species, possibly related to their specific chemical/dietary environment (exposome), and we hypothesize that this is likely linked to the emergence of controlled fire use in the human lineage.
Keywords: Arylamine N-acetyltransferases; drug metabolism; great apes; multigenic family; natural selection
Year: 2019 PMID: 31068377 PMCID: PMC6643899 DOI: 10.1534/g3.119.400223
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Segregating sites identified in the three NAT genes.
| Gene | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ancient genomes | |||||||||||||||||
| Position in human reference sequence | Human cds | SNP rs identifier | Alleles (amino acid change if non-synonymous) | ||||||||||||||
| Total sample size | 33 | 3 | 12 | 5 | 72 | 10 | 5 | 6 | 14 | ||||||||
| rs4987076 | |||||||||||||||||
| 18080015 | 459 | rs4986990 | G/A | ||||||||||||||
| rs4986783 | |||||||||||||||||
| Total sample size | 32 | 3 | 12 | 5 | 72 | 10 | 5 | 6 | 14 | ||||||||
| rs1801279 | T | T | |||||||||||||||
| 18257795 | 282 | rs1041983 | C/T | ||||||||||||||
| 18257858 | 345 | rs45532639 | C/T | A | A | A | A | A | |||||||||
| rs200585149 | |||||||||||||||||
| rs79050330 | |||||||||||||||||
| Total sample size | 32 | 3 | 13 | 5 | 75 | 10 | 5 | 6 | 14 | ||||||||
| 18228246 | rs73590295 | T/C | |||||||||||||||
| 18228285 | T/A | ||||||||||||||||
| 18228458 | rs372738250 | G/A | |||||||||||||||
| 18228616 | rs35548819 | T/C | |||||||||||||||
| 18228661 | rs546009408 | G/A | |||||||||||||||
| 18228673 | rs115350875 | T/C | |||||||||||||||
| 18228727 | rs530022558 | G/A | |||||||||||||||
| 18228959 | T/C | ||||||||||||||||
| 18229104 | rs74444655 | T/C | |||||||||||||||
Boxes shaded in light gray indicate the presence of the polymorphism in the relevant species/sub-species and those shaded in dark gray indicate fixation (detection for ancient genomes) of the derived allele in the species (if different from the human derived allele, the allele is indicated in the box).
The screened segments for the NAT1, NAT2 and NATP homologous sequences span from 18’079’545 to 18’080’447 (903 bp including the NAT1 coding exon), 18’257’489 to 18’258’603 (1,115 bp including the NAT2 coding exon) and 18’228’116 to 18’229’117 (1,002 bp including the NATP pseudogene) respectively, on chromosome 8 in the human reference sequence GRCh37/hg19. Non-synonymous mutations are shown in bold type.
Based on the individuals of this study, the gorillas of Prado-Martinez and the Gorilla gorilla gorilla draft assembly reference sequence (gorGor4, December 2014).
Based on the individuals of this study, the orangutans of Prado-Martinez and the Pongo pygmaeus abelii draft assembly reference sequence (ponAbe2, July 2007).
Polymorphism recording is based on the chimpanzee and bonobo individuals of the present study and those of Prado-Martinez, cross-checked with the Pan troglodytes verus assembly reference sequence (panTro4, February 2011) and the Pan paniscus draft assembly reference sequence (panPan1, May 2012).
Total number of genotypes, including genotypes of individuals deduced from their descendants (see Supplementary Figure S1A and Supplementary File S1).
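The genomic positions, segment lengths and coding-sequence (cds) positions reported with this table are related by simple arithmetic. The minimal sketch below checks the stated segment lengths and converts a GRCh37/hg19 position into a cds position. The offsets are not given in the source; they are inferred here from table rows (for example, position 18080015 paired with cds 459 for NAT1, and 18257795 paired with cds 282 for NAT2). The names `SEGMENTS`, `CDS_OFFSET`, `segment_length` and `cds_position` are illustrative only.

```python
# Screened segments on chromosome 8 (GRCh37/hg19), inclusive coordinates
# as given in the table footnote.
SEGMENTS = {
    "NAT1": (18_079_545, 18_080_447),
    "NAT2": (18_257_489, 18_258_603),
    "NATP": (18_228_116, 18_229_117),
}

# Assumption: offsets inferred from table rows (genomic position minus cds
# position); they are not stated by the authors.
CDS_OFFSET = {"NAT1": 18_079_556, "NAT2": 18_257_513}


def segment_length(gene):
    """Length in bp of a screened segment, counting both end positions."""
    start, end = SEGMENTS[gene]
    return end - start + 1


def cds_position(gene, genomic_pos):
    """Convert a genomic position to a position relative to the coding exon."""
    start, end = SEGMENTS[gene]
    if not start <= genomic_pos <= end:
        raise ValueError(f"{genomic_pos} lies outside the screened {gene} segment")
    return genomic_pos - CDS_OFFSET[gene]


if __name__ == "__main__":
    for gene in SEGMENTS:
        print(gene, segment_length(gene), "bp")
    print("NAT1 18080015 -> cds", cds_position("NAT1", 18_080_015))
    print("NAT2 18257858 -> cds", cds_position("NAT2", 18_257_858))
```

Positions downstream of the coding exon (3′UTR) simply yield values larger than the cds length, which is how such positions appear in the Pan table.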
Segregating sites identified in the three NAT genes.
| Gene | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Position in human reference sequence | Alleles (amino acid change if non-synonymous) | |||||||||
| Hybrid | Human cds | Fixed position | Variable position (SNP rs identifier in Ensembl) | |||||||
| | 72 | 10 | 5 | 6 | 1 | 14 | ||||
| 18079703 | C/T | 147 | C | |||||||
| 18079859 | C/T | 303 | C | |||||||
| 18079925 | T/C | 369 | T | |||||||
| 18080014 | C/T | 458 | C/T (rs374226986) | |||||||
| | 72 | 10 | 5 | 6 | 1 | 14 | ||||
| 18257549 | T/C | 36 | T | |||||||
| 18258302 | G/T | 789 | T | |||||||
| 18258447 | G/A | 934 | G | |||||||
| 18258462 | C/G | 949 | C | |||||||
| | 75 | 10 | 5 | 6 | 1 | 14 | ||||
| 18228146 | G/T | — | G | |||||||
| 18228189 | C/T | — | C | |||||||
| 18228238 | A/G | — | A | |||||||
| 18228242 | A/G | — | A | |||||||
| 18228285 | T/A | N | N | — | T/A | |||||
| 18228304 | C/T | — | C | |||||||
| 18228368 | T/C | — | T | |||||||
| 18228404 | C/T | — | C | |||||||
| 18228501 | C/T | — | C | |||||||
| 18228543 | A/G | — | A | |||||||
| 18228560 | C/A | — | C | |||||||
| 18228582 | G/T | — | G | |||||||
| 18228614 | G/T | — | G | |||||||
| 18228659 | G/A | — | G | |||||||
| 18228660 | C/T | — | G | |||||||
| 18228661 | G/A | — | G/A (rs546009408) | |||||||
| 18228748 | C/T | — | T | |||||||
| 18228771 | C/T | — | T | |||||||
| 18228959 | T/C | — | T | |||||||
| 18229057 | G/A | — | C | |||||||
| 18229103 | A/T | — | A | |||||||
Boxes shaded in light gray indicate the presence of the polymorphism in the relevant Pan species/sub-species and those shaded in dark gray indicate a fixation of the derived allele in the Bonobo species.
The screened segments for the NAT1, NAT2 and NATP homologous sequences span from 18’079’545 to 18’080’447 (903 bp including the NAT1 coding exon), 18’257’489 to 18’258’603 (1,115 bp including the NAT2 coding exon) and 18’228’116 to 18’229’117 (1,002 bp including the NATP pseudogene) respectively, on chromosome 8 in the human reference sequence GRCh37/hg19. Non-synonymous mutations are shown in bold type.
Polymorphism recording is based on the individuals of the present study and the chimpanzees of Prado-Martinez cross-checked with the Pan troglodytes verus assembly reference sequence (panTro4, February 2011).
Hybrid Western (P. t. verus) / Central (P. t. troglodytes) individual.
Based on the individual of this study (Bonobo), the bonobos of Prado-Martinez and the Pan paniscus draft assembly reference sequence (panPan1, May 2012).
With the exception of human NAT2 polymorphisms rs1801279 (G/A at cds position 191) and rs79050330 (C/T at cds position 578), which are common variants in humans, all other human polymorphisms, including those with a SNP rs identifier, are rare variants, detected with a highest population MAF < 0.01 in Ensembl (http://www.ensembl.org/Homo_sapiens/Info/Index).
Total number of genotypes, including genotypes of individuals deduced from their descendants (see Supplementary Figure S1A and Supplementary File S1).
Non-coding positions downstream of NAT2 coding exon (3′UTR region).
Undefined position in some Nigeria-Cameroon and Eastern chimpanzee samples (indicated by N, see Supplementary Table S5). The human T/A SNP at position 18’228’285, at present only described in Mortensen at very low frequency, is not reported in Ensembl; we note that Ensembl reports another rare T/A SNP, with associated SNP identifier rs546046491, in the next contiguous position (18’228’286), and both variants are embedded in a stretch of T nucleotides on the human reference sequence (from 18’228’278 to 18’228’286).
Figure 2. Expected heterozygosity (h) and nucleotide diversity (π × 10⁻³) at the three NAT genes in Pan species and sub-species (left panels) and in human populations (right panels). The variation of values among the 122 San Diego P. t. verus sub-samples (left panels) and among human population samples (right panels) is shown by boxplots. The dotted lines were added to the graphs to highlight inter-locus variation. For Pan, P-values of Wilcoxon rank-sum tests after adjustment for multiple testing (and using only the average value for the San Diego sample) were 0.039 for NAT1 vs. NAT2, 0.065 for NAT1 vs. NATP, and 0.065 for NAT2 vs. NATP for differences in expected heterozygosity (h), and 0.0065 for NAT1 vs. NAT2, 0.5887 for NAT1 vs. NATP, and 0.0974 for NAT2 vs. NATP for differences in nucleotide diversity (π). When restricting the tests to the chimpanzee (P. troglodytes) data only, P-values were 0.012 for NAT1 vs. NAT2, 0.222 for NAT1 vs. NATP, and 0.012 for NAT2 vs. NATP for expected heterozygosity, and 0.036 for NAT1 vs. NAT2, 0.018 for NAT1 vs. NATP, and 0.018 for NAT2 vs. NATP for nucleotide diversity. For human populations, adjusted Wilcoxon rank-sum test P-values for differences in both expected heterozygosity and nucleotide diversity were all < 0.0001 in the comparisons of NAT1 vs. NAT2 and NAT1 vs. NATP, and were 0.58 and 0.048 for expected heterozygosity and nucleotide diversity, respectively, in the NAT2 vs. NATP comparison.
Haplotypes of the three NAT gene paralogs in the genus Pan
| Position | 79’632 | 79’703 | 79’859 | 79’897 | 79’925 | 80’014 | 80’074 | 80’153 | 80’316 | 80’345 |
|---|---|---|---|---|---|---|---|---|---|---|
| SNP | G76A | C147T | C303T | T341C | T369C | C458T | A518C | T597G | G760C | A789G |
| Amino acid change | D26N | I114T | E173A | I199M | E254Q | I263M | ||||
| Haplotypes | ||||||||||
| G | C | C | T | T | C | A | T | G | A | |
| . | T | . | . | . | . | . | . | . | . | |
| . | . | . | . | . | . | . | . | . | G | |
| . | T | . | . | . | . | . | G | . | . | |
| A | . | . | . | . | . | . | . | . | . | |
| . | . | . | . | C | . | . | . | . | . | |
| . | . | . | . | . | . | . | . | C | . | |
| . | . | . | C | . | . | . | . | . | . | |
| . | . | . | . | . | T | C | . | . | . | |
| A | T | . | . | . | . | . | . | . | . | |
| . | . | . | . | . | . | C | . | . | . | |
| . | . | T | . | . | . | . | . | . | . |
Position (+18’000’000) on GRCh37/hg19.
SNP position relative to the coding exon of NAT1, NAT2, or its paralog sequence on NATP (starts at position 1).
Position (+18’200’000) on GRCh37/hg19.
NAT haplotype frequencies (%) estimated in the different species and sub-species of the genus Pan and results of Hardy-Weinberg equilibrium tests.
| San Diego sample | BPRC sample | |||||
|---|---|---|---|---|---|---|
| 79.33 (1.71) | 80.43 | 65.00 | 70.00 | 66.70 | 25.00 | |
| 7.97 (0.94) | 4.35 | 5.00 | 0 | 0 | 0 | |
| 1.59 (1.38) | 0 | 0 | 10.00 | 8.33 | 0 | |
| 8.33 (0) | 13.04 | 0 | 0 | 0 | 0 | |
| 2.78 (0) | 0 | 0 | 0 | 0 | 0 | |
| 0 | 2.17 | 0 | 0 | 0 | 53.60 | |
| 0 | 0 | 10.00 | 0 | 0 | 0 | |
| 0 | 0 | 0 | 10.00 | 0 | 0 | |
| 0 | 0 | 0 | 0 | 8.33 | 0 | |
| 0 | 0 | 20.00 | 10.00 | 8.33 | 0 | |
| 0 | 0 | 0 | 0 | 8.33 | 0 | |
| 0 | 0 | 0 | 0 | 0 | 21.40 | |
| Total (2n chromosomes) | 36 | 46 | 20 | 10 | 12 | 28 |
| Hardy-Weinberg test | ||||||
| 0.36 (0.03) | 0.26 | 0.50 | 0.60 | 0.50 | 0.71 | |
| 0.37 (0.03) | 0.34 | 0.55 | 0.53 | 0.58 | 0.63 | |
| ∈ [0.39 ; 0.64] | 0.20 | 0.35 | > 0.99 | 0.51 | 0.45 | |
| 92.49 (1.27) | 91.30 | 5.00 | 10.00 | 0 | 0 | |
| 2.41 (0.94) | 4.35 | 0 | 0 | 0 | 0 | |
| 0 | 2.17 | 0 | 0 | 0 | 0 | |
| 5.10 (1.03) | 0 | 10.00 | 80.00 | 91.70 | 0 | |
| 0 | 2.17 | 0 | 0 | 0 | 0 | |
| 0 | 0 | 85.00 | 10.00 | 0 | 0 | |
| 0 | 0 | 0 | 0 | 0 | 89.3 | |
| 0 | 0 | 0 | 0 | 0 | 3.57 | |
| 0 | 0 | 0 | 0 | 0 | 7.14 | |
| 0 | 0 | 0 | 0 | 8.33 | 0 | |
| Total (2n chromosomes) | 36 | 46 | 20 | 10 | 12 | 28 |
| Hardy-Weinberg test | ||||||
| 0.15 (0.03) | 0.17 | 0.30 | 0.20 | 0.17 | 0.21 | |
| 0.15 (0.02) | 0.17 | 0.28 | 0.38 | 0.17 | 0.20 | |
| ∈ [0.08 ; > 0.99 ] | > 0.99 | > 0.99 | 0.11 | > 0.99 | > 0.99 | |
| 24.61 (2.46) | 44.00 | 0 | 10.00 | 0 | 0 | |
| 52.14 (1.7) | 42.00 | 40.00 | 10.00 | 58.30 | 0 | |
| 0 | 0 | 0 | 20.00 | 0 | 0 | |
| 0 | 6.00 | 0 | 0 | 0 | 0 | |
| 0 | 2.00 | 0 | 0 | 0 | 0 | |
| 2.32 (1.03) | 0 | 0 | 0 | 0 | 0 | |
| 19.33 (2.21) | 6.00 | 0 | 0 | 0 | 0 | |
| 1.59 (1.38) | 0 | 25.00 | 20.00 | 8.33 | 0 | |
| 0 | 0 | 0 | 30.00 | 0 | 0 | |
| 0 | 0 | 0 | 0 | 8.33 | 0 | |
| 0 | 0 | 10.00 | 0 | 16.67 | 0 | |
| 0 | 0 | 5.00 | 0 | 0 | 0 | |
| 0 | 0 | 0 | 0 | 0 | 0 | |
| 0 | 0 | 5.00 | 0 | 0 | 0 | |
| 0 | 0 | 15.00 | 0 | 0 | 0 | |
| 0 | 0 | 0 | 10.00 | 0 | 0 | |
| 0 | 0 | 0 | 0 | 8.33 | 0 | |
| 0 | 0 | 0 | 0 | 0 | 96.43 | |
| 0 | 0 | 0 | 0 | 0 | 3.57 | |
| Total (2n chromosomes) | 36 | 50 | 20 | 10 | 12 | 28 |
| Hardy-Weinberg test | ||||||
| 0.64 (0.04) | 0.60 | 0.90 | 0.60 | 0.67 | 0.07 | |
| 0.65 (0.01) | 0.64 | 0.78 | 0.89 | 0.67 | 0.07 | |
| ∈ [0.09 ; > 0.99 ] | 0.06 | 0.15 | 0.76 | > 0.99 | ||
Average over the 122 sub-samples (see text), standard deviation in brackets.
Test for departure from Hardy-Weinberg equilibrium; Ho: observed heterozygosity, He: expected heterozygosity (equivalent to gene diversity). The only significant deviation from equilibrium (heterozygote excess at NATP in P. t. ellioti) is shown in bold.
Haplotype NATP*13, which combines SNPs at 170, 253, 289, 386, 499, 633, and 656 (Table 3), was inferred only for the genotype of the hybrid P. t. verus/troglodytes individual.
P-value > 0.05 after correction for multiple testing.
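The expected heterozygosity (He, gene diversity) values in the table can be approximated directly from the haplotype frequencies. The sketch below assumes Nei's unbiased estimator, He = n/(n − 1) × (1 − Σp²); the source does not name the estimator, so this is an assumption. The input frequencies are the non-zero values in the first haplotype block, read from the column with 2n = 46 chromosomes. The function name `expected_heterozygosity` is illustrative.

```python
def expected_heterozygosity(freqs_percent, n_chromosomes):
    """Unbiased gene diversity from haplotype frequencies given in percent.

    Assumption: Nei's estimator n/(n-1) * (1 - sum(p_i^2)); the estimator
    actually used by the authors is not stated in the source.
    """
    p = [f / 100.0 for f in freqs_percent]
    homozygosity = sum(x * x for x in p)  # sum of squared frequencies
    return n_chromosomes / (n_chromosomes - 1) * (1.0 - homozygosity)


# Non-zero haplotype frequencies (%) in the first block, column with 2n = 46.
freqs = [80.43, 4.35, 13.04, 2.17]
he = expected_heterozygosity(freqs, 46)
print(round(he, 2))  # 0.34, matching the He reported for that column
```

Under this assumption the small-sample correction matters: without the n/(n − 1) factor the same frequencies give about 0.33 rather than the tabulated 0.34.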
Figure 1. Haplotype frequency distributions at the three NAT genes in Pan species and sub-species.
Predictions of the effect of mutations between Pan NAT1 and NAT2 coding sequences according to PolyPhen, SIFT and PANTHER cSNP Scoring.
| PolyPhen | SIFT | PANTHER cSNP Scoring | ||||||
|---|---|---|---|---|---|---|---|---|
| Haplotypes | cDNA | protein | Score | Prediction | Score | Prediction | PSEP | Prediction |
| A789G | I263M | 0.279 (0.91-0.88) | B | 0.08 (3.08, 80) | T | 220 | POD | |
| T597G | I199M | 0.369 (0.9-0.89) | B | 0.01 (3.07, 81) | A | 220 | POD | |
| G76A | D26N | 0.377 (0.9-0.89) | B | 0.1 (3.08, 76) | T | 91 | B | |
| G760C | E254Q | 0.892 (0.82-0.94) | POD | 0.07 (3.08, 80) | T | 455 | PRD | |
| T341C | I114T | 0.099 (0.93-0.85) | B | 0.06 (3.07, 81) | T | 220 | POD | |
| A518C | E173A | 0.013 (0.96-0.78) | B | 0.17 (3.07, 81) | T | 30 | B | |
| C578T | T193M | 1 (0.00-1.00) | PRD | 0 (3.07, 51) | A | 456 | PRD | |
| A514G | N172D | 0.001 (0.99-0.15) | B | 0.26 (3.07, 51) | T | 220 | POD | |
| G145A | E49K | 0.002 (0.99-0.3) | B | 0.5 (3.07, 50) | T | 324 | POD | |
| NAT2*8 | G191A | R64Q | 1.00 (0.00-1) | PRD | 0 (3.07, 50) | A | 4200 | PRD |
| NAT2*9 | A72C | L24F | 1 (0.00-1) | PRD | 0 (3.07, 50) | A | 4200 | PRD |
PolyPhen score: probability that a substitution is damaging; sensitivity and specificity in brackets.
PolyPhen prediction: “benign” (B), “possibly damaging” (POD), “probably damaging” (PRD).
SIFT score: probability that a substitution is tolerated; median sequence information and number of sequences used for the prediction in brackets.
SIFT prediction: “tolerated” (T), “affects protein function” (A).
PANTHER cSNP Scoring PSEP (position-specific evolutionary preservation): length of time (in millions of years) of preservation of a position.
PANTHER cSNP Scoring prediction: “probably damaging” (PRD), “possibly damaging” (POD), “probably benign” (B).
The reference Pan NAT1 haplotype used is the basal haplotype in the network of NAT1 sequences (Supplementary Figure S2).
The reference Pan NAT2 haplotype used is the basal haplotype in the network of NAT2 sequences (Supplementary Figure S3). Since NAT2*1 differs from NAT2*4 at a single position located 61 bp downstream of the coding exon relative to the stop codon (A934G, Table 3), the two haplotypes likely translate into a similar gene product, so that haplotypes deriving from NAT2*1 could be predicted using NAT2*4 as a reference. In contrast, both haplotypes NAT2*8 and NAT2*9 derive from NAT2*7, which differs from the basal haplotype at SNP G145A (E49K, Table 3). Thus, for the non-synonymous mutations defining NAT2*8 and NAT2*9, predictions were performed using NAT2*7 as a reference.
Haplotypes NAT2*8 and NAT2*9 both bear the G145A mutation defining haplotype NAT2*7. Since the prediction tools do not allow the simultaneous specification of two substitutions, we ran the prediction tools for G191A and A72C against NAT2*7 as a reference, instead of NAT2*4.
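The SIFT calls in the prediction table are consistent with a single score threshold. The sketch below assumes the conventional SIFT cutoff of 0.05 (scores at or below it are called "affects protein function"); the cutoff is not stated in the source, and the names `SIFT_CUTOFF` and `sift_prediction` are illustrative. The scores and calls are copied from the table.

```python
SIFT_CUTOFF = 0.05  # assumption: conventional cutoff, not stated in the source


def sift_prediction(score, cutoff=SIFT_CUTOFF):
    """Return 'A' (affects protein function) or 'T' (tolerated)."""
    return "A" if score <= cutoff else "T"


# (amino acid change, SIFT score, call reported in the table)
TABLE_ROWS = [
    ("I263M", 0.08, "T"), ("I199M", 0.01, "A"), ("D26N", 0.10, "T"),
    ("E254Q", 0.07, "T"), ("I114T", 0.06, "T"), ("E173A", 0.17, "T"),
    ("T193M", 0.00, "A"), ("N172D", 0.26, "T"), ("E49K", 0.50, "T"),
    ("R64Q", 0.00, "A"), ("L24F", 0.00, "A"),
]

for change, score, reported in TABLE_ROWS:
    call = sift_prediction(score)
    flag = "ok" if call == reported else "MISMATCH"
    print(f"{change}: score={score:.2f} -> {call} (table: {reported}) {flag}")
```

The PolyPhen and PANTHER calls are not reproduced here because their decision thresholds depend on the model and version used, which the source does not specify.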
Genetic diversity and results of selective neutrality tests for the three NAT gene paralogs in the different species and sub-species of the genus Pan, with equivalent estimates in human populations shown in the last column.
| San Diego sample | BPRC sample | (human populations average) | |||||
|---|---|---|---|---|---|---|---|
| Total (2N chromosomes) | 36 | 46 | 20 | 10 | 12 | 28 | 119.7 |
| Number of usable positions | 903 | 903 | 898 | 898 | 898 | 898 | 903 |
| Number of segregating sites ( | 3.57 (0.5) | 3 | 3 | 4 | 5 | 2 | 3.75 |
| Number of haplotypes ( | 4.57 (0.5) | 4 | 4 | 4 | 5 | 3 | 3.5 |
| Expected heterozygosity ( | 0.37 (0.03) | 0.34 | 0.55 | 0.53 | 0.58 | 0.63 | 0.095 |
| Nucleotide diversity ( | 0.58 (0.03) | 0.63 | 1.02 | 0.89 | 1.08 | 0.96 | 0.187 |
| Ewens-Watterson test | |||||||
| | 0.64 (0.026) | 0.67 | 0.48 | 0.52 | 0.47 | 0.40 | 0.902 |
| | 0.45 (0.041) | 0.52 | 0.44 | 0.37 | 0.30 | 0.59 | 0.618 |
| | ∈ [0.84 ; 0.95] | 0.82 (0.18) | 0.70 (0.10) | 0.09 (0.25) | 9 ( | ||
| Tajima’s | |||||||
| | −0.91 (0.17) | −0.35 | 0.24 | −1.67 | −1.53 | 1.43 | −1.475 |
| | ∈ [0.112 ; 0.282] | 0.398 (0.642) | 0.642 (0.099) | 0.031 (0.061) | 0.056 (0.056) | 0.917 (0.752) | 11 ( |
| Fu’s | |||||||
| | −1.64 (0.47) | −0.59 | −0.20 | −1.35 | −1.98 | 1.19 | −2.849 |
| | ∈ [0.038 ; 0.197] | 0.321 (0.642) | 0.399 (0.901) | 0.043 (0.061) | 0.024 (0.048) | 0.742 (0.742) | 8 ( |
| Total (2N chromosomes) | 36 | 46 | 20 | 10 | 12 | 28 | 137.8 |
| Number of usable positions | 1’115 | 1’115 | 1’091 | 1’091 | 1’091 | 1’112 | 1'115 |
| Number of segregating sites ( | 1.87 (0.34) | 3 | 2 | 2 | 3 | 2 | 9.78 |
| Number of haplotypes ( | 2.87 (0.34) | 4 | 3 | 3 | 2 | 3 | 10.7 |
| Expected heterozygosity ( | 0.15 (0.02) | 0.17 | 0.28 | 0.38 | 0.17 | 0.20 | 0.761 |
| Nucleotide diversity ( | 0.13 (0.02) | 0.15 | 0.41 | 0.50 | 0.45 | 0.19 | 2.041 |
| Ewens-Watterson test | |||||||
| | 0.86 (0.02) | 0.84 | 0.74 | 0.66 | 0.85 | 0.80 | 0.249 |
| | 0.63 (0.05) | 0.52 | 0.56 | 0.49 | 0.70 | 0.59 | 0.291 |
| | ∈ [0.75 ; | 0.90 (0.69) | 0.92 (0.92) | 0 / 18 | |||
| Tajima’s | |||||||
| | −1.26 (0.19) | −1.58 | −0.44 | −0.69 | −1.63 | −1.24 | 0.639 |
| | ∈ [0.034 ; 0.158] | 0.337 (0.521) | 0.237 (0.246) | 0.076 (0.076) | 2 (0) / 18 | ||
| Fu’s | |||||||
| | −1.84 (0.55) | −3.43 | −0.377 | −0.59 | 1.054 | −1.59 | −0.765 |
| | ∈ [ | 0.260 (0.521) | 0.123 (0.246) | 0.595 (0.595) | 2 (0) / 18 | ||
| Total (2N chromosomes) | 36 | 50 | 20 | 10 | 12 | 28 | 128.8 |
| Number of usable positions | 1’000 | 1’000 | 936 | 937 | 936 | 975 | 1'002.61 |
| Number of segregating sites ( | 3.51 (0.62) | 4 | 7 | 8 | 6 | 1 | 9.22 |
| Number of haplotypes ( | 4.41 (0.61) | 5 | 6 | 6 | 5 | 2 | 10.6 |
| Expected heterozygosity ( | 0.65 (0.01) | 0.64 | 0.78 | 0.89 | 0.67 | 0.07 | 0.755 |
| Nucleotide diversity ( | 0.81 (0.05) | 0.77 | 2.60 | 2.75 | 1.74 | 0.07 | 1.804 |
| Ewens-Watterson test | |||||||
| | 0.37 (0.01) | 0.38 | 0.26 | 0.20 | 0.39 | 0.93 | 0.255 |
| | 0.46 (0.06) | 0.44 | 0.29 | 0.22 | 0.30 | 0.75 | 0.269 |
| | ∈ [0.07 ; 0.51] | 0.40 (0.93) | 0.43 (0.86) | 0.48 (0.95) | 0.95 (0.83) | 1 (0) / 18 | |
| Tajima’s | |||||||
| | −0.06 (0.36) | −0.30 | 1.04 | −0.11 | −0.47 | −1.15 | 0.018 |
| | ∈ [0.383 ; 0.849] | 0.424 (0.935) | 0.865 (0.594) | 0.483 (0.947) | 0.338 (0.637) | 0.138 (0.138) | 0 / 18 |
| Fu’s | |||||||
| | −0.37 (0.55) | −0.81 | 0.48 | −1.02 | −0.55 | −1.15 | −1.771 |
| | ∈ [0.245 ; 0.706] | 0.312 (0.935) | 0.623 (0.863) | 0.222 (0.667) | 0.318 (0.637) | 3 ( | |
Average over the 122 samples, standard deviation in brackets.
Average values for 18 to 20 human populations from four continents; single population values are reported in Supplementary Tables S7 and S12.
Ewens-Watterson test for departure from selective neutrality and demographic equilibrium; Fo: observed homozygosity, Fe: expected homozygosity; the P-value is given as the proportion of random Fe values generated under the neutral equilibrium model that are smaller than, or equal to the observed Fo value. Significant deviations (P-value < 0.025 or > 0.975) are shown in bold, and after correction for multiple testing in brackets; for humans, we report the number of population samples associated with a significant deviation (before slash, and significant after correction for multiple testing in bold and brackets) on the total number of population samples tested (after slash, see Supplementary Table S12).
Tajima’s D test for departure from selective neutrality and demographic equilibrium; the P-value is given as the proportion of random D values generated under the neutral equilibrium model that are smaller than, or equal to the observed D value. Significant deviations (P-value < 0.025 or > 0.975) are shown in bold, and after correction for multiple testing in brackets; for humans, we report the number of population samples associated with a significant deviation (before slash, and significant after correction for multiple testing in bold and brackets) on the total number of population samples tested (after slash, see Supplementary Table S12).
Fu’s FS test for departure from selective neutrality and demographic equilibrium; the P-value is given as the proportion of random FS values generated under the neutral equilibrium model that are smaller than, or equal to the observed FS value. Significant deviations (P-value < 0.02) are shown in bold, and after correction for multiple testing in brackets; for humans, we report the number of population samples associated with a significant deviation (before slash, and significant after correction for multiple testing in bold and brackets) on the total number of population samples tested (after slash, see Supplementary Table S12).
Twenty tests out of 122 (16%) indicated significant homozygosity excess (P-value > 0.975), thus exceeding by 13 tests the expected proportion of 5% (6.1 out of 122) false positives.
One hundred and six tests out of 122 (87%) indicated significant deviation from neutral expectation (P-value < 0.02), thus exceeding by 103 tests the expected proportion of 2% (2.44 out of 122) false positives.
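As an illustration of how one of these statistics is obtained, the sketch below computes Tajima's D from a sample size, a number of segregating sites and a mean number of pairwise differences, using the standard formula from Tajima (1989). The example inputs are read from the first block of the table (903 usable positions), in the column with 2N = 46 chromosomes: S = 3 and a nucleotide diversity of 0.63. It is assumed here that the tabulated nucleotide diversity is scaled by 10⁻³ per site, as in the Figure 2 caption. The sketch returns only the statistic; it does not reproduce the simulation-based P-values described in the footnotes, and the function name `tajimas_d` is illustrative.

```python
import math


def tajimas_d(n, s, k):
    """Tajima's D for n sequences, s segregating sites and
    k mean pairwise differences (per sequence, not per site)."""
    if n < 2 or s < 1:
        raise ValueError("need at least 2 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    theta_w = s / a1  # Watterson's estimator of theta
    return (k - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Assumption: tabulated nucleotide diversity is pi x 10^-3 per site, so the
# mean number of pairwise differences is pi multiplied by the usable positions.
n, s = 46, 3
k = 0.63e-3 * 903
d = tajimas_d(n, s, k)
print(f"Tajima's D = {d:.3f}")
```

With these inputs the result comes out close to the −0.35 reported for that column; small differences are expected because the tabulated nucleotide diversity is rounded.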