T Goeury, L E Creary, L Brunet, M Galan, M Pasquier, B Kervaire, A Langaney, J-M Tiercy, M A Fernández-Viña, J M Nunes, A Sanchez-Mazas.
Abstract
To understand how next-generation sequencing (NGS) improves both our assessment of genetic variation within populations and our knowledge of HLA molecular evolution, we sequenced and analysed 8 HLA loci in a well-documented population from sub-Saharan Africa (Mandenka). The results of full-gene NGS-MiSeq sequencing, compared with those obtained by traditional typing techniques or limited sequencing strategies, showed that segregating sites located outside exon 2 are crucial to describe not only class I but also class II population diversity. A comprehensive analysis of nucleotide diversity in exons 2, 3, 4 and 5 at the 8 HLA loci revealed remarkable differences among these gene regions: a greater variation concentrated in the antigen recognition sites of class I exons 3 and some class II exons 2, likely associated with their peptide-presentation function; a lower diversity of HLA-C exon 3, possibly related to its role as a KIR ligand; and a peculiar molecular diversity of HLA-A exon 2, revealing demographic signals. Based on full-length HLA sequences, we also propose that the most frequent DRB1 allele in the studied population, DRB1*13:04, emerged from an allelic conversion involving 3 potential alleles as donors and DRB1*11:02:01 as recipient. Finally, our analysis revealed a high occurrence of the DRB1*13:04-DQA1*05:05:01-DQB1*03:19 haplotype, possibly resulting from a selective sweep due to protection against Onchocerca volvulus, a prevalent pathogen in West Africa. This study unveils highly relevant information on the molecular evolution of HLA genes in relation to their immune function, calling for similar analyses in other populations living in contrasting environments.
Keywords: Onchocerciasis; HLA nucleotide diversity; West Africa; allelic conversion; balancing selection; full-length HLA genes; population genetics; selective sweep
Year: 2018 PMID: 29160618 PMCID: PMC5767763 DOI: 10.1111/tan.13180
Source DB: PubMed Journal: HLA ISSN: 2059-2302 Impact factor: 4.513
Number of genotype matches between PCR‐SSO, NGS‐454 and NGS‐MiSeq
| Locus | Statistic | PCR‐SSO vs NGS‐454 | NGS‐454 vs NGS‐MiSeq | PCR‐SSO vs NGS‐MiSeq |
|---|---|---|---|---|
| HLA‐A | No. of genotyped individuals | – | – | |
| | No. of compared individuals | – | – | |
| | No. of genotype matches (%) | – | – | |
| | CI95 | – | – | 97.0–100.0% |
| HLA‐B | No. of genotyped individuals | – | – | |
| | No. of compared individuals | – | – | |
| | No. of genotype matches (%) | – | – | |
| | CI95 | – | – | 89.4–95.5% |
| HLA‐C | No. of genotyped individuals | – | – | |
| | No. of compared individuals | – | – | |
| | No. of genotype matches (%) | – | – | |
| | CI95 | – | – | 72.7–72.7% |
| HLA‐DRB1 | No. of genotyped individuals | | | |
| | No. of compared individuals | | | |
| | No. of genotype matches (%) | | | |
| | CI95 | 63.6–80.3% | 87.9–92.4% | 57.5–65.2% |
| HLA‐DQA1 | No. of genotyped individuals | – | | – |
| | No. of compared individuals | – | | – |
| | No. of genotype matches (%) | – | | – |
| | CI95 | – | 97.0% | – |
| HLA‐DQB1 | No. of genotyped individuals | | | |
| | No. of compared individuals | | | |
| | No. of genotype matches (%) | | | |
| | CI95 | 72.7–87.9% | 90.9–93.9% | 15.2–19.7% |
| HLA‐DPB1 | No. of genotyped individuals | | | |
| | No. of compared individuals | | | |
| | No. of genotype matches (%) | | | |
| | CI95 | 83.3–94.0% | 84.9–90.9% | 25.8–36.4% |
No., Number; CI95, 95% confidence interval of the percentage of matches estimated by considering that the number of individuals compared between 2 different techniques is as small as the smallest number of individuals actually compared (ie, 66 individuals which is the number compared both for HLA‐C between PCR‐SSO and NGS‐MiSeq and for HLA‐DQA1 between NGS‐454 and NGS‐MiSeq); these intervals were obtained by drawing 1000 random samples of 66 individuals without replacement, see section 2.
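The resampling described in this note can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `match_interval`, the boolean encoding of matches, and the use of empirical 2.5th and 97.5th percentiles as interval bounds are all assumptions, since the note only states that 1000 samples of 66 individuals were drawn without replacement.

```python
import random

def match_interval(matches, sample_size=66, n_draws=1000, seed=0):
    """Interval of match percentages over random subsamples.

    `matches` holds one boolean per compared individual (True when the
    two techniques gave compatible genotypes). Hypothetical encoding.
    """
    rng = random.Random(seed)
    percentages = []
    for _ in range(n_draws):
        # random.sample draws without replacement
        subsample = rng.sample(matches, sample_size)
        percentages.append(100.0 * sum(subsample) / sample_size)
    percentages.sort()
    # Assumed bounds: empirical 2.5th and 97.5th percentiles
    lower = percentages[int(0.025 * (n_draws - 1))]
    upper = percentages[int(0.975 * (n_draws - 1))]
    return lower, upper
```

When exactly 66 individuals were compared, every draw returns the full set and the interval collapses to a single value, which is consistent with the entries reported for HLA‐C (72.7–72.7%) and HLA‐DQA1 (97.0%).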
Statistics describing the genetic diversity of the Mandenka population based on HLA molecular typings obtained by 3 different techniques, PCR‐SSO, NGS‐454 and NGS‐MiSeq
| Locus | N (total), PCR‐SSO (second‐field) | N (total), NGS‐454 | N (total), NGS‐MiSeq | N (unrelated), PCR‐SSO (second‐field) | N (unrelated), NGS‐454 | N (unrelated), NGS‐MiSeq | k/ar, PCR‐SSO (second‐field) | k/ar, NGS‐454 | k/ar, NGS‐MiSeq | H, PCR‐SSO (second‐field) | H, NGS‐454 | H, NGS‐MiSeq | F50, PCR‐SSO (second‐field) | F50, NGS‐454 | F50, NGS‐MiSeq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 196 | – | 87 | 72 | – | 72 | 23/21.6 | – | 22/20.7 | 92.2 | – | 92.0 | 5 | – | 5 |
| B | 198 | – | 83 | 67 | – | 67 | 30/27.4 | – | 30/27.8 | 93.6 | – | 93.6 | 6 | – | 7 |
| C | 165 | – | 83 | 54 | – | 54 | 15/15 | – | 18/17.9 | 89.4 | – | 91.0 | 4 | – | 5 |
| DRB1 | 198 | 194 | 81 | 96 | 96 | 65 | 22/19.0 | 23/20.3 | 20/16.9 | 87.7 | 87.6 | 87.6 | 3 | 4 | 4 |
| DQA1 | – | 194 | 82 | – | 66 | 66 | – | 9/9 | 14/13.0 | – | 60.7 | 71.5 | – | 1 | 1 |
| DQB1 | 195 | 196 | 76 | 94 | 96 | 60 | 12/10.9 | 11/10.5 | 13/12.8 | 66.2 | 68.2 | 76.7 | 1 | 1 | 2 |
| DPA1 | – | – | 51 | – | – | 51 | – | – | 10/10 | – | – | 71.9 | – | – | 2 |
| DPB1 | 193 | 199 | 82 | 99 | 101 | 70 | 18/14.4 | 14/12.7 | 19/16.6 | 80.0 | 80.7 | 86.3 | 2 | 2 | 3 |
N (total), total number of individuals analysed for which the typing yielded usable results; N (unrelated), number of unrelated individuals analysed for which the typing yielded usable results; k, number of alleles detected; ar, allelic richness or number of alleles expected in a population whose size is equal to the smallest N used with a given technique (ie, 54 for PCR‐SSO, 66 for NGS‐454 and 54 for NGS‐MiSeq); H, heterozygosity; F50, number of most frequent alleles whose cumulated frequency reaches at least 50%.
Class I loci were not typed with NGS‐454, DQA1 was not typed with PCR‐SSO, and DPA1 was only typed with NGS‐MiSeq.
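The statistics defined in the note above (k, ar, H and F50) could be computed along the following lines. This is a hedged sketch rather than the study's pipeline: the function name `diversity_stats`, the rarefaction formula used for allelic richness, the n/(n−1) correction for heterozygosity, and the choice to express the rarefied size in gene copies are assumptions not stated in the source.

```python
from collections import Counter
from math import comb

def diversity_stats(alleles, rarefied_size):
    """Summary statistics from a flat list of allele names (2 per individual)."""
    counts = Counter(alleles)
    n = len(alleles)  # number of gene copies
    k = len(counts)   # number of distinct alleles detected
    # Expected heterozygosity with a small-sample correction (assumed estimator)
    h = (n / (n - 1)) * (1.0 - sum((c / n) ** 2 for c in counts.values()))
    # Rarefaction: expected number of distinct alleles among
    # `rarefied_size` gene copies drawn without replacement
    ar = sum(
        1.0 - comb(n - c, rarefied_size) / comb(n, rarefied_size)
        for c in counts.values()
    )
    # F50: number of most frequent alleles whose cumulated frequency reaches 50%
    cumulated, f50 = 0, 0
    for _, c in counts.most_common():
        cumulated += c
        f50 += 1
        if 2 * cumulated >= n:
            break
    return {"k": k, "ar": ar, "H": h, "F50": f50}
```

Rarefying to the full sample size returns k itself, so ar only falls below k for loci typed in more individuals than the smallest sample.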
Reference sequences used for the alignments of the 8 HLA loci and their gene regions
| Locus | Reference gene | Reference exon | Number of exons | N Seqs |
|---|---|---|---|---|
| A | | | 8 | 174 |
| B | | | 7 | 166 |
| C | | | 8 | 166 |
| DRB1 | | | 6 | 160 |
| DQA1 | | | 4 | 158 |
| DQB1 | | | 6 | 154 |
| DPA1 | | | 4 | 102 |
| DPB1 | | | 5 | 166 |
N Seqs, number of consensus sequences retrieved after the filtering steps.
Figure 1. Proportions of genotype matches (or "matching scores") obtained between PCR‐SSO ("SSO" in the figure), NGS‐454 ("454") and NGS‐MiSeq ("MiSeq") at 3 to 6 HLA loci (DQA1 was not typed with PCR‐SSO and class I genes were not typed with NGS‐454). For the comparisons involving NGS‐454, as the sequences obtained with this technique were limited to exon 2 and may thus correspond to different alleles, we reported a match when the allele found with the other technique was compatible with the NGS‐454 sequence. Only perfect matches (ie, for any compared genotype, when both alleles found with the 2 techniques were either compatible or identical) were counted. The comparisons were replicated 1000 times each on random samples of 66 individuals (corresponding to the lowest sample size of the observed data) drawn without replacement to assess the variability of genotype matches due to sample size.
Figure 2. Nucleotide (top) and inferred amino acid (bottom) diversity per site (±σ) at exons encoding the peptide‐binding region (left, with a distinction between antigen‐recognition sites (ARS) and non‐antigen‐recognition sites (non‐ARS)); the domains interacting with CD4+ and CD8+ T‐cell receptors (middle); and the trans‐membrane region (right) of the HLA‐A, ‐B, ‐C, ‐DRB1, ‐DQA1, ‐DQB1 and ‐DPB1 molecules in the Mandenka population. Brackets indicate the chains forming the HLA‐DQ (DQA1 and DQB1) and HLA‐DP (DPA1 and DPB1) dimers.
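As an illustration of what "nucleotide diversity per site" measures, the sketch below computes the average number of pairwise differences per site among aligned sequences. The estimator actually used in the study is not specified here, so this plain pairwise version, the function name `nucleotide_diversity`, and the treatment of every mismatching character as a difference (no special handling of gaps or ambiguity codes) are assumptions.

```python
from itertools import combinations

def nucleotide_diversity(sequences):
    """Average number of pairwise differences per site among aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching positions for every pair of sequences
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diff / (len(pairs) * length)
```

Applied separately to ARS and non‐ARS codons of an exon, such a function would give the two kinds of per‐site values contrasted in the left panel.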
Results of Tajima's D and dN/dS selective neutrality tests for the 7 (among 86) regions that significantly reject the null hypothesis of selective neutrality according to one or both tests (see Supplementary Information S05 for the results relative to the other regions)
| Gene region | B exon 2 (ARS) | A exon3 (ARS) | B exon 3 (ARS) | DPA1 intron 1 | DPB1 exon 2 (ARS) | DPB1 intron 2 | DPB1 exon 3 |
|---|---|---|---|---|---|---|---|
| Size (bp) | 66 | 54 | 54 | 3584 | 75 | 4014 | 282 |
| S | 18 | 17 | 12 | 106 | 8 | 133 | 7 |
| | 2.4 | 2.9 | 2.3 | | | | |
| | .07 | .06 | .08 | | | | |
| | | | | ‐ | | ‐ | .2 |
| | | | | ‐ | | ‐ | ‐1.17 |
Size (bp), length of the region in base pairs; S, number of segregating sites; Adj. p‐value, p value corrected for multiple testing according to Benjamini Hochberg (fdr), α = .05. Z value is significant when outside the [‐1.96:1.96] interval. Values in bold are significant.
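The Benjamini–Hochberg correction mentioned in this note can be sketched as below. This is a generic implementation of the standard step‐up procedure, not the software used in the study; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A region would then be reported as significant when its adjusted p‐value falls below α = .05, with the correction applied across all tested regions rather than only those shown in the table.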
Figure 3. Principal component analysis (axes 1 and 2, explaining respectively 60% and 20% of the total variance) based on Tajima's D, nucleotide diversity π, frequency of segregating sites S.freq, number of non‐synonymous dN and synonymous dS nucleotides at exons 2, 3 and 4 of loci HLA‐A, ‐B, ‐C, ‐DRB1, ‐DQA1, ‐DQB1, ‐DPA1, ‐DPB1 and exons 5 of loci HLA‐A, ‐B, ‐C. Symbols D, D_1, D_2, D_3 represent Tajima's D estimated on the whole gene region and at the first, second and third nucleotide of each codon, respectively. Similarly, π, π_1, π_2 and π_3 as well as S.freq, S.freq_1, S.freq_2 and S.freq_3 represent the nucleotide diversity π and the frequency of segregating sites S estimated on the whole gene region and at nucleotide positions 1, 2 and 3 of each codon, respectively. Grey boxes correspond to ARS codons, white boxes to non‐ARS codons. The inset graph at the bottom left represents the correlations between the projections of the variables on the plane of the PCA (for each pair of variables, the correlation is measured by the cosine of the angle between the 2 variable vectors).
Figure 4. Putative mechanism of the allelic conversion mentioned in this study, which suggests a unidirectional transfer of genetic material including the "AGCGCC" pattern from a donor allele (potentially DRB1*04:05:01, DRB1*08:06 or DRB1*13:03:01 in the Mandenka) to the recipient allele DRB1*11:02:01, leading to the creation of the DRB1*13:04 allele.