Laurent Abi-Rached, Philippe Gouret, Jung-Hua Yeh, Julie Di Cristofaro, Pierre Pontarotti, Christophe Picard, Julien Paganini.
Abstract
Defining worldwide human genetic variation is a critical step toward revealing how genome plasticity contributes to disease. Yet there is currently no metric to assess the representativeness and completeness of current, widely used data on genetic variation. We show here that Human Leukocyte Antigen (HLA) genes can serve as such a metric, as they are both the most polymorphic and the most studied genetic system. As a test case, we investigated the 1,000 Genomes Project panel. Using high-accuracy in silico HLA typing, we find that over 20% of the common HLA variants and over 70% of the rare HLA variants are missing from this reference panel for worldwide genetic variation, owing to undersampling and incomplete geographical coverage, in particular in Oceania and West Asia. Because common and rare variants both contribute to disease, this study illustrates how HLA diversity can detect and help fix incomplete sampling, and hence accelerate efforts to draw a comprehensive overview of the genetic variation that is relevant to health and disease.
Year: 2018 PMID: 30365549 PMCID: PMC6203392 DOI: 10.1371/journal.pone.0206512
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. HLA sampling (A) is much more widespread than that of the 1,000 Genomes Project (B), which represents the current reference panel of genome-wide human diversity.
A. Circles represent the populations that were HLA typed. Colors and size of the circles correspond to the number of common alleles observed in each population (scales in the bottom left corner). B. Circles represent the region of origin for each of the 26 populations of the 1,000 Genomes Project. Each population has a color that corresponds to the associated geographical region: Europe (blue), Africa (yellow), Americas (red), South Asia (purple), and East Asia (green). A-B. Maps were generated using the ggplot2 package in R [16] and the world database.
Fig 2. The HLA diversity in the 1,000 Genomes Project panel represents only 78% of the expected diversity for alleles with a frequency >1%.
The top part of the figure shows the % match (Y axis) between the expected HLA diversity at different frequency cutoffs (X axis) and the HLA diversity observed in the 1,000 Genomes Project panel. For each frequency cutoff, the number of expected alleles is displayed at the top of each histogram. The bottom part of the figure displays the same information on a locus by locus basis for the 1% cutoff.
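The % match in Fig 2 is, in essence, a coverage ratio: of the alleles expected above a given frequency cutoff, what fraction is observed in the panel. The snippet below is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes a hypothetical `expected_max_freq` mapping from each allele to its maximum frequency across HLA-typed populations, treats an allele as "expected" at cutoff c when that maximum is strictly greater than c, and uses a hypothetical `observed` set of alleles typed in the panel. The function names, allele labels, and frequencies are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def percent_match(
    expected_max_freq: Dict[str, float],
    observed: Set[str],
    cutoffs: Iterable[float],
) -> List[Tuple[float, int, float]]:
    """Return (cutoff, n_expected, % of expected alleles found) for each cutoff."""
    results = []
    for cutoff in cutoffs:
        # Assumption: an allele is "expected" if its maximum frequency in any
        # HLA-typed population is strictly greater than the cutoff.
        expected = {a for a, f in expected_max_freq.items() if f > cutoff}
        if not expected:
            results.append((cutoff, 0, float("nan")))
            continue
        found = expected & observed
        results.append((cutoff, len(expected), 100.0 * len(found) / len(expected)))
    return results


def percent_match_by_locus(
    expected_max_freq: Dict[str, float],
    observed: Set[str],
    cutoff: float = 0.01,
) -> Dict[str, Tuple[int, float]]:
    """Same ratio at a single cutoff, split by locus (the text before '*')."""
    counts = defaultdict(lambda: [0, 0])  # locus -> [n_expected, n_found]
    for allele, freq in expected_max_freq.items():
        if freq > cutoff:
            locus = allele.split("*", 1)[0]
            counts[locus][0] += 1
            if allele in observed:
                counts[locus][1] += 1
    return {locus: (n, 100.0 * k / n) for locus, (n, k) in counts.items()}


# Toy input: allele labels and frequencies are illustrative, not study data.
expected = {"A*01": 0.15, "A*02": 0.30, "A*03": 0.08, "B*01": 0.02, "B*02": 0.004}
observed = {"A*01", "A*02", "B*01"}

if __name__ == "__main__":
    for cutoff, n, pct in percent_match(expected, observed, [0.01, 0.05, 0.10]):
        print(f">{cutoff:.0%}: {n} expected, {pct:.1f}% found")
    print(percent_match_by_locus(expected, observed))
```

With this toy input, 4 alleles exceed the 1% cutoff and 3 of them are observed, giving 75.0%. The per-locus helper mirrors the bottom part of the figure by splitting the same 1% ratio by locus.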
Fig 3. The common HLA alleles that are missing from the 1,000 Genomes Project panel show a worldwide distribution, with frequencies ranging from low (1–2.5%) to high (>10%).
A. Worldwide distribution of the populations harboring alleles that are missing from the 1,000 Genomes Project. Colors and size of the circles correspond to the number of common alleles observed in each population (scale in the bottom left corner). The map was generated using the ggplot2 package in R [16] and the world database. B. Geographical distribution of the alleles that are missing from the 1,000 Genomes Project. For each region, the number of distinct alleles is given together with their maximum frequencies.
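The per-region summary in panel B (number of distinct missing alleles and their maximum frequencies) can be viewed as a group-by over population-level frequency records. The sketch below is illustrative, not the study's code. It assumes a hypothetical flat record format `(region, population, allele, frequency)`, a hypothetical `panel_alleles` set of alleles present in the panel, and a 1% threshold for "common" that mirrors the Fig 2 cutoff; the study's exact filtering may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str, float]  # (region, population, allele, frequency)


def summarize_missing_by_region(
    records: Iterable[Record],
    panel_alleles: Set[str],
    min_freq: float = 0.01,
) -> Dict[str, Tuple[int, Dict[str, float]]]:
    """Per region: (number of distinct missing alleles, {allele: max frequency})."""
    max_freq: Dict[str, Dict[str, float]] = defaultdict(dict)
    for region, _population, allele, freq in records:
        # Keep only alleles the panel lacks and whose frequency in this
        # population exceeds the assumed "common" threshold.
        if allele in panel_alleles or freq <= min_freq:
            continue
        region_alleles = max_freq[region]
        region_alleles[allele] = max(region_alleles.get(allele, 0.0), freq)
    return {region: (len(alleles), alleles) for region, alleles in max_freq.items()}


# Made-up records for illustration only; population names are placeholders.
records = [
    ("Oceania", "pop1", "A*03", 0.12),
    ("Oceania", "pop2", "A*03", 0.05),
    ("West Asia", "pop3", "A*03", 0.02),
    ("West Asia", "pop3", "B*02", 0.015),
    ("Europe", "pop4", "A*01", 0.20),
]
panel = {"A*01", "A*02", "B*01"}

if __name__ == "__main__":
    for region, (n, alleles) in summarize_missing_by_region(records, panel).items():
        print(region, n, alleles)
```

Because the maximum is taken within each region rather than worldwide, the same toy allele (`A*03`) can appear under several regions with different maximum frequencies.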