Elena Campoy, Marta Puig, Illya Yakymenko, Jon Lerga-Jaso, Mario Cáceres.
Abstract
Supergenes are involved in adaptation in multiple organisms, but little is known about them in humans. Genomic inversions are the most common mechanism of supergene generation and maintenance. Here, we review the information about two large inversions that are the best examples of potential human supergenes. In addition, we perform an integrative analysis of the newest data to better understand their functional effects and underlying genetic changes. We have found that the highly divergent haplotypes of the 17q21.31 inversion of approximately 1.5 Mb have multiple phenotypic associations, with consistent effects in brain-related traits, red and white blood cells, lung function, male and female characteristics and disease risk. By combining gene expression and nucleotide variation data, we also analysed the molecular differences between haplotypes, including gene duplications, amino acid substitutions and regulatory changes, and identified CRHR1, KANSL1 and MAPT as good candidates to be responsible for these phenotypes. The situation is more complex for the 8p23.1 inversion, where there is no clear genetic differentiation. However, the inversion is associated with several related phenotypes and gene expression differences that could be linked to haplotypes specific to one orientation. Our work, therefore, contributes to the characterization of both exceptional variants and illustrates the important role of inversions. This article is part of the theme issue 'Genomic architecture of supergenes: causes and evolutionary consequences'.
Keywords: gene expression; humans; inversions; phenotypic traits; supergenes
Year: 2022 PMID: 35694745 PMCID: PMC9189494 DOI: 10.1098/rstb.2021.0209
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.671
Figure 1. Summary of the genomic structure of two potential human inversion supergenes. (a,b) Linkage disequilibrium (LD) distribution of the 17q21.31 (hg38, chr17:45 495 836–46 707 123) (a) and 8p23.1 (hg38, chr8:7 064 966–12 716 088) (b) inversions in European (EUR) and global (GLB) populations. Each dot represents a polymorphic variant within the inversion ± 500 kb region. LD was calculated from 1000 Genomes Project (1000GP) individuals in which the inversion has been genotyped experimentally (N) and is shown in different colours: r² < 0.8 (blue), r² = 0.8–0.95 (yellow) and r² > 0.95 (red). The rectangles above the graphs summarize the inversion region (light grey) and the variable segmental duplication (SD) blocks at its breakpoints (dark grey). (c) Structure of the inversion 17q21.31 H1 and H2 haplotypes updated from Boettger et al. and Steinberg et al. [21,22]. Coloured blocks indicate repeated segments at the breakpoints, with partial copies from the same repeat in lighter colours. The copy number variable fragments corresponding to α, β and γ duplications are indicated above [21]. Arrows below each orientation represent protein-coding genes (grey), non-coding genes (black) and pseudogenes (white). Small orange boxes correspond to the duplication of the first KANSL1 exons and two pseudogenes. (d) Structure of the 8p23.1 inversion region according to Mohajeri et al. [24]. SD organization within the REPD and REPP repeat blocks in O1 (light green rectangles) is not indicated because it has not been completely resolved and it includes gaps (indicated as black bars) and known assembly errors. Owing to the size of the region, all protein-coding genes (grey arrows) and only two non-coding genes mentioned in the main text (black arrows) are shown, with β-defensin clusters pictured as light blue boxes.
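The LD colour categories in the Figure 1 caption can be illustrated with a minimal sketch. It assumes that r² is computed as the squared Pearson correlation between genotype dosages (0, 1 or 2 copies of an allele) of the inversion and of a nearby variant; the caption does not state the exact estimator used, and the function names `genotype_r2` and `ld_colour` as well as the example genotypes are hypothetical.

```python
from statistics import mean


def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2).

    Assumption: a genotype-based r2; the source does not specify the estimator.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long vectors with at least 2 samples")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined for a monomorphic variant")
    return (sxy * sxy) / (sxx * syy)


def ld_colour(r2):
    """Map r2 to the colour categories given in the caption."""
    if r2 > 0.95:
        return "red"
    if r2 >= 0.8:
        return "yellow"
    return "blue"


# Hypothetical dosages for six individuals (not real data).
inversion = [0, 1, 2, 1, 0, 2]      # copies of the inverted orientation
tag_variant = [0, 1, 2, 1, 0, 2]    # variant that tracks the inversion perfectly
other_variant = [1, 1, 1, 0, 2, 1]  # variant largely independent of the inversion

r2_tag = genotype_r2(inversion, tag_variant)
r2_other = genotype_r2(inversion, other_variant)
print(ld_colour(r2_tag), ld_colour(r2_other))
```

A variant in perfect LD with the inversion falls in the red category, whereas a weakly correlated variant falls in the blue one.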
Figure 2. Summary of phenotypic associations for 17q21.31 (a) and 8p23.1 (b) inversions. GWAS Catalog signals (p < 5 × 10⁻⁸, dotted line) in high LD (r² ≥ 0.8) with the inversions in the closest studied population are shown grouped by categories of related traits with different colours. Upwards and downwards triangles illustrate the direction of effect of the inverted allele (H2 or O2), and dots indicate unknown direction. Complete data are described in the electronic supplementary material, tables S2 and S3. (Online version in colour.)
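The two selection criteria in the Figure 2 caption (genome-wide significance and high LD with the inversion) can be sketched as a simple filter. The record layout, the field names and all values below are hypothetical and only illustrate the thresholds stated in the caption; they are not taken from the GWAS Catalog or from the supplementary tables.

```python
from dataclasses import dataclass
from typing import Optional

GENOME_WIDE_P = 5e-8  # significance threshold stated in the caption
MIN_R2 = 0.8          # LD threshold stated in the caption


@dataclass
class GwasSignal:
    trait: str
    p_value: float
    r2_with_inversion: float
    beta_inverted: Optional[float]  # effect of the inverted allele; None if unknown


def is_inversion_signal(s):
    """Keep signals that are genome-wide significant and in high LD."""
    return s.p_value < GENOME_WIDE_P and s.r2_with_inversion >= MIN_R2


def direction(s):
    """Mirror the caption's symbols: up/down triangle, or dot for unknown."""
    if s.beta_inverted is None or s.beta_inverted == 0:
        return "unknown"
    return "up" if s.beta_inverted > 0 else "down"


# Hypothetical records (not real associations).
signals = [
    GwasSignal("trait A", 1e-12, 0.97, 0.05),
    GwasSignal("trait B", 3e-9, 0.85, None),
    GwasSignal("trait C", 1e-6, 0.99, -0.2),   # fails the p-value threshold
    GwasSignal("trait D", 1e-20, 0.60, -0.1),  # fails the LD threshold
    GwasSignal("trait E", 2e-10, 0.80, -0.3),
]

selected = [(s.trait, direction(s)) for s in signals if is_inversion_signal(s)]
print(selected)
```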
Potential functional effects of SNP differences in high LD (r² > 0.95) with 17q21.31 H1 and H2 haplotypes according to VEP analysis. (The combined annotation dependent depletion (CADD) value scores the predicted deleteriousness of variants by integrating multiple annotations, including conservation and functional information. Variants with scores greater than 20 are predicted to be the 1% most deleterious substitutions in the human genome. n.a., not applicable.)
| SNP effect | no. SNPs | affected elements | no. genes | genes |
|---|---|---|---|---|
| amino acid change (CADD > 20) | 5 | n.a. | 3 | |
| amino acid change (CADD < 20) | 12 | n.a. | 5 | |
| synonymous change | 16 | n.a. | 4 | |
| UTRs | 58 | n.a. | 3 | |
| splicing signals | 2 | n.a. | 1 | |
| promoters | 60 | 12 | 9 | |
| transcription factor binding sites | 25 | 43 | n.a. | n.a. |
| enhancers | 76 | 24 | n.a. | n.a. |
| CTCF binding sites | 126 | 49 | n.a. | n.a. |
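The CADD cut-off of 20 used in the table corresponds to the 1% most deleterious substitutions because scaled CADD scores are Phred-like, that is, a score equals −10 · log10 of the rank fraction. The sketch below only illustrates that conversion; the function names are hypothetical and it does not reproduce how CADD derives the underlying raw scores.

```python
import math


def cadd_phred_to_top_fraction(phred):
    """Fraction of possible substitutions ranked at least as deleterious."""
    return 10 ** (-phred / 10)


def top_fraction_to_cadd_phred(fraction):
    """Inverse conversion: rank fraction (0 < fraction <= 1) to scaled score."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return -10 * math.log10(fraction)


for score in (10, 20, 30):
    print(score, cadd_phred_to_top_fraction(score))
```

Under this scaling, a score of 10 marks the top 10% of substitutions, 20 the top 1% and 30 the top 0.1%.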
Figure 3. Tissue-specific association of gene expression with the inversion and β and γ duplications in the 17q21.31 region. Variant cis-eQTL effects were estimated by testing associations between genotypes and gene expression measures from multiple GTEx tissues and Geuvadis lymphoblastoid cell lines (LCLs) [38,39] (see the electronic supplementary material, methods). The eQTL effect size and direction associated with the presence of the inversion or duplication are illustrated by the colour gradient and the p-value by the dot size, with black squares indicating when the variants are lead eQTLs or in high LD (r² ≥ 0.95) with top variants in the corresponding dataset (electronic supplementary material, table S6). Genes are ordered according to their genomic position. Figure is an updated version of that in Degenhardt et al. [29].
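As a rough illustration of what an eQTL effect size and direction mean in Figure 3, the sketch below fits a simple least-squares line of expression on genotype dosage. This is an assumption made for illustration: the actual model, covariates and normalization are described in the electronic supplementary material and are not reproduced here, and the function name `eqtl_slope` and all values are hypothetical.

```python
from statistics import mean


def eqtl_slope(dosage, expression):
    """Least-squares slope and intercept of expression on genotype dosage.

    The slope is the change in expression per extra copy of the tested allele
    (for example, the inverted orientation); its sign gives the direction.
    """
    if len(dosage) != len(expression) or len(dosage) < 2:
        raise ValueError("need two equally long vectors with at least 2 samples")
    mx, my = mean(dosage), mean(expression)
    sxx = sum((g - mx) ** 2 for g in dosage)
    if sxx == 0:
        raise ValueError("all samples share the same genotype")
    sxy = sum((g - mx) * (e - my) for g, e in zip(dosage, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical normalized expression values for six individuals (not real data).
dosage = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, intercept = eqtl_slope(dosage, expression)
print(slope, intercept)
```

A positive slope corresponds to higher expression with each additional copy of the tested allele, and a negative slope to lower expression.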