Marie Klumplerova, Petra Splichalova, Jan Oppelt, Jan Futas, Aneta Kohutova, Petra Musilova, Svatava Kubickova, Roman Vodicka, Ludovic Orlando, Petr Horin.
Abstract
BACKGROUND: The mammalian Major Histocompatibility Complex (MHC) is a genetic region containing highly polymorphic genes with immunological functions. MHC class I and class II genes encode antigen-presenting molecules expressed on the cell surface. The MHC class II sub-region contains genes expressed in antigen-presenting cells. The antigen binding site of these molecules is encoded by the second exon of the corresponding genes, and the exon 2 sequences have evolved under selective pressure from pathogens. Interspecific differences can be observed in the class II sub-region. The family Equidae includes a variety of domesticated and free-ranging species that inhabit a range of habitats and are exposed to different pathogens, and it represents a model for studying this important part of the immunogenome. While the equine MHC class II DRA and DQA loci have received attention, the genetic diversity of the DRB and DQB loci and the effects of selection on them have been largely overlooked. This study aimed to provide the first in-depth analysis of the MHC class II DRB and DQB loci in the family Equidae.
Keywords: Family Equidae; MHC class II loci; MHC exon 2; Major histocompatibility complex; Positive selection; Selected amino acid sites; Trans-species polymorphism
Year: 2020 PMID: 32998693 PMCID: PMC7525986 DOI: 10.1186/s12864-020-07089-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 The horse MHC DR and DQ loci studied. Expressed DR and DQ genes located on ECA20 of EquCab3.0 are represented as solid-line arrows, pseudogenes as dashed-line arrows. Transcriptional directionality is indicated by arrows. Genes analyzed in this study are in bold. Based on Viluma et al. [24].
Percentage of non-zero-coverage bases of DRB and DQB loci among equid genomes. Values close to zero indicate absence of the sequence. Non-zero-coverage lower than 10% is shown in bold
| 67 | 98 | 96 | 63 | 98 | ||
| 30 | 81 | 80 | 39 | 92 | ||
| 35 | 96 | 90 | 52 | 57 | 98 | |
| 35 | 95 | 95 | 59 | 74 | 97 | |
| 55 | 94 | 94 | 43 | 83 | 94 | |
| 60 | 91 | 94 | 43 | 95 | ||
| 36 | 95 | 94 | 49 | 93 | ||
| 54 | 97 | 98 | 72 | 99 | ||
| 49 | 99 | 93 | 51 | 93 | 97 | |
| 41 | 97 | 98 | 56 | 93 | 100 | |
| 48 | 92 | 98 | 55 | 94 | 88 | |
| 39 | 94 | 99 | 55 | 90 | 99 | |
| 46 | 95 | 99 | 61 | 100 | ||
| 47 | 98 | 98 | 65 | 79 | 96 |
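As a rough illustration of the metric tabulated above, the percentage of non-zero-coverage bases can be computed from per-base read depths over a locus. The sketch below assumes the depths are already available as a list of integers (for example, parsed from a per-base depth report). The function name and the toy input are hypothetical and are not taken from the study.

```python
def nonzero_coverage_pct(depths):
    """Return the percentage of positions with read depth greater than zero."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for d in depths if d > 0)
    return 100.0 * covered / len(depths)


# Toy example (hypothetical depths, not data from the study):
# 6 of the 10 positions have at least one read.
example_depths = [0, 0, 3, 5, 8, 0, 2, 1, 0, 4]
example_pct = nonzero_coverage_pct(example_depths)
```

A value near zero for a locus would then indicate that almost no reads mapped to it, which the caption interprets as absence of the sequence.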
Numbers of exon 2 nucleotide sequences identified in individual equid species and sub-species
| Species/Sub-species | No. of individuals | No. of exon 2 sequences (No. of sequences shared by at least two species/sub-species) | ||||||
|---|---|---|---|---|---|---|---|---|
| 11 | 2 | 6 (1) | 3 | 5 | 5 (1) | 1 | 4 (2) | |
| 3 | 0 | 2 (1) | 0 | 1 (1) | 2 (1) | 0 | 4 (3) | |
| 2 | 1 | 1 | 1 | 1 (1) | 1 | 0 | 2 (1) | |
| 2 | 0 | 1 (1) | 1 (1) | 1 (1) | 1 | 1 (1) | 1 (1) | |
| 2 | 1 | 1 (1) | 2 | 2 | 1 | 1 (1) | 3 (2) | |
| 2 | 1 | 1 | 1 | 1 | 1 | 0 | 1 (1) | |
| 2 | 1 | 1 | 2 (1) | 2 (1) | 1 | 2 | 2 (1) | |
| 2 | 1 | 2 | 1 (1) | 1 (1) | 1 | 1 | 2 (1) | |
| 2 | 1 | 1 | 2 | 0 | 1 | 0 | 1 (1) | |
| 2 | 1 | 2 | 2 | 1 | 3 | 2 | 3 (2) | |
| 2 | 2 | 1 | 1 | 2 | 1 | 1 (1) | 3 (2) | |
| 2 | 1 | 2 | 3 | 2 (1) | 0 | 2 | 2 | |
| No. of sequences (No. of sequences shared by at least two species/sub-species) | 12 (0) | 19 (2) | 15 (1) | 14 (1) | 16 (1) | 9 (1) | 13 (4) | |
| No. of amino acid sequences (No. of sequences with stop codon / frame-shift deletion) | 12 (0/0) | 19 (2/0) | 14 (0/0) | 14 (2/1) | 16 (0/0) | 9 (0/0) | 10 (0/0) | |
Standard diversity indices and global selection at the level of individual genes and sub-regions
| Locus/sub-region | Length (bp) | N | VNP | PIP | Z-test p-value | Z-test dN-dS |
|---|---|---|---|---|---|---|
| | 269 | 17 | 99 | 68 | | 1.856 |
| | 269 | 19 | 92 | 46 | 0.062 | 1.550 |
| | 269 | 14 | 13 | 3 | 1.000 | −0.610 |
| | 269 | 9 | 32 | 24 | 1.000 | −0.156 |
| | 269 | 59 | 141 | 104 | | 1.747 |
| | 260 | 16 | 93 | 67 | 1.000 | −0.303 |
| | 260 | 20 | 63 | 37 | 0.157 | 1.012 |
| | 260 | 15 | 40 | 11 | 1.000 | −0.204 |
| | 260 | 52 | 114 | 84 | 0.375 | 0.318 |
N numbers of sequences, VNP variable nucleotide positions, PIP parsimony informative positions. Z-test p-value: probability of rejecting the null hypothesis of strict-neutrality (dN = dS) in favor of dN > dS. Significant p-values are in bold
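The footnote above describes a one-sided test of dN = dS against dN > dS. Assuming that the dN-dS column holds a test statistic that is approximately standard normal under the null hypothesis (an assumption, since this record does not name the software or its exact reference distribution), the p-value would be the upper-tail probability of that statistic. A minimal sketch with a hypothetical helper name:

```python
import math


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal statistic z."""
    # erfc gives the complementary error function; this identity
    # converts it to the upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Statistic taken from the last row of the table above.
p_example = upper_tail_p(0.318)
```

Under this assumption a statistic of 0.318 gives a p-value of about 0.375, in line with the tabulated value. Negative statistics would give values between 0.5 and 1 rather than the tabulated 1.000, which suggests that the reported p-values follow a different convention in that case.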
Numbers of amino acid sites under positive and negative selection identified in MHC DRB and DQB exon 2 sequences in equids
| Selection | Positive | Negative | ||
|---|---|---|---|---|
| Locus/sub-region | ||||
| 11 | 9 | 7 | 5 | |
| 6 | 7 | 3 | 7 | |
| 1 | 0 | 4 | ||
| 11 | 15 | 15 | 13 | |
Substitutions identified as “Deleterious” by Provean and as “Possibly damaging” or “Probably damaging” by PolyPhen2
| Gene | AA position | Substitution | ABS | SAAS |
|---|---|---|---|---|
| | 15 | G -> V | NO | NO |
| | 42 | Y -> V | NO | NO |
| | 53 | A -> P | NO | NO |
| | 55 | Y -> K | YES | NO |
| | 72 | T -> K | NO | NO |
| | 23 | D -> F | NO | NO |
| | 25 | Y -> D | YES | NO |
| | 25 | Y -> L | YES | NO |
| | 51 | P -> T | YES | YES |
| | 15 | G -> V | NO | NO |
| | 56 | W -> L | NO | NO |
| | 73 | Y -> V | YES | NO |
| | 24 | R -> T | NO | NO |
| | 43 | R -> L | NO | NO |
| | 60 | K -> D | NO | NO |
| | 24 | R -> S | NO | NO |
| | 25 | Y -> I | YES | YES |
| | 67 | R -> W | YES | NO |
| | 49 | G -> A | YES | NO |
| | 51 | P -> W | YES | YES |
| | 51 | P -> S | YES | YES |
| | 51 | P -> L | YES | YES |
| | 10 | C -> R | NO | NO |
| | 53 | A -> P | NO | NO |
Non-synonymous amino acid substitutions, antigen binding sites and selected amino acid sites within the MHC class II loci analyzed
| Gene | Length (AA) | No. of variants | No. of ABS | No. of SAAS |
|---|---|---|---|---|
| | 86 | 68 | 36 | 28 |
| | 86 | 42 | 21 | 9 |
| | 86 | 25 | 11 | 1 |
| | 89 | 67 | 27 | 24 |
| | 89 | 43 | 15 | 13 |
| | 89 | 22 | 16 | 4 |
| | 89 | 7 | 1 | 0 |
Position of genes analyzed in the reference genome EquCab3.0, primer sequences and PCR annealing temperatures
| Gene | Position of the gene analyzed (EquCab3.0) | Strand | Amplification details | Forward primer | Reverse primer | Annealing T (°C) | PCR product length (bp) |
|---|---|---|---|---|---|---|---|
| TNFA | 32,223,398.. 32,226,182 | + | TNFA_5UTR | CCTTTCAGAAGACCCATCCA | CATCTCGGATCATGCTTTCA | 59.9 | 777 |
| TNFA_1CR | TAAACAGCCAGGCGATTTTCTCCCT | CCTACAACATGGGCTACAGGCTTG | 57.5 | 1144 | |||
| TNFA_2CR | TGCCTTCCAGTCAATCAACCCTCT | GGTCACACATCCCTGCATTCTAGGTT | 61.5 | 1192 | |||
| TNFA_3UTR | TGAGCCCATCTACCTGGGAGGAGT | GCAGAGGTTCAGCGATGTAGCGA | 59 | 868 | |||
| 33,625,487.. 33,631,729 | – | 1st round | GGGACGTGTTTAAGATGGGT | AACCACACACCCTCTCCACTG | 7x(62–0,3/cycle) followed by 60 | 812 | |
| 2nd round | TGACCGGATCCTTCCTGTAC | GCGCTCACCTCGCCGAC | 60 | 303 | |||
| 34,096,675.. 34,108,525 | + | 1st round | TGTCCTTCAGGTGGAGGCAA | TCACACACTGACAACCACACATT | 65 | 793 | |
| 2nd round | TGACCCGATCSTTCCTGTAT | RCGCTCACCTCGCCGAG | 13x(65–0,3/cycle) followed by 61 | 303 | |||
| 34,266,651.. 34,285,281 | + | 1st round | ACTCGCTCACAGTCCTACACAC | GTGCTGGTAGTTCGTGCGTGG | 65 | 532 | |
| 2nd round | TGACCGGATCCTTCCTGTAC | GCGCTCACCTCGCCGAT | 13x(65–0,3/cycle) followed by 61 | 303 | |||
| 33,812,679.. 33,820,407 | – | CCTCTGGGGTAACGTTCCAG | CGGCCTTGCTTTAGGTTTATC | 4x(63–0,5/cycle) followed by 61 | 590 | ||
| 33,941,932.. 33,956,144 | AGGTTTCTCCCACTCAACTGCCTGA | GGACGCGCCCACCTCCCTGTCC | 66 | 522 | |||
| 34,031,398.. 34,037,071 | AGGTTTATCCGATCCAACCGGCTGC | GCCCTCCCAGCTCCGAGACT | 4X(68–0,5/cycle) followed by 66 | 451 | |||
| unknown | unknown | GCTCTCCTGGCGCAGAGACT | ACAGGGCTCTCATTTCCTTGTA | 65.5 | 603 | ||
| GGTCAGAGCGGGAGGCGAGT | GCCCCATAAGCTTCGCAGCA | 64 | 902 |
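Several annealing entries in the table above use a touchdown notation such as "7x(62–0,3/cycle) followed by 60". One plausible reading, stated here as an assumption, is that the first cycle anneals at the starting temperature and each later touchdown cycle is 0.3 °C cooler (with the decimal comma read as a decimal point), after which the remaining cycles run at the fixed final temperature. The sketch below expands the touchdown part under that reading. The function name is hypothetical, and whether the first cycle is at the starting temperature or already one step below it is not stated in the table.

```python
def touchdown_schedule(start_temp, step, n_cycles):
    """Annealing temperature (degrees Celsius) for each touchdown cycle.

    Assumes the first cycle runs at start_temp and each subsequent
    cycle is lowered by `step`; values are rounded to one decimal place.
    """
    return [round(start_temp - step * i, 1) for i in range(n_cycles)]


# "7x(62–0,3/cycle) followed by 60", read as 7 cycles starting at 62 °C
# with a 0.3 °C decrement per cycle.
first_round = touchdown_schedule(62.0, 0.3, 7)
```

Under this reading the last touchdown cycle anneals at 60.2 °C, just above the 60 °C used for the remaining cycles.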