Irem Sepil, Hooman K Moghadam, Elise Huchard, Ben C Sheldon.
Abstract
BACKGROUND: The critical role of Major Histocompatibility Complex (Mhc) genes in disease resistance and their highly polymorphic nature make them exceptional candidates for studies investigating genetic effects on survival, mate choice and conservation. Species that harbor many Mhc loci and high allelic diversity are particularly intriguing because they are potentially under strong selection, and studies of such species provide valuable information on the mechanisms maintaining Mhc diversity. However, comprehensive genotyping of complex multilocus systems has been a major challenge to date, with the result that little is known about the consequences of this complexity in terms of fitness effects and disease resistance.
Year: 2012 PMID: 22587557 PMCID: PMC3483247 DOI: 10.1186/1471-2148-12-68
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Schematic overview of the location of primers used in the study.
Primers used for amplifying class I exon 3 in great tits
| Primer | Sequence | Template | Reference |
|---|---|---|---|
| HN34 | CCATGGGTCTCTGTGGGTA | gDNA | [ |
| HN45 | CCATGGAATTCCCACAGGAA | gDNA | [ |
| T3F | TCCACACCATACAGCGAGTT | gDNA, cDNA | present study |
| GBT3R | TTTACGCTCCAGCTCTTTCC | gDNA, cDNA | present study |
| GTJSF | CGGCTGTGACCTCCTGTCC | gDNA | present study |
| GTJSR | ATTCYGGGCAGATGTGCTT | gDNA | present study |
| S2F | CACACCTCACAGTGGCTTTA | gDNA, cDNA | present study |
| S2R | CAGCTCTTTCTGCCCATATC | gDNA, cDNA | present study |
| L2F | CAATGGCTTTATGGCTGTGAC | gDNA, cDNA | present study |
| L2R | CCAGCTCTTTCCTCCCGTAT | gDNA, cDNA | present study |
| JSG2F | AGTTTCCGGCTGTGACCTC | gDNA | present study |
| JSG2R | GCCCGTATTCGACGTATTTC | gDNA | present study |
| JSG3F | CATACAGTGGCTTTATGGCTGT | gDNA | present study |
| JSG3R | CCCGTATCCGGTGTATTTCC | gDNA | present study |
| MHCV-F | CCCAGGTCTCCACACCATAC | gDNA | present study |
| MHCV-R | AGCTCTTTCCGCCCGTATT | gDNA | present study |
| MHCD-F | TTMYGGCTGTGACCTCCTG | gDNA, cDNA | present study |
| MHCD-R | TTGCGCTYCAGCTCTTTC | gDNA, cDNA | present study |
The primers were developed in the order from top to bottom (with the exception of HN34-45). MHCD-F and MHCD-R are the final primers designed for 454 pyrosequencing. F indicates forward and R indicates reverse primers. gDNA indicates amplification of genomic DNA and cDNA indicates amplification of cDNA libraries with the specified primer.
Rationale for each step of the variant validation procedure
| Step | Rationale |
|---|---|
| 1 Remove variants that do not match the expected allele size (212, 215, 221 bp) | Variants that have deletions/substitutions shifting the reading frame probably result from sequencing errors (Assumption 1) |
| 2 Remove variants that have fewer than four copies in the whole dataset | Variants represented once in an individual probably result from sequencing errors (Assumption 4) and variants represented in only one individual probably result from PCR errors (Assumption 5) |
| 3 Remove individuals with fewer than 200 reads | A low number of reads per individual might lead to incomplete genotyping, making the results unreliable (Assumption 6). The minimum number of reads required per individual is estimated using the probability distribution plotted by Galan et al. [ |
| 4 Remove variants that have MPAF lower than 0.01 | Variants represented rarely in the whole dataset probably result from sequencing errors (Assumption 2) |
| Remove variants that have MPAF between 0.01 and 0.025 if they can be explained as a chimera or a single basepair mutation | Variants represented rarely in the whole dataset but more frequently on a per-individual basis probably result from PCR errors if the parental sequences are also present (Assumption 3) |
| 5 Remove variants that have a single copy per individual | Variants represented once in an individual probably result from sequencing errors (Assumption 4) |
| Remove variants that have fewer than five copies per individual if they can be explained as a chimera or a single basepair mutation | Variants represented two, three or four times within an individual probably result from PCR errors if the parental sequences are present (Assumption 3). The threshold for PCR errors is estimated from the distribution of artefacts in the previous step |
Figure 2. Flow chart of the stepwise variant validation procedure.
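The numbered steps of the validation procedure can be sketched as a chain of filters. The Python below is a minimal illustration, not the authors' pipeline. The input layout (per-individual read counts keyed by variant sequence), the function name `validate_variants`, and the reading of MPAF as the maximum, across individuals, of a variant's within-individual read frequency are all assumptions, since this record does not define them. The chimera and single-basepair checks attached to steps 4 and 5 are omitted, and read depth is counted after steps 1 and 2, which is also an assumption.

```python
from collections import Counter

EXPECTED_LENGTHS = {212, 215, 221}  # expected allele sizes in bp (step 1)


def validate_variants(reads, min_total_copies=4, min_reads=200, min_mpaf=0.01):
    """Apply simplified versions of steps 1-5.

    reads: {individual: {variant_sequence: read_count}} (hypothetical layout).
    Returns the same structure with rejected variants and individuals removed.
    """
    # Step 1: keep only variants of an expected length.
    data = {ind: {v: n for v, n in vs.items() if len(v) in EXPECTED_LENGTHS}
            for ind, vs in reads.items()}

    # Step 2: drop variants with fewer than four copies in the whole dataset.
    totals = Counter()
    for vs in data.values():
        totals.update(vs)
    data = {ind: {v: n for v, n in vs.items() if totals[v] >= min_total_copies}
            for ind, vs in data.items()}

    # Step 3: drop individuals with fewer than 200 reads.
    data = {ind: vs for ind, vs in data.items() if sum(vs.values()) >= min_reads}

    # Step 4: drop variants whose MPAF (assumed definition, see above) is < 0.01.
    mpaf = {}
    for vs in data.values():
        depth = sum(vs.values())
        for v, n in vs.items():
            mpaf[v] = max(mpaf.get(v, 0.0), n / depth)
    data = {ind: {v: n for v, n in vs.items() if mpaf[v] >= min_mpaf}
            for ind, vs in data.items()}

    # Step 5: within each individual, drop variants seen only once.
    return {ind: {v: n for v, n in vs.items() if n > 1}
            for ind, vs in data.items()}
```

The thresholds are exposed as keyword arguments so that the cut-offs stated in the table (four copies, 200 reads, MPAF 0.01) can be varied when exploring their effect on the retained variants.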
Repeatability measures for each duplicate pair after 3rd, 4th and 5th step of the variant validation procedure and following supertype classification
| Read no | Verified alleles, 3rd step | Verified alleles, 4th step | Verified alleles, 5th step | Verified ST | Unverified alleles, 3rd step | Unverified alleles, 4th step | Unverified alleles, 5th step | Unverified ST | Repeatability, 3rd step | Repeatability, 4th step | Repeatability, 5th step | Repeatability, ST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 528 - 668 | 31 | 28 | 28 | 9 | 120 | 10 | 0 | 0 | 0.21 | 0.74 | 1 | 1 |
| 387 - 418 | 29 | 29 | 28 | 11 | 71 | 14 | 0 | 0 | 0.29 | 0.67 | 1 | 1 |
| 322 - 501 | 22 | 22 | 22 | 13 | 54 | 6 | 0 | 0 | 0.29 | 0.79 | 1 | 1 |
| 471 - 507 | 26 | 25 | 22 | 9 | 71 | 7 | 0 | 0 | 0.27 | 0.78 | 1 | 1 |
| 267 - 484 | 21 | 21 | 21 | 8 | 40 | 6 | 0 | 0 | 0.34 | 0.78 | 1 | 1 |
| 276 - 456 | 21 | 20 | 20 | 10 | 52 | 4 | 0 | 0 | 0.29 | 0.83 | 1 | 1 |
| 288 - 319 | 23 | 23 | 22 | 10 | 32 | 4 | 1 | 0 | 0.42 | 0.85 | 0.96 | 1 |
| 208 - 455 | 21 | 21 | 20 | 10 | 50 | 13 | 2 | 0 | 0.3 | 0.62 | 0.91 | 1 |
| 272 - 289 | 27 | 26 | 23 | 10 | 34 | 3 | 3 | 1 | 0.44 | 0.89 | 0.88 | 0.91 |
| 240 - 471 | 21 | 21 | 18 | 10 | 66 | 20 | 3 | 1 | 0.24 | 0.51 | 0.86 | 0.91 |
| 230 - 236 | 24 | 24 | 21 | 11 | 26 | 6 | 4 | 1 | 0.48 | 0.8 | 0.84 | 0.92 |
| 240 - 570 | 20 | 20 | 19 | 10 | 50 | 22 | 4 | 2 | 0.29 | 0.48 | 0.83 | 0.83 |
ST: supertypes; read no: the number of reads per sample. Verified alleles/supertypes are the alleles/supertypes that are identical between the duplicate samples. Unverified alleles/supertypes are the alleles/supertypes that were found in only one of the samples and indicate non-identical genotypes. Repeatability was calculated as the ratio of verified alleles/supertypes to the total number of alleles/supertypes for each duplicate. A repeatability value of 1 signifies identical genotypes. For each duplicate, each measure was calculated four times: after the 3rd, 4th and 5th steps of the variant validation procedure, and following supertype classification. The average values for each measure are provided in the bottom row.
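The repeatability measure described above can be written as a short function. This is a sketch under the assumption that each duplicate sample is available as a collection of allele (or supertype) identifiers; the function name `repeatability` and that input layout are illustrative, not taken from the study.

```python
def repeatability(sample_a, sample_b):
    """Ratio of verified alleles to all alleles seen in a duplicate pair.

    Verified alleles occur in both samples; unverified alleles occur in
    exactly one of them. Returns NaN if neither sample has any allele.
    """
    a, b = set(sample_a), set(sample_b)
    verified = len(a & b)    # shared between the duplicates
    unverified = len(a ^ b)  # present in only one duplicate
    total = verified + unverified
    return verified / total if total else float("nan")
```

For example, a pair with 22 verified alleles and 1 unverified allele gives 22 / 23, which rounds to the 0.96 reported for one duplicate after the 5th step.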
Figure 3. Variation in allele number per individual with increasing read number.
Figure 4. Phylogenetic tree of great tit class I exon 3 sequences. The tree was constructed using the Neighbour-Joining method and the Tamura-Nei model. The sequences [GenBank: JQ034624 - JQ035485] differed in the length of exon 3, being 221 bp, 215 bp or 212 bp. The tree was rooted with a chicken (Gallus gallus) Mhc class I sequence [GenBank: AY234770] and the reliability of the branches was tested with 1000 bootstrap replicates. Bootstrap supports for the major clades are indicated with numbers. Pseudogene alleles bearing stop codons are marked in red and putatively non-functional alleles (Group 1) are marked in blue. The rest of the alleles represent Group 2 (except the chicken Mhc class I sequence). The chicken Mhc allele is marked in grey. Sequences that are 221 bp long are indicated by purple squares and sequences that are 215 bp long are indicated by green squares. The 212 bp sequences are not marked.
Results of the likelihood ratio tests for different models of codon evolution and estimated parameter values
| Model | lnL | ΔAIC | Parameter estimates |
|---|---|---|---|
| M0 – one ω | −2975.2 | 454.8 | ω = 0.53 |
| M7 – nearly neutral with β | −2788.9 | 100.2 | |
| M8 – positive selection with β (ω0 ≤ 1, ω1 > 1) | −2737.8 | Best | |
| | | | |
| M0 – one ω | −1154.5 | 68.5 | ω = 0.74 |
| M7 – nearly neutral with β | −1124 | 25.5 | |
| M8 – positive selection with β (ω0 ≤ 1, ω1 > 1) | −1110.2 | Best | |
ω - dN/dS; nearly neutral with β - for all sites dN/dS ≤ 1 and the beta distribution approximates ω variation; positive selection - a proportion of sites evolves with dN/dS > 1; lnL - Log-likelihood value; ΔAIC - the difference between the value of the AIC of a given model and the best model; p0 - proportion of sites with dN/dS ≤ 1, p1 - proportion of positively selected sites (dN/dS > 1), ω1 - estimated value of ω for sites under positive selection.
Figure 5. Amino acid variation plot for (a) Group 1 and (b) Group 2 alleles. Chicken antigen binding sites (ABS) are indicated with the letter ‘g’, whereas positively selected sites (PSS) are indicated with black triangles. In the Group 1 plot there are no amino acids at positions 24–26 because these alleles were 212 bp long and hence had a nine-basepair deletion at this location.
Results of codon based Z- test of selection for group 1 and group 2
| Sites | n | dN ± SE | dS ± SE | Z | P |
|---|---|---|---|---|---|
| Group 1 | | | | | |
| ABS | 18 | 0.171 ± 0.135 | 0.016 ± 0.03 | 1.244 | 0.108 |
| Non-ABS | 194 | 0.067 ± 0.014 | 0.112 ± 0.034 | −1.218 | 1 |
| PSS | 9 | 0.152 ± 0.101 | 0.038 ± 0.039 | 1.493 | 0.069 |
| Non-PSS | 203 | 0.065 ± 0.014 | 0.115 ± 0.036 | −1.254 | 1 |
| All | 212 | 0.075 ± 0.015 | 0.103 ± 0.032 | −0.788 | 1 |
| Group 2 | | | | | |
| ABS | 18 | 0.578 ± 0.132 | 0.134 ± 0.068 | 3.694 | |
| Non-ABS | 203 | 0.054 ± 0.011 | 0.128 ± 0.032 | −2.106 | 1 |
| PSS | 27 | 0.492 ± 0.092 | 0.069 ± 0.051 | 5.801 | |
| Non-PSS | 194 | 0.043 ± 0.009 | 0.136 ± 0.034 | −2.686 | 1 |
| All | 221 | 0.085 ± 0.017 | 0.126 ± 0.029 | −1.158 | 1 |
The rates of non-synonymous mutations (dN) and synonymous mutations (dS) were computed using the Nei-Gojobori method with the Jukes-Cantor correction. The standard errors were obtained from 1000 bootstrap replicates. The Z test of selection estimated dN - dS (indicated as Z) and computed a one-tailed test to determine whether dN > dS (indicated as P). n is the number of nucleotides representing antigen binding sites (ABS), non-antigen binding sites (Non-ABS), positively selected sites (PSS), non-positively selected sites (Non-PSS) and all sites in Group 1 and Group 2. ABS were detected by superimposing the chicken Mhc class I sequences and assuming concordance. Significant p-values are bold.
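The Jukes-Cantor correction named above converts an observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below shows only that standard formula. It is not the software used in the study, it does not implement the Nei-Gojobori counting of synonymous and non-synonymous sites, and the function name `jukes_cantor` is illustrative.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites.

    The correction is undefined for p >= 0.75, where the logarithm's
    argument becomes zero or negative.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as p, because the correction accounts for multiple substitutions at the same site.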
Figure 6. Plot of (a) allele number per supertype and (b) frequency distribution of supertypes.
Figure 7. DAPC scatterplot of the 17 supertypes. Twelve PCs and three discriminant functions (dimensions) were retained during the analyses to describe the relationship between the clusters. The scatterplot shows only the first two PCs (d = 2) of the DAPC of Mhc supertypes. The bottom right graph illustrates the variation explained by the 12 PCs. Each allele is represented as a dot and the supertypes as ellipses.
Summary of genotyping results and estimation of minimum loci number
| Measure | Range | Mean | s.d. | Minimum loci number |
|---|---|---|---|---|
| Number of alleles per individual | 12 - 37 | 23.76 | 3.98 | 19 |
| Number of functional alleles per individual | 9 - 32 | 19.48 | 3.74 | 16 |
| Number of supertypes per individual | 6 - 16 | 10.36 | 1.62 | - |
s.d.: standard deviation. Minimum loci number was estimated using the maximum allele number for each class.
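The minimum loci numbers in the table are consistent with halving the maximum allele number and rounding up, as expected if a diploid individual carries at most two alleles per locus. The text does not state this rule explicitly, so the sketch below treats it as an assumption; the function name `minimum_loci` and the `ploidy` parameter are illustrative.

```python
import math


def minimum_loci(max_alleles_per_individual, ploidy=2):
    """Smallest number of loci that can hold the observed allele count,
    assuming at most `ploidy` distinct alleles per locus."""
    return math.ceil(max_alleles_per_individual / ploidy)
```

With the maxima from the table, 37 alleles give 19 loci and 32 functional alleles give 16 loci, matching the reported values.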