Darren J Obbard, Deborah M Callister, Francis M Jiggins, Dinesh C Soares, Guiyun Yan, Tom J Little.
Abstract
BACKGROUND: Host-parasite coevolution can result in balancing selection, which maintains genetic variation in the susceptibility of hosts to parasites. It has been suggested that variation in a thioester-containing protein called TEP1 (AGAP010815) may alter the ability of Anopheles mosquitoes to transmit Plasmodium parasites, and high divergence between alleles of this gene suggests the possible action of long-term balancing selection. We studied whether TEP1 is a case of an ancient balanced polymorphism in an animal immune system.
Year: 2008 PMID: 18840262 PMCID: PMC2576239 DOI: 10.1186/1471-2148-8-274
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Evidence for a chimeric origin of TEP1. Synonymous site divergence (Ks) between TEP5, TEP6, and the TEP1s and TEP1r alleles, calculated for 30 consecutive windows of coding DNA sequence. (A) Divergence between TEP5 and TEP6. (B) Divergence between TEP1 (s in blue, r in red) and TEP6. (C) Divergence between TEP1 and TEP5. (D) Divergence between TEP1s and TEP1r. In (D), the region of high divergence between TEP1s and TEP1r covers sites 2100–3700, and the TED domain is shown as a black bar. In (B) and (C), TEP1 is most similar to TEP6 at sites 100–1500 but most similar to TEP5 at sites 2000–3200; specifically, in the region 100–1500 the divergence between TEP6 and TEP1 is Ks = 0.87 (95% bounds 0.70–1.10), but in the region 2000–3200 this divergence drops to Ks = 0.37 (0.30–0.46), and the two values differ significantly (p < 0.001 [37]). Within the divergent region, TEP1r is consistently more similar to TEP6 than is TEP1s (red line vs. blue line); for example, the divergence between TEP1r and TEP6 between sites 2250 and 3250 is Ks = 0.15 (0.10–0.20), but between TEP1s and TEP6 it is Ks = 0.27 (0.20–0.35). Regions in which the MaxChi test suggests significant evidence (p < 0.05) for recombination breakpoints between TEP1 and TEP5 or TEP6 are shown as grey bars. The graph has been truncated where Ks > 2.
Figure 2. Divergence between TEP1s and TEP1r. The proportion of sites that differ between the TEP1s and TEP1r alleles is plotted against position for all silent sites, synonymous sites, and non-synonymous sites. Genomic sequence spanned by the macroglobulin domains 1–8 (MG), a linker (LNK), a β-sheet and the thioester-containing domain (TED) is marked below the x-axis. Also marked are exons (solid black bars) and a region of putative gene conversion from TEP6 (grey bar, see main text for details). From 0 to 5 kbp, 14 TEP1s and 15 TEP1r haplotypes were used; from 5 kbp to the end, 6 TEP1s and 5 TEP1r haplotypes were used. Moving windows were 100-site for silent and synonymous sites and 300-site for non-synonymous sites; gaps in K correspond to regions with large indels and/or no discernible alignment between TEP1s and TEP1r haplotypes.
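The moving-window divergence plotted in Figure 2 can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `windowed_divergence`, the non-overlapping default step, and the gap character are assumptions made for illustration, and the sketch treats every aligned column alike instead of separating silent, synonymous and non-synonymous sites.

```python
def windowed_divergence(seq_a, seq_b, window=100, step=100, gap="-"):
    """Proportion of differing sites per window for two aligned sequences.

    Hypothetical helper: columns where either sequence has a gap are skipped,
    and a window with no comparable columns yields None (a gap in the plot).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        # Keep only columns where both sequences carry a base.
        pairs = [
            (a, b)
            for a, b in zip(seq_a[start:start + window], seq_b[start:start + window])
            if a != gap and b != gap
        ]
        if not pairs:
            results.append((start, None))
            continue
        diffs = sum(a != b for a, b in pairs)
        results.append((start, diffs / len(pairs)))
    return results


# Toy alignment: one difference in each 4-site window.
print(windowed_divergence("AAAAAAAA", "AAATAAAT", window=4, step=4))
```

Returning `None` for windows without comparable sites mirrors the caption's note that gaps in the curve correspond to regions with large indels or no alignment.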
DNA sequence polymorphism summary statistics for TEP1
| Sample | n (b) | Length (c) / bp | Non-syn. segregating sites (d) | Syn. segregating sites (e) | πA (f) | πS (g) | θA (h) | θS (i) | Tajima's D (j) | Fu & Li's D (j) | Wall's Q (k) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 35 | 661 | 47 | 32 | 3.87 | 8.92 | | | 2.07* | 0.89 | 0.68*** |
| | 11 | 828 | 5 | 7 | 0.31 | 0.91 | 2.67 | 1.27 | -1.16 | -0.71 | 0.00 |
| | 24 | 839 | 6 | 6 | 0.20 | 0.44 | 0.25 | 0.85 | -1.45 | -1.71 | 0.42 |
| Mbita (all) (a) | 46 | 736 | 44 | 38 | 3.81 | 10.58 | | | 2.93** | 1.90** | 0.76*** |
| | 34 | 736 | 2 | 8 | 0.15 | 1.70 | 0.09 | 1.19 | 1.5 | 1.40** | 0.2 |
| | 12 | 840 | 2 | 1 | 0.15 | 0.09 | 0.10 | 0.17 | -1.14 | -1.33 | 0.00 |
| Cameroon (all) | 24 | 840 | 11 | 9 | 0.78 | 1.85 | 0.45 | 1.26 | 1.47 | 1.37 | 0.90*** |
| | 9 | 840 | 2 | 6 | 0.17 | 1.58 | 0.11 | 1.15 | 1.57 | 1.37 | 1.00*** |
| | 15 | 840 | 0 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | .. | .. | .. |
| Burkina Faso (all r) | 12 | 840 | 0 | 2 | 0.00 | 0.34 | 0.00 | 0.35 | -0.05 | -0.37 | 0.00 |
| | 7 | 2292 | 8 | 14 | 0.22 | 1.15 | 0.21 | 1.19 | 0.18 | 0.05 | 0.16 |
| | 20 | 2316 | 8 | 15 | 0.06 | 0.48 | | | -1.61 | -2.23 | 0.25 |
| Mbita | | | | | | | | | | | |
| | 19 | 2299 | 10 | 25 | 0.17 | 1.81 | 0.18 | 1.48 | 0.80 | 1.19 | 0.14 |
| | 10 | 2320 | 4 | 7 | 0.07 | 0.32 | 0.09 | 0.51 | -1.57 | -1.63 | 0.18 |
a – sampling was not random with respect to allelic class, and thus the samples do not exactly reflect natural allele frequencies
b – number of haplotypes sequenced
c – analyzed length of sequence
d – non-synonymous segregating sites
e – synonymous segregating sites
f – average pairwise genetic diversity at non-synonymous sites
g – average pairwise genetic diversity at synonymous sites
h – Watterson's estimate of 4Nμ per site for non-synonymous sites
i – Watterson's estimate of 4Nμ per site for synonymous sites
j – Tajima's D statistic and Fu & Li's D statistic, calculated for synonymous sites only
k – Wall's Q statistic
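The diversity statistics defined in these footnotes can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name `diversity_stats` and the toy haplotypes are hypothetical, every alignment column is treated as a single site class (the table separates synonymous and non-synonymous sites), each variable column counts as one segregating site, and Tajima's D uses the standard published variance constants.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(haplotypes):
    """Return S, per-site pi, per-site Watterson's theta and Tajima's D.

    Hypothetical helper; expects equal-length, gap-free haplotype strings.
    """
    n = len(haplotypes)
    length = len(haplotypes[0])
    # Segregating sites: columns with more than one observed state.
    seg = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)
    # Mean number of pairwise differences across all haplotype pairs.
    pairs = list(combinations(haplotypes, 2))
    k_hat = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta = seg / a1  # Watterson's estimate per sequence
    # Variance constants for Tajima's D.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var = e1 * seg + e2 * seg * (seg - 1)
    tajima_d = (k_hat - theta) / sqrt(var) if var > 0 else None
    return {
        "S": seg,
        "pi": k_hat / length,
        "theta_w": theta / length,
        "tajima_d": tajima_d,
    }


# Toy data (not from the paper): four haplotypes, three segregating sites.
print(diversity_stats(["AAAA", "AAAT", "AATT", "ATTT"]))
```

A positive D, as in several rows of the table, indicates an excess of intermediate-frequency variants relative to the neutral expectation; the significance thresholds themselves would still need to come from simulation or published critical values.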
Haplotype frequency tests
| Sample | n | S | θ | K | M | Haplotype configuration |
|---|---|---|---|---|---|---|
| Arabiensis (a) | 19 | 79 | 22.6 | 7* | 7* | (4,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,..)*** |
| | 9 | 10 | 3.7 | 5 | 3 | (3,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,..) |
| | 12 | 8 | 2.6 | 4 | 6 | (2,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,..) |
| Mbita | 17 | 85 | 25.1 | 3*** | 9*** | (0,0,0,2,0,0,0,0,1,0,0,0,0,0,0,0,0,..)*** |
| | 24 | 10 | 2.7 | 4 | 16 | (0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,..) |
| | 10 | 2 | 0.7 | 2 | 6 | (0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,..) |
| Cameroon | 24 | 20 | 5.4 | 4* | 15* | (0,1,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,..)*** |
| | 9 | 8 | 2.9 | 3 | 4 | (0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,..) |
| | 15 | 0 | 0.0 | 1 (b) | 15 (b) | (0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,..) (b) |
n is the number of haplotypes sequenced, S the number of segregating sites, θ Watterson's estimate of 4Nμ per gene, K the number of haplotypes, and M the frequency (count) of the commonest haplotype. The 'Haplotype configuration' is a vector (n1, n2, n3, ..., ni) giving the number of distinct haplotypes that appeared i times in the sample (see Innan et al. 2005). For example, the first haplotype configuration (4,1,0,0,0,1,1,...) has four haplotypes that appeared once each, one that appeared twice, none that appeared three, four or five times, one that appeared six times and one that appeared seven times. a: sampling was not random with respect to s/r state; b: significance not tested. Significance, assessed through coalescent simulation, is indicated as *p < 0.05, ***p < 0.001.
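The relationship between a haplotype configuration and the n, K and M columns, and between S, n and θ, can be checked with a short Python sketch. The function names are hypothetical, and θ is computed here with the usual Watterson formula, S divided by the harmonic sum 1 + 1/2 + ... + 1/(n-1), which is assumed to be the estimator behind the table.

```python
def summarize_configuration(config):
    """Return (n, K, M) for a haplotype configuration vector.

    config[i-1] is the number of distinct haplotypes seen exactly i times.
    """
    n = sum(i * count for i, count in enumerate(config, start=1))
    k = sum(config)
    m = max((i for i, count in enumerate(config, start=1) if count), default=0)
    return n, k, m


def watterson_theta(seg_sites, n):
    """Watterson's estimate per gene from S and the sample size n."""
    return seg_sites / sum(1 / i for i in range(1, n))


# First row of the table: configuration (4,1,0,0,0,1,1,...), S = 79.
first_row = [4, 1, 0, 0, 0, 1, 1]
print(summarize_configuration(first_row))  # (19, 7, 7)
print(round(watterson_theta(79, 19), 1))   # 22.6
```

Both outputs agree with the first row of the table (n = 19, K = 7, M = 7, θ = 22.6).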
McDonald and Kreitman tests of whether natural selection has driven the divergence of the TEP1s and TEP1r protein sequences.
| Region analysed | n (a) | Codons (b) | Ds | Ps | Dn | Pn | NI (c) | p (d) |
|---|---|---|---|---|---|---|---|---|
| Whole CDS | 29 | 1287 | 98 | 87 | 76 | 36 | 0.53 | 0.015 |
| 5' end (low divergence) | 29 | 417 | 1 | 29 | 2 | 20 | 0.345 | >0.5 |
| 3' end (high divergence) | 29 | 870 | 97 | 58 | 74 | 16 | 0.362 | 0.001 |
| TED | 85 | 272 | 30 | 23 | 42 | 12 | 0.373 | 0.024 |
| remainder | 29 | 598 | 65 | 43 | 30 | 10 | 0.500 | 0.123 |
Sequences of TEP1s and TEP1r were pooled across all populations and both species (An. gambiae and An. arabiensis). Tests were conducted separately for regions that show high or low divergence between alleles, as well as separately for the TED region. The test compares divergence and polymorphism at synonymous sites (Ds and Ps, respectively) with divergence and polymorphism at nonsynonymous sites (Dn and Pn, respectively).
a – Total number of haplotypes
b – Analysed codons
c – Neutrality Index
d – p-value from a two-tailed Fisher's exact test
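The neutrality index and the Fisher's exact test reported in this table can be reproduced approximately with a standard-library Python sketch. Two assumptions are made here and are not confirmed by the source: the four count columns are read in the order Ds, Ps, Dn, Pn, and NI is the usual ratio (Pn/Ps)/(Dn/Ds). The function names are hypothetical, and the two-tailed p-value sums all tables that are no more probable than the observed one.

```python
from math import comb


def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds); values below 1 suggest excess amino-acid divergence."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_tailed(dn, ds, pn, ps):
    """Two-tailed Fisher's exact test on the 2x2 table [[Dn, Ds], [Pn, Ps]]."""
    row1, row2 = dn + ds, pn + ps
    col1 = dn + pn
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x non-synonymous fixed differences.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(dn)
    # Sum every table at least as extreme (as improbable) as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Whole CDS row, under the assumed column order Ds, Ps, Dn, Pn.
whole_cds = dict(ds=98, ps=87, dn=76, pn=36)
print(round(neutrality_index(**whole_cds), 2))  # 0.53
print(fisher_exact_two_tailed(**whole_cds))
```

Because NI depends only on the product Pn·Ds over Ps·Dn, the index alone cannot confirm the column order; the assignment used here rests on the caption and on the low-divergence 5' region showing few fixed differences.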
Test for a recent change in allelic frequency based on the distribution of segregating sites between allelic classes.
| Sample | Allelic class | Class count (a) | Sequenced haplotypes (b) | Observed segregating sites (c) |
|---|---|---|---|---|
| | S & R | 11 s, 33 r | 11 s, 24 r | 12, 12# |
| | S & R | 32 s, 13 r | 27 s, 12 r | 10, 3 |
| | S & R conversion S & R | 9 s, 15 sr | 9 s, 15 sr | 8, 0* |
| | S & R | 11 s, 33 r | 7 s, 20 r | 22, 23* |
| | S & R | 32 s, 13 r | 19 s, 10 r | 35, 11 ns |
a: the observed frequency of the TEP1s and TEP1r classes; b: the number of haplotypes sequenced from each allelic class; c: the total number of synonymous and non-synonymous segregating sites observed in that allelic class. Significance was assessed using simulations under a neutral model in which the frequency of each allelic class is constant and proportional to the observed frequency (see main text and Stahl et al. 1999).
ns, p > 0.05; #, p < 0.1; *, p < 0.05.