Tovi Lehmann, Jen C C Hume, Monica Licht, Christopher S Burns, Kurt Wollenberg, Fred Simard, José M C Ribeiro.
Abstract
BACKGROUND: As pathogens that circumvent the host immune response are favoured by selection, so are host alleles that reduce parasite load. Such evolutionary processes leave their signature on the genes involved. Deciphering modes of selection operating on immune genes might reveal the nature of host-pathogen interactions and factors that govern susceptibility in host populations. Such understanding would have important public health implications.
Year: 2009 PMID: 19234606 PMCID: PMC2642720 DOI: 10.1371/journal.pone.0004549
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Location, basic structure, and function of selected genes.
| Gene / Cytological location | Length / Protein | Immune role (pathogens) | Malaria response relevance |
| 2R:14D1 | 1,723 bp; 360 aa (S18/P91/M251) | Regulatory: signal transduction (Gram +ve, −ve bacteria, | Distinguishes |
| 2R:17C | 2,208 bp; 396 aa (S24/M372) | Recognition (Gram −ve bacteria, | Upregulated after malaria infection |
| 3R:30E | 712 bp; 81 aa (S18/P2/M61) | Effector: antimicrobial protein (Gram +ve, −ve bacteria, fungi, | Upregulated after malaria infection; unique to Culicidae; marginally lethal to |
| 3L:41 | 1,410 bp; 102 aa (S25/P37/M40) | Effector: antimicrobial protein (Gram +ve, −ve bacteria, fungi, | Upregulated after malaria infection; |
Cytological location of the gene. AgSP14D1 is mapped in inversion 2Rd. The other three genes are outside polymorphic inversions.
Total sequence length (bp) without deletions; total protein length (aa); length of the signal peptide (S), cleaved propeptide segment (P), and mature protein (M) in aa.
Population characteristics in relation to exposure to human pathogens.
| Species and Population | | | | | | |
| Date Collected | Jun. 1986 | Jul. 1994 | Jul. 1994 | Aug. 1996 | Jul. 1999 | Aug. 1995 |
| Method | IR | IR-bednet | IR-bednet | IR | IR | HL |
| Sample size | 14 | 13 | 12 | 11 | 14 | 10 |
| Anthropophily | Very low | Moderate | High | High | High | High |
| Local malaria transmission | None | Moderate 400 | High 400 | Low 10 | Moderate 120 | Moderate 260 |
| Local filaria transmission | None | None | None | Moderate | High | None |
Collection methods: IR, indoor-resting adult mosquitoes collected by pyrethrum spray or aspiration; IR-bednet, blood-fed and blood-seeking females collected by aspiration from net traps hung over the beds of sleeping volunteers; HL, blood-seeking mosquitoes collected by human landing catches.
Refers to the mosquito's preference to feed exclusively on human blood.
Overall index of the intensity of malaria transmission, measured as annual infective bites per person. Estimates reflect total transmission by all vector species because most studies identify An. arabiensis and An. gambiae as An. gambiae sensu lato.
Overall index of the intensity of lymphatic filariasis transmission, based on the prevalence of mosquitoes infected with larvae of Wuchereria bancrofti. None refers to localities where no clinical manifestations in people are known and no infected mosquitoes were found, based on personal communications from Frederic Simard (Senegal) and William Hawley (W. Kenya).
Figure 1. Mature protein (excluding signal peptide and the cleaved propeptide segment) distribution within and between species.
Figure 2. Polymorphism along the gene using a sliding window (window length = 50 bp; sliding interval = 10 bp).
Exons and flanking regions are denoted by broad and narrow hatched rectangles, respectively; introns are denoted by lines. A, Q, and Ga denote An. arabiensis, An. quadriannulatus, and An. gambiae from western Kenya, respectively.
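The sliding-window scan described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the caption does not say which polymorphism statistic was plotted, so the proportion of polymorphic sites per window is assumed here, and the function name is hypothetical.

```python
from typing import List, Tuple


def sliding_window_polymorphism(
    seqs: List[str], window: int = 50, step: int = 10
) -> List[Tuple[int, float]]:
    """Return (window start, proportion of polymorphic sites) per window.

    Assumes aligned sequences of equal length; characters other than
    A, C, G, T (gaps, N) are ignored when deciding if a site varies.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # A site is polymorphic when more than one nucleotide is observed.
    polymorphic = [
        len({s[i] for s in seqs if s[i] in "ACGT"}) > 1 for i in range(length)
    ]

    results = []
    for start in range(0, length - window + 1, step):
        count = sum(polymorphic[start:start + window])
        results.append((start, count / window))
    return results
```

With the caption's settings (50 bp window, 10 bp step), consecutive windows overlap by 40 bp, so neighbouring values are strongly correlated and the curve is smoother than a per-site plot.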
Figure 3. Diversity (π) and 95% CI in coding and NC regions in each population.
Diagonal lines mark equal diversity of coding and NC regions. GA, GJ, GN, and GS denote An. gambiae populations from western Kenya, eastern Kenya, Nigeria, and Senegal, respectively. AA and QM denote An. arabiensis and An. quadriannulatus, respectively.
Nucleotide diversity (π × 10⁻³), number of polymorphic sites (S), recombination parameter between adjacent positions (R = 4Nr, × 10⁻³), and ratio of nucleotide diversity at nonsynonymous/synonymous sites (ω = Ka/Ks) in coding regions in each population.
| Pop | N | π, S | R | ω | N | π, S | R | ω | N | π, S | R | ω | N | π, S | R | ω | ω |
| | 12 | 20, 38 | 30 | 0.097 | 11 | 15, 46 | 79 | 0.22 | 10 | 12, 5 | 67 | 0.4 | 10 | 28, 10 | 21 | 0.25 | 0.24 |
| | 11 | 7, 18 | 42 | 0.16 | 11 | 11, 31 | 20 | 0.16 | 9 | 10, 2 | 81 | u | 11 | 27, 11 | 17 | 0.14 | 0.15 |
| | 13 | 19, 40 | 8 | 0.18 | 10 | 14, 43 | 346 | 0.17 | 14 | 12, 6 | 994 | 0.31 | 12 | 26, 18 | 292 | 0.22 | 0.22 |
| | 10 | 15, 32 | 4 | 0.091 | 10 | 12, 40 | 84 | 0.17 | 10 | 15, 6 | 32 | 0.19 | 9 | 20, 13 | 37 | 0.12 | 0.14 |
| | 43 | 20, 91 | 10 | 0.12*** | 42 | 14, 119 | 148 | 0.17*** | 43 | 12, 13 | 87 | 0.40 | 42 | 27, 27 | 104 | 0.19** | 0.22 |
| | 11 | 16, 29 | 6 | 0.2** | 13 | 7, 33 | 5 | 0.40 | 11 | 17, 7 | 62 | 0.40 | 13 | 15, 13 | 264 | 0.19* | 0.30 |
| | 14 | 9, 31 | 645 | 0.046*** | 11 | 9, 35 | u | 0.21** | 10 | 15, 4 | 192 | 0.30 | 14 | 18, 6 | 1019 | 0.00* | 0.14 |
| | 69 | 24, 135 | 20 | 0.12 | 66 | 20, 168 | 131 | 0.26 | 64 | 15, 18 | 131 | 0.37 | 69 | 31, 39 | 82 | 0.13 | 0.22 |
Populations of An. gambiae are referred to by location, whereas gambiae, arabiensi, and quadrian represent An. gambiae (pooled), An. arabiensis, and An. quadriannulatus, respectively.
Equality of nucleotide diversity at synonymous and nonsynonymous sites (ω = 1) in coding regions was tested by bootstrapping (see Materials & Methods) only at the species level. *, **, and *** represent P<0.05, P<0.01, and P<0.001 significance levels, and u denotes an undefined value.
Average across genes for each population. Species values with different letters are statistically different from each other (P<0.05), as determined by the Ryan-Einot-Gabriel-Welsch multiple range test following two-way ANOVA of the nonsynonymous/synonymous diversity ratio over gene and species (separate An. gambiae populations were excluded).
Pooled across populations (and species) for each gene. Values with different letters are statistically different from each other, as described above (c).
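For readers unfamiliar with the two basic summary statistics in the table above, the sketch below computes the number of polymorphic sites (S) and nucleotide diversity (π, the mean number of pairwise differences per site) from a set of aligned sequences. It is an illustrative implementation under simple assumptions (no gaps or ambiguous bases; function names are hypothetical) and is not the software used in the study. The table reports π multiplied by 10³.

```python
from itertools import combinations
from typing import List


def segregating_sites(seqs: List[str]) -> int:
    """Count alignment columns that contain more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs: List[str]) -> float:
    """Mean pairwise differences per site (pi) over all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    # Total number of mismatching positions summed over all pairs.
    total_diff = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diff / len(pairs) / length


if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AATT"]
    print(segregating_sites(sample), nucleotide_diversity(sample) * 1e3)
```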
Frequency spectra in coding (C) and non-coding (NC) regions across species at each gene.
| Population | Region | f = 0 | f = 1 | f = 2–3 | f = 4–7 | f = 0 | f = 1 | f = 2–3 | f = 4–7 | f = 0 | f = 1 | f = 2–3 | f = 4–7 | f = 0 | f = 1 | f = 2–3 | f = 4–7 |
| An. gambiae, West Kenya | Coding | 96.7 | 0** | 1.3* | 2 | 97 | 2 | 0.7 | 0.7*** | 97.9 | 1.7* | 0.4 | na | 96.1 | 3 | 0.6* | 0.3 |
| | Non-coding | 91.4 | 3.1 | 4.2 | 1.3 | 90 | 3.4 | 1.7 | 5.2*** | 91.9 | 6.7 | 1.4 | na | 92.2 | 4.7 | 2.5** | 0.6 |
| | Coding | 95.8 | 2 | 1.3 | 1 | 97 | 1.7* | 0.7** | 0.3 | 97.1 | 1.7 | 0.8 | 0.4 | 97.2 | 2.1 | 0.5 | 0.2 |
| | Non-coding | 94.3 | 3.1 | 1.3 | 1.3 | 90 | 4.8** | 3.8*** | 1.2 | 94 | 0.7 | 3.4 | 1.9 | 95.9 | 2.4 | 1.7* | 0 |
| | Coding | 98 | 1.3* | 0.3 | 0.3 | 97 | 1.9 | 0.4 | 0.6 | 98.4 | 1.2 | 0* | 0.4 | 97.1 | 2.1 | 0.7 | 0.2 |
| | Non-coding | 91.6 | 4.6 | 1.9 | 1.9 | 94 | 4.1* | 1.3 | 1.1 | 93.4 | 2.3 | 3.5 | 0.5 | 95.4 | 3.4 | 0.7 | 0.4 |
| | Coding | 96.1 | 1.4*** | 1.3** | 1.3 | 97 | 2*** | 0.6*** | 0.4*** | 97.9 | 1.3** | 0.5*** | 0.3 | 96.8 | 2.5 | 0.5*** | 0.2* |
| | Non-coding | 92 | 3.9* | 2.6 | 1.5 | 91.4* | 4.1*** | 2.4*** | 2.1*** | 93.5 | 3.6* | 2.3* | 0.7 | 94.5 | 3.1 | 1.8*** | 0.6* |
Frequency spectra classes include invariant positions (f = 0), low polymorphism represented by singletons (f = 1), moderately polymorphic positions with the rare nucleotide observed two or three times (f = 2–3), and highly polymorphic positions with the rare nucleotide observed four or more times (f = 4–7). The relative distribution of each class is expressed as a percentage. Excess and deficit of observed vs. expected frequency are marked by red and blue, respectively, in cells with significant deviations based on a 1 df χ² test (*, **, and *** represent P<0.05, 0.01, and 0.001, respectively). The western Kenya population of An. gambiae represents this species (a heterogeneity χ² test showed no evidence for heterogeneity among the four populations). All contingency tables for each gene and species were significant (P<0.01).
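The frequency classes defined in the footnote above can be tabulated from an alignment as sketched below. This is an illustration only: the function name is hypothetical, and treating every non-majority nucleotide at a site as "rare" (which matters only for sites with three or more nucleotides) is an assumption not stated in the source.

```python
from collections import Counter
from typing import Dict, List


def frequency_spectrum(seqs: List[str]) -> Dict[str, float]:
    """Percentage of sites in each rare-nucleotide frequency class."""
    n = len(seqs)
    classes = {"f=0": 0, "f=1": 0, "f=2-3": 0, "f=4+": 0}
    for column in zip(*seqs):
        counts = Counter(column)
        # Copies of anything other than the most common nucleotide.
        rare = n - max(counts.values())
        if rare == 0:
            classes["f=0"] += 1
        elif rare == 1:
            classes["f=1"] += 1
        elif rare <= 3:
            classes["f=2-3"] += 1
        else:
            classes["f=4+"] += 1
    total = len(seqs[0])
    return {name: 100.0 * count / total for name, count in classes.items()}
```

Comparing the coding and non-coding percentages class by class then gives the observed counts for the contingency tests mentioned in the footnote.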
McDonald-Kreitman test (see text for details).
| Gene G+C | Pop | Silent (Fixed/Polymorphic) | Nonsynonymous (Fixed/Polymorphic) | P |
| | A-Q | 0.075 (10/133) | 0.095 (2/21) | Ns |
| | Ga-Q | 0.068 (10/146) | 0.100 (2/20) | Ns |
| | A-Q | 0.351 (39/111) | 0.028 (1/36) | |
| | Ga-Q | 0.191 (31/162) | 0.00 (0/36) | |
| | A-Q | 0.056 (3/54) | 0.00 (0/3) | Ns |
| | Ga-Q | 0.016 (1/61) | 0.00 (0/2) | Ns |
| | A-Q | 0.082 (12/147) | 0.250 (1/3) | Ns |
| | Ga-Q | 0.018 (3/168) | 0.00 (0/4) | Ns |
G+C content (over species) in the coding region/whole gene.
The test could not be performed between An. gambiae and An. arabiensis because there were no fixed differences between them across all genes (see text for details).
Not significant (P>0.05).
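Each row of the McDonald-Kreitman table is a 2×2 comparison of fixed and polymorphic counts at silent versus nonsynonymous sites. The sketch below shows one way such a comparison could be evaluated. It assumes a two-sided Fisher's exact test, since the table does not state which test statistic was used, and it relies only on the standard library; the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    Here a, b = silent fixed, silent polymorphic and
    c, d = nonsynonymous fixed, nonsynonymous polymorphic.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are no more probable than the observed one.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # First row of the table: silent 10/133, nonsynonymous 2/21.
    print(fisher_exact_two_sided(10, 133, 2, 21))
```

The ratios printed in the table are simply fixed divided by polymorphic counts (for example, 10/133 ≈ 0.075), so similar ratios in the two site classes, as in this first row, give no evidence against neutrality.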
Figure 4. Divergence between species measured by FST in functional regions of each gene.
The 95% CI were estimated by bootstrapping over positions (1000 bootstrap replications), provided that there were ten or more variable positions in that region across the pair of populations compared. An. gambiae is represented by its western Kenya population (GA). Defensin, gambicin, GNBP, and SP14D1 are denoted by Df, Gm, BP, and SP, respectively. NC denotes noncoding regions, C denotes coding regions, F denotes flanking regions, I denotes intronic regions, M denotes mature protein, and SC denotes the signal and cleaved propeptide segments.
Figure 5. Differentiation between An. gambiae populations measured by FST in different functional regions of each gene.
The 95% CI of each value were estimated by bootstrapping over positions (1000 bootstrap replications) provided that there were five or more variable positions in that gene segment across the pair of populations compared. The number of variable positions is shown if it is below 10. Horizontal axis legend is the same as in Figure 4.
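The bootstrap-over-positions procedure described in the captions of Figures 4 and 5 can be sketched as follows. Several choices here are assumptions rather than details from the source: the FST estimator (1 − Hw/Hb, within- over between-population mean pairwise differences), the percentile method for the interval, and all function and parameter names. Each population needs at least two sequences.

```python
import random
from itertools import combinations, product
from typing import List, Optional, Tuple

Site = Tuple[float, float]  # (within-population, between-population) diversity


def _mean_mismatch(pairs) -> float:
    pairs = list(pairs)
    return sum(a != b for a, b in pairs) / len(pairs)


def per_site_diversity(pop1: List[str], pop2: List[str]) -> List[Site]:
    """Per-position within and between mean pairwise differences."""
    sites = []
    for col1, col2 in zip(zip(*pop1), zip(*pop2)):
        within = (_mean_mismatch(combinations(col1, 2))
                  + _mean_mismatch(combinations(col2, 2))) / 2
        between = _mean_mismatch(product(col1, col2))
        sites.append((within, between))
    return sites


def fst(sites: List[Site]) -> float:
    hw = sum(w for w, _ in sites)
    hb = sum(b for _, b in sites)
    return 1 - hw / hb if hb > 0 else float("nan")


def bootstrap_fst_ci(
    pop1: List[str], pop2: List[str],
    replicates: int = 1000, min_variable: int = 10, seed: int = 0,
) -> Tuple[float, Optional[float], Optional[float]]:
    """Point estimate and 95% percentile CI, resampling positions."""
    sites = per_site_diversity(pop1, pop2)
    point = fst(sites)
    variable = sum(1 for w, b in sites if w > 0 or b > 0)
    if variable < min_variable:
        return point, None, None  # too few variable positions for a CI
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        value = fst([rng.choice(sites) for _ in sites])
        if value == value:  # drop undefined (NaN) replicates
            estimates.append(value)
    if not estimates:
        return point, None, None
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[max(0, int(0.975 * len(estimates)) - 1)]
    return point, lower, upper
```

The `min_variable` threshold mirrors the captions' requirement of ten (Figure 4) or five (Figure 5) variable positions before an interval is reported.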
Positive selection on single codon level based on PAML (see text for details).
| Gene | Models | ωS | p(ωS) | −2ΔLL | P | aa |
| | M1 vs. M2 | 1 | 9.1 | 0 | Ns | None |
| | M7 vs. M8 | 1 | 3.8 | 5.9 | Ns | None |
| | M1 vs. M2 | 1 | 1.1 | 0 | Ns | None |
| | M7 vs. M8 | 1.97 | 2.4 | 6.6 | 0.037 | 206ns; 169ns |
| | M1 vs. M2 | 12.1 | 1.3 | 6.4 | 0.041 | 72** |
| | M7 vs. M8 | 11.1 | 1.3 | 11.8 | 0.001 | 72** |
| | M1 vs. M2 | 1.4 | 0 | 1.1 | Ns | None |
| | M7 vs. M8 | 2.2 | 2.4 | 0.9 | Ns | None |
GNBP alignment was 171 aa long and included eight species; SP14D1 alignment was 246 aa long and included six species; Gambicin alignment was 81 aa long and included nine species; Defensin alignment was 101 aa long and included seven species (see Materials and Methods for the species listing for each gene).
Likelihood ratio tests (with 2 df) were used to determine the significance of finding ω>1 over all codons by comparing selection models (M2 and M8) that allowed for ω>1 with neutral (M1 and M7) models that allowed only ω≤1.
Estimate of the highest ω value for any codon.
The proportion of codons with the highest ω estimate.
Positions of the amino acids with ω>1 and their significant value estimated by BEB test in PAML.
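The likelihood ratio tests in the table above compare −2ΔLL with a χ² distribution on 2 df. For 2 df the upper-tail probability has the closed form exp(−x/2), so the P value can be checked without a statistics library. The sketch below is an illustration with a hypothetical function name, not part of the PAML workflow itself.

```python
from math import exp


def lrt_p_value_2df(statistic: float) -> float:
    """Upper-tail chi-square probability with 2 df for -2*delta(lnL)."""
    if statistic < 0:
        raise ValueError("the likelihood ratio statistic cannot be negative")
    # For 2 degrees of freedom the survival function is exp(-x / 2).
    return exp(-statistic / 2.0)


if __name__ == "__main__":
    for stat in (6.6, 6.4):
        print(stat, round(lrt_p_value_2df(stat), 3))
```

Under this formula, −2ΔLL values of 6.6 and 6.4 correspond to the tabulated P values of 0.037 and 0.041, and a statistic must exceed about 5.99 to reach P<0.05.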