Luke Ambrose1, Iva Popovic1, James Hereward1, Daniel Ortiz-Barrientos1, Nigel W Beebe1,2.
Abstract
We investigate the genetic basis of anthropophily (human host use) in a non-model mosquito species group, the Anopheles farauti complex from the southwest Pacific. This complex has experienced multiple transitions from anthropophily to zoophily, contrasting with well-studied systems (the global species Aedes aegypti and the African Anopheles gambiae complex) that have evolved to be specialist anthropophiles. By performing tests of selection and assessing evolutionary patterns for >200 olfactory genes from nine genomes, we identify several candidate genes associated with differences in anthropophily in this complex. Based on evolutionary patterns (phylogenetic relationships, fixed amino acid differences, and structural differences) as well as results from selection analyses, we identify numerous genes that are likely to play an important role in mosquitoes' ability to detect humans as hosts. Our findings contribute to the understanding of the evolution of insect olfactory gene families and mosquito host preference as well as having potential applied outcomes.
Keywords: Entomology; Evolutionary biology; Phylogenetics; Zoology
Year: 2022 PMID: 35754720 PMCID: PMC9213756 DOI: 10.1016/j.isci.2022.104521
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. Mitogenome and consensus nuclear genome phylogenies
(A) Neighbor-joining phylogeny for samples used in this study based on whole mitogenome data. Support values are based on 1000 bootstrap replicates.
(B) Consensus neighbor-joining phylogeny for samples used in this study based on 164 041 SNPs from whole-genome sequence data. Support values are based on 1000 bootstrap replicates. Host preference is indicated by branch color (see key). The top-right panel is a map of the region, providing geographical context.
Sample information on the individuals sequenced in this study
| Sample ID | Species | Location | Host preference | Collection | Coverage | Lib type |
|---|---|---|---|---|---|---|
| | | Queensland (Aus) | Opportunist (A) | Adult (HLC) | 60 | 250bp PE |
| | | Guadalcanal (SI) | Animal (Z) | Larval | 64 | 250bp PE |
| | | Guadalcanal (SI) | Animal (Z) | Larval | 62 | 250bp PE |
| | | Bougainville (SI) | Animal (Z) | Larval | 49 | 250bp PE |
| | | Western Province (SI) | Opportunist (A) | Adult (HLC) | 30 | 100bp PE |
| | | Queensland (Aus) | Opportunist (A) | Adult (CDC) | 60 | 250bp PE |
| | | Northern New Guinea | Opportunist (A) | Adult (HLC) | 39 | 100bp PE |
| | | Eastern New Guinea | Opportunist (A) | Adult (HLC) | 41 | 100bp PE |
| | | New Guinea | Opportunist (A) | Adult (HLC) | 87 | 250bp PE |
Sample ID = name given to the sample sequenced; Species = species to which the sample belongs; Location = geographic location where the sample was collected; Host preference = host preference of the population from which the sample was taken: Opportunist (A) = will readily feed on humans and other mammals, Animal (Z) = only feeds on animals other than humans (unknown hosts); Collection = indicates whether the sample was collected as an adult or larva and whether adults were collected in human landing catches (HLCs) or with CDC traps; Coverage = estimated average genome coverage of mapped reads; Lib type = type of paired-end library used to generate sequence data (either 250bp or 100bp paired-end reads).
Figure 2. kA/kS by gene family, comparison class, and zoophilic lineage
Boxplots are shown for each gene family by comparison class and zoophilic lineage, with standard errors around means presented; Z = zoophilic, A = anthropophilic. Individual comparisons with are shown on plots. Summary statistics (mean, median) for each gene family/comparison class are as follows: A vs A (0.259, 0.205), A vs NSI hinesorum (0.229, 0.179), A vs SSI hinesorum (0.239, 0.180), A vs irenicus (0.261, 0.213); A vs A (0.233, 0.211), A vs NSI hinesorum (0.217, 0.199), A vs SSI hinesorum (0.210, 0.185), A vs irenicus (0.235, 0.198); A vs A (0.260, 0.180), A vs NSI hinesorum (0.199, 0.124), A vs SSI hinesorum (0.232, 0.126), A vs irenicus (0.299, 0.206); A vs A (0.214, 0.186), A vs NSI hinesorum (0.193, 0.166), A vs SSI hinesorum (0.193, 0.150), A vs irenicus (0.223, 0.196).
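The (mean, median) pairs reported per comparison class can be reproduced from a list of pairwise kA/kS ratios. The following is a minimal sketch, not the authors' code: the function name `summarize_by_class` and the example ratios are hypothetical and invented purely for illustration.

```python
from statistics import mean, median


def summarize_by_class(ratios_by_class):
    """Return {comparison class: (mean, median)} rounded to three decimals.

    ratios_by_class maps a label such as "A vs A" to a list of pairwise
    kA/kS ratios. Classes with no values are skipped.
    """
    return {
        label: (round(mean(values), 3), round(median(values), 3))
        for label, values in ratios_by_class.items()
        if values
    }


# Hypothetical ratios, invented for illustration (not data from the study).
example = {
    "A vs A": [0.10, 0.20, 0.30, 0.50],
    "A vs irenicus": [0.15, 0.25, 0.45],
}
summary = summarize_by_class(example)
for label, (avg, med) in summary.items():
    print(f"{label}: mean={avg}, median={med}")
```

Reporting both statistics is useful here because kA/kS distributions are typically right-skewed, so the mean sits above the median, as in every class listed in the caption.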
Figure 3. Expected phylogenetic relationship in candidate genes
Blue branches represent zoophilic lineages while red branches represent anthropophilic lineages. hin = A. hinesorum; hin SI = A. hinesorum from the Solomon Archipelago; far = A. farauti; iren = A. irenicus. Grs = Gustatory receptors; Irs = Ionotropic receptors; Ors = Olfactory receptors; Obps = Olfactory binding proteins. Numbers to the right of the figure show the proportion of genes in which each relationship was observed.
(A) The strongest hypothetical phylogenetic signal of a gene being involved in differences in host preference. Observing this relationship would suggest that a gene has introgressed from one zoophilic lineage to the others.
(B) The strongest phylogenetic relationship observed in this study suggestive of a gene being a potential candidate. For genes showing this relationship, zoophilic (Z) A. hinesorum from the Solomon Archipelago form an exclusive clade and anthropophilic (A) A. hinesorum fall within a clade containing other anthropophilic A. hinesorum (from Queensland and/or New Guinea).
(C) A weaker phylogenetic relationship potentially suggestive of a gene being a candidate. For genes showing this relationship, all A. hinesorum samples from the Solomon Islands form a monophyletic clade with the two zoophilic individuals being most closely related.
Evolutionary patterns for candidate genes
| GeneID | Fixed aa | Sub/site | Phyl | > | < | Exons |
|---|---|---|---|---|---|---|
iren = A. irenicus; Z_hin = zoophilic A. hinesorum; nSI_hin = A. hinesorum from northern Solomon Archipelago; sSI_hin = A. hinesorum from southern Solomon Archipelago. GeneID = gene orthologue from Anopheles gambiae; Fixed aa = genes showing evidence of fixed amino acid differences between anthropophilic and zoophilic lineages; Sub/site = amino acid substitution (anthropophilic to zoophilic) and its position in the amino acid alignment where a substitution has occurred; Phyl = candidates showing phylogenetic patterns B or C, as shown in Figure 3; > = genes showing higher kA/kS ratios in zoophilic/anthropophilic (Z/A) comparisons relative to anthropophilic/anthropophilic (A/A), based on differences in SE or IQR (>) or SE and IQR (>>); < = genes showing lower kA/kS ratios in zoophilic/anthropophilic (Z/A) comparisons relative to anthropophilic/anthropophilic (A/A), based on differences in SE or IQR (<) or SE and IQR (<<); Exons = genes with evidence of insertions or deletions in coding regions. See also Figures S4–S7.
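The "Fixed aa" and "Sub/site" columns describe alignment positions where anthropophilic and zoophilic samples carry different residues. A minimal sketch of how such sites could be located is given below. It assumes a strict definition (each group carries exactly one residue at the site, and the two residues differ), which may not match the criterion the authors applied; the function name, sample labels, and sequences are all hypothetical.

```python
def fixed_differences(alignment, group_a, group_z, skip_chars="-X?"):
    """List alignment columns with a fixed difference between two groups.

    alignment: dict mapping sample name -> aligned protein sequence.
    group_a, group_z: sample names in each phenotype group.
    Returns [(position, residue_a, residue_z)] with 1-based positions.
    Columns containing a gap or ambiguity character are skipped.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    (n_cols,) = lengths

    sites = []
    for i in range(n_cols):
        res_a = {alignment[name][i] for name in group_a}
        res_z = {alignment[name][i] for name in group_z}
        if (res_a | res_z) & set(skip_chars):
            continue
        # Strict rule (an assumption): each group is monomorphic and they differ.
        if len(res_a) == 1 and len(res_z) == 1 and res_a != res_z:
            sites.append((i + 1, next(iter(res_a)), next(iter(res_z))))
    return sites


# Hypothetical toy alignment, invented for illustration.
toy = {
    "far_A": "MKTLV",
    "hin_A": "MKTLV",
    "hin_Z1": "MRTLI",
    "hin_Z2": "MRTLV",
}
sites = fixed_differences(toy, ["far_A", "hin_A"], ["hin_Z1", "hin_Z2"])
print(sites)  # [(2, 'K', 'R')]
```

Position 5 is not reported in the toy example because the zoophilic group is polymorphic there (I and V), so the difference is not fixed under the strict rule.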
Figure 4. Ir8a – patterns of molecular evolution
(A) Codons in Ir8a with fixed amino acid differences between anthropophilic and zoophilic populations/species. Numbers above the alignment represent the amino acid position in the protein alignment.
(B) Neighbor-joining phylogeny (Jukes-Cantor) for Ir8a. Support values are based on 1000 bootstrap replicates.
(C) Boxplots comparing pairwise kA/kS ratios between same (A vs A) and different (A vs Z) phenotype comparisons for Ir8a, including A vs Z comparisons for each separate zoophilic individual sequenced. A = anthropophilic; Z = zoophilic. The median for each group is shown by a black line, standard errors are shown by colored lines, and interquartile ranges are shown by shaded areas.
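Panel B uses Jukes-Cantor distances as input to neighbor-joining. For readers unfamiliar with the correction, the sketch below computes it for a pair of aligned nucleotide sequences using the standard formula d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. This is an illustration only and not the software used in the study; the function names and the example sequences are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, ignoring columns with non-ACGT symbols."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor corrected distance between two aligned DNA sequences."""
    p = p_distance(seq1, seq2)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at saturation.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical sequences differing at 1 of 10 sites (p = 0.1).
d = jukes_cantor("ACGTACGTAC", "ACGTACGTAA")
print(round(d, 4))
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site that a raw count cannot observe.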
Results from tests of selection performed using HyPhy (Kosakovsky Pond et al., 2020)
| GeneID | aBSREL, p | ω, %sites | BUSTED, p | RELAX, p, LR, K |
|---|---|---|---|---|
| – | Int, 0.022, 5.21, | |||
| 6840, 0.83 | – | |||
| 228, 0.71 | – | |||
| 554, 3.7 | Int, 0.044, 4.07 | |||
| – | – | – | ||
| – | – | – | Int, 0.001, 10.93, | |
| 6350, 0.86 | – | |||
| 82.5, 2.7 | – | – | ||
| – | – | – | Int, 0.034, 4.52 | |
| – | – | Int | ||
| 120, 0.59 | – | |||
| – | – | – | ||
| – | – | – | ||
| – | – | – | Int | |
| – | – | – | ||
| 132, 0.76 | – | |||
| – | – | – | ||
| – | – | – | Int | |
| 5010, 0.26 | Int, 0.018 | |||
| – | – | – | Int | |
| – | – | – | Int | |
| – | – | – | Int | |
| – | – | – | Int, 0.022, 5.28, | |
| – | ||||
| 54.9, 1.1 | – | Int | ||
| – | – | – | ||
| – | – | – | ||
| – | – | – | Int, 0.013, 6.14 | |
| 32, 0.8 | Int sSI_iren |
GeneID = gene orthologue from Anopheles gambiae; ω, %sites = omega values on branches under selection for aBSREL (Smith et al., 2015) analyses and percentage of sites under selection on those branches; BUSTED, p = evidence of selection in BUSTED (Murrell et al., 2015) analysis (Y/-) and associated p value; RELAX, p, LR, K = genes under selection based on RELAX (Wertheim et al., 2015) analyses: Int = evidence of intensifying selection, Rel = evidence of relaxed selection, p = p value associated with test, LR = likelihood ratio, K = K value associated with test.
Indicates tests with p values >0.05 but <0.1.
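The Int/Rel labels and the marginal-significance note can be expressed as a small decision rule. In RELAX, the selection intensity parameter K > 1 indicates intensified selection and K < 1 indicates relaxed selection on the test branches. The sketch below combines that with the thresholds given in the legend (p ≤ 0.05 significant, 0.05 < p < 0.1 marginal). The function name and the example K values are hypothetical; this is not part of HyPhy.

```python
def classify_relax(k, p, alpha=0.05, marginal=0.1):
    """Label a RELAX result as (direction, support).

    direction: "Int" if K > 1, "Rel" if K < 1, otherwise "none".
    support: "significant" (p <= alpha), "marginal" (alpha < p < marginal),
    or "not significant", in which case direction is reported as "none".
    """
    if p <= alpha:
        support = "significant"
    elif p < marginal:
        support = "marginal"
    else:
        return ("none", "not significant")

    if k > 1:
        direction = "Int"
    elif k < 1:
        direction = "Rel"
    else:
        direction = "none"
    return (direction, support)


# Hypothetical K values, invented for illustration.
print(classify_relax(2.5, 0.022))  # ('Int', 'significant')
print(classify_relax(0.4, 0.07))   # ('Rel', 'marginal')
print(classify_relax(2.5, 0.30))   # ('none', 'not significant')
```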
Candidate genes based on combined evidence
| Gene | Evidence | Function | Expression | Study system | Previous ID |
|---|---|---|---|---|---|
| int sel (RELAX); | Detecting terpenes | No Rinker; Y Athrey & Pitts | Expressed antennae downregulated after bloodmeal ( | ||
| + sel (aBSREL hin Z); | Detecting lactic acid | Y Rinker, Pitts, Athrey | Combined – see discussion ( | ||
| aa (hin Z), phyl, indels (hin Z) | Diel cycle | Y Rinker, Pitts, Athrey | Upregulated in dark ( | ||
| – | Y Rinker, Pitts, Athrey | ||||
| + sel (aBSREL hin sSI, BUSTED); | – | Y Rinker, Athrey, Pitts | Expression ( | ||
| int sel (RELAX); | Detecting sulcatone( | Palps, f & m | – | ||
| + sel (BUSTED); indels (all Z) | – | – | – | ||
| + sel (aBSREL hin sSI), BUSTED; indels (hin Z) | – | Y Rinker, Pitts, Athrey | Expression ( | ||
| + sel (aBSREL iren), BUSTED, int (RELAX); phyl | – | Y Rinker, Pitts, Athrey | Male biased expression ( | ||
| + sel (aBSREL iren); | – | Y Rinker, Pitts, Athrey | – | ||
| int sel (RELAX); indels (hin nSI) | – | Y Rinker, Pitts, Athrey | – | ||
| + sel (aBSREL hin sSI); | – | N Rinker, Y Pitts & Athrey | – | ||
| + sel (aBSREL hin nSI, BUSTED); phyl | – | Y Rinker, Pitts, Athrey | – | ||
| int sel (RELAX); phyl | Detecting CO2 | Y Rinker, Pitts, Athrey | Expression ( | ||
| int sel (RELAX); aa (hin Z) | – | N Rinker, Y Athrey | – | ||
| int sel (RELAX); phyl | – | N Rinker, Y Athrey | – | ||
| – | – | ||||
| – | – | Y Rinker, Pitts, Athrey | Expression ( | ||
| – | Detecting ketones and alcohol (larvae) ( | Y Rinker, Pitts, Athrey | – | ||
| Phyl | – | Y Rinker, Pitts, Athrey | Expression ( | ||
| – | – | – | – | ||
| – | – | Antennae | Expression ( | ||
Gene = gene ortholog from Anopheles gambiae; Evidence = evidence for the involvement of the gene in anthropophily; Function/Expression = information on gene function and/or expression (if known); Study system = the study system in which the gene was identified as a candidate; Previous ID = whether the gene has previously been identified as a potential candidate in human blood-feeding behavior (Yes/No).
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Illumina TruSeq DNA Nano kit | Illumina | 20015964 |
| AfarF2.7 | ||
| Data and code used in this paper | ( | n/a |
| FastQC | Babraham Bioinformatics | |
| Geneious | ||
| BWA mem v0.7 | ||
| CLUSTALW | ||
| FreeBayes | ||
| VCFtools | ||
| tBLASTn | ||
| GeneWise | ||
| R | ||
| Seqinr (R package) | ||
| HyPhy package | ||
| DataMonkey Webserver | ||
| aBSREL | ||
| BUSTED | ||
| RELAX | ||