Literature DB >> 35120121

Population genomics and geographic dispersal in Chagas disease vectors: Landscape drivers and evidence of possible adaptation to the domestic setting.

Luis E Hernandez-Castro1,2, Anita G Villacís3, Arne Jacobs1,4, Bachar Cheaib1, Casey C Day5, Sofía Ocaña-Mayorga3, Cesar A Yumiseva3, Antonella Bacigalupo1, Björn Andersson6, Louise Matthews1, Erin L Landguth5,7, Jaime A Costales3, Martin S Llewellyn1, Mario J Grijalva3,8.   

Abstract

Accurate prediction of vectors dispersal, as well as identification of adaptations that allow blood-feeding vectors to thrive in built environments, are a basis for effective disease control. Here we adopted a landscape genomics approach to assay gene flow, possible local adaptation, and drivers of population structure in Rhodnius ecuadoriensis, an important vector of Chagas disease. We used a reduced-representation sequencing technique (2b-RADseq) to obtain 2,552 SNP markers across 272 R. ecuadoriensis samples from 25 collection sites in southern Ecuador. Evidence of high and directional gene flow between seven wild and domestic population pairs across our study site indicates insecticide-based control will be hindered by repeated re-infestation of houses from the forest. Preliminary genome scans across multiple population pairs revealed shared outlier loci potentially consistent with local adaptation to the domestic setting, which we mapped to genes involved with embryogenesis and saliva production. Landscape genomic models showed elevation is a key barrier to R. ecuadoriensis dispersal. Together our results shed early light on the genomic adaptation in triatomine vectors and facilitate vector control by predicting that spatially-targeted, proactive interventions would be more efficacious than current, reactive approaches.

Entities:  

Mesh:

Year:  2022        PMID: 35120121      PMCID: PMC8849464          DOI: 10.1371/journal.pgen.1010019

Source DB:  PubMed          Journal:  PLoS Genet        ISSN: 1553-7390            Impact factor:   5.917


Introduction

The process by which insect vectors of human diseases adapt to survive and breed in human habitats is fundamental to the emergence and spread of vector-borne diseases (e.g., Aedes aegypti [1]). Relatively modest changes in vector host preference between ancestral (wild) and derived (domesticated) forms can drive devastating epidemics that result in millions of deaths [2]. Host preference variability in Culex pipiens of hybrid ancestry is thought to be genetically based and has contributed to local West Nile virus outbreaks in North America [3,4]. Similarly, host choice behaviour in Malaria mosquito Anopheles arabiensis has been linked to the allelic variation of a 3Ra chromosomal inversion [5]. Understanding the evolution and genetic bases of traits associated to the domestic habitat in disease vectors is, therefore, paramount and could inform control efforts and reveal the epidemic potential for new vector species [6,7]. Triatominae (Hemiptera: Reduviidae) are a group of hematophagous arthropods that transmit Trypanosoma cruzi, the parasite that causes Chagas disease, a fatal parasitic infection afflicting more than seven million people in Latin America [8]. Approximately 20 species are of public health concern due to their involvement in T. cruzi domestic transmission [9]. Eradication of ‘domesticated’ triatomines through insecticide spraying has been the mainstay of disease control in the past (e.g., Triatoma infestans [10], Rhodnius prolixus and Triatoma dimidiata [11]). However, wild (e.g., T. infestans [12] and R. prolixus [13]) and/or secondary competent species of triatomines (e.g., Triatoma sordida [14], Triatoma maculata and Rhodnius pallescens [15], Panstrongylus howardi [16] and P. chinai [17]) can continuously occupy empty domestic niches. Except from a few species that intrude houses seasonally (e.g., Triatoma dimidiata and other species in the Amazon basin [18,19]), constant triatomine house colonisation has historically jeopardised Chagas disease control strategies. Colonisation of the domestic niche may involve multiple, independent evolutionary processes across the geographic distribution of a given vector species [20,21], analogous to parallel trophic speciation observed in other arthropods [22]. Alternatively, domestication (hereafter, refers to the long-term evolutionary sense) of vectors with their associated zoonotic parasites may result from a single or limited number of independent colonisation events, followed by rapid and widespread dispersal within the domestic setting [23,24]. Domestication, and selection for domestic traits (e.g., pathogen resistance or efficient pollitators), in a given species may also represent a combination of these two scenarios, where multiple domesticated lineages serially introgress with wild lineages over evolutionary time, as has been elegantly demonstrated through analysis of the genomes of the Scutellata-European hybrid honey bees in America [25,26]. Disentangling these different scenarios in triatomine species, and their important implications for disease control, has been challenging due to a lack of genomic resources for these organisms which are only recently becoming available [27-29]. With adequate genomic tools; however, patterns of colonisation of the domestic niche can be established, and their underlying mechanisms unveiled. Models of ‘adaptation with gene flow’ (e.g., [30]) exploit standard population genetic metrics and theory to make generalisations about the genomic basis of adaptations (e.g., [22]). Such models can be deployed to study disease vector colonisation and reveal fundamental traits associated with the domestic niche. The genetic changes that allowed triatomines to thrive in the domestic niche may be related to feeding, reproduction and developmental performance. For instance, the development of potent saliva compounds that alter vertebrate host homeostatic, anti-inflamatory and immune responses was a crucial adaptation in triatomines for successful blood intake, and therefore, survival [31,32]. Saliva composition variation between domestic and wild populations has not been shown, yet saliva composition does play a role in highly ‘domesticated’ triatomines (e.g., R. prolixus and T. infestans) with exceptional feeding performace in humans [33,34]. Morphological changes such as reduced sexual dimorphism and body size have also been associated with the domestic habitat [35]. Egg development and viability are driven by neurohormonal signaling pathways starting soon after a female feeds on blood which results in yolk formation and supports embryonic development [36]. Under laboratory conditions, embryonic development of eggs collected inside houses was faster than those from the peridomicile [37]. Morphometric studies have attempted to develop phenotypic markers in triatomines associated with domestic or wild ecotopes with little (e.g., [38]) to moderate (e.g., [39]) success. Therefore, association of triatomine with the domestic niche is currently a qualitative concept with urgent need for quantitative foundations [40]. Identification of the ecological factors driving triatomine dispersal, with subsequent colonisation of a given niche, is necessary to predict complex triatomine population dynamics. High localised genetic structuring is expected in triatomine populations given their poor flying capabilities (< 2 Km), nymphs can only crawl short distances, and long-distance dispersal may sporadically occur via attachment to human cloths/bird feathers [41-43]. Models based on presence-only data have shown altitude, temperature, humidity, precipitation and vegetation as importat variables for triatomine distribution [44-46]. These models, however, represent broad spatial distribution rather than detailed local vector population dynamics and their accuracy requires extensive entomological records [47-49]. Instead, a landscape genomics framework (Fig 1) can accurately define landscape functional connectivity (the level at which the landscape heterogeneity facilitates or impedes a given organism’s movement from, and to, different habitat patches [50]) and shed light on the drivers of dispersal in a given vector species, and even assist in identifying poorly connected or isolated areas that can be easily targeted by eradication interventions [51-53]. Elevation may be a factor limiting R. ecuadoriensis dispersal given it limits the presence of other triatomine species [46]. Habitat fragmentation and human agricultural activities have shown to have an effect on triatomine population dynamics [54]. Human-mediated passive triatomine dispersal has been suggested elsewhere [11,41-43], and therefore, we assume roads might connect triatomine popuations (Fig 1D).
Fig 1

Step-by-step walk-through of the landscape genomics mixed modelling framework used to study the Chagas disease arthropod vector, Rhodnius ecuadoriensis.

A, First, a research question is defined based on whether gene flow or adaptation processes are to be investigated and sampling design is established. B, In the field, triatomines are collected in different ecotopes in the spatial and temporal gradients defined in A. Different variables are recorded at this stage such as altitude and geographic coordinates. C, In the laboratory, triatomine next generation sequencing (NGS) libraries are prepared and sequenced in high-throughput platforms. NGS data is processed with bioinformatic tools, and each sample genotype information is used to obtain a matrix of pairwise populations (Pop) genetic distances. D, A hypothetical landscape model (1) is parametrised into a resistance surface (2) which is a spatial representation of a given species movement constraints at each grid cell on a digital layer. From this resistance surface, a matrix of pairwise population (Pop) effective distances is calculated (3). E, Finally, statistical methods are used to correlate pairwise population genetic and effective distance matrices to investigate whether isolation-by-resistance (landscape functional connectivity) is a fitted model of the genetic differentiation of triatomine populations. Source maps: www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-global-multi-resolution-terrain-elevation, www.usgs.gov/media/images/south-america-land-cover-characteristics-data-base-version-20 and dataportaal.pbl.nl/downloads/GRIP4/GRIP4_Region2_vector_shp.zip.

Step-by-step walk-through of the landscape genomics mixed modelling framework used to study the Chagas disease arthropod vector, Rhodnius ecuadoriensis.

A, First, a research question is defined based on whether gene flow or adaptation processes are to be investigated and sampling design is established. B, In the field, triatomines are collected in different ecotopes in the spatial and temporal gradients defined in A. Different variables are recorded at this stage such as altitude and geographic coordinates. C, In the laboratory, triatomine next generation sequencing (NGS) libraries are prepared and sequenced in high-throughput platforms. NGS data is processed with bioinformatic tools, and each sample genotype information is used to obtain a matrix of pairwise populations (Pop) genetic distances. D, A hypothetical landscape model (1) is parametrised into a resistance surface (2) which is a spatial representation of a given species movement constraints at each grid cell on a digital layer. From this resistance surface, a matrix of pairwise population (Pop) effective distances is calculated (3). E, Finally, statistical methods are used to correlate pairwise population genetic and effective distance matrices to investigate whether isolation-by-resistance (landscape functional connectivity) is a fitted model of the genetic differentiation of triatomine populations. Source maps: www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-global-multi-resolution-terrain-elevation, www.usgs.gov/media/images/south-america-land-cover-characteristics-data-base-version-20 and dataportaal.pbl.nl/downloads/GRIP4/GRIP4_Region2_vector_shp.zip. Rhodnius ecuadoriensis is the major vector for Chagas disease in Ecuador and Northern Peru [55]. Both domestic and wild populations of this species exist throughout its range [56]. Preliminary morphological and genetic evidence suggests some gene flow of R. ecuadoriensis between domestic and wild ecotopes [57,58]. By comparison, genetic studies of T. cruzi infecting the same vectors in Ecuador have shown strong to moderate differentiation between wild and domestic isolates [59,60]. As such there is a lack of a clear understanding of the micro and macro-evolutionary and ecological forces shaping R. ecuadoriensis domestic adaptation and dispersal capabilities, and those of the parasites they transmit. Our study represents an attempt to evidence gene flow from wild to domestic ecotopes in R. ecuadoriensis in Ecuador, a preliminary survey of any potential genomic signatures of adapation to the domestic niche in triatomines, as well as an assessment of the landscape drivers of vector dispersal. We used a reduced-representation sequencing approach (2b- RADseq) to recover genome-wide SNP variation in 272 Rhodnius ecuadoriensis individuals collected across ecological gradients in Loja, Ecuador. We confirmed R. ecuadoriensis do frequently invade houses from the forest in southern Ecuador. Significantly elevated allelic richness in wild sites by comparison to nearby domestic foci clearly confirmed that dispersal occurred most frequently from wild ecotopes into domestic structures. Genome scans across multiple parallel colonisation events revealed possible evidence of ‘adaptation with geneflow’, with key outlier loci associated with colonisation of built domestic structures and, presumably, human blood feeding. Several outlier loci were mapped to the annotated regions of the R. prolixus genome. A strong signature of isolation-by-distance (IBD) was observable throughout the dataset, an effect less pronounced between domestic sites than between wild foci. Formal landscape genomic analyses revealed elevation surface as the major barrier to genetic connectivity between populations. Landscape genomic analysis enabled a spatial model of vector connectivity to be elaborated, informing ongoing control efforts in the region and providing a model for mapping the dispersal potential of triatomines and other disease vectors. Our findings suggest frequent and spatially targeted interventions, to cope with high gene flow and fragmented populations, are necessary to suppress Chagas disease transmission in Loja. Moreover, the discovery of signatures of possible local adaptation shed the first light on the genomic basis of domestication in triatomines.

Results

Recovery of SNP markers from 272 Rhodnius ecuadoriensis SNP specimens

Our CspCI-based 2b-RAD protocol was successful in obtaining genome-wide SNP information for R. ecuadoriensis. Sequencing of non-target species was minimal (0.2%). We genotyped six Rhodnius prolixus as controls and 80% of reads mapped to the R. prolixus reference genome. Only 9.5% of R. ecuadoriensis reads mapped to the same reference, a consequence of genomic sequence divergence between R. ecuadoriensis and R. prolixus [61] (S1 Methods). A stringent genotyping approach confidently identified 2,552 SNP markers across 272 R. ecuadoriensis samples from 25 collection sites, which represented closely administrative boundaries of human communities. In seven collection sites (Fig 2A; CG, BR, CE, CQ, HY, SJ and GL- seven pairs) triatomines from both domestic and wild ecotopes were collected. Remaining sites only had individuals of one ecotope (domestic or wild; S1 Table).
Fig 2

Genomic differentiation of domestic and wild R. ecuadoriensis.

A, geographic distribution of the seven collection sites with both ecotopes over an elevation surface map of Loja. B, Neighbor-Joining midpoint phylogenetic tree with phylogenies indicating the Euclidean distance between triatomine samples built from allele counts. Tree branches clades are colour-coded to differentiate geographic collection sites (or clusters of collection sites) including some apparent migrants (black asterisks). Branch tip labels are coloured to indicate ecotype (domestic—blue / wild—green). C, the scatter plot shows five clusters are built with the first and third principal components of the discriminant analysis eigenvalues. D, pairwise FST comparisons between domestic (blue box) and wild (green box) R. ecuadoriensis in multiple sites across Loja (A). Significant FST values (arrows) after FDR correction are highlighted in bold and an asterisk. In all panels, samples location (dots) and labels are colour-coded to indicate their domestic (blue) or wild (green) collection ecotope. Collection sites abbrevations: SJ, San Jacinto; HY, EL Huayco; GL, Galapagos; CQ, Chaquizhca; CE, Coamine; BR, Bramaderos; CG, La Cienega (see S1 Table for full collection sites list). Source map: www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-global-multi-resolution-terrain-elevation.

Genomic differentiation of domestic and wild R. ecuadoriensis.

A, geographic distribution of the seven collection sites with both ecotopes over an elevation surface map of Loja. B, Neighbor-Joining midpoint phylogenetic tree with phylogenies indicating the Euclidean distance between triatomine samples built from allele counts. Tree branches clades are colour-coded to differentiate geographic collection sites (or clusters of collection sites) including some apparent migrants (black asterisks). Branch tip labels are coloured to indicate ecotype (domestic—blue / wild—green). C, the scatter plot shows five clusters are built with the first and third principal components of the discriminant analysis eigenvalues. D, pairwise FST comparisons between domestic (blue box) and wild (green box) R. ecuadoriensis in multiple sites across Loja (A). Significant FST values (arrows) after FDR correction are highlighted in bold and an asterisk. In all panels, samples location (dots) and labels are colour-coded to indicate their domestic (blue) or wild (green) collection ecotope. Collection sites abbrevations: SJ, San Jacinto; HY, EL Huayco; GL, Galapagos; CQ, Chaquizhca; CE, Coamine; BR, Bramaderos; CG, La Cienega (see S1 Table for full collection sites list). Source map: www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-global-multi-resolution-terrain-elevation.

Reduced R. ecuadoriensis population genetic diversity in domestic ecotopes

Multiple genetic diversity estimates among populations from the 25 collection sites in Loja province were calculated (obsvered heterozygosity (HO), gene diversity (HE), inbreeding coefficient (FIS) and allelic richness (Ar)). Diversity estimates ranged from 0.11 to 0.23, from 0.09 to 0.22, and from -0.24 to 0.11 for HO, HE and FIS, respectively. Sample-size corrected Ar values ranged from 1.19 to 1.44 with the lowest values in La Extensa (EX), San Jacinto (SJ), El Huayco (HY) and Santa Rita (RT). In the paired ecotopes within the seven collection sites, Ar values were significantly higher for wild than domestic triatomine populations and in five out of seven instances (p<0.05, rarefaction method [62]; S2 Table).

Genomic differentiation between domestic and wild ecotopes

To assay population dynamics between sympatric domestic and wild foci, we focused our individual-based genomic differentiation and pairwise FST comparisons analyses on the seven collection sites for which samples from both ecotopes were available (Fig 2A). Supporting frequent migration between domestic and wild ecotopes, samples from each ecotope were interleaved at most collection sites in the phylogenetic tree, with collection site geography, not ecotope, impacting the tree topology (Fig 2B). As such, samples collected in Galapagos (GL), Coamine (CE) and Chaquizhca (CQ) formed distinct clusters, and El Huayco (HY)—San Jacinto (SJ) and Bramaderos (BR)—La Cienega (CG) also grouped discretely. Five broadly congruent clusters were defined in a discriminant analysis of principal components (DAPC) (Fig 2C), with geographic collection site rather than ecotope (silvatic vs domestic) again structuring observed diversity. FST indices between paired domestic and wild triatomine samples within each of the seven compared collection sites indicate little differentiation (e.g., FST ≤ 0.10). Permutation tests indicated that FST was significant (p < 0.05) at only two sites—Bramaderos and El Huayco (Fig 2D). As expected, hierarchical analysis of molecular variance revealed genetic subdivision was significantly stronger (Fcollection sites/total = 0.26, p-value < 0.001) among collection sites than among ecotopes within collection sites (Fecotope/collection site = - 0.004, p-value < 0.001) or among collection year within communities (Fcollection year/collection site = 0.06, p-value < 0.001) (S3 Table).

Genetic loci correlated with domestic colonisation

To identify loci among our markers associated with domestic colonisation, we combined a Random Forest (RF) classification approach and redundancy analyses (RDA) with outlier scans (see Methods). We included the seven collection sites with frequent domestic-wild migration and three additional wild-only sites to roughly conform similar number of domestic (n = 56) and wild (n = 52) samples. A total of 347 SNPs provided high ranked classification accuracy (mean > 3) across the three RF iterations (inset in Fig 3A). Backwards purging on this highly discriminatory subset of SNPs detected a set of 43 SNPs that minimised the ‘Out-of-bag’ error rate (OOB-ER) to 0.09 and maximised the discriminatory power among domestic and wild samples (Fig 3A). In a parallel RDA model, ecotope (domestic / wild) was a predictor explaining approximately 0.4% of the total variation and the constrained axis built from that variation was significant (p-value < 0.001), and so was the full model as indicated by the Monte Carlo permutation test. The distribution of each SNP loading/contribution to the RDA significant axis showed 109 candidate outlier loci as SNPs loadings at ±2 SD from the mean of this distribution (permissive threshold; Fig 3B). In a more conservative approach, we also identified seven loci from those 109 under very strong selection as represented by those SNPs loading at the extreme ±3 SD (conservative threshold) away from the mean distribution of the constrained axis (Fig 3B). The arrangement of the individual samples in the ordination space with relation to the RDA axis showed a clear pattern of subdivision comparable to the ecotope in which samples were collected (Fig 3C). The 21 loci/SNPs identified as outlier loci (dark dots in Fig 3B) by RDA were also detected as highly discriminatory SNPs for domestic and wild ecotopes in the RF analysis. Assuming ‘adaptation with geneflow’ we assessed locus-specific estimates of FST (Fig 3D), among the 2552 SNPs between domestic and wild ecotopes and identified one SNP (Locus ID 15732 –purple diamond in Fig 3B and 3D) possibly under local adaptation and/or spatial heterogeneous selection as suggested by OutFlank analysis (Fig 3D left). Moreover, outlier scan with fsthet (Fig 3D right) in the same subset flagged this OutFlank SNP and 73 additional SNPs showing FST higher that the average neutral loci distribution at a 5% threshold. In summary, 43 SNPs were identified with the highest classification accuracy in RF analysis. 21 of those SNPs showed some signal of selection (that is, loaded ± 2 SD away from mean distribution of the constrained axis) and four were identified showing strong signal of selection (that is, loaded ± 3 SD away from the mean distribution of the constrained axis) in RDA analysis. Three of the SNPs flagged as outliers in fsthet analysis were found also being at high classification accuracy in RF analysis. The SNP (Locus ID 15732) possibly under strong selection as identified by OutFlank analysis, also had a high classification accuracy in RF and, interestingly, it was also identified within the RDA and fsthet SNPs sets as under a strong signal of selection.
Fig 3

Scanning outlier SNP markers for signatures of local adaptation in Rhodnius ecuadoriensis.

A, Random Forest backwards purging shows subsets with decreasing number of highly discriminatory SNPs and their resulting OOB-ER. The two vertical red lines indicated the 43 SNPs subset with the lowest OOB-ER and maximum discriminatory power between domestic and wild ecotopes. The inset shows SNPs ranked based on their classification accuracy averaged after 3-independent RF runs. SNPs with classification accuracy above three (red horizontal line) were used for the backwards purging. B, In our RDA model, SNPs (dots and diamonds) are arranged as a function of their relationship with the constrained predictor (RDA 1), ecotope (arrow outlines towards a wild ecotope relationship). SNPs closer to the centre (small grey dots) are not showing relation with the predictor. Outlier loci/SNPs are represented by those large dots/diamond loading at ± 2 SD and ± 3 SD separated from the mean SNPs loading distribution and showing a strong relationship with ecotype. Black large dots (and purple diamond) represent loci/SNP identified with high classification power in RF analysis. C, a biplot of R. ecuadoriensis triatomine samples and SNPs (small black dots in the centre) are arranged in relation to the constrained RDA axis with an arrow indicating those related to the wild ecotope. Dots are colour-coded to show sample ecotope of collection, domestic (blue) or wild (green). Biplot scaling is symmetrical with inset showing the density function for the RDA axis. D, Scatter plots show OutFlank (left) and fsthet (right) SNPs FST-heterozygosity relationship. 43 SNPs (large dots) had higher than average FST distribution of neutral loci in fsthet, whereas only one in OutFlank. Purple diamond indicated the SNP (ID 15732) flagged in all four analyses.

Scanning outlier SNP markers for signatures of local adaptation in Rhodnius ecuadoriensis.

A, Random Forest backwards purging shows subsets with decreasing number of highly discriminatory SNPs and their resulting OOB-ER. The two vertical red lines indicated the 43 SNPs subset with the lowest OOB-ER and maximum discriminatory power between domestic and wild ecotopes. The inset shows SNPs ranked based on their classification accuracy averaged after 3-independent RF runs. SNPs with classification accuracy above three (red horizontal line) were used for the backwards purging. B, In our RDA model, SNPs (dots and diamonds) are arranged as a function of their relationship with the constrained predictor (RDA 1), ecotope (arrow outlines towards a wild ecotope relationship). SNPs closer to the centre (small grey dots) are not showing relation with the predictor. Outlier loci/SNPs are represented by those large dots/diamond loading at ± 2 SD and ± 3 SD separated from the mean SNPs loading distribution and showing a strong relationship with ecotype. Black large dots (and purple diamond) represent loci/SNP identified with high classification power in RF analysis. C, a biplot of R. ecuadoriensis triatomine samples and SNPs (small black dots in the centre) are arranged in relation to the constrained RDA axis with an arrow indicating those related to the wild ecotope. Dots are colour-coded to show sample ecotope of collection, domestic (blue) or wild (green). Biplot scaling is symmetrical with inset showing the density function for the RDA axis. D, Scatter plots show OutFlank (left) and fsthet (right) SNPs FST-heterozygosity relationship. 43 SNPs (large dots) had higher than average FST distribution of neutral loci in fsthet, whereas only one in OutFlank. Purple diamond indicated the SNP (ID 15732) flagged in all four analyses.

Mapping outlier loci to the Rhodnius prolixus genome

Several outlier SNPs from the different analyses mapped to annotated regions of the R. prolixus genome. One SNP identified in the RDA analysis mapped (97.1% identity) in a R. prolixus genome region containing the characterised Krüppel gap gene (Accession No JN092576.1) involved in arthropod embryonic development [63]. Three outlier SNPs identified in fsthet analysis mapped (100% identity) to regions in the R. prolixus genome containing characterised GE-rich and polylysine protein precursors (mRNA—Accession AY340265.1), and the Krüppel and giant gap genes [63,64] (Accession No HQ853222.1). The former are important proteins within the sialome of blood-sucking bugs [65] and the latter involved in arthropod embryonic development [64]. Mapping of the majority of putatively outlier SNPs, including Locus ID 15732, was not possible in the absence of an available R. ecuadoriensis genome.

Comparison of dispersal rates of R. ecuadoriensis between domestic sites with dispersal rates between wild sites

Including all samples (n = 272) and collection sites (n = 25), we tested the strength of genetic isolation-by-distance (IBD) initially among domestic sample collection sites and latterly among wild collection sites (Fig 4). Mantel tests in both domestic (rm = 0.46, p-value < 0.001) and wild (rm = 0.31, p-value = 0.043) ecotopes strongly supported an effect of geographic distance on genetic distance (Fig 4A). Based on a generalised least square model with maximum likelihood population effects parametrisation (GLS-MLPE), the effect of geographic distance was significantly stronger (0.0018, p-value < 0.001) in wild compared to domestic foci (Fig 4A), suggesting that the rate of vector dispersal occurred at a higher rate between domestic populations than between wild ones (S4 Table).
Fig 4

Dispersal rate in R. ecuadoriensis.

A, correlation between pairwise genetic (FST) and geographic distances (data points) with fitted regression lines (95% CI) for domestic (blue dots) and wild (green diamonds) ecotopes. Fitted GLS-MLPE model in Eq 1. B, geographic distribution of the 25 collection sites across Loja province (elevation map) used for estimating R. ecuadoriensis gene flow with geographic distance. Collection sites ID labels: EX, La Extensa; SJ, San Jacinto; HY, EL Huayco; RT, Santa Rita; NJ, Naranjillo; GL, Galapagos; SS, Santa Rosa; TR, Tuburo; YS, Camayos; NT, San Antonio de Taparuca; AZ, Ardanza; GA, Guara; CQ, Chaquizhca; BM, Bella Maria; CE, Coamine; VC, Vega del Carmen; TM, Tamarindo; HG, Higida; ND, Naranjo Dulce; TC, Tacoranga; AH, Ashimingo; LM, Limones; BR, Bramaderos; CG, La Cienega; SF, San Francisco (SF). Source map: www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-global-multi-resolution-terrain-elevation.

Dispersal rate in R. ecuadoriensis.

A, correlation between pairwise genetic (FST) and geographic distances (data points) with fitted regression lines (95% CI) for domestic (blue dots) and wild (green diamonds) ecotopes. Fitted GLS-MLPE model in Eq 1. B, geographic distribution of the 25 collection sites across Loja province (elevation map) used for estimating R. ecuadoriensis gene flow with geographic distance. Collection sites ID labels: EX, La Extensa; SJ, San Jacinto; HY, EL Huayco; RT, Santa Rita; NJ, Naranjillo; GL, Galapagos; SS, Santa Rosa; TR, Tuburo; YS, Camayos; NT, San Antonio de Taparuca; AZ, Ardanza; GA, Guara; CQ, Chaquizhca; BM, Bella Maria; CE, Coamine; VC, Vega del Carmen; TM, Tamarindo; HG, Higida; ND, Naranjo Dulce; TC, Tacoranga; AH, Ashimingo; LM, Limones; BR, Bramaderos; CG, La Cienega; SF, San Francisco (SF). Source map: www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-global-multi-resolution-terrain-elevation.

Landscape functional connectivity in R. ecuadoriensis

Landscape genomic mixed modellling aims to identify the effect of different combinations of landscape surfaces and their parameters on a given genomic differentiation pattern (see Methods). R. ecuadoriensis genomic differentiation was closely partinioned by collection sites (Fig 5A) which was evidenced through hierarchical (S3 Table), phylogenetic, DAPC (Fig 2) and Admixture analyses (S1 and S2 Figs). To obtain an accurate representation of the genomic differentiation pattern among R. ecuadoriensis populations, we chose Hedrick’s GST pairwise comparisons (Fig 5B) which corrects for sampling limited number of populations [66]. The genomic pattern was consistent regardless of metric used (e.g., Pairwise FST [67] and Meirman’s standardised FST [68]) as revealed by strong and significant (r2 = 0.99 & 0.92, respectively; p < 0.001) Pearson’s correlations between them. Pairwise Hedrick’s GST comparisons (Fig 5B) showed a strong pattern of population structure across Loja province with presence of both high and low genetic differentiation among collection sites (Fig 5A and 5B). San Francisco (SF) and San Antonio (NT) were two examples of clear, and mutually distinct, outliers in genetic terms. Santa Rita (RT), El Huayco (HY), San Jacinto (SJ) and La Extensa (EX) were genetically and geographically close but highly differentiated form the rest. Overall, some clusters of collection sites were evident as well as instances differentiation within and among clusters (S5 Table).
Fig 5

Landscape connectivity of Rhodnius ecuadoriensis in Loja province, Ecuador.

A, Elevation map of the geographic location of collection sites across Loja. B, Heatmap shows pairwise genetic distances (GST) with collection sites ID labels on the right. Clusters and highly differentiated collection sites are circled in A. Grey scale indicate genetic distance with lighter colours showing higher differentiation. C, Electrical current map of Loja built from the optimised elevation surface model showing a gradient of high (yellow/light shade), medium (light greens) and low (blue/dark shade) functional connectivity across Loja. Clusters of highly connected sites are evident but isolated sites are also present across regions in Loja. Connectivity within and among clusters and collection sites is highly influenced by the landscape, specifically elevation surface. Source map: www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-global-multi-resolution-terrain-elevation.

Landscape connectivity of Rhodnius ecuadoriensis in Loja province, Ecuador.

A, Elevation map of the geographic location of collection sites across Loja. B, Heatmap shows pairwise genetic distances (GST) with collection sites ID labels on the right. Clusters and highly differentiated collection sites are circled in A. Grey scale indicate genetic distance with lighter colours showing higher differentiation. C, Electrical current map of Loja built from the optimised elevation surface model showing a gradient of high (yellow/light shade), medium (light greens) and low (blue/dark shade) functional connectivity across Loja. Clusters of highly connected sites are evident but isolated sites are also present across regions in Loja. Connectivity within and among clusters and collection sites is highly influenced by the landscape, specifically elevation surface. Source map: www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-global-multi-resolution-terrain-elevation. The pattern of population genomic differentiation was iteratively regressed with different combinations of landscape variables and parameters using the ResistanceGA [69] optimisation framework (see Methods). The optimisation process involves estimating unbiased resistance values for a given combination of surfaces and selecting the best (true) model representing the genomic pattern. To rule out collinearity between landscape variables, we calculated Spearman’s correlation coefficient, rho, between all pairs of surfaces which resulted in small and/or negative (rho < 0.29) correlations (S6 Table). Similarly, a scatterplot matrix did not show highly correlated surfaces (S3 Fig). Our three ResistanceGA optimisation replicates (see Methods) showed comparable results. In all replicates, the single elevation surface showed the lowest AICc values and the highest AICc weight compared to the other single and composite optimised surfaces (Table 1 is a replicate example). Delta AICc shows the AICc difference between the elevation surface (best model) and the rest of the (combination of) surfaces. A difference of ~2.26 units between elevation surface and a distance-only model was evident which suggests elevation surface is a better predictor than geographic distance, although geographic distance remains a strong predictor. Optimisation of the elevation surface parameters confirmed that gene flow resistance increases with altitude up to the highest resistance at approximately 2,400 m.a.s.l. (S4 Fig).
Table 1

Model selection results for the generalised mixed-effects models optimised on genetic distance (Hedrick’s GST) for R. ecuadoriensis.

The most strongly supported resistance surface model is presented first. For each resistance surface model, number of parameters plus the intercept (k), additional parameters corrected Akaike information criterion (AICc), delta AICc and AICc weight (ω) are provided.

Resistance surface modelType k AICcDelta AICcω
Elevation single4-749.5100.76
Distance single2-747.252.260.24
Roads single6-736.2513.260.0010
Elevation + Roads composite9-729.5519.963.49e-05
Land Single12-720.2629.253.35e-07
Elevation + Land cover composite15-687.8261.703.03e-14
Land cover + Roads composite17-648.63100.889.37e-23
Null model single1-565.08184.436.75e-41
Elevation + Land cover + Roads composite20-520.44229.081.36e-50

Model selection results for the generalised mixed-effects models optimised on genetic distance (Hedrick’s GST) for R. ecuadoriensis.

The most strongly supported resistance surface model is presented first. For each resistance surface model, number of parameters plus the intercept (k), additional parameters corrected Akaike information criterion (AICc), delta AICc and AICc weight (ω) are provided. To evaluate the roboustness of our optimisation procedure and test the effect of uneven distribution of sample sites, we ran a bootstrap analysis with resampling of the sites at each iteration. Interestingly, the bootstrap analysis revealed that, when resampling 85% of the collection sites, the optimised elevation surface model was ranked the top model in only 43.2% of the bootstrap iterations compared to 46% of the times in which a distance-only model was better (Table 2). The fact that elevation surface was slightly less supported in the bootstrap analysis is likely due to the irregular distribution of sites across the study area and altitudes [70].
Table 2

Summary of bootstrap analysis.

For each resistance surface model, number of parameters plus the intercept (k), and average (Avg) additional parameters corrected Akaike information criterion (AICc), AICc weight (ω), rank, and frequency the model was top ranked are provided.

Resistance surface model k Avg AICAvg AICCωAvg rankTop model (%)
Elevation 4-535.59-533.090.401.6243.2
Distance 2-531.44-530.780.602.3346
Land cover 12-526.59-487.594.17e-053.9110.8
Roads 6-523.81-517.810.00084.180
Elevation + Roads 9-525.76-509.402.80e-064.400
Elevation + Land cover 15-521.94-425.941.45e-215.290
Land cover + Roads 17-516.51-312.513.97e-466.590
Elevation + Land cover + Roads 20-511.14328.861.11e-1857.680

Summary of bootstrap analysis.

For each resistance surface model, number of parameters plus the intercept (k), and average (Avg) additional parameters corrected Akaike information criterion (AICc), AICc weight (ω), rank, and frequency the model was top ranked are provided. To assist with the identification of vector management zones for regional health authorities, an electrical current map was built by applying a circuit theory algorithm [71,72] on the optimised elevation surface model (Fig 5C). Specifically, the algorithm simulates the passing of an electric current across grids (zones) with low/high optimised resistance values. Low resistance grids are highlighted as high current intensity zones (yellow/light zones in Fig 5C) in which high population connectivity, and therefore high degree of gene flow, is predicted. The map showed different gradients of connectivity within and among western, central, eastern and southern Loja province. These included individually isolated populations (e.g. SF & CG), isolated clusters (e.g EX; SJ; HY; RT; NJ); as well as well-connected hubs (e.g., BR-LM, AH-TM-ND, HG-TC and CE-VC).

Discussion

In this study we make several core observations: R. ecuadoriensis do invade houses from wild populations, R. ecuadoriensis loci associated with the domestic niche can be identified within our limited marker set and mapped to annotated triatomine genomic regions, and the landscape drivers of vector dispersal can be identified. Consistent with frequent house invasion, high levels of gene flow between multiple domestic and wild R. ecuadoriensis populations were detected by hierarchical analysis. Low and largely non-significant pairwise FST values, as well as interleaved sample clustering based on phylogenetic and discriminant analyses were also consistent with house invasion. Significantly elevated allelic richness in wild sites by comparison to nearby domestic foci clearly confirmed that dispersal occurred most frequently from wild ecotopes into domestic structures. Genome scans across these parallel events of colonisation to the domestic niche revealed possible evidence of ‘adaptation with geneflow’, with key outlier loci associated with colonisation of built domestic structures and, presumably, human blood feeding—several of which mapped to the R. prolixus genome. A strong signature of isolation-by-distance (IBD) was observable throughout the dataset, an effect less pronounced between domestic sites than between wild foci. Formal landscape genomic analyses revealed elevation surface as the major barrier to genetic connectivity between populations. Landscape genomic analysis enabled a spatial model of vector connectivity to be elaborated, informing ongoing control efforts in the region and providing a model for mapping the dispersal potential of triatomines and other disease vectors. Vector control is the mainstay of Chagas Disease control [11]. Widespread wild reservoir hosts, as well as a lack of safe treatment options [73,74] and associated healthcare infrastructure, mean that transmission cannot be blocked by reducing parasite prevalence in human and animal hosts [75]. Our data indicate that elimination of domesticated R. ecuadoriensis in Ecuador will be frustrated by repeated re-invasion from the wild environment. Similar risks to effective control are posed by wild T. infestans in the southern cone region [12], R. prolixus in Los Llanos of Colombia and Venezuela [13] and potentially elsewhere in Latin America where competent vectors are present in the wild environment and nearby domestic locales (e.g., T. sordida, T. maculata, R. pallescens and others [14,15]). Understanding evolutionary processes that underpin the colonisation of the domestic environment by arthropod vectors, and their specialisation to feeding on humans, is required to characterize their vectorial capacity. Hybrid ancestry in Culex pipiens, for example, is thought to contribute to the biting preference for humans [3]. Human feeding preference can be rapidly genetically selected for in Anopheles gambiae [76]. Specialisation of Aedes aegypti on humans, and resultant global outbreaks of dengue, yellow fever, and Chikungunya viruses, may be traceable to SNPs associated with the emergence of differential ligand-sensitivity of the odorant receptor AaegOr4 in East Africa [2]. In triatomines, the nature of genetic adaptations that have enabled the widespread dispersal of successful lineages are far from clear. T. infestans, thought to have originated in the Western Andean region of Bolivia, spread rapidly among human dwellings in the Southern Cone region of South America before its near eradication in the 1990s [10]. Cytogenetic analyses suggest this early expansion was accompanied by a substantial reduction in genome size [77], but the significance of such a change is not clear. The advantage of the R. ecuadoriensis system we describe is that it may be able to capture multiple parallel adaptive processes and; therefore, can assist in the identification of common evolutionary features associated with colonisation of the domestic environment. Despite limited genomic coverage, and with no R. ecuadoriensis reference genome available, we mapped outlier loci to genes in the R. prolixus draft genome, and found hits related to salivary enzyme production [65], as well as embryonic development [63]. However, these findings represent only a small first step towards undertstaning domestic adaptation in triatomines. Our methodological pathway was limited to comparing allele frequencies at a relatively small fraction of genomic loci between triatomine natural populations in order to identify oulier loci associated with a given niche, and map them to genomic regions in R. prolixus ([30,78]). Although, these genes may have a role in domestic adaptation in triatomines, genome-wide association studies, quantitative trait locus mapping or CRISPR/Cas9 gene knockout approaches are necessary to fully reveal the genomic architecture of adaptation to the domestic setting. Nevertheless, these findings motivate us to investigate further putative genes involved in local adaptation to the domestic environment such as blood-feeding [79], sensory cues and host-seeking behaviour [28,80], as well as human blood detoxification [79,81]. Recent data from our group in Loja province shows that, without doubt, domestic R. ecuadoriensis feed extensively on human blood [82]. To adequately explore the genomic bases of adaptive traits in triatomines, future work should focus not only on improving functional annotation of triatomine genomes, but also robust experimental designs (e.g., common-garden or recriprocal transplant experiments [83,84]), to enable genotype and phenotype to be linked. Our analyses identified a strong signal of genetic IBD among R. ecuadoriensis populations across our study area. Geographic partitioning at this scale is consistent with limited autonomous dispersal capabilities of triatomines which are, in the main, poor fliers [41]. Wind-blown dispersal observed in smaller vector species is unlikely in triatomines [85]. Passive dispersal of triatomine vectors alongside the movements of their human hosts, which certainly underpins the successful dispersal of other domesticated vector species, is more likely (e.g., Aedes spp. [86,87]). Lower IBD observed among domestic sites than wild sites may be consistent with passive dispersal alongside humans in the former. We observed a similar phenomenon among parasite isolates from the same region in a previous study [59] in which T. cruzi domestic/peridomicile isolates showed no spatial structure in comparison with wild isolates. Nonetheless, our formal exploration of the landscape drivers of vector dispersal did not reveal an important effect of roads, and it is not clear to what extent human dispersal of vectors takes place based on our data alone. According to our landscape genomic analysis, elevation surface is a key predictor of connectivity/discontinuity among R. ecuadoriensis populations. Our machine learning (ML) optimisation procedure provides objective parameterisation of altitude resistance values to R. ecuadoriensis gene flow [88]. Based on our landscape model predictions we were able to construct an electric current map (Fig 5C) to assist medical entomologists and policy makers in understanding vector dispersal routes. Current vector control strategies in Loja target a single civic administrative unit (neighbourhood or town) for any given insecticidal intervention [55]. Historical vector control in Loja has been sporadic and limited insecticide spraying that varied yearly (from 2004 to 2014) to only a small number of parishes due to budgetary constraints [89]. Our data and model suggest this approach may be effective for certain communities (e.g., SF, CG, NT and YS, Fig 5). However, for highly connected hubs (e.g. BM, GA, CQ, AZ), successful longer term triatomine control (e.g., insecticide spraying, house improvement, window nets, etc.) will depend on simultaneous intervention in multiple connected communities. In Ecuador, as with many other endemic regions in Latin America, efforts to control Chagas disease may be complicated in the long term by substantial wild populations of secondary triatomine vectors [16]. As with many other vector borne diseases, there is also a strong case for the use of integrated vector management (IVM) for Chagas disease, where improvements to housing, education, community engagement, in addition to bed net use and insecticide spraying are all likely to be necessary to achieve sustained control [55,90]. Our data clearly indicate that triatomines do invade houses in Loja and low-lying valleys provide routes for vector dispersal between communities and cost-effective IVM must be underpinned by this understanding of vector population structure. Fortunately, genomic and analytical tools can now furnish much of the detail, although better genomic resources for secondary triatomine vector species are required to reveal the process of vector adaptation to the human host. Targeting secondary vector species like R. ecuadoriensis must now be a priority for health authorities, as these now represent the most pernicious and persistent barrier to controlling residual Chagas disease transmission.

Methods

Sample collection and study area

Rhodnius ecuadoriensis triatomine bugs (n = 272; S1 Table) were derived from a larger collection in the Center for Research on Health in Latin America (CISeAL) of Pontificia Universidad Católica del Ecuador (PUCE). Rhodnius prolixus samples (n = 6) were provided by the London School of Hygiene and Tropical Medicine and sequenced as an outgroup, as well as to assist with the decontamination of the of 2b-RAD reads and their mapping to functional regions in the draft R. prolixus genome [27]. The CISeAL triatomine collection has been gathered since 2004 from domestic (human built environment) and wild (animals nests, burrows, etc) ecotopes during field surveys across Loja, Ecuador. Surveys have occurred throughout the year but 80% of the time during the summer, from June to August [58]. Houses and wild locations have been selected over the years by random sampling, but depending on rough terrain accessibility, resource availability and community participation [55,91,92]. Domestic R. ecuadoriensis were collected inside houses (e.g., rooms, underneath beds and clothing, walls, etc) using the one-hour-man method described in previous studies [16,55,91]. Wild R. ecuadoriensis were collected in animals (bird/squirrel/mouse) nests attached to trees and bushes surrounding domestic collection sites [54,56,92]. This study triatomine subset (n = 272) was composed of spatially widespread collection sites (n = 25) across Loja separated by 0.4 to 100 Km (Fig 4). Sites were located at different ecotopes (e.g., domestic and wild ecotopes separated by 0.1 to 3 Km–Fig 2), altitudes (up to 1542.9 m.a.s.l.–S5 Fig), vegetation types (e.g., tree/bush forest, cropland, etc.–S6 Fig) and adjacent to different road infrastructure (e.g., highways, tertiary roads, etc.–S7 Fig). As mentioned above, we defined R. ecuadoriensis as domestic/wild based on whether they were collected in the built environment or wild animals nests, respectively. When speciemens were available we selected triatomines from different houses/wild sites within a locale to have a good representation of the site. The triatomines were collected under Ecuadorian collection permits: N° 002–07 IC-FAU-DNBAPVS/MA; N° 003–2011-IC-FAU-DLP-MA; N° 006-IC-FAU-DLP-MA-2010; N° 010-IC-FAN-DPEO-MAE; N° 011–2015- IC-INF-VS-DPL-MA; MAE-DNB-CM-2015-0030 and internal mobilization guide N° 001-2018-UPN-VS-DPAL-MAE and N° 017-2018-UPN-VS-DPAL-MAE. All these samples were exported to the University of Glasgow by the scientific export authorization N°70-2018-EXP-CM-FAU-DNB/MA.

Genomic DNA extraction and sequencing

Genomic DNA (gDNA) was extracted in 88.2% (443/502) of the samples using a SSNT/Salt precipitation method [93] previously applied in triatomine bugs [94]. For each sample, gDNA concentration was > 25 ng/uL and 288.4 ng/UL (sd. ± 241.8) on average with purity ratios (260/280 and 260/230) of 1.87 (sd. ± 0.10) and 2.30 (sd. ± 0.97), respectively. gDNA was digested with the CspCI Type IIB restriction enzyme (IIB-REase—New England BioLabs, Inc.) which has shown to yield a high marker density in triatomine [94]. DNA fragments (36bp) were ligated to Illumina single-end adaptors and a specific barcode added during PCR amplification to construct 382 150bp 2bRAD libraries [95]. Libraries were homogenised to an approximate similar concentration, purified with magnetic beads [96] and pooled in two separate batches (n = 191). Each batch was sequenced separately on 1-flowcell (2 lanes) HiSeq 2500 (Illumina) Rapid Mode platform with a single-end (1x50 bp) setup using v2 SBS chemistry at the Science for Life Laboratory (SciLifeLab, Stockholm, Sweden), which also implemented the reads demultiplexing and their in-house quality-filtering.

Bioinformatics of 2b-RAD sequenced data

Data cleaning and decontamination

Demultiplexed raw data quality scores were verified in FastQC software v0.11.9 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). 2.3% (16/689) Million reads (Mreads) were removed due to incomplete CspCI restriction site (36 bp) and having across read quality score below 30 [97]. The 624.7 high quality Mreads with integrate restriction site had their Illumina adaptors and barcodes trimmed, and reads were forwarded (5’-3’) using custom scripts. To exclude non-target sequences, 1.2 Mreads (0.2%) mapping to bacteria, virus, archaeal, Trypanosoma cruzi [98] and homo sapiens (Genome Reference Consortium human build 38) genomes were removed using DeconSeq standalone v4.3 [99] with an alignment identity threshold of 85% and Kraken [100] taxonomic classifier (S1 Methods). After decontamination, each sample yielded on average 1.6 Million reads (interquartile range = 1.9 Mreads).

Optimisation and genotyping

As advised in refs. [101,102], we optimised STACKS v2.55 [103] DENOVO_MAP.PL programme by varying at a time one of the main controlling parameters (-m -M and -n, -N) on each run while keeping the rest of the parameters at the setting used in early experiments (e.g., -m 5, -M 2, -n 1, -N 4, -alpha 0.01, -bound_low 0, -bound_high 0.01, -r 0.8, -min_maf 0.01 [94]). These parameters control the minimum number of raw reads required to form a stack (a putative allele) which is comparable to the minimum depth of coverage (-m), the number of mismatches allowed between stacks (putative alleles) to merge them into a putative locus which is comparable to the number of nucleotide mismatches allowed (- M), the number of mismatches allowed between stacks (putative loci) during construction of the catalog that contains all loci and alleles of the population (-n) and the number of mismatches allowed to align secondary reads (reads that did not form stacks) to assemble putative loci to increase locus depth (-N) [103]. The parameter combination yielding the highest number of SNPs with the least missing data and genotyping error rate was chosen to be the optimal set (S2 Methods). Genotypes below a quality score of 30, and samples with above 50% missing genotypes across sites and among loci were removed from downstream analysis using the VCFtools software suite v0.1.5 5 [104]. The remaining missing genotypes (< 0.5%) were imputed using the k-nearest neighbour genotype imputation (LDkNNi) method [105] implemented in the TASSEL software v5 [106].

Genetic diversity and linkage disequilibrium

Genetic diversity measures (e.g., observed (HO) and gene diversity (HE), inbreeding coefficient (FIS) and allelic richness (Ar)–S2 Table) were calculated for each collection site, and ecotopes (domestic and wild) within collection sites, in the HIERFSTAT [107] and pegas [108] packages in R [109]. Sample-size corrected Ar was calculated using the rarefaction method [62] implemented in the PopGenReport [110] R package. To evaluate the percentage of SNP markers in linkage disequilibrium (LD), correlation coefficient (r2) estimates were calculated between markers pairs using using the GUS-LD R package [111] which revealed a very low percentage (< 0.20%). To observe whether genetic diversity difference between ecotope pairs was significant, a permutation-based (10,000 permutations) two sample t-test was performed on each pair diversity values using the RVAideMemoire R package (https://www.rdocumentation.org/packages/RVAideMemoire).

Individual-based genomic differentiation

Genomic differentiation among R. ecuadoriensis domestic and wild samples within a subset of seven collection sites (Fig 2A) was visualised in a Neighbour-Joining midpoint tree [112] (Fig 2B) built from Euclidean genetic distances of allele frequencies with the ape [113] R package. Tree components were edited in FigTree software v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/) to better illustrate domestic and wild samples, and their overall clustering pattern. To explore samples genomic differentiation further, a DAPC [114] was performed in the same seven collection sites with the adegenet [115] R package (Fig 2C). The most likely a priori number of clusters was chosen based on the lowest Bayesian information criterion (BIC). In the DAPC, all principal components (PCs) and the eigenvectors of the first three DA discriminant functions were kept for visualizing the samples individual coordinates of different PCs linear combinations (S8 Fig).

Pairwise FST comparisons

To support previous hierarchical analyses, pairwise FST comparisons [67] (Fig 2D) were performed between R. ecuadoriensis from domestic and wild ecotopes within the seven collection sites (Fig 2A). In this study, FST was exploited as a measure of genomic connectivity (flow) between ecotopes within given collection sites. Specifically, Nei’s FST [116] pairwise comparisons were computed in adegenet R package and tested at 5% significance via 999 permutations of individuals selected randomly within and between groups. P-values were corrected for multiple comparisons using the false discovery rate (FDR) method [117] in the function p.adjust of the stats R package [109].

Hierarchical F-statistics

R. ecuadoriensis molecular variation was explored at a four-level (e.g., among collection sites, among ecotopes (domestic or wild) within collection sites, among collection year within collection sites and among individuals within populations) hierarchy of population structure. For each hierarchy, a F-statistic (with 95% C.I.) was calculated, and its significance tested via 999 randomised permutations with the HIERFSTAT R package. For comparison and given not all sites had both ecotopes, two hierarchical analysis were performed, one with the total collection sites (n = 25) and the other with a subset of collection sites (n = 7) with samples collected in both ecotopes (S3 Table).

Domestic-wild SNP association analyses

As a response of R. ecuadoriensis ecotopes fluxes in multiple collection sites across Loja, we screened for SNP RADseq markers under a strong signal of selection (outlier loci). The power for detecting outlier loci of four different approaches, Random Forest (RF) machine learning (ML) classification algorithm (implemented in refs. [118-120]), redundancy analysis (RDA) constraint ordination [121], and OutFlank [122] and fsthet [123] FST-outlier methods, was evaluated using a roughly similar number of domestic (n = 56) and wild (n = 52) R. ecuadoriensis across Loja province sharing a total of 2552 SNPs.

Random forest

The RF algorithm [124] implemented in the randomForest [125] R package was used to build a series of recursive decision trees (S3 Methods), or forest, to classify domestic and wild R. ecuadoriensis based on their shared SNPs (predictors) covarying to a specific ecotope (response variable). Within each RF run, decision trees were trained by random subsampling with replacement 66.6% of triatomine samples (training dataset), for which aleatory selected SNPs were top-ranked classifiers when minimizing the most within-ecotope variation (that is, partitioning triatomine by ecotope). Trained trees predictive power was tested with the remaining 33.3% triatomine samples (‘Out-of-bag’ test dataset) in which ecotope misclassification of samples estimated an OOB-ER for that RF run; SNPs importance classification accuracy was averaged among the total number of trees created in a given RF. Three independent (spatial structure-corrected) RFs with 100,000 trees were run and their convergence on SNPs importance classification accuracy was evaluated by Pearson’s correlation test. Top-ranked SNPs (Fig 3A inset) among the three RFs (that is, importance classification accuracy above 3) were chosen for backwards purging, as implemented in refs. [119,126]. Backwards purging (Fig 3A) iteratively runs RFs starting with the full top-ranked SNPs and discarding the least important ones before the next iteration until only two were left. The subset with the lowest OOB-ER contained SNPs outlying strongly for the ecotope response.

Redundancy analysis

Outlier loci likely under selection were also identified using RDA multivariate constrained ordination [127] implemented in the vegan [128,129] R package. First, a matrix fitted values were obtained using multivariate linear regression between a matrix of genotypes (response) and ecotopes (explanatory) with an additional term controlling for spatial structure (based on the three first axes of an individual principal coordinates of each sample). Then, principal component analysis (PCA) on the fitted values matrix resulted in a constrained axis composed from the variation explained, ‘redundancy’, by our explanatory variable. Overall RDA model and variation explained by the constrained RDA axis were tested for significance via 999 permutations designed for constrained correspondence analysis. Additionally, SNPs (Fig 3B) and samples (Fig 3C) coordinates were scaled and plotted in the ordination space to see their relationship with the constrained axis (RDA 1), ecotope. SNPs z-transformed loadings separated by ±2 and ±3 standard deviations (permissible and conservative thresholds, respectively) from the mean distribution of the total SNPs loadings in our RDA axis were considered under selection (Fig 3B) (for further details on step-by-step RDA see S3 Methods and refs. [121,130,131]).

FST-Heterozygosity outlier method

The FST-Heterozygosity outlier method aims to identify loci with strong allele differences among ecotopes. First, ecotope differentiation for each locus is calculated using Wright’s FST without sample correction. The distribution of these values is expected to have a chi-squared shape. The main goal is inferring a null FST distribution from neutral loci not strongly affected by diversifying selection [122]. Therefore, a best-fit to the chi-squared FST distribution was achieved by trimming the lowest and highest FST values (loci in the tails of the distribution are likely to be under effective diversifying selection) and considering only the values in the centre (neutral loci and loci experiencing spatial uniform balancing selection). Loci with unusual FST values relative to this fitted distribution can be thought of experiencing additional diversifying selection [122,123]. We used two R packages to accomplish this analysis, OutFlank [122] (Fig 3D left) and fsthet [123] (Fig 3D right), and compared the results. The difference between the packages is that fsthet uses smoothed quantiles of the empirical FST-Heterozygosity distribution to identify outlier loci and does not assume a particular distribution or model of evolution as compared to OutFlank. We set OutFlank function with proportion of lower and upper loci trimmed to 0.06 and 0.35, respectively, and the rest of the values to default.

Mapping SNP outlier loci

In order to identify genes that may be responsible for local adaptation in the Chagas disease vector, R. ecuadoriensis, to the domestic environment we mapped the SNPs found in the association analyses to the R. prolixus annotated genome [27]. We used the BWA alignment tool implemented in DeconSeq software v0.4.3 [99] to map SNPs sequences (38 bp) at a minimum alignment threshold of 85. The sequences of the regions (60-300kb) in which our SNPs aligned were BLAST searched and compared to the R. prolixus genome.

Estimating gene flow with distance

Matrices of genetic (FST [116]) and geographic (Km) distances (Fig 4A) between the 25 collection sites (Fig 4B), and between domestic and wild collection sites separately, were obtained with the adegenet and raster [132] R packages, respectively. Mantel tests [133] were performed on those matrices using the ecodist [134] R packages. Genetic and geographic correlation between domestic and wild ecotopes was also viewed separately by fitting a generalised least square (GLS) model with a maximum likelihood population effects correction (MLPE) [135] implemented in the corMLPE (https://github.com/nspope/corMLPE/) R package and assuming a linear relationship between two distance matrices based on genetic and geographic distance measures, Y and X, respectively. Centring the X in about its mean, , removes the correlation between the estimates of α and β [135]. H, determines the ecotope and the τij term adds the MLPE random effect correlation structure.

Estimating gene flow with resistance

Genetic distances

Given genomic differentiation between domestic and wild ecotopes was low, we combined all samples within a collection site and used collection site as the unit in our landscape genetic analysis. Collection site units are logistically and budgetary important when carrying out triatomine surveys and insecticide spraying. Heirachical, phylogenetic and DAPC analyses also suggested R. ecuadoriensis samples were closely clustered by collection sites. To estimate ancestry of invididuals at each collection site and support our clustering criteria, an ADMIXTURE [136] analysis was performed using a 5-step expectation-maximization algorithm and 10-fold cross-validation with 200 boostrap resampling iterations to estimate the standard errors for K = 2–30 (S1 and S2 Figs). Using a landscape genomics mixed modelling framework (Fig 1), we aimed to disentangle the effects of landscape heterogeneity on R. ecuadoriensis population structure and gene flow (S4 Methods). A Hedrick’s GST [66], which corrects for sampling limited populations [137], distance matrix among the 25 collection sites (Fig 5B) was obtained in the GenoDive v3.04 [138] software (S5 Table). In addition, we ran a Pearson’s correlation test between the Hedrick’s GST matrix, and Meriman’s standardised FST [68] and FST [67] matrices, calculated in the same software, to evaluate the consistency of genomic differentiation pattern among collection sites with different genetic distance measures.

GIS data collection and preparation

Elevation, land cover and road network (hereafter, surfaces—S5, S6 and S7 Figs) landscape variables were chosen over temperature and precipitation to test R. ecuadoriensis dispersal and gene flow and to avoid multicollineary and overffting in our landscape mixed models. For the continuous surface (elevation surface–S5 Fig), only monomolecular transformations (e.g., S4 Fig) with any possible shape and maximum parameters were used to explore the relationship between gene flow and altitude. Our categorical surfaces, land cover and road network, were reclassified as follows. Those highly fragmented land cover categories (e.g., cultivated and managed areas) were reclassified to the least resistance of gene flow, whereas regular flooded areas and water bodies were reclassified to the highest resistance values (S6 Fig and S7 and S8 Tables). High transitable roads (e.g., highways and tertiary roads) were assigned to the least resistant values, whereas absence of roads were assigned to the highest resistance values (S9 Table). Original GIS surfaces were obtained from multiple sources (S10 Table) and transformed to have the same format (raster), resolution (250 m2 grid), extent (~ 97 Km2) and coordinate reference system (Universal Transverse Mercator (UTM)). Spearman’s rank correlation coefficient (rho) tests were run (S6 Table) and plotted (S3 Fig) on each pair of surfaces to ensure variables were uncorrelated (rho < 0.29 based on Cohen [139]). All three surfaces original values were transformed to the same scale (i.e., a minimum value of 1 and a maximum of 100) to meet our initial hypothesis.

ResistanceGA principle

The genetic algorithm [140] implemented in the R package, ResistanceGA [69], was used for multiple and sinlge-surface optimization of resistance values to gene flow in the above surfaces (S4 Methods). The method works by correlating genomic (response) and effective (predictor) distances (derived from a random-walk commute time algorithm [141]–S4 Methods) matrices through a maximum likelihood population effects [135] model and, on each iteration, evaluates the best resistance parameters based on a ML objective function, log-likelihood in our case. Simulating the process of evolution on each iteration, the best model and parameters are selected and passed over the next generation with some random change on parameter values to explore the parameter space widely.

Multiple surface optimisation

We performed three replicate runs to optimise all possible combinations of our surfaces (hereafter, composite surfaces), including surfaces individually (hereafter, single surfaces) to generate models with optimised resistance values. The major GA algorithm options were set to default, except for the ‘pop.mult’ which was set to 20 to increase the number of parameters to evaluate on each surface every iteration. All optimisation processes were run in parallel with 10–20 cores in a Debian cluster (http://userweb.eng.gla.ac.uk/umer.ijaz/#orion) at the University of Glasgow. Running times varied from days to weeks depending on surface size and number combined at a time.

Model selection

Composite and single surface models, including an intercept- only (null model) and a geographic distance (resistance grid cells are set to 1 to model isolation-by-distance) model were evaluated (Table 1) and the best model was selected based on the lowest AICc, AICc weight and Delta AICc. To confirm the robustness of the optimisation surfaces and controlling for potential bias due to uneven distribution of sample locations in the landscape, we carried out bootstrap resampling (10,000 iterations) in 85% of our sample locations and then fit the subset to each of the effective distance matrices from the optimised surfaces. After the bootstrapping analysis, the average AICc among all iterations and the percentage a model was top over all iterations was used as a criterion to rank the best model (Table 2).

Landscape connectivity model

We used the best optimised single (elevation surface) resistance surface models to estimate landscape connectivity through a circuit theory algorithm [71,72] (Fig 5C) implemented in the software CIRCUITSCAPE v5 [142]. Here, our resistance surfaces were converted into electric networks in which each grid cell represented a node connected to their neighbours by resistors of different weight. Resistor weights were calculated from the average resistance values (i.e., optimised resistance values) of the two grid cells being connected. The algorithm applies a simulated electric current between all pairs of focal nodes (collection sites) in the network to estimate effective distances between them (S4 Methods). A current density map (Fig 5C) was obtained from those resistance distance estimations representing a random walk probability of movement through our study area.

Data decontamination in Rhodnius ecuadoriensis sequenced reads.

(PDF) Click here for additional data file.

Optimisation of genotyping strategy.

(PDF) Click here for additional data file.

Genomic scans in domestic and wild Rhodnius ecuadoriensis.

(PDF) Click here for additional data file.

Landscape genomics mixed modelling framework on arthropod vectors.

(PDF) Click here for additional data file.

Admixture bar plot of triatomine ancestries in Loja assuming K = 20 ancestral populations.

(PDF) Click here for additional data file.

Admixture analysis cross-validation error plot.

(PDF) Click here for additional data file.

Scatterplot matrix showing the relation between raster surfaces.

(PDF) Click here for additional data file.

Comparison between original and optimised relief resistance surfaces.

(PDF) Click here for additional data file.

Relief of Loja, Ecuador.

(PDF) Click here for additional data file.

Map of the land cover types present in Loja, Ecuador.

(PDF) Click here for additional data file.

Map of the road network of Loja, Ecuador.

(PDF) Click here for additional data file.

Discriminant analysis of principal components (DAPC) scatter plots of all possible PCs axes combinations comparing ecotope vs collection site in 89 samples using 2,552 SNP markers.

(PDF) Click here for additional data file.

Rhodnius ecuadoriensis and Rhodnius prolixus (out group) samples proccessed in this study.

(PDF) Click here for additional data file.

Rhodnius ecuadoriensis population genetic summary statistics.

(PDF) Click here for additional data file.

Summarised results of hierarchical differentiation of molecular variance components in 2552 SNP loci on the complete (n = 272 samples) and a small (n = 89 samples) datasets.

(PDF) Click here for additional data file.

GLS-MLPE model results.

(PDF) Click here for additional data file.

Matirx of pairwise GST values for the 25 collection sites in Loja.

(PDF) Click here for additional data file.

Spearman correlation test, rho, between raster surfaces previous to optimization with ResistanceGA.

(PDF) Click here for additional data file.

Global Land Cover 2000 project Legend.

(PDF) Click here for additional data file.

Land cover reclassified values.

(PDF) Click here for additional data file.

Roads reclassified values.

(PDF) Click here for additional data file.

Original GIS data, description and collection source.

(PDF) Click here for additional data file. 24 Sep 2021 Dear Dr Hernandez Castro, Thank you very much for submitting your Research Article entitled 'The genomic basis of anthropogenic adaptation and geographic dispersal in Chagas disease vectors' to PLOS Genetics. The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time. Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org. If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist. To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process. To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder. [LINK] We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions. Yours sincerely, Giorgio Sirugo Associate Editor PLOS Genetics Scott Williams Section Editor: Natural Variation PLOS Genetics Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: In the manuscript, the authors used a landscape genomics approach to assay gene flow, local adaptation and dispersal of wild and domestic Rhodnius ecuadoriensis species, at 25 sites in the Loja Valley in Southern Ecuador. Previously, the same authors identified R. ecuadoriensis, which is one of the main vectors of Chagas disease, in sylvatic highlands of the same region (Grijalva et al., 2009) and showed that ecological factors such as nest type, height above sea level and distance to houses are important for the widespread distribution of sylvatic R. ecuadoriensis populations in southern Ecuador (Grijalva et al., 2012). The authors also hypothesized that the arrival of sylvatic populations to dwellings may be an important re-infestation source, which could hamper vector control efforts. With the availability of Next-generation sequencing technologies, the authors have been taking the next steps in their analysis to study the genetic diversity in those sampled populations. They were able to recover 2,552 SNP markers across 272 R. ecuadoriensis samples collected across ecological gradients in Loja and found high levels of gene flow between multiple domestic and wild R. ecuadoriensis populations. They then screened for SNP RADseq markers under a strong signal of selection (outlier loci) using 4 different approaches: the Random Forest (RF) algorithm, redundancy analysis (RDA) constraint ordination and OutFlank and fsthet FST-outlier methods. 43 SNPs were identified with the highest classification accuracy in RF analysis. Four of those SNPs showed a strong signal of adaptation by RDA analysis. One SNP (ID 15732) flagged in all four analyses. In the absence of an available R. ecuadoriensis genome, the resulting SNPs found in the association analyses were then Blast searched against the R. prolixus annotated genome. Three SNPs were mapped to genes involved with embryogenesis and saliva production. Unfortunately, Locus ID 15732 did not give any suitable match. This study identified some of the genomic signatures of domestication of a triatomine species which could be useful to provide a better understanding of key mechanisms of domestication and potential targets for novel vector control approaches (e.g. genetic technologies). A lot of effort has gone into the data analysis and the landscape genomic model which showed that R. ecuadoriensis dispersal is fundamentally restricted by landscape elevation and identified highly connected and isolated triatomine populations could be very useful to predict reinfestation of houses by wild R. ecuadoriensis following vector control interventions. I recommend publication of this study in PLoS Genetics if some of the points below are addressed since I believe it will be widely appreciated by the Chagas disease research community as well as population geneticists and modelling community to adapt this thoughtful analysis to their specific research needs. Here are some general comments: 1) There are already numerous publications that have looked at the importance of environmental conditions such as altitude, climate, vegetation types and land uses for the Triatome distribution. In my opinion this information is important and should be presented in the study. For example, Ramsey et al (2000) showed that differences in altitude and mean annual precipitation were important in the habitat partitioning of 8 Triatoma species and R. prolixus in the state of Oaxaca, Mexico. Another study looked at reinfestation processes in Argentina and already highlights the importance of altitude and suggests that Chagas disease control program should consider potential sources of triatomines up to 1,500 m around the dwelling to reduce adult invasion (Cecere et al., 2007). Further, Bustamante et al. (2007) published a study on environmental determinants of the distribution of Chagas disease vectors in south-eastern Guatemala. I would therefore like to encourage the authors to expand in the introduction or discussion on what is already known about factors that affect Triatome dispersal and to highlight their unique and valuable map that shows connectivity between sites. Further, it would be useful for the reader to know why those parameters (Road, Land cover and elevation) were chosen for the model over others such as average minimum temperature, rainfall/relative humidity? 2) The authors mentioned that the current vector control strategies are insufficient and merely reactive. However, they did not provide more details in the introduction or discussion on what the current control measures are, nor how/ whether they are applied in the sampled regions. Could this be another parameter to consider? 3) Did the collection happen throughout the year or only during a certain season? Does the abundance/composition of wild vs. domestic R. ecuadoriensis species vary across the year? How would the authors expect this to affect the model? Some specific suggestions: Line 78-81 The sentence would benefit from being shortened. That section is kept quite general. Therefore, I think it would be important to state that the definition of landscape functional connectivity can vary between different diseases vectors. Line 107 delete “explored” Line 127 As mentioned in my general comments. Is the level of vector control the same across all sites? Are there other vector control interventions used in the area? Line 130 It would be nice to expand a bit on other factors that might be linked to the landscape elevation parameter. What other factors are there? Does temperature seasonality affect the species?, Are there biannual variations in triatomine abundance at various landscape elevations? Line 131 The authors suggest frequent and spatially targeted intervention. I agree, but it might be good to explain in a paragraph how the current vector control looks like. Line 133 Is this the specific recommendation for Loja or is this recommended for similar sites that show high gene flow and fragmented populations? Figure 1. Unfortunately, the resolution was too low to read the samples ID in Figure 1b. Potentially Figure 1 needs to be split into at least 2 Figures. It is not intuitive to look through the different parts of the Figure from right to left (a on the right, c on the left). I would prefer it the other way around. Line 372 Please clarify the first part of this sentence. Reviewer #2: This paper reports a substantial body of work that adopts a landscape genomics approach to investigate the dispersal dynamics of Rhodnius ecuadoriensis in southern Ecuador. Studying vector dispersal dynamics and adaptation is certainly a relevant and topical research area. I really appreciated the effort of the authors in putting this study together, which includes a substantial number of samples from various collection sites, an appropriate bio-informatic pipeline to recover SNP genotypes and relatively challenging statistical analyses. I like the fact that authors included collection sites with both wild and domestic ecotypes, which reflects a correct study design for the aims outlined in this study. I also think that the results presented here, in particular those stemming from the landscape genomic analyses, are certainly worth publishing in a journal as PLoS Genetics. My main concern is that the authors claim to have identified genomic regions linked to adaptation to the domestic setting. I think that this may be overstated because - in my opinion - the authors merely found some outlier loci between domestic and wild vector populations, without further evidence that these loci are truly adaptive. Based on the title alone, I truly expected to read substantial novel insights on the genes, gene arrays or genomic regions that are involved in the anthropogenic adaptation of this vector species. Unfortunately, there is merely a description of some outlier loci and the discussion barely touches on this topic. There is in fact only one SNP (LOCUS ID 15732) that was identified as an outlier SNP by all statistical analyses, but the location of this SNP in the genome remains unknown. This leaves the reader with nothing more than simply a bunch of SNPs identified by various programs without any clear discussion of their role or evidence that these are truly adaptive. I think that this problem could be simply solved if the authors are willing to downplay their title and some statements throughout the text (see below), and sell the landscape genomic results as the main outcome of their study. My second main question is whether the authors have considered doing ADMIXTURE/STRUCTURE analyses or similar analyses that would allow to investigate population structure and potential hybrid/uncertain ancestry of these samples? I think this may be important to consider, as for the landscape genetic analyses the authors estimate genetic distance between collection sites, but without considering the potential that some of these collection sites may constitute a single population or a mix of populations. Some clarification is needed here. My third main question is how the wild and domestic ecotypes were initially defined? In the methods section it is merely stated that a widespread spatial sampling of ecotypes was done. But how are these ecotypes defined? When is a bug classified as domestic and when as wild? This information is important given the aim of the study. Other concerns: Page 7 - line 147. Not sure where to find S2 Table. Page 8 - lines 165-170. It is difficult to evaluate this because the map in Fig 1 doesn't show all collection sites. Also, if geography is the main factor impacting topology, why not colour accordingly instead of ecotype? Page 8 - line 172. You report low Fst between ecotypes. For comparative purposes it would be useful to also have Fst between collection sites. Page 10. Here the authors repeatedly use the word adaptation / adaptive loci / signal of adaptation or alike, while there is no evidence at all that these loci are adaptive. Better to stick to outlier loci throughout, unless you have proof that these loci are linked to genes + you know the role of these genes + you have evidence that these genes play a role in domestication (e.g. through knock-out etc.)? Page 11 - line 230. Did the authors try a BLAST approach of a consensus read covering locus ID 15732? Page 12 - line 254-264. This paragraph is somewhat confusing. First the authors state that only one SNP from the RDA analysis is mapped to the Kruppel gene. Then later in the paragraph the authors state that there are three mapped loci under balancing selection from the fsthet analysis. This is the first time the authors describe balancing selection or the fact that they found loci under balancing selection. In the previous paragraphs they are always referring to outlier loci (so called adaptive loci). More-over, one or a few of these three SNPs under balancing selection also map to the Kruppel gene. Does this mean that there is one SNP from RDA analyses (and under so called adaptive selection) and one from fsthet analyses (and under so called balancing selection) in the same gene? This seems odd. Page 14 - line 300-301. I'm not sure that this statement is correct. Or at least it is not clear from Figure 4b. Sure, there is one obvious cluster (NJ, RT, HY, SJ and EX) and there is one obvious outlier (NT), but the other clusters seem less obvious. For instance, BM seems to be lowly differentiated from the cluster including CG, BR etc. Page 15 - line 340-341. Even though bootstrapping clearly demonstrates that geographic distance may be a strong predictor, the author still claim that elevation is the strongest predictor based on results without bootstrapping. This seems like the authors are trying to deceive themselves or the readers. Based on these bootstrapping results, both distance and elevation are strong predictors, which also would make sense as vectors do not travel far. Page 24 - line 529. For completion, please clarify -m, -M and -n to avoid forcing the reader to find this out themselves online. Figure 2b. There are loci on the right associated with wild ecotype. By the same logic, loci on the left are associated with domestic ecotypes? But the authors implicate both left and right loci are involved in adaptation to domestic ecotypes? Also, what is on the x-axis - how to interpret these values? Also here the authors refer to adaptive loci - better to stick with outlier loci. Figure 3b. Map of collection sites should come much earlier in the manuscript. There is a problem with the ordering of Figure and Table numbers. Also, often there is referral to the figures in S1 Methods, but many of these figures in S1 methods are results. It would make more sense to split these. The paper would benefit from some proof-reading. E.g. line 57 says populationS pairs; line 61 says landAcape genomic; line 72 says vector-BONE diseases, line 96 Alternately, etc. (I'm not going to list all of them here) Reviewer #3: This was a very comprehensive study using a range of methods to understand complex population structures among sylvatic and domestic vectors of Chagas disease. While I really enjoyed the paper, I found it difficult to extract the key findings from the results section and found some discrepancies between the aims and conclusions of the study. Specific comments below… 1. The abstract indicates the study focused on identifying adaptations that allow wild populations to enter the domestic environment however this is inaccurate. The focus is much more on understanding patterns of gene flow between wild and domestic sites 2. First paragraph makes some bold statements related to the research conducted here (modest changes drive epidemics) but there is little context provided that is relevant to this work and the links with second half of the paragraph (landscape heterogeneity) are weak. Would prefer to either have the authors expand on the first point with sufficient references, then introduce the second point later and in association with specific challenges for triatomine. As it reads the first paragraph is too broad 3. There isn’t sufficient background to understand triatomine population structures, and what is in volved in “domestication”. Do they enter the home seasonally? Would it be appropriate to elaborate here on what type of adaptions are required between the two environments? What does domestication involve in terms of changes in feeding/mating/oviposition/climate tolerance, etc? 4. Frame the research question in the context of what is known/unknown about the species. Why not highlight historical difficulties in controlling the populations with indoor insecticide? Previous studies have shown R. ecuadoriensis gene flow between sylvatic and domestic habitats and reinfestation after spraying, etc. 5. Line 102, I’m unclear on the relevance of pigs as an example here. First the authors need to demonstrate clearly that sylvatic and domestic populations are genetically distinct (if that’s the case). 6. Can the authors clarify when they describing domestication as a long-term evolutionary event or as an occasional occurrence where wild populations infest the home, these two points occasionally became confused throughout the manuscript 7. The introduction would benefit from a clear problem statement earlier on about why these questions are relevant to Chagas control and triatomine populations. It appears in lines 121/122 but the background leading up to this lacks focus. Furthermore the aim as laid out (from line 124) doesn’t reflect the methods and results that follow. Results 8. There are a number of instances where the primary result that is reported within a paragraph refers the reader to supplemental results. Since these should all be supplementary to the primary findings, I would suggest giving an overview of the data (numbers collected, etc) without sending the reader to the supplement, then presenting the primary findings with reference to figures, before presenting the supplemental results and referring the reader to consult the additional document. 9. Line 161 – is it correct to call these two populations sympatric? Are they within each other’s dispersal range? Some studies have hypothesized that the sylvatic and domestic populations are isolated. 10. In terms of “Genetic loci correlated with domestic colonisation”, colonisation refers to population movement, in this case movement from a wild to domestic setting. I wonder if it’s possible to determine that the loci are association with migration/colonisation or are there other possibilities? 11. In general, results may be clearer if hypotheses were stated above 12. Table 1 and 2 take up a lot of space but don’t convey much clear information on what parameters influence genetic differentiation. Is AIC that important here? 13. Please clarify whether is domestic/sylvatic site locations are included as a variable alongside the landscape variables? What about the variable of time (since collections were over fourteen years)? Were individuals from the same households and nests sequenced, which may influence the sample size needed to address these questions. It seems there is a lot of variability that is not accounted for in the model. Without sufficient information in the methods about the collection locations, I can’t tell if the study is appropriately powered to determine landscape drivers of dispersal. Discussion 14. The summary at the beginning of the discussion is very helpful to distil key findings, I would suggest the other sections (abstract, statement of aims and research questions) mirror this. 15. In the second paragraph bringing up the evolutionary processes that led to colonisation with examples taken from bloodfeeding in Culicidae, however this is tangential and the section would be stronger if it focused on triatomines 16. The mapping of outlier loci to genes related to saliva production and embryonic development is mentioned in abstract (indicating it is a key finding) and discussion, but I don’t see it mentioned as a result. If this is a key finding it should appear in the results section first. 17. The study wasn’t designed to determine which genes enable domestication (requires QTL, etc) but there are a few instances where the authors are trying to stretch their rationale to identify genes identified are required for adaptation. Greater care should be taken to communicate the aims and the limits of the study. 18. There are attempts here to generalise findings to regional triatomine control and support for IVM which is warranted. Unclear how the conclusion to target secondary vector species arose from the study findings Methods 19. Info on broader study design is needed, how were houses/sites selected (either for the primary collection or for secondary analyses). How were “wild” triatomines targeted? What was the minimum/maximum distance between domestic and wild collection sites within a locale? 20. Section on GIS data includes a mix of methods, background and rationale, and possibly some results? (“absence of roads was a strong barrier”). Would prefer greater clarity in the methods, highlighting only what the authors have done with the data, and ensure background and limitations to the assumptions are covered elsewhere. Similarly the ResistanceGA provides background to published methods and could probably be trimmed down. 21. In general I found the frequent cross referencing to the supplementary material a distraction, so please ensure the most salient information is available in the text 22. Supplement figure 2, I found this pictorial overview of the methods really useful, is there space to include in the main paper? Minor edits Abstract 31 – use synonym for “man-made” (the built environment, domestic, human habitation, etc) 32 – In the second part of the sentence “their dispersal” would refer to adaptations rather than vectors. Consider rephrasing for clarity Summary 57 – change to “population pairs” 59 – misspelling of hindered 60 – change to “triatomines” Introduction Line 97 Is domestication the correct term to use for zoonotic parasites? 132 – disease should be lower case 134 – “genomic basis” Results Line 316 “selecting” Line 329 “than” Fig 1. Panel 1A is in the lower right, and would be clearer if ABC were positioned left to right. Fig 1b is illegible Fig 3b the site names and colours are difficult to discern. Similarly for 4a Discussion Line 415 “significance of such a change” Line 432 remove extra comma before “are” Line 436 – again, references to mosquitoes is a distraction Line 437, additional context needed about this study on parasite isolates Methods Line 498 should read (443/502) Line 520 remove “reads” Line 524 “yielded” ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: Yes Reviewer #2: None Reviewer #3: No: authors state they will be available upon publication ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: Yes: Lisa Reimer 7 Dec 2021 Submitted filename: Response_to_Reviewers.docx Click here for additional data file. 16 Dec 2021 Dear Dr Hernandez Castro, Thank you very much for submitting your Research Article entitled 'Population genomics and geographic dispersal in Chagas disease vectors: landscape drivers and evidence of possible adaptation to the domestic setting.' to PLOS Genetics. The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some concerns that we ask you address in a revised manuscript. The concerns raised by Reviewer # 3 require particular attention, as we agree that points made by the Reviewer on the first round of review were not adequately addressed. We therefore ask you to modify the manuscript according to the review recommendations. Your revisions should address the specific points made by each reviewer. In addition we ask that you: 1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. 2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images. We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org. If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission. To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process. To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder. [LINK] Please let us know if you have any questions while making these revisions. Yours sincerely, Giorgio Sirugo Associate Editor PLOS Genetics Scott Williams Section Editor: Natural Variation PLOS Genetics Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: In general the authors have sufficiently addressed the points I raised. There are just a few minor points. The explanation in the authors response regarding the rather broad spatial context of existing studies that investigated triatomine distribution was fine, but unfortunately, the description in the manuscript is less clear (what are "broad" and "detailed" vector population dynamics). In the method section, in line 528, the authors explained that the sampling was done "mainly" during the summer. As mentioned by the authors the number of triatomes can vary throughout the seasons and the definition of "mainly" can be anything between 51-99%. I would therefore suggest to be more specific. Finally, there are several unnecessary typos that were made in the updated sections and the paper would benefit from a repeated proof-reading. For example: presence in stead of present (line 139), showed instead of shown (line 139), summe instead of summer (line 528). Reviewer #3: The introduction has been strengthened with more relevant background. I still find it a distraction to begin the introduction with so many mosquito references, however I will leave this one to the authors. The additional information on sample collection is clear. The inclusion of Fig 5 is good, however it would be more informative if it appeared before the results. IS it possible to first reference the figure when the study design is presented? The authors responded to many of my comments by saying paragraphs have been rewritten - for example to clarify the hypothesis in the end of the introduction, or to improve the description of GIS methods. However I found very few meaningful changes that were related to my criticisms. Previous comments highlighted the need to improve the figures, and this is still the case. Many of them are still illegible. The reviewers raised some limitations of the study and the authors have responded by softening some of the language around adaptation. This is the bare minimum and I think the paper still needs to clearly acknowledge the limitations of the study (not just related to genome coverage). The newly written sections should be carefully proofread, I noticed quite a few examples of spelling and grammatical errors ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: Yes Reviewer #3: No: not yet share in a public repository ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #3: Yes: Lisa Reimer 5 Jan 2022 Submitted filename: Response_to_Reviewers.docx Click here for additional data file. 6 Jan 2022 Dear Dr Hernandez Castro, We are pleased to inform you that your manuscript entitled "Population genomics and geographic dispersal in Chagas disease vectors: landscape drivers and evidence of possible adaptation to the domestic setting." has been editorially accepted for publication in PLOS Genetics. Congratulations! Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made. Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org. In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date. Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics! Yours sincerely, Giorgio Sirugo Associate Editor PLOS Genetics Scott Williams Section Editor: Natural Variation PLOS Genetics www.plosgenetics.org Twitter: @PLOSGenetics ---------------------------------------------------- Comments from the reviewers (if applicable): ---------------------------------------------------- Data Deposition If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website. The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-21-00937R2 More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support. Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present. ---------------------------------------------------- Press Queries If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org. 31 Jan 2022 PGENETICS-D-21-00937R2 Population genomics and geographic dispersal in Chagas disease vectors: landscape drivers and evidence of possible adaptation to the domestic setting. Dear Dr Hernandez-Castro, We are pleased to inform you that your manuscript entitled "Population genomics and geographic dispersal in Chagas disease vectors: landscape drivers and evidence of possible adaptation to the domestic setting." has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work! With kind regards, Livia Horvath PLOS Genetics On behalf of: The PLOS Genetics Team Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom plosgenetics@plos.org | +44 (0) 1223-442823 plosgenetics.org | Twitter: @PLOSGenetics
  118 in total

1.  Reliable Detection of Loci Responsible for Local Adaptation: Inference of a Null Model through Trimming the Distribution of F(ST).

Authors:  Michael C Whitlock; Katie E Lotterhos
Journal:  Am Nat       Date:  2015-09-15       Impact factor: 3.926

2.  Using circuit theory to model connectivity in ecology, evolution, and conservation.

Authors:  Brad H McRae; Brett G Dickson; Timothy H Keitt; Viral B Shah
Journal:  Ecology       Date:  2008-10       Impact factor: 5.499

3.  Constraints on the FST-Heterozygosity Outlier Approach.

Authors:  Sarah P Flanagan; Adam G Jones
Journal:  J Hered       Date:  2017-07-01       Impact factor: 2.645

4.  Divergent host preferences of above- and below-ground Culex pipiens mosquitoes and their hybrid offspring.

Authors:  M L Fritz; E D Walker; J R Miller; D W Severson; I Dworkin
Journal:  Med Vet Entomol       Date:  2015-01-20       Impact factor: 2.739

5.  Active dispersal of natural populations of Triatoma infestans (Hemiptera: Reduviidae) in rural northwestern Argentina.

Authors:  Gonzalo M Vazquez-Prokopec; Leonardo A Ceballos; Uriel Kitron; Ricardo E Gürtler
Journal:  J Med Entomol       Date:  2004-07       Impact factor: 2.278

6.  LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms.

Authors:  Daniel Money; Kyle Gardner; Zoë Migicovsky; Heidi Schwaninger; Gan-Yuan Zhong; Sean Myles
Journal:  G3 (Bethesda)       Date:  2015-09-15       Impact factor: 3.154

7.  Potential Distribution of Chagas Disease Vectors (Hemiptera, Reduviidae, Triatominae) in Colombia, Based on Ecological Niche Modeling.

Authors:  Gabriel Parra-Henao; Laura C Suárez-Escudero; Sebastián González-Caro
Journal:  J Trop Med       Date:  2016-12-28

8.  A chromosomal-level genome assembly for the insect vector for Chagas disease, Triatoma rubrofasciata.

Authors:  Qin Liu; Yunhai Guo; Yi Zhang; Wei Hu; Yuanyuan Li; Dan Zhu; Zhengbin Zhou; Jiatong Wu; Nansheng Chen; Xiao-Nong Zhou
Journal:  Gigascience       Date:  2019-08-01       Impact factor: 6.524

9.  Linkage Disequilibrium Estimation in Low Coverage High-Throughput Sequencing Data.

Authors:  Timothy P Bilton; John C McEwan; Shannon M Clarke; Rudiger Brauning; Tracey C van Stijn; Suzanne J Rowe; Ken G Dodds
Journal:  Genetics       Date:  2018-03-27       Impact factor: 4.562

10.  Integrated vector control of Chagas disease in Guatemala: a case of social innovation in health.

Authors:  Diana Castro-Arroyave; Maria Carlota Monroy; Maria Isabel Irurita
Journal:  Infect Dis Poverty       Date:  2020-04-14       Impact factor: 4.520

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.