Literature DB >> 31391198

In Silico Karyotyping of Chromosomally Polymorphic Malaria Mosquitoes in the Anopheles gambiae Complex.

R Rebecca Love1, Seth N Redmond2, Marco Pombi3, Beniamino Caputo3, Vincenzo Petrarca3, Alessandra Della Torre3, Nora J Besansky4.   

Abstract

Chromosomal inversion polymorphisms play an important role in adaptation to environmental heterogeneities. For mosquito species in the Anopheles gambiae complex that are significant vectors of human malaria, paracentric inversion polymorphisms are abundant and are associated with ecologically and epidemiologically important phenotypes. Improved understanding of these traits relies on determining mosquito karyotype, which currently depends upon laborious cytogenetic methods whose application is limited both by the requirement for specialized expertise and for properly preserved adult females at specific gonotrophic stages. To overcome this limitation, we developed sets of tag single nucleotide polymorphisms (SNPs) inside inversions whose biallelic genotype is strongly correlated with inversion genotype. We leveraged 1,347 fully sequenced An. gambiae and Anopheles coluzzii genomes in the Ag1000G database of natural variation. Beginning with principal components analysis (PCA) of population samples, applied to windows of the genome containing individual chromosomal rearrangements, we classified samples into three inversion genotypes, distinguishing homozygous inverted and homozygous uninverted groups by inclusion of the small subset of specimens in Ag1000G that are associated with cytogenetic metadata. We then assessed the correlation between candidate tag SNP genotypes and PCA-based inversion genotypes in our training sets, selecting those candidates with >80% agreement. Our initial tests both in held-back validation samples from Ag1000G and in data independent of Ag1000G suggest that when used for in silico inversion genotyping of sequenced mosquitoes, these tags perform better than traditional cytogenetics, even for specimens where only a small subset of the tag SNPs can be successfully ascertained.
Copyright © 2019 Love et al.

Entities:  

Keywords:  Anopheles gambiae; chromosomal inversion polymorphism; genomics; inversion genotyping; karyotype analysis; malaria vector; tag SNP

Mesh:

Year:  2019        PMID: 31391198      PMCID: PMC6778791          DOI: 10.1534/g3.119.400445

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


A chromosomal inversion originates when a chromosome segment reverses end to end. Inversions maintained in plant and animal populations as structural polymorphisms tend to be large (several megabases) and contain hundreds of genes (reviewed in Wellenreuther and Bernatchez 2018). Long-term balancing selection can maintain these polymorphisms through millions of generations and multiple species radiations (Wellenreuther and Bernatchez 2018). Because recombination is greatly reduced between opposite orientations in inversion heterozygotes, inversions preserve selectively advantageous combinations of alleles despite homogenizing gene flow in collinear regions. Theory and mounting evidence implicate inversions in local adaptation, adaptive divergence, and range expansion, though the precise molecular mechanisms are rarely known (Hoffmann ; Kirkpatrick and Barton 2006; Hoffmann and Rieseberg 2008; Schaeffer 2008; Kirkpatrick 2010; Lowry and Willis 2010; Joron ; Jones ; Kirkpatrick and Barrett 2015; Twyford and Friedman 2015; Kapun ; Ayala ; Fuller ; Wellenreuther ; Wellenreuther and Bernatchez 2018). Importantly, because of occasional double-crossovers and gene conversion, the suppression of gene flux is not absolute. As long as inversion heterozygotes are formed in populations, any significant association between an inversion and an allele within its boundaries is subject to eventual erosion unless gene flux is countered by selection (Navarro ; Andolfatto ). The Anopheles gambiae complex is a medically important group of at least eight closely related and morphologically indistinguishable mosquito sibling species from sub-Saharan Africa (White ; Coetzee ). Three members of the complex (the eponymous Anopheles gambiae, Anopheles coluzzii, and Anopheles arabiensis) are among the most significant malaria vectors globally, responsible for a majority of the 435,000 malaria deaths in 2017 (World Health Organization 2018). The ecological plasticity of these three species contributes greatly to their status as major human malaria vectors (Coluzzi ). In contrast to the other five, these three species have wide distributions across diverse biomes of tropical Africa. Not coincidentally, they also segregate strikingly high numbers of paracentric inversion polymorphisms, which are implicated in adaptation to seasonal and spatial environmental heterogeneities related both to climatic variables and anthropogenic alterations of the landscape (Coluzzi ; Bryan ; Coluzzi ; Toure ; Manoukis ; Costantini ; Simard ; Cheng ; Ayala ; Caputo ; Ayala ; Cheng ). Some of these inversions also have been associated with ecologically relevant phenotypes, including desiccation and thermal tolerance (Gray ; Rocca ; Cassone ; Fouet ; Ayala ; Cheng ). The sister taxa An. gambiae and An. coluzzii, the focus of the present investigation, are the most closely related species in the An. gambiae complex, sharing extensive nucleotide variation through both recent common ancestry and introgression (Fontaine ; Hanemaaijer ), while maintaining characteristic differences in ecology and behavior (Costantini ; Diabaté ; Simard ; Gimonneau ; Gimonneau ; Gimonneau ; Dabire ; Tene Fossog ; Ayala ). They also share four of six common chromosomal inversion polymorphisms on chromosomal arm 2R (b, c, d, u) and the only inversion polymorphism on chromosomal arm 2L (a) (Figure 1) (della Torre ). These inversions range in size from ∼4Mb to 22Mb, and together span thousands of genes and a sizeable fraction of chromosome 2: ∼61% of 2R and ∼38% of 2L polytene (euchromatic) content (Coluzzi ). Inversions 2La and 2Rb are found in populations throughout tropical Africa and are therefore cosmopolitan, while three other inversions on 2R (c, d, and u) are widespread in West, very rare in Central Africa, and absent from East Africa. The remaining two inversions, 2Rj and 2Rk, have more restricted geographic distributions (Coluzzi ; Ayala ).
Figure 1

Diagrammatic representation of the common polymorphic inversions (labeled brackets) on chromosome 2 in An. gambiae. Polytene chromosome map modified from Figure 1 and poster in Coluzzi . CT, centromere.

Diagrammatic representation of the common polymorphic inversions (labeled brackets) on chromosome 2 in An. gambiae. Polytene chromosome map modified from Figure 1 and poster in Coluzzi . CT, centromere. Early cytogenetic studies of An. gambiae and An. coluzzii, presumed at the time to be a single heterogenous species, uncovered genetic discontinuities that led to the designation of five presumed assortatively-mating ‘chromosomal forms’: FOREST, SAVANNA, MOPTI, BAMAKO, and BISSAU (Coluzzi ; Toure ; Coluzzi ; della Torre ). They were delineated based on stable non-random associations of different sets of chromosome 2R inversions in co-occurring populations, and differed in larval ecology. Subsequent DNA-based studies identified fixed differences in the ribosomal DNA (rDNA), located in the pericentromeric region of the X chromosome, leading to the definition of two assortatively mating M and S ‘molecular forms’ of An. gambiae (della Torre ). The molecular forms, which were eventually given specific status as An. coluzzii (formerly M) and An. gambiae sensu stricto (formerly S) (Coetzee ), are incongruent with the chromosomal forms. Nearly all inversion associations segregate in both species albeit at different frequencies, and likely play similar roles in ecological specialization and adaptation in both An. gambiae s.s. (hereafter, An. gambiae) and An. coluzzii (della Torre ; Costantini ; Simard ; Ayala ). Hence, inversion associations are indicative of environmental heterogeneities more so than intrinsic reproductive boundaries. Beyond a role in ecological specialization, inversions in the An. gambiae complex are also associated with vector traits affecting malaria transmission intensity and control: biting and resting behavior (Coluzzi ; Riehle ), seasonality (Rishikesh ), morphometric variation (Petrarca ), and Plasmodium infection rates (Petrarca and Beier 1992; Riehle ). Although a robust molecular assay is available for genotyping inversion 2La in natural populations (White ), 2R inversions with characterized breakpoint sequences (j, b, c, and u) (Coulibaly ; Sangare 2007; Lobo ) proved difficult to genotype molecularly at the breakpoints (Coulibaly ; Lobo ), owing to extensive tracts of flanking repetitive DNA. The 2Rk breakpoints have yet to be characterized, but recent localization of the 2Rd breakpoints in the reference genome assembly using proximity-ligation sequencing (Corbett-Detig ) also revealed high repeat content, suggesting that repetitive DNA at inversion breakpoints will pose a significant challenge both for breakpoint characterization and for molecular genotyping assays targeting breakpoint regions in these species. Failure to account for the presence of inversions is a barrier to a more comprehensive understanding of epidemiologically relevant mosquito behavior and physiology. Inversion-blind analysis of population data can mislead population genetic inference, and create spurious associations in genome-wide association studies (Seich Al Basatena ; Houle and Marquez 2015). Powerful genomic resources exist for An. gambiae, including a high-quality reference genome assembly (Holt ) and a database of genomic variation (Ag1000G) based on deep genome re-sequencing of thousands of mosquitoes from natural populations across Africa (The ). Unfortunately, inversion genotypes are not automatically revealed by genome re-sequencing, as reads are mapped to their position in the reference genome assembly, not their position in the re-sequenced mosquito genome. Despite advancing genome technology, the only method currently available to determine the An. gambiae karyotype is a method perfected half a century ago (Coluzzi 1968) involving cytological analysis of ovarian nurse cell polytene chromosomes (Coluzzi ; Pombi ). At best, such cytological analysis is severely rate-limiting because it is laborious and requires highly specialized training. At worst, it is prohibitive because it requires proper preservation of chromosomes harvested only from ovaries of adult females at a specific gonotrophic stage; suitable polytenization is absent at other gonotrophic stages as well as in males (della Torre 1997). While salivary glands of late fourth instar larvae also contain chromosomes with an adequate degree of polytenization, and the banding patterns of salivary and ovarian chromosomes are homologous in principle, most bands are difficult to homologize due to a different pattern of chromosome ‘puffing’ (della Torre 1997), rendering this alternative impractical. To overcome these impediments, our goal is to develop broadly accessible computational and molecular methods of genotyping chromosomal inversions in individual specimens of An. gambiae and An. coluzzii. Here, we exploit the Ag1000G database and leverage the subset of cytologically karyotyped specimens within that database to develop a computational approach for karyotyping applicable to whole genome sequence data. We identify multiple tag single nucleotide polymorphisms (SNPs) significantly associated with inversions across geography that collectively predict with high confidence the genotypes of six common polymorphic inversions on chromosome 2 (a, j, b, c, d, u) in individually sequenced genomes of An. coluzzii and An. gambiae. We then apply this approach to data generated independently of Ag1000G to show that our approach has wider utility, even for specimens where only a small subset of the tag SNPs can be successfully ascertained.

Materials and Methods

Mosquito genotype data

Variant call data used for the discovery of inversion tag SNPs were accessed from Ag1000G (The ) and Vector Observatory (VOBS; Table S1), projects of the Malaria Genomic Epidemiology Network (MalariaGEN; https://www.malariagen.net/) that provide catalogs of genomic sequence variation based on individual wild-collected An. gambiae and An. coluzzii mosquitoes sampled from multiple African countries and the Mayotte archipelago. With the exception of four atypical samples (see next section), we verified species identifications as reported in Ag1000G and VOBS using principal component analysis (PCA) of biallelic SNPs on the X chromosome. We excluded any specimens with more than 50,000 missing genotypes on chromosomal arm 2R (N = 9), and any specimens subjected to whole genome amplification (WGA) prior to genomic sequencing (N = 44), as PCA revealed strong biases associated with WGA. After filtering, we retained variant call data from 1,347 mosquitoes (Table S2).

Karyotype imputation by local PCA

Cytological karyotype information derived from phase contrast microscopy of ovarian polytene chromosomes (della Torre 1997) was available only for a relatively small subset of specimens (N = 373) in Ag1000G/VOBS (hereafter, Ag1000G for brevity). Thus, as a first step toward discovering SNPs putatively predictive of inversion status (tag SNPs), we imputed karyotypes computationally at each of six focal inversions (Figure 1), using local PCA (where ‘local’ refers to windows of the genome corresponding to chromosomal rearrangements). Ma and Amos (2012) showed that applying PCA to SNP genotypes in a window of the genome containing an inversion polymorphic in population genomic data (an approach that we call ‘PCA karyotyping’) produces a pattern of three equidistant clusters (stripes) in a plot of the first two principal components, assuming adequate numbers of each of three possible inversion genotypes: inverted and uninverted (standard) homokaryotypes, and heterokaryotypes. The two flanking stripes represent alternative homokaryotypes, and the middle stripe represents the inversion heterokaryotype, a 1:1 “admixture” between the two homokaryotype classes (Ma and Amos 2012). To apply this approach, we combined specimens from both species (An. gambiae and An. coluzzii) and different geographic localities into a single metapopulation sample of 1,347 mosquitoes (Tables S1, S2). We identified a set of biallelic SNPs within inversion boundaries (Table S3) with potentially informative levels of polymorphism [minor allele count ≥3 and minimum alternate allele frequency (MAF) ≥0.15 for all inversions except 2Rd, for which the MAF threshold was reduced to 0.03]. As 2Rd overlaps 2Ru in the genome (Figure 1), we limited consideration to only those SNPs found outside (proximal to) 2Ru for PCA karyotyping of 2Rd (Table S3). Next, we converted mosquito genotypes at these SNPs to a count of the number of alternate alleles (‘0’ if both matched the reference allele, ‘1’ or ‘2’ if one or both matched the alternate allele, respectively). Using the scikit-allel Python package v1.1.9 (Miles and Harding 2017), we then applied PCA to the resulting matrix of alternate allele counts, and represented the output as a scatter plot of the first two principal components for each mosquito in the population sample. The correct genotype corresponding to the two homokaryotype stripes was determined based on the inclusion in a given stripe of mosquitoes with cytologically determined karyotype. Based on this classification, mosquitoes without cytologically determined karyotypes were assigned a PCA karyotype. The distinction between stripes was not always sharp; the stripes could be diffuse and oblique rather than tightly clustered. In extreme cases, stripes were not initially discernable. Through an iterative process of ‘leave one population sample out’ followed by PCA, we determined that absence of a clear three-stripe pattern was attributable to some or all of the same four atypical source populations, in particular, those from Kenya, Mayotte, The Gambia, and Guinea Bissau. The Kenyan sample has been found to display signs of extreme inbreeding (The ), and Mayotte is an island whose mosquito population is plausibly subject both to inbreeding and a degree of isolation from mainland samples. The Gambia and Guinea Bissau are localities with unusually high degrees of hybridization and introgression between An. gambiae and An. coluzzii (Caputo ; Oliveira ; Caputo ; Marsden ; Weetman ; Nwakanma ). Where necessary, we removed these population samples, as well as two An. gambiae-An. coluzzii hybrid specimens from Burkina Faso and Guinea Conakry, and repeated the PCA. In addition, successful PCA karyotyping of 2Rd and 2Rj required the removal of all An. coluzzii specimens owing to taxonomic structuring of variation. Accordingly, PCA karyotyping was successful on all (2La) or subsets (all 2R inversions) of the 1,347 specimens (Table S4).

Discovery of SNPs predictive of inversion orientation

The PEST reference genome assembly for An. gambiae (AgamP4; Giraldo-Calderón ) was derived from a colony whose karyotype was homozygous standard with respect to all common chromosomal inversions in this species. We therefore had the general expectation that an individual SNP might be a good predictor of chromosomal inversion orientation if the reference allele is strongly associated with the standard arrangement and the alternate allele is strongly associated with the inverted arrangement within and across population samples. As shown in Figure 2 in overview, we assessed SNP genotype-inversion genotype concordance for each inversion in individual mosquitoes, limiting our assessment to potentially more informative, higher frequency biallelic SNPs inside inversion boundaries (i.e., those with MAF ≥ 5%). We converted both the SNP genotype and the corresponding mosquito’s PCA-based inversion genotype to single numbers, representing the count of alternate alleles (0, 1, or 2) in the case of SNP genotype, and the count of inverted chromosomes (0, 1, or 2) in the case of inversion genotype. Successfully performing tags are expected to have a SNP genotype that correlates strongly with the PCA-based inversion genotype.
Figure 2

Assessment of correspondence between SNP and inversion genotype in each mosquito. For each chromosomal rearrangement and mosquito, biallelic SNP genotypes inside rearrangement boundaries were converted to a number representing the count of alternative alleles (relative to the AgamP4 reference). We applied PCA to the resulting matrix to assign each individual mosquito an inversion genotype. The expectation is that the PCA-based genotype, expressed as the number of inverted chromosomes at the focal rearrangement, should match the number of alternative alleles at SNPs predictive of inversion status (tag SNPs).

Assessment of correspondence between SNP and inversion genotype in each mosquito. For each chromosomal rearrangement and mosquito, biallelic SNP genotypes inside rearrangement boundaries were converted to a number representing the count of alternative alleles (relative to the AgamP4 reference). We applied PCA to the resulting matrix to assign each individual mosquito an inversion genotype. The expectation is that the PCA-based genotype, expressed as the number of inverted chromosomes at the focal rearrangement, should match the number of alternative alleles at SNPs predictive of inversion status (tag SNPs). More formally, we sought to identify candidate tag SNPs using the procedure illustrated in Figure 3 (applied separately for each inversion). Specimens assigned a PCA-based karyotype for a focal inversion were divided into a training sample used for tag SNP discovery (75%) and a validation sample that was held in reserve until a later time (25%), using the model_selection module of the scikit-learn Python package (v0.19.2) (Pedregosa ). We ensured that both partitions were balanced with regard to inversion genotypes but randomized in all other respects. For robust identification of candidate tag SNPs within the training sample, we masked all SNP genotypes inside the inversion boundaries with a genotype quality (GQ) below 20.
Figure 3

Overview of experimental design. For each inversion, the appropriate Ag1000G sample of mosquitoes that had been successfully karyotyped by PCA was partitioned into a training set (75%) and a validation set (25%). Ten bootstrap replicates of the training set were created from a random sample of 75% of the full training set. For each bootstrap replicate and each mosquito, higher frequency biallelic SNPs within inversion breakpoints were interrogated for genotypic concordance with the PCA-based genotype. Results were summarized across the ten replicates to create a set of candidate tag SNPs with concordance rates exceeding 80%. Candidate tags were used to genotype the held-out validation set, and the computational karyotype score computed across tags was compared to the PCA-based karyotype. Candidate tags were also used to interrogate mosquitoes sequenced independently of Ag1000G, and the computational karyotype score was compared to the associated cytogenetically determined karyotype.

Overview of experimental design. For each inversion, the appropriate Ag1000G sample of mosquitoes that had been successfully karyotyped by PCA was partitioned into a training set (75%) and a validation set (25%). Ten bootstrap replicates of the training set were created from a random sample of 75% of the full training set. For each bootstrap replicate and each mosquito, higher frequency biallelic SNPs within inversion breakpoints were interrogated for genotypic concordance with the PCA-based genotype. Results were summarized across the ten replicates to create a set of candidate tag SNPs with concordance rates exceeding 80%. Candidate tags were used to genotype the held-out validation set, and the computational karyotype score computed across tags was compared to the PCA-based karyotype. Candidate tags were also used to interrogate mosquitoes sequenced independently of Ag1000G, and the computational karyotype score was compared to the associated cytogenetically determined karyotype. Next, we created ten bootstrap replicates of the training sample (Figure 3). Each of the ten replicates consisted of sub-samples of 75% of the full training sample, chosen at random with respect to all variables except inversion genotype balance. For each bootstrap replicate at each interrogated SNP (biallelic, MAF ≥ 5%), we calculated the SNP genotype-inversion genotype concordance for each mosquito in the sample, as described above (Figure 2). Genotypic concordance at each SNP interrogated in a given bootstrap replicate was expressed as the percentage of mosquitoes for which the number of alternate SNP alleles matched the number of inverted chromosomes. Because an imbalance among inversion genotypes could lead to false-positive tag SNPs, we calculated concordance separately for the three inversion genotypes in each of the ten bootstrap replicates. We then averaged the concordance scores across the ten replicates, by inversion genotype. To generate a single, conservative tag SNP concordance statistic, we used the minimum of the three mean values. Note that because the mosquito composition differed among bootstrap replicates, some SNPs were not evaluated in all ten, if they did not pass our filters in one or more iterations. Finally, to eliminate SNP positions with high levels of missing genotypes, we also calculated for each inversion genotype in each bootstrap replicate the percentage of mosquitoes with SNP genotype calls at the candidate tag (the ‘call rate’), and averaged across the ten replicates. The procedure just described returned from 99 to 349 candidate tag SNPs for five inversions, but only two for 2Rc (Table 1). We therefore adopted a modified approach to control for suspected population structure. One possible source of structure was the haplotype configuration of 2Rc with respect to the flanking inversions (2Rb and 2Ru) (Figure 1). The inverted orientation of 2Rc is in almost perfect linkage disequilibrium with the inverted orientation of either 2Rb (as haplotype ‘2Rbc’) or 2Ru (as haplotype ‘2Rcu’). In a ∼50-year cytogenetic database compiled from samples collected in many parts of sub-Saharan Africa (described in Pombi ), only four specimens were ever recorded as carrying the inverted orientation of 2Rc unaccompanied by either 2Rb or 2Ru (V. Petrarca, unpublished data). A second source, not mutually exclusive, was population structure between An. coluzzii, An. gambiae, and the BAMAKO chromosomal form that is subsumed taxonomically within An. gambiae but is at least partially reproductively isolated and genetically differentiated (Manoukis ; Love ). Although 2Rc occurs in all three taxa, there is a strong karyotype imbalance among them in natural populations and in Ag1000G. For example, of 70 An. coluzzii with 2Rc in Ag1000G, at least 49 (70%) carried the 2Rbc haplotype (haplotypes of the other specimens could not be inferred unambiguously). Similarly, of 64 non-BAMAKO An. gambiae with 2Rc, 62 (97%) carried the 2Rbc haplotype. On the other hand, all 45 BAMAKO, by definition, carried 2Rcu. We initially partitioned our sample by species, but the inclusion of BAMAKO in the An. gambiae partition resulted in very few candidate tags concordant with inversion genotype (N = 17). Ultimately, we retained two data partitions (An. coluzzii and non-BAMAKO An. gambiae), eliminating a third BAMAKO partition due to the fixation of 2Rc in this taxon (Coluzzi ). From the non-BAMAKO An. gambiae partition (hereafter, An. gambiae for brevity), we omitted two of only three specimens carrying 2Ru (AZ0267-C from Mali and AV0043 C from Guinea), guided by PCA. As described above, both data partitions were split into training (75%) and validation (25%) sets, and ten bootstrap replicates of each training set were analyzed.
Table 1

Candidate tag SNPs predictive of inversion genotype in Ag1000G data

InversionConcordance ThresholdNo. Tags
2La>0.995209
2Rj-gambiae>0.899
2Rb>0.8349
2Ru>0.8177
2Rd-gambiae>0.8147
2Rc>0.82
2Rc-coluzzii>0.857
2Rc-gambiae>0.849
Ultimately, the candidate tag SNPs chosen (Table 1) met the following three criteria: they were (i) analyzed in at least eight of the ten bootstrap replicates; (ii) called at a rate greater than 90% within each karyotype class; and (iii) concordant with karyotype more than 80% of the time within each karyotype class (99.5% for 2La). Their approximate physical position relative to the span of each inversion is illustrated in Figure S1.

Validation of candidate tag SNPs in Ag1000G

We interrogated the candidate tag SNPs in the validation samples from Ag1000G that had been held aside during the discovery phase (Figure 3). For each mosquito in the validation set, we masked genotypes inside the focal inversion with GQ scores less than 20. Next, among the retained SNPs, we identified those corresponding to candidate tags and converted their diploid genotypes to a count of the number of alternate alleles. Finally, the number of alternate alleles at each tag SNP was summed across tags and averaged to provide an overall computational karyotype score. We compared this mean score to the PCA-based karyotype.

Testing tag SNPs in data independent of the Ag1000G pipeline

We also explored the efficacy of our tag SNPs for computational karyotyping in wild-caught mosquitoes subject to whole genome sequencing and variant calling by individual investigators, for which corresponding cytological karyotypes had been determined through phase microscopy (Figure 3). We used specimens originating from southern Mali, 8 An. gambiae BAMAKO chromosomal form (Fontaine ; Love ) and 17 An. coluzzii (Main ), whose variant calls and cytogenetic metadata are publicly accessible (Tables S5, S6). These data include specimens sequenced to much lower coverage than the standard adhered to by Ag1000G. We followed the same procedure described for the Ag1000G validation set to computationally karyotype these specimens, and compared their computational and cytologically determined karyotypes.

Genetic distance trees to assess inversion history

We compared patterns of relatedness near the breakpoints of all six inversions using unrooted neighbor-joining (NJ) trees. For each inversion, we used biallelic SNPs with a MAF of 0.01 found within 5 kb upstream and downstream of the distal and proximal breakpoints (15 kb for 2Rd). Total numbers of SNPs for each inversion were: 2La, 596; 2Rj, 909; 2Rb, 428; 2Rc, 2141; 2Rd, 955; 2Ru, 1110. Using the python package anhima, we converted the number of alternate alleles at these SNPs into a Euclidean distance matrix, and then constructed neighbor-joining trees using all 1,347 specimens. To assess support for the nodes of the 2Rc tree, we used the transfer bootstrap estimate (TBE; Lemoine ), a statistic that measures the number of taxa that must be transferred to make a given branch of a reference tree match the closest equivalent branch in a bootstrap tree. To calculate this statistic, we imported the matrix of alternate allele counts into R (v. 3.5.1, “Feather Spray”; R Core Team 2018) and used the dist() function of base R to construct the Euclidean distance matrix. We then used the nj() function in the ape package (v. 5.2) to construct the neighbor joining tree, and the boot.phylo() function to generate 1,000 bootstrap trees. We used these trees as input to booster (Lemoine ), which calculates the TBE for each node.

Code and data availability

All genomic sequence data and variant call files used in this study are located in open data repositories as specified in Tables S1 and S2. Phases 1 and 2 of Ag1000G are available from https://www.malariagen.net/data; variant calls and metadata for Ag1000G phase 3 and Vector Observatory data are available from https://figshare.com/projects/Data_for_In_silico_karyotyping_of_chromosomally_polymorphic_malaria_mosquitoes_in_the_Anopheles_gambiae_complex_/65522 under the Ag1000G terms of use: https://www.malariagen.net/data/terms-use/ag1000g-terms-use. The An. gambiae AgamP4 reference assembly is available through VectorBase (https://www.vectorbase.org). All custom code necessary to reproduce this analysis can be found at https://github.com/rrlove/comp_karyo_notebooks and https://github.com/rrlove/ingenos. Software packages used, and their versions, are summarized in Table S7. The complete set of tag SNPs, together with a custom script for computational karyotyping, which calculates the mean inversion genotype across the relevant tag SNPs, can be found at https://github.com/rrlove/compkaryo. Supplemental material available at Figshare: https://doi.org/10.25387/g3.9159479.

Results

After filtering, we retained the genotype data from 1,347 individually sequenced An. coluzzii and An. gambiae mosquitoes from the Ag1000G repository of natural genomic sequence variation, representing population samples from 13 West, Central, and East African countries and the island of Mayotte (Tables S1, S2).

Patterns of genetic variation at inversion breakpoints

To gain insight into the relative roles of inversion history, taxonomic status, and geographic location in structuring genetic variation for each inversion, we reconstructed neighbor-joining trees based on SNPs in the immediate vicinity of the breakpoints (Figure 4). The resulting dendrograms, color-coded by inversion genotype, taxon and African country, indicate little clustering on the basis of geographic location; outlier population samples are those with a history of inbreeding or hybridization (see Methods). On the other hand, with the notable exception of 2La, taxonomic status is an important factor structuring inversion variation between An. gambiae and An. coluzzii. Moreover, BAMAKO specimens appear to comprise a differentiated outlier clade within the larger An. gambiae cluster. It is interesting to note that for inversion 2Rc, taxonomic status appears to be a more decisive factor than inversion genotype. All three 2Rc inversion genotypes cluster within their respective species (supported by bootstrap at 90%, or 98% if dendrograms are constructed after removing outlier samples from The Gambia, Guinea-Bissau and Kenya; not shown). Further investigation is required to determine whether this pattern results from a monophyletic inversion that subsequently differentiated between taxa, or from independent inversion events in the two taxa.
Figure 4

Neighbor-joining dendrograms reconstructed from 1,347 An. gambiae and An. coluzzii mosquitoes from Ag1000G, using biallelic SNPs within 5 kb of inversion breakpoints (15 kb for 2Rd) having a minimum minor allele frequency of 0.01. Columns represent the same inversion dendrogram, alternately color-coded by inversion genotype as determined from PCA (first row), taxon (second row), or geographic source (third row). Some specimens that could not be karyotyped by PCA for inversions 2Rc, 2Rd, 2Rj, and 2Ru had cytogenetically determined karyotypes, which were used in place of PCA for color-coding the inversion genotype. ‘None’ refers to mosquitoes that were not assigned an inversion genotype either by PCA or cytogenetically; ‘Other’ refers to mosquitoes that were not identified taxonomically. Countries: AGO, Angola; BFA, Burkina Faso; CIV, Cote d’Ivoire; CMR, Cameroon; GAB, Gabon; GHA, Ghana; GIN, Guinea; GMB, The Gambia; GNB, Guinea-Bissau; GNQ, Equatorial Guinea; KEN, Kenya; MLI, Mali; FRA, France (Mayotte Archipelago); UGA, Uganda.

Neighbor-joining dendrograms reconstructed from 1,347 An. gambiae and An. coluzzii mosquitoes from Ag1000G, using biallelic SNPs within 5 kb of inversion breakpoints (15 kb for 2Rd) having a minimum minor allele frequency of 0.01. Columns represent the same inversion dendrogram, alternately color-coded by inversion genotype as determined from PCA (first row), taxon (second row), or geographic source (third row). Some specimens that could not be karyotyped by PCA for inversions 2Rc, 2Rd, 2Rj, and 2Ru had cytogenetically determined karyotypes, which were used in place of PCA for color-coding the inversion genotype. ‘None’ refers to mosquitoes that were not assigned an inversion genotype either by PCA or cytogenetically; ‘Other’ refers to mosquitoes that were not identified taxonomically. Countries: AGO, Angola; BFA, Burkina Faso; CIV, Cote d’Ivoire; CMR, Cameroon; GAB, Gabon; GHA, Ghana; GIN, Guinea; GMB, The Gambia; GNB, Guinea-Bissau; GNQ, Equatorial Guinea; KEN, Kenya; MLI, Mali; FRA, France (Mayotte Archipelago); UGA, Uganda.

Inversion karyotype imputation by PCA

Only 373 of the 1,347 mosquitoes were associated with metadata that included cytologically determined inversion karyotypes. As discovery of candidate tag SNPs requires provisional inversion genotype assignments, we applied local PCA to assign genotypes for individual inversions on chromosome 2, following Ma and Amos (2012). A recognized limitation to this population-level approach, beyond the fact that it cannot be applied to individual mosquitoes, is that its success depends upon the presence of all three inversion genotypes in the sample under study. For this reason, and with the goal of finding the most flexible solution to inversion genotyping across geography and taxa, we began with PCA based on the entire set of 1,347 mosquitoes, under the simplifying assumption that the expected ‘three-stripe’ signal on a PCA plot would not be overwhelmed by geographic or population structure. Only in the case of 2La could genotype assignments be confidently inferred from the combination of all 1,347 specimens. For inversions on 2R, from one to four admixed (An. gambiae-An. coluzzii) or geographic outlier populations (highly inbred or island samples) had to be removed from analysis before a three-genotype pattern could be discerned on the PCA plot (Tables S2, S4; see Methods). Additionally, for 2Rd and 2Rj, An. gambiae-An. coluzzii population structure dominated the PCA. Taken together with the fact that 2Rj has yet to be found in An. coluzzii (Coluzzi ; della Torre ), we removed all 341 An. coluzzii specimens (Tables S2, S4) prior to PCA karyotyping of 2Rd and 2Rj in An. gambiae. Ultimately, PCA karyotypes were imputed for 780-1,347 mosquitoes, depending upon the inversion (Table S4).

Tag SNP ascertainment and validation in Ag1000G

Dividing the Ag1000G samples into training (75%) and validation (25%) sets for each inversion, and working within the training sets using a bootstrapping procedure, we screened for candidate tag SNPs in the five 2R inversions and 2La (see Methods for details). Candidate tag SNPs were those whose genotypes were concordant with the corresponding PCA genotypes, when averaged across ten bootstrap replicates, for more than 80% of the specimens that were scored (99.5% for 2La). The number of candidate tags ranged from 99 (2Rj) to 349 (2Rb) excluding 2Rc, in which only two candidates were found due to population structure between An. gambiae and An. coluzzii (Figure 4; Table 1). Partitioning the 2Rc sample by taxon (and omitting BAMAKO; see Methods) resulted in 49 and 57 tags for An. gambiae and An. coluzzii, respectively (Table 1). Notably, there was no overlap between the two sets of tags. To assess the performance of these candidate tags, we used them to predict karyotypes in the held-out validation sets of Ag1000G specimens. For each inversion and specimen, we calculated a computational karyotype score representing the average genotype inferred across all candidate tag SNPs ascertained (see Methods). Histograms of resulting computational karyotype scores generally showed tight clustering around the three theoretical genotypic optima (0, 1, 2), reflecting close agreement among specimens (Figure 5). For each specimen in a validation set, we then compared the computational karyotype score to its PCA karyotype, and tallied the number of disagreements (Table 2). All except one specimen had matching PCA and computational karyotype scores. This exception, one of 254 (0.4%) assignments for 2Rc in An. gambiae, involved a specimen carrying 2Ru (AZ0267-C) already noted as an outlier (see Methods).
Figure 5

Histograms of computational karyotyping scores calculated by interrogating tag SNPs in An. gambiae and An. coluzzii mosquitoes from the Ag1000G validation sets. Mean scores cluster around 0, 1, and 2. Note differences in Y-axis scale.

Table 2

Mismatches between PCA and computational karyotypes in the Ag1000G validation sets

Matching karyotypesMismatched karyotypes
InversionTotal specimensNo. tags scoredNo. specimens% tags supporting scoreNo. Specimens% tags supporting score
2La337168-20333793.6-1000
2Rj-gambiae19594-9919583.8-1000
2Rb325304-34932577.5-97.70
2Rc-coluzzii8055-578078.9-1000
2Rc-gambiae19645-4919559.6-100167.3
2Rd-gambiae201128-14720155.2a-95.90
2Ru286124-17728676.6-1000

Next highest value is 70.1%.

Histograms of computational karyotyping scores calculated by interrogating tag SNPs in An. gambiae and An. coluzzii mosquitoes from the Ag1000G validation sets. Mean scores cluster around 0, 1, and 2. Note differences in Y-axis scale. Next highest value is 70.1%.

Performance of tag SNPs in resequencing data independent of Ag1000G

Previous studies re-sequenced An. gambiae or An. coluzzii mosquitoes from Mali whose karyotypes had been determined from the polytene chromosome banding pattern (Main ; Love ). Although sample size is limited, these data allow a direct comparison of cytogenetic and in silico karyotyping under less ideal conditions—lower sequencing depth, with variant calls made independently of the Ag1000G pipeline. For each specimen and inversion, we calculated computational karyotype scores (averaged across all tag SNPs that could be ascertained in a specimen) (Tables S5, S6). Histograms of these scores by inversion, similar to those based on Ag1000G validation sets, reveal clustering of scores around the three genotypic optima provided that taxon-specific tags (2Rc-coluzzii and 2Rc-gambiae) are applied to the conspecific taxon, and heterospecific applications (including use of 2Rc-gambiae tags to genotype BAMAKO) are avoided (Figure 6, Figure S2).
Figure 6

Histograms of computational karyotyping scores calculated by interrogating tag SNPs in An. gambiae and An. coluzzii mosquitoes re-sequenced independently of the Ag1000G pipeline, often at lower sequencing depth. Scores cluster near 0, 1, and 2 with little dispersion except when taxon-specific tag SNPs are applied to a different taxon (indicated by an asterisk).

Histograms of computational karyotyping scores calculated by interrogating tag SNPs in An. gambiae and An. coluzzii mosquitoes re-sequenced independently of the Ag1000G pipeline, often at lower sequencing depth. Scores cluster near 0, 1, and 2 with little dispersion except when taxon-specific tag SNPs are applied to a different taxon (indicated by an asterisk). In the BAMAKO sample of Love where mean sequencing depth ranged from 9-10x, there was concordance in karyotype assignments between cytogenetic and computational methods for five inversions including 2La, even though only 10-12 2La tags were ascertained (Tables S5, S6). However, as expected for BAMAKO, the An. gambiae 2Rc tags failed. Due to the extreme geospatial restriction of BAMAKO, this specific problem is limited in scope. In the An. coluzzii sample of Main , mean sequencing coverage varied widely (4-66x; Table S5). The impact of very low sequencing coverage on the success of computational karyotyping is illustrated by specimens 04SEL021 and 04SEL02 (4x and 5x, respectively). For 04SEL021, there is no apparent disagreement between the cytogenetic and mean computational genotype scores for any of the six inversions. Nevertheless, for those inversions classified as heterozygotes both cytogenetically and computationally (2La, 2Rb, 2Rc), the proportion of tags whose genotype matches the mean computational score drops drastically to ∼30% (Table S5), likely because true heterozygous sites are often scored as homozygous either for the reference or alternate allele (0 or 2) due to low sequencing coverage. (Indeed, using chromosome 3L, we confirmed the expected drop in the rate of heterozygosity with decreasing coverage in these 17 specimens; data not shown). Low coverage alone is less likely to bias computational scores toward zero or two. For 04SEL02 (5x coverage), where cytogenetic vs. computational discrepancies occur at 2Rb and 2Ru, the computational karyotype is supported respectively by 81% of 208 tags and >94% of 57 tags, favoring the computational genotype by weight of evidence. The remaining six specimens with discordant inversion genotypes were sequenced to at least 10x coverage. In these cases, when the computational genotype score signaled ‘1’ in contradiction to a homokaryotypic cytogenetic genotype (02SEL85, 02SEL006, 02SEL009, 01Osel134), the proportion of tags supporting the computational genotype ranged from 65 to >92%. For other types of genotypic disagreements between methods, the computational score was supported by >80% of tags scored. Overall, these results suggest that computational karyotyping using tag SNPs can be successful in data derived independently of Ag1000G (Tables S5, S6), though care should be taken when this approach is applied to very low coverage samples.

Performance of tag SNPs against cytogenetically karyotyped Ag1000G specimens

We compared the cytogenetic karyotype assignments for 373 specimens in Ag1000G to their corresponding computational karyotype assignments (Table 3). Conflicts were few overall, and for every inversion, all but one conflict (involving specimen AZ0267-C, the exceptional An. gambiae carrier of the 2Ru inversion) could be attributed to errors in the cytogenetically assigned scores, as genotypes imputed from both PCA and tag SNPs contradict the cytogenetic assignment. Visual reference back to the PCA plots clearly confirmed that for specimens whose cytogenetic and tag SNP assignments differed and for whom PCA karyotypes could be determined, their locations on the plot strongly agreed with the tag SNP genotype (Figure S3). Considering that we ascertained tens or hundreds of tags per specimen, and that the proportion of tags whose SNP genotype matched the computational score was greater than 83% in all except the unusual specimen AZ0267-C (Table 3), the computational scores more confidently predict the true inversion genotype than traditional cytogenetics for these five inversions. The most dramatic example is with respect to inversion 2Ru, where we noted an unusually large number of erroneous cytogenetic genotypes of ‘1’ (N = 18/29) conflicting with both PCA and computational assignments of ‘0’. It is not immediately clear what could lead to such an elevated rate of cytogenetic error (which otherwise is ∼4%), but it is possible that the 2Ru heterozygous loop was mistaken either for a loop created by a rare inversion (sensu Pombi ) in the same chromosomal region, or for a 2Rd loop in samples from regions where 2Ru is rare (as supported by the fact all 11 cytogenetic errors in An. gambiae were found in samples from the same small region in Cameroon, six of which were scored computationally as ‘1’ for 2Rd).
Table 3

Discrepancies between cytogenetic and computational karyotypes in Ag1000G mosquitoes analyzed

Specimens with discrepancies
Inversion TagsPartitionCYTSpecimens (N)Mismatch CYT-TAG (%)Match TAG-PCA (%)No. tag SNPs scored (% matching TAG)
2La01175 (4.3)5 (100)200-203 (99.5-100)
1685 (7.4)5 (100)193-203 (100)
21602 (1.3)2 (100)201-203 (99.5-100)
2Rj-gambiaegambiae02360 (0)
140 (0)
2450 (0)
2Rb01272 (1.6)2 (100)348 (85.3-87.6)
11244 (3.2)4 (100)346-349 (86.8-93.7)
21216 (5.0)6 (100)331-348 (88.8-91.6)
2Rc-gambiaegambiaea01847 (3.8)7 (100)48-49 (83.7-98.0)
1323 (9.4)2 (66.7)47-49 (42.9b-91.5)
2242 (8.3)2 (100)48-49 (90.0-91.8)
2Rc-coluzzicoluzzii0131 (7.7)1 (100)56 (87.5)
1250 (0)
2160 (0)
2Rd-gambiaegambiae02349 (3.8)9 (100)143-147 (84.9-96.6)
1284 (14.3)4 (100)146-147 (88.4-93.9)
2223 (13.6)3 (100)146-147 (89.1-91.8)
2Rucol+gam02631 (0.38)1 (100)176 (85.2)
12918 (62.1)18 (100)170-177 (88.7-99.4)
2471 (2.1)1 (100)176 (97.2)

CYT, cytogenetic genotype; TAG, computational genotype; PCA, genotype inferred by PCA.

An. gambiae excluding BAMAKO.

This value corresponds to one of three non-BAMAKO An. gambiae carriers of the 2Ru inversion, AZ0267-C. The next highest value is 85.7.

CYT, cytogenetic genotype; TAG, computational genotype; PCA, genotype inferred by PCA. An. gambiae excluding BAMAKO. This value corresponds to one of three non-BAMAKO An. gambiae carriers of the 2Ru inversion, AZ0267-C. The next highest value is 85.7. Our results also highlight the pitfalls of using taxon-specific tags to genotype other taxa, or populations with high levels of admixture between taxa (Table S8; Figure S4 summarizes the correct use of taxon-specific tags). As expected, we find elevated numbers of cytogenetic-computational disagreements when (i) 2Rc-gambiae tags are applied to BAMAKO (60% of the 45 specimens), (ii) 2Rd-gambiae tags are used to genotype An. coluzzii, and (iii) 2Rd-gambiae tags are applied to admixed An. gambiae-An. coluzzii populations such as those from Guinea Bissau. These disagreements involve specimens carrying inverted arrangements according to cytogenetic analysis which are not tagged as inverted computationally, due to the lack of correlation between tags and the inverted orientation in the heterospecific genetic background.

Discussion

In recent years, numerous approaches have been developed for the discovery and detection of chromosomal inversions aside from traditional cytogenetics, including optical mapping, Hi-C, read cloud sequencing, and identification of discordantly mapping sequencing reads. Our method is not intended to substitute for these largely discovery-based approaches. Here, we are concerned with specific inversions that segregate at relatively high frequencies in natural populations, and have a long history of cytogenetic study. Our approach benefits from the fact that the chromosomal coordinates of their breakpoints already are known precisely or to a good approximation. The goal was to overcome the substantial limitations of cytogenetic analysis by identifying SNPs inside these rearrangements whose allelic state is predictive of inversion status, allowing for rapid in silico or molecular karyotyping of hundreds or thousands of specimens. Analysis of the Ag1000G database allowed us to develop the first standardized computational karyotyping of the six main polymorphic chromosomal inversions in the major malaria vectors An. coluzzii and An. gambiae, despite the fact that only a small subset of specimens in the database had cytogenetic karyotype assignments (Figure 7). Direct comparison of computational karyotype scores with the cytogenetic assignment for the same specimen in Ag1000G suggests that computational karyotyping outperforms traditional cytogenetics in terms of accuracy, given that assignments are based on tens or hundreds of individual tags. Preliminary testing on specimens sequenced and computationally processed by individual laboratories outside of Ag1000G standards suggests that our tag SNPs have the potential to perform well, even on specimens sequenced to much lower depth. Our approach not only has a lower error rate compared to classical cytogenetics, but also is more widely applicable (regardless of mosquito gender, physiological status, or method of preservation), more widely accessible to those without specialized expertise, higher throughput, and therefore, ultimately cheaper to implement at scale. This method can now be used to predict inversion genotypes in previously sequenced data sets for which ecological and behavioral data may already be available. Even more important, easy large-scale adoption of this approach allows for new and properly powered association studies to be performed on ecologically and epidemiologically relevant mosquito phenotypes, studies that that would have been prohibitively ambitious when relying on cytogenetic karyotyping. In addition, this method can now facilitate sequencing experiments for which inversion karyotype is relevant at scale. Expanding the possibilities further, molecular assays based on these results that will allow inversion genotyping without whole genome sequencing are under development.
Figure 7

Map of the study area with the frequency of An. gambiae and An. coluzzii inversion genotypes inferred for up to six polymorphic chromosome 2 inversions summarized by country (and the island of Mayotte; see Table S2). Blue connecting lines point to An. coluzzii samples, while white connecting lines point to An. gambiae and hybrid/outlier populations. Image: Visible Earth, NASA. Produced with cartopy v.0.17.1.

Map of the study area with the frequency of An. gambiae and An. coluzzii inversion genotypes inferred for up to six polymorphic chromosome 2 inversions summarized by country (and the island of Mayotte; see Table S2). Blue connecting lines point to An. coluzzii samples, while white connecting lines point to An. gambiae and hybrid/outlier populations. Image: Visible Earth, NASA. Produced with cartopy v.0.17.1. However, some important limitations exist. Computational karyotyping is strictly dependent upon tag SNPs that are strongly correlated with inversion status, a contingency that depends upon representative sampling. Although Ag1000G is populated by samples derived from multiple countries in West, Central and East Africa, An. coluzzii is underrepresented, as is southern Africa (The ). Even more importantly, with the exception of the cosmopolitan inversions 2La and 2Rb, the inverted orientation of other rearrangements (2Rj, 2Rc, 2Rd, and 2Ru) is underrepresented in the Ag1000G data that was available at the time of our analysis. It is clear that population structure is an especially important factor in the application and further development of tags for 2Rc and 2Rd. The current taxon-specific tags should not be used to genotype heterospecific specimens (including BAMAKO) or samples from areas where high rates of interspecific hybridization are known. The presence of strong population structure means that correlations between tags and the inverted orientation characteristic of the target taxon cannot be assumed in a different taxon. The absence of correlation should downwardly bias the computational score, resulting in false negatives when genotyping true inverted homozygotes and heterozygotes. Finally, our inversion breakpoint dendrograms raise the possibility that at least one cytologically-recognized inversion, 2Rc, may have arisen repeatedly at the molecular level, a result that requires further investigation beyond the scope of this study. With the exception of 2Rc, 2Rd, and 2Rj, for which we developed taxon-specific tags, our approach implicitly assumed that inversions shared by An. gambiae and An. coluzzii are monophyletic, and may yield unexpected results if this assumption is violated. Accordingly, these tools should be applied with caution, and there is ample room for improvement as more data become available. Fortunately, our standardized approach makes it easy to accommodate improvements. The success of our method thus far suggests that the general approach may be suitable for studying inversions more broadly, in additional malaria vectors as well as other systems where inversions are implicated in local adaptation. Nearly twenty years ago, Coluzzi and colleagues predicted that the then-newly-assembled An. gambiae reference genome would facilitate our analyses of polymorphic chromosomal inversions in the An. gambiae complex (Coluzzi ). Our work continues the realization of that prediction by providing, for the first time, cross-continent diagnostics for multiple inversions. These computational diagnostics, and the molecular diagnostics that they leverage, take us one step closer to understanding the contribution of chromosomal inversions to the deadly facility of An. gambiae and An. coluzzii for vectoring malaria.
  67 in total

1.  Gene flow-dependent genomic divergence between Anopheles gambiae M and S forms.

Authors:  David Weetman; Craig S Wilding; Keith Steen; João Pinto; Martin J Donnelly
Journal:  Mol Biol Evol       Date:  2011-08-11       Impact factor: 16.240

2.  Asymmetric introgression between the M and S forms of the malaria vector, Anopheles gambiae, maintains divergence despite extensive hybridization.

Authors:  Clare D Marsden; Yoosook Lee; Catelyn C Nieman; Michelle R Sanford; Joao Dinis; Cesario Martins; Amabelia Rodrigues; Anthony J Cornel; Gregory C Lanzaro
Journal:  Mol Ecol       Date:  2011-11-08       Impact factor: 6.185

3.  Local adaptation along an environmental cline in a species with an inversion polymorphism.

Authors:  M Wellenreuther; H Rosenquist; P Jaksons; K W Larson
Journal:  J Evol Biol       Date:  2017-05-08       Impact factor: 2.411

4.  High levels of hybridization between molecular forms of Anopheles gambiae from Guinea Bissau.

Authors:  E Oliveira; P Salgueiro; K Palsson; J L Vicente; A P Arez; T G Jaenson; A Caccone; J Pinto
Journal:  J Med Entomol       Date:  2008-11       Impact factor: 2.278

5.  Breakpoint structure of the Anopheles gambiae 2Rb chromosomal inversion.

Authors:  Neil F Lobo; Djibril M Sangaré; Allison A Regier; Kyanne R Reidenbach; David A Bretz; Maria V Sharakhova; Scott J Emrich; Sekou F Traore; Carlo Costantini; Nora J Besansky; Frank H Collins
Journal:  Malar J       Date:  2010-10-25       Impact factor: 2.979

6.  Intraspecific chromosomal polymorphism in the Anopheles gambiae complex as a factor affecting malaria transmission in the Kisumu area of Kenya.

Authors:  V Petrarca; J C Beier
Journal:  Am J Trop Med Hyg       Date:  1992-02       Impact factor: 2.345

7.  Complex genome evolution in Anopheles coluzzii associated with increased insecticide usage in Mali.

Authors:  Bradley J Main; Yoosook Lee; Travis C Collier; Laura C Norris; Katherine Brisco; Abdrahamane Fofana; Anthony J Cornel; Gregory C Lanzaro
Journal:  Mol Ecol       Date:  2015-10-06       Impact factor: 6.185

8.  The Anopheles gambiae 2La chromosome inversion is associated with susceptibility to Plasmodium falciparum in Africa.

Authors:  Michelle M Riehle; Tullu Bukhari; Awa Gneme; Wamdaogo M Guelbeogo; Boubacar Coulibaly; Abdrahamane Fofana; Adrien Pain; Emmanuel Bischoff; Francois Renaud; Abdoul H Beavogui; Sekou F Traore; N'Fale Sagnon; Kenneth D Vernick
Journal:  Elife       Date:  2017-06-23       Impact factor: 8.140

9.  Spatial swarm segregation and reproductive isolation between the molecular forms of Anopheles gambiae.

Authors:  Abdoulaye Diabaté; Adama Dao; Alpha S Yaro; Abdoulaye Adamou; Rodrigo Gonzalez; Nicholas C Manoukis; Sékou F Traoré; Robert W Gwadz; Tovi Lehmann
Journal:  Proc Biol Sci       Date:  2009-09-04       Impact factor: 5.349

10.  Ecological niche partitioning between Anopheles gambiae molecular forms in Cameroon: the ecological side of speciation.

Authors:  Frédéric Simard; Diego Ayala; Guy Colince Kamdem; Marco Pombi; Joachim Etouna; Kenji Ose; Jean-Marie Fotsing; Didier Fontenille; Nora J Besansky; Carlo Costantini
Journal:  BMC Ecol       Date:  2009-05-21       Impact factor: 2.964

View more
  7 in total

1.  Fine-Mapping Complex Inversion Breakpoints and Investigating Somatic Pairing in the Anopheles gambiae Species Complex Using Proximity-Ligation Sequencing.

Authors:  Russell B Corbett-Detig; Iskander Said; Maria Calzetta; Max Genetti; Jakob McBroome; Nicholas W Maurer; Vincenzo Petrarca; Alessandra Della Torre; Nora J Besansky
Journal:  Genetics       Date:  2019-10-30       Impact factor: 4.562

2.  Identification of a rapidly-spreading triple mutant for high-level metabolic insecticide resistance in Anopheles gambiae provides a real-time molecular diagnostic for antimalarial intervention deployment.

Authors:  Harun Njoroge; Arjen Van't Hof; Ambrose Oruni; Dimitra Pipini; Sanjay C Nagi; Amy Lynd; Eric R Lucas; Sean Tomlinson; Xavi Grau-Bove; Daniel McDermott; Francis T Wat'senga; Emile Z Manzambi; Fiacre R Agossa; Arlette Mokuba; Seth Irish; Bilali Kabula; Charles Mbogo; Joel Bargul; Mark J I Paine; David Weetman; Martin J Donnelly
Journal:  Mol Ecol       Date:  2022-07-12       Impact factor: 6.622

3.  Inversion Genotyping in the Anopheles gambiae Complex Using High-Throughput Array and Sequencing Platforms.

Authors:  R Rebecca Love; Marco Pombi; Moussa W Guelbeogo; Nathan R Campbell; Melissa T Stephens; Roch K Dabire; Carlo Costantini; Alessandra Della Torre; Nora J Besansky
Journal:  G3 (Bethesda)       Date:  2020-09-02       Impact factor: 3.154

4.  Highly specific PCR-RFLP assays for karyotyping the widespread 2Rb inversion in malaria vectors of the Anopheles gambiae complex.

Authors:  Raquel Montanez-Gonzalez; Verena Pichler; Maria Calzetta; Rachel R Love; Alexandra Vallera; Lydia Schaecher; Beniamino Caputo; Marco Pombi; Vincenzo Petrarca; Alessandra Della Torre; Nora J Besansky
Journal:  Parasit Vectors       Date:  2020-01-10       Impact factor: 3.876

Review 5.  VectorBase.org updates: bioinformatic resources for invertebrate vectors of human pathogens and related organisms.

Authors:  Gloria I Giraldo-Calderón; Omar S Harb; Sarah A Kelly; Samuel Sc Rund; David S Roos; Mary Ann McDowell
Journal:  Curr Opin Insect Sci       Date:  2021-12-03       Impact factor: 5.254

6.  Detecting inversions with PCA in the presence of population structure.

Authors:  Ronald J Nowling; Krystal R Manke; Scott J Emrich
Journal:  PLoS One       Date:  2020-10-29       Impact factor: 3.240

7.  Fast, low-memory detection and localization of large, polymorphic inversions from SNPs.

Authors:  Ronald J Nowling; Fabian Fallas-Moya; Amir Sadovnik; Scott Emrich; Matthew Aleck; Daniel Leskiewicz; John G Peters
Journal:  PeerJ       Date:  2022-01-20       Impact factor: 2.984

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.