
A chromosome-scale assembly of the major African malaria vector Anopheles funestus.

Jay Ghurye, Sergey Koren, Scott T Small, Seth Redmond, Paul Howell, Adam M Phillippy, Nora J Besansky.

Abstract

BACKGROUND: Anopheles funestus is one of the 3 most consequential and widespread vectors of human malaria in tropical Africa. However, the lack of a high-quality reference genome has hindered the association of phenotypic traits with their genetic basis in this important mosquito.
FINDINGS: Here we present a new high-quality A. funestus reference genome (AfunF3) assembled using 240× coverage of long-read single-molecule sequencing for contigging, combined with 100× coverage of short-read Hi-C data for chromosome scaffolding. The assembled contigs total 446 Mbp of sequence and contain substantial duplication due to alternative alleles present in the sequenced pool of mosquitos from the FUMOZ colony. Using alignment and depth-of-coverage information, these contigs were deduplicated to a 211 Mbp primary assembly, which is closer to the expected haploid genome size of 250 Mbp. This primary assembly consists of 1,053 contigs organized into 3 chromosome-scale scaffolds with an N50 contig size of 632 kbp and an N50 scaffold size of 93.811 Mbp, representing a 100-fold improvement in continuity versus the current reference assembly, AfunF1.
CONCLUSION: This highly contiguous and complete A. funestus reference genome assembly will serve as an improved basis for future studies of genomic variation and organization in this important disease vector.
© The Author(s) 2019. Published by Oxford University Press.

Keywords:  Anopheles mosquito; DNA sequencing; Hi-C chromosome conformation capture; genome assembly; malaria

Year:  2019        PMID: 31157884      PMCID: PMC6545970          DOI: 10.1093/gigascience/giz063

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Data Description

Introduction and background

Many insect genomes remain a challenge to assemble, and mosquito genomes have proven particularly difficult owing to their repeat content and structurally dynamic genomes. These issues are compounded by the fact that long-read sequencing technologies typically require >10 μg of DNA for library construction. As a result, it is often impossible to construct a sequencing library from a single individual. Instead, it has been necessary to sequence a pool of individuals from an inbred population [1]. For species that are amenable to extensive inbreeding, this approach has led to reference-grade genomes directly from the assembler [2]. However, when inbreeding is not possible, the sequenced pool of individuals can carry population variation that fragments the resulting assembly. In this case, instead of assembling a single genome, the assembler must reconstruct some unknown number of variant haplotypes. Motivated by the goal of genome-enabled malaria control, a large international consortium previously sequenced and assembled the genomes of 16 Anopheles species using short-read Illumina sequencing [3, 4]. Although these draft assemblies represented a crucial first step, their potential for (i) understanding and manipulating vectorial capacity traits, (ii) inferring how key vector adaptations to hosts and habitats have arisen and are maintained, and (iii) accurately defining vector breeding units and migration between them is constrained by 2 major limitations. First, many of these Anopheles assemblies are highly fragmented collections of relatively short scaffolds, causing gene annotation problems such as missing genes, missing exons, and genes split between scaffolds or sequencing gaps. Thus, one of the consequences of fragmented assemblies is that it is difficult to estimate gene copy number, which may be linked to important phenotypic traits (e.g., insecticide resistance) [5, 6]. 
Genes of particular interest with respect to arthropod disease vectors (e.g., cytochrome P450s and odorant/gustatory receptors) may be especially prone to annotation errors because many belong to gene families whose members are often physically clustered into tandem arrays. A second major limitation of fragmented insect assemblies is that they are rarely scaffolded into chromosomes, owing to the difficulty of, and lack of funding for, physical or linkage mapping. Among other consequences, the unknown placement of scaffolds along chromosome arms means that their position within or outside of chromosomal inversions is difficult or impossible to determine. Many anopheline species are highly polymorphic for chromosomal inversions, which tend to occur disproportionately on particular chromosome arms [7–9]. In a heterozygote carrying 1 inverted and 1 uninverted chromosome, recombination between the reversed chromosomal segments is greatly reduced [10], creating cryptic population structure that can cause spurious associations in genome-wide association studies (GWAS) [11] and mislead recombination-based inference of selection and gene flow [12, 13]. Importantly, chromosomal inversions also directly or indirectly influence traits affecting malaria transmission intensity: anopheline biting and resting behavior [14, 15], seasonality [16], aridity tolerance [14, 17–21], ecological plasticity [22, 23], morphometric variation [24], and Plasmodium infection rates [25, 26]. Thus, correct population genomic and GWAS inferences depend upon knowing the location of a marker in the genome.
Anopheles funestus (NCBI:txid62324) is one of the 3 most important and widespread vectors of human malaria in tropical Africa [27–30], and unlike Anopheles gambiae, with which it broadly co-occurs, it is a relatively neglected species. It is considered even more highly anthropophilic and endophilic than A. gambiae and is amenable to conventional indoor-based vector control such as bed nets and indoor spraying of houses with residual insecticides. Indeed, historical house spraying campaigns in eastern and southern Africa not only locally eliminated this species, but the effect was maintained for several years following the cessation of spraying, due to the apparent inability of A. funestus to recolonize some areas. Likewise, A. funestus was eliminated from humid forest and degraded forest areas in West Africa where malaria is meso- or hypoendemic [31]. However, in the savanna environment of West Africa where malaria is holo- or hyperendemic, similar historical indoor spraying campaigns failed to eliminate the species. Exophilic populations persisted, which, despite marked anthropophily, continued to feed outdoors on cattle but also entered sprayed houses to bite humans. Today, the situation is worsened by the emergence and spread of insecticide resistance in this species [29, 32–34].
Mastery over malaria will require tackling A. funestus, but it remains understudied; information on its behavior and genetics lags far behind that of A. gambiae. At least part of the reason for its neglect may be the historical lack of laboratory colonies, a problem solved with the establishment of the FUMOZ colony and its registration with the Anopheles program of BEI Resources [35]. A. funestus shares with A. gambiae not only a broad sub-Saharan distribution and major vector status but also abundant chromosomal inversion polymorphism and shallow range-wide population structure [36]. However, there are behavioral and genetic heterogeneities relevant to malaria transmission that remain poorly understood. In West Africa, strong cytogenetic evidence points to cryptic, temporally stable, assortatively mating populations co-occurring in the same villages [37–40]. These chromosomally recognized forms of A. funestus, named Kiribina and Folonzo, seem to differ in larval ecology and, importantly, also differ in adult behaviors affecting vectorial capacity, most notably indoor resting behavior. Mechanistic understanding of the genomic determinants of these and other epidemiologically important phenotypic and behavioral traits ultimately depends on upgrading the A. funestus reference to a chromosome-based assembly in which the unanchored scaffolds are united, ordered, and oriented on chromosome arms.

Chromosome-scale assembly of Anopheles funestus

To achieve a complete and highly contiguous assembly of the A. funestus genome (AfunF3), we first assembled contigs from long, single-molecule reads and then scaffolded these contigs into chromosome-scale scaffolds using Hi-C proximity ligation data. A similar strategy was recently used to improve the genome of Aedes aegypti [41]. An initial assembly of the long-read data alone (AfunF3 contigs) yielded a contig N50 size of 94.26 kbp (N50 such that 50% of assembled bases are in contigs of this size or greater) and extensive haplotype separation, as evidenced by an inflated assembly size of 446.04 Mbp and a high rate of core gene duplications (38%) as measured by BUSCO [42]. These alternative alleles likely derive from natural variation circulating within the sequenced FUMOZ colony, as the DNA from a pool of adult mosquitoes was required for Pacific Biosciences (PacBio) library preparation. Identifying and removing duplicate contigs via an all-vs-all alignment reduced the primary assembly size to 211.75 Mbp and improved the N50 size to 631.72 kbp (Table 1).
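The N50 definition given above can be made concrete with a short sketch. The function name and the toy contig lengths are illustrative assumptions, not values or code from the assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths supplied")


# Toy contig lengths (hypothetical, not taken from AfunF3).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because the statistic depends only on the sorted lengths, the same function applies to contigs and to scaffolds.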
Table 1:

Assembly statistics for the A. funestus genome

Assembly | Contigs: No. | Contigs: N50 (bp) | Contigs: maximum size (bp) | Scaffolds: No. | Scaffolds: N50 (bp) | Scaffolds: maximum size (bp) | Total assembly size (bp) | QV (accuracy): Illumina | QV (accuracy): 10X Genomics
AfunF1 | 9,880 | 60,925 | 563,645 | 1,392 | 671,960 | 3,832,769 | 225,223,604 | 38.93 (99.84%) | 22.69 (99.46%)
AfunF3 contigs | 10,245 | 94,259 | 7,564,979 | 9,175 | 238,902 | 99,362,816 | 446,039,041 | 29.82 (99.89%) | 28.18 (99.84%)
AfunF3 primary | 1,053 | 631,722 | 7,564,979 | 3 | 93,811,348 | 99,362,816 | 210,827,327 | 24.94 (99.64%) | 25.82 (99.73%)

AfunF1 represents the prior reference assembly, AfunF3 contigs denotes the complete long-read assembly with all contigs included, and AfunF3 primary denotes the assembly after deduplication and scaffolding. The assembly quality value (QV) was estimated using Illumina or 10X Genomics data. QV (Illumina) is highest for the AfunF1 assembly because it is the same data used to generate that assembly, whereas QV (10X Genomics) is based on data from a single mosquito of the same FUMOZ colony. The numbers in parentheses in the QV columns denote the estimated accuracy of the assembly based on QV score.
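The relationship between a Phred-scaled QV and the accuracy shown in parentheses can be sketched as follows. This assumes the standard Phred definition, QV = -10 log10(error rate); the function names are illustrative, and the QV values are the 10X Genomics entries from Table 1.

```python
import math


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy (0..1)."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def accuracy_to_qv(accuracy):
    """Convert per-base accuracy (0..1, below 1) to a Phred-scaled quality value."""
    return -10.0 * math.log10(1.0 - accuracy)


# QV (10X Genomics) values as listed in Table 1.
table1_qv = {
    "AfunF1": 22.69,
    "AfunF3 contigs": 28.18,
    "AfunF3 primary": 25.82,
}

for assembly, qv in table1_qv.items():
    print(f"{assembly}: QV {qv} -> {100 * qv_to_accuracy(qv):.2f}% accuracy")
```

Small differences from the tabulated percentages can arise from rounding of the QV values before conversion.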

The primary set of contigs (excluding alternative alleles) was then scaffolded using Hi-C Illumina reads to first bin the contigs into 3 chromosomes, followed by ordering and orientation of the contigs using the Proximo method (Phase Genomics, Seattle, WA, USA). The final scaffolded assembly (AfunF3 primary) contains 210.82 Mbp of sequence and a scaffold N50 of 93.81 Mbp. The resulting scaffolds represent the entirety of the 3 A. funestus chromosomes: 2, 3, and X (Fig. 1).
Figure 1:

Circos plot comparing the AfunF1 assembly of A. funestus to the updated AfunF3 assembly. AfunF1 scaffolds (colored half of the outer ring) are ordered by majority alignment location onto AfunF3 (black half of the outer ring). Connecting lines indicate pairwise alignments between the 2 assemblies, and crossing lines indicate that part of the AfunF1 scaffold aligns to discordant regions on the AfunF3 chromosome. The first internal ring color corresponds to the AfunF1 scaffold color. The second internal ring represents the orientation of the AfunF1 scaffolds onto AfunF3, where orange is forward and green is reverse.

Because single-molecule PacBio data are prone to insertion and deletion errors, all AfunF3 contigs were polished twice with Arrow [43] using the signal-level PacBio data and once with Pilon [44] using paired-end Illumina data from the same FUMOZ colony. Because Illumina-based polishing tools typically do not correct bases that appear heterozygous in the read set, we anticipated that variation in the FUMOZ colony would prevent the correction of variant bases. To help address this issue, we finally polished the assembly using 10X Genomics Illumina data obtained from an individual mosquito. As an independent test of base accuracy, we compared our new assembly (AfunF3 primary) and the prior assembly (AfunF1) to a 10X Genomics dataset from a different individual mosquito. The average Phred-scaled quality value (QV) [45] of the new assembly was estimated as QV 28 (99.84% identity) versus QV 23 (99.49% identity) for the Illumina-based AfunF1 assembly. These independent data from a single mosquito of the FUMOZ colony indicate that the new AfunF3 assembly is of comparable accuracy to the prior Illumina-based assembly and that the small differences between quality estimates could be due to genetic diversity within the colony. We next evaluated the structural accuracy of the AfunF1 and AfunF3 assemblies by measuring their agreement with the raw PacBio reads. 
The intermediate assembly AfunF2 [46] was assembled before collection of all PacBio and Hi-C data and so was deemed redundant and excluded from these analyses. When compared to the raw data, the AfunF3 primary assembly had fewer called structural differences (insertions, deletions, duplications, and inversions) than AfunF1 (Table 2). Despite the substantial single-nucleotide polymorphism observed within the FUMOZ colony, no large polymorphic inversions could be identified from the combined PacBio, Hi-C, and 10X Genomics data. Comparison of the chromosome-scale AfunF3 primary assembly versus the A. gambiae reference genome (AgamP4) confirmed a known reciprocal whole-arm translocation between 2L and 3R, as well as substantial intra-chromosomal shuffling (Fig. 3). AfunF3 contigs also had fewer fragmented BUSCO core genes and a similar number of complete BUSCOs compared to AfunF1 (Table 2) but also a high rate of duplication. The AfunF3 primary scaffolds reduce duplication at the expense of lower BUSCO completeness.
Table 2:

Validation of A. funestus genome assemblies using BUSCO gene set completeness, agreement of the assemblies with RNA-Seq transcriptome data, and structural accuracy inferred using PacBio long-read data

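Assembly | BUSCO: C/S | BUSCO: C/D | BUSCO: F | BUSCO: M | Transcriptome: alignment rate (%) | Transcriptome: multi-mapped reads (%) | Transcriptome: transcripts in a single contig (%) | Structural variants: deletions | Structural variants: duplications | Structural variants: inversions | Structural variants: insertions
AfunF1 | 2,756 | 16 | 27 | 16 | 81.79 | 23.92 | 84.96 | 9,036 | 455 | 152 | 3,798
AfunF3 contigs | 2,765 | 1,068 | 18 | 17 | 84.34 | 36.97 | 91.16 | NA | NA | NA | NA
AfunF3 primary | 2,685 | 54 | 30 | 81 | 84.86 | 27.03 | 89.40 | 571 | 6 | 10 | 702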

AfunF1 represents the prior reference assembly, AfunF3 contigs denotes the complete long-read assembly with all contigs included, and AfunF3 primary denotes the assembly after deduplication and scaffolding. For BUSCO categories C denotes “complete genes,” S denotes “single copy genes,” D denotes “duplicated genes,” F denotes “fragmented genes,” and M denotes “missing genes.”
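To put the BUSCO duplication counts on a common scale, the duplicated count can be expressed as a share of the complete BUSCO genes. The sketch below assumes that the first BUSCO column of Table 2 counts all complete genes (single-copy plus duplicated) and that the second counts the duplicated subset; the function name is illustrative.

```python
def duplicated_share(complete, duplicated):
    """Percentage of complete BUSCO genes found in more than one copy."""
    if complete <= 0:
        raise ValueError("complete must be positive")
    return 100.0 * duplicated / complete


# (complete, duplicated) counts as read from Table 2 (parsing is an assumption).
busco_counts = {
    "AfunF1": (2756, 16),
    "AfunF3 contigs": (2765, 1068),
    "AfunF3 primary": (2685, 54),
}

for assembly, (complete, duplicated) in busco_counts.items():
    share = duplicated_share(complete, duplicated)
    print(f"{assembly}: {share:.1f}% of complete BUSCOs duplicated")
```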

Figure 2:

Hi-C interaction map for assembled A. funestus scaffolds generated using the Juicebox Hi-C visualization program [47]. Darker colors indicate a higher frequency of chromatin interaction. The plot shows clear separation of chromosome boundaries and limited off-diagonal interactions, supporting the global structure of the chromosome-scale scaffolds. Note that the light colored “cross” centered near the centromere of chromosome 3 is the repetitive rDNA locus, which could not be confidently placed using the Hi-C data alone and may require future correction using other mapping techniques (see Methods).

To further evaluate AfunF3’s suitability as an updated reference for A. funestus, we mapped RNA-sequencing (RNA-Seq) expression data to the assemblies and computed the number of concordant paired-end reads. A better assembly is expected to have both a higher fraction of mapped reads (completeness) as well as a higher fraction of correctly spaced and oriented pairs (structural accuracy). Both primary and complete AfunF3 assemblies have better agreement of mapped read pairs as well as a higher overall mapping rate versus the AfunF1 assembly (Table 2). The AfunF3 contigs do have a higher rate of multi-mapping RNA-Seq reads, but this is reduced in the primary assembly while preserving the high mapping rate. 
In addition to a higher mapping rate, more complete transcripts were mapped to single contigs within the long-read assemblies. The average number of complete transcripts contained per contig was 67.38 for AfunF3 primary versus 5.28 for the AfunF1 assembly. These results demonstrate the greater continuity of the updated assembly, which provides sequence-resolved reconstructions of many A. funestus intergenic regions for the first time.

Discussion

Anopheles funestus is one of the leading vectors of malaria, and understanding the organization and function of its genome is key to controlling this deadly disease. Herein we describe a chromosome-scale assembly of the A. funestus genome using multiple sequencing technologies and assembly methods. The tremendous improvement in the completeness and contiguity of its genome will provide a valuable resource for future genomic analyses and functional characterization of this important species and enable a mechanistic understanding of the genomic determinants of epidemiologically important phenotypic and behavioral traits.

Materials and Methods

Library preparation and sequencing

A gravid female mosquito of the FUMOZ colony was allowed to lay eggs, and her offspring were inbred for a single generation. From this, an isofemale line was grown and DNA was extracted from the adult females for sequencing with PacBio and Hi-C. A total of 46 single-molecule real-time (SMRT) cells of PacBio RSII sequencing using the P6-C4 chemistry were run by the core facility at the Icahn School of Medicine at Mount Sinai (New York, NY), resulting in 173× coverage (assuming a 250-Mbp genome size). A previous study generated 70× coverage of the same colony using the older PacBio P5-C3 chemistry [46]. These older data were combined with the additional 173× coverage, totaling 60.95 Gb of long-read data in 10.93 million sequences (average length 5.6 kb, N50 read length 8.4 kb) and an estimated total coverage of ∼243×. Two Hi-C libraries were prepared and sequenced (one from mixed-sex larvae, the second from adult females) by Phase Genomics (Seattle, WA, USA), resulting in ∼100× coverage of Illumina Hi-C data containing ∼187 million 80-bp paired-end Illumina reads.
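Fold coverage and mean read length follow directly from the reported totals. The sketch below uses the long-read figures stated above (60.95 Gb, 10.93 million sequences) and the assumed 250-Mbp genome size; the function and variable names are illustrative.

```python
def fold_coverage(total_bases, genome_size):
    """Sequenced bases divided by the (assumed) haploid genome size."""
    return total_bases / genome_size


def mean_read_length(total_bases, read_count):
    """Average read length in bases."""
    return total_bases / read_count


long_read_bases = 60.95e9      # 60.95 Gb of PacBio data, as reported
long_read_count = 10.93e6      # 10.93 million sequences, as reported
assumed_genome_size = 250e6    # 250-Mbp genome size assumed in the text

print(f"coverage: {fold_coverage(long_read_bases, assumed_genome_size):.1f}x")
print(f"mean read length: {mean_read_length(long_read_bases, long_read_count):.0f} bp")
```

The coverage figure scales inversely with the assumed genome size, so it would be higher if computed against the 211-Mbp primary assembly instead.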

Assembly and scaffolding

PacBio contig assembly was performed with Canu v1.3 (Canu, RRID:SCR_015880) [48] using the following parameters: corOutCoverage=100 genomeSize=250m errorRate=0.013 batOptions="-dg 3 -db 3 -dr 1 -ca 500 -cp 50". The resulting contigs were then polished with Arrow [43] using default parameters and the P6-C4 PacBio signal data (because Arrow does not support the older P5-C3 data). After polishing, the assembly was separated into primary and alternative contigs to remove unnecessarily duplicated alleles from the AfunF3 contigs. This was performed using 2 different approaches. First, contigs containing ≥1 complete BUSCO gene were identified. For each BUSCO gene found in ≥2 contigs, the contig with the highest alignment score was kept as the primary. Next, all contigs not containing a BUSCO gene but assembled with high coverage (>40×) were added to the primary set.
To order and orient the primary contigs along the chromosomes, Hi-C reads were aligned using Bowtie2 (Bowtie, RRID:SCR_005476) [49] and scaffolding was performed using Proximo (Phase Genomics, Seattle, WA, USA). Scaffold gaps spanned by PacBio reads were filled using PBJelly (PBJelly, RRID:SCR_012091) [50]. This assembly was again run through Arrow to polish the sequences inserted by PBJelly and fill any remaining short gaps. The Hi-C assembled scaffolds were then aligned using NUCmer [51] to the AfunF1 contigs for validation, and the alignments were visualized using Circos (Circos, RRID:SCR_011798) [52] and mummerplot. This identified a mis-join of chromosome arms 3R and X, which was manually corrected. Additional manual curation using mapped transcripts, fluorescence in situ hybridization (FISH) probes [46], and comparison to AfunF1 scaffolds identified a few additional inversion errors in the scaffolds, mainly on distal 2L. Visual inspection of the Hi-C data showed clear signatures of scaffolding error. 
These errors were corrected by manually extracting the region and placing the sequence at the correct locus, as indicated by the Hi-C interactions. After these corrections, the scaffolded chromosomes (AfunF3 primary) show good agreement with the Hi-C data (Fig. 2). The largest remaining ambiguity in the Hi-C map is the placement of the ribosomal DNA (rDNA) locus, which is placed near the centromere of chromosome 3 in the AfunF3 assembly. Given that the rDNA locus in A. gambiae is known to be on the X chromosome [53], this is possibly a mis-assembly in AfunF3 mediated by the increased proportion of repetitive transposable elements surrounding the rDNA and centromeres. However, there was insufficient long-read or Hi-C evidence to confidently place this highly repetitive locus in AfunF3, which may require correcting in future A. funestus assemblies.
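The primary/alternative contig separation described in this section can be sketched in simplified form. This is an illustration under stated assumptions, not the authors' code: the input structures, function name, and toy values are hypothetical, and a contig that carries any BUSCO gene but never has the best-scoring copy is treated as alternative.

```python
def select_primary_contigs(busco_hits, contig_coverage, min_coverage=40.0):
    """Pick primary contigs with a BUSCO rule followed by a coverage rule.

    busco_hits: iterable of (busco_id, contig_id, alignment_score) tuples
        for complete BUSCO genes (hypothetical input format).
    contig_coverage: dict mapping contig_id to mean read depth.
    """
    best_hit = {}              # busco_id -> (contig_id, score)
    contigs_with_busco = set()
    for busco_id, contig_id, score in busco_hits:
        contigs_with_busco.add(contig_id)
        if busco_id not in best_hit or score > best_hit[busco_id][1]:
            best_hit[busco_id] = (contig_id, score)

    # Rule 1: keep the best-scoring contig for every BUSCO gene.
    primary = {contig_id for contig_id, _ in best_hit.values()}

    # Rule 2: add BUSCO-free contigs assembled with high coverage.
    for contig_id, depth in contig_coverage.items():
        if contig_id not in contigs_with_busco and depth > min_coverage:
            primary.add(contig_id)
    return primary


# Toy data (hypothetical contig names, scores, and depths).
toy_hits = [("B1", "ctgA", 900), ("B1", "ctgB", 850), ("B2", "ctgC", 700)]
toy_depth = {"ctgA": 180, "ctgB": 90, "ctgC": 175, "ctgD": 60, "ctgE": 12}
print(sorted(select_primary_contigs(toy_hits, toy_depth)))
```

In the toy data, ctgB is set aside as an alternative allele of ctgA, and ctgE is excluded for low coverage.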
Figure 3:

Whole-genome alignment dotplot for Anopheles funestus and Anopheles gambiae genomes generated using D-GENIES [54]. A dot in the plot corresponds to a match between the corresponding genomic positions indicated on the axes. The A. gambiae reference genome is displayed on the x-axis, and the A. funestus AfunF3 primary assembly on the y-axis. A reciprocal whole-arm translocation between 2L and 3R is apparent, as well as substantial intra-chromosomal shuffling between these genomes.

Because diploid and population variation introduces indels in the Arrow polishing process [55], the final assemblies were also polished by Pilon using paired-end Illumina data (NCBI SRA accession numbers: SRX209628 and SRX209387) and 10X Genomics Illumina data from a single individual (NCBI SRA accession number: SRX4819916). The paired-end Illumina data were mapped using BWA-MEM [56] and the 10X Genomics data mapped using Lariat [57] in a barcode-aware manner, so as to improve the mapping quality. Consensus quality of the final assemblies was then estimated using an independent 10X Genomics dataset (NCBI SRA accession number: SRX4819903) of a different mosquito of the same FUMOZ colony. Based on the alignment of reads to the assembly, variants were called using freebayes (parameters: -C 2 -0 -O -q 20 -z 0.10 -E 0 -X -u -p 2 -F 0.5), and the assembly QV was estimated using called homozygous variants (i.e., positions where nearly all Illumina reads agreed with each other yet disagreed with the assembly).
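One common way to turn homozygous variant calls into a consensus QV is to treat the variant bases as assembly errors and Phred-scale their rate. The sketch below assumes that convention; how the error bases and the assessed bases were actually counted is not specified here, so both inputs and the function name are hypothetical.

```python
import math


def estimate_assembly_qv(error_bases, assessed_bases):
    """Phred-scaled QV from a count of erroneous bases.

    error_bases: bases covered by homozygous variant calls (assumed errors).
    assessed_bases: assembly bases with enough read support to be judged.
    """
    if assessed_bases <= 0:
        raise ValueError("assessed_bases must be positive")
    if error_bases < 0:
        raise ValueError("error_bases must be non-negative")
    if error_bases == 0:
        return math.inf  # no observed errors at this depth of evidence
    return -10.0 * math.log10(error_bases / assessed_bases)


# Toy numbers (hypothetical): 200,000 error bases in 200 Mbp assessed.
print(f"QV {estimate_assembly_qv(200_000, 200_000_000):.2f}")
```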

Validation

To check for the presence of contamination, assembled contigs were classified using Kraken [58] with a custom database including all microbial RefSeq genomes and all available mosquito genomes. Most of the assembled sequence (96.00%) was classified as A. funestus or Culicidae. The remaining sequences were either unannotated or annotated only at a higher taxonomic level (3.76%) or assigned to possible bacterial/human sources (0.24%, 32 contigs); they had slightly lower guanine-cytosine (GC) content (Fig. 4). However, none of these contigs were called contaminants by NCBI's independent contamination check, and so all contigs were included in the submitted assembly to avoid excluding novel mosquito sequence missing from the prior draft assemblies.
Figure 4:

GC content versus coverage plot for all assembled A. funestus contigs. The orange points denote the contigs classified by Kraken as A. funestus and green points denote everything else. A majority of the contigs are classified as A. funestus by Kraken, and there is no indication of extensive contamination.

The structural accuracy of the assemblies was evaluated by mapping raw PacBio reads and calling structural variants. PacBio reads were aligned to each assembly using NGMLR [59] with the following parameters: -t 16 -x pacbio --skip-write. Using these alignments, variants were called using Sniffles [59] with the following parameters: -t 32 -s 10 -f 0.25. To avoid capturing heterozygous population variants, the calls were then filtered: only variants for which the alternate variant had ≥45 supporting reads and the assembly variant had <10 supporting reads were called as assembly errors.
Paired-end RNA-Seq for the A. funestus FUMOZ colony was downloaded from NCBI under accession SRR826832. These reads were aligned to all assemblies using the HISAT2 aligner (HISAT2, RRID:SCR_015530) [60] and assembled into transcripts using Trinity (Trinity, RRID:SCR_013048) [61] with default parameters. The assembled transcripts were then mapped to all assemblies using GMAP (GMAP, RRID:SCR_008992) [62]. Transcripts were required to be aligned over 90% of their length to a single contig to be considered “complete” in the assembly.
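The two thresholds used in this validation can be written as simple predicates. The sketch below is illustrative only: the record type and field names are hypothetical rather than Sniffles or GMAP output fields, and treating the 90% criterion as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SVCall:
    sv_type: str        # e.g. "DEL", "INS", "DUP", "INV"
    alt_support: int    # reads supporting the alternate (non-assembly) allele
    ref_support: int    # reads supporting the allele present in the assembly


def is_assembly_error(call, min_alt=45, max_ref=10):
    """Flag a call as an assembly error: strong alternate support, weak assembly support."""
    return call.alt_support >= min_alt and call.ref_support < max_ref


def is_complete_transcript(aligned_bases, transcript_length, min_fraction=0.90):
    """A transcript counts as complete if enough of its length aligns to one contig."""
    return aligned_bases / transcript_length >= min_fraction


# Toy calls (hypothetical read counts).
toy_calls = [
    SVCall("DEL", alt_support=60, ref_support=2),    # likely assembly error
    SVCall("INS", alt_support=50, ref_support=40),   # likely heterozygous variant
    SVCall("INV", alt_support=12, ref_support=1),    # too little evidence
]
print([call.sv_type for call in toy_calls if is_assembly_error(call)])
```

Requiring low support for the assembly allele is what separates true errors from variants that are merely segregating in the colony.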

Availability of supporting data and materials

Raw genomic sequence reads are available in the NCBI Sequence Read Archive under project accession PRJNA494870. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession RCWQ00000000. The version described in this paper is version RCWQ01000000. Supporting data and materials are available in the GigaScience GigaDB database [63].

Abbreviations

bp: base pairs; BUSCO: Benchmarking Universal Single-Copy Orthologs; FISH: fluorescence in situ hybridization; GC: guanine-cytosine; GWAS: genome-wide association studies; kbp: kilobase pairs; Mbp: megabase pairs; NUCmer: NUCleotide MUMmer; PacBio: Pacific Biosciences; rDNA: ribosomal DNA; RNA-Seq: RNA-sequencing; NCBI: National Center for Biotechnology Information; QV: quality value; SMRT: single-molecule real-time; SRA: Sequence Read Archive.

Competing interests

The authors declare that they have no competing interests.

Funding

Physical mapping and data production were supported by the US National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) grant R21 AI112734 to N.J.B. S.T.S. and N.J.B. received support from NIAID grant R21 AI123491 and Target Malaria, which receives core funding from the Bill & Melinda Gates Foundation and from the Open Philanthropy Project Fund, an advised fund of Silicon Valley Community Foundation. J.G., S.K., and A.M.P. were supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. This work utilized the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov).

Authors' contributions

A.M.P. and N.J.B. conceived and coordinated the project. J.G., S.K., S.T.S., and A.M.P. performed the genome assembly, validation, and comparative analyses. S.R. provided the 10X Genomics data and analysis. P.H. provided FUMOZ samples for sequencing. J.G., A.M.P., and N.J.B. drafted the manuscript. All the authors have read and approved the manuscript.
