Jay Ghurye, Sergey Koren, Scott T Small, Seth Redmond, Paul Howell, Adam M Phillippy, Nora J Besansky.
Abstract
BACKGROUND: Anopheles funestus is one of the 3 most consequential and widespread vectors of human malaria in tropical Africa. However, the lack of a high-quality reference genome has hindered the association of phenotypic traits with their genetic basis in this important mosquito.
Keywords: Anopheles mosquito; DNA sequencing; Hi-C chromosome conformation capture; genome assembly; malaria
Year: 2019 PMID: 31157884 PMCID: PMC6545970 DOI: 10.1093/gigascience/giz063
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Assembly statistics for the A. funestus genome
| Assembly | Contigs: No. | Contigs: N50 (bp) | Contigs: maximum size (bp) | Scaffolds: No. | Scaffolds: N50 (bp) | Scaffolds: maximum size (bp) | Total assembly size (bp) | QV, Illumina (accuracy) | QV, 10X Genomics (accuracy) |
|---|---|---|---|---|---|---|---|---|---|
| AfunF1 | 9,880 | 60,925 | 563,645 | 1,392 | 671,960 | 3,832,769 | 225,223,604 | 38.93 (99.84%) | 22.69 (99.46%) |
| AfunF3 contigs | 10,245 | 94,259 | 7,564,979 | 9,175 | 238,902 | 99,362,816 | 446,039,041 | 29.82 (99.89%) | 28.18 (99.84%) |
| AfunF3 primary | 1,053 | 631,722 | 7,564,979 | 3 | 93,811,348 | 99,362,816 | 210,827,327 | 24.94 (99.64%) | 25.82 (99.73%) |
AfunF1 represents the prior reference assembly, AfunF3 contigs denotes the complete long-read assembly with all contigs included, and AfunF3 primary denotes the assembly after deduplication and scaffolding. The assembly quality value (QV) was estimated using Illumina or 10X Genomics data. QV (Illumina) is highest for the AfunF1 assembly because it is the same data used to generate that assembly, whereas QV (10X Genomics) is based on data from a single mosquito of the same FUMOZ colony. The numbers in parentheses in the QV columns denote the estimated accuracy of the assembly based on QV score.
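The N50 and QV columns follow widely used conventions, which the minimal sketch below illustrates. It assumes the usual definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly) and the Phred convention for quality values (error probability = 10^(-QV/10)). The function names `n50` and `qv_to_accuracy` are hypothetical, and the source does not spell out its formula, so the percentages in the table may have been derived or rounded differently.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the standard definition: walk the sequences from longest to
    shortest and report the length at which the running total first
    reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # only reached if every length is zero or negative


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy.

    Assumes the Phred convention: error probability = 10 ** (-QV / 10).
    """
    return 1.0 - 10.0 ** (-qv / 10.0)


# Toy lengths, not taken from the assemblies described here.
print(n50([2, 3, 4, 5, 6]))           # 5
print(f"{qv_to_accuracy(30.0):.2%}")  # 99.90%
```

Under this convention each 10-point increase in QV corresponds to a tenfold drop in the estimated per-base error rate.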
Figure 1: Circos plot comparing the AfunF1 assembly of A. funestus to the updated AfunF3 assembly. AfunF1 scaffolds (colored half of the outer ring) are ordered by majority alignment location onto AfunF3 (black half of the outer ring). Connecting lines indicate pairwise alignments between the 2 assemblies, and crossing lines indicate that part of the AfunF1 scaffold aligns to discordant regions on the AfunF3 chromosome. The first internal ring color corresponds to the AfunF1 scaffold color. The second internal ring represents the orientation of the AfunF1 scaffolds onto AfunF3, where orange is forward and green is reverse.
Validation of A. funestus genome assemblies using BUSCO gene set completeness, agreement of the assemblies with RNA-Seq transcriptome data, and structural accuracy inferred using PacBio long-read data
| Assembly | BUSCO: C/S | BUSCO: C/D | BUSCO: F | BUSCO: M | Transcriptome: alignment rate (%) | Transcriptome: multi-mapped reads (%) | Transcriptome: transcripts in a single contig (%) | Structural variants: deletions | Structural variants: duplications | Structural variants: inversions | Structural variants: insertions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AfunF1 | 2,756 | 16 | 27 | 16 | 81.79 | 23.92 | 84.96 | 9,036 | 455 | 152 | 3,798 |
| AfunF3 contigs | 2,765 | 1,068 | 18 | 17 | 84.34 | 36.97 | 91.16 | NA | NA | NA | NA |
| AfunF3 primary | 2,685 | 54 | 30 | 81 | 84.86 | 27.03 | 89.40 | 571 | 6 | 10 | 702 |
AfunF1 represents the prior reference assembly, AfunF3 contigs denotes the complete long-read assembly with all contigs included, and AfunF3 primary denotes the assembly after deduplication and scaffolding. For BUSCO categories C denotes “complete genes,” S denotes “single copy genes,” D denotes “duplicated genes,” F denotes “fragmented genes,” and M denotes “missing genes.”
Figure 2: Hi-C interaction map for assembled A. funestus scaffolds generated using the Juicebox Hi-C visualization program [47]. Darker colors indicate a higher frequency of chromatin interaction. The plot shows clear separation of chromosome boundaries and limited off-diagonal interactions, supporting the global structure of the chromosome-scale scaffolds. Note that the light-colored “cross” centered near the centromere of chromosome 3 is the repetitive rDNA locus, which could not be confidently placed using the Hi-C data alone and may require future correction using other mapping techniques (see Methods).
Figure 3: Whole-genome alignment dotplot for Anopheles funestus and Anopheles gambiae genomes generated using D-GENIES [54]. A dot in the plot corresponds to a match between the corresponding genomic positions indicated on the axes. The A. gambiae reference genome is displayed on the x-axis, and the A. funestus AfunF3 primary assembly on the y-axis. A reciprocal whole-arm translocation between 2L and 3R is apparent, as well as substantial intra-chromosomal shuffling between these genomes.
Figure 4: GC content versus coverage plot for all assembled A. funestus contigs. The orange points denote the contigs classified by Kraken as A. funestus and green points denote everything else. A majority of the contigs are classified as A. funestus by Kraken, and there is no indication of extensive contamination.