Mathilde Paris1,2, Roxane Boyer1,3, Rita Jaenichen4, Jochen Wolf4,5, Marianthi Karageorgi1,6, Jack Green1, Mathilde Cagnon4, Hugues Parinello7, Arnaud Estoup8, Mathieu Gautier9, Nicolas Gompel10, Benjamin Prud'homme11.
Abstract
Over the past decade, the spotted wing Drosophila, Drosophila suzukii, has invaded Europe and America and has become a major agricultural pest in these areas, thereby prompting intense research activities to better understand its biology. Two draft genome assemblies already exist for this species but contain pervasive assembly errors and are highly fragmented, which limits their value. Our purpose here was to improve the assembly of the D. suzukii genome and to annotate it in a way that facilitates comparisons with D. melanogaster. For this, we generated PacBio long-read sequencing data and assembled a novel, high-quality D. suzukii genome assembly. It is one of the largest Drosophila genomes, notably because of the expansion of its repeatome. We found that despite 16 rounds of full-sib crossings the D. suzukii strain that we sequenced has maintained high levels of polymorphism in some regions of its genome. As a consequence, the quality of the assembly of these regions was reduced. We explored possible origins of this high residual diversity, including the presence of structural variants and a possible heterogeneous admixture pattern of North American and Asian ancestry. Overall, our assembly and annotation constitute a high-quality genomic resource that can be used for both high-throughput sequencing approaches and manipulative genetic technologies to study D. suzukii.
Year: 2020 PMID: 32641717 PMCID: PMC7343843 DOI: 10.1038/s41598-020-67373-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) Genomic structure of the Orco gene in the Dsuz-WT3_v1.0 genome assembly[9] and in the Dsuz-WT3_v2.0 assembly (this article). Genomic structure in D. melanogaster is shown for comparison. The locus encompassing exon 1 is missing in the Dsuz-WT3_v1.0 assembly. (b) Number of nearly identical neighboring exons present in D. suzukii assemblies Dsuz-WT3_v1.0 and Dsuz-WT3_v2.0, and in the D. biarmipes Dbia_1.0 and D. melanogaster dm6 assemblies.
Figure 2. Co-linearity between the 20 longest contigs of the Dsuz-WT3_v2.0 assembly and D. melanogaster chromosome arms. The graph was built using VGSC[67] (details in the methods section). Colors represent synteny blocks automatically assigned by VGSC.
Figure 3. Location at the contig level of various genomic features on the Dsuz-WT3_v2.0 assembly. Regions assembled as distinct haplotypes, regions of higher nucleotide diversity, regions with structural variants, and regions with a higher rate of one-nucleotide assembly errors (in the form of indels) are highlighted. The criteria for defining the boundaries of “high indel rate” and “high nucleotide diversity” regions were as follows: a region is initialized if the rate is above 0.005 on at least 5 consecutive windows of 10 kb; the region is closed if the rate drops below 0.001 on at least 5 consecutive 10 kb windows. Using this rule, 64 regions with high indel rates (median length 140 kb) and 27 regions with high nucleotide diversity (median length 420 kb) were identified. Contigs are ordered according to their chromosomal assignment. Only the longest 100 contigs are shown, representing 79% of the total length of the assembly.
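The region-boundary rule in the Figure 3 legend is a two-threshold (hysteresis) scan over per-window rates, and can be sketched as follows. This is an illustrative sketch only, not the authors' code: the function name `call_regions` and its parameters are hypothetical, and two details the legend leaves open are assumed here, namely that a region starts at the first window of the opening run and ends just before the first window of the closing run.

```python
from typing import List, Sequence, Tuple


def call_regions(
    rates: Sequence[float],
    open_thr: float = 0.005,   # rate above which a window counts toward opening
    close_thr: float = 0.001,  # rate below which a window counts toward closing
    min_run: int = 5,          # consecutive windows needed to open or close
    window_bp: int = 10_000,   # window size (10 kb)
) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) regions, end exclusive, for one contig."""
    regions: List[Tuple[int, int]] = []
    start = None   # index of the first window of the currently open region
    high_run = 0   # consecutive windows above open_thr while no region is open
    low_run = 0    # consecutive windows below close_thr while a region is open

    for i, rate in enumerate(rates):
        if start is None:
            high_run = high_run + 1 if rate > open_thr else 0
            if high_run >= min_run:
                # Assumption: the region begins where the opening run began.
                start = i - min_run + 1
                low_run = 0
        else:
            low_run = low_run + 1 if rate < close_thr else 0
            if low_run >= min_run:
                # Assumption: the region ends where the closing run began.
                end = i - min_run + 1
                regions.append((start * window_bp, end * window_bp))
                start = None
                high_run = 0

    # A region still open at the end of the contig is closed at the contig end.
    if start is not None:
        regions.append((start * window_bp, len(rates) * window_bp))
    return regions


# Made-up per-window rates for a single contig (not data from the article).
example = [0.0] * 3 + [0.01] * 5 + [0.003] * 2 + [0.01] + [0.0] * 5 + [0.01] * 4
print(call_regions(example))  # → [(30000, 110000)]
```

In the example, intermediate rates (0.003) neither open nor close a region, and the final four high windows do not open one because the run is shorter than five windows.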
Figure 4. Comparison of nucleotide diversity among D. suzukii strains and populations, and among chromosomes. Nucleotide diversity was estimated from pools of individuals for a Chinese population (Nin), a Hawaiian population (Haw), the Watsonville population (Wat), the Dsuz-WT3_v1.0 strain[9] and the Dsuz-WT3_v2.0 strain (this study). Values of the nucleotide diversity parameter Theta (θ) were estimated over 10 kb windows (a) for all autosomes or (b) per chromosome. “2L”, “2R”, “3L”, “3R”, “4”, “X”: assigned D. melanogaster chromosome for each Dsuz-WT3_v2.0 contig. “A”: autosomal contigs with no clear corresponding D. melanogaster chromosome. “Unknown”: contigs for which chromosomal features (i.e., assignment to autosomal or X chromosomes and to D. melanogaster chromosome arm) remain unknown. Error bars correspond to S.E.M.