The genome sequence of Gymnosoma rotundatum (Linnaeus, 1758), a parasitoid ladybird fly.

Matthew Smith

Abstract

We present a genome assembly from an individual male Gymnosoma rotundatum (Arthropoda; Insecta; Diptera; Tachinidae). The genome sequence is 779 megabases in span. The majority of the assembly (97.07%) is scaffolded into six chromosomal pseudomolecules, with the X sex chromosome assembled.

Copyright: © 2022 Smith M et al.

Keywords:  Diptera; Gymnosoma rotundatum; chromosomal; genome sequence

Year:  2022        PMID: 35419492      PMCID: PMC8987346          DOI: 10.12688/wellcomeopenres.17782.1

Source DB:  PubMed          Journal:  Wellcome Open Res        ISSN: 2398-502X


Species taxonomy

Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Oestroidea; Tachinidae; Phasiinae; Gymnosomatini; Gymnosoma; Gymnosoma rotundatum (Linnaeus, 1758) (NCBI:txid569046).

Background

The Tachinid flies (Diptera: Tachinidae) are one of the largest families of flies. The entire family is parasitic, with the larvae developing as internal parasites in a range of hosts, mostly insects. Gymnosoma rotundatum (Diptera: Tachinidae) is a small, 5-6-mm-long fly with a dark thorax dusted with gold in males, and a globular red or orange abdomen decorated with dark markings along the midline. This shape and colouration have given rise to the common name "ladybird flies" for various Gymnosoma species.

Gymnosoma rotundatum is known from Britain and Ireland (Belshaw, 1993). In Ireland, the species has only been recorded from a few localities in the south, with the most recent record from County Kerry in 2015. In Britain, the species has historically been regarded as rare, and was accorded Red Data Book 3 status by Falk (1991). Morris (1997) summarised the known British records and distribution of the species up to 1996, noting that G. rotundatum appeared to be "seemingly confined to a narrow corridor from the West Sussex coast through Surrey and parts of North Hampshire". Gymnosoma rotundatum appears to be one of the species benefiting from the warming climate in the UK, and since 1996 it has been increasingly recorded away from its restricted historical range. It is now known from a large number of sites in central southern and south-east England, with a few recent records from East Anglia.

Gymnosoma rotundatum is a parasite of Shieldbugs (Hemiptera: Pentatomidae), though specific host details are limited. Tschorsnig & Herting (1994) only cite "Pentatomidae" and Belshaw (1993) lists Palomena spp. as a host, although there are no confirmed British rearing records. Adult flies are on the wing from late April until early October, with records peaking in August. The species is most often recorded from warm dry sites, where it visits a range of open shallow flowers such as Hogweed (Heracleum sphondylium), Yarrow (Achillea millefolium) and Mayweeds (Tripleurospermum sp.).

Genome sequence report

The genome was sequenced from a single male G. rotundatum (Figure 1) collected from Hartslock Reserve, Oxfordshire, UK (latitude 51.511263, longitude -1.112222). A total of 31-fold coverage in Pacific Biosciences single-molecule long reads and 32-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 191 missing/misjoins and removed 3 haplotypic duplications, reducing the assembly size by 0.11% and the scaffold number by 23.88%, and increasing the scaffold N50 by 14.22%.

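The curation percentages can be turned back into approximate pre-curation values. The following is a minimal sketch, assuming each percentage is expressed relative to the pre-curation value and taking the final figures reported in Table 1 (392 scaffolds, scaffold N50 of 137.8 Mb, span of 779 Mb); the function name is illustrative and not part of any published pipeline.

```python
def pre_curation_value(final_value: float, percent_change: float) -> float:
    """Back-calculate a starting value from a final value and a signed percent change.

    Assumes the percent change is relative to the starting value:
    final = start * (1 + percent_change / 100).
    """
    return final_value / (1.0 + percent_change / 100.0)


# Final values come from Table 1; percent changes from the curation summary.
scaffolds_before = pre_curation_value(392, -23.88)     # scaffold count fell by 23.88%
n50_before_mb = pre_curation_value(137.8, +14.22)      # scaffold N50 rose by 14.22%
span_before_mb = pre_curation_value(779.0, -0.11)      # assembly size fell by 0.11%

print(f"scaffolds before curation: about {round(scaffolds_before)}")
print(f"scaffold N50 before curation: about {n50_before_mb:.1f} Mb")
print(f"span before curation: about {span_before_mb:.1f} Mb")
```

Under these assumptions the draft assembly would have had roughly 515 scaffolds, which is consistent with the reported 23.88% reduction to 392.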
Figure 1.

Images of the Gymnosoma rotundatum specimen, taken during preservation and processing.

Left, lateral view; right, dorsal view.

The final assembly has a total length of 779 Mb in 392 sequence scaffolds with a scaffold N50 of 137.8 Mb (Table 1). The majority, 97.07%, of the assembly sequence was assigned to 6 chromosomal-level scaffolds, representing 5 autosomes (numbered by sequence length), and the X sex chromosome (Figure 2–Figure 5; Table 2). The X chromosome has been identified based on half diploid coverage. There are a large number of unassigned scaffolds that may belong to X or Y, as we are uncertain whether the karyotype is X0 or XY. The assembly has a BUSCO v5.2.2 (Manni et al., 2021) completeness of 98.8% (single 98.3%, duplicated 0.4%) using the diptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.

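Identifying a sex chromosome from "half diploid coverage" can be illustrated with a small sketch. The depths below are hypothetical values chosen for illustration, not measurements from this assembly, and the function name and tolerance are assumptions; the idea is simply that in a male, a hemizygous X should show about half the read depth of the autosomes.

```python
from statistics import median


def flag_half_coverage(coverage: dict, tolerance: float = 0.15) -> list:
    """Return scaffold names whose depth is close to half the median depth.

    coverage maps scaffold name -> mean read depth.
    tolerance is the allowed deviation of the depth ratio from 0.5.
    """
    baseline = median(coverage.values())  # assumed to reflect autosomal (diploid) depth
    flagged = []
    for name, depth in coverage.items():
        ratio = depth / baseline
        if abs(ratio - 0.5) <= tolerance:
            flagged.append(name)
    return flagged


# Hypothetical mean depths per chromosome-level scaffold (illustrative only).
example_depths = {
    "chr1": 31.2,
    "chr2": 30.5,
    "chr3": 30.9,
    "chr4": 31.4,
    "chr5": 30.8,
    "chrX": 15.6,
}

print(flag_half_coverage(example_depths))
```

Using the median as the baseline keeps a single low-depth scaffold from dragging the reference depth down; a real analysis would also need to handle repeats and unplaced scaffolds, which this sketch ignores.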
Table 1.

Genome data for Gymnosoma rotundatum, idGymRotn1.2.

Project accession data
    Assembly identifier                 idGymRotn1.2
    Species                             Gymnosoma rotundatum
    Specimen                            idGymRotn1
    NCBI taxonomy ID                    569046
    BioProject                          PRJEB46301
    BioSample ID                        SAMEA7849381
    Isolate information                 Male, thorax (genome assembly), head (Hi-C)
Raw data accessions
    PacificBiosciences SEQUEL II        ERR6939227
    10X Genomics Illumina               ERR6688431-ERR6688434
    Hi-C Illumina                       ERR6688430
Genome assembly
    Assembly accession                  GCA_916610165.2
    Accession of alternate haplotype    GCA_916610175.2
    Span (Mb)                           779
    Number of contigs                   623
    Contig N50 length (Mb)              9.4
    Number of scaffolds                 392
    Scaffold N50 length (Mb)            137.8
    Longest scaffold (Mb)               182.0
    BUSCO* genome score                 C:98.8%[S:98.3%,D:0.4%],F:0.5%,M:0.7%,n:3285

*BUSCO scores based on the diptera_odb10 BUSCO set using v5.2.2. C= complete [S= single copy, D=duplicated], F=fragmented, M=missing, n=number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/idGymRotn1.2/dataset/CAKAJB02/busco.
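The compact BUSCO notation in Table 1 can be unpacked programmatically. The sketch below is an illustrative parser, not part of BUSCO or BlobToolKit; it reads the summary string as reported and checks that the complete, fragmented and missing categories account for the whole set, allowing for rounding.

```python
import re


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO short summary such as 'C:98.8%[S:98.3%,D:0.4%],F:0.5%,M:0.7%,n:3285'.

    Returns a dict mapping each category letter to a float
    (percentages for C, S, D, F, M; a count for n).
    """
    pattern = re.compile(r"([CSDFMn]):([\d.]+)%?")
    return {key: float(value) for key, value in pattern.findall(summary)}


scores = parse_busco("C:98.8%[S:98.3%,D:0.4%],F:0.5%,M:0.7%,n:3285")

# Complete + fragmented + missing should cover the full set, up to rounding.
accounted_for = scores["C"] + scores["F"] + scores["M"]

print(scores)
print(f"C + F + M = {accounted_for:.1f}%")
```

Note that S and D (98.3% and 0.4%) sum to 98.7% rather than the reported 98.8% for C; a difference of this size is what independent rounding of each percentage would produce.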

Figure 2.

Genome assembly of Gymnosoma rotundatum, idGymRotn1.2: metrics.

The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 779,146,119 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (182,003,241 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (137,798,182 and 132,556,942 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idGymRotn1.2/dataset/CAKAJB02/snail.

Figure 5.

Genome assembly of Gymnosoma rotundatum, idGymRotn1.2: Hi-C contact map.

Hi-C contact map of the idGymRotn1.2 assembly, visualised in HiGlass. Chromosomes are presented in order of size from left to right and top to bottom. An interactive version of this figure is available here.

Table 2.

Chromosomal pseudomolecules in the genome assembly of Gymnosoma rotundatum, idGymRotn1.2.

INSDC accession    Chromosome    Size (Mb)    GC%
OU744336.1         1             182.00       30.1
OU744337.1         2             152.51       30.3
OU744338.1         3             137.80       30.1
OU744339.1         4             134.05       30.3
OU744340.1         5             132.56       30.1
OU744341.1         X             17.34        32.5
OU744342.1         MT            0.02         18.9
-                  Unplaced      22.87        32.2
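The sizes in Table 2 are enough to reproduce the headline scaffold statistics. The sketch below is illustrative (the helper name is an assumption): it computes N50 and N90 as the length of the scaffold at which the running total, largest first, reaches 50% or 90% of the assembly span. The unplaced scaffolds are not listed individually, but since they total only 22.87 Mb, none can be larger than chromosome 5, so they do not affect either statistic.

```python
def nx(lengths, total, fraction):
    """Return the scaffold length at which the cumulative sum (largest first)
    first reaches fraction * total, or None if the listed lengths fall short."""
    target = total * fraction
    running = 0.0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


# Sizes in Mb, taken from Table 2.
chromosomes_mb = {"1": 182.00, "2": 152.51, "3": 137.80,
                  "4": 134.05, "5": 132.56, "X": 17.34}
mitochondrion_mb = 0.02
unplaced_mb = 22.87

chromosomal_mb = sum(chromosomes_mb.values())
total_mb = chromosomal_mb + mitochondrion_mb + unplaced_mb

n50 = nx(chromosomes_mb.values(), total_mb, 0.5)
n90 = nx(chromosomes_mb.values(), total_mb, 0.9)
chromosomal_percent = 100.0 * chromosomal_mb / total_mb

print(f"scaffold N50: {n50:.2f} Mb")
print(f"scaffold N90: {n90:.2f} Mb")
print(f"assigned to chromosomes: {chromosomal_percent:.2f}%")
```

The N50 and N90 fall on chromosomes 3 and 5 respectively, matching the values given in the Figure 2 caption. The chromosomal percentage comes out very slightly below the reported 97.07% because the table sizes are rounded to two decimal places.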
Figure 3.

Genome assembly of Gymnosoma rotundatum, idGymRotn1.2: GC coverage.

BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idGymRotn1.2/dataset/CAKAJB02/blob.

Figure 4.

Genome assembly of Gymnosoma rotundatum, idGymRotn1.2: cumulative sequence.

BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/idGymRotn1.2/dataset/CAKAJB02/cumulative.

Methods

Sample acquisition and nucleic acid extraction

A male G. rotundatum (idGymRotn1) was collected from Hartslock Reserve, Oxfordshire, UK (latitude 51.511263, longitude -1.112222) by Matt Smith, independent researcher, who also identified the specimens. The specimens were collected from grassland using a net and snap-frozen in liquid nitrogen.

DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute. The idGymRotn1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Thorax tissue was disrupted using a Nippi Powermasher fitted with a BioMasher pestle. Fragment size analysis of 0.01-0.5 ng of DNA was then performed using an Agilent FemtoPulse. High molecular weight (HMW) DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit. Low molecular weight DNA was removed from a 200-ng aliquot of extracted DNA using a 0.8X AMPure XP purification kit prior to 10X Chromium sequencing; a minimum of 50 ng DNA was submitted for 10X sequencing. HMW DNA was sheared into an average fragment size of 12–20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8X ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.

Sequencing

Pacific Biosciences HiFi circular consensus and 10X Genomics Chromium read cloud sequencing libraries were constructed according to the manufacturers’ instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina NovaSeq 6000 instruments. Hi-C data were generated from head tissue of idGymRotn1 using the Arima Hi-C+ kit and sequenced on a NovaSeq 6000 instrument.

Genome assembly

Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021), which performed annotation using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.

Table 3.

Software tools used.

Software tool       Version               Source
Hifiasm             0.15.1                Cheng et al., 2021
purge_dups          1.2.3                 Guan et al., 2020
SALSA2              2.2                   Ghurye et al., 2019
longranger align    2.2.2                 https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines
freebayes           1.3.1-17-gaa2ace8     Garrison & Marth, 2012
MitoHiFi            2.0                   Uliano-Silva et al., 2021
HiGlass             1.11.6                Kerpedjiev et al., 2018
PretextView         0.2.x                 https://github.com/wtsi-hpag/PretextView
BlobToolKit         3.0.5                 Challis et al., 2020

Ethics/compliance issues

The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.

Data availability

European Nucleotide Archive: Gymnosoma rotundatum. Accession number PRJEB46301; https://identifiers.org/ena.embl/PRJEB46301.

The genome sequence is released openly for reuse. The G. rotundatum genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Reviewer report 1

This paper presents the genome of a Tachinid fly, sequenced and assembled using state-of-the-art techniques. I have very little to add.

In the introduction, some natural history information is given, but the reason for picking this species for genome sequencing is not discussed. The fact that the huge Tachinidae group is poorly represented among parasitoid insects with genome sequences available might be mentioned. Parasitoids are important model systems for various research topics as well as biological control agents. However, genomics resources are limited, except for parasitoid wasps.

As the authors are undoubtedly aware, it would have been nice to have some (short-read) data from a female of the same species. That would allow identification of Y-chromosomal contigs, as these would have very few of the female reads mapping to them.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? No
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes

Reviewer Expertise: evolutionary genomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer report 2

Thank you for the opportunity to review the data note manuscript by Matthew Smith et al. on the genome sequence of Gymnosoma rotundatum (Diptera: Tachinidae). The approach taken by DToL can be seen as a benchmark for many other genome sequencing initiatives for providing a state-of-the-art, chromosomally scaffolded reference genome for the species. I am not sure how many exist for true flies, but to my knowledge this is the first for the family Tachinidae (parasitic flies) and certainly the first for the subfamily of bug-killing flies (Phasiinae). I have only a few minor suggestions to improve the content of the manuscript.

Title: "parasitoid ladybird fly" is a bit awkward. It is true that it refers to the vernacular (although not widely adopted) English name of the species, but it might also mislead a quick reader into thinking of ladybird beetles (Coccinellidae). I would suggest something along the lines of "The genome sequence of a bug killing fly, Gymnosoma rotundatum (Linnaeus, 1758) (Diptera, Tachinidae)". Spelling out the family name will help those who are interested in tachinids in general but are unfamiliar with the genus to find the paper.

Background: Nit-picking, but I would put the size of G. rotundatum at 5-9 mm. Of note, there are taxonomic uncertainties in the genus due to morphological variation and DNA barcode sharing (see DOI: 10.1111/syen.12450), for which the reference genome will probably bring more resolution. Fortunately, the specimen illustrated in Figure 1 is photographed in a way that shows enough detail to convince me that it really represents our current understanding of the taxon under the name Gymnosoma rotundatum. Taxonomically the publication is also on safe ground: if synonyms are established in the future, G. rotundatum is the oldest name in the genus. There is also a comprehensive host catalogue for Palearctic Tachinidae here: http://www.nadsdiptera.org/Tach/WorldTachs/CatPalHosts/Home.html, which could be cited for the host records. As there are many different host species, they could be summed up simply as "suitable sized shield bugs (Pentatomidae) in the fly's habitat".

Genome sequence report: You might be able to infer the existence of a Y chromosome indirectly (most calyptrate flies follow the XY system of sex determination) by looking for the dominant male-determining factor in your sequence data (e.g. PLoS Biol.  13(4), e1002078). The fact that you have a haploid X strongly suggests the XY-male type, as there are other examples of Oestroidea with homomorphic sex chromosomes (https://doi.org/10.1038/s41598-020-72880-0).

Figs. 2-5: These look very good. I especially enjoyed the chromosome-level mapping in Fig. 5 and its interactive version.

Table 2: It is worth noting that the X chromosome being quite small compared with the others is in line with some Calliphoridae (doi: 10.3897/CompCytogen.v9i1.8671).

Sample acquisition & nucleic acid extraction: The collection date is missing. "Specimens" is in the plural; are there others? Was no RNA extracted? Was the whole specimen destroyed in the DNA extractions or is there some reference tissue left? Where (and how) is this stored and how can it be located? If the reference specimen still exists, describe all associated labels. For later morphological analysis it would be great to preserve at least the male genitalia as a voucher. The voucher should be placed in a public collection. I assume that this probably applies to all specimens in DToL but it needs to be described here.

Ethics/compliance: Was a collection permit required for the Hartslock Reserve? If so, please provide details.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes

Reviewer Expertise: molecular biology, genetics, population genetics, evolution, Diptera taxonomy

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References

1.  A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

Authors:  Suhas S P Rao; Miriam H Huntley; Neva C Durand; Elena K Stamenova; Ivan D Bochkov; James T Robinson; Adrian L Sanborn; Ido Machol; Arina D Omer; Eric S Lander; Erez Lieberman Aiden
Journal:  Cell       Date:  2014-12-11

2.  Numerous transitions of sex chromosomes in Diptera.

Authors:  Beatriz Vicoso; Doris Bachtrog
Journal:  PLoS Biol       Date:  2015-04-16

3.  Comparative study of mitotic chromosomes in two blowflies, Lucilia sericata and L. cluvia (Diptera, Calliphoridae), by C- and G-like banding patterns and rRNA loci, and implications for karyotype evolution.

Authors:  Mónica G Chirino; Luis F Rossi; María J Bressa; Juan P Luaces; María S Merani
Journal:  Comp Cytogenet       Date:  2015-03-31

4.  Significantly improving the quality of genome assemblies through curation.

Authors:  Kerstin Howe; William Chow; Joanna Collins; Sarah Pelan; Damon-Lee Pointon; Ying Sims; James Torrance; Alan Tracey; Jonathan Wood
Journal:  Gigascience       Date:  2021-01-09

5.  Identifying and removing haplotypic duplication in primary genome assemblies.

Authors:  Dengfeng Guan; Shane A McCarthy; Jonathan Wood; Kerstin Howe; Yadong Wang; Richard Durbin
Journal:  Bioinformatics       Date:  2020-05-01

6.  HiGlass: web-based visual exploration and analysis of genome interaction maps.

Authors:  Peter Kerpedjiev; Nezar Abdennur; Fritz Lekschas; Chuck McCallum; Kasper Dinkla; Hendrik Strobelt; Jacob M Luber; Scott B Ouellette; Alaleh Azhir; Nikhil Kumar; Jeewon Hwang; Soohyun Lee; Burak H Alver; Hanspeter Pfister; Leonid A Mirny; Peter J Park; Nils Gehlenborg
Journal:  Genome Biol       Date:  2018-08-24

7.  Genomic Resources for Goniozus legneri, Aleochara bilineata and Paykullia maculata, Representing Three Independent Origins of the Parasitoid Lifestyle in Insects.

Authors:  Ken Kraaijeveld; Peter Neleman; Janine Mariën; Emile de Meijer; Jacintha Ellers
Journal:  G3 (Bethesda)       Date:  2019-04-09

8.  MitoFinder: Efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics.

Authors:  Rémi Allio; Alex Schomaker-Bastos; Jonathan Romiguier; Francisco Prosdocimi; Benoit Nabholz; Frédéric Delsuc
Journal:  Mol Ecol Resour       Date:  2020-04-25

9.  The genomes of a monogenic fly: views of primitive sex chromosomes.

Authors:  Anne A Andere; Meaghan L Pimsler; Aaron M Tarone; Christine J Picard
Journal:  Sci Rep       Date:  2020-09-25

10.  BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes.

Authors:  Mosè Manni; Matthew R Berkeley; Mathieu Seppey; Felipe A Simão; Evgeny M Zdobnov
Journal:  Mol Biol Evol       Date:  2021-09-27
