The genome sequence of the heath fritillary, Melitaea athalia (Rottemburg, 1775).

Alex Hayward, Roger Vila, Dominik R Laetsch, Konrad Lohse, Tobias Baril.

Abstract

We present a genome assembly from an individual female Melitaea athalia (also known as Mellicta athalia; the heath fritillary; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 610 megabases in span. In total, 99.98% of the assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled. Gene annotation of this assembly on Ensembl has identified 12,824 protein-coding genes.

Copyright: © 2021 Hayward A et al.

Keywords:  Melitaea athalia; Mellicta athalia; chromosomal; genome sequence; heath fritillary

Year:  2021        PMID: 35136843      PMCID: PMC8796007          DOI: 10.12688/wellcomeopenres.17280.1

Source DB:  PubMed          Journal:  Wellcome Open Res        ISSN: 2398-502X


Species taxonomy

Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Lepidoptera; Glossata; Ditrysia; Papilionoidea; Nymphalidae; Nymphalinae; Melitaea athalia (also known as Mellicta athalia) (Rottemburg, 1775) (NCBI:txid113330).

Introduction

The heath fritillary, Melitaea athalia (also known as Mellicta athalia), is a small to medium-sized butterfly found throughout the Palaearctic from western Europe to Japan. Historically, the species has been linked with the traditional practice of woodland coppicing, earning it the nickname of ‘Woodman’s Follower’. M. athalia is one of the UK’s rarest butterflies and was on the brink of extinction during the 1970s, but conservation efforts have since helped to save the species (Warren, 1987). In the UK, M. athalia is restricted to grasslands in Cornwall and Devon, heathland in Exmoor, and coppiced woodland in Kent and Essex (Tomlinson & Still, 2002), and is a species of principal importance under the Natural Environment and Rural Communities Act 2006. However, it is listed as Least Concern in the IUCN Red List (Europe) (van Swaay). Up to eight forms and subspecies are recognized in Europe (Tolman & Lewington, 1997). The taxon celadussa Fruhstorfer, 1910, originally described as a subspecies of athalia from southwestern Europe, is now recognized by many authors as a distinct parapatric species, with a contact zone extending from France to Austria where hybrids are found (Wiemers). Fennoscandian and southern European alpine subspecies are univoltine, flying in a single brood (June-July), whilst subalpine subspecies are bivoltine, flying during May-June and late July-August (Tolman & Lewington, 1997). Females of M. athalia lay eggs in batches on the underside of leaves of a wide range of herbaceous food plants, with caterpillars feeding, aestivating, and hibernating together in silk nests (Wahlberg, 2000). The standard haploid karyotype of M. athalia consists of 30 autosomes and one sex chromosome (Bátori), and the female is heterogametic (WZ).

Genome sequence report

The genome was sequenced from a single female M. athalia collected from Lupşa, Transylvania, Romania (latitude 46.416, longitude 23.192) (Figure 1). A total of 30-fold coverage in Pacific Biosciences single-molecule long reads (N50 16 kb) and 64-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 82 missing/misjoins and removed 19 haplotypic duplications, reducing the assembly size by 1.94% and scaffold number by 45.12%, and increasing the scaffold N50 by 7.20%.
Figure 1.

Fore and hind wings of Melitaea athalia specimen from which the genome was sequenced.

(A) Dorsal surface view of wings from specimen RO_MA_953 (ilMelAtha1.1) from Lupşa, Transylvania, Romania, used to generate Pacific Biosciences and 10X Genomics data. (B) Ventral surface view of wings from specimen RO_MA_953 (ilMelAtha1.1) from Lupşa, Transylvania, Romania, used to generate Pacific Biosciences and 10X Genomics data.

The final assembly has a total length of 610 Mb in 46 sequence scaffolds with a scaffold N50 of 20 Mb (Table 1). Of the assembly sequence, 99.98% was assigned to 32 chromosomal-level scaffolds, representing 30 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 2–Figure 5; Table 2). The assembly has a BUSCO (Simão) completeness of 98.6% (single 97.9%, duplicated 0.7%, fragmented 0.4%, missing 1.0%) using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
Table 1.

Genome data for Melitaea athalia, ilMelAtha1.1.

Project accession data
  Assembly identifier                   ilMelAtha1.1
  Species                               Melitaea athalia (also known as Mellicta athalia)
  Specimen                              ilMelAtha1, RO_MA_953
  NCBI taxonomy ID                      NCBI:txid113330
  BioProject                            PRJEB42954
  BioSample ID                          SAMEA7523312
  Isolate information                   Female, whole organism

Raw data accessions
  PacificBiosciences SEQUEL II          ERR6576319
  10X Genomics Illumina                 ERR6054423-ERR6054426
  Hi-C Illumina                         ERR6054427

Genome assembly
  Assembly accession                    GCA_905163435.1
  Accession of alternate haplotype      GCA_905163405.1
  Span (Mb)                             576
  Number of contigs                     70
  Contig N50 length (Mb)                18
  Number of scaffolds                   43
  Scaffold N50 length (Mb)              19
  Longest scaffold (Mb)                 23
  BUSCO* genome score                   C:98.6%[S:97.9%,D:0.7%],F:0.4%,M:1.0%,n:5286

Gene annotation
  Number of protein coding genes        12,824
  Average coding sequence length (bp)   1,492
  Average number of exons per transcript  8
  Average exon size (bp)                264
  Average intron size (bp)              2,892

*BUSCO scores are based on the lepidoptera_odb10 BUSCO set using v5.1.2. C = complete [S = single copy, D = duplicated], F = fragmented, M = missing, n = number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/ilMelAtha1.1/dataset/CAJNAG01/busco.
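As an illustration of how the compact BUSCO score string in Table 1 can be read, the following minimal Python sketch splits it into its components and checks that the categories are internally consistent (complete = single copy + duplicated; complete + fragmented + missing = 100%). The function name parse_busco is hypothetical and is not part of BUSCO or BlobToolKit; the input string is copied from Table 1.

```python
import re


def parse_busco(summary: str) -> dict:
    """Split a BUSCO short summary such as 'C:98.6%[S:97.9%,D:0.7%],F:0.4%,M:1.0%,n:5286'.

    Hypothetical helper for illustration; not part of BUSCO itself.
    """
    # Percentages for Complete, Single copy, Duplicated, Fragmented, Missing.
    scores = {key: float(value) for key, value in re.findall(r"([CSDFM]):([\d.]+)%", summary)}
    # n is the number of orthologues in the comparison set.
    n_match = re.search(r"n:(\d+)", summary)
    if n_match:
        scores["n"] = int(n_match.group(1))
    return scores


scores = parse_busco("C:98.6%[S:97.9%,D:0.7%],F:0.4%,M:1.0%,n:5286")

# Complete is the sum of single-copy and duplicated orthologues.
complete_ok = abs(scores["S"] + scores["D"] - scores["C"]) < 1e-6
# Complete, fragmented and missing together account for every orthologue.
total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 1e-6

print(scores)
print("C = S + D:", complete_ok)
print("C + F + M = 100:", total_ok)
```

Both checks hold for the reported string, which is a quick way to catch transcription errors when copying BUSCO scores between tables.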

Figure 2.

Genome assembly of Melitaea athalia, ilMelAtha1.1: metrics.

The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 609,564,789 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (26,233,870 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (20,295,254 and 13,271,753 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilMelAtha1.1/dataset/CAJNAG01/snail.

Figure 5.

Genome assembly of Melitaea athalia, ilMelAtha1.1: Hi-C contact map.

Hi-C contact map of the ilMelAtha1.1 assembly, visualised in HiGlass. Chromosomes are arranged in size order from left to right and top to bottom.

Table 2.

Chromosomal pseudomolecules in the genome assembly of Melitaea athalia, ilMelAtha1.1.

INSDC accession   Chromosome   Size (Mb)   GC%
HG992177.1        1            25.13       34.6
HG992178.1        2            24.88       34.4
HG992179.1        3            23.65       34.6
HG992180.1        4            22.93       34.2
HG992181.1        5            22.90       34.1
HG992182.1        6            22.79       34.5
HG992183.1        7            21.87       34.4
HG992184.1        8            21.42       34.3
HG992185.1        9            21.39       34.2
HG992186.1        10           21.38       34.1
HG992187.1        11           21.23       34.4
HG992188.1        12           20.51       34.3
HG992189.1        13           20.30       34.8
HG992190.1        14           20.21       34.2
HG992191.1        15           19.99       34.3
HG992192.1        16           19.82       34.6
HG992193.1        17           19.64       34.5
HG992194.1        18           19.55       34.7
HG992195.1        19           18.44       34.9
HG992196.1        20           18.37       35
HG992197.1        21           17.06       34.4
HG992198.1        22           16.62       34.6
HG992199.1        23           15.22       34.7
HG992200.1        24           15.15       36.8
HG992201.1        25           14.97       35
HG992202.1        26           14.40       36.2
HG992203.1        27           13.27       34.7
HG992204.1        28           12.76       34.9
HG992205.1        29           11.97       35.8
HG992206.1        30           10.90       36.1
HG992207.1        W            5.27        37.4
HG992176.1        Z            26.23       34
HG992208.1        MT           0.02        19.7
-                 Unplaced     9.34        37.2
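To make the N50 statistic concrete, the following minimal Python sketch recomputes it from the chromosome sizes in Table 2. It is an approximation under stated assumptions: the sizes are the rounded Mb values from the table, only the 32 chromosomal pseudomolecules are included (the mitochondrial genome is left out and the unplaced scaffolds are not listed individually), and the names chromosome_sizes_mb and n50 are hypothetical. With these inputs the sketch gives a chromosomal span of about 600.22 Mb and an N50 of 20.30 Mb, in line with the 20,295,254 bp scaffold N50 quoted in the Figure 2 legend.

```python
# Sizes in Mb, copied from Table 2 (rounded to two decimals there).
chromosome_sizes_mb = {
    "1": 25.13, "2": 24.88, "3": 23.65, "4": 22.93, "5": 22.90,
    "6": 22.79, "7": 21.87, "8": 21.42, "9": 21.39, "10": 21.38,
    "11": 21.23, "12": 20.51, "13": 20.30, "14": 20.21, "15": 19.99,
    "16": 19.82, "17": 19.64, "18": 19.55, "19": 18.44, "20": 18.37,
    "21": 17.06, "22": 16.62, "23": 15.22, "24": 15.15, "25": 14.97,
    "26": 14.40, "27": 13.27, "28": 12.76, "29": 11.97, "30": 10.90,
    "W": 5.27, "Z": 26.23,
}


def n50(lengths):
    """Return the length L such that sequences of length >= L hold at least half of the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0.0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths supplied")


total_mb = sum(chromosome_sizes_mb.values())
chromosome_n50 = n50(chromosome_sizes_mb.values())

print(f"chromosomal span: {total_mb:.2f} Mb")
print(f"chromosome N50:   {chromosome_n50:.2f} Mb")
```

Sorting from longest to shortest and stopping at the first sequence that carries the running total past half of the span is the usual way to obtain N50; the same function gives N90 if the threshold is changed to 90% of the total.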

Figure 3.

Genome assembly of Melitaea athalia, ilMelAtha1.1: GC coverage.

BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilMelAtha1.1/dataset/CAJNAG01/blob.

Figure 4.

Genome assembly of Melitaea athalia, ilMelAtha1.1: cumulative sequence.

BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilMelAtha1.1/dataset/CAJNAG01/cumulative.

Gene annotation

The Ensembl gene annotation system (Aken) was used to generate annotation for the Melitaea athalia assembly (GCA_905220545.1, see https://rapid.ensembl.org/Mellicta_athalia_GCA_905220545.1/; Table 1). The annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019) and OrthoDB (Kriventseva). Two prediction tools, CPC2 (Kang) and RNAsamba (Camargo), were used to aid determination of protein-coding genes.

Methods

Sample acquisition, nucleic acid extraction and sequencing

A single female M. athalia was collected from Lupşa, Transylvania, Romania (latitude 46.416, longitude 23.192) by Alex Hayward (University of Exeter), Roger Vila (Universitat Pompeu Fabra), Dominik Laetsch and Konrad Lohse (both University of Edinburgh), using a net. The specimen was identified by Roger Vila and was snap-frozen in liquid nitrogen. DNA was extracted from the whole organism of ilMelAtha1 using the Qiagen MagAttract HMW DNA kit, according to the manufacturer’s instructions. Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud sequencing libraries were then constructed according to the manufacturers’ instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina HiSeq X instruments. Hi-C data were generated using the Arima v1.0 kit and sequenced on HiSeq X.

Genome assembly

Assembly was carried out with HiCanu (Nurk). Haplotypic duplication was identified and removed with purge_dups (Guan). One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao) using SALSA2 (Ghurye). The assembly was checked for contamination and corrected using the gEVAL system (Chow) as described previously (Howe). Manual curation was performed using gEVAL, HiGlass (Kerpedjiev) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis). Table 3 contains a list of all software tool versions used, where appropriate.
Table 3.

Software tools used.

Software tool      Version             Source
HiCanu             2.1                 Nurk et al., 2020
purge_dups         1.2.3               Guan et al., 2020
SALSA2             2.2                 Ghurye et al., 2019
longranger align   2.2.2               https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines
freebayes          1.3.1-17-gaa2ace8   Garrison & Marth, 2012
MitoHiFi           1                   https://github.com/marcelauliano/MitoHiFi
gEVAL              2016                Chow et al., 2016
HiGlass            1.11.6              Kerpedjiev et al., 2018
PretextView        0.1.x               https://github.com/wtsi-hpag/PretextView
BlobToolKit        2.6.2               Challis et al., 2020
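For readers who want to record the workflow in machine-readable form, the assembly steps described above can be written down as an ordered provenance list. The Python sketch below is only a record of the order of steps and the tool versions in Table 3; it does not run any of the tools, and the Stage class, the step labels and the describe function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the workflow, with the tool and version listed in Table 3."""
    step: str      # free-text label chosen for this sketch
    tool: str
    version: str


# Order follows the Genome assembly paragraph; versions are copied from Table 3.
PIPELINE = [
    Stage("primary assembly", "HiCanu", "2.1"),
    Stage("haplotypic duplication removal", "purge_dups", "1.2.3"),
    Stage("read alignment for polishing", "longranger align", "2.2.2"),
    Stage("variant calling for polishing", "freebayes", "1.3.1-17-gaa2ace8"),
    Stage("Hi-C scaffolding", "SALSA2", "2.2"),
    Stage("contamination check and curation", "gEVAL", "2016"),
    Stage("manual curation", "HiGlass", "1.11.6"),
    Stage("manual curation", "PretextView", "0.1.x"),
    Stage("mitochondrial genome assembly", "MitoHiFi", "1"),
    Stage("analysis and BUSCO scoring", "BlobToolKit", "2.6.2"),
]


def describe(pipeline):
    """Return one numbered, human-readable line per stage."""
    return [f"{i}. {s.step}: {s.tool} {s.version}" for i, s in enumerate(pipeline, start=1)]


for line in describe(PIPELINE):
    print(line)
```

Keeping the versions alongside the step order makes it easier to compare this assembly with others produced by later releases of the same tools.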

Ethical/compliance issues

The materials that have contributed to this genome note were supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are: Ethical review of provenance and sourcing of the material; Legality of collection, transfer and use (national and international). Each transfer of samples is undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Tree of Life collaborator, Genome Research Limited (operating as the Wellcome Sanger Institute) and in some circumstances other Tree of Life collaborators.

Data availability

European Nucleotide Archive: Mellicta athalia (heath fritillary). Accession number PRJEB42954; https://identifiers.org/ena.embl/PRJEB42954. The genome sequence is released openly for reuse. The M. athalia genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.

I believe that these data contribute to the community. Both the genome sequencing and assembly methods and applications look good and are presented well, although it would be great to go into a little more detail so that other scientists can use, compare and benefit from the authors' experience. I am unsure whether Figure 4 is necessary to show cumulative sequence. The authors were even able to assemble the W chromosome. This would definitely help sequencing and chromosome-level assembly of wheat stem sawfly and orange wheat blossom midge, dangerous insects in the USA.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes

Reviewer Expertise: Plant genomics and biology, Next generation sequencing and annotations, smallRNAs, microRNAs, LncRNAs.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

This manuscript presents a concise report of the genome sequencing and assembly of Melitaea athalia, one of the rarest butterflies in Europe. The methods used are appropriate and appear to have been conducted appropriately. The result is a highly contiguous, chromosome-level assembly of the genome.

It is noteworthy that the authors were able to assemble the W chromosome and release the unphased assemblies of both diploid copies. This genome is certain to be a valuable resource for comparative genomics of Lepidoptera and future conservation efforts of the species. I found no major errors or concerns with the manuscript, and felt the tables and figures included sufficiently and succinctly presented relevant information to assess the methods used and quality of the genome assembly. I was unclear which version of BUSCO was used and if the "lepidoptera_odb10" database was the 2019 version. These details could be useful for future users of the data, and perhaps could be simply added to Table 3. Overall, I commend the authors for their clear and concise presentation of the work.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes

Reviewer Expertise: Lepidoptera, Evolution, Genomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

1.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

2.  RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences.

Authors:  Antonio P Camargo; Vsevolod Sourkov; Gonçalo A G Pereira; Marcelo F Carazzolle
Journal:  NAR Genom Bioinform       Date:  2020-01-13

3.  CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features.

Authors:  Yu-Jian Kang; De-Chang Yang; Lei Kong; Mei Hou; Yu-Qi Meng; Liping Wei; Ge Gao
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

4.  A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

Authors:  Suhas S P Rao; Miriam H Huntley; Neva C Durand; Elena K Stamenova; Ivan D Bochkov; James T Robinson; Adrian L Sanborn; Ido Machol; Arina D Omer; Eric S Lander; Erez Lieberman Aiden
Journal:  Cell       Date:  2014-12-11       Impact factor: 41.582

5.  OrthoDB: the hierarchical catalog of eukaryotic orthologs.

Authors:  Evgenia V Kriventseva; Nazim Rahman; Octavio Espinosa; Evgeny M Zdobnov
Journal:  Nucleic Acids Res       Date:  2007-10-18       Impact factor: 16.971

6.  An updated checklist of the European Butterflies (Lepidoptera, Papilionoidea).

Authors:  Martin Wiemers; Emilio Balletto; Vlad Dincă; Zdenek Faltynek Fric; Vladimir Lukhtanov; Miguel L Munguira; Roger Vila; Albert Vliegenthart; Niklas Wahlberg; Rudi Verovnik
Journal:  Zookeys       Date:  2018-12-31       Impact factor: 1.546

7.  UniProt: a worldwide hub of protein knowledge.

Authors:  UniProt Consortium
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

8.  Significantly improving the quality of genome assemblies through curation.

Authors:  Kerstin Howe; William Chow; Joanna Collins; Sarah Pelan; Damon-Lee Pointon; Ying Sims; James Torrance; Alan Tracey; Jonathan Wood
Journal:  Gigascience       Date:  2021-01-09       Impact factor: 6.524

9.  Identifying and removing haplotypic duplication in primary genome assemblies.

Authors:  Dengfeng Guan; Shane A McCarthy; Jonathan Wood; Kerstin Howe; Yadong Wang; Richard Durbin
Journal:  Bioinformatics       Date:  2020-05-01       Impact factor: 6.937

10.  HiGlass: web-based visual exploration and analysis of genome interaction maps.

Authors:  Peter Kerpedjiev; Nezar Abdennur; Fritz Lekschas; Chuck McCallum; Kasper Dinkla; Hendrik Strobelt; Jacob M Luber; Scott B Ouellette; Alaleh Azhir; Nikhil Kumar; Jeewon Hwang; Soohyun Lee; Burak H Alver; Hanspeter Pfister; Leonid A Mirny; Peter J Park; Nils Gehlenborg
Journal:  Genome Biol       Date:  2018-08-24       Impact factor: 13.583
