The genome sequence of the common toad, Bufo bufo (Linnaeus, 1758).

Abstract

We present a genome assembly from an individual male Bufo bufo (the common toad; Chordata; Amphibia; Anura; Bufonidae). The genome sequence is 5.04 gigabases in span. The majority of the assembly (99.1%) is scaffolded into 11 chromosomal pseudomolecules. Gene annotation of this assembly by the NCBI Eukaryotic Genome Annotation Pipeline has identified 21,517 protein-coding genes.

Keywords: Bufo bufo; chromosomal; common toad; genome sequence

Year: 2021 PMID: 35028424 PMCID: PMC8729185 DOI: 10.12688/wellcomeopenres.17298.1

Source DB: PubMed Journal: Wellcome Open Res ISSN: 2398-502X

Species taxonomy

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia; Batrachia; Anura; Neobatrachia; Hyloidea; Bufonidae; Bufo; Bufo bufo Linnaeus 1758 (NCBI:txid8384).

Introduction

The common toad, Bufo bufo (Anura: Bufonidae), is widely distributed throughout Europe. It has a biphasic life cycle that includes aquatic, benthic larvae and terrestrial adults. Bufonids like B. bufo are notable amongst anurans in that they (1) lack maxillary teeth, (2) have Bidder’s organs, and (3) have paired parotoid glands that contain alkaloid toxins. Bufo bufo has been used extensively in comparative vertebrate research, including as a model system in sensory biology (Ewert, 1974). Based on populations from mainland Europe, the nuclear genome size of B. bufo was previously estimated to be between 5.82 and 7.75 picograms (= 5.69 and 7.58 gigabases; Gregory, 2021). These estimates are somewhat larger than our 5.04 gigabase assembly. The eleven pseudomolecules in our assembly match the expected number of chromosomes in B. bufo (2N = 22; six macro- and five micro-chromosomes; Birstein & Mazin, 1982; Makino & Others, 1951). This is the third nuclear genome sequence to be reported from a bufonid anuran (Edwards; Lu). The B. bufo reference genome reported here has been used to study pseudogenization of the tooth enamel gene amelogenin in bufonids (Shaheen et al., 2021). The genome of a common toad from the UK is particularly timely as a tool for understanding the dynamics of population declines observed over the last two decades (Carrier & Beebee, 2003; Petrovan & Schmidt, 2016).
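The picogram-to-gigabase figures quoted above can be reproduced with a short calculation. The sketch below assumes the commonly used conversion factor of about 0.978 gigabases per picogram of DNA; that factor is not stated in the text and is labelled as an assumption in the code.

```python
# Assumption (not stated in the text): 1 pg of DNA corresponds to ~0.978 Gb.
GB_PER_PG = 0.978


def picograms_to_gigabases(c_value_pg: float) -> float:
    """Convert a genome size in picograms to an approximate size in gigabases."""
    return c_value_pg * GB_PER_PG


# Previously published size estimates for B. bufo, in picograms.
low_gb = picograms_to_gigabases(5.82)
high_gb = picograms_to_gigabases(7.75)

print(f"Estimated genome size: {low_gb:.2f} to {high_gb:.2f} Gb")
```

Under that assumption, the converted range agrees with the gigabase values given in the text to two decimal places.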

Genome sequence report

The genome was sequenced from one male B. bufo collected from the Natural History Museum (NHM) Wildlife Garden, London, UK (Figure 1A, B). A total of 64-fold coverage in Pacific Biosciences single-molecule long reads (N50 28 kb) and 56-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 29 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 3,498 missing joins or misjoins and removed 513 haplotypic duplications, reducing the assembly length by 2.4% and the scaffold number by 49.5%, and increasing the scaffold N50 by 38.9%.

Figure 1.

(A) Male voucher specimen of Bufo bufo (NHMUK 2013.484; Field ID, JWS 758; Snout–Vent Length 55.5 mm) from which the genome was sequenced. (B) Ventral surface of NHMUK 2013.484. (C) The individual was collected from the Natural History Museum Wildlife Garden, London, England. (D) Large numbers of an unidentified nematode parasite were present in the stomach of NHMUK 2013.484.

The final assembly has a total length of 5.04 Gb in 1307 sequence scaffolds with a scaffold N50 of 636 Mb (Table 1). The majority, 99.1%, of the assembly sequence was assigned to 11 chromosomal-level scaffolds (numbered by sequence length) (Figure 2–Figure 5; Table 2). The assembly has a BUSCO (Simão) v5.1.2 completeness of 90.1% using the tetrapoda_odb10 reference set. However, a BUSCO (v4.0.2) score of 95.3% using the same reference set was obtained for the annotated gene set of the aBufBuf1.1 assembly (see section Genome annotation), indicating that the assembly has a high level of completeness and that some genes were missed during BUSCO analysis of the whole genome assembly. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.

Table 1.

Genome data for Bufo bufo, aBufBuf1.1.

Project accession data
Assembly identifier	aBufBuf1.1
Species	Bufo bufo
Specimen	aBufBuf1
NCBI taxonomy ID	NCBI:txid8384
BioProject	PRJEB42238
BioSample ID	SAMEA7521636
Isolate information	Male, heart tissue; NHMUK 2013.484
Raw data accessions
PacificBiosciences SEQUEL II	ERR7012639, ERR7015063-ERR7015065
10X Genomics Illumina	ERR6002753-ERR6002766, ERR6003048, ERR6003049
Hi-C Illumina	ERR6002767-ERR6002770
BioNano	ERZ3003198
Genome assembly
Assembly accession	GCA_905171765.1
Accession of alternate haplotype	GCA_905171715.1
Span (Mb)	5,045
Number of contigs	5,502
Contig N50 length (Mb)	3.96
Number of scaffolds	1,307
Scaffold N50 length (Mb)	636
Longest scaffold (Mb)	843
BUSCO genome score *	C:90.1%[S:88.5%,D:1.6%],F:3.2%,M:6.7%,n:5310
Genome annotation
Number of genes	30,286
Number of protein-coding genes	21,517
Average length of gene (bp)	57,667
Average number of exons per gene	12
Average exon size (bp)	241
Average intron size (bp)	8,995
BUSCO annotation score **	C:95.3%[S:93.2%,D:2.1%],F:0.7%,M:4.0%,n:5310

C= complete [S= single copy, D=duplicated], F=fragmented, M=missing, n=number of orthologues in comparison.

*BUSCO scores based on the tetrapoda_odb10 BUSCO set using v5.1.2, run on the aBufBuf1.1 genome assembly using BlobToolKit. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/aBufBuf1.1/dataset/CAJIMN01/busco.

**BUSCO scores based on the tetrapoda_odb10 BUSCO set using v4.0.2, run on the NCBI RefSeq annotation of the aBufBuf1.1 genome assembly (NCBI).
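The BUSCO summary strings in Table 1 follow the notation explained in the footnote (C, S, D, F, M, n). As an illustration only, the sketch below parses such a string and checks that it is internally consistent, that is, S + D equals C and C + F + M sums to 100%. The function names and the rounding tolerance are hypothetical choices for this example and are not part of BUSCO or BlobToolKit.

```python
import re

# Pattern for the compact BUSCO notation used in Table 1.
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO summary string into percentages plus the gene count n."""
    match = BUSCO_PATTERN.fullmatch(summary.strip())
    if match is None:
        raise ValueError(f"Not a BUSCO summary string: {summary!r}")
    scores = {key: float(match[key]) for key in ("C", "S", "D", "F", "M")}
    scores["n"] = int(match["n"])
    return scores


def is_consistent(scores: dict, tolerance: float = 0.15) -> bool:
    """Check S + D == C and C + F + M == 100, allowing for rounding."""
    parts_match = abs(scores["S"] + scores["D"] - scores["C"]) <= tolerance
    total_match = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tolerance
    return parts_match and total_match


genome = parse_busco("C:90.1%[S:88.5%,D:1.6%],F:3.2%,M:6.7%,n:5310")
annotation = parse_busco("C:95.3%[S:93.2%,D:2.1%],F:0.7%,M:4.0%,n:5310")
print(genome["C"], is_consistent(genome))
print(annotation["C"], is_consistent(annotation))
```

Both strings from Table 1 pass this check, and the parsed completeness values are the 90.1% and 95.3% figures discussed in the text.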

Figure 2.

Genome assembly of Bufo bufo, aBufBuf1.1: metrics.

The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 5,044,762,059 bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (843,366,180 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 chromosome lengths (635,713,434 and 230,778,867 bp), respectively. The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the tetrapoda_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Bufo%20bufo/dataset/CAJIMN01/snail.

Figure 5.

Genome assembly of Bufo bufo, aBufBuf1.1: Hi-C contact map.

Hi-C contact map of the aBufBuf1.1 assembly, visualised in HiGlass. Chromosomes are presented in size order from left to right and top to bottom.

Table 2.

Chromosomal pseudomolecules in the genome assembly of Bufo bufo, aBufBuf1.1.

INSDC accession	Chromosome	Size (Mb)	GC%
LR991667.1	1	843.37	44.5
LR991668.1	2	842.56	44.3
LR991669.1	3	707.96	44.4
LR991670.1	4	635.71	44.4
LR991671.1	5	567.30	44.4
LR991672.1	6	439.63	44.8
LR991673.1	7	236.60	44.7
LR991674.1	8	231.67	44.8
LR991675.1	9	230.78	44.8
LR991676.1	10	151.57	44.8
LR991677.1	11	103.21	45
LR991678.1	MT	0.02	42.7
-	Unlocalised	54.40	45.1
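The scaffold N50 and N90 values reported for the assembly can be recovered from the chromosome sizes in Table 2. The sketch below is a minimal illustration, not the BlobToolKit implementation: it uses the total assembly span of 5,044.76 Mb from the Figure 2 caption and relies on the fact that all non-chromosomal sequence sums to less than the smallest chromosome, so only the chromosome lengths are needed to find both thresholds. The function name `nx` is a hypothetical helper.

```python
def nx(lengths, fraction, total=None):
    """Return the length L such that sequences of length >= L cover
    at least `fraction` of the total span (N50 for 0.5, N90 for 0.9)."""
    if total is None:
        total = sum(lengths)
    target = total * fraction
    running = 0.0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    raise ValueError("Listed lengths do not reach the requested fraction")


# Chromosome sizes in Mb, copied from Table 2 (chromosomes 1 to 11).
chromosome_mb = [843.37, 842.56, 707.96, 635.71, 567.30, 439.63,
                 236.60, 231.67, 230.78, 151.57, 103.21]

# Total assembly span in Mb (5,044,762,059 bp in the Figure 2 caption).
ASSEMBLY_SPAN_MB = 5044.76

n50 = nx(chromosome_mb, 0.5, total=ASSEMBLY_SPAN_MB)
n90 = nx(chromosome_mb, 0.9, total=ASSEMBLY_SPAN_MB)
print(f"N50 = {n50} Mb, N90 = {n90} Mb")
```

The results correspond to chromosomes 4 and 9, in line with the 636 Mb scaffold N50 in Table 1 and the N50 and N90 lengths given in the Figure 2 caption.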

Figure 3.

Genome assembly of Bufo bufo, aBufBuf1.1: GC-coverage.

BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Bufo%20bufo/dataset/CAJIMN01/blob.

Genome assembly of Bufo bufo, aBufBuf1.1: cumulative sequence.

BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Bufo%20bufo/dataset/CAJIMN01/cumulative.

Genome annotation

The B. bufo assembly was annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. The annotation (NCBI Bufo bufo Annotation Release 100; Table 1) was generated from transcripts and proteins retrieved from NCBI Entrez by alignment to the genome assembly, as described by Pruitt et al. (2014).

Methods

Sample acquisition

A single male B. bufo was collected from a stable, isolated population in the NHM Wildlife Garden, London, UK (latitude 51.49586, longitude -0.178622, elevation 17 m) by Jeffrey W. Streicher on 1 July 2015 ( Figure 1C). The specimen of B. bufo (NHMUK 2013.484, Field ID: JWS 758) was 55.5 mm snout–vent length (determined using Miyamoto digital callipers to the nearest 0.1 mm) and contained many nematode parasites in its stomach ( Figure 1D). The specimen was collected with permission from the NHM Wildlife Garden management team and is part of a long-term monitoring project run by the Department of Life Sciences and the Angela Marmont Centre for UK Biodiversity. It was humanely euthanised using a saturated solution of tricaine mesylate (MS-222). Multiple tissues including heart, thigh muscle, liver, eyes, kidney, testes, Bidder’s organ, and intestines were sampled into an ammonium sulfate-based RNA + DNA preservation buffer. After ~24 hours of storage at 4°C, the tissues were transferred to -80°C until they were sent for genome sequencing. Sample tissue has been accessioned by the NHM Molecular Collections Facility (NHMUK 2013.484).

DNA extraction and sequencing

DNA was extracted from heart tissue using the Bionano Prep Animal Tissue DNA Isolation kit according to the manufacturer’s instructions. Pacific Biosciences CLR long read and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers’ instructions. Hi-C data were generated from heart tissue using the Arima v2 Hi-C kit. Extraction and sequencing were performed by the Scientific Operations DNA Pipelines at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL I and Illumina HiSeq X instruments. DNA was labeled for Bionano Genomics optical mapping following the Bionano Prep Direct Label and Stain (DLS) Protocol and run on one Saphyr instrument chip flowcell.

Genome assembly

Assembly was carried out following the Vertebrate Genomes Project pipeline v1.6 (Rhie) with Falcon-unzip (Chin et al., 2016). Haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020), and a first round of scaffolding was carried out with 10X Genomics read clouds using scaff10x. Hybrid scaffolding was performed using the BioNano DLE-1 data and BioNano Solve. Scaffolding with Hi-C data (Rao et al., 2014) was carried out with HiLine, then 3D-DNA (Dudchenko et al., 2017). The Hi-C scaffolded assembly was polished with arrow using the PacBio data, then polished with the 10X Genomics Illumina data by aligning to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012) and applying homozygous non-reference edits using bcftools consensus. Two rounds of the Illumina polishing were applied. The mitochondrial genome was assembled at The Rockefeller University using the mitoVGP pipeline (Formenti et al., 2021). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016; Howe et al., 2021). Manual curation was performed using evidence from Bionano (using the Bionano Access viewer), HiGlass (Kerpedjiev et al., 2018) and Pretext, as described previously (Howe et al., 2021). Figure 2–Figure 4 and BUSCO values were generated using BlobToolKit (Challis et al., 2020). Table 3 contains a list of software tools and versions, where applicable.
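To illustrate what "applying homozygous non-reference edits" means in the short-read polishing step, the toy sketch below applies only homozygous alternate variant calls to a sequence and leaves heterozygous calls untouched. It is a conceptual illustration under simplifying assumptions (0-based positions, non-overlapping variants, a single short sequence) and is not the bcftools consensus implementation or the pipeline's actual code; the `Variant` type and function names are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int       # 0-based start position on the sequence (assumption)
    ref: str       # reference allele expected at that position
    alt: str       # alternate allele called from the reads
    genotype: str  # VCF-style genotype, e.g. "0/1" or "1/1"


def is_hom_alt(genotype: str) -> bool:
    """True when every allele is the same non-reference, non-missing allele."""
    alleles = genotype.replace("|", "/").split("/")
    return len(set(alleles)) == 1 and alleles[0] not in ("0", ".")


def apply_hom_alt_edits(sequence: str, variants: list) -> str:
    """Apply homozygous non-reference edits; skip all other calls.

    Variants are applied from right to left so that indels do not shift
    the coordinates of edits that are still to be applied.
    """
    edited = sequence
    for variant in sorted(variants, key=lambda v: v.pos, reverse=True):
        if not is_hom_alt(variant.genotype):
            continue
        end = variant.pos + len(variant.ref)
        if edited[variant.pos:end] != variant.ref:
            raise ValueError(f"Reference mismatch at position {variant.pos}")
        edited = edited[:variant.pos] + variant.alt + edited[end:]
    return edited


calls = [
    Variant(1, "C", "T", "1/1"),   # homozygous substitution: applied
    Variant(4, "A", "AG", "0/1"),  # heterozygous insertion: skipped
    Variant(6, "GT", "G", "1/1"),  # homozygous deletion: applied
]
polished = apply_hom_alt_edits("ACGTACGT", calls)
print(polished)
```

The rationale for restricting edits to homozygous calls is that reads from the same individual should only disagree with the assembly at both alleles when the assembled base is likely an error, whereas heterozygous sites reflect genuine differences between haplotypes.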

Figure 4.

Genome assembly of Bufo bufo, aBufBuf1.1: cumulative sequence.

Table 3.

Software tools used.

Software tool	Version	Source
Falcon-unzip	falcon-kit 1.4.2	( Chin et al., 2016)
purge_dups	1.0.0	( Guan et al., 2020)
scaff10x	4.2	https://github.com/wtsi-hpag/Scaff10X
Bionano Solve	3.3_10252018	https://bionanogenomics.com/downloads/bionano-solve/
3D-DNA	180922	( Dudchenko et al., 2017)
Arrow	gcpp 1.9.0-SL-release-8.0.0+1-37-gd7b188d	https://github.com/PacificBiosciences/GenomicConsensus
Longranger	2.2.2	https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines
freebayes	v1.3.1-17-gaa2ace8	( Garrison & Marth, 2012)
bcftools consensus	1.10.2	http://samtools.github.io/bcftools/bcftools.html
gEVAL	N/A	( Chow et al., 2016)
HiGlass	1.11.6	( Kerpedjiev et al., 2018)
PretextView	0.1	https://github.com/wtsi-hpag/PretextView
BlobToolKit	2.6.1	( Challis et al., 2020)

Ethical/compliance issues

The materials that have contributed to this genome note were supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are: Ethical review of provenance and sourcing of the material; Legality of collection, transfer and use (national and international). Each transfer of samples is undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Tree of Life collaborator, Genome Research Limited (operating as the Wellcome Sanger Institute) and in some circumstances other Tree of Life collaborators.

Data availability

European Nucleotide Archive: Bufo bufo (common toad). Accession number PRJEB42238; https://identifiers.org/ena.embl/PRJEB42238. The genome sequence is released openly for reuse. The B. bufo genome sequencing initiative is part of the Darwin Tree of Life (DToL) project and the Vertebrate Genomes Project. All raw sequence data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.

Author information

Members of the Wellcome Sanger Institute Tree of Life programme collective are listed here: https://doi.org/10.5281/zenodo.5377053. Members of Wellcome Sanger Institute Scientific Operations: DNA Pipelines collective are listed here: https://doi.org/10.5281/zenodo.4790456. Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.5013542. Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.4783559.

This is a straightforward genome release report of an important European anuran. The methods and presentation of the data are obviously formulaic, but in this case, that’s fine. It could be indexed as is, although I have a few small comments that might increase its visibility or clarity, particularly for the amphibian community that will be its primary user group.

The Introduction is very thin and not really very informative or compelling. I would include more on the family Bufonidae, for example. It is the third most speciose family out of 54 currently recognized on Amphibiaweb, there’s been a lot of recent controversy over genus-level taxonomy - that kind of thing. I’d like to see a bit more on conservation, given how hard toads have been hit by chytrid. What about the portion of the family phylogeny that is now covered by the three species with genomes, and important targets for additional lineages?

Methods: Small notes: Usually one records snout-urostyle length in an anuran, rather than snout-vent. Or at least that’s the measurement I prefer. They are almost identical. I could live without Figure 4, but it’s fine if that is what you want. I don’t think it’s terribly necessary. Benthic larvae - I’ve never heard that term for anuran larva. Unless they are very specialized, it applies to pretty much all tadpoles. I would drop this. Last sentence of the Genome Annotation section should be corrected to say “…as described by Pruitt et al. (2014)”: "The annotation (NCBI Bufo bufo Annotation Release 100; Table 1) was generated from transcripts and proteins retrieved from NCBI Entrez by alignment to the genome assembly, as described by (Pruitt et al., 2014)."

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes

Reviewer Expertise: Amphibian and reptile evolution and ecology; conservation genomics and genetics; conservation biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors presented a high quality genome assembly of the common toad. The data generation and analysis pipelines applied in this report are sound and the information for the resource reuse is clearly stated in the report. However, in the "Genome annotation" part, the authors described the method, which belongs in the Methods section, instead of giving a summary of the annotated genomic features. Also, in the Methods section, the details of running BUSCO should be stated clearly.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes

Reviewer Expertise: Bioinformatics, quantitative genetics and genomics.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

17 in total

1. Phased diploid genome assembly with single-molecule real-time sequencing.

Authors: Chen-Shan Chin; Paul Peluso; Fritz J Sedlazeck; Maria Nattestad; Gregory T Concepcion; Alicia Clum; Christopher Dunn; Ronan O'Malley; Rosa Figueroa-Balderas; Abraham Morales-Cruz; Grant R Cramer; Massimo Delledonne; Chongyuan Luo; Joseph R Ecker; Dario Cantu; David R Rank; Michael C Schatz
Journal: Nat Methods Date: 2016-10-17 Impact factor: 28.547

2. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

Authors: Suhas S P Rao; Miriam H Huntley; Neva C Durand; Elena K Stamenova; Ivan D Bochkov; James T Robinson; Adrian L Sanborn; Ido Machol; Arina D Omer; Eric S Lander; Erez Lieberman Aiden
Journal: Cell Date: 2014-12-11 Impact factor: 41.582

3. Pseudogenized Amelogenin Reveals Early Tooth Loss in True Toads (Anura: Bufonidae).

Authors: John Shaheen; Austin B Mudd; Thomas G H Diekwisch; John Abramyan
Journal: Integr Comp Biol Date: 2021-11-17 Impact factor: 3.326

4. gEVAL - a web-based browser for evaluating genome assemblies.

Authors: William Chow; Kim Brugger; Mario Caccamo; Ian Sealy; James Torrance; Kerstin Howe
Journal: Bioinformatics Date: 2016-04-07 Impact factor: 6.937

5. Volunteer Conservation Action Data Reveals Large-Scale and Long-Term Negative Population Trends of a Widespread Amphibian, the Common Toad (Bufo bufo).

Authors: Silviu O Petrovan; Benedikt R Schmidt
Journal: PLoS One Date: 2016-10-05 Impact factor: 3.240

6. BlobToolKit - Interactive Quality Assessment of Genome Assemblies.

Authors: Richard Challis; Edward Richards; Jeena Rajan; Guy Cochrane; Mark Blaxter
Journal: G3 (Bethesda) Date: 2020-04-09 Impact factor: 3.154

7. Complete vertebrate mitogenomes reveal widespread repeats and gene duplications.

Authors: Giulio Formenti; Arang Rhie; Jennifer Balacco; Bettina Haase; Jacquelyn Mountcastle; Olivier Fedrigo; Samara Brown; Marco Rosario Capodiferro; Farooq O Al-Ajli; Roberto Ambrosini; Peter Houde; Sergey Koren; Karen Oliver; Michelle Smith; Jason Skelton; Emma Betteridge; Jale Dolucan; Craig Corton; Iliana Bista; James Torrance; Alan Tracey; Jonathan Wood; Marcela Uliano-Silva; Kerstin Howe; Shane McCarthy; Sylke Winkler; Woori Kwak; Jonas Korlach; Arkarachai Fungtammasan; Daniel Fordham; Vania Costa; Simon Mayes; Matteo Chiara; David S Horner; Eugene Myers; Richard Durbin; Alessandro Achilli; Edward L Braun; Adam M Phillippy; Erich D Jarvis
Journal: Genome Biol Date: 2021-04-29 Impact factor: 13.583

8. Significantly improving the quality of genome assemblies through curation.

Authors: Kerstin Howe; William Chow; Joanna Collins; Sarah Pelan; Damon-Lee Pointon; Ying Sims; James Torrance; Alan Tracey; Jonathan Wood
Journal: Gigascience Date: 2021-01-09 Impact factor: 6.524

9. Identifying and removing haplotypic duplication in primary genome assemblies.

Authors: Dengfeng Guan; Shane A McCarthy; Jonathan Wood; Kerstin Howe; Yadong Wang; Richard Durbin
Journal: Bioinformatics Date: 2020-05-01 Impact factor: 6.937

10. HiGlass: web-based visual exploration and analysis of genome interaction maps.

Authors: Peter Kerpedjiev; Nezar Abdennur; Fritz Lekschas; Chuck McCallum; Kasper Dinkla; Hendrik Strobelt; Jacob M Luber; Scott B Ouellette; Alaleh Azhir; Nikhil Kumar; Jeewon Hwang; Soohyun Lee; Burak H Alver; Hanspeter Pfister; Leonid A Mirny; Peter J Park; Nils Gehlenborg
Journal: Genome Biol Date: 2018-08-24 Impact factor: 13.583
