The genome sequence of the European nightjar, Caprimulgus europaeus (Linnaeus, 1758).

Simona Secomandi1, Fernando Spina2, Giulio Formenti3,4, Guido Roberto Gallo1, Manuela Caprioli5, Roberto Ambrosini5, Sara Riello6.   

Abstract

We present a genome assembly from an individual female Caprimulgus europaeus (the European nightjar; Chordata; Aves; Caprimulgiformes; Caprimulgidae). The genome sequence is 1,178 megabases in span. The majority of the assembly (99.33%) is scaffolded into 37 chromosomal pseudomolecules, including the W and Z sex chromosomes.

Copyright: © 2021 Secomandi S et al.

Keywords:  Caprimulgus europaeus; Eurasian nightjar; European nightjar; chromosomal; genome sequence

Year:  2021        PMID: 35028428      PMCID: PMC8729189          DOI: 10.12688/wellcomeopenres.17451.1

Source DB:  PubMed          Journal:  Wellcome Open Res        ISSN: 2398-502X


Species taxonomy

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Caprimulgimorphae; Caprimulgiformes; Caprimulgidae; Caprimulginae; Caprimulgus; Caprimulgus europaeus Linnaeus 1758 (NCBI:txid111811).

Background

The European nightjar (Caprimulgus europaeus; also known as the Eurasian nightjar and common goatsucker) is an insectivorous, crepuscular, ground-nesting bird distributed throughout the Western Palearctic (Hagemeijer & Blair, 1997). It breeds in semi-natural dry and open habitats with scattered trees (Cramp & Brooks, 1985). Little is known about the ecology of the European nightjar (Cramp & Brooks, 1985; Polakowski), or about that of the family Caprimulgidae in general. The family comprises peculiar species such as the only bird known to hibernate, the Common Poorwill (Phalaenoptilus nuttallii) (Carey, 2019; French, 2019; Woods), and one of the few birds that use echolocation, the South American Oilbird (Steatornis caripensis) (Brinkløv). The European nightjar has been found to be more resistant to pathogens than other bird species (Jiang). Although categorized as ‘least concern’ by the IUCN (IUCN, 2016), the European nightjar has experienced a steady population decline in the past decades and is of conservation concern in Europe (Eaton; Evens; Keller). The availability of a high-quality, chromosome-level reference genome will deepen knowledge of the biology and evolution of this species and boost genomic studies of the Caprimulgidae. Moreover, as genomic resources gain pre-eminence in conservation efforts (Allendorf, 2017; Fuentes-Pardo & Ruzzante, 2017; Supple & Shapiro, 2018), we expect that the reference genome presented here will aid in planning conservation actions for the European nightjar.

Genome sequence report

The genome was sequenced from a blood sample taken from a single female C. europaeus collected from a bird ringing station in Ventotene, Italy (latitude 40.79404, longitude 13.42777). A total of 87-fold coverage in Pacific Biosciences single-molecule long reads and 62-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 144 missing/misjoins and removed 31 haplotypic duplications, reducing the assembly length by 0.15% and the scaffold number by 21.94%, and increasing the scaffold N50 by 26.46%. The final assembly has a total length of 1,178 Mb in 121 sequence scaffolds with a scaffold N50 of 83 Mb (Table 1). Of the assembly sequence, 99.3% was assigned to 37 chromosomal-level scaffolds, representing 35 autosomes (numbered by sequence length) and the W and Z sex chromosomes (Figure 1–Figure 4; Table 2). The assembly has a BUSCO (Simão) completeness of 97.4% (single 96.9%, duplicated 0.6%) using the aves_odb10 reference set. While not fully phased, the assembly deposited is of one pseudo-haplotype. Contigs corresponding to the alternate haplotype have also been deposited.

Table 1.

Genome data for Caprimulgus europaeus, bCapEur3.1.

Project accession data
Assembly identifier               bCapEur3.1
Species                           Caprimulgus europaeus
Specimen                          bCapEur3
NCBI taxonomy ID                  NCBI:txid111811
BioProject                        PRJEB44830
BioSample ID                      SAMEA7524394
Isolate information               Female, blood

Raw data accessions
Pacific Biosciences SEQUEL II     ERR6445211
10X Genomics Illumina             ERR6054683-ERR6054686
Hi-C Illumina                     ERR6054687, ERR6054688

Genome assembly
Assembly accession                GCA_907165065.1
Accession of alternate haplotype  GCA_907165095.1
Span (Mb)                         1,178
Number of contigs                 274
Contig N50 length (Mb)            31
Number of scaffolds               121
Scaffold N50 length (Mb)          83
Longest scaffold (Mb)             126
BUSCO* genome score               C:97.4%[S:96.9%, D:0.6%],F:0.5%,M:2.1%,n:8338

*BUSCO scores based on the aves_odb10 BUSCO set using v5.1.2. C= complete [S= single copy, D=duplicated], F=fragmented, M=missing, n=number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/bCapEur3.1/dataset/CAJRAV01/busco.
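
As an illustration of how the BUSCO summary string in Table 1 can be read, the sketch below splits it into its percentage fields and converts them back to approximate gene counts. The function names are hypothetical and this is not part of the BUSCO software; the rounded counts are only estimates, because the published percentages are themselves rounded to one decimal place.

```python
import re

# BUSCO short-summary string copied from Table 1.
BUSCO_SUMMARY = "C:97.4%[S:96.9%, D:0.6%],F:0.5%,M:2.1%,n:8338"


def parse_busco(summary: str) -> dict:
    """Split a BUSCO short summary into percentages and the orthologue count."""
    percentages = {
        key: float(value)
        for key, value in re.findall(r"([CSDFM]):([\d.]+)%", summary)
    }
    n_match = re.search(r"n:(\d+)", summary)
    if n_match is None:
        raise ValueError("no orthologue count (n:...) found in summary")
    return {"percent": percentages, "n": int(n_match.group(1))}


def approximate_counts(parsed: dict) -> dict:
    """Convert each percentage back to an approximate number of orthologues."""
    n = parsed["n"]
    return {key: round(n * pct / 100) for key, pct in parsed["percent"].items()}


busco = parse_busco(BUSCO_SUMMARY)
counts = approximate_counts(busco)

for key in "CSDFM":
    print(f"{key}: {busco['percent'][key]}% (about {counts[key]} of {busco['n']})")
```

Complete, fragmented and missing categories partition the reference set, so C + F + M should sum to 100% up to rounding; S and D are subdivisions of C.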

Figure 1.

Genome assembly of Caprimulgus europaeus, bCapEur3.1: metrics.

The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 1,177,791,212 bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (126,318,510 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 chromosome lengths (82,614,289 and 15,699,869 bp), respectively. The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the aves_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/bCapEur3.1/dataset/CAJRAV01/snail.

Figure 4.

Genome assembly of Caprimulgus europaeus, bCapEur3.1: Hi-C contact map.

Hi-C contact map of the bCapEur3 assembly, visualised in HiGlass. Chromosomes are shown in order of size from left to right and top to bottom.

Table 2.

Chromosomal pseudomolecules in the genome assembly of Caprimulgus europaeus, bCapEur3.1.

INSDC accession   Chromosome   Size (Mb)   GC%
OU015523.1        1            126.32      40.1
OU015524.1        2            125.37      40.3
OU015525.1        3            100.16      39.8
OU015526.1        4            83.32       39.9
OU015528.1        5            82.61       40.7
OU015529.1        6            65.35       41.7
OU015530.1        7            60.47       40.6
OU015531.1        8            50.91       42.8
OU015532.1        9            48.66       41.6
OU015533.1        10           43.00       41.3
OU015534.1        11           35.23       42.1
OU015535.1        12           23.52       43.4
OU015536.1        13           22.81       42.3
OU015538.1        14           22.35       43.3
OU015539.1        15           19.40       42.8
OU015540.1        16           18.74       45
OU015541.1        17           16.93       45.6
OU015542.1        18           15.70       45.4
OU015543.1        19           13.78       46.1
OU015544.1        20           12.52       46.8
OU015545.1        21           12.35       47.5
OU015546.1        22           9.16        46.8
OU015547.1        23           8.19        49.8
OU015548.1        24           7.57        47.7
OU015549.1        25           7.54        51.3
OU015550.1        26           7.50        50.8
OU015551.1        27           6.26        52.3
OU015552.1        28           6.04        48.1
OU015553.1        29           3.39        55.8
OU015554.1        30           2.94        56.1
OU015555.1        31           2.47        49.2
OU015556.1        32           2.22        50.6
OU015557.1        33           1.26        56.6
OU015558.1        34           0.56        51.3
OU015559.1        35           0.20        47.7
OU015537.1        W            22.49       44.5
OU015527.1        Z            82.63       40.2
-                 Unplaced     7.86        54.9
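
The summary statistics quoted in the text (scaffold N50, N90 and the fraction of sequence assigned to chromosomes) can be checked against Table 2 with a short calculation. The sketch below is an illustration rather than the authors' tooling: the function name `nx` is hypothetical, sizes are the rounded Mb values from the table, and the unplaced scaffolds are treated as a single 7.86 Mb entry. That simplification is safe here only because every unplaced scaffold is shorter than the N50 and N90 chromosomes.

```python
# Sizes in Mb copied from Table 2: autosomes 1-35, then W and Z.
CHROMOSOME_SIZES_MB = [
    126.32, 125.37, 100.16, 83.32, 82.61, 65.35, 60.47, 50.91, 48.66, 43.00,
    35.23, 23.52, 22.81, 22.35, 19.40, 18.74, 16.93, 15.70, 13.78, 12.52,
    12.35, 9.16, 8.19, 7.57, 7.54, 7.50, 6.26, 6.04, 3.39, 2.94,
    2.47, 2.22, 1.26, 0.56, 0.20,
    22.49,  # W
    82.63,  # Z
]
UNPLACED_MB = 7.86  # combined span of scaffolds not assigned to a chromosome


def nx(lengths, fraction):
    """Return the length L such that sequences of length >= L together
    cover at least `fraction` of the total span (fraction=0.5 gives N50)."""
    total = sum(lengths)
    running = 0.0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("lengths must be non-empty and fraction must be in (0, 1]")


# Simplification: the unplaced scaffolds enter as one lump sum.
all_lengths = CHROMOSOME_SIZES_MB + [UNPLACED_MB]

n50 = nx(all_lengths, 0.5)
n90 = nx(all_lengths, 0.9)
chromosomal_fraction = sum(CHROMOSOME_SIZES_MB) / sum(all_lengths)

print(f"N50 = {n50:.2f} Mb, N90 = {n90:.2f} Mb")
print(f"Assigned to chromosomes: {100 * chromosomal_fraction:.2f}%")
```

With the rounded table values, the N50 falls on chromosome 5 and the N90 on chromosome 18, in line with the 82,614,289 bp and 15,699,869 bp reported in the Figure 1 caption.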

Figure 2.

Genome assembly of Caprimulgus europaeus, bCapEur3.1: GC coverage.

BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/bCapEur3.1/dataset/CAJRAV01/blob.

Figure 3.

Genome assembly of Caprimulgus europaeus, bCapEur3.1: cumulative sequence.

BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/bCapEur3.1/dataset/CAJRAV01/cumulative.

Methods

Sample acquisition

Sampling was performed during the routine activity of the scientific ringing station located on Ventotene island, Latina, Italy (latitude 40.7926°, longitude 13.4241°) during spring migration. Samples were collected by ISPRA researchers as part of their institutional activities under Italian national Law n. 157/92. Birds were captured in the evening with mist-nets according to standardized protocols (Saino; Spina). The sample was collected with a heparinized capillary tube after puncturing the ulnar vein with an intra-epidermal needle. The blood was immediately transferred into 99% ethanol, initially kept at room temperature and then frozen.

DNA extraction and sequencing

High molecular weight DNA was extracted from the blood sample at the Scientific Operations core of the Wellcome Sanger Institute using the Bionano Prep Blood DNA Isolation Kit according to the Bionano Prep Frozen Blood protocol. Pacific Biosciences CLR long read and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers’ instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina HiSeq X instruments. Hi-C data were generated from the same blood sample using the Arima Hi-C+ kit and sequenced on HiSeq X.

Genome assembly

Assembly was carried out following the Vertebrate Genomes Project pipeline v1.6 (Rhie) with Falcon-unzip (Chin). Haplotypic duplication was identified and removed with purge_dups (Guan), and a first round of scaffolding was carried out with 10X Genomics read clouds using scaff10x. Scaffolding with Hi-C data (Rao) was carried out with SALSA2 (Ghurye). The Hi-C scaffolded assembly was polished with arrow using the PacBio data, with merfin (Formenti) applied to avoid a drop in QV. It was then polished with the 10X Genomics Illumina data by aligning the reads to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012) and applying homozygous non-reference edits using bcftools consensus. A complete mitochondrion was not found using mitoVGP (Formenti), likely because the sample was sourced from blood, so the mitochondrial sequence NC_025773.1 (Caprimulgus indicus) was used during polishing. The assembly was checked for contamination and corrected using the gEVAL system (Chow) as described previously (Howe). Manual curation (Howe) was performed using gEVAL, HiGlass (Kerpedjiev) and Pretext. The genome was analysed, and BUSCO scores generated, within the BlobToolKit environment (Challis). Table 3 gives version numbers of the software tools used in this work.

Table 3.

Software tools used.

Software tool      Version               Source
Falcon-unzip       1.8.0                 Chin et al., 2016
purge_dups         1.2.3                 Guan et al., 2020
SALSA2             2.2                   Ghurye et al., 2019
Arrow              GCpp-1.9.0            https://github.com/PacificBiosciences/GenomicConsensus
Merfin             1.7                   Formenti et al., 2021b
longranger align   2.2.2                 https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines
freebayes          1.3.1-17-gaa2ace8     Garrison & Marth, 2012
gEVAL              N/A                   Chow et al., 2016
HiGlass            1.11.6                Kerpedjiev et al., 2018
PretextView        0.1.x                 https://github.com/wtsi-hpag/PretextView
BlobToolKit        2.6.2                 Challis et al., 2020
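
The short-read polishing step described under Genome assembly applies only homozygous non-reference calls. When reads from the sequenced individual are aligned back to its own assembly, a heterozygous call usually reflects genuine variation between haplotypes, whereas a homozygous non-reference call suggests that the assembled base is wrong. The sketch below illustrates that selection rule in plain Python on a toy VCF. It is not the bcftools or freebayes implementation, the exact filter expression used by the authors is not given in the text, and the function names and example records are hypothetical.

```python
def is_homozygous_non_reference(genotype: str) -> bool:
    """True when every allele in a VCF GT value is the same non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:  # missing call
        return False
    return len(set(alleles)) == 1 and alleles[0] != "0"


def select_polishing_edits(vcf_lines):
    """Yield (chrom, pos, ref, alt) for records whose first sample
    is homozygous for a non-reference allele."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        genotype = sample_values[format_keys.index("GT")]
        if is_homozygous_non_reference(genotype):
            # GT allele indices are 1-based with respect to the ALT list.
            allele_index = int(genotype.replace("|", "/").split("/")[0])
            yield chrom, pos, ref, alts.split(",")[allele_index - 1]


# Toy records for illustration only (not taken from the nightjar data).
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "scaffold_1\t100\t.\tA\tG\t50\t.\t.\tGT\t1/1",
    "scaffold_1\t200\t.\tC\tT\t50\t.\t.\tGT\t0/1",
    "scaffold_1\t300\t.\tG\tGA,GAA\t50\t.\t.\tGT:DP\t2/2:30",
    "scaffold_1\t400\t.\tT\tC\t50\t.\t.\tGT\t./.",
]

edits = list(select_polishing_edits(example_vcf))
for edit in edits:
    print(edit)
```

In this toy input only the records at positions 100 and 300 would be applied as edits; the heterozygous call at 200 and the missing call at 400 are left untouched.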

Data availability

European Nucleotide Archive: Caprimulgus europaeus (Eurasian nightjar). Accession number PRJEB44830; https://identifiers.org/ena.embl:PRJEB44830.

The genome sequence is released openly for reuse. The C. europaeus genome sequencing initiative is part of the Darwin Tree of Life (DToL) project and the Vertebrate Genomes Project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

The authors describe the sequencing and assembly of the chromosome-scale reference genome for the European Nightjar. The methods follow those of the Vertebrate Genomes Project pipeline. I just have some minor comments:

How was the bird identified as female?
About how much blood was used for the sequencing?
How was the quality of the DNA checked?
How many PacBio cells and Illumina lanes were used for each sequencing method?
How did you know how many chromosomes should have been assembled?
Can you provide more details on the assembly: which parameters were used and how was manual curation performed? If this is detailed in a different manuscript, please explicitly state which manuscript.

Are sufficient details of methods and materials provided to allow replication by others? Partly
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes

Reviewer Expertise: genomics, evolution, population genomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors describe a nice, almost complete genome with pseudo-chromosomes of the European nightjar, produced using PacBio Sequel II, Illumina and Hi-C sequencing methods, and thus present important data for further genetic analysis. There are two points that could be improved:

There are some redundancies between Figures 1, 2, and Table 1.
The method used to obtain long HMW DNA is not well described, since the Bionano protocol is for human blood and not for bird blood.

Are sufficient details of methods and materials provided to allow replication by others? Partly
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes

Reviewer Expertise: genomic, molecular biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References (18 in total)

1.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

2.  Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

Authors:  Angela P Fuentes-Pardo; Daniel E Ruzzante
Journal:  Mol Ecol       Date:  2017-09-05       Impact factor: 6.185

3.  The avian "hibernation" enigma: thermoregulatory patterns and roost choice of the common poorwill.

Authors:  Christopher P Woods; Zenon J Czenze; R Mark Brigham
Journal:  Oecologia       Date:  2018-11-20       Impact factor: 3.225

4.  Phased diploid genome assembly with single-molecule real-time sequencing.

Authors:  Chen-Shan Chin; Paul Peluso; Fritz J Sedlazeck; Maria Nattestad; Gregory T Concepcion; Alicia Clum; Christopher Dunn; Ronan O'Malley; Rosa Figueroa-Balderas; Abraham Morales-Cruz; Grant R Cramer; Massimo Delledonne; Chongyuan Luo; Joseph R Ecker; Dario Cantu; David R Rank; Michael C Schatz
Journal:  Nat Methods       Date:  2016-10-17       Impact factor: 28.547

5.  Cloning and structural analysis of complement component 3d in wild birds provides insight into its functional evolution.

Authors:  Bo Jiang; Zhenhua Zhang; Jian Xu; Huan Jin; Yongqing Li
Journal:  Dev Comp Immunol       Date:  2020-12-15       Impact factor: 3.636

6.  Towards complete and error-free genome assemblies of all vertebrate species.

Authors:  Arang Rhie; Shane A McCarthy; Olivier Fedrigo; Joana Damas; Giulio Formenti; Sergey Koren; Marcela Uliano-Silva; William Chow; Arkarachai Fungtammasan; Juwan Kim; Chul Lee; Byung June Ko; Mark Chaisson; Gregory L Gedman; Lindsey J Cantin; Francoise Thibaud-Nissen; Leanne Haggerty; Iliana Bista; Michelle Smith; Bettina Haase; Jacquelyn Mountcastle; Sylke Winkler; Sadye Paez; Jason Howard; Sonja C Vernes; Tanya M Lama; Frank Grutzner; Wesley C Warren; Christopher N Balakrishnan; Dave Burt; Julia M George; Matthew T Biegler; David Iorns; Andrew Digby; Daryl Eason; Bruce Robertson; Taylor Edwards; Mark Wilkinson; George Turner; Axel Meyer; Andreas F Kautt; Paolo Franchini; H William Detrich; Hannes Svardal; Maximilian Wagner; Gavin J P Naylor; Martin Pippel; Milan Malinsky; Mark Mooney; Maria Simbirsky; Brett T Hannigan; Trevor Pesout; Marlys Houck; Ann Misuraca; Sarah B Kingan; Richard Hall; Zev Kronenberg; Ivan Sović; Christopher Dunn; Zemin Ning; Alex Hastie; Joyce Lee; Siddarth Selvaraj; Richard E Green; Nicholas H Putnam; Ivo Gut; Jay Ghurye; Erik Garrison; Ying Sims; Joanna Collins; Sarah Pelan; James Torrance; Alan Tracey; Jonathan Wood; Robel E Dagnew; Dengfeng Guan; Sarah E London; David F Clayton; Claudio V Mello; Samantha R Friedrich; Peter V Lovell; Ekaterina Osipova; Farooq O Al-Ajli; Simona Secomandi; Heebal Kim; Constantina Theofanopoulou; Michael Hiller; Yang Zhou; Robert S Harris; Kateryna D Makova; Paul Medvedev; Jinna Hoffman; Patrick Masterson; Karen Clark; Fergal Martin; Kevin Howe; Paul Flicek; Brian P Walenz; Woori Kwak; Hiram Clawson; Mark Diekhans; Luis Nassar; Benedict Paten; Robert H S Kraus; Andrew J Crawford; M Thomas P Gilbert; Guojie Zhang; Byrappa Venkatesh; Robert W Murphy; Klaus-Peter Koepfli; Beth Shapiro; Warren E Johnson; Federica Di Palma; Tomas Marques-Bonet; Emma C Teeling; Tandy Warnow; Jennifer Marshall Graves; Oliver A Ryder; David Haussler; Stephen J O'Brien; Jonas Korlach; Harris A Lewin; Kerstin Howe; Eugene W Myers; Richard Durbin; Adam M Phillippy; Erich D Jarvis
Journal:  Nature       Date:  2021-04-28       Impact factor: 49.962

7.  Significantly improving the quality of genome assemblies through curation.

Authors:  Kerstin Howe; William Chow; Joanna Collins; Sarah Pelan; Damon-Lee Pointon; Ying Sims; James Torrance; Alan Tracey; Jonathan Wood
Journal:  Gigascience       Date:  2021-01-09       Impact factor: 6.524

8.  Identifying and removing haplotypic duplication in primary genome assemblies.

Authors:  Dengfeng Guan; Shane A McCarthy; Jonathan Wood; Kerstin Howe; Yadong Wang; Richard Durbin
Journal:  Bioinformatics       Date:  2020-05-01       Impact factor: 6.937

9.  HiGlass: web-based visual exploration and analysis of genome interaction maps.

Authors:  Peter Kerpedjiev; Nezar Abdennur; Fritz Lekschas; Chuck McCallum; Kasper Dinkla; Hendrik Strobelt; Jacob M Luber; Scott B Ouellette; Alaleh Azhir; Nikhil Kumar; Jeewon Hwang; Soohyun Lee; Burak H Alver; Hanspeter Pfister; Leonid A Mirny; Peter J Park; Nils Gehlenborg
Journal:  Genome Biol       Date:  2018-08-24       Impact factor: 13.583

10.  Conservation of biodiversity in the genomics era.

Authors:  Megan A Supple; Beth Shapiro
Journal:  Genome Biol       Date:  2018-09-11       Impact factor: 13.583
