The genome sequence of the grey wolf, Canis lupus Linnaeus 1758.

Mikkel-Holger S Sinding1,2, Shyam Gopalakrishnan1,3, Katrine Raundrup2, Love Dalén4,5,6, Jonathan Threlfall7, Tom Gilbert1,3,8.   

Abstract

We present a genome assembly from an individual male Canis lupus orion (the grey wolf, subspecies: Greenland wolf; Chordata; Mammalia; Carnivora; Canidae). The genome sequence is 2,447 megabases in span. The majority of the assembly (98.91%) is scaffolded into 40 chromosomal pseudomolecules, with the X and Y sex chromosomes assembled.

Copyright: © 2021 Sinding MHS et al.

Keywords:  Canis lupus; Canis lupus orion; Greenland wolf; Polar wolf; chromosomal; genome sequence; grey wolf

Year:  2021        PMID: 34926833      PMCID: PMC8649967          DOI: 10.12688/wellcomeopenres.17332.1

Source DB:  PubMed          Journal:  Wellcome Open Res        ISSN: 2398-502X


Species taxonomy

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Carnivora; Caniformia; Canidae; Canis; Canis lupus Linnaeus 1758 (NCBI:txid9612).

Background

The grey wolf, Canis lupus, is the largest species within the group of wolf-like canids (subtribe Canina) and the member with the largest geographic distribution. Wolves were originally found throughout Eurasia, except tropical Southeast Asia, and across all of North America. This vast range spans numerous habitats, and wolf ecotypes have adapted to the diverse environments within it. Although the wolf is locally extinct in several places, such as the UK, Ireland and Brittany, it still occupies much of its original range; the global population is estimated to be in the order of 200–250 thousand individuals (Jhala). Once numerous, wolves were eradicated from the island of Great Britain in the 15th century and from Ireland in the 18th century. There have been proposals to reintroduce wolves to the Scottish Highlands to manage red deer populations, which harm biodiversity through overgrazing (Nilsen). The Scottish Highlands are considered the only location in Great Britain that could support a healthy wolf population; however, objections from livestock owners are likely to prevent reintroduction in the near future (Wilson, 2004). Elsewhere, wolf reintroduction has led not only to the re-establishment of this apex predator but also to marked improvements in biodiversity across the ecosystem as a whole (Ripple). Wolves reintroduced into Yellowstone National Park, Wyoming, USA, in 1995 preyed on grazing animals such as wapiti (Cervus canadensis), which had maintained the grasslands. The resulting changes in prey behaviour triggered trophic cascades that led to the re-establishment of tree species and an associated increase in populations of species that depend directly or indirectly on this habitat (Ripple & Beschta, 2012). Historically, wolves have been found in Northwest, Northeast and East Greenland (Dawes).
Wolves were extirpated from East Greenland by hunting by 1939 and were absent from the area for the next 40 years (Marquard-Petersen, 2012). Around 1979, a pair of wolves travelled from the north of the island and began recolonising East Greenland, establishing a population of around 23 wolves (Marquard-Petersen, 2011). A recent assessment found no trace of wolves in East Greenland for a number of years, whereas a population of up to 32 animals persists in the northernmost parts of Greenland. Because the East Greenland population lay entirely within the Northeast Greenland National Park, where the wolves were legally protected, this extinction is unlikely to have been driven by hunting (Marquard-Petersen, 2021). Domestic dogs shared a common ancestor with Eurasian wolves around 33,000 years ago (Skoglund; Wang). The Greenland wolf (Polar wolf) reference genome described here is therefore highly relevant for dog and Eurasian wolf genomics. The Polar wolf is a North American wolf and an outgroup to dogs and Eurasian wolves (Gopalakrishnan; Sinding), which will help minimise reference bias when representing diversity in re-sequenced genomes (Gopalakrishnan). The Polar wolf is also the North American wolf with the least coyote-like ancestry (Sinding); it is therefore probably the closest available outgroup to dogs and Eurasian wolves that lacks most of the exotic admixture carried by other North American wolves. Finally, as a precise reference, this genome permits detailed genomic investigation of Polar wolves themselves, including the identification of rare genomic variation. The genome is therefore a useful resource both for research on the Polar wolf itself, a small, isolated and understudied population, and for research on canids, wolves and dogs more broadly.

Genome sequence report

The genome was sequenced from a single male C. lupus orion collected from Siorapaluk, Greenland (latitude 77.785278, longitude -70.631389) in 2016. A total of 28-fold coverage in Pacific Biosciences single-molecule, circular consensus (HiFi) long reads and 74-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation capture (Hi-C) data. Manual assembly curation corrected 135 missing joins or misjoins and removed 9 haplotypic duplications, reducing the assembly length by 0.2% and the scaffold number by 42.1%, and increasing the scaffold N50 by 15.9%. The final assembly has a total length of 2,447 Mb in 82 sequence scaffolds with a scaffold N50 of 66 Mb (Table 1). Of the assembly sequence, 98.91% was assigned to 40 chromosomal-level scaffolds (named by synteny to an assembly for C. lupus familiaris, breed Labrador: GCF_014441545.1), comprising 38 autosomes and the X and Y chromosomes (Figure 1–Figure 4; Table 2). The assembly has a BUSCO (Simão) completeness of 95.8% (single 94.6%, duplicated 1.2%) using the carnivora_odb10 reference set. While not fully phased, the deposited assembly represents one haplotype. Contigs corresponding to the second haplotype have also been deposited.
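The coverage and curation figures above can be unpacked with simple arithmetic. The Python sketch below is illustrative only: it converts fold-coverage into an approximate number of sequenced bases and back-calculates approximate pre-curation values from the reported percentage changes. The function names are hypothetical, the pre-curation values are inferred here rather than reported in the note, and because the inputs are rounded the outputs are approximate.

```python
# Illustrative arithmetic for the coverage and curation figures in the
# genome sequence report. Inputs are rounded values quoted in the note,
# so every result is approximate. Function names are hypothetical.

ASSEMBLY_SPAN_BP = 2_447_463_909  # final assembly length (Figure 1 caption)


def bases_for_coverage(fold: float, genome_bp: int) -> float:
    """Total sequenced bases implied by a given fold-coverage of a genome."""
    return fold * genome_bp


def value_before_change(after: float, percent_change: float) -> float:
    """Invert a relative change, where after = before * (1 + percent_change / 100)."""
    return after / (1 + percent_change / 100)


if __name__ == "__main__":
    hifi_gb = bases_for_coverage(28, ASSEMBLY_SPAN_BP) / 1e9
    tenx_gb = bases_for_coverage(74, ASSEMBLY_SPAN_BP) / 1e9
    print(f"PacBio HiFi: ~{hifi_gb:.1f} Gb; 10X read clouds: ~{tenx_gb:.1f} Gb")

    # Curation reduced the span by 0.2% and the scaffold count by 42.1%,
    # and increased the scaffold N50 by 15.9%. Post-curation values come
    # from Table 1 (82 scaffolds) and the Figure 1 caption (span, N50).
    print(f"Span before curation:      ~{value_before_change(2447.46, -0.2):.0f} Mb")
    print(f"Scaffolds before curation: ~{value_before_change(82, -42.1):.0f}")
    print(f"N50 before curation:       ~{value_before_change(65.78, 15.9):.1f} Mb")
```

Run as a script, this suggests roughly 68.5 Gb of HiFi sequence and 181.1 Gb of 10X sequence, and, before curation, about 142 scaffolds with a scaffold N50 of about 56.8 Mb.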
Table 1.

Genome data for Canis lupus, mCanLor1.2.

Project accession data
Assembly identifier: mCanLor1.2
Species: Canis lupus
Specimen: mCanLor1
NCBI taxonomy ID: NCBI:txid9612
BioProject: PRJEB43200
BioSample ID: SAMEA7532739
Isolate information: Male, muscle
Raw data accessions
Pacific Biosciences SEQUEL II: ERR6406204, ERR6406205, ERR6412029, ERR6412030, ERR6412359, ERR6412360
10X Genomics Illumina: ERR6054484-ERR6054491
Hi-C Illumina: ERR6511153
Illumina RNA-Seq: ERR6054492
Genome assembly
Assembly accession: GCA_905319855.2
Accession of alternate haplotype: GCA_905319845.1
Span (Mb): 2,447
Number of contigs: 248
Contig N50 length (Mb): 34
Number of scaffolds: 82
Scaffold N50 length (Mb): 66
Longest scaffold (Mb): 125
BUSCO* genome score: C:95.8%[S:94.6%,D:1.2%],F:2.0%,M:2.2%,n:4104

*BUSCO scores based on the carnivora_odb10 BUSCO set using v5.1.2. C= complete [S= single copy, D=duplicated], F=fragmented, M=missing, n=number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/Canis%20lupus/dataset/CAJNRB02/busco.
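Two fields in Table 1 use compact notations: the BUSCO summary string and the run-accession range for the 10X Genomics data. The sketch below, with hypothetical helper names, parses the BUSCO string into its components and checks that single-copy plus duplicated equals complete, and that complete, fragmented and missing sum to 100%. It also expands a range such as ERR6054484-ERR6054491, assuming the range denotes consecutive run accessions sharing one prefix. The regular expression is written for the string layout shown in Table 1 and is not BUSCO's own parser.

```python
import re

# Hypothetical helpers for two compact notations used in Table 1.

BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,\s*D:(?P<D>[\d.]+)%\],\s*"
    r"F:(?P<F>[\d.]+)%,\s*M:(?P<M>[\d.]+)%,\s*n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> dict:
    """Split a summary such as 'C:95.8%[S:94.6%,D:1.2%],F:2.0%,...' into numbers."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["n"] = int(match["n"])  # number of orthologues is an integer count
    return scores


def busco_is_consistent(scores: dict, tol: float = 0.15) -> bool:
    """Check S + D == C and C + F + M == 100, allowing for rounding to 0.1%."""
    return (abs(scores["S"] + scores["D"] - scores["C"]) <= tol
            and abs(scores["C"] + scores["F"] + scores["M"] - 100) <= tol)


def expand_run_range(spec: str) -> list:
    """Expand 'ERR6054484-ERR6054491' into run accessions (assumed consecutive)."""
    if "-" not in spec:
        return [spec]
    start, end = spec.split("-")
    prefix = re.match(r"[A-Z]+", start).group()
    if not end.startswith(prefix):
        raise ValueError(f"range endpoints differ in prefix: {spec!r}")
    width = len(start) - len(prefix)  # keep any zero-padding of the numeric part
    first, last = int(start[len(prefix):]), int(end[len(prefix):])
    return [f"{prefix}{i:0{width}d}" for i in range(first, last + 1)]


if __name__ == "__main__":
    print(parse_busco("C:95.8%[S:94.6%,D:1.2%],F:2.0%,M:2.2%,n:4104"))
    print(expand_run_range("ERR6054484-ERR6054491"))
```

Under that assumption, the 10X Genomics range corresponds to eight runs.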

Figure 1.

Genome assembly of Canis lupus, mCanLor1.2: metrics.

The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 2,447,463,909 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (124,665,963 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (65,778,685 and 41,774,919 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the carnivora_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/mCanLor1.2/dataset/CAJNRB02/snail.
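The N50 and N90 values in the snail plot follow the usual Nx definition: sort scaffolds from longest to shortest and report the length at which the running total first reaches x% of the assembly. The following minimal Python sketch implements that definition and the per-bin width of the 1,000-bin plot; it is not BlobToolKit's code, tie-breaking or rounding in BlobToolKit may differ, and the toy lengths are invented.

```python
def nx(lengths, x: float) -> int:
    """Nx: the length L such that sequences of length >= L cover at least x% of the total."""
    if not lengths or not 0 < x <= 100:
        raise ValueError("need a non-empty list of lengths and 0 < x <= 100")
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):  # longest first
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # defensive fallback; not reached for valid input


def snail_bin_width(span_bp: int, n_bins: int = 1000) -> float:
    """Sequence length represented by one snail-plot bin (0.1% of the span for 1,000 bins)."""
    return span_bp / n_bins


if __name__ == "__main__":
    toy_lengths = [100, 80, 60, 40, 20]  # invented scaffold lengths
    print(nx(toy_lengths, 50), nx(toy_lengths, 90))
    print(f"{snail_bin_width(2_447_463_909):,.0f} bp per bin")
```

For the toy lengths, N50 is 80 and N90 is 40; applied to the 2,447,463,909 bp assembly, each snail-plot bin covers roughly 2.45 Mb.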

Figure 4.

Genome assembly of Canis lupus, mCanLor1.2: Hi-C contact map.

Hi-C contact map of the mCanLor1.2 assembly, visualised in HiGlass. Chromosomes are shown in size order from left to right and top to bottom.
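A Hi-C contact map tabulates how often pairs of genomic positions are captured together. Contacts concentrate near the diagonal because loci close together on the same chromosome interact most often, which is what allows scaffolders such as SALSA2 to order and orient contigs. As a toy illustration only, and not how the HiGlass map above was produced, the sketch below bins read-pair positions on a single sequence into a symmetric count matrix. It assumes 0-based positions already mapped to one sequence and ignores filtering and normalisation; the read pairs are invented.

```python
def contact_matrix(pairs, seq_length: int, bin_size: int):
    """Toy Hi-C binning: count read pairs (pos1, pos2) on one sequence per bin pair.

    Positions are assumed 0-based and < seq_length; no filtering or normalisation.
    """
    n_bins = -(-seq_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:  # mirror off-diagonal contacts without double-counting the diagonal
            matrix[j][i] += 1
    return matrix


if __name__ == "__main__":
    toy_pairs = [(10, 20), (10, 300), (600, 990), (900, 980)]  # invented read pairs
    for row in contact_matrix(toy_pairs, seq_length=1000, bin_size=250):
        print(row)
```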

Table 2.

Chromosomal pseudomolecules in the genome assembly of Canis lupus, mCanLor1.2.

INSDC accession  Chromosome  Size (Mb)  GC%
HG994383.1       1           122.96     41.7
HG994387.1       2           86.40      42.9
HG994384.1       3           93.48      40.5
HG994386.1       4           88.63      40.4
HG994385.1       5           89.78      44.3
HG994389.1       6           78.39      42.8
HG994388.1       7           82.29      41.1
HG994390.1       8           77.59      40.8
HG994394.1       9           66.79      45.9
HG994393.1       10          71.93      42.9
HG994391.1       11          75.75      40.4
HG994392.1       12          73.73      39.2
HG994397.1       13          65.44      40.3
HG994400.1       14          62.79      39
HG994395.1       15          65.78      40.5
HG994398.1       16          63.67      41.5
HG994396.1       17          65.96      41.9
HG994402.1       18          57.59      43
HG994403.1       19          56.75      38.7
HG994401.1       20          59.77      44.6
HG994405.1       21          53.11      40.4
HG994399.1       22          63.45      38.3
HG994406.1       23          52.96      40
HG994407.1       24          49.88      44.7
HG994404.1       25          53.62      41.6
HG994409.1       26          46.11      46.2
HG994408.1       27          48.75      40.8
HG994413.1       28          42.48      43.9
HG994412.1       29          44.09      38.9
HG994415.1       30          41.62      41.6
HG994411.1       31          44.76      41
HG994414.1       32          41.77      38.1
HG994417.1       33          32.66      39.4
HG994410.1       34          45.90      41.6
HG994419.1       35          28.53      42
HG994416.1       36          33.43      39
HG994418.1       37          31.50      40.1
HG994420.1       38          26.44      41.5
HG994381.1       X           124.67     40.3
HG994382.1       Y           6.54       41.5
HG998573.1       MT          0.02       39.6
-                Unplaced    29.74      50.3
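As a quick consistency check, the sizes in Table 2 can be summed and compared with the assembly span given in the Figure 1 caption (2,447,463,909 bp). The Python sketch below, with values transcribed from the table and hypothetical function names, performs that comparison and reports the longest sequence. Because each size is rounded to 0.01 Mb, agreement is expected only to within a fraction of a megabase.

```python
# Chromosome sizes (Mb) transcribed from Table 2.
AUTOSOME_MB = [
    122.96, 86.40, 93.48, 88.63, 89.78, 78.39, 82.29, 77.59, 66.79, 71.93,  # 1-10
    75.75, 73.73, 65.44, 62.79, 65.78, 63.67, 65.96, 57.59, 56.75, 59.77,  # 11-20
    53.11, 63.45, 52.96, 49.88, 53.62, 46.11, 48.75, 42.48, 44.09, 41.62,  # 21-30
    44.76, 41.77, 32.66, 45.90, 28.53, 33.43, 31.50, 26.44,                # 31-38
]
OTHER_MB = {"X": 124.67, "Y": 6.54, "MT": 0.02, "Unplaced": 29.74}
ASSEMBLY_SPAN_MB = 2_447_463_909 / 1e6  # Figure 1 caption


def sizes_mb() -> dict:
    """All Table 2 rows keyed by chromosome name."""
    sizes = {str(i): mb for i, mb in enumerate(AUTOSOME_MB, start=1)}
    sizes.update(OTHER_MB)
    return sizes


def table_total_mb() -> float:
    return sum(sizes_mb().values())


def longest_sequence() -> tuple:
    return max(sizes_mb().items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(f"Table 2 total: {table_total_mb():.2f} Mb; span: {ASSEMBLY_SPAN_MB:.2f} Mb")
    print("Longest sequence:", longest_sequence())
```

The rows sum to about 2,447.50 Mb against a span of 2,447.46 Mb, and the longest sequence is the X chromosome (124.67 Mb), matching the 124,665,963 bp longest scaffold in the Figure 1 caption.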

Figure 2.

Genome assembly of Canis lupus, mCanLor1.2: GC coverage.

BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/mCanLor1.2/dataset/CAJNRB02/blob.
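One axis of a GC-coverage plot is each scaffold's GC content. A minimal sketch of that calculation follows; it assumes that ambiguous bases such as N are excluded from the denominator, which is a choice made here rather than something the caption specifies, and the toy sequence is invented.

```python
def gc_percent(seq: str) -> float:
    """GC content (%) over unambiguous bases; N and other IUPAC codes are excluded (assumption)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100 * gc / (gc + at)


if __name__ == "__main__":
    print(gc_percent("ggcNNNat"))  # toy sequence: 3 of 5 unambiguous bases are G/C
```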

Figure 3.

Genome assembly of Canis lupus, mCanLor1.2: cumulative sequence.

BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/mCanLor1.2/dataset/CAJNRB02/cumulative.
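The cumulative sequence plot ranks scaffolds from longest to shortest and plots the running total of their lengths, both overall (grey line) and for each taxonomic assignment (coloured lines). The sketch below reproduces that bookkeeping from (length, group) pairs; the function name and group labels are hypothetical, and the BUSCO-based taxonomic assignment itself (the buscogenes taxrule) is not modelled.

```python
from collections import defaultdict
from itertools import accumulate


def cumulative_curves(scaffolds):
    """Return (overall, per_group) running totals of scaffold lengths, longest first.

    `scaffolds` is an iterable of (length_bp, group) pairs, where group stands in
    for the taxonomic assignment used to colour each line.
    """
    overall, by_group = [], defaultdict(list)
    for length, group in scaffolds:
        overall.append(length)
        by_group[group].append(length)

    def running_total(lengths):
        return list(accumulate(sorted(lengths, reverse=True)))

    return running_total(overall), {g: running_total(ls) for g, ls in by_group.items()}
```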

Methods

A single 4-year-old male C. lupus orion (mCanLor1) was collected from Siorapaluk, Greenland (latitude 77.785278, longitude -70.631389) by the Ministry of Fisheries, Hunting and Agriculture, Government of Greenland. The animal was put down by the local municipal bailiff in Siorapaluk on 13 January 2016. The wolf showed little fear of humans, persistently entered the village and could not be chased away, so it was decided that it should be killed to protect villagers and dogs in Siorapaluk. After the animal was killed, the skull of the specimen was confiscated by the authorities and made available for research to the Greenland Institute of Natural Resources. DNA was extracted from the muscle tissue of mCanLor1 at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer's instructions. RNA was extracted from the same muscle tissue in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNase-free water, and its concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range (BR) Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and the Eukaryotic Total RNA assay. Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences SEQUEL II and Illumina HiSeq X instruments. RNA sequencing was performed on an Illumina MiSeq instrument. Further 10X sequencing was performed at SciLifeLab, Stockholm, Sweden.
DNA was extracted using the automated KingFisher™ Duo Prime Purification System (Thermo Fisher Scientific, Bremen, Germany) following the manufacturer's protocol. Illumina TruSeq PCR-free libraries were then constructed and sequenced on a HiSeq X. Hi-C data were generated at SciLifeLab, Stockholm, Sweden, using the Dovetail Hi-C kit and sequenced on a HiSeq X. Assembly was carried out with Hifiasm (Cheng). Haplotypic duplication was identified and removed with purge_dups (Guan). Scaffolding with Hi-C data (Rao) was carried out with SALSA2 (Ghurye). The Hi-C scaffolded assembly was polished with the 10X Genomics Illumina data by aligning the reads to the assembly with longranger align and calling variants with freebayes (Garrison & Marth, 2012); one round of Illumina polishing was applied. The mitochondrial genome was assembled with MitoHiFi (Uliano-Silva). The assembly was checked for contamination and corrected using the gEVAL system (Chow) as described previously (Howe). Manual curation (Howe) was performed using gEVAL, HiGlass (Kerpedjiev) and Pretext. Regions of concern were identified and resolved using 10X longranger and genetic mapping data. The genome was analysed within the BlobToolKit environment (Challis). Table 3 lists the versions of all software tools used, where appropriate.
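The polishing step can be pictured as substituting corrected bases into the assembly where aligned short reads consistently disagree with it. The note does not say how the freebayes calls were applied, so the sketch below is a deliberately simplified assumption: it applies single-base substitutions, given as hypothetical (0-based position, reference base, alternative base) tuples, and refuses any call whose reference base does not match the draft. Real polishing must also handle insertions and deletions, which shift downstream coordinates, and filter calls by genotype and quality.

```python
def apply_snv_corrections(seq: str, corrections) -> str:
    """Apply single-base substitutions given as (0-based position, ref, alt) tuples.

    Simplified illustration: indels, genotype filtering and quality thresholds
    that real polishing must handle are deliberately left out.
    """
    bases = list(seq)
    for pos, ref, alt in sorted(corrections):
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"only single-base substitutions are supported: {ref}->{alt}")
        if bases[pos] != ref:
            raise ValueError(f"reference mismatch at {pos}: found {bases[pos]}, expected {ref}")
        bases[pos] = alt
    return "".join(bases)


if __name__ == "__main__":
    draft = "ACGTACGT"  # invented draft sequence
    print(apply_snv_corrections(draft, [(1, "C", "G"), (6, "G", "A")]))
```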
Table 3.

Software tools used.

Software tool      Version              Source
Hifiasm            0.12                 Cheng et al., 2021
purge_dups         1.2.3                Guan et al., 2020
SALSA2             2.2                  Ghurye et al., 2019
longranger align   2.2.2                https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines
freebayes          1.3.1-17-gaa2ace8    Garrison & Marth, 2012
MitoHiFi           1                    Uliano-Silva et al., 2021
gEVAL              N/A                  Chow et al., 2016
PretextView        0.1.x                https://github.com/wtsi-hpag/PretextView
HiGlass            1.11.6               Kerpedjiev et al., 2018
BlobToolKit        2.6.2                Challis et al., 2020

Data availability

European Nucleotide Archive: Canis lupus (Greenland wolf). Accession number PRJEB43200; https://identifiers.org/ena.embl/PRJEB43200. The genome sequence is released openly for reuse. The C. lupus genome sequencing initiative is part of the Darwin Tree of Life (DToL) project and the Vertebrate Genomes Project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated using the RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Reviewer reports

The Data Note "The genome sequence of the grey wolf, Canis lupus Linnaeus 1758" by Sinding et al. describes the genome of the grey wolf obtained following the Darwin Tree of Life protocols. PacBio single-molecule long reads, 10X Genomics read clouds, Illumina reads and Hi-C data were generated and used to assemble this genome. RNA-seq libraries were also constructed and sequenced; however, the authors do not explain why or where these data are used (for the Ensembl pipeline?). The resulting assembly is of high quality, with the majority of the assembly (~99%) scaffolded into 40 chromosomal pseudomolecules. Given the genome provided here, however, I think a comparison (based on BUSCO scores or traditional measures such as number of scaffolds or N50) with other genomes available for Canis lupus would have been interesting. Indeed, even though this genome is the first available assembly for the subspecies Canis lupus orion, there are 25 other assemblies for this species available on NCBI, among them other chromosome-length genome assemblies. Minor but mandatory comments: Explain the use of the RNA-seq data. Is it only for use in the Ensembl pipeline? Not all software tools are listed in Table 3; at least BUSCO and its version are missing.
Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes
Reviewer Expertise: Phylogenomics and Genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The publication describes a high-quality genome of the Polar wolf, which will be an asset for population genomics and pangenomics of canids, and likely for research aiming to understand dog domestication. The study is well motivated, the methods are clearly described and the assembly was done by an expert team of genome scientists. I have only two suggestions. The Methods section mentions that HiFi circular consensus reads were produced; maybe this can be mentioned above in the genome report: "A total of 28-fold coverage in Pacific Biosciences single-molecule, circular consensus (HiFi) long reads ...". Since 10X data were also produced and used for polishing, it makes sense to use Merqury to compute a base QV value; maybe this can be added.
Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes
Reviewer Expertise: genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
References

1.  Status and ecological effects of the world's largest carnivores.

Authors:  William J Ripple; James A Estes; Robert L Beschta; Christopher C Wilmers; Euan G Ritchie; Mark Hebblewhite; Joel Berger; Bodil Elmhagen; Mike Letnic; Michael P Nelson; Oswald J Schmitz; Douglas W Smith; Arian D Wallach; Aaron J Wirsing
Journal:  Science       Date:  2014-01-10       Impact factor: 47.728

2.  Ancient wolf genome reveals an early divergence of domestic dog ancestors and admixture into high-latitude breeds.

Authors:  Pontus Skoglund; Erik Ersmark; Eleftheria Palkopoulou; Love Dalén
Journal:  Curr Biol       Date:  2015-05-21       Impact factor: 10.834

3.  Wolf reintroduction to Scotland: public attitudes and consequences for red deer management.

Authors:  Erlend B Nilsen; E J Milner-Gulland; Lee Schofield; Atle Mysterud; Nils Chr Stenseth; Tim Coulson
Journal:  Proc Biol Sci       Date:  2007-04-07       Impact factor: 5.349

4.  A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

Authors:  Suhas S P Rao; Miriam H Huntley; Neva C Durand; Elena K Stamenova; Ivan D Bochkov; James T Robinson; Adrian L Sanborn; Ido Machol; Arina D Omer; Eric S Lander; Erez Lieberman Aiden
Journal:  Cell       Date:  2014-12-11       Impact factor: 41.582

5.  gEVAL - a web-based browser for evaluating genome assemblies.

Authors:  William Chow; Kim Brugger; Mario Caccamo; Ian Sealy; James Torrance; Kerstin Howe
Journal:  Bioinformatics       Date:  2016-04-07       Impact factor: 6.937

6.  Population genomics of grey wolves and wolf-like canids in North America.

Authors:  Mikkel-Holger S Sinding; Shyam Gopalakrishan; Filipe G Vieira; Jose A Samaniego Castruita; Katrine Raundrup; Mads Peter Heide Jørgensen; Morten Meldgaard; Bent Petersen; Thomas Sicheritz-Ponten; Johan Brus Mikkelsen; Ulf Marquard-Petersen; Rune Dietz; Christian Sonne; Love Dalén; Lutz Bachmann; Øystein Wiig; Anders J Hansen; M Thomas P Gilbert
Journal:  PLoS Genet       Date:  2018-11-12       Impact factor: 5.917

7.  Interspecific Gene Flow Shaped the Evolution of the Genus Canis.

Authors:  Shyam Gopalakrishnan; Mikkel-Holger S Sinding; Jazmín Ramos-Madrigal; Jonas Niemann; Jose A Samaniego Castruita; Filipe G Vieira; Christian Carøe; Marc de Manuel Montero; Lukas Kuderna; Aitor Serres; Víctor Manuel González-Basallote; Yan-Hu Liu; Guo-Dong Wang; Tomas Marques-Bonet; Siavash Mirarab; Carlos Fernandes; Philippe Gaubert; Klaus-Peter Koepfli; Jane Budd; Eli Knispel Rueness; Claudio Sillero; Mads Peter Heide-Jørgensen; Bent Petersen; Thomas Sicheritz-Ponten; Lutz Bachmann; Øystein Wiig; Anders J Hansen; M Thomas P Gilbert
Journal:  Curr Biol       Date:  2019-12-02       Impact factor: 10.834

8.  Significantly improving the quality of genome assemblies through curation.

Authors:  Kerstin Howe; William Chow; Joanna Collins; Sarah Pelan; Damon-Lee Pointon; Ying Sims; James Torrance; Alan Tracey; Jonathan Wood
Journal:  Gigascience       Date:  2021-01-09       Impact factor: 6.524

9.  Identifying and removing haplotypic duplication in primary genome assemblies.

Authors:  Dengfeng Guan; Shane A McCarthy; Jonathan Wood; Kerstin Howe; Yadong Wang; Richard Durbin
Journal:  Bioinformatics       Date:  2020-05-01       Impact factor: 6.937

10.  HiGlass: web-based visual exploration and analysis of genome interaction maps.

Authors:  Peter Kerpedjiev; Nezar Abdennur; Fritz Lekschas; Chuck McCallum; Kasper Dinkla; Hendrik Strobelt; Jacob M Luber; Scott B Ouellette; Alaleh Azhir; Nikhil Kumar; Jeewon Hwang; Soohyun Lee; Burak H Alver; Hanspeter Pfister; Leonid A Mirny; Peter J Park; Nils Gehlenborg
Journal:  Genome Biol       Date:  2018-08-24       Impact factor: 13.583
