
The genome sequence of the eastern grey squirrel, Sciurus carolinensis Gmelin, 1788.

Dan Mead1, Kathryn Fingland2, Rachel Cripps3, Roberto Portela Miguez4, Michelle Smith1, Craig Corton1, Karen Oliver1, Jason Skelton1, Emma Betteridge1, Jale Dolucan1, Michael A Quail1, Shane A McCarthy1, Kerstin Howe1, Ying Sims1, James Torrance1, Alan Tracey1, Richard Challis1, Richard Durbin1, Mark Blaxter1.

Abstract

We present a genome assembly from an individual male Sciurus carolinensis (the eastern grey squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.82 gigabases in span. The majority of the assembly (92.3%) is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.
Copyright: © 2020 Mead D et al.


Keywords:  Sciurus carolinensis; chromosomal; genome sequence; grey squirrel

Year:  2020        PMID: 33215047      PMCID: PMC7653645          DOI: 10.12688/wellcomeopenres.15721.1

Source DB:  PubMed          Journal:  Wellcome Open Res        ISSN: 2398-502X


Species taxonomy

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciuromorpha; Sciuridae; Sciurinae; Sciurini; Sciurus; Sciurus carolinensis Gmelin, 1788 (NCBI txid 30640).

Background

The eastern grey squirrel, Sciurus carolinensis, is native to eastern North America, where it plays an important role in forest regeneration through its habit of caching nuts and seeds (Corbet & Hill, 1991) [1]. Within North America, S. carolinensis has been introduced outside its native range, so that it is now found from the Canadian Pacific Northwest to Florida. S. carolinensis has also been introduced to Britain (in 1876), Ireland (in 1911), Italy (in 1948), South Africa (before 1900), Australia (in the 1880s; extirpated in 1973) and Pitcairn Island (in 1987) (see https://www.cabi.org/isc/datasheet/49075). S. carolinensis, which thrives in urban parklands and gardens, is classed as invasive in Europe and on Pitcairn Island. In Britain and Ireland, the expansion of S. carolinensis populations has driven a decline in populations of the native red squirrel, Sciurus vulgaris, whose genome we have also assembled (Mead et al., 2020). S. carolinensis harms S. vulgaris in two ways: through interspecific competition, which leads to competitive exclusion of S. vulgaris, and by carrying squirrelpox virus, to which grey squirrels are resistant but red squirrels are not (Chantrey; Darby). The S. carolinensis genome will aid analyses of resistance and susceptibility to squirrelpox, as well as studies of the genomics of invasiveness.

Genome sequence report

The genome was sequenced from DNA extracted from a naturally deceased male S. carolinensis collected as part of a squirrel monitoring project run by the Wildlife Trust for Lancashire, Manchester and North Merseyside. In total, 74-fold coverage in Pacific Biosciences single-molecule long reads (N50 28 kb) and 40-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 19 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation capture (Hi-C) data (42-fold coverage). A contamination check identified a small number of low-coverage contigs that were likely to have derived from an apicomplexan parasite infecting the squirrel (Léveillé et al., 2020); these were removed. Subsequent manual assembly curation corrected 272 missing joins or misjoins and removed three haplotypic duplications, reducing the scaffold number by 19% and increasing the scaffold N50 by 242%. The final assembly has a total length of 2.82 Gb in 752 sequence scaffolds with a scaffold N50 of 148.2 Mb (Table 1). The majority (92.3%) of the assembly sequence was assigned to 21 chromosomal-level scaffolds representing 19 autosomes (numbered by sequence length) and the X and Y sex chromosomes (Figure 1–Figure 5; Table 2), plus 13 unlocalised scaffolds (assigned to chromosomes but with ambiguous placement). The assembly has a BUSCO (Simão et al., 2015) completeness of 93.7% using the mammalia_odb9 reference set. The primary assembly is a large-scale mosaic of both haplotypes (i.e. it is not fully phased), and we have therefore also deposited the contigs corresponding to the alternate haplotype. The S. carolinensis mSciCar1 genome sequence is largely collinear with that of S. vulgaris mSciVul1 (Figure 4).
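Fold coverage is the total number of sequenced bases divided by the genome size. The minimal sketch below converts the reported depths (74x PacBio, 40x 10X, 42x Hi-C) into the approximate number of bases they imply. It assumes the denominator is the 2,815,397,268 bp assembly span from Table 1; the report does not say whether the span or an independent genome-size estimate was used, so the printed totals are derived illustrations, not reported figures.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_for_coverage(coverage: float, genome_size: float) -> float:
    """Inverse calculation: bases needed to reach a target mean depth."""
    return coverage * genome_size


# Assumption: coverage is computed against the assembly span reported in Table 1.
SPAN_BP = 2_815_397_268

for technology, depth in [("PacBio CLR", 74), ("10X read clouds", 40), ("Hi-C", 42)]:
    implied_gb = bases_for_coverage(depth, SPAN_BP) / 1e9
    print(f"{technology}: {depth}x implies roughly {implied_gb:.0f} Gb of sequence")
```

Under this assumption, 74-fold long-read coverage corresponds to roughly 208 Gb of PacBio sequence.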
Table 1.

Genome data for Sciurus carolinensis mSciCar1.

Project accession data
Assembly identifier: mSciCar1
Species: Sciurus carolinensis
Specimen: NHMUK ZD 2019.214
NCBI taxonomy ID: 30640
BioProject: PRJEB35386
BioSample ID: SAMEA994726
Isolate information: Wild isolate; male
Raw data accessions
Pacific Biosciences SEQUEL I: ERR3313242-ERR3313245, ERR3313247-ERR3313255, ERR3313329, ERR3313331, ERR3313332, ERR3313342-ERR3313348
10X Genomics Illumina: ERR3316153-ERR3316156, ERR3316173-ERR3316176
Hi-C Illumina: ERR3312499-ERR3312500, ERR3850937
Genome assembly
Assembly accession: GCA_902686445.1
Accession of alternate haplotype: GCA_902685475.1
Span (bp): 2,815,397,268
Number of contigs: 2576
Contig N50 length (Mb): 13.98
Number of scaffolds: 752
Scaffold N50 length (Mb): 148.23
Longest scaffold (Mb): 208.99
BUSCO* genome score: C:93.7%[S:92.3%,D:1.4%],F:2.8%,M:3.5%,n:4104

* BUSCO scores based on the mammalia_odb9 BUSCO set using v3.0.2. C= complete [S= single copy, D=duplicated], F=fragmented, M=missing, n=number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/mSciCar1_1/dataset/mSciCar1_1/busco.
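The BUSCO summary string packs five percentages and an orthologue count into one line. The sketch below parses that one-line BUSCO v3 summary format and checks its internal consistency: complete (C) should equal single-copy (S) plus duplicated (D), and C + F + M should be about 100%. The tolerance and helper names are illustrative choices, not part of BUSCO itself.

```python
import re

# One-line BUSCO v3 summary, e.g. "C:93.7%[S:92.3%,D:1.4%],F:2.8%,M:3.5%,n:4104".
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO summary line into float percentages plus the integer count n."""
    match = BUSCO_PATTERN.search(summary.replace(" ", ""))  # tolerate stray spaces
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    scores = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    return scores


def check_consistency(scores: dict, tol: float = 0.15) -> bool:
    """C should equal S + D, and C + F + M should be ~100% (allowing for rounding)."""
    return (abs(scores["C"] - (scores["S"] + scores["D"])) <= tol
            and abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol)


scores = parse_busco("C:93.7%[S:92.3%,D:1.4%],F:2.8%,M:3.5%,n:4104")
complete = round(scores["C"] / 100 * scores["n"])  # approximate, since C is rounded
print(scores, check_consistency(scores))
print(f"about {complete} of {scores['n']} orthologues found complete")
```

For mSciCar1 the scores are internally consistent, and 93.7% of 4,104 orthologues corresponds to roughly 3,845 complete BUSCOs.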

Figure 1.

Genome assembly of Sciurus carolinensis mSciCar1: Metrics.

BlobToolKit Snailplot showing N50 metrics for S. carolinensis assembly mSciCar1 and BUSCO scores for the Euarchontoglires set of orthologues. The interactive version is available here.

Figure 5.

Genome assembly of Sciurus carolinensis mSciCar1: Hi-C contact map.

Hi-C scaffolding of the S. carolinensis mSciCar1 assembly visualised in HiGlass (Kerpedjiev et al., 2018).

Table 2.

Chromosomal pseudomolecules in the genome assembly of Sciurus carolinensis mSciCar1.

ENA accession | Chromosome | Size (Mb) | GC%
LR738590.1 | 1 | 208.99 | 40.3
LR738591.1 | 2 | 199.83 | 40.8
LR738592.1 | 3 | 183.55 | 40.3
LR738593.1 | 4 | 177.11 | 39.5
LR738594.1 | 5 | 175.91 | 39.1
LR738595.1 | 6 | 162.27 | 38.7
LR738596.1 | 7 | 154.99 | 39.1
LR738597.1 | 8 | 148.23 | 40.5
LR738598.1 | 9 | 141.42 | 38.8
LR738599.1 | 10 | 140.98 | 38.1
LR738600.1 | 11 | 135.23 | 40.1
LR738602.1 | 12 | 118.65 | 40.1
LR738603.1 | 13 | 94.68 | 41.1
LR738604.1 | 14 | 88.65 | 40.2
LR738605.1 | 15 | 83.14 | 40.5
LR738606.1 | 16 | 68.57 | 44.7
LR738607.1 | 17 | 66.05 | 42.7
LR738608.1 | 18 | 41.56 | 47.8
LR738609.1 | 19 | 30.99 | 44
LR738601.1 | X | 131.72 | 37.8
LR738610.1 | Y | 4.81 | 38.3
- | Unplaced | 258.08 | 40

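The N50 reported in Table 1 is the length L such that scaffolds of length L or longer together contain at least half of the assembly span; the read N50 (28 kb) and contig N50 use the same definition. The sketch below computes it from the chromosome sizes in Table 2, using the Table 1 span as the denominator. It assumes every unplaced scaffold is shorter than chromosome 8, so leaving them out of the list does not change which scaffold crosses the 50% mark.

```python
def n50(lengths, total=None):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of `total` (default: the sum of lengths)."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths) if total is None else total
    running = 0.0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    raise ValueError("lengths sum to less than half of total")


# Chromosomal scaffold sizes (Mb) from Table 2: chromosomes 1-19, then X and Y.
CHROMOSOMES_MB = [208.99, 199.83, 183.55, 177.11, 175.91, 162.27, 154.99,
                  148.23, 141.42, 140.98, 135.23, 118.65, 94.68, 88.65,
                  83.14, 68.57, 66.05, 41.56, 30.99, 131.72, 4.81]
SPAN_MB = 2815.397268  # Table 1 span, converted from bp to Mb

print(n50(CHROMOSOMES_MB, total=SPAN_MB))
```

Under that assumption the result is chromosome 8 (148.23 Mb), which matches the scaffold N50 in Table 1.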
Figure 4.

Genome assembly of Sciurus carolinensis mSciCar1: Whole genome alignment with Sciurus vulgaris mSciVul1.

A nucmer (Kurtz et al., 2004) pairwise alignment of mSciCar1 (x-axis) with mSciVul1 (y-axis).


Figure 2.

Genome assembly of Sciurus carolinensis mSciCar1: GC-coverage plot.

BlobToolKit GC-coverage plot of S. carolinensis mSciCar1 from long read data submission ERR3316154. The interactive version is available here.
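A GC-coverage plot places each contig by its GC content and its read coverage, so contigs from another organism tend to stand apart; the likely parasite-derived contigs in this assembly were noted for their low coverage. The sketch below computes both quantities and flags contigs below a fraction of the median coverage. The 0.1 cutoff and the toy contigs are hypothetical, and this simple rule is not how BlobToolKit or gEVAL decide what to remove.

```python
from statistics import median


def gc_percent(seq: str) -> float:
    """GC content as a percentage of unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def flag_low_coverage(contigs: dict, min_fraction: float = 0.1) -> list:
    """Return names of contigs whose coverage is below min_fraction of the
    median coverage: a crude stand-in for spotting outliers on a GC-coverage plot."""
    cutoff = min_fraction * median(cov for _, cov in contigs.values())
    return sorted(name for name, (_, cov) in contigs.items() if cov < cutoff)


# Hypothetical toy contigs: name -> (sequence, mean read coverage).
toy = {
    "ctg1": ("ACGTACGGCA" * 10, 70.0),
    "ctg2": ("ATATGCGCAT" * 10, 75.0),
    "ctg3": ("GGCCATATAT" * 10, 68.0),
    "ctg4": ("ATATATATGC" * 10, 3.0),  # low coverage and AT-rich
}
for name, (seq, cov) in toy.items():
    print(name, round(gc_percent(seq), 1), cov)
print(flag_low_coverage(toy))
```

In practice a flagged contig would still need supporting evidence, such as sequence similarity to a candidate contaminant, before removal.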

Figure 3.

Genome assembly of Sciurus carolinensis mSciCar1: Cumulative sequence plot.

The blue line in the main plot shows the cumulative sequence plot for mSciCar1. The dashed line shows the cumulative sequence plot of S. vulgaris mSciVul1 for comparison. The interactive version is available here.


Methods

The eastern grey squirrel specimen was collected by the Wildlife Trust for Lancashire, Manchester and North Merseyside as part of an ongoing programme of recovery of dead squirrels. A full tissue dissection and preservation in 80% ethanol were undertaken, and the specimen was accessioned by the Natural History Museum, London. DNA was extracted from spleen tissue using an agarose plug extraction, following the Bionano Prep Animal Tissue DNA Isolation Soft Tissue Protocol [2]. Pacific Biosciences CLR long-read and 10X Genomics read-cloud sequencing libraries were constructed according to the manufacturers’ instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL I (single-molecule long reads) and Illumina HiSeq X (10X Genomics Chromium). Hi-C data were generated using the Dovetail v1.0 kit and sequenced on HiSeq X. See Table 3 for software versions and sources. Assembly was carried out using Falcon-unzip (Chin et al., 2016), haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020), and a first round of scaffolding was carried out with 10X Genomics read clouds using scaff10x. Scaffolding with Hi-C data was carried out using SALSA2. The Hi-C scaffolded assembly was polished with arrow using the PacBio data, and then polished with the 10X Genomics Illumina data by aligning the reads to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012) and applying homozygous non-reference edits using bcftools consensus. Two rounds of Illumina polishing were applied. The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016). Because the Hi-C data were sparse, curation was aided by synteny with the Sciurus vulgaris assembly, which was being curated at the Wellcome Sanger Institute at the same time. The genome was analysed within the BlobToolKit environment (Challis et al., 2019).
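During Illumina polishing, only homozygous non-reference variant calls are applied as edits: a genotype such as 1/1 means both haplotypes in the reads disagree with the assembly, whereas 0/1 may just reflect heterozygosity. The sketch below shows that selection step on single-sample VCF text in plain Python. It does not apply edits (bcftools consensus did that in the actual pipeline), and the scaffold names and records are hypothetical.

```python
def genotype_is_hom_alt(gt: str) -> bool:
    """True for diploid homozygous non-reference genotypes such as '1/1' or '2|2'."""
    alleles = gt.replace("|", "/").split("/")
    return (len(alleles) > 1 and "." not in alleles
            and len(set(alleles)) == 1 and alleles[0] != "0")


def homozygous_alt_records(vcf_lines):
    """Yield (chrom, pos, ref, alt, gt) for records in a single-sample VCF whose
    genotype is homozygous non-reference, i.e. candidate polishing edits."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the header line and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, _filter, _info, fmt, sample = fields[:10]
        values = dict(zip(fmt.split(":"), sample.split(":")))
        gt = values.get("GT", "./.")
        if genotype_is_hom_alt(gt):
            yield chrom, int(pos), ref, alt, gt


# Hypothetical freebayes-style records for one sample.
TOY_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsquirrel",
    "scaffold_1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t1/1:30",
    "scaffold_1\t250\t.\tC\tT\t45\t.\t.\tGT:DP\t0/1:28",
    "scaffold_2\t17\t.\tG\tGA\t60\t.\t.\tGT:DP\t1|1:35",
]
for record in homozygous_alt_records(TOY_VCF):
    print(record)
```

The heterozygous call at scaffold_1:250 is skipped, so only the two homozygous alternate records would be passed on as edits.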
Table 3.

Software tools used.

Software tool | Version | Source
Falcon-unzip | falcon-kit 1.2.2 | Chin et al., 2016
purge_dups | 1.0.0 | Guan et al., 2020
SALSA2 | 2.2 | Ghurye et al., 2018
scaff10x | 4.2 | https://github.com/wtsi-hpag/Scaff10X
arrow | GenomicConsensus 2.3.3 | https://github.com/PacificBiosciences/GenomicConsensus
longranger align | 2.2.2 | https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines
freebayes | v1.1.0-3-g961e5f3 | Garrison & Marth, 2012
bcftools consensus | 1.9 | http://samtools.github.io/bcftools/bcftools.html
gEVAL | 2016 | Chow et al., 2016
BlobToolKit | 1 | Challis et al., 2019
nucmer from MUMmer 3 | 3.0 | Kurtz et al., 2004

Data availability

Underlying data

European Nucleotide Archive: Sciurus carolinensis (grey squirrel) genome assembly, mSciCar1. BioProject accession number PRJEB35386; https://identifiers.org/ena.embl:PRJEB35386. The genome sequence is released openly for reuse. The S. carolinensis genome sequencing initiative is part of the Wellcome Sanger Institute’s “25 genomes for 25 years” project [3]. It is also part of the Vertebrate Genomes Project (VGP) [4] and the Darwin Tree of Life (DToL) project [5]. The specimen has been preserved in ethanol and deposited with the Natural History Museum, London, under registration number NHMUK ZD 2019.214, where it will remain accessible to the research community for posterity. All raw data and the assembly have been deposited in the ENA. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.

Reviewer reports

The authors constructed a whole-genome sequence for the eastern gray squirrel using spleen-derived DNA from a naturally deceased squirrel collected in a Wildlife Trust project. The sequence is based on long reads (74-fold coverage) from the PacBio sequencer, short reads from the 10X Genomics + Illumina platform (40-fold coverage), and scaffolding with Hi-C data (42-fold coverage). Judging from the genome statistics and the BUSCO score, I think this whole-genome sequence is useful for comparative genomic studies. My comments are as follows.

I would suggest that the authors state the age of the squirrel used as the sample (if known) and the collection date. This information may be useful for future secondary use of the Hi-C data acquired in this paper. I would also suggest that the authors state how much spleen tissue was used for DNA extraction.

I would suggest that the authors state in the text the number of libraries used to generate the PacBio and Illumina reads, respectively. Readers could find these numbers by counting the accessions in Table 1, but stating them in the text would help readers assess the data quality of this work.

For the contact map in Figure 5, the authors should describe and label the X and Y axes. A colour key would also make the figure easier to understand.

To support reproducibility, it would be important to describe the settings of the software tools used to construct the genome sequence. I would suggest that the authors describe the settings of all software tools in the text.

Are sufficient details of methods and materials provided to allow replication by others? Partly
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes
Reviewer Expertise: Evolutionary biology.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors present the assembly of a male eastern grey squirrel, based on PacBio single-molecule sequencing and 10X Genomics linked-read sequencing. The approach is technically very sound and mirrors that used in the Vertebrate Genomes Project. The assembly's completeness is impressive, as the authors show by comparison with an existing assembly of the red squirrel, evaluation of the assembly against BUSCO, and several plots from BlobToolKit. I am confident that this assembly will be usable by researchers working on this species.

I would suggest that the authors improve the rendering of several of the figures. Those produced by BlobToolKit have very small fonts relative to their rasterized pixel density. I would either render them as vector graphics or adjust the rendering (if possible) to increase the font size. The HiGlass plot clearly demonstrates the expected pattern of connectivity across the chromosome-scale scaffolds, but overplotting of the delimiting lines (grey bars) makes the region of the plot (to the bottom and right) corresponding to the smaller scaffolds completely illegible. If this can be fixed, it might make the plot a little more interesting.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Is the rationale for creating the dataset(s) clearly described? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Are the protocols appropriate and is the work technically sound? Yes
Reviewer Expertise: (pan)genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
References

1.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

2.  Phased diploid genome assembly with single-molecule real-time sequencing.

Authors:  Chen-Shan Chin; Paul Peluso; Fritz J Sedlazeck; Maria Nattestad; Gregory T Concepcion; Alicia Clum; Christopher Dunn; Ronan O'Malley; Rosa Figueroa-Balderas; Abraham Morales-Cruz; Grant R Cramer; Massimo Delledonne; Chongyuan Luo; Joseph R Ecker; Dario Cantu; David R Rank; Michael C Schatz
Journal:  Nat Methods       Date:  2016-10-17       Impact factor: 28.547

3.  Multilocus sequencing of Hepatozoon cf. griseisciuri infections in Ontario eastern gray squirrels (Sciurus carolinensis) uncovers two genotypically distinct sympatric parasite species.

Authors:  Alexandre N Léveillé; Nahla El Skhawy; John R Barta
Journal:  Parasitol Res       Date:  2020-01-07       Impact factor: 2.289

4.  Integrating Hi-C links with assembly graphs for chromosome-scale assembly.

Authors:  Jay Ghurye; Arang Rhie; Brian P Walenz; Anthony Schmitt; Siddarth Selvaraj; Mihai Pop; Adam M Phillippy; Sergey Koren
Journal:  PLoS Comput Biol       Date:  2019-08-21       Impact factor: 4.475

5.  Versatile and open software for comparing large genomes.

Authors:  Stefan Kurtz; Adam Phillippy; Arthur L Delcher; Michael Smoot; Martin Shumway; Corina Antonescu; Steven L Salzberg
Journal:  Genome Biol       Date:  2004-01-30       Impact factor: 13.583

6.  gEVAL - a web-based browser for evaluating genome assemblies.

Authors:  William Chow; Kim Brugger; Mario Caccamo; Ian Sealy; James Torrance; Kerstin Howe
Journal:  Bioinformatics       Date:  2016-04-07       Impact factor: 6.937

7.  The genome sequence of the Eurasian red squirrel, Sciurus vulgaris Linnaeus 1758.

Authors:  Daniel Mead; Kathryn Fingland; Rachel Cripps; Roberto Portela Miguez; Michelle Smith; Craig Corton; Karen Oliver; Jason Skelton; Emma Betteridge; Jale Dolucan; Olga Dudchenko; Arina D Omer; David Weisz; Erez Lieberman Aiden; Olivier Fedrigo; Jacquelyn Mountcastle; Erich Jarvis; Shane A McCarthy; Ying Sims; James Torrance; Alan Tracey; Kerstin Howe; Richard Challis; Richard Durbin; Mark Blaxter
Journal:  Wellcome Open Res       Date:  2020-02-03

8.  BlobToolKit - Interactive Quality Assessment of Genome Assemblies.

Authors:  Richard Challis; Edward Richards; Jeena Rajan; Guy Cochrane; Mark Blaxter
Journal:  G3 (Bethesda)       Date:  2020-04-09       Impact factor: 3.154

9.  Identifying and removing haplotypic duplication in primary genome assemblies.

Authors:  Dengfeng Guan; Shane A McCarthy; Jonathan Wood; Kerstin Howe; Yadong Wang; Richard Durbin
Journal:  Bioinformatics       Date:  2020-05-01       Impact factor: 6.937

10.  HiGlass: web-based visual exploration and analysis of genome interaction maps.

Authors:  Peter Kerpedjiev; Nezar Abdennur; Fritz Lekschas; Chuck McCallum; Kasper Dinkla; Hendrik Strobelt; Jacob M Luber; Scott B Ouellette; Alaleh Azhir; Nikhil Kumar; Jeewon Hwang; Soohyun Lee; Burak H Alver; Hanspeter Pfister; Leonid A Mirny; Peter J Park; Nils Gehlenborg
Journal:  Genome Biol       Date:  2018-08-24       Impact factor: 13.583
