Literature DB >> 30513700

The Genome of the North American Brown Bear or Grizzly: Ursus arctos ssp. horribilis.

Gregory A Taylor¹, Heather Kirk², Lauren Coombe³, Shaun D Jackman⁴, Justin Chu⁵, Kane Tse⁶, Dean Cheng⁷, Eric Chuah⁸, Pawan Pandoh⁹, Rebecca Carlsen¹⁰, Yongjun Zhao¹¹, Andrew J Mungall¹², Richard Moore¹³, Inanc Birol^14,15, Maria Franke¹⁶, Marco A Marra^17,18, Christopher Dutton¹⁹, Steven J M Jones^20,21,22.

Abstract

The grizzly bear (Ursus arctos ssp. horribilis) represents the largest population of brown bears in North America. Its genome was sequenced using a microfluidic partitioning library construction technique, and these data were supplemented with sequencing from a nanopore-based long read platform. The final assembly was 2.33 Gb with a scaffold N50 of 36.7 Mb, and the genome is of comparable size to that of its close relative the polar bear (2.30 Gb). An analysis using 4104 highly conserved mammalian genes indicated that 96.1% were found to be complete within the assembly. An automated annotation of the genome identified 19,848 protein coding genes. Our study shows that the combination of the two sequencing modalities that we used is sufficient for the construction of highly contiguous reference quality mammalian genomes. The assembled genome sequence and the supporting raw sequence reads are available from the NCBI (National Center for Biotechnology Information) under the bioproject identifier PRJNA493656, and the assembly described in this paper is version QXTK01000000.

Entities: Chemical Disease Mutation Species

Keywords: Ursus arctos ssp. Horribilis; genome; grizzly bear; microfluidic partitioning; nanopore

Year: 2018 PMID： 30513700 PMCID： PMC6315469 DOI： 10.3390/genes9120598

Source DB: PubMed Journal: Genes (Basel) ISSN： 2073-4425 Impact factor: 4.096

1. Introduction

The grizzly bear is the most common sub-species of brown bears found in North America. Brown bears (Ursus arctos) were historically found across much of North America, Asia, Europe, and even Northern Africa, but loss of habitat, human encroachment, and hunting have seen this range greatly reduced in the past two centuries [1]. Within North America, both the California Grizzly (Ursus arctos californicus) and the Mexican Grizzly (Ursus arctos nelsoni) are already extinct. The North American brown bear is among the largest predators on the continent, second only to its close relative the polar bear and, as such, requires a large territory to sustain its diet. Grizzly bear density varies from 3 per 100 km2 in the resource scarce interior of British Columbia, to as many as 25 per 100 km2 in resource dense coastal regions [2]. As human encroachment on grizzly habitat continues, a better understanding of their biology will aid in conservation. DNA was sequenced from a 20-year old male grizzly bear, Samson. This bear was orphaned in the wilds of Alaska in 1998 when he was less than a year old. He was rescued and raised at the Alaska Children’s Zoo and then moved to the Toronto Zoo.

2. Methods

The genome assembly was constructed from both an Illumina (San Diego, CA, USA) HiSeqX sequenced Chromium library and an Oxford Nanopore (Oxford, UK) library sequenced on a MinION sequencer. The raw sequence data came completely from the Chromium library; the nanopore reads were used only for scaffolding information to increase the assembly contiguity. A blood sample was taken as part of a routine physical exam of an adult male North American brown bear at the Toronto Zoo (GAN/ISIS:MIG12-29695490/34125). The portion of the sample used for sequencing was deemed to be in excess after the animal’s physical health was ascertained and was donated by the zoo. The Toronto Zoo operates under the accreditation of AZA (Accredited Zoos and Aquariums), CAZA (Canada’s Accredited Zoos and Aquariums), and all of their research is carried out under a certification of Good Animal Practice® awarded by CCAC (Canadian Council on Animal Care). High molecular weight (HMW) DNA was extracted from a fresh whole blood sample using the Qiagen MagAttract HMW DNA Kit (cat. no. 67563, QIAGEN, Germantown, MD, USA) and the HMW genomic DNA (gDNA) extraction protocol as detailed in the Chromium Genome Reagent Kits Version 2 User Guide (PN-120229) [3,4]. Integrity of the DNA was assessed by Pulsed Field Gel Electrophoresis (PFGE) with the majority of DNA fragments over 50 kb in length. The fragment size was confirmed in silico after assembly; the weighted mean molecule length was 46 kb. A micro-fluidic partitioned library was created using the Chromium system from 10× Genomics (10× Genomics, Pleasanton, CA, USA). GEMs (Gel beads-in-EMulsion) were produced by combining DNA, Master Mix, and partitioning oil in the 10× Genomics Chromium Controller instrument with the micro-fluidic Genome Chip (PN-120216) (10× Genomics). The DNA in each GEM underwent isothermic amplification as a barcode was added to each fragment. Barcoded fragments then underwent Illumina library construction (as per the Chromium Genome Reagent Kits Version 2 User Guide (PN-120229)). The resulting library was assessed for quality using the Agilent 2100 Bioanalyzer (Santa Clara, CA, USA) and a DNA 1000 assay. The median insert size was 375 bp. The library was then sequenced on an Illumina HiSeqX sequencer using the paired-end protocol to produce 855 million 150 bp reads, an estimated 55-fold genome coverage. A second genomic library was constructed from the same HMW DNA sample, but this one conformed to Oxford Nanopore Technologies’ protocols. It was constructed using the SQK-LSK108 Ligation Library Kit. Liquid handling was performed using wide bore tips to avoid physically breaking the DNA. Six µg of HMW DNA were gently sheared using 10 passes up and down through a 26 gauge needle (BD medical, Franklin Lakes, NJ, USA, cat. no. 309625) and a size selection step was completed using a 0.35:1 ratio of PCRClean DX magnetic beads to DNA (cat. no. C-1003-450, ALINE Biosciences, Woburn, MA, USA). NEB Ultra II (New England Biolabs, Ipswich, MA, USA, cat. no. E7646A) was used for end-repair and 3′ A-tailing. NEB Blunt/TA Ligation Master Mix (M0367S) was used to ligate the Oxford Nanopore adapters. A final size selection of 0.4:1 ratio (magnetic beads to library) was done to eliminate smaller molecules. MinION sequencing proceeded using the FLO-MIN106 (R9 Version) flow cell and the software programs MinKnow 1.13.1 and GUI 2.0.13. The MinION sequencing run produced over 1 million reads, totaling 16 billion base pairs, with an N50 of 20,211 bp. This amount of data equates to a 7-fold genome coverage. Supernova (version 2.0.1) [5] was used to assemble the Chromium Illumina reads. The 855 million reads (55× coverage) were assembled into an initial assembly of 8474 scaffolds, totaling 2.31 Gb in length and an N50 of 33.78 Mb. Tigmint [6] was used to break potential misassemblies, based on read coverage across the assembly. This provided a starting point for re-scaffolding that was as error free as possible. ARCS [7] was run to scaffold the contigs using the Chromium reads, thereby making the most of this data. LINKS [8] was then used to add scaffolding information to the assembly from the Nanopore reads. LINKS is a tool designed to use uncorrected long reads for scaffolding and does not introduce any new bases to the assembly. The tool was run iteratively eight times, each time using a larger fragment size to produce pseudo read-pairs from the Nanopore data that were then applied to the assembly as scaffolding. Within LINKS, the distance values of 1000, 2500, 5000, 7500, 10,000, 12,500, 15,000, and 30,000 were used successively to achieve the scaffolding results observed in Table 1.

Table 1

Assembly statistics and gene content for the genome sequences reported in this study. Busco: Benchmarking Universal Single-Copy Orthologs.

Assembly	# of Scaffolds	Gaps within Scaffolds	Scaffold N50 (bp)	Longest Scaffold (bp)	BUSCO Complete Genes (of 4104)
Supernova	8474	21,957	33.78 × 10⁶	105.9 × 10⁶	3943 (96.1%)
Tigmint	8728	21,947	26.32 × 10⁶	92.41 × 10⁶	3943 (96.1%)
ARCS	8679	21,996	27.77 × 10⁶	92.41 × 10⁶	3943 (96.1%)
LINKS1	8350	22,219	27.77 × 10⁶	92.41 × 10⁶	3943 (96.1%)
LINKS8	6673	23,947	36.71 × 10⁶	92.42 × 10⁶	3943 (96.1%)
Sealer	6673	15,572	36.71 × 10⁶	92.43 × 10⁶	3943 (96.1%)

Finally, Sealer [9] was used to fill in the gaps that the previous scaffolding steps had produced. Because the Illumina reads from the Chromium library are substantially more accurate than the Nanopore reads, the Bloom filter for the Sealer run was populated using only the former read set. At each step of the assembly, BUSCO [10] was used as an additional measure to assess genome completeness. BUSCO (Benchmarking Universal Single-Copy Orthologs) attempts to reconstruct a set of 4104 conserved mammalian genes, and the number of genes reconstructed is an indicator of genome assembly completeness. Ninety-six point one percent of the BUSCO mammalian gene set was reconstructed as complete genes in this assembly, and a further 1.9% of the genes were identified as gene fragments. The high reconstruction rate of BUSCO genes was constant at all stages of the assembly process, indicating both the high quality of the assembly and that gene rich regions are more likely to be assembled correctly from the start due to their high information content. BUSCO genes represented approximately 20% of all grizzly bear genes, and extrapolations from this data set are premised upon these genes being representative of all genes in both their distribution and structure. The genome was annotated using the RefSeq eukaryotic pipeline [11]. This analysis indicated the presence of 19,848 coding genes, 7061 non-coding genes, 3671 pseudo-genes, and 119 immunoglobulin gene segments.

3. Results and Discussion

The grizzly bear genome has a diploid karyotype of 37 chromosome pairs [12,13], and there is a mean distance of 688 bp between heterozygous positions in this assembly. Based on the N50 of our assembly and the estimated genome size of 2.3 Gb, the longest scaffolds in the grizzly bear assembly most likely represent full chromosome arms, and the observed heterozygous positions can act as a starting point for further population diversity studies. The polar bear is the closest relative to the grizzly bear for which the genome has been sequenced [14]. Based on BUSCO analysis of both assemblies, using the 4301 gene mammalian dataset, the grizzly bear genome is more complete. The grizzly bear genome is also more contiguous than the polar bear genome as detailed in Table 2.

Table 2

Assembly statistics of the grizzly bear and its closest sequenced relative, the polar bear.

Assembly	# of Scaffolds	Scaffold N50 (bp)	Scaffold L50	# of Contigs	Contig N50 (bp)	Contig L50	BUSCO Complete Genes
Grizzly Bear	6673	36.71 × 10⁶	21	22,245	314 × 10³	2191	3943 (96.1%)
Polar Bear	23,819	15.94 × 10⁶	46	134,162	46 × 10³	14,124	3890 (94.7%)

A global alignment of the grizzly bear assembly to the polar bear assembly was accomplished using BWA-MEM (version 0.7.17) [15] and visualized as a Jupiter plot [16] (an adaptation of a Circos diagram [17]) in Figure 1. The colored bands of Figure 1 represent regions of synteny. Diagonal lines in the plot identify split alignments that may be the result of rearrangements, but more likely these diagonal lines point to differing break-points in contiguity between the two assemblies, since they all appear on the edges of scaffolds. At 10 kb resolution, there were no definite breaks in synteny between the two assemblies.

Figure 1

Jupiter plot. A global genome alignment, using BWA-MEM, of the grizzly genome (left side of circle) to the polar bear genome (right side). Connections show the aligned regions of each assembly. The grizzly scaffolds are limited to those over 10 Mb in length (>85% of the assembly). The longest polar bear scaffolds were selected to sum to the same amount of sequence (2 Gb). Only alignments over 10 kb in length are displayed.

Micro-fluidics has proven to be a reliable technology for creating partitioned libraries. These libraries, in turn, greatly enhance the quality of the resulting genome assembly. Nanopore reads provide a great deal of scaffolding information, from an independent source, that can corroborate the integrity of a Supernova assembly. This combined approach to genome assembly has produced a high-quality reference genome. A reference brown bear genome can serve as a solid starting point for further investigation of the evolutionary ties to other bear species and the intra-species diversity present within the grizzly bear population.

15 in total

1. Circos: an information aesthetic for comparative genomics.

Authors: Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal: Genome Res Date: 2009-06-18 Impact factor: 9.043

2. Comparative genomics: tracking chromosome evolution in the family ursidae using reciprocal chromosome painting.

Authors: W G Nash; J Wienberg; M A Ferguson-Smith; J C Menninger; S J O'Brien
Journal: Cytogenet Cell Genet Date: 1998

3. Phylogeography and mitochondrial diversity of extirpated brown bear (Ursus arctos) populations in the contiguous United States and Mexico.

Authors: C R Miller; L P Waits; P Joyce
Journal: Mol Ecol Date: 2006-12 Impact factor: 6.185

4. Population genomics reveal recent speciation and rapid evolutionary adaptation in polar bears.

Authors: Shiping Liu; Eline D Lorenzen; Matteo Fumagalli; Bo Li; Kelley Harris; Zijun Xiong; Long Zhou; Thorfinn Sand Korneliussen; Mehmet Somel; Courtney Babbitt; Greg Wray; Jianwen Li; Weiming He; Zhuo Wang; Wenjing Fu; Xueyan Xiang; Claire C Morgan; Aoife Doherty; Mary J O'Connell; James O McInerney; Erik W Born; Love Dalén; Rune Dietz; Ludovic Orlando; Christian Sonne; Guojie Zhang; Rasmus Nielsen; Eske Willerslev; Jun Wang
Journal: Cell Date: 2014-05-08 Impact factor: 41.582

5. Sealer: a scalable gap-closing application for finishing draft genomes.

Authors: Daniel Paulino; René L Warren; Benjamin P Vandervalk; Anthony Raymond; Shaun D Jackman; Inanç Birol
Journal: BMC Bioinformatics Date: 2015-07-25 Impact factor: 3.169

6. RefSeq: an update on mammalian reference sequences.

Authors: Kim D Pruitt; Garth R Brown; Susan M Hiatt; Françoise Thibaud-Nissen; Alexander Astashyn; Olga Ermolaeva; Catherine M Farrell; Jennifer Hart; Melissa J Landrum; Kelly M McGarvey; Michael R Murphy; Nuala A O'Leary; Shashikant Pujar; Bhanu Rajput; Sanjida H Rangwala; Lillian D Riddick; Andrei Shkeda; Hanzhen Sun; Pamela Tamez; Raymond E Tully; Craig Wallin; David Webb; Janet Weber; Wendy Wu; Michael DiCuccio; Paul Kitts; Donna R Maglott; Terence D Murphy; James M Ostell
Journal: Nucleic Acids Res Date: 2013-11-19 Impact factor: 16.971

7. Direct determination of diploid genome sequences.

Authors: Neil I Weisenfeld; Vijay Kumar; Preyas Shah; Deanna M Church; David B Jaffe
Journal: Genome Res Date: 2017-04-05 Impact factor: 9.043

8. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

9. LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads.

Authors: René L Warren; Chen Yang; Benjamin P Vandervalk; Bahar Behsaz; Albert Lagman; Steven J M Jones; Inanç Birol
Journal: Gigascience Date: 2015-08-04 Impact factor: 6.524

10. ARCS: scaffolding genome drafts with linked reads.

Authors: Sarah Yeo; Lauren Coombe; René L Warren; Justin Chu; Inanç Birol
Journal: Bioinformatics Date: 2018-03-01 Impact factor: 6.937

11 in total

1. An Annotated Draft Genome for the Andean Bear, Tremarctos ornatus.

Authors: Nedda F Saremi; Jonas Oppenheimer; Christopher Vollmers; Brendan O'Connell; Shard A Milne; Ashley Byrne; Li Yu; Oliver A Ryder; Richard E Green; Beth Shapiro
Journal: J Hered Date: 2021-07-15 Impact factor: 2.645

2. Hibernation induces widespread transcriptional remodeling in metabolic tissues of the grizzly bear.

Authors: Heiko T Jansen; Shawn Trojahn; Michael W Saxton; Joanna L Kelley; Corey R Quackenbush; Brandon D Evans Hutzenbiler; O Lynne Nelson; Omar E Cornejo; Charles T Robbins
Journal: Commun Biol Date: 2019-09-13

3. CNVs with adaptive potential in Rangifer tarandus: genome architecture and new annotated assembly.

Authors: Julien Prunier; Alexandra Carrier; Isabelle Gilbert; William Poisson; Vicky Albert; Joëlle Taillon; Vincent Bourret; Steeve D Côté; Arnaud Droit; Claude Robert
Journal: Life Sci Alliance Date: 2021-12-15

4. Demographic History of the Brown Bear (Ursus arctos) on Hokkaido Island, Japan, Based on Whole-Genomic Sequence Analysis.

Authors: Yu Endo; Naoki Osada; Tsutomu Mano; Ryuichi Masuda
Journal: Genome Biol Evol Date: 2021-09-01 Impact factor: 3.416

5. Conservation of chromatin conformation in carnivores.

Authors: Marco Corbo; Joana Damas; Madeline G Bursell; Harris A Lewin
Journal: Proc Natl Acad Sci U S A Date: 2022-03-01 Impact factor: 11.205

6. The genome of the forest insect pest Pissodes strobi reveals genome expansion and evidence of a Wolbachia endosymbiont.

Authors: Kristina K Gagalova; Justin G A Whitehill; Luka Culibrk; Diana Lin; Véronique Lévesque-Tremblay; Christopher I Keeling; Lauren Coombe; Macaire M S Yuen; Inanç Birol; Jörg Bohlmann; Steven J M Jones
Journal: G3 (Bethesda) Date: 2022-04-04 Impact factor: 3.154

7. Dental Calculus as a Tool to Study the Evolution of the Mammalian Oral Microbiome.

Authors: Jaelle C Brealey; Henrique G Leitão; Tom van der Valk; Wenbo Xu; Katia Bougiouri; Love Dalén; Katerina Guschanski
Journal: Mol Biol Evol Date: 2020-10-01 Impact factor: 16.240

8. Analyses of key genes involved in Arctic adaptation in polar bears suggest selection on both standing variation and de novo mutations played an important role.

Authors: Jose Alfredo Samaniego Castruita; Michael V Westbury; Eline D Lorenzen
Journal: BMC Genomics Date: 2020-08-06 Impact factor: 3.969

9. Chromosome-Level Reference Genome Assembly for the American Pika (Ochotona princeps).

Authors: Bryson M F Sjodin; Kurt E Galbreath; Hayley C Lanier; Michael A Russello
Journal: J Hered Date: 2021-11-01 Impact factor: 2.645

10. Long-read isoform sequencing reveals tissue-specific isoform expression between active and hibernating brown bears (Ursus arctos).

Authors: Elizabeth Tseng; Jason G Underwood; Brandon D Evans Hutzenbiler; Shawn Trojahn; Brewster Kingham; Olga Shevchenko; Erin Bernberg; Michelle Vierra; Charles T Robbins; Heiko T Jansen; Joanna L Kelley
Journal: G3 (Bethesda) Date: 2022-03-04 Impact factor: 3.542