Draft genome sequence data of Indian rhinoceros, Rhinoceros unicornis.

Kei Nabeshima, Nobuyoshi Nakajima, Mitsuaki Ogata, Manabu Onuma.

Abstract

The Indian rhinoceros (Rhinoceros unicornis) is a large herbivore found in northern India and southern Nepal. It is a critically endangered species, with an estimated population of approximately 3,600 in the wild. Genetic factors, such as the loss of genetic diversity and the accumulation of deleterious variations, are critical risk factors for the extinction of endangered species, including the Indian rhinoceros. To support conservation efforts for the Indian rhinoceros, we assembled its draft genome. The new genomic data will enable the study of functional genes associated with the ecological and physiological characteristics of the Indian rhinoceros and help establish more effective conservation measures. Muscle samples were collected from an Indian rhinoceros that died from prostration at a zoo, and the samples were stored at the National Institute for Environmental Studies (Tsukuba, Japan). Sequence data were obtained using an Illumina NovaSeq 6000 platform for short reads and an Oxford Nanopore Technologies PromethION for long reads. We generated approximately 235.2 Gbp of data. From these sequences, we assembled a 2,375,051,758 bp genome consisting of 7,615 contigs. The genome data are available from the National Center for Biotechnology Information (NCBI) GenBank database under accession number BOSQ00000000 (BioProject PRJDB11285).
© 2022 The Authors.

Keywords:  Hybrid sequencing; Indian rhinoceros; Whole-genome sequence; Wildlife

Year:  2022        PMID: 35141371      PMCID: PMC8814301          DOI: 10.1016/j.dib.2022.107857

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

The Indian rhinoceros (Rhinoceros unicornis) is a critically endangered herbivore with little genetic information available. The Indian rhinoceros genome data can be used for ex situ conservation and infectious disease control. These genome data can be analyzed together with other high-quality Indian rhinoceros whole-genome data (accession number: JAFHKO000000000) to assess the diversity of the Indian rhinoceros species and identify genomic rearrangements. These data may also be used by other researchers to identify genes related to immunity and plan breeding programs to maintain genetic diversity.

Data Description

The Indian rhinoceros (Rhinoceros unicornis) is a threatened species categorized as Vulnerable on the International Union for Conservation of Nature Red List of Threatened Species [1]. The species declined to near extinction in the early 1900s, but its population is currently increasing: the total population estimate in August 2018 was 3,588 individuals, with 649 animals in Nepal and 2,939 in India [2]. Despite this increase, the species still faces threats [2]. Because of this past population collapse, genetic factors need to be considered in conservation activities [2]. The loss of genetic diversity and the accumulation of deleterious variations are known to be critical risk factors for the extinction of endangered species [3], [4], [5], [6]. To support conservation efforts for the Indian rhinoceros, we generated high-precision genomic data. These data will enable the study of functional genes associated with the ecological and physiological characteristics of the Indian rhinoceros and will facilitate the establishment of more effective conservation measures.

We sequenced both short-read and long-read libraries and generated approximately 235.2 Gbp of data, comprising 623,580,225 short reads and 5,536,969 long reads (Table 1). The sequencing data were deposited in the Sequence Read Archive under accession numbers DRR308100 and DRR311486. The short and long reads were assembled into contigs using the HASLR program, which utilizes a hybrid assembly approach [7]. We assembled a 2,375,051,758 bp genome consisting of 7,615 contigs, with an N50 of 663 kbp. The GC content of the assembly was 40.99%, and the complete Benchmarking Universal Single-Copy Orthologs (BUSCO) score (C) was 96.5% (single copy, S: 96.1%; duplicated, D: 0.4%; fragmented, F: 3.0%; and missing, M: 0.5%; Table 2).
The Indian rhinoceros genome sequence is a potentially useful resource for future molecular evolutionary analyses of mammals.
Table 1

Amount of data generated.

Type of reads    No. of reads    Average read length    Total data
Short            623,580,225     150 bp                 188.3 Gbp
Long             5,536,969       8,236.2 bp             46.9 Gbp
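
The totals in Table 1 can be put in context by dividing them by the assembly length to get an approximate fold coverage. The sketch below is illustrative only: it is not a calculation reported by the authors, it uses the assembled length (2,375,051,758 bp) as a stand-in for the true genome size, and the function name `fold_coverage` is a hypothetical helper.

```python
# Figures copied from Table 1 and from the reported assembly length.
SHORT_READ_BP = 188.3e9        # total short-read data, in bp
LONG_READ_BP = 46.9e9          # total long-read data, in bp
ASSEMBLY_BP = 2_375_051_758    # assembled genome length, in bp


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Return sequenced bases divided by genome size (approximate depth)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


short_cov = fold_coverage(SHORT_READ_BP, ASSEMBLY_BP)
long_cov = fold_coverage(LONG_READ_BP, ASSEMBLY_BP)
total_cov = fold_coverage(SHORT_READ_BP + LONG_READ_BP, ASSEMBLY_BP)

print(f"short reads: {short_cov:.1f}x")
print(f"long reads:  {long_cov:.1f}x")
print(f"combined:    {total_cov:.1f}x")
```

Under these assumptions the short reads give roughly 79× depth, the long reads roughly 20×, and the combined data roughly 99×.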
Table 2

General features of the Rhinoceros unicornis genome.

GC content (%)                 40.99
Number of contigs              7,615
Number of scaffolds            7,615
Total contig length (bp)       2,375,051,758
N50 contig size (bp)           663,630
Longest sequence (bp)          5,292,610
Shortest sequence (bp)         10,012
Mean sequence length (bp)      311,891
Median sequence length (bp)    156,082
BUSCO score                    C:96.5% [S:96.1%, D:0.4%], F:3.0%, M:0.5%, n:233
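
For readers unfamiliar with the N50 statistic in Table 2, the following minimal sketch shows how it is conventionally defined: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly length. The contig lengths in `toy_lengths` are made-up illustration values, not data from this assembly, and `n50` is a hypothetical helper rather than the tool the authors used. The last lines check that the reported mean sequence length follows from the reported total length and contig count.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths supplied")


# Toy example (made-up lengths): total = 30, half = 15.
# Running totals from longest to shortest: 10, 18 -> N50 is 8.
toy_lengths = [2, 10, 4, 8, 6]
toy_n50 = n50(toy_lengths)
print(toy_n50)

# Consistency check on Table 2: mean length = total length / number of contigs.
TOTAL_CONTIG_LENGTH = 2_375_051_758
NUMBER_OF_CONTIGS = 7_615
mean_length = round(TOTAL_CONTIG_LENGTH / NUMBER_OF_CONTIGS)
print(mean_length)
```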

Experimental Design, Materials and Methods

Sample preparation and sequencing

We sampled a Rhinoceros unicornis individual at Yokohama Municipal Kanazawa Zoo, Yokohama, Japan (NIES ID: 5488M, female). The rhinoceros was born on February 1, 2007 and died from prostration on March 22, 2007. Muscle tissue was collected during the autopsy performed to determine the cause of death. Genomic DNA was extracted from the muscle tissue using proteinase K and phenol/chloroform/isoamyl alcohol for short-read sequencing and a NucleoBond HMW DNA extraction kit for long-read sequencing. Short-read whole-genome sequencing was performed by Macrogen Japan (Tokyo, Japan). Short-read libraries were prepared using a TruSeq LT PCR-free DNA Library Preparation Kit, and sequencing was performed using the NovaSeq 6000 sequencing system, with 2 × 150 bp paired-end reads. Long-read sequencing was performed by GeneBay (Yokohama, Japan). Long-read libraries were prepared using a Ligation Sequencing Kit, and sequencing was performed using the PromethION system.

De novo genome assembly and assessment

The sequenced reads were assembled using HASLR v. 2020-06a, which utilizes a hybrid assembly approach [7]. Assembly was performed with a minimum long-read coverage of 20× and an estimated genome size of 2.4 Gb; all other parameters were kept at their default settings. The genome assembly was evaluated using BUSCO v5 [8], based on the core vertebrate gene set [9], using the gVolante pipeline [10], [11].
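
The BUSCO result is reported as a compact summary string (Table 2). As a hedged illustration of how such a string can be read and sanity-checked, the sketch below parses it and confirms that C equals S + D and that C, F, and M sum to about 100%. The regular expression and the function name `parse_busco` are assumptions made for this example; they are not part of BUSCO or gVolante.

```python
import re

# Pattern for the one-line BUSCO summary notation as printed in Table 2.
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\s*\[S:(?P<S>[\d.]+)%,\s*D:(?P<D>[\d.]+)%\],\s*"
    r"F:(?P<F>[\d.]+)%,\s*M:(?P<M>[\d.]+)%,\s*n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO summary string into percentages and the gene count n."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary string: {summary!r}")
    fields = {key: float(value) for key, value in match.groupdict().items()}
    fields["n"] = int(fields["n"])
    return fields


scores = parse_busco("C:96.5% [S:96.1%, D:0.4%], F:3.0%, M:0.5%, n:233")

# Complete = single-copy + duplicated; allow a small rounding tolerance.
complete_ok = abs(scores["S"] + scores["D"] - scores["C"]) < 0.05
# Complete + fragmented + missing should account for all searched genes.
total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 0.15

print(scores)
print(complete_ok, total_ok)
```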

Ethics Statement

None.

CRediT Author Statement

Kei Nabeshima: Methodology, Software, Writing – original draft preparation; Nobuyoshi Nakajima: Data curation; Mitsuaki Ogata: Sampling; Manabu Onuma: Conceptualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that have or could be perceived to have influenced the work reported in this article.

Specifications Table

Subject: Biodiversity
Specific subject area: Genomics
Type of data: Genome sequences and table
How the data were acquired: High-throughput DNA sequencing using NovaSeq 6000 and PromethION platforms
Data format: Raw and assembled genome sequences
Description of data collection: The sample was obtained from the muscle tissue of Rhinoceros unicornis at Yokohama Municipal Kanazawa Zoo, Yokohama, Japan (NIES ID: 5488M, female). Genomic DNA was extracted using proteinase K and phenol/chloroform/isoamyl alcohol for short-read sequencing and a NucleoBond HMW DNA extraction kit (Macherey-Nagel, Düren, Germany) for long-read sequencing. Short-read libraries were prepared using a TruSeq LT PCR-free DNA Library Preparation Kit, and sequencing was performed using the NovaSeq 6000 sequencing system (Illumina, San Diego, CA, USA) with 2 × 150 bp paired-end reads. Long-read libraries were prepared using a Ligation Sequencing Kit, and sequencing was performed using the PromethION system (Oxford Nanopore Technologies, Oxford, UK). The short and long reads were assembled into contigs using the HASLR program, which utilizes a hybrid assembly approach.
Data source location: Tsukuba, Ibaraki, Japan
Data accessibility: Data have been deposited in relevant databases and are publicly available. The sequencing data were deposited in the Sequence Read Archive under accession numbers DRR308100 (https://www.ncbi.nlm.nih.gov/sra/?term=DRR308100) and DRR311486 (https://www.ncbi.nlm.nih.gov/sra/?term=DRR311486). The whole-genome sequence, Rhinoceros unicornis ID: 5488M, was deposited in GenBank under accession number BOSQ00000000 (https://www.ncbi.nlm.nih.gov/nuccore/2085786713). All details regarding genome sequencing data are available at NCBI under BioProject accession number PRJDB11285 (https://www.ncbi.nlm.nih.gov/bioproject/PRJDB11285).

  8 in total

1.  Evaluating Genome Assemblies and Gene Models Using gVolante.

Authors:  Osamu Nishimura; Yuichiro Hara; Shigehiro Kuraku
Journal:  Methods Mol Biol       Date:  2019

2.  gVolante for standardizing completeness assessment of genome and transcriptome assemblies.

Authors:  Osamu Nishimura; Yuichiro Hara; Shigehiro Kuraku
Journal:  Bioinformatics       Date:  2017-11-15       Impact factor: 6.937

3.  The inflated significance of neutral genetic diversity in conservation genetics.

Authors:  João C Teixeira; Christian D Huber
Journal:  Proc Natl Acad Sci U S A       Date:  2021-03-09       Impact factor: 11.205

4.  Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

Authors:  Yuichiro Hara; Kaori Tatsumi; Michio Yoshida; Eriko Kajikawa; Hiroshi Kiyonari; Shigehiro Kuraku
Journal:  BMC Genomics       Date:  2015-11-18       Impact factor: 3.969

5.  Conservation of biodiversity in the genomics era. (Review)

Authors:  Megan A Supple; Beth Shapiro
Journal:  Genome Biol       Date:  2018-09-11       Impact factor: 13.583

6.  HASLR: Fast Hybrid Assembly of Long Reads.

Authors:  Ehsan Haghshenas; Hossein Asghari; Jens Stoye; Cedric Chauve; Faraz Hach
Journal:  iScience       Date:  2020-07-25

7.  Population genomics for wildlife conservation and management. (Review)

Authors:  Paul A Hohenlohe; W Chris Funk; Om P Rajora
Journal:  Mol Ecol       Date:  2020-11-18       Impact factor: 6.185

8.  BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes.

Authors:  Mosè Manni; Matthew R Berkeley; Mathieu Seppey; Felipe A Simão; Evgeny M Zdobnov
Journal:  Mol Biol Evol       Date:  2021-09-27       Impact factor: 16.240
