
Draft genome sequence data of Indian rhinoceros, Rhinoceros unicornis.

Kei Nabeshima¹, Nobuyoshi Nakajima², Mitsuaki Ogata³, Manabu Onuma¹.

Abstract

The Indian rhinoceros (Rhinoceros unicornis) is a large herbivore found in northern India and southern Nepal. It is a threatened species (listed as Vulnerable on the IUCN Red List), with an estimated wild population of approximately 3,600 individuals. Genetic factors, such as the loss of genetic diversity and the accumulation of deleterious variants, are critical risk factors for the extinction of endangered species such as the Indian rhinoceros. To support conservation efforts for the Indian rhinoceros, we assembled its draft genome. The new genomic data will enable the study of functional genes associated with the ecological and physiological characteristics of the Indian rhinoceros and help establish more effective conservation measures. Muscle tissue was collected from an Indian rhinoceros that died from prostration at a zoo, and the samples were stored at the National Institute for Environmental Studies (Tsukuba, Japan). Sequence data were obtained using an Illumina NovaSeq 6000 platform for short reads and an Oxford Nanopore Technologies PromethION for long reads, yielding approximately 235.2 Gbp of data. From these sequences, we assembled a 2,375,051,758 bp genome consisting of 7,615 contigs. The genome data are available from the National Center for Biotechnology Information under GenBank accession number BOSQ00000000 (BioProject PRJDB11285).


Keywords: Hybrid sequencing; Indian rhinoceros; Whole-genome sequence; Wildlife

Year: 2022; PMID: 35141371; PMCID: PMC8814301; DOI: 10.1016/j.dib.2022.107857

Source DB: PubMed; Journal: Data Brief; ISSN: 2352-3409

Specifications Table

Value of the Data

The Indian rhinoceros (Rhinoceros unicornis) is a threatened herbivore for which little genetic information is available. These genome data can support ex situ conservation and infectious disease control. They can be analyzed together with other high-quality Indian rhinoceros whole-genome data (accession number JAFHKO000000000) to assess genetic diversity within the species and to identify genomic rearrangements. Other researchers may also use these data to identify immunity-related genes and to plan breeding programs that maintain genetic diversity.

Data Description

The Indian rhinoceros (Rhinoceros unicornis) is a threatened species categorized as Vulnerable on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species [1]. Although the species declined to near extinction in the early 1900s, its population is currently increasing: the total population estimate in August 2018 was 3,588 individuals, with 649 in Nepal and 2,939 in India [2]. Despite this growth, the species still faces threats [2], and because of its near extinction in the early 1900s, genetic factors must be considered in conservation activities [2]. Genetic factors such as the loss of genetic diversity and the accumulation of deleterious variants are known to be critical risk factors for the extinction of endangered species [3], [4], [5], [6]. To support the conservation of the Indian rhinoceros, we generated high-precision genomic data. These data will enable the study of functional genes associated with the ecological and physiological characteristics of the Indian rhinoceros and will facilitate the establishment of more effective conservation measures.

We sequenced both short-read and long-read libraries and generated approximately 235.2 Gbp of data, comprising 623,580,225 short reads and 5,536,969 long reads (Table 1). The sequencing data were deposited in the Sequence Read Archive under accession numbers DRR308100 and DRR311486. The short and long reads were assembled into contigs using HASLR, which uses a hybrid assembly approach [7]. The resulting 2,375,051,758 bp assembly consists of 7,615 contigs with an N50 of 663 kbp. The GC content of the assembly is 40.99%, and the complete Benchmarking Universal Single-Copy Orthologs (BUSCO) score (C) is 96.5% (single-copy, S: 96.1%; duplicated, D: 0.4%; fragmented, F: 3.0%; missing, M: 0.5%; Table 2).
The Indian rhinoceros genome sequence is a potentially useful resource for future molecular evolutionary analyses of mammals.

Table 1

Amount of data generated.

Type of reads	No. of reads	Average read length	Total data
Short	623,580,225	150 bp	188.3 Gbp
Long	5,536,969	8,236.2 bp	46.9 Gbp
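Mean sequencing depth is commonly estimated as total sequenced bases divided by genome size. The sketch below applies that rule to the Table 1 totals, assuming the 2.4 Gb genome-size estimate given under De novo genome assembly and assessment; the constant and function names are illustrative and not part of any published pipeline.

```python
# Total bases per read type, in Gbp, taken from Table 1.
TOTALS_GBP = {"short": 188.3, "long": 46.9}

# Assumption: the 2.4 Gb genome-size estimate passed to the assembler (see Methods).
GENOME_SIZE_GB = 2.4


def depth(total_gbp: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Return mean sequencing depth (x) as total sequenced bases / genome size."""
    return total_gbp / genome_gb


if __name__ == "__main__":
    combined = sum(TOTALS_GBP.values())
    print(f"combined data: {combined:.1f} Gbp")
    for kind, gbp in TOTALS_GBP.items():
        print(f"{kind}-read depth: {depth(gbp):.1f}x")
```

Under these assumptions the short reads give about 78.5× and the long reads about 19.5× mean depth; dividing by the assembled length (2,375,051,758 bp) instead of the estimate raises both values slightly.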

Table 2

General features of the Rhinoceros unicornis genome.

GC content (%)	40.99
Number of contigs	7,615
Number of scaffolds	7,615
Total contig length (bp)	2,375,051,758
N50 contig size (bp)	663,630
Longest sequence (bp)	5,292,610
Shortest sequence (bp)	10,012
Mean sequence length (bp)	311,891
Median sequence length (bp)	156,082
BUSCO score	C:96.5% [S:96.1%, D:0.4%], F:3.0%, M:0.5%, n:233
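The contig statistics in Table 2 follow standard definitions: N50 is the length L such that contigs of length ≥ L together contain at least half of the assembled bases, and the mean is the total length divided by the number of contigs (2,375,051,758 / 7,615 ≈ 311,891 bp). A minimal sketch of these calculations on hypothetical toy data follows; the GC function excludes N and other ambiguous symbols from the denominator, which is one common convention and an assumption here, since the tool used for the reported 40.99% is not stated.

```python
from statistics import mean, median
from typing import Iterable, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("empty assembly")


def gc_percent(seqs: Iterable[str]) -> float:
    """GC content as a percentage of A/C/G/T bases (N and other symbols ignored)."""
    gc = acgt = 0
    for seq in seqs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
    return 100 * gc / acgt


def summarize(lengths: Sequence[int]) -> dict:
    """Collect the per-assembly length statistics reported in Table 2."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "n50": n50(lengths),
        "longest": max(lengths),
        "shortest": min(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
    }


if __name__ == "__main__":
    toy = [100, 80, 60, 40, 20]  # hypothetical contig lengths
    print(summarize(toy))
    print(gc_percent(["ACGTNN", "GGCC"]))
```

For the toy lengths the running sum first reaches half of the 300 bp total at the 80 bp contig, so N50 is 80.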


Experimental Design, Materials and Methods

Sample preparation and sequencing

We sampled a Rhinoceros unicornis individual at Yokohama Municipal Kanazawa Zoo, Yokohama, Japan (NIES ID: 5488M, female). The rhinoceros was born on February 1, 2007 and died from prostration on March 22, 2007. Muscle tissue was collected during the autopsy performed to determine the cause of death. Genomic DNA was extracted from the muscle using proteinase K and phenol/chloroform/isoamyl alcohol for short-read sequencing and a NucleoBond HMW DNA extraction kit (Macherey-Nagel, Düren, Germany) for long-read sequencing. Short-read whole-genome sequencing was performed by Macrogen Japan (Tokyo, Japan). Short-read libraries were prepared using a TruSeq LT PCR-free DNA Library Preparation Kit, and sequencing was performed on the NovaSeq 6000 system (Illumina, San Diego, CA, USA) with 2 × 150 bp paired-end reads. Long-read sequencing was performed by GeneBay (Yokohama, Japan). Long-read libraries were prepared using a Ligation Sequencing Kit, and sequencing was performed on the PromethION system (Oxford Nanopore Technologies, Oxford, UK).

De novo genome assembly and assessment

The sequenced reads were assembled using HASLR v. 2020-06a, which uses a hybrid assembly approach [7]. Assembly was run with a minimum long-read coverage of 20× and an estimated genome size of 2.4 Gb, with all other parameters at their default settings. The genome assembly was evaluated using BUSCO v5 [8] with the core vertebrate gene set [9] via the gVolante pipeline [10,11].
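BUSCO reports completeness in a compact notation: complete (C) is the sum of single-copy (S) and duplicated (D) genes, and C, fragmented (F), and missing (M) together account for all n genes searched (here, the 233 genes of the core vertebrate set). A minimal sketch, relying only on that notation, that parses the Table 2 string and checks both identities within a rounding tolerance; the regular expression and function names are illustrative and are not part of BUSCO or gVolante.

```python
import re

# Matches BUSCO's one-line summary, e.g. "C:96.5%[S:96.1%,D:0.4%],F:3.0%,M:0.5%,n:233";
# optional whitespace is allowed so the spaced form used in Table 2 also parses.
BUSCO_RE = re.compile(
    r"C:\s*(?P<C>[\d.]+)%\s*\[\s*S:\s*(?P<S>[\d.]+)%,\s*D:\s*(?P<D>[\d.]+)%\s*\],\s*"
    r"F:\s*(?P<F>[\d.]+)%,\s*M:\s*(?P<M>[\d.]+)%,\s*n:\s*(?P<n>\d+)"
)


def parse_busco(summary: str) -> dict:
    """Return the C/S/D/F/M percentages as floats and n as an int."""
    m = BUSCO_RE.search(summary)
    if m is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    out = {k: float(v) for k, v in m.groupdict().items() if k != "n"}
    out["n"] = int(m.group("n"))
    return out


def is_consistent(b: dict, tol: float = 0.15) -> bool:
    """Check C = S + D and C + F + M = 100, allowing for one-decimal rounding."""
    complete_ok = abs(b["C"] - (b["S"] + b["D"])) <= tol
    total_ok = abs(b["C"] + b["F"] + b["M"] - 100.0) <= tol
    return complete_ok and total_ok


if __name__ == "__main__":
    table2 = parse_busco("C:96.5% [S:96.1%, D:0.4%], F:3.0%, M:0.5%, n:233")
    print(table2, is_consistent(table2))
```

The default tolerance of 0.15 percentage points lets each of up to three terms be off by 0.05 from rounding to one decimal place.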

Ethics Statement

None.

CRediT Author Statement

Kei Nabeshima: Methodology, Software, Writing – original draft preparation; Nobuyoshi Nakajima: Data curation; Mitsuaki Ogata: Sampling; Manabu Onuma: Conceptualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that have or could be perceived to have influenced the work reported in this article.

Subject	Biodiversity
Specific subject area	Genomics
Type of data	Genome sequences and tables
How the data were acquired	High-throughput DNA sequencing using NovaSeq 6000 and PromethION platforms
Data format	Raw and assembled genome sequences
Description of data collection	The sample was obtained from the muscle tissue of Rhinoceros unicornis at Yokohama Municipal Kanazawa Zoo, Yokohama, Japan (NIES ID: 5488M, female). Genomic DNA was extracted using proteinase K and phenol/chloroform/isoamyl alcohol for short-read sequencing and a NucleoBond HMW DNA extraction kit (Macherey-Nagel, Düren, Germany) for long-read sequencing. Short-read libraries were prepared using a TruSeq LT PCR-free DNA Library Preparation Kit, and sequencing was performed using the NovaSeq 6000 sequencing system (Illumina, San Diego, CA, USA) with 2 × 150 bp paired-end reads. Long-read libraries were prepared using a Ligation Sequencing Kit, and sequencing was performed using the PromethION system (Oxford Nanopore Technologies, Oxford, UK). The short and long reads were assembled into contigs using the HASLR program, which utilizes a hybrid assembly approach.
Data source location	Tsukuba, Ibaraki, Japan
Data accessibility	Data have been deposited in relevant databases and are publicly available. The sequencing data were deposited in the Sequence Read Archive under accession numbers DRR308100 (https://www.ncbi.nlm.nih.gov/sra/?term=DRR308100) and DRR311486 (https://www.ncbi.nlm.nih.gov/sra/?term=DRR311486). The whole-genome sequence, Rhinoceros unicornis ID: 5488M, was deposited in GenBank under accession number BOSQ00000000 (https://www.ncbi.nlm.nih.gov/nuccore/2085786713). All details regarding genome sequencing data are available at NCBI under BioProject accession number PRJDB11285 (https://www.ncbi.nlm.nih.gov/bioproject/PRJDB11285).

References

Evaluating Genome Assemblies and Gene Models Using gVolante.

gVolante for standardizing completeness assessment of genome and transcriptome assemblies.

The inflated significance of neutral genetic diversity in conservation genetics.

Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

Conservation of biodiversity in the genomics era. (Review)

HASLR: Fast Hybrid Assembly of Long Reads.

Population genomics for wildlife conservation and management. (Review)

Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol Biol Evol, 2021-09-27.