Kei Nabeshima, Nobuyoshi Nakajima, Mitsuaki Ogata, Manabu Onuma.
Abstract
The Indian rhinoceros (Rhinoceros unicornis) is a large herbivore found in northern India and southern Nepal. It is a critically endangered species, with an estimated population of approximately 3,600 in the wild. Genetic factors, such as the loss of genetic diversity and the accumulation of deleterious variations, are critical risk factors for the extinction of endangered species, including the Indian rhinoceros. To support conservation efforts for the Indian rhinoceros, we assembled its draft genome. The new genomic data will enable the study of functional genes associated with the ecological and physiological characteristics of the Indian rhinoceros and help us establish more effective conservation measures. Muscle tissue was collected from an Indian rhinoceros that died from prostration at a zoo, and the samples were stored at the National Institute for Environmental Studies (Tsukuba, Japan). Sequence data were obtained using an Illumina NovaSeq 6000 platform for short reads and an Oxford Nanopore Technologies PromethION for long reads. We generated approximately 235.2 Gbp of data. From these sequences, we assembled a 2,375,051,758 bp genome consisting of 7,615 contigs. The genome data are available from the National Center for Biotechnology Information BioProject database under accession number BOSQ00000000.
Keywords: Hybrid sequencing; Indian rhinoceros; Whole-genome sequence; Wildlife
Year: 2022 PMID: 35141371 PMCID: PMC8814301 DOI: 10.1016/j.dib.2022.107857
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Amount of data generated.
| Type of reads | No. of reads | Average read length | Total data |
|---|---|---|---|
| Short | 623,580,225 | 150 bp | 188.3 Gbp |
| Long | 5,536,969 | 8,236.2 bp | 46.9 Gbp |
General features of the Rhinoceros unicornis genome.
| Feature | Value |
|---|---|
| GC content (%) | 40.99 |
| Number of contigs | 7,615 |
| Number of scaffolds | 7,615 |
| Total contig length (bp) | 2,375,051,758 |
| N50 contig size (bp) | 663,630 |
| Longest sequence (bp) | 5,292,610 |
| Shortest sequence (bp) | 10,012 |
| Mean sequence length (bp) | 311,891 |
| Median sequence length (bp) | 156,082 |
| BUSCO score | C:96.5% [S:96.1%, D:0.4%], F:3.0%, M:0.5%, n:233 |
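The contiguity figures in the table above (N50, longest, shortest, mean, and median sequence length) are all functions of the list of contig lengths. The following is a minimal sketch of how such summary statistics can be computed, not the authors' actual pipeline, which is not described here. The function names `n50` and `summarize` and the toy lengths are hypothetical; real input would be the lengths of the sequences in the assembled FASTA file.

```python
from statistics import mean, median


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarize(lengths):
    """Collect the same kinds of statistics reported in the table."""
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "n50": n50(lengths),
        "longest": max(lengths),
        "shortest": min(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
    }


# Toy contig lengths (hypothetical, for illustration only).
toy_lengths = [100, 80, 60, 40, 20]
stats = summarize(toy_lengths)
print(stats)
```

As a quick consistency check on the reported values, the mean sequence length should equal the total contig length divided by the number of contigs: 2,375,051,758 / 7,615 ≈ 311,891 bp, which agrees with the table.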

| Item | Description |
|---|---|
| Subject | Biodiversity |
| Specific subject area | Genomics |
| Type of data | Genome sequences and table |
| How the data were acquired | High-throughput DNA sequencing using NovaSeq 6000 and PromethION platforms |
| Data format | Raw and assembled genome sequences |
| Description of data collection | The sample was obtained from the muscle tissue of |
| Data source location | Tsukuba, Ibaraki, Japan |
| Data accessibility | Data have been deposited in relevant databases and are publicly available. The sequencing data were deposited in the Sequence Read Archive under accession numbers DRR308100 ( |