Md Bazlur Rahman Mollah1, Mohd Golam Quader Khan2, Md Shahidul Islam3, Md Samsul Alam2.
Abstract
Background: Hilsa shad (Tenualosa ilisha), a widely distributed migratory fish, contributes substantially to the economy of Bangladesh. The harvest of hilsa from inland waters has been fluctuating due to anthropogenic and climate change-induced degradation of the riverine habitats. The whole genome sequence of this valuable fish could provide genomic tools for sustainable harvest, conservation and productivity cycle maintenance. Here, we report the first draft genome of T. ilisha from the Bay of Bengal, the largest reservoir of the migratory fish.
Keywords: Bay of Bengal; Hilsa; SNP; anadromous; whole genome
Year: 2019 PMID: 31602298 PMCID: PMC6774053 DOI: 10.12688/f1000research.18325.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. The experimental fish and its collection site.
Photograph of a T. ilisha specimen (a) and a map of Bangladesh showing the sampling site (21.981753 N 90.305556 E) (b).
Figure 2. Methodology outline.
Schematic diagram illustrating the methodology of whole genome sequencing, de novo assembly and identification of SNPs in T. ilisha from the Bay of Bengal.
Significant matches (blastp e = 0.001) of T. ilisha genes with other vertebrates.
| Species | No. of | % match |
|---|---|---|
| | 27062 | 72.26 |
| | 28373 | 75.76 |
| | 28325 | 75.63 |
| | 29339 | 78.34 |
| | 26480 | 70.71 |
| | 27497 | 73.42 |
Figure 3. GenomeScope kmer profile plot of T. ilisha.
The plots show the fit of the GenomeScope model (black) based on 33-mers in Illumina HiSeq sequence reads, with maximum kmer coverage of 300× (a) and 10000× (b).
Properties of T. ilisha genome estimated at three different kmers [1].
| Properties | Kmer 21 (min) | Kmer 21 (max) | Kmer 31 (min) | Kmer 31 (max) | Kmer 33 (min) | Kmer 33 (max) |
|---|---|---|---|---|---|---|
| Genome Haploid | 649,475,766 | 649,949,877 | 659,441,333 | 659,890,585 | 660,289,984 | 660,728,342 |
| Heterozygosity (%) | 0.654 | 0.660 | 0.590 | 0.594 | 0.579 | 0.583 |
| Genome Repeat | 88,141,621 | 88,205,963 | 57,904,063 | 57,943,510 | 54,791,371 | 54,827,747 |
| Genome Unique | 561,334,145 | 561,743,914 | 601,537,270 | 601,947,074 | 605,498,612 | 605,900,595 |
| Read Error Rate (%), one value per kmer | 0.527 | | 0.477 | | 0.468 | |
[1] Kmers are the subsequences of length k contained in a sequence. The estimated genome size varies with the kmer value; the estimated haploid genome lengths obtained from kmer 31 and kmer 33 are very close.
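To make the kmer definition in the footnote concrete, the following is a minimal sketch of kmer counting on a toy string. The function name `count_kmers` and the toy sequence are illustrative assumptions, not part of the study's pipeline; tools such as GenomeScope work from a histogram of kmer multiplicities computed over the full read set, which the last line imitates on a very small scale.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring (kmer) of a sequence.

    Hypothetical helper for illustration only.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping kmers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Toy sequence (an assumption, not T. ilisha data).
toy_sequence = "ACGTACGTAC"
kmer_counts = count_kmers(toy_sequence, 3)

# Histogram of multiplicities: how many distinct kmers occur once, twice, ...
multiplicity_histogram = Counter(kmer_counts.values())
```

With real reads, the position of the main peak in such a histogram reflects sequencing depth, which is why the genome size estimates in the table shift slightly with the chosen k.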
Contig and scaffold properties of T. ilisha genome.
| Contig parameters | Value | % | Scaffold parameters | Value |
|---|---|---|---|---|
| Read pairs | 769,262,291 | - | Scaffold Number | 100181 |
| Contig Number | 1724390 | - | Mean Scaffold size | 7090 |
| Mean Contig Size | 378 | - | Longest Scaffold | 832708 |
| Median Contig Size | 209 | - | Shortest Scaffold | 200 |
| Longest Contig | 27277 | - | N10 | 254367 |
| Shortest Contig | 100 | - | N30 | 118059 |
| Contig >100bp | 1704389 | 98.84 | N50 | 64157 |
| Contig >500bp | 335422 | 19.45 | N70 | 26438 |
| Contig >1K | 121379 | 7.04 | N90 | 4991 |
| Contig >10K | 58 | 0.00 | N count | 96164104 |
| Contig N50 (bp) | 594 | - | Assembled Genome Size (bp) | 710279582 |
| G+C content (%) | - | 43.01 | G+C content (%) | 42.95 |
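The N10 to N90 values in the table follow the usual Nx convention: with sequences sorted from longest to shortest, Nx is the length of the sequence at which the running total first reaches x% of all assembled bases. The sketch below shows that calculation on a toy list of lengths. The function name `nx_length` and the toy lengths are assumptions for illustration, and the assembler used in the study may apply its own filtering (for example a minimum scaffold length) before computing these statistics.

```python
def nx_length(lengths, x=50):
    """Return the Nx statistic for a collection of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least x percent of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Toy scaffold lengths (an assumption, not the T. ilisha assembly).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = nx_length(toy_lengths, 50)
toy_n90 = nx_length(toy_lengths, 90)
```

Because N50 depends only on the length distribution, the contig N50 of 594 bp next to the scaffold N50 of 64157 bp indicates how much scaffolding joined short contigs into longer sequences.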
Single nucleotide polymorphism (SNP) and Indels in the T. ilisha genome.
| SNPs / indels | Type | Number | % |
|---|---|---|---|
| Total SNPs | - | 792939 | 100 |
| Transitions | A>G | 256209 | 32.31 |
| | C>T | 254042 | 32.04 |
| Transversions | A>C | 75912 | 9.57 |
| | A>T | 78469 | 9.90 |
| | C>G | 55810 | 7.04 |
| | G>T | 72497 | 9.14 |
| Transition : Transversion | - | 1.8 : 1 | - |
| Number of Indels | - | 155574 | - |
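The transition : transversion ratio in the table can be checked directly from the per-type counts. The short sketch below uses only the numbers reported in the table; the variable names are illustrative assumptions.

```python
# SNP counts copied from the table above.
transitions = {"A>G": 256209, "C>T": 254042}
transversions = {"A>C": 75912, "A>T": 78469, "C>G": 55810, "G>T": 72497}

n_transitions = sum(transitions.values())
n_transversions = sum(transversions.values())
total_snps = n_transitions + n_transversions

# Ratio of transitions to transversions (Ts/Tv).
ts_tv_ratio = n_transitions / n_transversions

# Share of each substitution type as a percentage of all SNPs.
percentages = {
    snp_type: round(100 * count / total_snps, 2)
    for snp_type, count in {**transitions, **transversions}.items()
}
```

The per-type counts sum to the reported total of 792939 SNPs, and the ratio rounds to the 1.8 : 1 given in the table.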
Figure 4. Distribution of Indels in the T. ilisha genome.
Indel sizes ranged from 1 to 60 nucleotides, and the frequency of indels decreased as their size increased.