Zhiyuan Li, Changxu Tian, Yang Huang, Xinghua Lin, Yaorong Wang, Dongneng Jiang, Chunhua Zhu, Huapu Chen, Guangli Li.
Abstract
Sillago sihama has high economic value and is one of the most attractive aquaculture species in China. Despite its economic importance, its genome has barely been studied. In this study, we conducted the first genomic survey of S. sihama using next-generation sequencing (NGS). In total, 45.063 Gb of high-quality sequence data were obtained. Based on the 17-mer frequency distribution, the genome size was estimated to be 508.50 Mb. The sequence repeat ratio was calculated to be 21.25%, and the heterozygosity ratio was 0.92%. Reads were assembled into 1,009,363 contigs, with an N50 length of 1362 bp, and then into 814,219 scaffolds, with an N50 length of 2173 bp. The average guanine and cytosine (GC) content was 45.04%. Dinucleotide repeats (56.55%) were the dominant form of simple sequence repeats (SSRs).
Keywords: Genome size; Guanine and Cytosine (GC) content; Sillago sihama; simple sequence repeat (SSR)
Year: 2019 PMID: 31581597 PMCID: PMC6827152 DOI: 10.3390/ani9100756
Source DB: PubMed Journal: Animals (Basel) ISSN: 2076-2615 Impact factor: 2.752
Statistics of S. sihama genome sequencing data.
| Library | Insert Size (bp) | Raw Base (bp) | Effective Rate (%) | Clean Base (bp) | Error Rate (%) | Q20 ¹ (%) | Q30 ² (%) | GC Content (%) |
|---|---|---|---|---|---|---|---|---|
| Specimen 1 | 350 | 54,836,979,600 | 99.98 | 45,063,446,400 | 0.03 | 95.93 | 90.81 | 45.03 |
| Specimen 2 | 350 | 54,451,684,200 | 99.74 | 38,583,415,200 | 0.03 | 95.75 | 90.44 | 45.36 |
¹ Q20: the percentage of bases with a Phred quality score of at least 20 (base-call accuracy of 99% or higher). ² Q30: the percentage of bases with a Phred quality score of at least 30 (base-call accuracy of 99.9% or higher).
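The accuracy thresholds behind Q20 and Q30 follow from the standard Phred definition, Q = -10 log10(P), where P is the probability of an incorrect base call. The snippet below is a minimal sketch of that conversion only. The function names are illustrative and do not come from the study's pipeline.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q: float) -> float:
    """Base-call accuracy implied by a Phred quality score Q."""
    return 1.0 - phred_to_error_prob(q)


# Print the thresholds used for the Q20 and Q30 columns.
for q in (20, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):g}, "
          f"accuracy {phred_to_accuracy(q):.1%}")
```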
Figure 1. K-mer (k = 17) analysis for estimation of the genome size of S. sihama (specimen 1).
Estimation of S. sihama (specimen 1) genome based on K-mer statistics.
| Identity | K-mer | K-mer Depth | K-mer Number | Genome Size (Mb) | Revised Genome Size (Mb) | Heterozygous Ratio (%) | Repeat Ratio (%) |
|---|---|---|---|---|---|---|---|
| Specimen 1 | 17 | 70 | 36,648,430,961 | 523.55 | 508.50 | 0.92 | 21.25 |
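The unrevised genome size in the table is consistent with the usual k-mer relation, genome size ≈ total k-mer number / k-mer depth. The sketch below checks this, assuming the value 70 in the table is the peak k-mer depth. It does not reproduce the revised estimate of 508.50 Mb, because the correction step is not described in this record.

```python
def estimate_genome_size_mb(kmer_number: int, kmer_depth: float) -> float:
    """Estimate genome size in Mb as total k-mer count divided by k-mer depth."""
    if kmer_depth <= 0:
        raise ValueError("kmer_depth must be positive")
    return kmer_number / kmer_depth / 1e6


# Values from the table for specimen 1 (k = 17).
KMER_NUMBER = 36_648_430_961
KMER_DEPTH = 70  # assumption: the second "K-mer" column is the peak depth

size_mb = estimate_genome_size_mb(KMER_NUMBER, KMER_DEPTH)
print(f"Estimated genome size: {size_mb:.2f} Mb")
```

The result can be compared with the Genome Size column (523.55).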
Statistics of S. sihama (specimen 1) assembled genome sequences.
| Type | Identity | Total Length (bp) | Total Number | Max Length (bp) | N50 Length (bp) | N90 Length (bp) |
|---|---|---|---|---|---|---|
| Contig | Specimen 1 | 559,219,807 | 1,009,363 | 46,417 | 1362 | 171 |
| Scaffold | Specimen 1 | 568,556,466 | 814,219 | 72,953 | 2173 | 219 |
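For readers unfamiliar with the N50 and N90 columns: N50 is the length L such that sequences of length L or longer together cover at least half of the total assembly length, and N90 is the same with a 90% threshold. The following is a minimal sketch of that definition using toy lengths. It is not the assembler's own implementation, and the contig lengths of the S. sihama assembly are not available in this record.

```python
from typing import Iterable


def nx_length(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nx length: the shortest sequence in the smallest set of
    longest sequences that covers at least `fraction` of the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # reached only if fraction > 1


# Toy example (hypothetical contig lengths, not data from the study).
toy_contigs = [100, 80, 60, 40, 20]
print("N50:", nx_length(toy_contigs, 0.5))
print("N90:", nx_length(toy_contigs, 0.9))
```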
Figure 2. GC content and average sequencing depth of S. sihama (specimen 1) genome data used for assembly. For the spot graphs, the x-axis is GC content and the y-axis is sequencing depth. For the bar graphs, the x-axis is sequencing depth distribution and the y-axis is GC content distribution.
Simple Sequence Repeat (SSR) distribution statistics for S. sihama (specimen 1).
| Statistics | Di- | Tri- | Tetra- | Penta- | Hexa- |
|---|---|---|---|---|---|
| SSR number | 84,406 | 50,420 | 11,361 | 2200 | 870 |
| Percentage | 56.55% | 33.78% | 7.61% | 1.47% | 0.58% |
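The Percentage row can be recomputed from the SSR counts as each motif class's share of the total number of SSRs. The sketch below does this with the counts from the table. The variable names are illustrative only.

```python
# SSR counts per motif class, copied from the table (specimen 1).
ssr_counts = {
    "Di-": 84_406,
    "Tri-": 50_420,
    "Tetra-": 11_361,
    "Penta-": 2_200,
    "Hexa-": 870,
}

total_ssrs = sum(ssr_counts.values())

# Share of each motif class as a percentage of all SSRs.
ssr_percentages = {
    motif: 100 * count / total_ssrs for motif, count in ssr_counts.items()
}

print(f"Total SSRs: {total_ssrs:,}")
for motif, pct in ssr_percentages.items():
    print(f"{motif:<7} {pct:.2f}%")
```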
Figure 3. Ratio of different SSRs in S. sihama (specimen 1).