Sheng-Yong Xu¹, Na Song², Shi-Jun Xiao³, Tian-Xiang Gao¹
Abstract
The marbled rockfish Sebastiscus marmoratus is an ecologically and economically important marine fish species distributed along the northwestern Pacific coast from Japan to the Philippines. Here, next-generation sequencing was used to generate a whole-genome survey dataset that provides fundamental information on its genome and to develop genome-wide microsatellite markers for S. marmoratus. The genome size of S. marmoratus was estimated at approximately 800 Mb by K-mer analysis, and its heterozygosity and repeat sequence ratios were 0.17% and 39.65%, respectively. The preliminary genome assembly was nearly 609 Mb with a GC content of 41.3%, and these data were used to develop microsatellite markers. A total of 191,592 microsatellite motifs were identified. The most frequent repeat type was dinucleotide (76.10%), followed by trinucleotide (19.63%), tetranucleotide (3.91%), and pentanucleotide (0.36%) motifs. AC, GAG, and ATAG were the most abundant dinucleotide, trinucleotide, and tetranucleotide motifs, respectively. In summary, a wide range of candidate microsatellite markers was identified and characterized in the present study using genome survey analysis. A high-quality whole-genome sequence based on an "Illumina+PacBio+Hi-C" strategy is warranted for further comparative genomics and evolutionary biology studies in this species.
Keywords: genome size; genome survey; marbled rockfish; microsatellite marker
Year: 2020 PMID: 32090250 PMCID: PMC7040462 DOI: 10.1042/BSR20192252
Source DB: PubMed Journal: Biosci Rep ISSN: 0144-8463 Impact factor: 3.840
Figure 1. Overview of the experimental design and analysis pipeline
Quality control statistics for the Illumina sequencing data
| Lib ID | Raw data (bp) | Clean data (bp) | Effective rate (%) | Error rate (%) | Q20 (%) | Q30 (%) | GC content (%) |
|---|---|---|---|---|---|---|---|
| DES_L5 | 35,057,094,600 | 34,843,246,022 | 99.39 | 0.02; 0.03 | 98.02; 95.13 | 94.91; 89.18 | 42.86; 43.04 |
Note: For error rate, Q20, Q30, and GC content, the two values given are for paired-end read 1 and read 2, respectively.
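The effective rate in the table is consistent with clean bases divided by raw bases (34,843,246,022 / 35,057,094,600 ≈ 99.39%), and Q20/Q30 are the percentages of bases with a Phred quality of at least 20 or 30, i.e. an estimated base-call error probability of at most 1% or 0.1%. The Python sketch below illustrates these definitions. The Phred+33 quality encoding and summarizing the error rate as the mean per-base error probability are assumptions, not the sequencing provider's documented pipeline.

```python
# Base counts copied from the QC table above.
RAW_BP = 35_057_094_600
CLEAN_BP = 34_843_246_022


def effective_rate(raw_bp: int, clean_bp: int) -> float:
    """Percentage of raw bases kept after filtering (definition inferred from the table)."""
    return 100 * clean_bp / raw_bp


def phred_to_error(q: float) -> float:
    """Base-call error probability for Phred quality q: p = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def q_percent(qual: str, threshold: int, offset: int = 33) -> float:
    """Percentage of bases with Phred >= threshold; assumes Phred+33 ASCII encoding."""
    scores = [ord(c) - offset for c in qual]
    return 100 * sum(s >= threshold for s in scores) / len(scores)


def mean_error_rate(qual: str, offset: int = 33) -> float:
    """Mean per-base error probability in percent (ASSUMED summary, for illustration)."""
    return 100 * sum(phred_to_error(ord(c) - offset) for c in qual) / len(qual)


print(round(effective_rate(RAW_BP, CLEAN_BP), 2))  # 99.39
print(q_percent("II#5", 20), q_percent("II#5", 30))  # 75.0 50.0
```

The quality string "II#5" is a toy example: 'I' encodes Q40, '#' Q2, and '5' Q20 under Phred+33.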
Figure 2. Distribution of error rate, sequencing quality, and GC content of the raw data
Figure 3. K-mer (21-mer) analysis for estimating the genome size of S. marmoratus
The x-axis is the K-mer depth and the y-axis is the proportion of K-mers at each depth. Data were produced from a 350-bp insert library. The K-mer frequency distribution peaked at a depth of 38.
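The K-mer genome size estimate rests on a simple relation: genome size ≈ total number of K-mers / depth at the main peak of the K-mer frequency distribution. The sketch below applies it to the values reported here (34.8 Gb of clean data, k = 21, peak depth 38). The total 21-mer count is not given in this excerpt, so it is approximated from the clean base count under an assumed read length of 150 bp (hypothetical; each read of length L contributes L − k + 1 K-mers). Under that assumption the estimate lands near 795 Mb, in line with the ~800 Mb reported. This is a back-of-the-envelope check, not the authors' calculation.

```python
def kmers_from_bases(total_bases: int, read_len: int, k: int) -> float:
    """Approximate K-mer count: each read of length L yields L - k + 1 K-mers."""
    reads = total_bases / read_len
    return reads * (read_len - k + 1)


def genome_size(kmer_count: float, peak_depth: float) -> float:
    """Classic K-mer estimate: G = total K-mers / peak K-mer depth."""
    return kmer_count / peak_depth


CLEAN_BASES = 34_843_246_022  # clean data (bp) from the QC table
K = 21                        # 21-mer, as in Figure 3
PEAK_DEPTH = 38               # peak of the 21-mer depth distribution
READ_LEN = 150                # ASSUMPTION: 150-bp reads; not stated in this excerpt

n_kmers = kmers_from_bases(CLEAN_BASES, READ_LEN, K)
estimate = genome_size(n_kmers, PEAK_DEPTH)
print(f"{estimate / 1e6:.0f} Mb")  # prints 795 Mb
```

Changing READ_LEN shifts the estimate only slightly (the factor (L − k + 1) / L), which is why the result stays close to the reported value.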
Assembly statistics for S. marmoratus from 34.8 Gb of Illumina clean data
| Statistic | Contig size (bp) | Contig number | Scaffold size (bp) | Scaffold number |
|---|---|---|---|---|
| N90 | 145 | 995,699 | 1,117 | 179,431 |
| N80 | 241 | 680,067 | 1,877 | 128,640 |
| N70 | 373 | 485,655 | 2,589 | 94,551 |
| N60 | 518 | 353,199 | 3,413 | 69,208 |
| N50 | 674 | 254,586 | 4,362 | 49,699 |
| Total size | 583,830,195 | – | 609,456,819 | – |
| GC content | 41.39% | – | 41.30% | – |
| Total number (>100 bp) | – | 1,467,661 | – | 412,901 |
| Total number (>1 kb) | – | 127,823 | – | 188,316 |
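In the assembly table, Nx is read here as the length of the contig or scaffold at which sequences, sorted from longest to shortest, first cover x% of the total assembly size, and "Number" as how many sequences that takes. This is the usual convention for such reports and is consistent with the table (N90 needs more sequences than N50), but it is assumed rather than stated in the source. A minimal Python sketch of Nx and GC content follows; counting only A/C/G/T bases in the GC denominator is an assumption.

```python
def n_stat(lengths: list[int], x: float = 50) -> tuple[int, int]:
    """Return (Nx length, number of sequences needed to reach x% of total size).

    Sequences are sorted longest first and summed until the running total
    reaches x% of the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("no sequences given")


def gc_content(seqs: list[str]) -> float:
    """GC percentage over A/C/G/T bases only (Ns are ignored; an ASSUMPTION)."""
    gc = acgt = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(b) for b in "ACGT")
    return 100 * gc / acgt


# Toy contig lengths (hypothetical, not from the S. marmoratus assembly).
toy = [100, 80, 60, 40, 20]
print(n_stat(toy, 50))  # (80, 2)
print(n_stat(toy, 90))  # (40, 4)
print(gc_content(["GGCCAATT", "NNGC"]))  # 60.0
```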
Figure 4. The distribution and frequency of microsatellite motifs
(A) Frequency of different microsatellite repeat types (ALL, all identified microsatellites; PAL, potentially amplifiable loci). (B) Frequency of different dinucleotide microsatellite motifs. (C) Frequency of different trinucleotide microsatellite motifs. (D) Frequency of different tetranucleotide microsatellite motifs.
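The microsatellite search tool and its minimum repeat thresholds are not specified in this excerpt. As a hedged illustration of how perfect di- to pentanucleotide repeats could be detected and tallied by repeat type (as in panel A and the percentages in the abstract), the Python sketch below scans a sequence with regular-expression backreferences. The MIN_REPEATS thresholds are hypothetical MISA-style values, and compound or imperfect repeats are ignored.

```python
import re
from collections import Counter

# HYPOTHETICAL minimum copy numbers per motif length (MISA-style values);
# not the thresholds used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5}


def is_primitive(motif: str) -> bool:
    """True if motif is not a repeat of a shorter unit ('AT' yes; 'ATAT', 'AA' no)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return sorted (start, motif, copies) tuples for perfect 2- to 5-bp repeats."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One k-base unit followed by at least (min_rep - 1) identical copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip runs whose unit is itself periodic, e.g. 'AA' in a poly-A tract.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def type_frequencies(hits) -> dict[int, float]:
    """Percentage of repeats by motif length (2 = di, 3 = tri, 4 = tetra, 5 = penta)."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {k: 100 * n / total for k, n in sorted(counts.items())}


seq = "GG" + "AC" * 8 + "TTT" + "GAG" * 6 + "C" + "ATAG" * 5 + "A" * 12
hits = find_ssrs(seq)
print(hits)  # [(2, 'AC', 8), (21, 'GAG', 6), (40, 'ATAG', 5)]
print(type_frequencies(hits))
```

The toy sequence is synthetic. Its trailing poly-A tract is not reported because mononucleotide repeats fall outside the motif lengths scanned here.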