Kuo He, Liulan Zhao, Zihao Yuan, Adelino Canario, Qiao Liu, Siyi Chen, Jiazhong Guo, Wei Luo, Haoxiao Yan, Dongmei Zhang, Lisen Li, Song Yang.
Abstract
The largemouth bass (Micropterus salmoides) has become a cosmopolitan species due to its widespread introduction as a game or domesticated fish. Here, a high-quality chromosome-level reference genome of M. salmoides was produced by combining Illumina paired-end sequencing, PacBio single-molecule real-time (SMRT) sequencing and high-throughput chromosome conformation capture (Hi-C) technologies. The genome was assembled into 844.88 Mb with a contig N50 of 15.68 Mb and a scaffold N50 of 35.77 Mb. About 99.9% of the assembled sequence (844.00 Mb) could be anchored to 23 chromosomes, and 98.03% of the anchored sequence could be ordered and oriented. The genome contained 38.19% repeat sequences and 2693 noncoding RNAs. A total of 26,370 protein-coding genes from 3415 gene families were predicted, of which 97.69% were functionally annotated. This high-quality genome assembly will be a fundamental resource for studying how M. salmoides adapts to novel and changing environments around the world, and is expected to contribute to genetic breeding and other research.
Year: 2022 PMID: 35933561 PMCID: PMC9357066 DOI: 10.1038/s41597-022-01601-1
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 Overview of the pipeline for the largemouth bass chromosome-level genome assembly. Chrs: chromosomes.
Fig. 2 K-mer distribution of M. salmoides genome sequencing reads. The K-mer distribution (K = 19) was constructed using 350 bp library data. A total of 49,157,214,151 K-mers were used for genome size estimation after the removal of K-mers with abnormal depth. The peak 19-mer depth was about 56, and the genome size, estimated as the K-mer count divided by the peak depth (49,157,214,151/56), was reported as 874.14 Mb.
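The K-mer estimate in the Fig. 2 caption can be reproduced with a few lines of Python. This is only a sketch of the division stated in the caption; the function and variable names are illustrative and not taken from the paper. Dividing by the rounded depth of 56 gives roughly 877.81 Mb rather than the reported 874.14 Mb. One possible explanation, which is an assumption here, is that the unrounded peak depth (about 56.23) or an additional correction was used.

```python
def estimate_genome_size(total_kmers: int, peak_depth: float) -> float:
    """Return the estimated genome size in bp: K-mer count / peak K-mer depth."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


# Values quoted in the Fig. 2 caption.
TOTAL_KMERS = 49_157_214_151
PEAK_DEPTH = 56            # rounded peak 19-mer depth
REPORTED_SIZE_MB = 874.14  # genome size reported in the caption

# Estimate obtained from the rounded depth, in Mb.
size_from_rounded_depth_mb = estimate_genome_size(TOTAL_KMERS, PEAK_DEPTH) / 1e6

# Depth that would reproduce the reported size exactly (hypothetical, back-calculated).
implied_depth = TOTAL_KMERS / (REPORTED_SIZE_MB * 1e6)

print(f"size from depth 56: {size_from_rounded_depth_mb:.2f} Mb")
print(f"depth implied by 874.14 Mb: {implied_depth:.2f}")
```

The gap between the two figures is under 0.5%, so it does not affect the comparison with the 844.88 Mb assembly.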
M. salmoides genome sequencing statistics.
| Library | Sequencing platform | Clean data (Gb) | Depth (×) | Contig N50 (Mb) | GC content (%) | Q20 (%) | Q30 (%) | Genome size (Mb) |
|---|---|---|---|---|---|---|---|---|
| Short reads | Illumina NovaSeq 6000 | 58.51 | 66.94 | — | 40.88 | 96.63 | 91.36 | 874.14 |
| Long reads | PacBio Sequel | 94.69 | 112.07 | 15.68 | 40.78 | — | — | 844.88 |
| Hi-C | Illumina NovaSeq 6000 | 77.53 | 94.06 | 15.30 | 40.78 | 97.59 | 93.49 | 844.00 |
Sequence distribution across chromosomes after Hi-C anchoring.
| Group | Cluster Num | Cluster Len (bp) | Order Num | Order Len (bp) |
|---|---|---|---|---|
| Chr01 | 5 | 40,821,207 | 4 | 40,732,462 |
| Chr02 | 15 | 42,659,052 | 9 | 42,039,393 |
| Chr03 | 7 | 37,588,897 | 6 | 37,343,944 |
| Chr04 | 13 | 40,393,715 | 9 | 39,732,765 |
| Chr05 | 15 | 39,747,164 | 6 | 38,411,921 |
| Chr06 | 10 | 36,025,099 | 6 | 35,600,334 |
| Chr07 | 9 | 34,881,373 | 6 | 34,516,066 |
| Chr08 | 2 | 37,271,896 | 2 | 37,271,896 |
| Chr09 | 5 | 37,188,422 | 4 | 37,114,295 |
| Chr10 | 4 | 36,011,566 | 3 | 35,768,921 |
| Chr11 | 11 | 33,902,165 | 5 | 33,113,071 |
| Chr12 | 15 | 35,527,541 | 8 | 34,268,756 |
| Chr13 | 5 | 33,494,735 | 4 | 33,265,410 |
| Chr14 | 11 | 34,134,741 | 8 | 33,564,293 |
| Chr15 | 24 | 37,902,394 | 11 | 35,937,762 |
| Chr16 | 9 | 32,104,916 | 6 | 31,675,598 |
| Chr17 | 7 | 32,964,910 | 4 | 32,674,911 |
| Chr18 | 26 | 34,562,858 | 13 | 33,055,325 |
| Chr19 | 31 | 41,218,652 | 16 | 38,871,204 |
| Chr20 | 6 | 32,259,510 | 5 | 32,214,040 |
| Chr21 | 3 | 28,886,792 | 3 | 28,886,792 |
| Chr22 | 18 | 56,175,891 | 7 | 54,208,627 |
| Chr23 | 15 | 28,271,698 | 8 | 27,127,050 |
| Total (Ratio %) | 266 (97.08) | 843,995,194 (99.9) | 153 (57.52) | 827,394,836 (98.03) |
Note: Chr01-23 represent the 23 chromosomes; Cluster Num: the number of sequences anchored to a chromosome; Cluster Len: the total length of sequences anchored to a chromosome; Order Num: the number of sequences whose order and orientation could be determined; Order Len: the total length of sequences whose order and orientation could be determined.
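The totals and ratios in the Hi-C table can be checked directly from the per-chromosome rows. The sketch below sums the Cluster Len and Order Len columns, recomputes the ordered fraction, and computes an N50 over the ordered lengths. Treating each chromosome's ordered length as a single scaffold is an assumption made for illustration; the source does not describe how the scaffold N50 was calculated, although the result (35,768,921 bp, about 35.77 Mb) agrees with the value in the abstract.

```python
# (cluster_len_bp, order_len_bp) for Chr01..Chr23, copied from the table.
CHROMS = [
    (40_821_207, 40_732_462), (42_659_052, 42_039_393), (37_588_897, 37_343_944),
    (40_393_715, 39_732_765), (39_747_164, 38_411_921), (36_025_099, 35_600_334),
    (34_881_373, 34_516_066), (37_271_896, 37_271_896), (37_188_422, 37_114_295),
    (36_011_566, 35_768_921), (33_902_165, 33_113_071), (35_527_541, 34_268_756),
    (33_494_735, 33_265_410), (34_134_741, 33_564_293), (37_902_394, 35_937_762),
    (32_104_916, 31_675_598), (32_964_910, 32_674_911), (34_562_858, 33_055_325),
    (41_218_652, 38_871_204), (32_259_510, 32_214_040), (28_886_792, 28_886_792),
    (56_175_891, 54_208_627), (28_271_698, 27_127_050),
]


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


cluster_total = sum(cluster for cluster, _ in CHROMS)
order_total = sum(order for _, order in CHROMS)

# Fraction of anchored sequence whose order and orientation were determined.
ordered_ratio = 100 * order_total / cluster_total

# N50 over ordered chromosome lengths (assumption: one scaffold per chromosome).
order_n50 = n50([order for _, order in CHROMS])

print(cluster_total, order_total)
print(f"ordered ratio: {ordered_ratio:.2f}%")
print(f"N50 of ordered lengths: {order_n50 / 1e6:.2f} Mb")
```

The recomputed 98.03% is relative to the anchored length (843,995,194 bp), not to the full 844.88 Mb assembly.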
Fig. 3 Heat map of chromosome interactions in the Hi-C assembly. Chr01 to Chr23 denote the 23 chromosomes. The abscissa and ordinate represent the order of each bin on the corresponding chromosome group. The colour scale indicates interaction intensity from yellow (low) to red (high).
Repeat sequence statistics of the assembled genome.
| Type | Number | Length (bp) | Rate (%) |
|---|---|---|---|
| Class I: Retroelement | 522983 | 121796357 | 14.42 |
| DIRS | 20880 | 6630621 | 0.78 |
| LINE | 234381 | 60698831 | 7.18 |
| LTR/Caulimovirus | 88 | 7962 | 0.00 |
| LTR/Copia | 12013 | 2607825 | 0.31 |
| LTR/ERV | 47996 | 6147308 | 0.73 |
| LTR/Gypsy | 103367 | 26357115 | 3.12 |
| LTR/Ngaro | 16775 | 2880533 | 0.34 |
| LTR/Pao | 13115 | 2271034 | 0.27 |
| LTR/Unknown | 37371 | 10223745 | 1.21 |
| LTR/Viper | 61 | 3732 | 0.00 |
| SINE | 36936 | 3967651 | 0.47 |
| Class II: DNA transposon | 1033511 | 198005683 | 23.44 |
| Academ | 1422 | 193616 | 0.02 |
| CACTA | 77241 | 10508494 | 1.24 |
| Crypton | 16505 | 2182911 | 0.26 |
| Dada | 7317 | 1072578 | 0.13 |
| Ginger | 4624 | 459506 | 0.05 |
| Helitron | 24163 | 10814282 | 1.28 |
| IS3EU | 3766 | 495816 | 0.06 |
| Kolobok | 31541 | 7047090 | 0.83 |
| MITE | 33 | 1774 | 0.00 |
| Maverick | 6547 | 1489673 | 0.18 |
| Merlin | 3162 | 472124 | 0.06 |
| Mutator | 6833 | 746138 | 0.09 |
| Novosib | 12838 | 1115265 | 0.13 |
| P | 15516 | 3779929 | 0.45 |
| PIF-Harbinger | 72041 | 15425212 | 1.83 |
| PiggyBac | 14753 | 2257482 | 0.27 |
| Sola | 7027 | 701813 | 0.08 |
| Stowaway | 1 | 57 | 0.00 |
| Tc1-Mariner | 115093 | 29999050 | 3.55 |
| Unknown | 128215 | 21996228 | 2.60 |
| Zator | 1443 | 282595 | 0.03 |
| Zisupton | 26754 | 4017294 | 0.48 |
| hAT | 456676 | 82946756 | 9.82 |
| Satellite | 4604 | 769122 | 0.09 |
| Unknown | 11211 | 2133201 | 0.25 |
| Total | 1572309 | 322704363 | 38.19 |
Note: Type: the type of repetitive sequence (Class I: retrotransposons; Class II: DNA transposons); Number: the number of repetitive sequences; Length: the total length of predicted repetitive sequences; Rate (%): the proportion of the assembled genome occupied by the repetitive sequences.
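As a consistency check on the repeat table, the four top-level categories (Class I, Class II, Satellite and the final Unknown row) can be summed and compared with the Total row. The sketch below also recomputes the Rate column, assuming the denominator is the rounded assembly size of 844.88 Mb from the abstract. The exact base-pair count is not given in this record, so recomputed rates may differ from the table in the last decimal place.

```python
# Assumption: denominator is the rounded assembly size quoted in the abstract.
GENOME_SIZE_BP = 844_880_000

# Top-level repeat categories: name -> (number, length in bp), copied from the table.
TOP_LEVEL = {
    "Class I: Retroelement": (522_983, 121_796_357),
    "Class II: DNA transposon": (1_033_511, 198_005_683),
    "Satellite": (4_604, 769_122),
    "Unknown": (11_211, 2_133_201),
}
REPORTED_TOTAL_NUMBER = 1_572_309
REPORTED_TOTAL_LENGTH_BP = 322_704_363


def rate_percent(length_bp: int) -> float:
    """Share of the assumed genome size covered by length_bp, in percent."""
    return 100 * length_bp / GENOME_SIZE_BP


number_sum = sum(number for number, _ in TOP_LEVEL.values())
length_sum = sum(length for _, length in TOP_LEVEL.values())

for name, (_, length) in TOP_LEVEL.items():
    print(f"{name}: {rate_percent(length):.2f}%")
print(f"total repeats: {rate_percent(length_sum):.2f}%")
```

Both sums match the Total row exactly, which suggests the Total counts each repeat once at the top level rather than adding the subfamily rows again.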
Basic gene annotation statistics of the assembled genome.
| Item | Count |
|---|---|
| Gene Number | 26,370 |
| Gene Length (bp) | 381,932,021 |
| Average Gene Length (bp) | 14,483.58 |
| Exon Length (bp) | 68,599,926 |
| Average Exon Length per Gene (bp) | 2,601.44 |
| Exon Number | 260,466 |
| Average Exon Number per Gene | 9.88 |
| CDS Length (bp) | 45,485,238 |
| Average CDS Length per Gene (bp) | 1,724.89 |
| CDS Number | 253,748 |
| Average CDS Number per Gene | 9.62 |
| Intron Length (bp) | 313,332,095 |
| Average Intron Length per Gene (bp) | 11,882.14 |
| Intron Number | 234,096 |
| Average Intron Number per Gene | 8.88 |
Study metadata.
| Item | Value |
|---|---|
| Measurement(s) | genome assembly |
| Technology Type(s) | Illumina paired-end sequencing, PacBio single-molecule real-time (SMRT) sequencing and high-throughput chromosome conformation capture (Hi-C) technologies |
| Sample Characteristic - Organism | Micropterus floridanus |
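The average rows in the gene annotation statistics table above can be reproduced from the totals. The sketch below shows that each average is the corresponding total divided by the gene number (26,370), so the averages are per gene rather than per exon or per intron. It also checks two identities that hold in the reported figures: intron length equals gene length minus exon length, and intron number equals exon number minus gene number. The variable names are illustrative only.

```python
GENE_NUMBER = 26_370

# Total lengths in bp, copied from the table.
TOTAL_LENGTH_BP = {
    "gene": 381_932_021,
    "exon": 68_599_926,
    "cds": 45_485_238,
    "intron": 313_332_095,
}

# Feature counts, copied from the table.
FEATURE_COUNT = {
    "exon": 260_466,
    "cds": 253_748,
    "intron": 234_096,
}

# Per-gene averages: total divided by the number of genes, rounded to 2 decimals.
avg_length_per_gene = {
    name: round(total / GENE_NUMBER, 2) for name, total in TOTAL_LENGTH_BP.items()
}
avg_count_per_gene = {
    name: round(count / GENE_NUMBER, 2) for name, count in FEATURE_COUNT.items()
}

# Identities observed in the reported totals.
intron_length_check = TOTAL_LENGTH_BP["gene"] - TOTAL_LENGTH_BP["exon"]
intron_number_check = FEATURE_COUNT["exon"] - GENE_NUMBER

print(avg_length_per_gene)
print(avg_count_per_gene)
print(intron_length_check, intron_number_check)
```

Dividing the exon total by the exon count instead would give a mean length per individual exon of roughly 263 bp, which is a different quantity from the 2,601.44 bp listed in the table.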