Dao Minh Hai1,2, Duong Thuy Yen2, Pham Thanh Liem2, Bui Minh Tam2, Do Thi Thanh Huong2, Bui Thi Bich Hang2, Dang Quang Hieu2, Mutien-Marie Garigliany3, Wouter Coppieters4, Patrick Kestemont5, Nguyen Thanh Phuong2, Frédéric Farnir1.
Abstract
HiFi sequencing yields long reads with accuracies greater than 99.9%, which can improve results for complex applications such as genome assembly. Our study presents a high-quality, chromosome-scale genome assembly of striped catfish (Pangasianodon hypophthalmus), a commercially important species cultured mainly in Vietnam, that integrates HiFi reads and Hi-C data. A 788.4 Mb genome containing 381 scaffolds with an N50 length of 21.8 Mb was obtained from HiFi reads. Based on Hi-C data, these scaffolds were further ordered and clustered into 30 chromosome groups ranging from 1.4 to 57.6 Mb. The updated assembly has a contig N50 of 14.7 Mb, a 245-fold and 4.2-fold improvement over the previous Illumina-based and Illumina-Nanopore-Hi-C-based versions, respectively. In addition, the proportions of repeat elements and complete BUSCO genes identified in our genome are higher than in the two previously released striped catfish genomes. These results highlight the power of HiFi reads for assembling highly repetitive regions and improving the quality of genome assemblies. The updated, high-quality genome assembled in this work will provide a valuable genomic resource for future population genetics, conservation biology and selective breeding studies of striped catfish.
Keywords: HiFi reads; chromosome-scale genome assembly; selective breeding; striped catfish
Year: 2022 PMID: 35627308 PMCID: PMC9141817 DOI: 10.3390/genes13050923
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
Figure 1. Detailed workflow for de novo whole-genome assembly and annotation.
Summary of sequencing data for striped catfish genome assembly.
| Libraries | Insert Size (bp) | Total Data (Gb) | Read Length (bp) | Sequence Coverage (×) * |
|---|---|---|---|---|
| Illumina reads | 350 | 51.2 | 150 | 71.8 |
| Illumina reads | 550 | 49.5 | 150 | 69.3 |
| PacBio (HiFi) reads | 16,400 | 17.28 | 14,791 | 24.2 |
| Total | | 118.0 | | 165.3 |
* The coverage was calculated according to estimated genome size of 713,911,345 bp.
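The coverage column follows from the footnote: total sequenced bases divided by the estimated genome size. A minimal sketch of that arithmetic is given here, assuming 1 Gb means 10^9 bases; the names `GENOME_SIZE_BP`, `LIBRARIES_GB` and `coverage` are illustrative and do not come from the paper.

```python
# Estimated genome size taken from the table footnote.
GENOME_SIZE_BP = 713_911_345

# Total data per library in Gb, as listed in the table.
LIBRARIES_GB = {
    "Illumina, 350 bp insert": 51.2,
    "Illumina, 550 bp insert": 49.5,
    "PacBio HiFi": 17.28,
}


def coverage(total_gb, genome_size_bp=GENOME_SIZE_BP):
    """Return sequence coverage (x) as sequenced bases / genome size.

    Assumes 1 Gb = 1e9 bases (an assumption, not stated in the source).
    """
    return total_gb * 1e9 / genome_size_bp


for name, gb in LIBRARIES_GB.items():
    print(f"{name}: {coverage(gb):.1f}x")
```

The results agree with the table to within the rounding of the reported data volumes.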
Summary of the final striped catfish genome assembly.
| Category | Contig length (bp) | Contig number | Scaffold length (bp) | Scaffold number |
|---|---|---|---|---|
| Total | 788,121,403 | 845 | 788,355,903 | 381 |
| Largest | 30,145,618 | NA | 35,439,358 | NA |
| N50 | 14,675,983 | 20 | 21,837,136 | 15 |
| N60 | 11,176,154 | 26 | 19,579,817 | 19 |
| N70 | 6,525,798 | 35 | 17,728,427 | 24 |
| N80 | 2,330,994 | 55 | 14,533,765 | 28 |
| N90 | 459,384 | 139 | 1,263,789 | 54 |
NA: not applicable.
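For readers unfamiliar with the Nx statistics in this table: Nx is the length of the shortest sequence in the smallest set of longest sequences that together cover at least x% of the assembly, and the accompanying number appears to be the count of sequences in that set. The sketch below illustrates the standard definition on toy lengths, since the individual contig lengths are not given here; the function name `nx` is hypothetical and this is not the authors' pipeline.

```python
def nx(lengths, percent=50):
    """Return (Nx length, number of sequences needed to reach x%)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point thresholds.
        if running * 100 >= total * percent:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (illustrative only, not from the assembly).
toy = [80, 70, 50, 40, 30, 20, 10]
print(nx(toy, 50))  # N50 length and count
print(nx(toy, 90))  # N90 length and count
```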
Statistics of chromosomal level assembly of striped catfish.
| Chr ID | Length (bp) | Chr ID | Length (bp) | Chr ID | Length (bp) |
|---|---|---|---|---|---|
| Chr1 | 57,614,776 | Chr11 | 30,308,490 | Chr21 | 20,241,537 |
| Chr2 | 57,569,486 | Chr12 | 30,160,703 | Chr22 | 20,107,636 |
| Chr3 | 47,895,633 | Chr13 | 27,700,161 | Chr23 | 19,020,310 |
| Chr4 | 40,963,295 | Chr14 | 26,554,853 | Chr24 | 18,006,869 |
| Chr5 | 37,055,990 | Chr15 | 26,362,405 | Chr25 | 16,472,023 |
| Chr6 | 33,881,059 | Chr16 | 25,536,520 | Chr26 | 12,481,112 |
| Chr7 | 33,777,686 | Chr17 | 25,407,408 | Chr27 | 6,262,317 |
| Chr8 | 33,526,538 | Chr18 | 22,657,484 | Chr28 | 2,381,093 |
| Chr9 | 32,744,715 | Chr19 | 22,587,380 | Chr29 | 1,580,235 |
| Chr10 | 32,245,061 | Chr20 | 20,661,286 | Chr30 | 1,373,485 |
| Total chromosome-level length | 783,137,546 | | | | |
| Total length | 788,355,903 | | | | |
| Chromosome length/total length | 99.3% | | | | |
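The summary rows can be cross-checked from the per-chromosome lengths. The sketch below sums the 30 values listed in the table and divides by the total scaffold length; the variable names are illustrative.

```python
# Chromosome lengths (bp) for Chr1..Chr30, copied from the table.
CHR_LENGTHS = [
    57_614_776, 57_569_486, 47_895_633, 40_963_295, 37_055_990,
    33_881_059, 33_777_686, 33_526_538, 32_744_715, 32_245_061,
    30_308_490, 30_160_703, 27_700_161, 26_554_853, 26_362_405,
    25_536_520, 25_407_408, 22_657_484, 22_587_380, 20_661_286,
    20_241_537, 20_107_636, 19_020_310, 18_006_869, 16_472_023,
    12_481_112, 6_262_317, 2_381_093, 1_580_235, 1_373_485,
]
TOTAL_ASSEMBLY_BP = 788_355_903  # total scaffold length from the table

anchored_bp = sum(CHR_LENGTHS)
anchored_pct = 100 * anchored_bp / TOTAL_ASSEMBLY_BP

print(f"Anchored length: {anchored_bp:,} bp")
print(f"Anchored fraction: {anchored_pct:.1f}%")
print(f"Range: {min(CHR_LENGTHS) / 1e6:.1f} to {max(CHR_LENGTHS) / 1e6:.1f} Mb")
```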
Genome assessment based on BUSCO annotations.
| Index | Number |
|---|---|
| Complete BUSCOs (C) | 3497 |
| Complete and single-copy BUSCOs (S) | 3405 |
| Complete and duplicated BUSCOs (D) | 92 |
| Fragmented BUSCOs (F) | 35 |
| Missing BUSCOs (M) | 108 |
| Total BUSCO groups searched (n) | 3640 |
| C: 96.0% [S: 93.5%, D: 2.5%], F: 1.0%, M: 3.0%, n: 3640 | |
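The percentages in the summary line follow from the counts above divided by the number of BUSCO groups searched. A short sketch of that arithmetic, using hypothetical names, is shown here. Computing the complete fraction directly gives about 96.07%, which agrees with the reported 96.0% up to rounding (the reported figure equals the sum of the rounded S and D values).

```python
# BUSCO counts from the table: single-copy, duplicated, fragmented, missing.
BUSCO_COUNTS = {"S": 3405, "D": 92, "F": 35, "M": 108}
TOTAL_GROUPS = 3640


def percent(count, total=TOTAL_GROUPS):
    """Return count as a percentage of the BUSCO groups searched."""
    return 100 * count / total


# Complete BUSCOs are the single-copy plus duplicated ones.
complete = BUSCO_COUNTS["S"] + BUSCO_COUNTS["D"]

print(f"C: {percent(complete):.2f}%")
for key, count in BUSCO_COUNTS.items():
    print(f"{key}: {percent(count):.2f}%")
```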
Comparison of the genome assemblies of various Siluriformes species.
| Species | Genome Size (Mb) | Number of Contigs | Contig N50 (Mb) |
|---|---|---|---|
| Pangasianodon hypophthalmus (this study) | 788.4 | 845 | 14.7 |
| | 883.3 | 44,869 | 0.05 |
| | 821.7 | 78,047 | 0.02 |
| | 1002.3 | 5816 | 2.7 |
| | 1030 | 169,048 | 0.007 |
| | 718.1 | 27,068 | 0.08 |
| | 570.8 | 928 | 1.9 |
| | 713.8 | 2402 | 1.0 |
Comparison of quality metrics of this study and the previous striped catfish genome assemblies.
| Genomic Feature | This Study | Gao et al. | Kim et al. |
|---|---|---|---|
| The size of genome (Mb) | 788.4 | 742.6 | 715.7 |
| Number of contigs | 845 | 821 | 23,340 |
| Contig N50 (Mb) | 14.7 | 3.5 | 0.06 |
| Longest contig (Mb) | 30.1 | 16.1 | 0.5 |
| GC content (%) | 38.9 | 38.9 | 38.7 |
| Repetitive regions (%) | 39.1 | 36.9 | 33.8 |
| Complete BUSCOs (C) (%) | 96.0 | 93.3 | 92.3 |
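The 245-fold and 4.2-fold improvements quoted in the abstract can be recovered from the contig N50 row of this table. The sketch below shows the ratio; `fold_improvement` is an illustrative helper, not something defined in the paper.

```python
# Contig N50 values in Mb, taken from the comparison table.
CONTIG_N50_MB = {
    "This study": 14.7,
    "Gao et al.": 3.5,
    "Kim et al.": 0.06,
}


def fold_improvement(new, old):
    """Return how many times larger the new value is than the old one."""
    return new / old


this_study = CONTIG_N50_MB["This study"]
for label in ("Gao et al.", "Kim et al."):
    fold = fold_improvement(this_study, CONTIG_N50_MB[label])
    print(f"vs {label}: {fold:.1f}-fold")
```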