Boping Tang1, Daizhen Zhang1, Haorong Li2, Senhao Jiang1, Huabin Zhang1, Fujun Xuan1, Baoming Ge1, Zhengfei Wang1, Yu Liu1, Zhongli Sha3, Yongxu Cheng4, Wei Jiang3, Hui Jiang5,6, Zhongkai Wang2, Kun Wang2, Chaofeng Li1, Yue Sun1, Shusheng She7, Qiang Qiu2, Wen Wang2, Xinzheng Li3, Yongxin Li2, Qiuning Liu1, Yandong Ren2.
Abstract
BACKGROUND: The swimming crab, Portunus trituberculatus, is an important commercial species in China and is widely distributed in the coastal waters of Asia-Pacific countries. Despite increasing interest in swimming crab research, a high-quality chromosome-level genome is still lacking.
Keywords: Portunus trituberculatus; chromosome; crab; evolution; genome assembly
Year: 2020 PMID: 31904811 PMCID: PMC6944217 DOI: 10.1093/gigascience/giz161
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Swimming crab, Portunus trituberculatus. An adult male swimming crab collected from Bohai Bay, Hebei Province.
Assembly of swimming crab genome
| Term | Contig phase | | Hi-C phase | |
|---|---|---|---|---|
| | Size (bp) | Number | Size (bp) | Number |
| N90 | 439,683 | 334 | 11,273,125 | 41 |
| N80 | 1,225,551 | 203 | 14,151,211 | 33 |
| N70 | 2,035,154 | 141 | 16,942,622 | 27 |
| N60 | 2,950,146 | 100 | 19,786,189 | 21 |
| N50 | 4,121,416 | 71 | 21,793,880 | 17 |
| Maximum length | 17,984,318 | - | 42,710,960 | - |
| Total length | 1,004,084,521 | - | 1,005,046,021 | - |
| No. ≥100 bp | - | 2,446 | - | 523 |
| No. ≥10 kb | - | 1,756 | - | 314 |
Note: Contig phase represents results assembled by WTDBG software, and Hi-C phase represents scaffold statistics of genome after chromosome assembly.
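The N50 to N90 rows above follow the usual Nx convention: sequences are sorted from longest to shortest, and Nx is the length of the sequence at which the running total first reaches x% of the assembly. The following is a minimal sketch of that calculation, assuming a plain list of sequence lengths. The function name `nx_stat` and the toy lengths are illustrative and do not come from the paper, and reading the "Number" column as the count of sequences needed to reach the threshold is an assumption.

```python
def nx_stat(lengths, x):
    """Return (Nx length, number of sequences needed) for a list of lengths.

    x is a percentage in (0, 100], e.g. 50 for N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")

    ordered = sorted(lengths, reverse=True)  # longest sequence first
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count


# Toy example (hypothetical lengths, not the crab assembly).
toy_lengths = [10, 8, 6, 4, 2]
n50_length, n50_count = nx_stat(toy_lengths, 50)
n90_length, n90_count = nx_stat(toy_lengths, 90)
```

On the toy input the total is 30, so the 50% threshold is reached by the second sequence and the 90% threshold by the fourth.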
Figure 2: Genome characteristics of swimming crab. From outer circle to inner circle: gene distribution, tandem repeats (TRP), long terminal repeats (LTR), long interspersed nuclear elements (LINE) and short interspersed nuclear elements (SINE), the DNA elements, and the GC content of the genome.
Quality evaluation of assembled swimming crab genome by BUSCO
| Library | Eukaryota | Metazoa |
|---|---|---|
| Complete BUSCO (C) | 287 | 909 |
| Complete and single-copy BUSCO (S) | 283 | 903 |
| Complete and duplicated BUSCO (D) | 4 | 6 |
| Fragmented BUSCO (F) | 2 | 19 |
| Missing BUSCO (M) | 14 | 50 |
| Total BUSCO groups searched | 303 | 978 |
| Percentage of complete BUSCO (%) | 94.7 | 92.9 |
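The percentages in the last row can be reproduced from the counts in the table. The sketch below is a small consistency check, assuming the usual BUSCO relations (complete = single-copy + duplicated, and complete + fragmented + missing = total). The dictionary layout and the function name `busco_summary` are illustrative only.

```python
# Counts copied from the BUSCO table above.
busco_counts = {
    "Eukaryota": {"S": 283, "D": 4, "F": 2, "M": 14, "total": 303},
    "Metazoa": {"S": 903, "D": 6, "F": 19, "M": 50, "total": 978},
}


def busco_summary(counts):
    """Return (complete count, percent complete rounded to 1 decimal)."""
    complete = counts["S"] + counts["D"]
    # Every searched group must be complete, fragmented, or missing.
    if complete + counts["F"] + counts["M"] != counts["total"]:
        raise ValueError("BUSCO categories do not sum to the total")
    return complete, round(100 * complete / counts["total"], 1)


summaries = {name: busco_summary(c) for name, c in busco_counts.items()}
```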
Statistics on transposable elements in swimming crab genome
| Type | Repbase TEs | | TE proteins | | | | Combined TEs | |
|---|---|---|---|---|---|---|---|---|
| | Length (bp) | % in genome | Length (bp) | % in genome | Length (bp) | % in genome | Length (bp) | % in genome |
| DNA | 131,799,733 | 13.11 | 2,434,533 | 0.24 | 19,288,080 | 1.92 | 149,711,951 | 14.90 |
| LINE | 16,171,649 | 1.61 | 75,759,827 | 7.54 | 131,530,457 | 13.09 | 153,027,744 | 15.23 |
| SINE | 142,878 | 0.01 | 0 | 0 | 0 | 0 | 142,878 | 0.014 |
| LTR | 26,546,055 | 2.64 | 10,195,324 | 1.01 | 18,421,957 | 1.83 | 45,189,365 | 4.50 |
| Other | 89,969,319 | 8.95 | 0 | 0 | 211,157,523 | 21.01 | 230,116,216 | 22.90 |
| Unknown | 34,752 | 0.0035 | 0 | 0 | 90,989,908 | 9.05 | 91,007,921 | 9.06 |
| Total | 213,558,503 | 21.25 | 88,375,336 | 8.79 | 464,908,824 | 46.26 | 525,492,271 | 52.29 |
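The "% in genome" columns appear to use the Hi-C phase total length (1,005,046,021 bp) as the denominator; this is an inference from the numbers, not something the table states. The sketch below recomputes two of the percentages under that assumption and shows that, for DNA elements, the three individual annotation sets add up to more than the combined figure, which would be consistent with the sets overlapping. The names `GENOME_LENGTH_BP` and `percent_of_genome` are illustrative.

```python
# Assumed denominator: Hi-C phase total length from the assembly table.
GENOME_LENGTH_BP = 1_005_046_021


def percent_of_genome(length_bp):
    """Percentage of the assembly covered by length_bp, to 2 decimals."""
    return round(100 * length_bp / GENOME_LENGTH_BP, 2)


# DNA elements row: three individual annotation sets and the combined value.
dna_individual_bp = [131_799_733, 2_434_533, 19_288_080]
dna_combined_bp = 149_711_951

dna_repbase_pct = percent_of_genome(dna_individual_bp[0])
total_combined_pct = percent_of_genome(525_492_271)

# Bases counted by more than one set, if the combined set is their union.
dna_excess_bp = sum(dna_individual_bp) - dna_combined_bp
```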
Figure 3: Annotation quality comparison of protein-coding genes. We compared the messenger RNA (mRNA) length, CDS length, exon length, and intron length among 5 species: P. trituberculatus, A. aegypti, S. mimosarum, D. melanogaster, and P. vannamei.
Functional annotation of predicted protein-coding genes
| Term | Gene number | Percentage (%) |
|---|---|---|
| GO | 8,712 | 51.87 |
| InterPro | 11,691 | 69.61 |
| KEGG | 10,880 | 64.78 |
| SwissProt | 12,558 | 74.77 |
| TrEMBL | 12,256 | 72.97 |
| Annotated | 16,053 | 95.58 |
| Unannotated | 743 | 4.42 |
| Total | 16,796 | 100 |
Figure 4: Gene family analysis of swimming crab. A. Orthologous genes among species. Multiple-copy orthologs have multiple copies in 1 species, single-copy orthologs have only 1 copy in 1 species, other orthologs are the remaining orthologs, unclustered genes have no homology with others, and unique paralogs are genes that exist in only 1 specific species. B. Unique and common gene families among these species, including B. anynana, A. aegypti, D. melanogaster, P. vannamei, E. j. sinensis, S. mimosarum, and P. trituberculatus.
Figure 5: Phylogenetic relationships, divergence time, and evolutionary rate analysis. A. Phylogenetic relationships and divergence times of species. The red dot represents the fossil record used here, and numbers in parentheses indicate the 95% confidence interval. B. Relative evolutionary rate of species.