Ran Meng1,2,3, Lin Zhang3, Chengxu Zhou1,2, Kai Liao3, Peng Xiao3, Qijun Luo3, Jilin Xu3, Yanze Cui4, Xiaodi Hu4, Xiaojun Yan3,5.
Abstract
Chrysotila is a genus of coccolithophores. Together with Emiliania, it is one of the representative and extensively studied genera of the Haptophyta. Coccolithophores are photosynthetic unicellular marine algae that share the production of CaCO3 platelets (coccoliths) on the cell surface and are crucial contributors to global biogeochemical cycles. Here, we report the genome assembly of Chrysotila roscoffensis. The assembled genome was ~636 Mb in size, distributed across 769 scaffolds with an N50 of 1.63 Mb and a maximum contig length of ~2.6 Mb. Repetitive elements accounted for approximately 59% of the genome. A total of 23,341 genes were predicted from the C. roscoffensis genome. The divergence time between C. roscoffensis and Emiliania huxleyi was estimated to be around 537.6 Mya. Gene families related to the cytoskeleton, cellular motility and morphology, and ion transport were expanded. The genome of C. roscoffensis will provide a foundation for understanding the genetic and phenotypic diversification and the calcification mechanisms of coccolithophores.
Keywords: Chrysotila roscoffensis; calcification; coccolithophores; phenotypic diversification
Year: 2021 PMID: 35052381 PMCID: PMC8775090 DOI: 10.3390/genes13010040
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Microscopic images of Chrysotila roscoffensis (strain NMBjih026-8). (a) Motile coccolith-bearing cell, showing two flagella (arrow) and a coccolith (arrowhead). (b) Nonmotile coccolith-bearing cells. (c) Non-calcified filamentous colonies. (d) Scanning electron microscope (SEM) image of a coccolith-bearing cell. (e) SEM image of coccoliths. (f) Transmission electron microscope (TEM) image of a coccolith-bearing cell. Chl: chloroplast; G: Golgi apparatus; M: mitochondrion; N: nucleus; P: pyrenoid; and V: vacuole.
Figure 2. 17-mer analysis for estimating the genome size of C. roscoffensis. The 17-mer distribution was calculated using Jellyfish (version 2.1.3) from the sequencing data of the short-insert-size libraries, and the genome size was estimated with the formula genome size = total_kmer_num / kmer_depth, where total_kmer_num is the total number of K-mers and kmer_depth is the peak position on the K-mer frequency distribution. The heterozygous peak reflects the heterozygosity of the genome, and the repeat peak reflects its repeat rate.
Survey statistic results of C. roscoffensis.
| Species | Total Base (Gb) | K-Mer | K-Mer Number | K-Mer Depth | Genome Size (Mb) | Revised Genome Size (Mb) | Heterozygous Ratio (%) | Repeat Ratio (%) |
|---|---|---|---|---|---|---|---|---|
| C. roscoffensis | 34.24 | 17 | 26,900,644,184 | 39 | 689.76 | 674.07 | 0.64 | 69.45 |
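The genome size formula from the Figure 2 caption can be checked against the survey values with a short calculation. This is a minimal sketch: the function name `estimate_genome_size_mb` is hypothetical, and the correction that yields the revised genome size (674.07 Mb) is not described in this record, so it is not reproduced here.

```python
def estimate_genome_size_mb(total_kmer_num: int, kmer_depth: float) -> float:
    """Estimate genome size in Mb as total_kmer_num / kmer_depth.

    total_kmer_num: total number of K-mers counted in the reads.
    kmer_depth: peak position of the K-mer frequency distribution.
    """
    if kmer_depth <= 0:
        raise ValueError("kmer_depth must be positive")
    return total_kmer_num / kmer_depth / 1e6  # convert bp to Mb


# Values taken from the survey statistics table (17-mers).
size_mb = estimate_genome_size_mb(26_900_644_184, 39)
print(f"Estimated genome size: {size_mb:.2f} Mb")
```

With the tabulated K-mer number and depth, the estimate agrees with the unrevised genome size reported in the table (689.76 Mb).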
Sequencing data statistics of C. roscoffensis.
| Library | Insert Size | Total Data (Gb) | Read Length (bp) | Sequence Coverage (X) |
|---|---|---|---|---|
| Illumina reads | 350 bp | 35.33 | 150 | 52.41 |
| PacBio reads | - | 53.12 | - | 78.80 |
| 10X Genomics | - | 93.22 | 150 | 138.29 |
| Total | - | 181.67 | - | 269.51 |
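The coverage column can be reproduced by dividing each library's total data by a genome size. The sketch below assumes the denominator is the revised genome size from the survey table (674.07 Mb); the record does not state this, but the tabulated values are consistent with it to two decimals. The names `coverage` and `libraries_gb` are illustrative only.

```python
# Assumption: coverage is computed against the revised survey genome size.
REVISED_GENOME_SIZE_MB = 674.07

# Total data per library in Gb, from the sequencing data table.
libraries_gb = {
    "Illumina": 35.33,
    "PacBio": 53.12,
    "10X Genomics": 93.22,
}


def coverage(total_data_gb: float, genome_size_mb: float) -> float:
    """Return sequence coverage (X) = sequenced bases / genome size."""
    return total_data_gb * 1000 / genome_size_mb  # Gb -> Mb


for name, gb in libraries_gb.items():
    print(f"{name}: {coverage(gb, REVISED_GENOME_SIZE_MB):.2f}X")

total_gb = sum(libraries_gb.values())
print(f"Total: {coverage(total_gb, REVISED_GENOME_SIZE_MB):.2f}X")
```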
Summary of the final genome assembly of C. roscoffensis.
| Statistic | Contig ** Length (bp) | Scaffold Length (bp) | Contig ** Number | Scaffold Number |
|---|---|---|---|---|
| Total | 629,886,791 | 635,699,922 | 2167 | 769 |
| Max | 2,590,224 | 12,677,996 | - | - |
| Number ≥ 2000 | - | - | 2167 | 769 |
| N50 | 441,430 | 1,631,423 | 434 | 111 |
| N60 | 354,170 | 1,228,002 | 593 | 156 |
| N70 | 281,606 | 954,517 | 791 | 215 |
| N80 | 208,186 | 651,419 | 1053 | 296 |
| N90 | 141,820 | 391,115 | 1414 | 420 |
** Contig after scaffolding.
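For readers unfamiliar with the N50 to N90 rows, the following sketch shows the conventional way such statistics are computed: sequences are sorted from longest to shortest, and Nx is the length of the sequence at which the running total first reaches x% of the assembly. The count of sequences needed to get there is assumed to correspond to the "Number" columns above. The function `nx` and the toy lengths are illustrative and are not taken from the assembly.

```python
from typing import Sequence, Tuple


def nx(lengths: Sequence[int], x: float = 50) -> Tuple[int, int]:
    """Return (Nx length, number of sequences needed to reach x% of total)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)  # longest first
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise AssertionError("unreachable: running total always reaches threshold")


# Toy example with five sequences totalling 300 bp.
toy_lengths = [100, 80, 60, 40, 20]
n50_length, n50_count = nx(toy_lengths, 50)
n90_length, n90_count = nx(toy_lengths, 90)
print(n50_length, n50_count)
print(n90_length, n90_count)
```

In the toy example, two sequences (100 + 80 bp) are enough to cover half of the 300 bp total, so the N50 is 80 bp with a count of 2.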
Summary of repeat contents in C. roscoffensis genome.
| Type | Repeat Size (bp) | % of Genome |
|---|---|---|
| TRF | 74,813,341 | 11.493833 |
| RepeatMasker | 327,015,645 | 50.240549 |
| ProteinMask | 67,002,054 | 10.293758 |
| Total | 381,019,300 | 58.537318 |
Statistics of transposable element (TE) classification in C. roscoffensis genome.
| Type | Denovo + Repbase Length (bp) | Denovo + Repbase % in Genome | TE Proteins Length (bp) | TE Proteins % in Genome | Combined TEs Length (bp) | Combined TEs % in Genome |
|---|---|---|---|---|---|---|
| DNA | 33,824,343 | 5.196551 | 4,008,971 | 0.615912 | 36,809,695 | 5.655201 |
| LINE | 7,142,576 | 1.097339 | 2,411,479 | 0.370484 | 8,374,515 | 1.286606 |
| SINE | 196,696 | 0.030219 | 0 | 0 | 196,696 | 0.030219 |
| LTR | 236,201,808 | 36.288504 | 60,676,043 | 9.321871 | 241,112,694 | 37.042981 |
| Other | 0 | 0 | 0 | 0 | 0 | 0 |
| Satellite | 3,083,747 | 0.473767 | 0 | 0 | 3,083,747 | 0.473767 |
| Simple_repeat | 25,608,316 | 3.934295 | 0 | 0 | 25,608,316 | 3.934295 |
| Unknown | 31,651,266 | 4.862694 | 0 | 0 | 31,651,266 | 4.862694 |
| Total | 327,015,645 | 50.240549 | 67,002,054 | 10.293758 | 331,759,778 | 50.969407 |
Basic statistical results of gene structure prediction of C. roscoffensis genome.
| Gene Set | Method | Number | Average Gene Length (bp) | Average CDS Length (bp) | Average Exons Per Gene | Average Exon Length (bp) | Average Intron Length (bp) |
|---|---|---|---|---|---|---|---|
| De novo | Augustus | 43,490 | 3611.96 | 1504.35 | 4.12 | 365.32 | 675.96 |
| De novo | GlimmerHMM | 313,490 | 1985.29 | 1123.67 | 3.85 | 292.11 | 302.67 |
| De novo | SNAP | 102,913 | 1468.91 | 842.4 | 2.1 | 401.8 | 571.35 |
| De novo | Geneid | 104,522 | 2507.4 | 1130.75 | 2.74 | 412.95 | 791.97 |
| De novo | Genscan | 55,474 | 8837.72 | 2586.56 | 8.02 | 322.45 | 890.29 |
| Homolog | | 21,246 | 1339.63 | 695.35 | 1.75 | 397.92 | 861.93 |
| Homolog | | 5755 | 1577.79 | 782.52 | 2.07 | 377.49 | 741.18 |
| Homolog | | 12,700 | 1608.62 | 938.93 | 1.92 | 489.67 | 729.92 |
| Homolog | | 5117 | 1463.96 | 732.1 | 1.98 | 369.66 | 746.45 |
| Homolog | | 13,333 | 922.29 | 609.61 | 1.47 | 413.34 | 658.52 |
| Homolog | | 13,684 | 1312.01 | 892.26 | 1.47 | 609.02 | 902.56 |
| RNA-seq | Cufflinks | 43,799 | 7548.43 | 2585.25 | 6.42 | 402.61 | 915.52 |
| RNA-seq | PASA | 76,439 | 3568.24 | 1093.39 | 4.32 | 253.27 | 746.1 |
| EVM | | 47,323 | 3839.76 | 1523.32 | 4.34 | 351 | 693.55 |
| PASA-update | | 46,875 | 3848.09 | 1550.63 | 4.33 | 357.92 | 689.43 |
| Final set | | 23,341 | 5013.31 | 1596.61 | 5.75 | 277.68 | 719.32 |
The statistical results of gene function annotation of C. roscoffensis genome.
| Database | Annotated Number | Annotated Percent (%) |
|---|---|---|
| | 16,841 | 72.2 |
| Swiss-Prot | 11,919 | 51.1 |
| KEGG | 11,807 | 50.6 |
| InterPro (All) | 23,179 | 99.3 |
| Pfam | 12,799 | 54.8 |
| GO | 21,194 | 90.8 |
| Annotated | 23,216 | 99.5 |
| Total | 23,341 | - |
Figure 3. The distribution of genes in Aureococcus anophagefferens, Arabidopsis thaliana, Bigelowiella natans, Chondrus crispus, Chlamydomonas eustigma, Chromochloris zofingiensis, Chlamydomonas reinhardtii, Chlorella sorokiniana, Emiliania huxleyi, Galdieria sulphuraria, Micromonas pusilla, Oryza sativa, Chrysotila roscoffensis, Phaeodactylum tricornutum, Porphyra umbilicalis, Saccharina japonica, Symbiodinium microadriaticum, Thalassiosira oceanica, Thalassiosira pseudonana and Chara braunii.
Figure 4. Common and unique gene families in five species. Venn diagram comparing shared and unique protein-coding genes among Chrysotila roscoffensis, Emiliania huxleyi, Thalassiosira pseudonana, Thalassiosira oceanica, and Saccharina japonica, based on orthology analysis.