Tangcheng Li1,2, Liying Yu1, Bo Song3, Yue Song4, Ling Li1, Xin Lin1, Senjie Lin1,2,5.
Abstract
Cataloging an accurate functional gene set for Symbiodiniaceae species is crucial for addressing biological questions about dinoflagellate symbiosis with corals and other invertebrates. To improve the gene models of Fugacium kawagutii, we conducted high-throughput chromosome conformation capture (Hi-C) for the genome and Illumina combined with PacBio sequencing for the transcriptome to achieve a new genome assembly and gene prediction. A 0.937-Gbp assembly of F. kawagutii was obtained, with an N50 > 13 Mbp and a longest scaffold of 121 Mbp capped with a telomere motif at both ends. Gene annotation produced 45,192 protein-coding genes, of which 11,984 are new compared to previous versions of the genome. The newly identified genes are mainly enriched in 38 KEGG pathways, including N-glycan biosynthesis, mRNA surveillance pathway, cell cycle, autophagy, mitophagy, and fatty acid synthesis, which are important for symbiosis, nutrition, and reproduction. The newly identified genes also include those encoding O-methyltransferase (O-MT), 3-dehydroquinate synthase, homologous-pairing protein 2-like (HOP2), and meiosis protein 2 (MEI2), which function in mycosporine-like amino acid (MAA) biosynthesis and sexual reproduction, respectively. The improved version of the gene set (Fugka_Geneset_V3) raised the transcriptomic read mapping rate from 33% to 54% and the BUSCO match from 29% to 55%. Further differential gene expression analysis yielded a set of genes stably expressed under variable trace metal conditions, of which 115 with annotated functions have recently been found to be stably expressed under three other conditions, thus further developing the "core gene set" of F. kawagutii. This improved genome will prove useful for future Symbiodiniaceae transcriptomic, gene structure, and gene expression studies, and the refined "core gene set" will be a valuable resource from which to develop reference genes for gene expression studies.
Keywords: Fugacium kawagutii; Hi-C; RNA-seq; Symbiodiniaceae; core genes; gene set; genome
Year: 2020 PMID: 31940756 PMCID: PMC7023079 DOI: 10.3390/microorganisms8010102
Source DB: PubMed Journal: Microorganisms ISSN: 2076-2607
Sequenced genomes of Symbiodiniaceae.
| Species | Assembled Genome Size (Mbp) | % Genome Assembled | Gene No. | Scaffold N50 (kbp) | Gene Average Length (+ Intron) | Genes Supported by EST * (%) | Reference |
|---|---|---|---|---|---|---|---|
| | 616 | 41.07 | 41,925 | 126.2 | 11,959 | 77.20 | [ |
| | 808 | 73.45/57.71 | 49,109 | 573.5 | 12,898 | 76.30 | [ |
| | 1030 | 85.55 | 35,913 | 98.9 | 6967 | 67.02 | [ |
| | 767 | NA | 69,018 | 133.4 | 8834 | 67.50 | [ |
| | 705 | NA | 65,850 | 248.9 | 8192 | 62.50 | [ |
| | 935 | 79.24 | 36,850 | 381 | 3788 | 72.82 | [ |
| | 1050 | 88.98 | 26,609 | 268.8 | 6507 | 64.40 | [ |
| | 937 | 79.41 | 45,192 | 13,533.5 | 7242 | 90.09 | This study |
* EST: Expressed Sequence Tag; NA: Not available.
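The scaffold N50 column above can be read with the usual definition in mind: the length of the scaffold at which the cumulative length, counted from the longest scaffold down, first reaches half of the total assembly. The following is a minimal sketch of that definition, not the authors' pipeline; the function name `n50` and the toy lengths are hypothetical and do not come from the F. kawagutii assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (any consistent unit)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First scaffold at which the running total covers half the assembly.
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths given")


# Hypothetical toy assembly (lengths in Mbp), for illustration only.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, a few very long scaffolds (such as those produced by Hi-C scaffolding) can raise it sharply without changing the total assembly size.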
RNA-Seq sample information of F. kawagutii.
| Project | Methodology | Growth Condition | Clean Read Data | Mapping Rate (Genome, Geneset): V1 * | Mapping Rate (Genome, Geneset): V2 ** | Mapping Rate (Genome, Geneset): V3 *** |
|---|---|---|---|---|---|---|
| | BGI RNA-seq | Normal | 329 M | 75%, 33% | 69%, 33% | 69%, 54% |
| | Illumina NGS | Mix of 1# | 10 Gbp | 88%, 25% | 87%, 29% | 87%, 50% |
| | PacBio Sequel | Mix of 1# | 15 Gbp | 99%, NA | 99%, NA | 99%, NA |
V1 *: Fugka_Geneset_V1 (first version of the genome, [9]); V2 **: Fugka_Geneset_V2 (second revision of the genome, [10]); V3 ***: Fugka_Geneset_V3 (third revision of the genome, This study); NA: Not available.
Figure 1. Physical distribution and expression profile of genes in the longest (121 Mbp) scaffold of F. kawagutii. The circles from the outside to the inside mark scaffold length (in Mbp), gene location, and gene expression (bar height) under Zn-, Cu-, Fe-, Mn-, and Ni-deficient as well as normal conditions.
BUSCO (429 ortholog proteins) evaluation of gene set completeness.
| BUSCO | V1 | V2 | V3 |
|---|---|---|---|
| | 68 | 126 | 141 |
| | 56 | 112 | 130 |
| | 12 | 14 | 11 |
| | 57 | 67 | 92 |
| | 304 | 236 | 196 |
| | 29% | 45% | 55% |
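The row labels of the BUSCO table did not survive extraction, but the counts are internally consistent with the standard BUSCO categories. The sketch below checks this under an explicit assumption: that the five count rows are, in order, complete, complete single-copy, complete duplicated, fragmented, and missing. That ordering is inferred from the arithmetic, not stated in the source, and the helper names are hypothetical.

```python
TOTAL_ORTHOLOGS = 429  # from the table caption

# Assumed row order: complete, single-copy, duplicated, fragmented, missing.
busco_counts = {
    "V1": (68, 56, 12, 57, 304),
    "V2": (126, 112, 14, 67, 236),
    "V3": (141, 130, 11, 92, 196),
}


def is_consistent(row):
    """Check that the counts add up under the assumed category order."""
    complete, single, duplicated, fragmented, missing = row
    return (complete == single + duplicated
            and complete + fragmented + missing == TOTAL_ORTHOLOGS)


def matched_fraction(row):
    """Fraction of orthologs found as complete or fragmented (assumed definition)."""
    complete, _, _, fragmented, _ = row
    return (complete + fragmented) / TOTAL_ORTHOLOGS


for version, row in busco_counts.items():
    print(version, is_consistent(row), f"{matched_fraction(row):.1%}")
```

Under this reading the counts sum to 429 for every gene set version. The complete-plus-fragmented fraction comes out near the reported percentages for V1 and V2 and slightly below the reported 55% for V3, so the assumed definition of the percentage row may not match the authors' exactly.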
Figure 2. Expression of newly identified genes and significantly enriched pathways: (A) Bar chart of gene expression. Blue, transcripts per million (TPM) < 1; pink, TPM > 1. (B) Comparison of expression profiles between newly identified genes and all genes in Fugka_Geneset_V3. (C) Significantly enriched KEGG pathways of newly identified genes (adjusted p-value < 0.05).
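Figure 2 classifies genes by transcripts per million (TPM). As a reminder of what that unit means, here is a minimal sketch of the standard TPM calculation: each read count is divided by gene length, and the resulting rates are rescaled to sum to one million. The function name and the toy counts and lengths are hypothetical; the record does not say which tool the authors used to compute TPM.

```python
def tpm(read_counts, gene_lengths_bp):
    """Convert per-gene read counts to transcripts per million."""
    if len(read_counts) != len(gene_lengths_bp):
        raise ValueError("counts and lengths must have the same size")
    # Length-normalized rate for each gene.
    rates = [count / length for count, length in zip(read_counts, gene_lengths_bp)]
    total_rate = sum(rates)
    if total_rate == 0:
        raise ValueError("all counts are zero")
    # Rescale so the values sum to one million within the sample.
    return [rate / total_rate * 1e6 for rate in rates]


# Hypothetical toy data: three genes in one sample.
values = tpm([10, 200, 0], [1000, 2000, 1500])
expressed = [v > 1 for v in values]  # the TPM > 1 cutoff used in panel (A)
print(values, expressed)
```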
Figure 3. GO annotation and KEGG pathway enrichment of core genes: (A) GO annotation categories (level 1 GO terms). (B) KEGG pathway enrichment with a p-value cutoff of 0.05. Dot size represents the number of enriched DEGs; color intensity represents the p-value (from lowest in red to highest in blue).
Figure 4. Housekeeping gene expression under nutrient-replete and +1/5Fe, –Mn, +1/5Zn, –Cu, –Ni deficient conditions: (A) Box plots of the expression of seven housekeeping genes. (B) Stability and ranking of the genes shown in (A), calculated by geNorm. TUB: tubulin; GLU: beta-glucanase; CHA: chaperone protein; 40S: 40S ribosomal protein S4; RDR: ribonucleoside-diphosphate reductase.
Figure A1. Comparison of multi-gene combinations to determine the optimal number of reference genes required for effective normalization of gene expression. The pairwise variation (Vn/Vn+1, where n represents the number of genes) between the normalization factors NFn and NFn+1 was analyzed using geNorm. All the combinations gave a pairwise variation below the upper threshold of 0.15, indicating that two-gene combinations are sufficient.
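The pairwise variation in the caption above can be sketched as follows, using the definition commonly given for geNorm: NFn is the per-sample geometric mean of the n most stable genes, and Vn/Vn+1 is the standard deviation, across samples, of log2(NFn / NFn+1). This is an illustrative reimplementation, not the geNorm software itself. The function names and the expression values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import math
import statistics


def normalization_factors(expression, n):
    """Per-sample geometric mean of the first n genes.

    `expression` is a list of genes ranked most stable first; each gene is a
    list of positive relative expression values, one per sample.
    """
    n_samples = len(expression[0])
    return [
        math.prod(gene[s] for gene in expression[:n]) ** (1 / n)
        for s in range(n_samples)
    ]


def pairwise_variation(expression, n):
    """V(n/n+1): spread of log2(NF_n / NF_n+1) across samples."""
    nf_n = normalization_factors(expression, n)
    nf_next = normalization_factors(expression, n + 1)
    log_ratios = [math.log2(a / b) for a, b in zip(nf_n, nf_next)]
    return statistics.stdev(log_ratios)


# Hypothetical relative expression of three ranked genes in four samples.
toy_expression = [
    [1.00, 1.10, 0.95, 1.05],
    [1.00, 1.05, 0.90, 1.10],
    [1.00, 1.20, 0.85, 1.00],
]
v = pairwise_variation(toy_expression, 2)
print(v, v < 0.15)  # 0.15 is the threshold quoted in the caption
```

A value below the threshold for n = 2 is read as meaning that adding a third reference gene changes the normalization factor very little, which is the argument the caption makes for two-gene combinations.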