Raúl A González-Pech, Timothy G Stephens, Yibi Chen, Amin R Mohamed, Yuanyuan Cheng, Sarah Shah, Katherine E Dougan, Michael D A Fortuin, Rémi Lagorce, David W Burt, Debashish Bhattacharya, Mark A Ragan, Cheong Xin Chan.
Abstract
BACKGROUND: Dinoflagellates in the family Symbiodiniaceae are important photosynthetic symbionts in cnidarians (such as corals) and other coral reef organisms. Breakdown of the coral-dinoflagellate symbiosis due to environmental stress (i.e. coral bleaching) can lead to coral death and the potential collapse of reef ecosystems. However, evolution of Symbiodiniaceae genomes, and its implications for the coral, is little understood. Genome sequences of Symbiodiniaceae remain scarce due in part to their large genome sizes (1-5 Gbp) and idiosyncratic genome features.
Keywords: Coral symbionts; Dinoflagellates; Genome evolution; Symbiosis
Year: 2021 PMID: 33849527 PMCID: PMC8045281 DOI: 10.1186/s12915-021-00994-6
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
The seven Symbiodinium isolates for which de novo genome assemblies were generated in this study
| | A1 | A1 | A3 | A4 | A13 | – | A2 |
| Lifestyle | Symbiotic | Symbiotic | Symbiotic | Symbiotic | Opportunistic | Free-living | Free-living |
| Host or source of origin | (stony coral) | Open ocean | |||||
| Collection site | Hawaii (Pacific) | Florida (Atlantic) | Coral Sea (Pacific) | Bermuda (Atlantic) | Jamaica (Caribbean) | Hawaii (Pacific) | Jamaica (Caribbean) |
| Overall G+C (%) | 51.91 | 50.46 | 51.01 | 50.36 | 50.85 | 51.79 | 48.21 |
| Number of scaffolds | 67,937 | 57,558 | 6245 | 37,772 | 104,583 | 2855 | 48,302 |
| Assembly length (bp) | 813,744,491 | 775,008,844 | 1,103,301,044 | 694,902,460 | 767,953,253 | 761,619,964 | 1,089,424,773 |
| N50 scaffold length (bp) | 42,989 | 49,975 | 651,264 | 58,075 | 14,528 | 610,496 | 62,444 |
| Max. scaffold length (Mbp) | 0.38 | 1.08 | 4.01 | 0.46 | 1.34 | 3.40 | 1.34 |
| Number of contigs | 167,159 | 162,765 | 7913 | 141,380 | 157,685 | 4262 | 142,969 |
| N50 contig length (bp) | 10,400 | 11,136 | 356,695 | 11,147 | 11,420 | 358,021 | 17,506 |
| Max. contig length (Mbp) | 0.15 | 1.05 | 2.96 | 0.19 | 1.34 | 2.90 | 1.34 |
| Gap (%) | 1.15 | 1.44 | 0.02 | 1.35 | 0.56 | 0.02 | 0.79 |
| Estimated genome size (bp) | 1,120,150,369 | 1,052,668,212 | 1,287,259,774 | 914,781,885 | 1,007,022,374 | 740,100,732 | 1,993,912,458 |
| Assembled fraction of genome (%) | 72.65 | 73.62 | 85.71 | 75.96 | 76.26 | 100.03 | 54.64 |
An asterisk (*) denotes a hybrid genome assembly incorporating both short- and long-read sequence data. All other assemblies were generated using short-read sequence data
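Two of the table's rows are derived quantities and can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: `n50` uses the common definition of N50 (the length at which the longest sequences, taken in decreasing order, first cover half the total), and `assembled_fraction` assumes the percentage is simply assembly length divided by estimated genome size. Both function names and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


def assembled_fraction(assembly_bp, estimated_genome_bp):
    """Assembly length as a percentage of the estimated genome size (assumed definition)."""
    return 100 * assembly_bp / estimated_genome_bp


# Toy scaffold lengths (hypothetical, not taken from the study).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 70

# First data column of the table: assembly length and estimated genome size.
print(round(assembled_fraction(813_744_491, 1_120_150_369), 2))  # 72.65
```

Under this assumed definition the ratio reproduces the listed percentage in six of the seven columns. The sixth column is the exception: 761,619,964 / 740,100,732 gives about 102.9 rather than the listed 100.03, so that entry may have been derived differently.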
Fig. 1 Genome divergence among Symbiodiniaceae. a Similarity between Symbiodiniaceae (and the outgroup P. glacialis) based on pairwise whole-genome sequence alignments. The colour of the square depicts the average percent identity of the best reciprocal one-to-one aligned regions (I) between each genome pair and the size of the square is proportional to the percent of the query genome that aligned to the reference (Q), as shown in the legend. The tree topologies on the left and bottom indicate the known phylogenetic relationship [26] among the isolates. Isolates in Symbiodinium are highlighted in grey, and their comparisons are highlighted in a bounded box. b Neighbour-joining tree based on 21-mers shared by genomes of Suessiales; branch lengths are proportional to the estimated distances. The shortest and longest distances (d) in the tree, as well as average distances (δ) among representative clades are shown following the bottom-left colour code. ‘Clade BCF’: clade including B. minutum, the two Cladocopium isolates, and F. kawagutii
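The tree in panel b of Fig. 1 is built from 21-mers shared between genomes. The caption does not say which distance measure was used, so the sketch below swaps in the Jaccard distance purely as an assumption, to show how a pairwise distance can be derived from shared k-mers. The function names are hypothetical, and reverse complements are ignored for brevity.

```python
def kmers(seq, k=21):
    """Return the set of k-mers in a sequence, skipping any that contain N."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    }


def jaccard_distance(seq_a, seq_b, k=21):
    """1 - (shared k-mers / all distinct k-mers) for two sequences.

    Jaccard distance is an illustrative choice, not necessarily the
    measure behind the published tree.
    """
    kmers_a, kmers_b = kmers(seq_a, k), kmers(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        raise ValueError("no k-mers found; are the sequences shorter than k?")
    return 1 - len(kmers_a & kmers_b) / len(union)


# Toy example with k=3 so the sets are small enough to check by hand:
# the sequences share ACG and CGT out of six distinct 3-mers.
print(jaccard_distance("ACGTAC", "ACGTTT", k=3))
```

A matrix of such pairwise distances is the kind of input a neighbour-joining method takes.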
Fig. 2 Conserved synteny of Suessiales genomes. Number of collinear syntenic gene blocks shared by pairs of Suessiales genomes. Gene blocks shared by more than two isolates are not shown
Fig. 3 Repeat composition of Suessiales genomes. a Percentage of sequence regions comprising the major classes of repetitive elements, shown for each genome assembly analysed in this study. b Interspersed repeat landscape for each assembled genome. Both a and b follow the colour code shown in the legend. ‘No repeats’ refers to non-repetitive regions of the genome; ‘Unknown’ represents repeats that are not classified into any known types in the RepeatMasker database, including novel repeats
Fig. 4 Gene features in Symbiodiniaceae genomes. a Principal component analysis (PCA) based on metrics of predicted genes from the analysed 15 genomes. Data points are coloured by genus and shaped by lifestyle according to the legends to the right. Data points enclosed in a light blue area correspond to isolates with hybrid genome assemblies. Smi: S. microadriaticum, Sne: S. necroappetens, Sli: S. linucheae, Str: S. tridacnidorum, Sna: S. natans, Spi: S. pilosum, Bmi: B. minutum, Cgo: C. goreaui, Csp: Cladocopium sp. C92, Fka: F. kawagutii, Pgl: P. glacialis. Isolate name is shown in subscript for those species with more than one isolate. b Loading plot showing the contribution of the distinct gene metrics employed for the PCA to PC1 and to PC2
Fig. 5 Number of gene families and conserved dark genes in Suessiales. a Tree topology showing the phylogenetic relationship of the 15 Suessiales isolates, following the species tree inferred from 28,116 gene families containing four or more sequences from any isolate, rooted with P. glacialis as outgroup to Symbiodiniaceae. At each node, the total number of families that include genes from all diverging isolates (and not others) is shown; the proportion of dark genes among all genes in these families is shown as a pie chart. b Number of isolate-specific gene families for each isolate, showing the proportion of families that are dark. c Number of gene families specific to Symbiodiniaceae, Symbiodinium, and other genera
Fig. 6 Comparison of S. tridacnidorum and S. natans genomes. a Mapping rate of filtered read pairs generated for each species against the assembled genomes of itself and of the counterpart: S. tridacnidorum (St) versus S. natans (Sn). b Top ten most abundant protein domains recovered, sorted in decreasing relative abundance (from bottom to top) among proteins of St (left) and those of Sn (right). The abundance for each domain in both genomes is shown in each chart for comparison. Domains common among the top ten most abundant for both species are connected with a line between the charts. ‘MORN’: MORN repeat, ‘RCC1’: Regulator of chromosome condensation repeat, ‘RVT’: reverse transcriptase, ‘DUF’: domain of unknown function, ‘PPR’: pentatricopeptide repeat, ‘EFH’: EF-hand, ‘IonTr’: ion transporter, ‘Pkin’: protein kinase, ‘Ank’: ankyrin repeat, ‘DNAmet’: C-5 cytosine-specific DNA methylase. c Contribution of genomic features to the distinct composition of S. tridacnidorum and S. natans genomes, based on the ratio (Δ) of the total length of the implicated sequence region in S. tridacnidorum to the equivalent length in S. natans, shown in log2 scale. The ratio of the estimated genome sizes is shown as reference (marked with a dashed line). The untransformed Δ for each feature is shown in its corresponding bar. A genome feature with Δ greater than the reference likely contributed to the discrepancy of genome sizes. Bars are coloured based on the genome in which they are more abundant as shown in the legend. Pseudogenes are not included in this plot. d Volcano plot comparing gene-family sizes against Fisher’s exact test significance (p value). The colour of the circles indicates the species in which those gene families are larger according to the top-right legend; families recovered only in one genome but not in the other are not shown. The number of gene families with the same ratio and significance is represented with the circle size following the bottom-right legend. Filled circles represent size differences that are considered statistically significant (adjusted p ≤ 0.05)
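The Δ comparison in panel c of Fig. 6 can be illustrated with a short sketch. It follows the caption's description only: Δ is the total length of a feature in S. tridacnidorum divided by its length in S. natans, plotted as log2, and compared against the same ratio for the estimated genome sizes. The function names and all numbers below are hypothetical placeholders, not values from the study.

```python
import math


def log2_delta(length_st_bp, length_sn_bp):
    """log2 of the St/Sn length ratio (delta) for one genomic feature."""
    return math.log2(length_st_bp / length_sn_bp)


def features_above_reference(feature_lengths, genome_size_st, genome_size_sn):
    """Flag features whose delta exceeds the genome-size reference ratio.

    feature_lengths maps a feature name to (length in St, length in Sn).
    """
    reference = log2_delta(genome_size_st, genome_size_sn)
    return {
        name: log2_delta(st, sn) > reference
        for name, (st, sn) in feature_lengths.items()
    }


# Hypothetical lengths in bp, chosen only to make the arithmetic easy to follow.
toy_features = {
    "repeats": (900, 300),  # delta = 3.0, above a reference ratio of 2.0
    "genes": (300, 250),    # delta = 1.2, below the reference
}
print(features_above_reference(toy_features, genome_size_st=2000, genome_size_sn=1000))
```

Working in log2 keeps the comparison symmetric: a feature twice as long in one genome sits as far from zero as one twice as long in the other.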