Timothy G Stephens¹, Mark A Ragan¹, Debashish Bhattacharya², Cheong Xin Chan³,⁴
Abstract
Dinoflagellates are a diverse group of unicellular primary producers and grazers that exhibit some of the most remarkable features known among eukaryotes, including gigabase-sized nuclear genomes, permanently condensed chromosomes and highly reduced organelle DNA. However, the genetic inventory that allows dinoflagellates to thrive in diverse ecological niches remains poorly characterised. Here we systematically assess the functional capacity of 3,368,684 predicted proteins from 47 transcriptome datasets spanning eight dinoflagellate orders. We find that 1,232,023 of these proteins share no significant sequence similarity with known sequences, i.e. they are "dark". Of these, we consider the 441,006 (13.1% of all proteins) that are found in multiple taxa, or that occur as alternative splice variants, to be high-confidence dark proteins. Although their functions are unknown, 43.3% of these dark proteins can be annotated with conserved structural features through an exhaustive search against available data, validating their existence and importance. Furthermore, these dark proteins and their putative homologs are largely lineage-specific and recovered in multiple taxa. We also identify functions conserved in all dinoflagellates, as well as functions specific to toxin-producing, symbiotic and cold-adapted lineages. Our results demonstrate the remarkable divergence of gene functions in dinoflagellates and provide a platform for investigating the diversification of these ecologically important organisms.
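The selection rule for high-confidence dark proteins, and the proportions quoted above, can be illustrated with a minimal Python sketch. The `DarkProtein` record and its fields are hypothetical stand-ins: how the study actually grouped proteins across taxa or detected splice variants is not described here, so the sketch uses only the stated rule (found in multiple taxa, or an alternative splice variant) and the reported counts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DarkProtein:
    """Hypothetical record for one predicted protein with no significant hit to known sequences."""
    protein_id: str
    # Taxa represented in the group this protein belongs to (hypothetical field; includes its own taxon).
    taxa_in_set: set[str] = field(default_factory=set)
    # True if the protein is one of several alternative splice variants of a gene (hypothetical field).
    is_splice_variant: bool = False


def is_high_confidence(p: DarkProtein) -> bool:
    """Rule stated in the abstract: found in multiple taxa OR occurs as an alternative splice variant."""
    return len(p.taxa_in_set) > 1 or p.is_splice_variant


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts reported in the abstract.
TOTAL_PROTEINS = 3_368_684
DARK_PROTEINS = 1_232_023
HIGH_CONFIDENCE_DARK = 441_006

examples = [
    DarkProtein("p1", {"taxonA"}),                          # one taxon, not a splice variant
    DarkProtein("p2", {"taxonA", "taxonB"}),                # shared by two taxa
    DarkProtein("p3", {"taxonA"}, is_splice_variant=True),  # alternative splice variant
]
print([is_high_confidence(p) for p in examples])     # [False, True, True]
print(percent(DARK_PROTEINS, TOTAL_PROTEINS))         # 36.6
print(percent(HIGH_CONFIDENCE_DARK, TOTAL_PROTEINS))  # 13.1
print(percent(HIGH_CONFIDENCE_DARK, DARK_PROTEINS))   # 35.8
```

On these counts, about 36.6% of all predicted proteins are dark, and about 35.8% of the dark proteins pass the high-confidence rule.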
Year: 2018; PMID: 30464192; PMCID: PMC6249206; DOI: 10.1038/s41598-018-35620-z
Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.379
The final 47 datasets used in this study.
| Taxon | Order | No. of non-redundant protein sequences |
|---|---|---|
|  | Dinophysiales | 83,934 |
|  | Gonyaulacales | 68,889 |
|  | Gonyaulacales | 50,502 |
|  | Gonyaulacales | 87,380 |
|  | Gonyaulacales | 114,975 |
|  | Gonyaulacales | 70,040 |
|  | Gonyaulacales | 68,969 |
|  | Gonyaulacales | 48,770 |
|  | Gonyaulacales | 290,362 |
|  | Gonyaulacales | 39,652 |
|  | Gonyaulacales | 96,319 |
|  | Gonyaulacales | 75,595 |
|  | Gonyaulacales | 99,554 |
|  | Gymnodiniales | 35,832 |
|  | Gymnodiniales | 49,240 |
|  | Gymnodiniales | 82,846 |
|  | Gymnodiniales | 79,497 |
|  | Gymnodiniales | 83,816 |
|  | Gymnodiniales | 69,522 |
|  | Gymnodiniales | 90,529 |
|  | Gymnodiniales | 57,487 |
|  | Gymnodiniales | 42,196 |
|  | Noctilucales | 40,801 |
|  | Oxyrrhinales | 34,348 |
|  | Oxyrrhinales | 43,246 |
|  | Peridiniales | 66,253 |
|  | Peridiniales | 88,656 |
|  | Peridiniales | 106,311 |
|  | Peridiniales | 45,573 |
|  | Peridiniales | 43,925 |
|  | Peridiniales | 57,688 |
|  | Peridiniales | 161,360 |
|  | Peridiniales | 53,784 |
|  | Peridiniales | 74,092 |
|  | Peridiniales | 74,862 |
|  | Peridiniales | 101,032 |
|  | Prorocentrales | 85,555 |
|  | Prorocentrales | 79,005 |
|  | Suessiales | 47,797 |
|  | Suessiales | 58,545 |
|  | Suessiales | 33,576 |
|  | Suessiales | 37,221 |
|  | Suessiales | 45,710 |
|  | Suessiales | 43,277 |
|  | Suessiales | 72,087 |
|  | Suessiales | 44,936 |
|  | Suessiales | 43,138 |
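As a consistency check, the per-dataset counts can be tallied by order. The values in this small Python sketch are transcribed from the table; taxon names are not available in this copy of the table, so the datasets are left unnamed, and `COUNTS_BY_ORDER` and `summarise` are illustrative names only.

```python
from __future__ import annotations

# Non-redundant protein counts per dataset, grouped by order (transcribed from the table).
COUNTS_BY_ORDER: dict[str, list[int]] = {
    "Dinophysiales": [83_934],
    "Gonyaulacales": [68_889, 50_502, 87_380, 114_975, 70_040, 68_969,
                      48_770, 290_362, 39_652, 96_319, 75_595, 99_554],
    "Gymnodiniales": [35_832, 49_240, 82_846, 79_497, 83_816, 69_522,
                      90_529, 57_487, 42_196],
    "Noctilucales": [40_801],
    "Oxyrrhinales": [34_348, 43_246],
    "Peridiniales": [66_253, 88_656, 106_311, 45_573, 43_925, 57_688,
                     161_360, 53_784, 74_092, 74_862, 101_032],
    "Prorocentrales": [85_555, 79_005],
    "Suessiales": [47_797, 58_545, 33_576, 37_221, 45_710, 43_277,
                   72_087, 44_936, 43_138],
}


def summarise(counts_by_order: dict[str, list[int]]) -> dict[str, int]:
    """Return the total number of protein sequences per order."""
    return {order: sum(counts) for order, counts in counts_by_order.items()}


order_totals = summarise(COUNTS_BY_ORDER)
n_datasets = sum(len(counts) for counts in COUNTS_BY_ORDER.values())
grand_total = sum(order_totals.values())

# Print orders from largest to smallest contribution.
for order, total in sorted(order_totals.items(), key=lambda kv: kv[1], reverse=True):
    n = len(COUNTS_BY_ORDER[order])
    print(f"{order:<15} {n:>2} datasets {total:>10,} proteins")
print(f"{len(order_totals)} orders, {n_datasets} datasets, {grand_total:,} proteins")
# last line: 8 orders, 47 datasets, 3,368,684 proteins
```

The tally reproduces the 47 datasets, eight orders and 3,368,684 predicted proteins given in the abstract; Gonyaulacales contribute the most sequences (1,111,007 across 12 datasets).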
Figure 1. (a) Maximum-likelihood phylogeny inferred from the 1,043 orthologous protein sets. Support values, based on 2,000 ultrafast bootstrap approximations [24], are shown at the internal nodes. The unit of branch length is the number of substitutions per site. (b) Percentage of recovered alveolate + stramenopile BUSCO proteins, and of dark proteins, in each dataset. High- and low-confidence dark proteins are shown as red and yellow bars, respectively.
Figure 2. Heat map showing the proportion of dark protein sets shared between the taxa used in this study. Each row is normalised by the total number of protein sets of which the taxon is a member. The order of species on both axes, and their associated dendrograms, follow the phylogeny in Fig. 1a.
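One reading of this row normalisation, sketched in Python: cell (i, j) is the number of dark protein sets containing both taxon i and taxon j, divided by the number of sets containing taxon i. The function name and the toy sets are hypothetical. Because each row has its own denominator, the matrix is generally not symmetric, and its diagonal is 1 for any taxon present in at least one set.

```python
from __future__ import annotations


def shared_set_matrix(protein_sets: list[set[str]], taxa: list[str]) -> list[list[float]]:
    """Row-normalised sharing matrix.

    Cell [i][j] = (# sets containing taxa[i] and taxa[j]) / (# sets containing taxa[i]).
    """
    # Row normaliser: how many protein sets each taxon is a member of.
    membership = {t: sum(1 for s in protein_sets if t in s) for t in taxa}
    matrix: list[list[float]] = []
    for a in taxa:
        row = []
        for b in taxa:
            shared = sum(1 for s in protein_sets if a in s and b in s)
            # Taxa absent from every set get an all-zero row instead of a division by zero.
            row.append(shared / membership[a] if membership[a] else 0.0)
        matrix.append(row)
    return matrix


# Toy example with hypothetical taxa A, B and C (not data from the study).
toy_sets = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}]
for row in shared_set_matrix(toy_sets, ["A", "B", "C"]):
    print([round(x, 2) for x in row])
# [1.0, 0.67, 0.67]
# [0.67, 1.0, 0.33]
# [1.0, 0.5, 1.0]
```

In the toy output, every set containing C also contains A (row C, column A is 1.0), whereas only two of the three sets containing A also contain C; this asymmetry is what the per-row normalisation exposes.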
Figure 3. Maximum-likelihood phylogeny reconstructed from the 403 strictly orthologous dark protein sets. Support values, based on 2,000 ultrafast bootstrap approximations [24], are shown at the internal nodes. The unit of branch length is the number of substitutions per site.
Prevalent Pfam domains and membrane transporter families annotated in dinoflagellate proteins that are consistently ranked among the top 10, or among the top 20, in each of the 47 taxa.
| Pfam domain (Pfam identifier) | Membrane transporter (TCDB family identifier) |
|---|---|
| Among top 10 in each taxon | |
| Protein kinase (PF00069) | Eukaryotic Nuclear Pore Complex (E-NPC) Family (1.I.1) |
| RNA recognition motif (PF00076) | Mitochondrial Carrier (MC) Family (2.A.29) |
| Ankyrin repeats (3 copies) (PF12796) | Ankyrin (Ankyrin) Family (8.A.28) |
| EF-hand domain pair (PF13499) | ATP-binding Cassette (ABC) Superfamily (3.A.1) |
| | Drug/Metabolite Transporter (DMT) Superfamily (2.A.7) |
| Among top 20 in each taxon | |
| WD40 repeat (PF00400) | Voltage-gated Ion Channel (VIC) Superfamily (1.A.1) |
| MORN repeat (PF02493) | Major Facilitator Superfamily (MFS) (2.A.1) |
| | P-type ATPase (P-ATPase) Superfamily (3.A.3) |
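Being "consistently recovered among the top 10 (or top 20) in each of the 47 taxa" amounts to ranking features within each taxon and intersecting the top-k sets across taxa. The minimal Python sketch below assumes features are ranked by the number of proteins annotated with them; the function name and toy counts are hypothetical. Ties at the k-th rank are broken by the ordering of `Counter.most_common`, which may differ from how the study handled ties.

```python
from __future__ import annotations

from collections import Counter


def consistently_top_k(counts_by_taxon: dict[str, Counter[str]], k: int) -> set[str]:
    """Features ranked among the k most frequent in every taxon."""
    common: set[str] | None = None
    for counts in counts_by_taxon.values():
        # most_common(k) returns the k highest-count (feature, count) pairs.
        top = {feature for feature, _ in counts.most_common(k)}
        common = top if common is None else common & top
    return common or set()


# Toy counts for two hypothetical taxa (not data from the study).
toy = {
    "taxon1": Counter({"domA": 900, "domB": 500, "domC": 300, "domD": 50}),
    "taxon2": Counter({"domA": 700, "domC": 650, "domB": 100, "domD": 400}),
}
print(sorted(consistently_top_k(toy, 2)))  # ['domA']
print(sorted(consistently_top_k(toy, 3)))  # ['domA', 'domC']
```

Under this reading, the "Among top 20" rows would presumably list the features in the top-20 intersection that are not already in the top-10 intersection, i.e. `consistently_top_k(c, 20) - consistently_top_k(c, 10)`.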