Yuri Pavlovich Galachyants1, Yulia Robertovna Zakharova2, Nadezda Antonovna Volokitina2, Alexey Anatolyevich Morozov2, Yelena Valentinovna Likhoshway2, Mikhail Aleksandrovich Grachev2.
Abstract
Diatoms are a group of eukaryotic microalgae populating almost all aquatic and wet environments. Their abundance and species diversity make these organisms significant contributors to biogeochemical cycles and important components of aquatic ecosystems. Although significant progress has been made in studies of diatoms (Bacillariophyta) over the last two decades, since the spread of "omics" technologies, our current knowledge of the molecular processes and gene regulatory networks that facilitate environmental adaptation remains incomplete. Here, we present a transcriptome analysis of Fragilaria radians isolated from Lake Baikal. The resulting assembly contains 27,446 transcripts encoding 21,996 putative proteins. The transcriptome assembly and annotation were coupled with quantitative experiments to search for transcripts that (i) are differentially expressed between exponential-growth-phase and dark-acclimated cell cultures, and (ii) change expression level during the early response to light treatment in dark-acclimated cells. The availability of F. radians genome and transcriptome data provides the basis for future targeted studies of this species. Furthermore, our results extend taxonomic and environmental sampling of Bacillariophyta, opening new opportunities for comparative omics-driven surveys.
Year: 2019 PMID: 31562323 PMCID: PMC6765018 DOI: 10.1038/s41597-019-0191-6
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Description of samples used to acquire RNA-seq data.
| Sample id | Sample name | Group | Strain | Cell divisions are synchronized | Biological replicate | DAPI test | Exposure to light, min | RIN at LIN | RIN at FGCZ | RNA concentration (ng/µl) | Raw reads, ×10⁶ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NExp1 | Nexp | A6 | No | 1 | passed | 200 | 8.1 | 8.2 | 175 | 15.9 |
| 2 | NExp2 | Nexp | A6 | No | 2 | passed | 200 | 8.9 | 8.3 | 52 | 8.9 |
| 3 | DSLT1-0 | DSLT | 280 | Yes | 1 | passed | 0 | 8.0 | 8.3 | 544 | 6.8 |
| 4 | DSLT2-0 | DSLT | 280 | Yes | 2 | passed | 0 | 8.2 | 7.2 | 612 | 7.9 |
| 5 | DSLT1-20 | DSLT | 280 | Yes | 1 | passed | 20 | 8.5 | 7.7 | 638 | 11.9 |
| 6 | DSLT2-20 | DSLT | 280 | Yes | 2 | passed | 20 | 8.2 | 8.1 | 717 | 10.8 |
| 7 | DSLT2-40 | DSLT | 280 | Yes | 2 | passed | 40 | 8.3 | 8.2 | 608 | 7.4 |
| 8 | DSLT1-40 | DSLT | 280 | Yes | 1 | not passed |
Fig. 1 Flowchart of the bioinformatic analysis pipeline. Several de novo transcriptome assembly strategies were applied and evaluated to find the best one. The best transcriptome assembly was obtained with the “merge reads → assemble with Trinity” workflow (“MA-Trinity”) and was then subjected to annotation and secondary bioinformatic analyses.
Summary of assembly statistics generated by various pipelines.
| Metric | Velvet/Oases, workflow 1* | Velvet/Oases, workflow 2* | Trinity, workflow 1* | Trinity, workflow 2* |
|---|---|---|---|---|
| Length, Mbp | 11.37 | 17.92 | 16.64 | 41.76 |
| Number of transcripts | 9,076 | 22,275 | 13,285 | 28,263 |
| N50, bp | 1,726 | 945 | 1,695 | 1,944 |
| L50 | 2,099 | 6,446 | 3,112 | 6,840 |
| Score | 0.13 | 0.10 | 0.18 | 0.39 |
| Optimal score | 0.16 | 0.12 | 0.22 | 0.44 |
| Optimal cutoff | 0.26 | 0.24 | 0.29 | 0.35 |
| Good contigs, % | 79 | 80 | 80 | 87 |
| Complete BUSCOs** | 120 | 62 | 167 | 267 |
| Fragmented BUSCOs** | 18 | 95 | 18 | 7 |
| Missed BUSCOs** | 165 | 146 | 118 | 29 |
*(1) assemble samples → merge assemblies (AM); (2) merge reads → assemble (MA).
**BUSCO Eukarya database, OrthoDB v.9, 303 BUSCO genes.
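As a quick consistency check on the Complete, Fragmented and Missed rows of the table above, the counts can be converted to percentages of the 303-gene set. The following is a minimal sketch only; the function name `busco_percentages` and the dictionary layout are assumptions made for illustration and are not part of the published pipeline. The counts are copied from the table.

```python
# Counts copied from the assembly-statistics table: (Complete, Fragmented, Missed).
# Workflow labels follow the table footnote: AM = assemble then merge, MA = merge then assemble.
BUSCO_TOTAL = 303  # size of the BUSCO gene set, per the table footnote

busco_counts = {
    "Velvet/Oases AM": (120, 18, 165),
    "Velvet/Oases MA": (62, 95, 146),
    "Trinity AM": (167, 18, 118),
    "Trinity MA": (267, 7, 29),
}


def busco_percentages(complete, fragmented, missed, total=BUSCO_TOTAL):
    """Return (complete, fragmented, missed) as percentages of the gene set.

    Hypothetical helper: raises ValueError if the three categories
    do not add up to the gene-set size.
    """
    if complete + fragmented + missed != total:
        raise ValueError("BUSCO categories do not sum to the gene-set size")
    return tuple(round(100.0 * n / total, 1) for n in (complete, fragmented, missed))


for name, counts in busco_counts.items():
    c, f, m = busco_percentages(*counts)
    print(f"{name}: complete {c}%, fragmented {f}%, missed {m}%")
```

All four columns sum to 303, so the three categories partition the gene set in every assembly.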
Fig. 2 Transcriptome assembly results. Transcriptome assembly statistics reveal the best strategy for generating a high-quality de novo transcriptome assembly. (a) Nx curves computed for RNA-seq assemblies. The vertical dotted line is drawn at 0.5 of the normalized assembly length. Curve colour encodes the strategy used to generate the assembly (see Fig. 1, Table 2 and Methods for more details). (b) N50 by expression percentile (ExN50), plotted for the best assembly, generated by the “merge reads → assemble with Trinity” strategy. ExN50: red line (y-scale on the left); number of transcripts: blue line (y-scale on the right). The vertical dotted line is drawn through the maximum of the ExN50 curve, showing that 14,009 transcripts are covered by 96% of reads and that the N50 of this assembly subset is 2,120 bp.
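The Nx, L50 and ExN50 statistics named in this caption can be illustrated with a short sketch. It assumes the common definitions: Nx is the length of the shortest transcript in the smallest set of longest transcripts that covers x% of the total assembly length, Lx is the size of that set, and ExN50 is the N50 of the most highly expressed transcripts that together account for a given percentage of total expression. The function names and the toy inputs are hypothetical. The actual values in the figure were produced by the authors' tools, which may differ in detail (for example, by working at the gene level).

```python
def nx_lx(lengths, x=50):
    """Return (Nx, Lx) for a list of transcript lengths.

    Nx: length of the transcript at which the running sum of lengths,
        taken from longest to shortest, first reaches x% of the total.
    Lx: number of transcripts needed to reach that point.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100.0
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, rank


def ex_n50(transcripts, ex=90):
    """Return (N50, subset size) for the top-expressed transcripts.

    transcripts: iterable of (length, expression) pairs.
    The subset is built from the most to the least expressed transcript
    until it accounts for ex% of the total expression.
    """
    ordered = sorted(transcripts, key=lambda t: t[1], reverse=True)
    target = sum(expr for _, expr in ordered) * ex / 100.0
    subset, running = [], 0.0
    for length, expr in ordered:
        subset.append(length)
        running += expr
        if running >= target:
            break
    return nx_lx(subset, 50)[0], len(subset)


# Toy data (hypothetical, not from the study).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(nx_lx(toy_lengths, 50))
print(ex_n50([(1000, 80), (500, 15), (200, 5)], ex=90))
```

Evaluating `nx_lx` over a range of `x` values gives the points of an Nx curve like the one in panel (a); evaluating `ex_n50` over a range of `ex` values gives an ExN50 profile like the one in panel (b).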
OrthoMCL statistics.
| Species | OrthoMCL taxonomic category | Gene set type | Number of input sequences | Number of sequences assigned by OrthoMCL | Number of OrthoMCL groups | BBH* |
|---|---|---|---|---|---|---|
| | VIRI | transcriptome | 22 813 | 19 339 | 7 893 | — |
| | VIRI | genome | 11 776 | 8 775 | 5 607 | 9 474 |
| | OEUK | genome | 15 743 | 13 493 | 5 730 | 849 |
| | META | genome | 27 273 | 21 362 | 11 201 | 423 |
| | VIRI | genome | 33 200 | 29 730 | 12 546 | 370 |
| | VIRI | genome | 26 777 | 20 655 | 11 024 | 314 |
*Number of proteins having best BLAST hits with F. radians transcriptome ORFs.
Fig. 3 Exploratory analysis of RNA-seq sample similarity. Clustering and principal component analysis show that samples group by biological replicate and experimental design. The NExp and DSLT groups differ sharply, while the samples within DSLT split into two subgroups depending on whether the culture was exposed to light. Pairwise sample distances were computed on a matrix of regularized log-transformed transcript counts. (a) Heat map of sample distances. Hierarchical sample clustering is shown on the top and left sides. (b) Principal component analysis of RNA-seq samples.
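The distance computation behind a heat map like panel (a) can be sketched as follows. Note the substitution: the caption specifies a regularized log transformation, whereas this sketch uses a plain log2(count + 1) transform as a simpler stand-in, so its numbers would not match the figure. The function names, the sample labels and the toy count matrix are all hypothetical.

```python
import math


def log_transform(counts, pseudocount=1.0):
    """Apply log2(count + pseudocount) to every value.

    Stand-in for the regularized log transform named in the caption;
    counts is a list of per-sample lists of transcript counts.
    """
    return [[math.log2(c + pseudocount) for c in sample] for sample in counts]


def euclidean_distance_matrix(samples):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(samples[i], samples[j])))
            dist[i][j] = dist[j][i] = d
    return dist


# Toy counts (hypothetical): three samples, four transcripts each.
labels = ["sampleA_rep1", "sampleA_rep2", "sampleB_rep1"]
toy_counts = [
    [100, 20, 0, 7],
    [110, 18, 1, 7],
    [5, 300, 40, 7],
]

distances = euclidean_distance_matrix(log_transform(toy_counts))
for label, row in zip(labels, distances):
    print(label, [round(d, 2) for d in row])
```

In this toy matrix the two replicates of sample A are much closer to each other than either is to sample B, which is the pattern a clustering step would then pick up.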
Summary of functional annotation results.
| | Unique | Total |
|---|---|---|
| BBxH* against Uniprot/Sprot | 12 190 | 12 383 |
| BBpH** against Uniprot/Sprot | 10 218 | 10 686 |
| Pfam hit | 11 227 | 11 817 |
| BBxH against NR | 20 556 | 20 637 |
| BBpH against NR | 17 198 | 17 776 |
| KEGGs mapped | 6 286 | 10 783 |
| EggNOGs mapped | 2 717 | 9 241 |
| GOs mapped from Uniprot/Sprot hits | 5 919 | 11 925 |
| GOs mapped from Pfam hits | 1 224 | 7 337 |
| TmHMM | 5 342 | 5 747 |
| SignalP | 2 235 | 3 368 |
| RNAMMER | 18 | 18 |
| Genes updated by TRAPID/Mercator | — | 6 314 |
| GOs mapped by TRAPID/Mercator | — | 18 622 |
| OrthoMCL group assigned | 7 893 | 19 339 |
*BBxH – best blastx hit.
**BBpH – best blastp hit.
| Measurement(s) | transcription profiling assay |
| Technology Type(s) | RNA sequencing |
| Factor Type(s) | experimental condition • strain • response to light |
| Sample Characteristic - Organism | Fragilaria • Fragilaria radians |
| Sample Characteristic - Environment | freshwater lake biome |
| Sample Characteristic - Location | Lake Baikal |