Lina Yao1, Kenneth Wei Min Tan1, Tin Wee Tan2,3, Yuan Kun Lee4.
Abstract
BACKGROUND: RNA-Seq technology has received considerable attention in recent years for global transcriptomic profiling of microalgae. It is widely used for transcriptome-wide analysis of gene expression, particularly in microalgal strains with potential as biofuel sources. However, insufficient genomic or transcriptomic information on non-model microalgae has limited the understanding of their regulatory mechanisms and hampered genetic manipulation to enhance biofuel production. As such, the construction of an optimal microalgal transcriptomic database is a subject of urgent investigation.
Keywords: Dunaliella tertiolecta; HPC; Microalgae; RNA-Seq; Transcriptome
Year: 2017 PMID: 28228091 PMCID: PMC5322580 DOI: 10.1186/s12859-017-1551-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Input raw data and post-analyzed data from MISEQ and HISEQ
| Data name | Data source | Number of protein-coding contigs |
|---|---|---|
| Dt_G (HISEQ 4000) | Yao et al. 2016 | 27,797 |
| Dt_Shin (MISEQ) | Shin et al. 2015 | 13,861 |
| Dt_KR (MISEQ) | Tan et al. 2016 | 25,475 |
| Dt_v10 (MISEQ) | Yao et al. 2015 | 20,229 |
| Merged contigs | - | 87,197 |
| Non-redundant contigs | - | 17,845 |
Fig. 1 Pipeline of the RNA-Seq data analysis workflow, starting from short-read raw data. The general pipeline includes workflows for transcriptome database construction, annotation, and differential gene expression analysis. The workflow in the data center mainly consists of de novo assembly and mpiBLASTX. Dt_G and Dt_KR are in-house constructed samples; Dt_Shin and Dt_Yao_v10 are two published datasets. 10% of each dataset was randomly selected for an mpiBLASTX test to find the best mpiBLASTX parameters, against the NCBI database, for the merged D. tertiolecta datasets. The mpiBLASTX output was then extracted and filtered for annotation using Python scripts.
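The 10% random subsampling step described in the Fig. 1 caption could be sketched as follows. This is a minimal illustration and not the authors' script: the helper names `read_fasta` and `subsample_records`, the fixed seed, and the toy in-memory FASTA text are all assumptions made for the example.

```python
import random


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def subsample_records(records, fraction=0.10, seed=0):
    """Return a random subset holding roughly `fraction` of the records."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one record whenever the input is non-empty.
    k = max(1, round(len(records) * fraction)) if records else 0
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return rng.sample(records, k)


# Toy input: 20 made-up contigs, not real D. tertiolecta sequences.
toy_fasta = "\n".join(f">contig_{i}\nACGT{'A' * i}" for i in range(20))
contigs = read_fasta(toy_fasta)
subset = subsample_records(contigs, fraction=0.10)
print(len(contigs), len(subset))
```

Sampling without replacement with a recorded seed keeps the test subset reproducible, so parameter comparisons across runs use the same contigs.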
Fig. 2 Speedup achieved by mpiBLASTX, calculated relative to a run on 24 cores. The scalability test used subsamples from the four source datasets: the number of cores for mpiBLASTX was increased from 24 (1 node) to 1680 (70 nodes), and the speedup was measured at each step. Using 960 cores (40 nodes) was concluded to be optimal with respect to time cost in this study.
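Speedup relative to the 24-core baseline, and the corresponding parallel efficiency, can be computed as in the sketch below. The wall-clock times are hypothetical placeholders chosen only to show the calculation; they are not measurements from the study, and the function names are likewise assumptions.

```python
BASELINE_CORES = 24  # one node, the reference run named in the Fig. 2 caption


def speedup(baseline_seconds, run_seconds):
    """Speedup = time of the baseline run / time of the larger run."""
    return baseline_seconds / run_seconds


def parallel_efficiency(baseline_seconds, run_seconds, cores,
                        baseline_cores=BASELINE_CORES):
    """Observed speedup divided by the ideal (linear) speedup."""
    ideal = cores / baseline_cores
    return speedup(baseline_seconds, run_seconds) / ideal


# Hypothetical wall-clock times in seconds (NOT data from the paper).
timings = {24: 9600.0, 240: 1200.0, 960: 400.0, 1680: 384.0}

for cores, seconds in timings.items():
    s = speedup(timings[BASELINE_CORES], seconds)
    e = parallel_efficiency(timings[BASELINE_CORES], seconds, cores)
    print(f"{cores:>5} cores: speedup {s:5.1f}x, efficiency {e:.2f}")
```

When speedup flattens while the core count keeps growing, efficiency drops sharply; that pattern is one common way to pick a core count beyond which extra nodes are poorly used.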
Fig. 3 D. tertiolecta transcriptome information. a GC content of the D. tertiolecta transcriptome compared with other species; b Identification and verification of the protein-coding transcripts in D. tertiolecta. The pie chart is derived from the BLASTX output and shows the sum of top-hit transcripts for each individual species. Species names on the right are listed in descending order of number of hits.
Transcriptome assembly and annotation descriptions of different species
| | | | | | Dt_v11 |
|---|---|---|---|---|---|
| Genome description | 111.1 Mb arranged on 17 chromosomes and 37 minor scaffolds | 131.2 Mb arranged in 434 scaffolds | 343.7 Mb arranged in 5512 scaffolds | - | - |
| N50 (bp) | 3938 | 4188 | 2291 | 1540 | 1797 |
| Maximum contig length (bp) | 72,700 | 24,197 | 17,353 | 15,234 | 16,518 |
| Total size of contigs (bp) | 63,797,006 | 51,775,597 | 33,246,103 | 16,600,538 | 24,538,468 |
| Protein-coding transcripts | 19,526 | 16,075 | 18,801 | 13,861 | - |
| transcript_primaryTranscriptOnly | 17,741 | 14,247 | 16,697 | 9839 | 17,845 |
| Average length (bp) | 3267 | 3220 | 1768 | 1197 | 1375 |
| Alternatively spliced transcripts | 1785 | 1828 | 2104 | - | - |
The entry in italic represents data from Dt_v11.2
Fig. 4 GO functional enrichment of up-regulated (blue) and down-regulated (red) genes under ND conditions. Categories were filtered by Fisher’s exact test with an FDR-corrected p-value ≤ 0
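A GO enrichment filter of the kind named in the Fig. 4 caption could look like the sketch below. The caption specifies only Fisher's exact test with FDR correction; the one-sided (over-representation) form of the test, the Benjamini-Hochberg procedure, the function names, and the toy counts are all assumptions for illustration. The filtering threshold is left to the caller.

```python
from math import comb


def fisher_enrichment_p(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided Fisher's exact p-value for over-representation.

    hits_in_set        -- regulated genes annotated with the GO term
    set_size           -- all regulated genes
    hits_in_background -- background genes annotated with the GO term
    background_size    -- all background genes (includes the regulated set)
    """
    k_max = min(set_size, hits_in_background)
    total = comb(background_size, set_size)
    # Sum the hypergeometric probabilities of seeing >= hits_in_set annotated genes.
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, set_size - k)
        for k in range(hits_in_set, k_max + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy counts (hypothetical): term -> (hits in regulated set, hits in background).
REGULATED, BACKGROUND = 200, 17845
toy_terms = {"term_A": (12, 150), "term_B": (3, 240), "term_C": (9, 60)}

raw = [fisher_enrichment_p(h, REGULATED, b, BACKGROUND) for h, b in toy_terms.values()]
for name, p, q in zip(toy_terms, raw, benjamini_hochberg(raw)):
    print(f"{name}: p = {p:.3g}, FDR-adjusted p = {q:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the tail sum, at the cost of speed for very large backgrounds.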
Comparison of dry cell weight and TAG content in D. tertiolecta ND culture
| Condition | Dry cell weight (g/L) | TAG content (pg/cell) | Fatty acid content (% DCW) |
|---|---|---|---|
| N-replete | 0.31 ± 0.08 | 0.15 ± 0.02 | 6.2 ± 0.27 |
| N-deplete | 0.34 ± 0.04 | 1.29 ± 0.12 | 5.14 ± 0.45 |
The values are presented as the mean ± the standard deviation
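The tabulated means imply roughly an 8.6-fold higher TAG content per cell under N-deplete conditions (1.29 vs 0.15 pg/cell). The sketch below computes that ratio together with an approximate standard deviation; the first-order error propagation assumes the two measurements are independent, which the source does not state, and the helper name `ratio_with_sd` is hypothetical.

```python
from math import sqrt


def ratio_with_sd(mean_a, sd_a, mean_b, sd_b):
    """Return mean_a / mean_b and its SD by first-order error propagation.

    Assumes the two quantities are independent (an assumption, not a
    statement from the source).
    """
    ratio = mean_a / mean_b
    relative_sd = sqrt((sd_a / mean_a) ** 2 + (sd_b / mean_b) ** 2)
    return ratio, ratio * relative_sd


# Means and SDs taken from the table: TAG content in pg/cell.
tag_fold, tag_fold_sd = ratio_with_sd(1.29, 0.12, 0.15, 0.02)
print(f"TAG content, N-deplete / N-replete: {tag_fold:.1f} +/- {tag_fold_sd:.1f}")
```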
Genes participating in important pathways that are exclusively found in Dt_v11
| KO | Name | Definition | Fold Change |
|---|---|---|---|
| ko00061 Fatty acid biosynthesis & ko01212 Fatty acid metabolism | | | |
| ko:K00059 | fabG | 3-oxoacyl-[acyl-carrier protein] reductase [EC:1.1.1.100] | -3.100679049 |
| ko:K00208 | fabI | enoyl-[acyl-carrier protein] reductase I [EC:1.3.1.9 1.3.1.10] | -3.38232061 |
| ko:K00645 | fabD | [acyl-carrier-protein] S-malonyltransferase [EC:2.3.1.39] | -3.52911698 |
| ko:K01962 | accA | acetyl-CoA carboxylase carboxyl transferase subunit alpha [EC:6.4.1.2] | -2.126940035 |
| ko:K01963 | accD | acetyl-CoA carboxylase carboxyl transferase subunit beta [EC:6.4.1.2] | -3.075257707 |
| ko:K02160 | accB | acetyl-CoA carboxylase biotin carboxyl carrier protein | -7.80359902 |
| ko:K02372 | fabZ | 3-hydroxyacyl-[acyl-carrier-protein] dehydratase [EC:4.2.1.59] | -2.198560822 |
| ko:K09458 | fabF | 3-oxoacyl-[acyl-carrier-protein] synthase II [EC:2.3.1.179] | -3.332311424 |
| ko00910 Nitrogen metabolism | | | |
| ko:K00264 | GLT1 | glutamate synthase (NADPH/NADH) [EC:1.4.1.13 1.4.1.14] | 23.8846 |
| ko:K00366 | nirA | ferredoxin-nitrite reductase [EC:1.7.7.1] | -13.89942929 |
| ko:K01915 | glnA | glutamine synthetase [EC:6.3.1.2] | 2.84411 |
| ko:K02575 | NRT | MFS transporter, NNP family, nitrate/nitrite transporter | -4.415225463 |
| ko:K10534 | NR | nitrate reductase (NAD(P)H) [EC:1.7.1.1 1.7.1.2 1.7.1.3] | -9.464319515 |
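For comparison with tools that report log2 fold change, the signed values in the Fold Change column can be converted as sketched below. This assumes the common convention that a negative entry of magnitude x denotes an x-fold decrease; the source does not spell out the convention, and the function name is hypothetical.

```python
from math import copysign, log2


def signed_fold_to_log2(fold_change):
    """Convert a signed fold change (e.g. -3.1 = 3.1-fold down) to log2 scale.

    Assumes |fold_change| >= 1, with the sign encoding the direction.
    """
    if abs(fold_change) < 1:
        raise ValueError("signed fold change must have magnitude >= 1")
    return copysign(log2(abs(fold_change)), fold_change)


# A few entries copied from the table above.
table_values = {"accB": -7.80359902, "GLT1": 23.8846, "nirA": -13.89942929}

for gene, fc in table_values.items():
    print(f"{gene}: fold change {fc:+.2f} -> log2FC {signed_fold_to_log2(fc):+.2f}")
```

On the log2 scale, up- and down-regulation of the same magnitude are symmetric around zero, which makes the fatty acid biosynthesis and nitrogen metabolism entries easier to compare.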